Data Shapley Based Auto-Labeling Algorithm

Ti Bai, Brandon Wang, Biling Wang*, Dan Nguyen, Steve Jiang, UT Southwestern Medical Center, Dallas, TX


(Sunday, 7/12/2020)   [Eastern Time (GMT-4)]

Room: AAPM ePoster Library

Purpose: Training deep neural networks for health care tasks requires a large amount of correctly labeled data. Manual annotation is very expensive in the medical domain. In this study, we developed an automated method to label data using the data Shapley algorithm.

Methods: For a classification task, given a certain amount of well-labeled data, multiple copies of the unlabeled dataset (each copy assigned a temporary trial label) are first generated and then combined with the labeled dataset to form the training dataset. An extra well-labeled dataset is used as the test dataset. One epoch of training from scratch is performed on the combined training dataset, during which the relative test loss change is recorded for each data point. This one-epoch training is repeated 1000 times. The Shapley value of each data point in the unlabeled dataset is then calculated as the averaged relative loss change, where a negative Shapley value means a loss increase, indicating a wrong label. To validate the proposed algorithm, the Osteosarcoma Tumor Identification Dataset was used, which contains 344 images with three different labels. The dataset is split into 220/30/94 images, corresponding to the labeled/test/unlabeled datasets. To explore how performance depends on the size of the labeled dataset, different numbers of labeled images are separately used as the labeled dataset. Top-1 accuracy is used to measure the auto-labeling performance.
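The procedure described in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a softmax-regression classifier in place of a deep neural network, a per-sample SGD update so that each test loss change can be attributed to a single data point, and synthetic 2D data in place of the image dataset. All function names, the learning rate, and the number of rounds are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(W, X, y):
    p = softmax(X @ W)
    return -np.log(p[np.arange(len(y)), y] + 1e-12).mean()

def auto_label(X_lab, y_lab, X_unl, X_test, y_test, n_classes,
               n_rounds=50, lr=0.05, seed=0):
    """Assign each unlabeled point the trial label with the highest value.

    Assumption: a linear softmax model stands in for the deep network.
    """
    rng = np.random.default_rng(seed)
    n_unl = len(X_unl)
    # One copy of the unlabeled set per trial label.
    X_cand = np.tile(X_unl, (n_classes, 1))
    y_cand = np.repeat(np.arange(n_classes), n_unl)
    X_train = np.vstack([X_lab, X_cand])
    y_train = np.concatenate([y_lab, y_cand])
    one_hot = np.eye(n_classes)

    values = np.zeros(len(y_train))
    for _ in range(n_rounds):
        # Train from scratch for one epoch in a fresh random order.
        W = np.zeros((X_train.shape[1], n_classes))
        loss_before = cross_entropy(W, X_test, y_test)
        for i in rng.permutation(len(y_train)):
            x = X_train[i:i + 1]
            grad = x.T @ (softmax(x @ W) - one_hot[y_train[i:i + 1]])
            W -= lr * grad
            loss_after = cross_entropy(W, X_test, y_test)
            # Relative test loss change credited to this data point;
            # positive means the loss decreased, negative means it increased.
            values[i] += (loss_before - loss_after) / max(loss_before, 1e-12)
            loss_before = loss_after
    values /= n_rounds

    # Row c holds the values of the copy that carries trial label c.
    cand_values = values[len(y_lab):].reshape(n_classes, n_unl)
    return cand_values.argmax(axis=0), cand_values

def demo(seed=0):
    """Run the sketch on three synthetic, well-separated 2D clusters."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0.0, 4.0], [-4.0, -2.0], [4.0, -2.0]])
    y = np.arange(90) % 3
    X = centers[y] + rng.normal(scale=0.5, size=(90, 2))
    X = np.hstack([X, np.ones((90, 1))])  # constant column acts as a bias
    order = rng.permutation(90)
    lab, test, unl = order[:30], order[30:60], order[60:]
    pred, _ = auto_label(X[lab], y[lab], X[unl], X[test], y[test], n_classes=3)
    return float((pred == y[unl]).mean())  # top-1 auto-labeling accuracy

if __name__ == "__main__":
    print("top-1 auto-labeling accuracy:", demo())
```

In this sketch the value of a candidate follows the sign convention of the abstract: a trial label whose updates tend to raise the test loss receives a negative averaged value, and the trial label with the largest value is taken as the automatic label.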

Results: It was shown that a 91% top-1 accuracy can be achieved when the sizes of the labeled and unlabeled datasets were comparable (94 images each). The auto-labeling performance improved further as more labeled images were added.

Conclusion: An auto-labeling algorithm based on the data Shapley value was developed, achieving more than 90% top-1 accuracy.

