A Genomic Signature Classifier to Predict Locoregional Failure of Head-And-Neck Cancer

X Pan1,2*, H Yu1, Y Yuan3, J Huang1, L Chen1, X Qi3; (1) School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710121, PR China; (2) Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710121, PR China; (3) Department of Radiation Oncology, University of California Los Angeles, CA 90095, USA


(Sunday, 7/14/2019) 4:00 PM - 5:00 PM

Room: Exhibit Hall | Forum 2

Purpose: To develop a genomic signature classifier that predicts which head-and-neck cancer (HNC) patients may relapse after radiotherapy.

Methods: Gene expression data for 105 HNC patients from the Gene Expression Omnibus database were studied. Of these patients, 55 had local recurrence and the remaining 50 did not. Expression values for 12,727 genes were collected for each patient. A feature selection method called TCGI, which combines the advantages of the t-test and Gini impurity, was proposed to minimize the potential impact of irrelevant and redundant genes. The t-test was used to assess the difference in mean expression between the two patient cohorts for each gene, which rapidly excluded genes with weak discriminative ability and produced a small candidate subset of features. A random forest (RF) was then applied to rank the genes in the candidate subset by importance according to Gini impurity. Redundant genes were removed to obtain a core informative subset. To evaluate the validity of the selected genes, support vector machine (SVM), RF, and multi-layer perceptron (MLP) classifiers were used to build prediction models. Seven-fold cross-validation was used to train and validate the models, with accuracy, precision, recall, F1-score, and AUC as the evaluation metrics. To assess the effectiveness of TCGI, feature selection based on mutual information, the t-test, and random forest feature importance ranking were used for comparison.
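The two-stage selection described in the Methods can be illustrated with a minimal sketch. It is not the authors' TCGI implementation: it assumes Welch's form of the t-test, and it replaces the random forest ranking with the Gini impurity decrease of the best single-threshold split per gene, a much simpler stand-in. The function names, the candidate and final subset sizes, and the synthetic data are all hypothetical.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples (assumed variant of the t-test)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_gini_decrease(values, labels):
    """Largest impurity decrease over all single-threshold splits of one gene.

    Simplified stand-in for random forest Gini importance (an assumption).
    """
    pairs = sorted(zip(values, labels))
    ys = [lab for _, lab in pairs]
    n, parent = len(ys), gini(ys)
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        left, right = ys[:i], ys[i:]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def two_stage_select(X, y, n_candidates, n_final):
    """Stage 1: keep the genes with the largest |t|.
    Stage 2: rank those candidates by Gini impurity decrease."""
    n_genes = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_genes)]
    t_abs = []
    for j in range(n_genes):
        pos = [v for v, lab in zip(cols[j], y) if lab == 1]
        neg = [v for v, lab in zip(cols[j], y) if lab == 0]
        t_abs.append(abs(welch_t(pos, neg)))
    candidates = sorted(range(n_genes), key=lambda j: -t_abs[j])[:n_candidates]
    ranked = sorted(candidates, key=lambda j: -best_gini_decrease(cols[j], y))
    return ranked[:n_final]


# Synthetic stand-in data: 105 patients (55 recurrence, 50 no recurrence)
# and 40 made-up "genes", of which only genes 0 and 1 carry signal.
rng = random.Random(0)
y = [1] * 55 + [0] * 50
X = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in y]
for row, lab in zip(X, y):
    if lab == 1:
        row[0] += 5.0
        row[1] += 3.0

selected = two_stage_select(X, y, n_candidates=10, n_final=2)
print("selected gene indices:", selected)
```

On real expression data, the second stage would use importances from a fitted random forest, and the selected genes would then be passed to the SVM, RF, and MLP classifiers under 7-fold cross-validation as the abstract describes.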

Results: TCGI feature selection achieved higher classification accuracy than the other feature selection methods. Compared with RF and SVM, the MLP classifier achieved the best value on every evaluation metric. Among the 12,727 genes, a subset of 6 genes was identified that may be most strongly associated with HNC recurrence.

Conclusion: An effective feature selection method was developed to analyze high-dimensional gene expression data. A subset of core informative genes was identified that may be used to stratify HNC patients for personalized treatment.

Funding Support, Disclosures, and Conflict of Interest: This work was supported by the National Natural Science Foundation of China (Grant No. 61702414), and the special funds for key disciplines in Shaanxi Universities and colleges.

