Room: Track 2
Purpose: Neuroblastoma is the most common extracranial solid tumor in children, with highly variable clinical behavior and outcome. Amplification of the MYCN oncogene is associated with younger age at diagnosis, an aggressive disease course, and worse survival. This study aims to: 1) develop a machine-learning-based pipeline that comprehensively analyzes metabolites to accurately predict MYCN amplification status for risk stratification; and 2) identify the key metabolites for MYCN status prediction by developing a multi-classifier grouped ranking approach.
Methods: The metabolomic dataset used in this study included 33 neuroblastoma cell lines. Targeted metabolomics identified 161 metabolites based on mass-to-charge ratio and retention time. The area under each mass spectrometry peak, normalized by protein quantification, was used as a surrogate for the relative abundance of each metabolite. Among these cell lines, 22 were MYCN-amplified and 11 were non-amplified. The proposed pipeline for MYCN status prediction consists of four key steps: SMOTE-based augmentation, feature selection by recursive feature elimination, training, and testing. To identify the key metabolites for MYCN status prediction, a multi-classifier grouped ranking (MCGR) approach was proposed, which consists of three phases: 1) joint feature selection; 2) performance quantification using multiple classifiers, including support vector machine (SVM), kernel extreme learning machine (kELM), and deep perceptron network (DNN); and 3) grouped ranking.
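The augmentation step can be illustrated with a minimal sketch of the core SMOTE idea (synthetic minority over-sampling): each synthetic sample is interpolated between a minority-class sample and one of its nearest minority-class neighbours. This is not the study's implementation. The function name `smote`, the neighbour count `k`, the random seeds, and the two-feature toy data are all assumptions made for illustration; only the class sizes (22 amplified, 11 non-amplified) come from the abstract.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [x for j, x in enumerate(minority) if j != i]
        others.sort(key=lambda x: math.dist(base, x))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy stand-in for the 11 non-amplified cell lines
# (2 features here instead of the 161 metabolites in the study).
data_rng = random.Random(1)
non_amplified = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(5.0, 1.0)]
                 for _ in range(11)]

# Add 11 synthetic samples so the minority class matches the 22 amplified lines.
augmented = non_amplified + smote(non_amplified, n_new=22 - 11)
print(len(augmented))  # 22
```

The abstract does not say whether augmentation was applied before or within the cross-validation folds. A common precaution is to oversample only the training portion of each fold, so that synthetic points derived from test samples do not leak into training.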
Results: The best of the three classifiers achieved an AUC of 0.89 in three-fold cross-validation. The proposed MCGR approach identified twelve key metabolites that were most predictive of MYCN amplification status in neuroblastoma cell lines.
Conclusion: The MCGR approach was developed as an unbiased, effective way to analyze metabolomics data, with the potential to accelerate scientific discovery and enable clinical application of metabolomics.
Feature Extraction, Feature Selection
TH- Dataset Analysis/Biomathematics: Machine learning techniques