Systems Metabolomics for Biomarker Discovery
Creator
Fan, Ziling
Advisor
Ressom, Habtom W
Abstract
Metabolomics is the quantitative analysis of the metabolome in a biological sample. As the most downstream layer of a biological system, the metabolome has been referred to as the link between genotype and phenotype. Consequently, exploring the metabolome can reveal important phenotypic changes caused by pathophysiological or medical abnormalities, making metabolomics a promising tool for biomarker discovery. Systems metabolomics is the simultaneous assessment and analysis of metabolomics datasets in conjunction with other omics datasets, such as those capturing genetic variation in DNA sequence, DNA epigenetic modification, and gene expression. The promise of this approach is a holistic view of a biological system. However, many challenges, ranging from measurement technologies to analysis methods, have to be overcome before this approach can be used for biomarker discovery. This dissertation focuses on two primary challenges in metabolomics-based biomarker discovery: metabolite identification and selection of biologically important biomarker candidates.
Metabolite identification is a critical and challenging step in mass spectrometry-based metabolomic profiling. In a typical untargeted LC-MS/MS-based metabolomics analysis, metabolite identification is performed using orthogonal features such as m/z, retention time, and MS/MS spectrum. For the last of these, experimental MS/MS spectra are matched against those in spectral libraries. Yet existing spectral libraries contain spectra for only a small percentage of the compounds found in living organisms. A new method, MetFID (Metabolite Fingerprint IDentification), is introduced for identifying metabolites whose reference MS/MS spectra are not currently present in spectral libraries. MetFID uses an artificial neural network (ANN) to predict molecular fingerprints from experimental MS/MS spectra. To narrow the search space, MetFID retrieves candidate molecules from metabolite databases using molecular formulae or m/z values of precursor ions. The candidate whose fingerprint is most similar to the predicted fingerprint is reported as the identification. We observed that training separate models, each for a pre-specified narrow range of collision energies, improves performance compared to a single model that covers a wide range of collision energies. Evaluation was performed by training MetFID on MS/MS spectra from the MoNA repository and the NIST 17 library and testing on structure-disjoint MS/MS spectra (no training spectrum shares the first block of its InChIKey with a testing spectrum) from the NIST 17 library, the CASMI 2016 dataset, and in-house MS/MS data from a cancer biomarker discovery study. We demonstrated that MetFID attains greater metabolite identification accuracy than other tools such as ChemDistiller, CSI:FingerID, and MetFrag.
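The candidate-ranking step described above can be illustrated with a minimal sketch. It is not MetFID's implementation: the abstract does not name the similarity measure, so Tanimoto similarity on binary fingerprints is an assumption here, as are the function names, the 10 ppm tolerance, and the toy candidates (which are invented for illustration and are not data from the study). The predicted fingerprint is assumed to be already binarized, and each candidate is assumed to carry the expected m/z of its precursor ion.

```python
from typing import Dict, List, Tuple


def tanimoto(a: List[int], b: List[int]) -> float:
    """Tanimoto similarity between two equal-length binary fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)     # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)    # bits set in at least one
    return both / either if either else 0.0


def rank_candidates(
    predicted: List[int],
    candidates: Dict[str, Tuple[float, List[int]]],
    precursor_mz: float,
    ppm: float = 10.0,  # assumed tolerance, not taken from the source
) -> List[Tuple[str, float]]:
    """Keep candidates within the m/z tolerance, then sort by similarity."""
    tol = precursor_mz * ppm * 1e-6
    scored = [
        (name, tanimoto(predicted, fp))
        for name, (mz, fp) in candidates.items()
        if abs(mz - precursor_mz) <= tol
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical inputs: an 8-bit "predicted" fingerprint and three toy candidates.
predicted = [1, 1, 0, 1, 0, 0, 1, 0]
candidates = {
    "cand_A": (180.0634, [1, 1, 0, 1, 0, 0, 0, 0]),
    "cand_B": (180.0640, [0, 1, 1, 0, 0, 1, 1, 0]),
    "cand_C": (250.1000, [1, 1, 0, 1, 0, 0, 1, 0]),  # outside the m/z window
}
ranked = rank_candidates(predicted, candidates, precursor_mz=180.0634)
print(ranked)
```

In this toy case cand_C matches the predicted fingerprint exactly but is never scored, because the m/z filter removes it before the similarity comparison; this is the search-space narrowing the abstract refers to.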
A great amount of effort has been devoted to investigating methods for selecting biologically important biomarkers that generalize to a wide population. Functional layers of the biological system include the genome, transcriptome, proteome, and metabolome. The integrative analysis of data from a large number of molecules involved in various layers of the biological system offers a promising approach to ranking disease biomarker candidates. Furthermore, the relationships between biomolecules can be conceptualized as a network-based regulatory system within a single biological layer or across different biological layers. Considering this, we introduce a network-based method, MOTA, which analyzes multi-omics datasets to rank candidate metabolite biomarkers. The network constructed by MOTA reveals the altered correlation (regulatory) relationships between biomolecules caused by a disease state. We evaluated the performance of MOTA in ranking disease-associated molecules from three sets of multi-omics data representing three cohorts of hepatocellular carcinoma (HCC) cases and patients with liver cirrhosis. MOTA identified more metabolite biomarker candidates within the top 10 ranks shared by two different cohorts compared to traditional statistical methods. Moreover, the mRNA candidates top-ranked by MOTA contain more cancer driver genes than those ranked by other methods, such as Student's t-test or iDINGO.
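The idea of ranking molecules by altered correlation between disease states can be sketched as follows. This is a simplified illustration of a differential correlation network, not MOTA's scoring: the abstract does not specify how edges or ranks are computed, so the use of Pearson correlation, the node score (sum of absolute correlation changes over all edges of a node), the function names, and the toy measurements are all assumptions made for illustration. The numbers are invented and are not results from the HCC cohorts.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def differential_scores(
    case: Dict[str, List[float]],
    control: Dict[str, List[float]],
) -> List[Tuple[str, float]]:
    """Score each molecule by the total change in its pairwise correlations."""
    scores = {name: 0.0 for name in case}
    for u, v in combinations(sorted(case), 2):
        # Edge weight: how much the correlation differs between the two groups.
        delta = abs(pearson(case[u], case[v]) - pearson(control[u], control[v]))
        scores[u] += delta
        scores[v] += delta
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: two metabolites (m1, m2) and one transcript (g1),
# four samples per group. Only m2 behaves differently in the case group.
control = {"m1": [1, 2, 3, 4], "m2": [2, 4, 6, 8], "g1": [1, 3, 2, 4]}
case = {"m1": [1, 2, 3, 4], "m2": [8, 6, 4, 2], "g1": [1, 3, 2, 4]}

ranking = differential_scores(case, control)
print(ranking)
```

Here m2 ranks first because both of its correlations (with m1 and with g1) flip sign in the case group, even though the values of m1 and g1 are unchanged; a per-molecule test on m1 or g1 alone would see no difference in those two.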
Together, the proposed methods for metabolite identification and biomarker selection will contribute to systems metabolomics by exploring the role of metabolomics in systems biology research for biomarker discovery.
Description
Ph.D.
Permanent Link
http://hdl.handle.net/10822/1060516
Date Published
2020
Subject
Type
Publisher
Georgetown University
Extent
107 leaves