Network Analysis in Human Microbiome Study
Microorganisms coexist in complex, interacting relationships that form a dynamically stable community and can either support health or cause disease. Characterizing microbial interactions through statistical network analysis helps to better understand microbial biology and the etiology of disease. For microbiome studies with data from multiple conditions, it is desirable to estimate networks jointly, with the option to detect hub taxa. Existing methods based on penalized estimation of Gaussian graphical models are computationally prohibitive and have unacceptably high false positive rates.

Based on centered log-ratio (CLR) transformed data, I develop a computationally efficient and statistically comprehensive framework for microbiome network analysis that integrates data across multiple conditions and can detect hub taxa with modest signals. I further develop a formal and efficient testing method for detecting network heterogeneity across conditions or subtypes of a disease. Finally, I extend these algorithms to samples selected under a complex design by appropriately integrating sampling weights, motivated by the oral microbiome study nested in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO) at NCI. Analysis of the oral microbiome data on lung cancer in the PLCO cohort demonstrated the superior performance of my methods in network construction and heterogeneity detection, and offered interesting insight into the role of the oral microbiome in the etiology of lung cancer. An R package is publicly available on GitHub.

While the CLR transformation is mathematically convenient for network analysis, it does not completely adjust for compositionality, which makes interpretation difficult when the number of taxa is relatively small. To overcome this problem, I develop novel multivariate distributions that flexibly model the dependence structure and the compositionality of microbiome relative abundance data using inverse gamma distributions and copula techniques. Analysis of real gut microbiome data from the American Gut Project shows that the proposed multivariate distributions perform well, and they serve as the basis for extending my network analysis methods.
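For readers unfamiliar with the CLR transformation mentioned above, the following is a minimal illustrative sketch in Python. It is not taken from the author's R package. The function name `clr_transform`, the toy count matrix, and the pseudocount of 0.5 used to handle zero counts are all assumptions made for illustration. Each sample's log abundances are centered by their mean, which is the log of the sample's geometric mean.

```python
import numpy as np

def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio transform of a samples-by-taxa count matrix.

    The pseudocount is an illustrative assumption for handling zeros;
    it is not specified in the abstract.
    """
    x = np.asarray(counts, dtype=float) + pseudocount  # avoid log(0)
    log_x = np.log(x)
    # Subtract each sample's mean log abundance (log of the geometric mean).
    return log_x - log_x.mean(axis=1, keepdims=True)

# Hypothetical toy data: 3 samples (rows) by 4 taxa (columns).
counts = np.array([[10, 0, 5, 85],
                   [20, 30, 0, 50],
                   [1, 4, 90, 5]])

clr = clr_transform(counts)
row_sums = clr.sum(axis=1)  # zero for every sample, up to rounding error
```

Because every CLR-transformed sample sums to zero, the transformed taxa remain linearly dependent. This is one way to see why the transformation does not fully remove compositionality, and why the effect is more noticeable when the number of taxa is small.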