Biostatistics Seminar Series - Kun Chen, PhD

Tuesday, October 26, 2021
3:30 pm - 4:30 pm
Virtual BlueJeans Meeting
Title: An amalgamation-based statistical learning paradigm for microbiome data

Abstract: Compositionality, taxonomic tree structure, and high dimensionality with excessive zero entries are among the most prominent features of microbiome data. In many applications, there is a genuine need to reduce the dimensionality of the compositional data to facilitate understanding and subsequent analysis. We propose an information-theoretic, taxonomy-guided statistical learning paradigm for compositional data based on the operation of amalgamation. In unsupervised learning, we propose Principal Amalgamation Analysis, which aggregates the compositional components into a smaller number of components, as guided by the taxonomic hierarchy, so as to preserve as much as possible of a diversity or likelihood-based complexity measure of the microbiome data. In supervised learning, we propose Relative-Shift Regression, which directly uses compositions as predictors. This approach offers a superior interpretation of how shifting relative concentrations between compositional components affects the response. Equi-sparsity and tree-guided regularization methods and an efficient smoothing proximal gradient algorithm are developed with theoretical guarantees. The effectiveness of the proposed methods is demonstrated in extensive simulations and several gut microbiome studies.