GGEB Dissertation Defense - Mackenzie Edmondson

Wednesday, June 16, 2021
1:00 pm - 2:00 pm
Virtual
Real-world data, including electronic health records and administrative claims data, are widely used in modern healthcare research to generate real-world evidence for improving patient care. The widespread availability of observational data from a variety of institutions has prompted many large-scale, multi-site studies in recent years. Studies incorporating data from multiple institutions often yield results that are more generalizable than those from single-site studies and offer improved power for studying rare outcomes or exposures. Various challenges concerning patient-level data sharing, primarily those related to data privacy, have made distributed data analysis a practical alternative to analyzing centralized data in multi-site studies. Under a distributed data analysis framework, patient-level data are not shared across institutions. Instead, aggregated data are communicated to a coordinating site to obtain analysis results. While methods for performing distributed analyses are increasingly available, methods for analyzing binary and count outcomes remain limited. In this work, we propose two distributed regression algorithms for modeling count outcomes in multi-site studies. The first algorithm uses distributed quasi-Poisson regression to model counts while accounting for institution-specific heterogeneity in the outcome. The second uses distributed hurdle regression to model counts subject to zero-inflation. Both algorithms are communication efficient and highly accurate, requiring at most two or three rounds of communication among participating institutions and achieving results close to those obtained using pooled regression of all patient-level data, a method usable only if data are centralized. We evaluate the performance of each method through simulations and applications to real-world clinical research networks.
Finally, we illustrate a novel application of a distributed generalized linear mixed modeling algorithm with binary outcomes to study the effect of admitting hospital on racial disparities in mortality for patients hospitalized with COVID-19 via counterfactual modeling.
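
The distributed framework described in the abstract, in which each site sends only aggregated quantities to a coordinating site, can be illustrated with a small sketch. The example below is not either of the proposed algorithms: it is a plain Poisson regression fitted by ordinary Newton-Raphson, so it needs one communication round per iteration rather than the two or three rounds mentioned above. All function names (`site_summaries`, `fit_poisson`, `simulate_site`), the simulated data, and the coefficient values are assumptions made for illustration.

```python
import math
import random

def site_summaries(X, y, beta):
    """Run locally at one site; returns only the aggregated gradient and Hessian."""
    p = len(beta)
    grad = [0.0] * p
    hess = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        mu = math.exp(sum(b * x for b, x in zip(beta, xi)))
        for j in range(p):
            grad[j] += (yi - mu) * xi[j]          # Poisson score contribution
            for k in range(p):
                hess[j][k] += mu * xi[j] * xi[k]  # Fisher information contribution
    return grad, hess

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_poisson(sites, p, rounds=25, tol=1e-10):
    """Coordinating site: Newton-Raphson using only summed site-level aggregates."""
    beta = [0.0] * p
    for _ in range(rounds):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for X, y in sites:  # in practice each call would run at a different site
            g, h = site_summaries(X, y, beta)
            grad = [a + b for a, b in zip(grad, g)]
            hess = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(hess, h)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def simulate_site(n, beta, rng):
    """Hypothetical site data: an intercept, one covariate, and Poisson counts."""
    X, y = [], []
    for _ in range(n):
        xi = [1.0, rng.uniform(-1, 1)]
        mu = math.exp(sum(b * x for b, x in zip(beta, xi)))
        limit, k, prod = math.exp(-mu), 0, rng.random()  # Knuth's Poisson sampler
        while prod > limit:
            k += 1
            prod *= rng.random()
        X.append(xi)
        y.append(float(k))
    return X, y

rng = random.Random(0)
true_beta = [0.5, 0.3]  # assumed values, for simulation only
sites = [simulate_site(n, true_beta, rng) for n in (200, 300, 500)]

beta_distributed = fit_poisson(sites, p=2)

# Pooled fit for comparison; possible only when patient-level data are centralized.
pooled = ([x for X, _ in sites for x in X], [v for _, y in sites for v in y])
beta_pooled = fit_poisson([pooled], p=2)
```

Because the Poisson score and information are sums over patients, adding the site-level aggregates reproduces the pooled Newton step, so `beta_distributed` and `beta_pooled` agree up to floating-point error. The cost is one round of communication per iteration, which is the burden that communication-efficient algorithms such as those described in the abstract aim to reduce.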