Abstract: In many real-world applications, obtaining labeled data is a significant challenge due to high costs and technical limitations. This scarcity of labeled outcomes is a major obstacle for traditional statistical inference. To address it, we introduce a model-free approach for constructing prediction regions for new target outcomes. Our method leverages a labeled source distribution, which differs from the target but is related to it through a distributional shift, to overcome the lack of target labels. When the target data are fully unlabeled, our predictions rely entirely on the rich source data; when some target labels are available, we integrate them to improve efficiency. A key innovation of the approach is how it handles the differences between the source and target distributions: we tackle non-exchangeability and non-identifiability by estimating the likelihood ratio through a novel technique that matches the covariate distributions of the source and target domains using a B-spline basis. To accommodate complex error structures, including asymmetry and multimodality, we construct highest predictive density sets using a new weight-adjusted conditional density estimator. This estimator models the source conditional density and then transforms it through a weighting scheme to approximate the target conditional density. We will discuss the theoretical guarantees of the method and demonstrate its performance through comprehensive simulation studies and a real-world application using the MIMIC-III clinical database. This is joint work with Menghan Yi and Yanlin Tang.
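The pipeline described above (estimate a likelihood ratio by matching covariate distributions with B-splines, reweight a source conditional density, then take a highest-density set) can be illustrated with a minimal sketch. It assumes a label-shift-type setting, where P(Y) changes between domains while P(X | Y) does not, so that f_T(y | x) is proportional to f_S(y | x) w(y) with w(y) = p_T(y) / p_S(y). All function names (`bspline_basis`, `clamped_knots`, `estimate_label_weights`, `target_hpd_set`), the ridge-regularized moment matching, the Gaussian-kernel conditional density estimator, and the simulated data are hypothetical stand-ins chosen for illustration, not the speakers' estimator. The sketch also omits any calibration step, so it carries none of the method's theoretical coverage guarantees.

```python
import numpy as np

def clamped_knots(lo, hi, n_interior, degree=3):
    """Clamped knot vector on [lo, hi] with n_interior equally spaced interior knots."""
    inner = np.linspace(lo, hi, n_interior + 2)
    return np.concatenate([[lo] * degree, inner, [hi] * degree])

def bspline_basis(x, knots, degree=3):
    """All B-spline basis functions at points x via Cox-de Boor; shape (len(x), n_basis)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    B = np.zeros((len(x), len(t) - 1))
    for i in range(len(t) - 1):  # degree 0: indicators of [t_i, t_{i+1})
        B[:, i] = (x >= t[i]) & (x < t[i + 1])
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])
    B[x == t[-1], last] = 1.0  # include the right endpoint in the last interval
    for d in range(1, degree + 1):
        B_new = np.zeros((len(x), len(t) - d - 1))
        for i in range(len(t) - d - 1):
            ld, rd = t[i + d] - t[i], t[i + d + 1] - t[i + 1]
            left = (x - t[i]) / ld * B[:, i] if ld > 0 else 0.0
            right = (t[i + d + 1] - x) / rd * B[:, i + 1] if rd > 0 else 0.0
            B_new[:, i] = left + right
        B = B_new
    return B

def estimate_label_weights(x_src, y_src, x_tgt, n_interior=2, ridge=1e-3):
    """Hypothetical w(y) = B(y) @ beta, fit so mean_S[g(X) w(Y)] = mean_T[g(X)]."""
    knots_y = clamped_knots(y_src.min(), y_src.max(), n_interior)
    knots_x = clamped_knots(min(x_src.min(), x_tgt.min()),
                            max(x_src.max(), x_tgt.max()), n_interior)
    B_y = bspline_basis(y_src, knots_y)     # basis for the weight function in y
    G_src = bspline_basis(x_src, knots_x)   # B-spline test functions g(x), source
    G_tgt = bspline_basis(x_tgt, knots_x)   # the same test functions, target
    A = G_src.T @ B_y / len(y_src)          # moment conditions: A @ beta = b
    b = G_tgt.mean(axis=0)
    # Ridge least squares shrinking beta toward 1, i.e. toward w(y) = 1 (no shift),
    # since B-splines sum to one on their range.
    ones = np.ones(A.shape[1])
    delta = np.linalg.solve(A.T @ A + ridge * np.eye(len(ones)), A.T @ (b - A @ ones))
    beta = np.clip(ones + delta, 0.0, None)  # basis >= 0, so beta >= 0 keeps w >= 0
    beta /= np.mean(B_y @ beta)              # normalize so that mean_S[w(Y)] = 1
    return lambda y: bspline_basis(np.clip(y, knots_y[0], knots_y[-1]), knots_y) @ beta

def target_hpd_set(x0, x_src, y_src, weight_fn, alpha=0.1, h=0.4, grid_size=400):
    """Reweighted kernel estimate of f_T(y | x0) and its (1 - alpha) highest-density set."""
    grid = np.linspace(y_src.min() - 3 * h, y_src.max() + 3 * h, grid_size)
    kern_x = np.exp(-0.5 * ((x0 - x_src) / h) ** 2)                 # weights in x
    kern_y = np.exp(-0.5 * ((grid[:, None] - y_src[None, :]) / h) ** 2)
    dens = (kern_y @ kern_x) * weight_fn(grid)  # f_S(y | x0) * w(y), unnormalized
    dy = grid[1] - grid[0]
    dens = dens / (dens.sum() * dy)
    order = np.argsort(dens)[::-1]              # grid points by decreasing density
    mass = np.cumsum(dens[order]) * dy
    keep = order[: np.searchsorted(mass, 1 - alpha) + 1]
    return grid, dens, dens >= dens[keep].min()  # may be a union of intervals

rng = np.random.default_rng(0)
n = 2000
y_src = rng.normal(0.0, 1.0, n)
x_src = y_src + rng.normal(0.0, 1.0, n)
y_tgt = rng.normal(1.0, 1.0, n)           # P(Y) shifts; P(X | Y) is unchanged
x_tgt = y_tgt + rng.normal(0.0, 1.0, n)   # only the unlabeled x_tgt is used
w = estimate_label_weights(x_src, y_src, x_tgt)
grid, dens, in_set = target_hpd_set(1.0, x_src, y_src, w, alpha=0.1)
n_pieces = int(in_set[0]) + int(np.sum(np.diff(in_set.astype(int)) == 1))
print("HPD set pieces:", n_pieces)
```

Because the set is obtained by thresholding a density rather than taking quantiles, it can be asymmetric or split into several intervals when the reweighted density is multimodal. That is the motivation for using highest-density sets under asymmetric or multimodal errors.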
About the speaker: Dr. Huixia Judy Wang is the William Marsh Rice Trustee Professor in Data Science and Chair of the Department of Statistics at Rice University. Her prior academic appointments include faculty positions at George Washington University and North Carolina State University. She also served as a Program Director at the National Science Foundation from 2018 to 2022. Dr. Wang's research interests include biostatistics, high-dimensional inference, quantile regression, and extreme value analysis. Her work has been recognized with the NSF CAREER award, the Tweedie New Researcher Award from the Institute of Mathematical Statistics (IMS), and the IMS Medallion Lectureship. She is an elected Fellow of the American Statistical Association (ASA) and the IMS, and an elected member of the International Statistical Institute (ISI). She is currently a co-editor of Statistica Sinica and an Associate Editor for the Journal of the American Statistical Association.