Showcase & Special Sessions

Special Invited Session - Argentina

Organizer: Silvia Sühring, National University of Salta, Argentina

Statistical Analysis for Agriculture and Environmental Sciences: Current Developments and Applications

Abstract

Sustainable agriculture and livestock research play a vital role in the economic development of countries, especially in less developed countries. Currently, agri-food systems must integrate scientific and local knowledge to actively conserve and restore ecosystems. Digitalization is challenging research and education in agronomy. It allows the collection of large amounts of data that provide precise information on multiple variables to assess changes and understand how agri-food systems work. In agriculture, the availability of precision machinery promotes the implementation of on-farm experimental designs where a field is divided into plots receiving different treatments, with large amounts of spatial data within plots but without treatment replication. Treatment comparisons need to be updated for new data types and inference spaces. Digital data also pervades livestock science. Precision livestock farming relies on high-throughput phenotyping for automating on-farm selection and management decisions. The use of digital cameras and sound monitors coupled with machine learning allows accurate phenotyping of large numbers of individuals in animal studies. However, simple aspects such as animal identification remain challenging. In both agricultural and livestock sciences, understanding the interconnections between data from multiple variables is increasingly important for integrated management. Statistical approaches providing insight into variable interconnections as putative causal links defining functional networks are crucial.

About the Speakers

Pablo Paccioretti, PhD
Agronomist, Statistics and Biometry, Agricultural College, National University of Cordoba, and CONICET, Argentina. Department of Agronomy and Horticulture, University of Nebraska – Lincoln, Lincoln, USA
Specialties: Biostatistics, Statistical computing

Juan P. Steibel, PhD
J. Lush Chair in Animal Breeding and Genetics at Iowa State University
Specialties: Genetics, Biostatistics, High Performance Computing

Nora María Bello, PhD, DVM
Statistician, Agricultural Research Service – Northeast Area, United States Department of Agriculture
Specialties: Animal Science, Statistical modeling

About the Discussant:

Raúl E. Macchiavelli, PhD
Professor of Biometry and Dean, College of Agricultural Sciences, University of Puerto Rico, Mayaguez, Puerto Rico
Specialties: Biostatistics, Statistical modeling

Biometric Showcase

Organizer: Geert Molenberghs, Hasselt University & KU Leuven

Multiwave Validation Sampling for Error-prone Electronic Health Records

Abstract

Electronic health record (EHR) data are increasingly used for biomedical research, but these data have recognized data quality challenges. Data validation is necessary to use EHR data with confidence, but limited resources typically make complete data validation impossible. Using EHR data, we illustrate prospective, multiwave, two-phase validation sampling to estimate the association between maternal weight gain during pregnancy and the risks of her child developing obesity or asthma. The optimal validation sampling design depends on the unknown efficient influence functions of regression coefficients of interest. In the first wave of our multiwave validation design, we estimate the influence function using the unvalidated (phase 1) data to determine our validation sample; then in subsequent waves, we re-estimate the influence function using validated (phase 2) data and update our sampling. For efficiency, estimation combines obesity and asthma sampling frames while calibrating sampling weights using generalized raking. We validated 996 of 10,335 mother-child EHR dyads in six sampling waves. Estimated associations between childhood obesity/asthma and maternal weight gain, as well as other covariates, are compared to naïve estimates that only use unvalidated data. In some cases, estimates markedly differ, underscoring the importance of efficient validation sampling to obtain accurate estimates incorporating validated data.
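As a rough illustration of how estimated influence functions can steer a validation wave, the sketch below applies Neyman allocation: each stratum receives a share of the wave proportional to its size times the standard deviation of the influence-function values seen in it. The abstract does not state the allocation rule the authors used, so Neyman allocation, the function name `neyman_allocation`, and the toy strata are all assumptions made for this example.

```python
from statistics import pstdev


def neyman_allocation(strata, wave_size):
    """Split wave_size across strata in proportion to N_h * S_h.

    strata maps a stratum label to (N_h, values), where N_h is the number
    of records in the stratum and values are influence-function estimates
    for the records observed so far (hypothetical inputs).
    """
    weights = {
        label: n_h * pstdev(values)
        for label, (n_h, values) in strata.items()
    }
    total = sum(weights.values())
    if total == 0:
        # No variability anywhere: fall back to proportional allocation.
        weights = {label: n_h for label, (n_h, _) in strata.items()}
        total = sum(weights.values())
    # Real-valued shares; a real design would still need integer rounding.
    return {label: wave_size * w / total for label, w in weights.items()}


# Toy example: two strata of equal size, one far more variable (made-up data).
toy_strata = {
    "low_variability": (500, [0.1, -0.1, 0.1, -0.1]),
    "high_variability": (500, [0.3, -0.3, 0.3, -0.3]),
}
allocation = neyman_allocation(toy_strata, wave_size=100)
```

In a multiwave design of the kind described, the same calculation would be repeated before each wave, with the influence-function values re-estimated from the records validated so far.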

About the Speakers

Bryan E Shepherd, Department of Biostatistics, Vanderbilt University, Nashville, Tennessee, USA

About the Authors:
Kyunghee Han, University of Illinois Chicago
Tong Chen, The University of Auckland
Aihua Bian, Vanderbilt University
Shannon Pugh, Vanderbilt University
Stephany Duda, Vanderbilt University
Thomas Lumley, The University of Auckland
William Heerman, Vanderbilt University
Pamela A Shaw, Kaiser Permanente Washington Health Research Institute 

A Spatial Bayesian Latent Factor Model for Image-on-Image Regression

Abstract

Image-on-image regression analysis, using images to predict images, is a challenging task, due to (1) the high dimensionality and (2) the complex spatial dependence structures in image predictors and image outcomes. In this work, we propose a novel image-on-image regression model, by extending a spatial Bayesian latent factor model to image data, where low-dimensional latent factors are adopted to make connections between high-dimensional image outcomes and image predictors. We assign Gaussian process priors to the spatially varying regression coefficients in the model, which can well capture the complex spatial dependence among image outcomes as well as that among the image predictors. We perform simulation studies to evaluate the out-of-sample prediction performance of our method compared with linear regression and voxel-wise regression methods for different scenarios. The proposed method achieves better prediction accuracy by effectively accounting for the spatial dependence and efficiently reduces image dimensions with latent factors. We apply the proposed method to analysis of multimodal image data in the Human Connectome Project where we predict task-related contrast maps using subcortical volumetric seed maps.

About the Speakers

Jian Kang, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA


About the Authors:
Cui Guo, University of Michigan
Timothy D. Johnson, University of Michigan

Special Mentoring Program Panel

Organizer: Brenda Yankam, Department of Health, Malaria Consortium, Cameroon

Chair: Tarylee Reddy, Professor, Biostatistics Research Unit, South African Medical Research Council, South Africa

Fostering Professional Growth: The International Biometric Society (IBS) Mentor/Mentee Program

About the Session

Attendees of the session "Fostering Professional Growth: The International Biometric Society (IBS) Mentor/Mentee Program" can expect an insightful and engaging experience that highlights the benefits of mentorship in professional development within biometrics. This session will feature experienced mentors and early-career mentees who have actively participated in the IBS Mentor/Mentee Program and other mentorship programs, sharing their career journeys and firsthand experiences. Through this session, attendees will gain valuable insights into the diverse career paths within biometrics and the significant impact of mentorship on individual growth. The session aims to inspire greater participation in the IBS Mentor/Mentee Program by showcasing the benefits and positive influence of mentorship, encouraging both potential mentors and mentees to engage and contribute to the professional growth of the IBS community.

About the Panel

Dr José Pinheiro is a Senior Director at Janssen Research and Development - Johnson & Johnson, Somerville, NJ.

Geert Molenberghs is Professor at the Faculty of Medicine at KU Leuven and at Hasselt University.

Scarlett Bellamy is Chair and Professor of Biostatistics at the Boston University School of Public Health. 

Tarylee Reddy is a Professor in the Biostatistics Research Unit, South African Medical Research Council, South Africa.

Daniel Gyaase is a PhD student in Medicine (Injury Epidemiology) at the George Institute, Faculty of Medicine and Health, University of New South Wales, Australia.

Clemence Taremwa is a Biostatistics and Public Health professional and a lecturer at the Makerere University School of Public Health, Makerere University, Uganda.

Brenda Yankam is a Statistician working with Malaria Consortium, Department of Health, Buea, Cameroon.

ISI Showcase

Organizer: Peter Doherty, Executive Director, International Biometric Society

Approximate Bayesian Inference for Biostatistics

Abstract

Various biostatistical models can be formulated as latent Gaussian models. For this class of models we can perform fast and accurate approximate Bayesian inference, which implies that we can develop more complex models for larger datasets. In this talk I will present the methodology, which is based on Laplace approximations and a low-rank Variational Bayes correction, and show examples where this approach provides near real-time inference for various models such as joint models, disease mapping and epidemiology models, and quantile survival models, among others. This computational tool allows the practitioner to fit models fast and possibly use them for patient insights. The method is implemented in the INLA R library.
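To make the idea of a Laplace approximation concrete, here is a minimal sketch for the simplest possible latent Gaussian model: one Poisson count whose log-rate has a normal prior. It finds the posterior mode by Newton's method and uses the curvature there as the variance of a Gaussian approximation. This is not the INLA algorithm and does not include the Variational Bayes correction mentioned in the abstract; the function name and the example values are assumptions chosen for illustration.

```python
import math


def laplace_poisson_normal(y, prior_sd, tol=1e-12, max_iter=100):
    """Gaussian approximation to p(x | y) for y ~ Poisson(exp(x)), x ~ N(0, prior_sd^2).

    Returns (mode, variance) of the approximating normal distribution.
    """
    precision = 1.0 / prior_sd ** 2
    x = 0.0
    for _ in range(max_iter):
        # First and second derivatives of the log posterior at x.
        gradient = y - math.exp(x) - precision * x
        hessian = -math.exp(x) - precision  # always negative: log posterior is concave
        step = gradient / hessian
        x -= step  # Newton update towards the mode
        if abs(step) < tol:
            break
    # Variance of the approximation is the inverse negative curvature at the mode.
    variance = 1.0 / (math.exp(x) + precision)
    return x, variance


# Hypothetical example: a single observed count of 3 with a unit-variance prior.
mode, variance = laplace_poisson_normal(y=3, prior_sd=1.0)
```

The approximate posterior variance is always smaller than the prior variance here, because the Poisson likelihood adds curvature of `exp(mode)` on top of the prior precision.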

About the Speakers

Janet van Niekerk, King Abdullah University of Science and Technology, Computer, Electrical and Mathematical Sciences and Engineering Division



The Kumaraswamy Generalized Autoregressive Score Model for The Quantiles of Double-bounded Hydro-environmental Time Series

Abstract

This work addresses a new generalized autoregressive score (GAS) model for continuous random variables that assume values in the unit interval. For this type of data, the beta distribution provides the premier model, and, to the best of our knowledge, it is the only one explored under the GAS approach. However, it may not always be suitable, and the Kumaraswamy (Kw) distribution stands out as a well-known alternative for a wide range of applications. In this context, the proposed model is defined from the assumption that the conditional quantile of the Kw distribution is a time-varying parameter under the GAS framework. Since the beta GAS has been defined in terms of a time-varying conditional shape (or mean) parameter, our proposal pioneers the conditional quantile approach to analyze double-bounded time series. We present the conditional maximum likelihood estimation (CMLE) method for parameter estimation and conduct a simulation study to evaluate its performance. We also discuss residual analysis, goodness-of-fit assessment, and forecasting for this new model. An empirical application illustrates the suitability of the proposed GAS by fitting eleven water reservoirs from the Southeast/Midwest subsystem of the Brazilian hydroelectric power plant. The proposed model provided the highest number of best fits for both in-sample and out-of-sample forecasts. These results evidence the superiority of the Kw-GAS over the beta-GAS for modeling double-bounded hydro-environmental data.
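One reason a quantile-based parameterization is convenient for the Kumaraswamy distribution is that its distribution function inverts in closed form. The sketch below shows the standard Kw distribution function, its quantile function, and how one shape parameter can be solved for so that a chosen value is the quantile of a chosen level. The GAS recursion that makes the quantile time-varying is not shown, and the exact parameterization used by the authors may differ; function names and numeric values are assumptions for illustration.

```python
import math


def kw_cdf(x, a, b):
    """Kumaraswamy CDF on (0, 1): F(x) = 1 - (1 - x**a)**b."""
    return 1.0 - (1.0 - x ** a) ** b


def kw_quantile(u, a, b):
    """Inverse of kw_cdf: Q(u) = (1 - (1 - u)**(1/b))**(1/a)."""
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)


def shape_b_from_quantile(omega, tau, a):
    """Solve F(omega) = tau for b, so that omega is the tau-quantile."""
    return math.log(1.0 - tau) / math.log(1.0 - omega ** a)


# Hypothetical values: fix the median (tau = 0.5) at 0.7 with shape a = 2.
a, tau, omega = 2.0, 0.5, 0.7
b = shape_b_from_quantile(omega, tau, a)
```

With `b` obtained this way, the distribution is indexed by the quantile `omega` and the remaining shape `a`, which is the kind of parameterization a conditional quantile model needs.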

About the Speakers

Renata Rojas Guerra, Universidade Federal de Santa Maria, Statistics



Stat In Practice Showcase

Organizer: Rafael de Andrade Moral, Maynooth University

Statistics in Practice: Selection of Variables and Functional Forms for Multivariable Models 1

Abstract

This "Statistics in Practice" session is presented on behalf of the STRATOS initiative’s Topic Group 2, which focuses on selection of variables and functional forms in multivariable model building.

In many scientific fields, statistical models play a crucial role in describing, predicting, and explaining outcomes with empirical data. Two key and interconnected challenges in model building are selecting the appropriate variables and determining the functional forms of continuous variables within the model.

The first session will delve into these challenges, with a strong focus on aligning the model-building process with the intended purpose of the model. We will emphasize that depending on the model’s objective, the use of variable selection algorithms may either be helpful or ill-advised. Various variable selection methods will be reviewed, highlighting how their application can affect the stability of a model and introduce additional, often overlooked, uncertainty. Additionally, we will explore techniques for handling non-linear functional forms of continuous variables. Although variable and functional form selection is a frequent issue in model building, few algorithms address this combined task, and there is limited knowledge about their relative performance. Some of the existing proposals will be presented and discussed.

About the Speakers

Georg Heinze, Center for Medical Data Science; Institute of Clinical Biometrics, Medical University of Vienna
Theresa Ullmann, Center for Medical Data Science; Institute of Clinical Biometrics, Medical University of Vienna

Statistics in Practice: Selection of Variables and Functional Forms for Multivariable Models 2

Abstract

This "Statistics in Practice" session is presented on behalf of the STRATOS initiative’s Topic Group 2, which focuses on selection of variables and functional forms in multivariable model building.

In many scientific fields, statistical models play a crucial role in describing, predicting, and explaining outcomes with empirical data. Two key and interconnected challenges in model building are selecting the appropriate variables and determining the functional forms of continuous variables within the model.

In the second session, we will take a deeper dive into the topic of variable selection, using a simulation study to empirically examine the properties of several popular algorithms across different scenarios. The results of this study will be presented via an interactive application, allowing for a visual comparison of methods and an overview of their performance. This dynamic presentation will help attendees grasp the nuances of variable selection algorithms. Finally, we will conclude with evidence-based recommendations for practicing statisticians, informed by our simulation results, on the effective and safe use of variable selection methods in real-world applications.

About the Speakers

Georg Heinze, Center for Medical Data Science; Institute of Clinical Biometrics, Medical University of Vienna
Theresa Ullmann, Center for Medical Data Science; Institute of Clinical Biometrics, Medical University of Vienna

Special Young Researchers Panel

Organizer: Brenda Yankam, Department of Health, Malaria Consortium, Cameroon

Empowering Emerging Biometricians: Developing a Young Biometrician Group within the International Biometric Society (IBS)

About the Session

This session is a pivotal initiative aimed at addressing the unique challenges faced by early-career professionals in the field of biostatistics, including establishing professional identities, networking, and finding mentorship opportunities. Attendees of this session can expect an engaging and collaborative experience designed to foster the growth and integration of young and emerging biometricians within the IBS community. The session will feature a panel discussion with influential leaders from various IBS regions, who will share insights on forming and sustaining young biometrician groups, membership, and promoting regional involvement. This will be followed by an interactive open discussion where attendees can ask questions and gain practical advice. The session will culminate in a collaborative effort to establish the foundation for a dedicated young biometrician group within IBS, with the opportunity for participants to contribute their ideas and join a mailing list for ongoing discussions and formalization of the group. This unique initiative aims to create a vibrant, inclusive community that supports professional growth, networking, and mentorship for emerging biometricians.


About the Speakers

Irantzu Barrio is a Professor of Mathematics in the Department of Mathematics, University of the Basque Country UPV/EHU.

Brenda Yankam is a Statistician working with Malaria Consortium, Department of Health, Buea, Cameroon.

Daniel Mork is a Research Scientist in the Department of Biostatistics at Harvard T.H. Chan School of Public Health working with National Studies on Air Pollution and Health.

Anke Huels is an Assistant Professor of Epidemiology & Environmental Health at the Rollins School of Public Health, Emory University.

Young Statistician Showcase (YSS) 

Organizer: 

Multi-Omics Network Reconstruction: Collaborative Graphical Lasso

Abstract
In recent years, the availability of multi-omics data has increased substantially. Multi-omics data integration methods mainly aim to leverage different molecular data sets to gain a complete molecular description of biological processes. An attractive integration approach is the reconstruction of multi-omics networks. However, the development of effective multi-omics network reconstruction strategies lags behind. This hinders maximizing the potential of multi-omics data sets. With this study, we advance the frontier of multi-omics network reconstruction by introducing collaborative graphical lasso as a novel strategy. Our proposed algorithm synergizes graphical lasso (Friedman et al., 2008) with the concept of collaboration introduced by Gross and Tibshirani (2015), effectively harmonizing the integration of multi-omics data sets and thereby enhancing the accuracy of network inference. Additionally, to tackle model selection in this framework, we designed an ad hoc procedure based on network stability. We assess the performance of collaborative graphical lasso and the corresponding model selection procedure through simulations, and we apply them to a publicly available multi-omics study of sleep deprivation in mice from Diessler et al. (2018). This application demonstrated that collaborative graphical lasso is able to reconstruct known biological connections and suggest previously unknown and biologically coherent interactions, enabling the generation of novel hypotheses. We implemented collaborative graphical lasso as an R package, available on CRAN as coglasso.

Diessler, S. et al. A systems genetics resource and analysis of sleep regulation in the mouse. PLOS Biology 16, e2005750 (2018).
Gross, S. M. & Tibshirani, R. Collaborative regression. Biostatistics 16, 326–338 (2015).
Friedman, J., Hastie, T. & Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441 (2008).


About the Speaker
Alessio Albanese, Wageningen University & Research, Biometrics

About the Authors
Wouter Kohlen, Wageningen University & Research, Cellular and Developmental Biology

A Statistical Evaluation of ZINBMM and ZICMPMM Using Simulated Data

Abstract
In biomedical research, especially in microbiome studies utilizing Next Generation Sequencing (NGS) like 16S rRNA gene sequencing, analyzing count data poses significant challenges due to issues such as zero inflation and overdispersion. The conventional Zero-Inflated Negative Binomial mixed model (ZINBMM) has been widely used to address these complexities. However, a newer approach, the Zero-Inflated Conway-Maxwell-Poisson mixed model (ZICMPMM), has emerged as a potential alternative.

To compare their performance, a study evaluated both models using two datasets: one from a cross-sectional study on schizophrenia subjects and healthy controls, and another from a longitudinal study on pregnant and non-pregnant women's vaginal microbiomes. Various metrics including AIC, BIC, RMSE, Rootogram, and Vuong’s test were employed for comparison. Additionally, 2000 simulated datasets across 27 settings were used to assess model performance under different conditions.

In the cross-sectional dataset, ZICMPMM showed slightly better performance in terms of Rootogram, RMSE, AIC, and BIC, though Vuong’s test did not conclusively establish superiority. In the longitudinal dataset, while AIC and BIC favored ZICMPMM, RMSE values leaned slightly towards ZINBMM. Simulation results consistently demonstrated ZICMPMM's precision in parameter estimation, reflected in lower AIC, BIC, and Mean Absolute Bias.

Considering that the primary goal of microbiome data modeling is to identify associated factors rather than to predict, and given ZICMPMM’s ability to handle varying dispersion levels, ZICMPMM appears to offer performance superior or at least comparable to that of ZINBMM.
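For readers unfamiliar with zero inflation, the sketch below writes out the probability mass function of a zero-inflated negative binomial count: with some probability the count is a structural zero, and otherwise it follows a negative binomial distribution with a given mean and dispersion. It covers only the count distribution, not the random effects of the mixed models compared in the abstract, and the parameter values are hypothetical.

```python
import math


def nb_pmf(y, mu, size):
    """Negative binomial pmf with mean mu and dispersion parameter size."""
    log_p = (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, size, pi_zero):
    """Zero-inflated NB: a structural zero with probability pi_zero, else NB."""
    base = (1.0 - pi_zero) * nb_pmf(y, mu, size)
    return pi_zero + base if y == 0 else base


# Hypothetical parameter values for a sparse, overdispersed taxon count.
mu, size, pi_zero = 4.0, 0.8, 0.3
probs = [zinb_pmf(y, mu, size, pi_zero) for y in range(2000)]
```

Under this formulation the overall mean is `(1 - pi_zero) * mu`, and the probability of observing zero exceeds `pi_zero` because the negative binomial part also produces zeros.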

About the Speaker
Abhiram D B, National Institute of Mental Health and Neuro Sciences, Biostatistics

About the Authors
Binukumar Bhaskarapillai, National Institute of Mental Health and Neuro Sciences, Biostatistics

BSNMani: Bayesian Scalar-on-network Regression with Manifold Learning

Abstract
Brain connectivity analysis is crucial for understanding brain structure and neurological function, shedding light on the mechanisms of mental illness. To study the association between individual brain connectivity networks and the clinical characteristics, we develop BSNMani: a Bayesian scalar-on-network regression with manifold learning. BSNMani comprises two components: the network manifold learning model for brain connectivity networks, which extracts shared connectivity structures and subject-specific network features, and the joint predictive model for clinical outcomes, which studies the association between clinical phenotype and subject-specific network features while adjusting for potential confounding covariates. For posterior computation, we develop a novel two-stage hybrid algorithm combining Metropolis-Adjusted Langevin Algorithm (MALA) and Gibbs sampling. Our method is not only able to extract meaningful subnetwork features that reveal shared connectivity patterns, but can also reveal their association with clinical phenotypes, further enabling clinical outcome prediction. We demonstrate our method through simulations and through its application to real resting-state fMRI data from a study focusing on Major Depressive Disorder (MDD). Our approach sheds light on the intricate interplay between brain connectivity and clinical features, offering insights that can contribute to our understanding of psychiatric and neurological disorders, as well as mental health.


About the Speaker
Yijun Li, University of Michigan, Biostatistics

About the Authors

Ying Guo, Emory University, Biostatistics and Bioinformatics
Jian Kang, University of Michigan, Biostatistics

Developing Prognostic Models to Predict Renal Graft Survival: Comparison of Statistical and Machine Learning Models

Abstract
Renal transplantation is a critical treatment that can save the lives of individuals who are suffering from end-stage renal disease (ESRD), but graft failure remains a significant concern. Accurate prediction of graft survival after renal transplantation is crucial as it enables clinicians to identify patients at higher risk of graft failure. This study aimed to develop clinical prognostic models for predicting graft survival after renal transplantation and compare the performance of various statistical and machine learning models.
Methodology: The study utilized data from a retrospective cohort of renal transplant recipients at the Ethiopian National Kidney Transplantation Center from September 2015 to February 2022. Various statistical and machine learning models were evaluated based on their discrimination, calibration, and interpretability. The comparison of models included standard Cox, Lasso-Cox, Ridge-Cox, Elastic net-Cox, Random Survival Forest, and Stochastic Gradient Boosting.
Results: The study analyzed a total of 278 complete cases and observed graft failure in 21 patients. The Random Survival Forest and Stochastic Gradient Boosting models demonstrated the best calibration and discrimination performance, shown by an equal AUC of 0.97 and overlapping calibration plots. On the other hand, the Cox proportional hazards model had the highest interpretability and showed superior accuracy in estimating survival probabilities, as evidenced by its lowest Brier score of 0.000071. The current study indicates that an episode of chronic rejection, recipient residence, an episode of acute rejection, post-transplant urological complications, post-transplant nonadherence, blood urea nitrogen level, and number of post-transplant admissions were consistently identified as the top significant prognostic predictors of renal graft survival.
Conclusions: The Random Survival Forest and Stochastic Gradient Boosting models demonstrated superior calibration and discrimination performance, while the Cox proportional hazards model offered accurate estimation of survival probabilities and interpretability. Clinicians should consider the trade-off between performance and interpretability when choosing a model. Incorporating these findings into clinical practice can improve risk stratification, enable early interventions, and inform personalized management strategies for kidney transplant recipients.
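The Brier score quoted in the results is, at its core, a mean squared difference between predicted probabilities and observed outcomes, so lower is better. The sketch below computes it at a single time horizon and ignores censoring; survival analyses normally use a censoring-weighted version, and the abstract does not say which variant was used, so this is a simplified illustration on made-up numbers.

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(predicted_probs) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    if not outcomes:
        raise ValueError("at least one observation is required")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted_probs, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Made-up example: predicted probability of graft survival at a fixed horizon,
# and whether each graft actually survived (1) or failed (0).
predicted = [0.9, 0.8, 0.2, 0.1]
observed = [1, 1, 0, 0]
score = brier_score(predicted, observed)
```

A model that always predicts 0.5 scores 0.25 on any data, which gives a rough reference point when reading reported values.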

About the Speaker
Getahun Mulugeta, Bahir Dar University

About the Authors
Temesgen Zewotir, University of KwaZulu-Natal
Awoke Seyoum Tegegne, Bahir Dar University