Invited Session Program

The IBC2024 Invited Session Program has been confirmed. We are thrilled to announce that 25 Invited Sessions have been selected for presentation at the International Biometric Conference (IBC2024), held live and in person in Atlanta, USA, on 8-13 December 2024.

The 25 sessions cover a wide range of topics, including ecology, clinical trials, general modelling approaches, health, epidemiology, and environmental health. Congratulations to the following!

Statistical Modeling for Complex Spatial Data: New Venues in Biometrics

Session Chair: Marcos Oliveira Prates, Statistics, Universidade Federal de Minas Gerais (Brazil)



Spatial-referenced data is a common feature in biometric applications, but despite extensive research in this area, there remain several crucial issues that require further attention. With the advancement of technology, spatial data has become more abundant, and this new perspective has allowed for the collection of diverse data of interest in the same region, leading to an increased focus on multivariate models.

While there are methodologies in the literature to deal with multiple responses, there is still ample scope for research and improvement, such as a deeper understanding of existing techniques and the development of scalable models to handle the growing volume of observed data.
Additionally, it is important to note that some response variables are not normally distributed, and non-Gaussian and discrete processes are essential for realistic modeling. Moreover, spatial data can be measured at different levels, such as point, satellite grids, or political divisions, creating a diverse range of data. Since the scale of these measurement levels is not necessarily uniform, techniques for dealing with spatial misalignment, data fusion, and prediction for area data are critical for creating models that utilize all available information effectively.

In summary, the research directions for spatial-referenced data in biometric applications include the development of spatially explicit models that incorporate spatial dependencies and environmental covariates, spatial-temporal modeling, and methods for dealing with non-normality, diverse measurement levels, and spatial misalignment.

Proposed Speakers

Luis Mauricio Cepero, Pontificia Universidad Catolica de Chile (Chile)

  • Modeling point referenced spatial count data: a Poisson process approach

Lola Ugarte, Universidad Publica de Navarra (Spain)

  • On fitting multivariate spatial models to analyze several diseases in big domains

MacNab Ying, The University of British Columbia (Canada)

  • Bayesian spatial and spatio-temporal models for infectious disease mapping and forecasting

Marcos Prates, Universidade Federal de Minas Gerais (Brazil)

  • Statistical Inferences and Predictions for Areal Data, Spatial Data Fusion, and Change of Support with Hausdorff–Gaussian Processes


Challenges and Opportunities of Statistical Analysis with Real-World Data: Inference and Prediction

Session Chair:  Michelle Shardell, University of Maryland School of Medicine (USA)



Real-world data (RWD), such as data from electronic health records, disease registries, administrative claims for billing, and cell lines, provide unprecedented opportunity to propel generation of real-world evidence (RWE) to advance precision medicine and improve health outcomes. However, RWD are rarely collected specifically for research purposes from a rigorously designed study; thus, RWE is vulnerable to multiple sources of bias leading to suboptimal decisions. Key challenges of research with RWD that may lead to bias include uneven data quality, data sparseness, participant heterogeneity, selection bias, and informative outcome-dependent follow-up. In this session, we will describe advances in statistical methods to address these challenges. Methods are motivated by and applied to settings such as observational longitudinal studies, risk prediction, hybrid-design randomized trials, and genetic testing.

This collection of talks will explore a breadth of topics in RWD, which will enable comparison and contrast of salient features of different RWD platforms. This topic is particularly relevant to a wide population of biostatisticians working in the health sciences owing to a proliferation of efforts to link RWD to more traditional research data resources, such as epidemiologic observational or intervention study data. Therefore, a session comprising discussion of different RWD platforms for the diverse purposes of observational studies, risk prediction, hybrid randomized trials, and genetic testing will enhance cross-talk among biostatisticians working in multiple applications. Ultimately, the audience will learn the opportunities as well as the challenges arising from RWD to generate RWE with the goal of improving health.

Session speakers are diverse across multiple dimensions: region (USA, Canada, UK), gender (three women, two men), career stage (from assistant to full professor), affiliation (school of medicine, mathematics/statistics, data science).

Proposed Speakers & Discussant

Janie Coulombe, Universite de Montreal (Canada)

  • Multiply robust longitudinal causal inference with informative monitoring times with an application to lifestyle interventions in electronic health records

Ying Lu, Stanford University School of Medicine (USA)

  • Quality assessment of real-world data and their impact on randomized trials linked to registry data with an application to amyotrophic lateral sclerosis

Hongsheng Dai, Newcastle University (UK)

  • Novel mixture models to address patient heterogeneity in real-world genetic data to advance precision medicine with an application to the Cancer Cell Line Encyclopedia

Jessica Gronsbell, University of Toronto (Canada)

  • Estimation and evaluation of semi-supervised learning models in settings with large proportions of unlabeled data with application to outcome prediction from electronic health records

Michelle Shardell, University of Maryland School of Medicine (USA)

  • Semiparametric models of longitudinal data with time-varying covariates and latent effects to address informative observation times with application to U.S. Medicare claims data


Experimental Design and Assessment of Genetic Gain for Multi-Environment Trials in Plant Breeding and Variety Testing

Session Chair:  Hans-Peter Piepho, University of Hohenheim (Germany)



Plant breeding programs need to be optimized in terms of resource allocation and the choice of experimental design so as to maximize the expected genetic gain. There are several components of the overall design of a breeding program, such as the number of locations, the number of genotypes, the specific set of genotypes, the number of replications in each given trial for each of the genotypes tested, the experimental designs chosen for the individual trials and the allocation of genotypes to locations across the trialling network. If the target population of environments involves a subdivision into agroecological zones, the number of locations must be chosen for each of the zones.

Classical experimental design optimization is one part of this task, and several good computer packages are available for it, focusing on optimality criteria such as D- and A-efficiency. Such optimization typically works with a given fixed number of genotypes, possibly characterized by marker-based kinship or pedigree, as well as with a given layout of the experimental units in each of the planned trials, and makes use of an assumed linear mixed model. Another aspect of the optimization is the selection of genotypes to be tested and their number. The more genotypes that can be tested, the higher the selection intensity can be, which may benefit the realized genetic gain. Furthermore, the number of locations must be determined, as well as the number of experimental units available or assigned to each location. This part of the optimization task cannot be tackled directly by the experimental design packages currently available, which poses a challenge in the overall optimization of a breeding program. That optimization requires consideration of the expected genetic gain as an objective criterion, which can be assessed or predicted in several ways, including by simulation or by making use of the breeder’s equation for response to selection and hence heritability, which in turn is a function of the multi-environmental experimental design. The main challenge is how to integrate the different objective criteria and how to perform the overall optimization.

The proposed Invited Session deals with these closely linked design problems. Each talk focuses on some or all of the aspects involved and puts them into the general context of the overall task to optimize a breeding or variety testing program in terms of the expected genetic gain.

Proposed Speakers

Elsa Gonçalves, University of Lisbon (Portugal)

  • Efficient experimental designs to achieve high genetic gains: the example of polyclonal selection within ancient grapevine varieties

Renata Sermarini, University of Sao Paulo (Brazil)

  • Impact on genetic gain from assuming different statistical models in generating designs for early generation plant-breeding experiments

Lucia Gutierrez, University of Wisconsin - Madison (USA)

  • Optimization of experimental designs for large genomic studies in multi-environment trials using sparse testing


Advances in Statistical Disease Modelling and Surveillance

Session Chair:  Rob Deardon, University of Calgary (Canada)



Following the COVID-19 pandemic, there has been an understandable increase in interest in infectious disease modelling and surveillance techniques. However, infectious diseases have always been of interest across public health, agriculture and ecology.

Inference for infectious disease models and surveillance techniques is made more complicated by the fact that we often have information required for the model that we do not directly observe. For example, infection times are usually not observed, rather dates of reporting are, and incidence counts are often plagued by underreporting, or other systematic biases.

Additionally, we often have complex heterogeneities in the population we wish to account for, since, for example, populations do not tend to mix homogeneously. This often leads to a need for spatial and/or network-based models. Typically, inference for such models is done in a Bayesian framework, accounting for latent or uncertain variables such as event times using data-augmentation.

Here, we propose a session to include four speakers each working on the cutting edge of disease modelling and surveillance methods. Their talks will cover topics ranging from: the inclusion of population behavioural change mechanisms in epidemic models; estimating and predicting the course of an epidemic from hospitalization counts using Markov switching models; and the use of weighted networks to better capture spatial correlation in conditional autoregressive models.

Proposed Speakers & Discussant

Christel Faes, Universiteit Hasselt (Belgium)

  • Nowcasting the reproduction number with Laplacian-P-Splines

Caitlin Ward, University of Minnesota Twin Cities (USA)

  • Behavioural change in epidemic models

Dirk Douwes-Schultz, McGill University (Canada)

  • Markov zero-inflated space-time multinomial models for comparing multiple infectious diseases

Renato Assuncao, Universidade Federal de Minas Gerais Instituto de Ciencias Biologicas (Brazil)

  • Inducing high spatial correlation with randomly edge-weighted neighborhood graphs

Innovative Clinical Trial Designs: Enhancing Efficiency and Precision in Medical Research

Session Chair:  Frank Bretz, Novartis (Switzerland)



In the ever-evolving field of clinical research, it is crucial to continually explore innovative trial designs. By enhancing various aspects of trial design, we can maximize the effectiveness of clinical research and expedite the development of new therapies. This session brings together leading experts who will share their insights, providing attendees with valuable knowledge and strategies to optimize trial designs for future studies.

Our first speaker addresses the challenge of determining the optimal evaluation time for ordinal score outcomes when there is limited information available, such as the WHO 8-point scale for COVID-19. This talk proposes and evaluates data-driven methods to select the most appropriate evaluation time, considering the risk of premature or delayed assessments that may compromise the accuracy of the outcome measurements.

The second speaker focuses on complex diseases, where clinical trials often need to consider heterogeneity in outcome priorities among patients. By leveraging desirability of outcome ranking approaches, this talk aims to construct composite endpoints that integrate various outcomes for diseases with common causes. Moreover, analysis strategies tailored to trial designs that encompass treatment switching are discussed, providing valuable insights into optimizing clinical trials for complex disease scenarios.

The third speaker proposes design strategies to assess the benefit within a biomarker positive sub-population in confirmatory clinical trials. The utilization of graphical gatekeeping methods and adaptive designs enables the assessment of treatment effects in both ‘all-comers’ and ‘biomarker positive patients.’ This talk also highlights the importance of decision rules to evaluate the treatment effect in biomarker-negative patients to ascertain the effect in all-comers. Practical considerations and challenges are elucidated through examples and simulations.

The fourth speaker explores the synergistic potential of combining covariate adjustment with group sequential and information-adaptive designs to enhance the efficiency of randomized trials while adhering to regulatory requirements. This talk proposes adjusted estimators that satisfy the independent increments property required for standard stopping boundaries. Moreover, the integration of information-adaptive designs allows trials to adapt to the amount of precision gain, resulting in faster and more efficient trials without compromising validity or power.

Proposed Speakers & Discussant

Michael Proschan, National Institute of Allergy and Infectious Diseases (USA)

  • Determining The Evaluation Time for an Ordinal Score Outcome

Ying Lu, Stanford University School of Medicine (USA)

  • Composite Endpoints in Clinical Trials for Complex Diseases

Bharani Dharan, Novartis Pharmaceuticals Corporation (USA)

  • Design strategies to assess benefit for biomarker sub-populations in Phase III clinical trials

Kelly Van Lancker, Universiteit Gent (Belgium)

  • Combining Covariate Adjustment with Group Sequential and Information Adaptive Designs to Improve Randomized Trial Efficiency


From Chaos to Clarity: Tackling Multiple Events in Clinical Trials

Session Chair:  Kelly Van Lancker, Universiteit Gent (Belgium)



Disease burden and progression are often characterized by the occurrence of multiple events. Most traditional analyses of clinical trials are restricted to the occurrence of the first event. For chronic diseases, such as chronic heart failure, this can lead to a substantial loss of information and does not capture all clinically relevant outcomes. In this session, we will address challenges that arise when analyzing randomized trials with multiple time-to-event outcomes and propose potential solutions.

The speakers will focus on issues related to defining, identifying, and estimating effects of practical importance. The aim is to bring together experts from the pharmaceutical industry and different academic fields. By bridging different domains, we hope to provide a valuable platform for knowledge exchange and discussions. Attendees, including trialists and applied researchers, will be exposed to ideas and results that can ultimately improve the quality and validity of their analyses.

More specifically, we will have three speakers, including one from the pharmaceutical industry, followed by a moderated discussion involving the three speakers.

The first speaker will share expertise and practical experience in survival analysis methods, drawing from studies of time-to-event data in clinical trials. She will consider commonly used statistical analysis procedures in the presence of recurrent and intercurrent events. Challenges related to competing events, such as death, will be addressed, along with guidance on interpreting classical statistical estimands as causal quantities.

The second speaker will discuss new results on causal inference methods, emphasizing the importance of defining estimands with causal interpretations prior to statistical analyses. He will showcase the application of causal directed acyclic graphs and single world intervention graphs to reason about identification conditions for different estimands. He will further elaborate on recent developments in interventionist mediation analysis and their relation to estimands advocated in the ICH E9(R1) addendum.

The third speaker will present a new method for estimating treatment effects on recurrent events in the presence of death. He will expand on the idea of “while-alive” estimands to adjust for the length of survival. Then he will introduce a robust nonparametric estimator and evaluate its performance through simulations and analysis of real data from a heart failure trial.

Proposed Speakers & Discussant

Mouna Akacha, Novartis AG (Switzerland)

  • Estimands for Recurrent Event Endpoints in the Presence of a Terminal Event

Mats Stensrud, Ecole polytechnique federale de Lausanne Faculte des sciences de la vie (Switzerland)

  • Causal inference with recurrent and competing events

Lu Mao, University of Wisconsin - Madison (USA)

  • Nonparametric inference of general while-alive estimands for recurrent events

Bayesian Machine Learning Approaches for Complex Problems in Causal Inference

Session Chair:  Michael Daniels, University of Florida (USA)



Large observational datasets are common and a great resource in biomedical and public health research. In recent years it has become increasingly popular to draw causal inference using these data. Three examples are assessing whether differences in BMI contribute to racial/ethnic survival disparities in colorectal cancer patients, estimating the cognitive effects of a hypothetical intervention monitoring systolic blood pressure (sBP) at more optimal levels, and assessing whether earlier initiation of cardiac-directed medications mediates the effect of anthracyclines in the treatment of childhood acute myeloid leukemia. A Bayesian approach is attractive for numerous reasons in these settings. First, assessing sensitivity to uncheckable assumptions, which are necessary to draw causal inferences from observed data, is crucial. The Bayesian paradigm offers a natural approach to this, by incorporating informative priors. Second, by specifying a parametric (generative) model, missingness, which is almost always a concern, can be automatically handled (assuming ignorability) for outcomes, confounders, mediators, and other variables. However, a frequent concern with Bayesian methods is model misspecification, in particular with a parametric model. Bayesian machine learning approaches are viable alternatives that minimize bias from model misspecification and allow complex relationships (e.g., interactions) to be uncovered by the data without needing to be explicitly specified.
This session will examine Bayesian machine learning approaches to address several complex problems in causal inference. The speakers will provide insight on the topic from different perspectives using Bayesian additive regression trees (BART) and variations of Dirichlet process mixtures (DPMs). The following themes will be covered: inference on causal mediation using densities, inference for heterogeneous treatment effects in observational data and clinical trials, causal inference on threshold interventions (based on natural value of treatment, NVT) in the presence of dropout and death, and longitudinal mediation analysis in continuous time. A variety of application areas, such as prediction of treatment effectiveness in clinical trials and long-term followup in prospective cohort studies, will be illustrated with empirical examples.

Proposed Speakers & Discussant

Maria Josefsson, Umea Universitet (Sweden)

  • A Bayesian semi-parametric approach for incremental intervention effects in mortal cohorts

Alejandro Jara Vallejos, Pontificia Universidad Catolica de Chile (Chile)

  • A Bayesian nonparametric regression approach for mediation analyses

Ioanna Manolopoulou,  University College London (UK)

  • Bayesian Causal Forests for heterogeneous treatment effects estimation from randomised and observational data

Jason Roy,  Rutgers The State University of New Jersey (USA)

  • Bayesian nonparametric models for longitudinal mediation in continuous time

Discussant: Michael Daniels, University of Florida (USA)


Statistical Methods and Considerations in the Design and Analysis of Vaccine Clinical Trials

Session Chair:  Sanne Roels, Janssen Pharmaceutica NV (Belgium)



In this session, statistical methodology and practice in the design and analysis of preventive vaccine trials will be spotlighted. The diverse set of speakers will discuss statistical topics that emerged during the COVID-19 pandemic. The session will end with a speaker discussion.

The first speaker, Sanne Roels (Janssen), will discuss the double-blind phase III ENSEMBLE trial, where viral variants resulted in vaccine efficacy (VE) heterogeneity. Drawing on several studies, the talk will also discuss how causal inference methods for controlled risk and VE improved understanding of how the immunologic biomarkers correlated with or impacted the risk of COVID-19.

In the second talk, Peter Gilbert (Fred Hutch) will describe statistical methods for assessing how these biomarker correlates depend on genotypic features of SARS-CoV-2. These analyses aim (1) to understand whether a surrogate marker works differently for viral variants vis-à-vis the vaccine insert; (2) to develop optimal genotypic biomarkers for discriminating VE levels; and (3) to therefore guide optimal selection of SARS-CoV-2 vaccine inserts and to improve models of a vaccine’s population impact. Challenges of a two-phase biomarker sampling design, missing SARS-CoV-2 sequences, and high-dimensionality of genotype diversity will be discussed.

The third speaker, Daniel Backenroth (Janssen), will discuss opportunities and challenges that arose during the analysis of open-label ENSEMBLE data, and the challenges in interpreting vaccine efficacy after randomization is lost. Innovative methods for observational data analysis will be discussed, including negative controls, a tool to detect and correct bias using non-treatment related outcomes.

The last speaker, Tarylee Reddy (SAMRC, South Africa), will discuss statistical methods for assessing real-world effectiveness of vaccines in resource-limited settings. The single-arm, open-label Sisonke trial, conducted in South African health-care workers, was designed to evaluate the effectiveness of a single dose vaccine regimen (same as ENSEMBLE) and required extensive simulations. This matched cohort clinical trial design involved three sets of analyses based on external real-world evidence data. Statistical approaches for combining estimates, accommodating data variability and the matching process, will be discussed. The talk will conclude with current challenges in the evaluation of vaccine effectiveness.

Proposed Speakers

Sanne Roels,  Janssen Pharmaceutica NV (Belgium)

  • Evaluating Vaccine Efficacy and Correlates of Risk and Protection in Ad26.COV2.S Vaccine Studies

Peter Gilbert,  Fred Hutchinson Cancer Center (USA)

  • Assessing How Immune Correlates of Risk Depend on Viral Genetics in Vaccine Efficacy Trials: Application to Ad26.COV2.S ENSEMBLE

Daniel Backenroth, Janssen Pharmaceutica NV (Belgium)

  • Innovative methods in observational data analysis applied to open-label data from ENSEMBLE study

Tarylee Reddy, South African Medical Research Council (South Africa)

  • Evaluation of the real world effectiveness of the Ad26.COV2.S vaccine in South Africa: The Sisonke Trial



Session Chair:  Jiebiao Wang, University of Pittsburgh (USA)



Spatially resolved transcriptomics (SRT) was named the 2020 Method of the Year by Nature Methods. By pairing transcriptional data with spatial data to create maps of gene expression, it enables researchers to spatially localize and quantify gene expression in the form of mRNA transcripts within cells or tissues in their native state. With explosive popularity, it provides valuable insights into the biology of cells and tissues while retaining information about the spatial context. Our first three talks will highlight recent advances in statistical methods and applications of SRT.

Dr. Mingyao Li will present methods to integrate gene expression with histology to computationally reconstruct SRT data that cover the entire transcriptome with near-single-cell resolution. Through comprehensive analysis of diverse datasets generated from both diseased and normal tissues, she will show that their super-resolution gene prediction is accurate and useful for different applications in tissue architecture inference.

Dr. Xiang Zhou will present a computational method, IRIS (Integrative and Reference-Informed tissue Segmentation), that can characterize the spatial organization of complex tissues through accurate and efficient detection of spatial domains. IRIS is unique in leveraging single-cell RNA-seq data for reference-informed spatial domain detection, integrating multiple SRT tissue slices jointly while explicitly accounting for the correlation both within and across slices, and taking advantage of multiple algorithmic innovations for highly scalable computation.

Dr. Julia Wrobel will discuss utilizing spatial summary statistics to explore inter-cell dependence as a function of distances between cells. Using techniques from functional data analysis, she will introduce an approach to model the nonlinear association between summary spatial functions and subject-level outcomes. The proposed method is applied to cancer data collected using multiplex immunohistochemistry (mIHC).

Besides spatial variation, there is huge heterogeneity across cell types in bulk tissue genomics data. To address this issue, Dr. Yun Li will introduce efficient computational deconvolution of bulk RNA-seq to reveal the cell-type specificity mechanism in Alzheimer’s disease.

All four talks will interconnect various research areas and inspire cross-area collaborations. They share similar statistical challenges and methodologies and promote biometric applications in biological and life sciences.

Proposed Speakers & Discussant

Mingyao Li, University of Pennsylvania (USA)

  • Integrating spatial transcriptomics with histology to infer super-resolution tissue architecture

Wenjing Ma, University of Michigan (USA)

  • Analysis of time-resolved spatial transcriptomics reveals dynamic functional domains

Julia Wrobel, Colorado School of Public Health (USA)

  • Analysis of immune cell spatial clustering using functional data models

Yun Li, The University of North Carolina at Chapel Hill (USA)

  • Efficient computational deconvolution of bulk RNA-seq reveals cell-type specificity mechanism in Alzheimer’s disease


Causal Inference in Real-world Studies with Missing Data

Session Chair:  Margarita Moreno-Betancur, The University of Melbourne (Australia)



Many modern health and medical research studies seek to address causal questions about the effects of exposures, treatments or other interventions on health outcomes. Although there have been important advances in concepts and methods for causal inference, there has been little work on how to handle the widespread problem of missing data in this context. Firstly, despite the central role of assumptions in causal inference, the expanded assumptions required with missing data have received little attention, particularly when there are also other complexities such as interference and censoring. Secondly, with missing data due to processes such as death, the estimand and identifiability assumptions need to be reconsidered, but there is no consensus on how to address this problem. Finally, there has been little work regarding appropriate approaches to handle missing data at the causal effect estimation stage, which are required not only to obtain unbiased estimates, but also to increase precision.

This session aims to present key advances addressing these gaps, with motivation in real-world studies. It will begin by presenting recent advances grounded in a paradigm-shifting approach to depict and assess multivariable missingness assumptions, which uses “missingness” graphs or directed acyclic graphs (DAGs). Emerging work has developed missingness DAGs to depict possible missingness mechanisms in epidemiological studies, and derived theoretical results demonstrating implications for causal effect identifiability, as well as for estimation using appropriately tailored multiple imputation procedures in the context of modern causal analytic methods. The session then presents theoretical advances expanding the graphical modelling paradigm to the context of interference, i.e. when the treatment of one individual affects the outcome of another individual. It is shown whether and how unbiased estimates of causal effects can be obtained in the presence of missing data and interference without resorting to simplifying missingness and “partial interference” assumptions. The session will then focus on the problem of causal inference with missing data in the context of censored survival data as well as death as an intercurrent event related to missingness. It is argued that a well-understood real-world analysis should be pursued, with challenges and solutions with their bias-variance trade-off illustrated in a case study where the three challenges are handled jointly.

Proposed Speakers & Discussant

Margarita Moreno-Betancur, The University of Melbourne (Australia)

  • Identifiability and estimation of causal effects in epidemiological studies with multivariable missingness

Karthika Mohan, Oregon State University (USA)

  • Causal Inference in the presence of interference and missing data

Els Goetghebeur, Universiteit Gent (Belgium)

  • The triple hurdle of missing data in causal inference when censored survival data meet missing measurements: the case of impact on quality of life

Discussant: Katherine Lee, Murdoch Children's Research Institute (Australia)


Interactive Visualisation for Effective Decision-making in Agricultural and Environmental Sciences

Session Chair:  Emi Tanaka, Australian National University (Australia)



Statistical computing and visualisation have evolved significantly in the last decade with the development of tools that allow us to create reports dynamically, facilitating reproducible research, and to explore data interactively. While these tools are increasingly adopted in a range of areas, the topic receives little attention at traditional statistical conferences. The proposed invited session aims to:

1. fill the gap for discussion on the development of modern computing and visualisation, such as new frameworks that facilitate interactive visualisation and dynamic report creation, but also evoke conversation about the (lack of) timelessness of the tools and the role of a statistician in using such tools;
2. demonstrate capabilities of current tools and inspire their use with innovative applications in agricultural or environmental sciences;
3. present novel visualisations to facilitate understanding of biological or environmental phenomena; and
4. review synergistic use of modern computing tools with traditional "hard" statistics to draw insights from data for non-statistical experts.

    Modern statistical computing tools have massive downstream influence on decision-making for the masses. For instance, many graphics in the R programming language are produced using the principles of the layered grammar of graphics implemented in ggplot2. Additionally, interactive data visualisation and advanced analytics can be easily shared with non-experts using Shiny in R or Python, Dash in Python and Plotly JavaScript across many languages. Moreover, reproducible reports can be integrated into this workflow using Quarto, R Markdown or notebook-style output (such as Jupyter notebooks in Python). Collectively, these tools have garnered over 22,000 citations.

    Each talk presented will highlight specific applications in the agricultural or environmental sciences. The demonstrated tools coupled with appropriate statistical methodology have an immense potential to facilitate scientific discoveries and decision-making processes. The proposed session will help bridge the gap between statisticians and the consumers of statistics by inspiring discussion about various aspects of our profession.

    The invited speakers and discussant were carefully chosen to represent a range of regions, gender, and experience. Specifically, 40% are female, 3 regions are represented (East North America, Australasian, and Japan) and 40% are early career researchers, thereby contributing to the overall diversity.

    Proposed Speakers & Discussant

    Julien Diot, Tokyo Daigaku (Japan)

    • PlantBreedGame: A serious game to teach Genomic selection

    Lydia Lucchesi, Australian National University (Australia)

    • smallsets: Visual Documentation for Data Preprocessing in R and Python

    Gota Morota, Virginia Polytechnic Institute and State University (USA)

    • ShinyAnimalCV: Interactive computer vision tool for animal detection and three-dimensional visualization

      Garth Tarr, The University of Sydney (Australia) 

      • The cutting edge: dashboards for beef industry insights and tools to support experimental design

      Discussant: Emi Tanaka, Australian National University (Australia)


      The More, The Merrier: Leveraging Multiple Exposures Under Real-World Complexities for Causal Inference and Decision-Making

      Session Chair:  Nandita Mitra, University of Pennsylvania (USA)



      Classical causal inference has focused on assessing the direct effect of a single treatment on an outcome, with an underlying assumption that understanding this isolated effect in a vacuum will lead to optimal decisions in the real world. However, in practice, units are generally not exposed to a policy through a single mechanism or to a medical treatment at a single point in time. Utilizing information across multiple exposures and time points can generate insights not possible in these simple, unrealistic settings. In this session, we will present some of the latest developments in causal inference research where methods not only adjust for the multiple exposure effects seen in practice, but also exploit them to improve decision-making in real-world settings.

      The presenters will discuss advancements in complex causal inference settings where there is interference, unmeasured confounding, and changing population or patient characteristics across time and intervention status. In policy evaluations with interference, one must consider that the outcome of one unit is a function of the intervention status of multiple units to identify and fully understand the comprehensive effects of a policy. In such settings, the population of units receiving the intervention may differ from that on which the outcomes are measured or the effects on one region may not be transportable to a new region where geographical characteristics induce a different set of indirect policy exposures than in the study population. In medicine, treatment effects from observational studies are generally estimated and interpreted as if there is no unmeasured confounding. However, it is possible to derive treatment policies that leverage multiple decision-points to create “superoptimal” regimes in the presence of unmeasured confounding or to exploit multiple exposures on the same outcome and create a “synthetic instrument” to address unmeasured confounding. The presenters will demonstrate the use and significance of such methodology with important applications including the evaluation of air pollution, nutritional policies, obesity treatments, and other clinical treatments.

      Proposed Speakers & Discussant

      Mats Stensrud, Ecole Polytechnique Federale de Lausanne Faculte des Sciences et Techniques de l'Ingenieur (Switzerland)

      • Optimal Sequential Treatment Regimes in Settings with Unmeasured Confounding

      Fabrizia Mealli, Universita degli Studi di Firenze Scuola di Scienze Matematiche Fisiche e Naturali (Italy)

      • Causal Inference in the Presence of Interference when Intervention and Outcome Populations Differ

      Gary Hettinger, University of Pennsylvania Perelman School of Medicine (USA)

      • Transporting Policy Effects in the Presence of Heterogeneous Direct and Indirect Exposures using a Difference-in-Differences Framework

      Linbo Wang, University of Toronto (Canada)

      • The Synthetic Instrument: From Sparse Association to Sparse Causation by Utilizing Multiple Exposures for Unmeasured Confounding

      Advanced Data Science Approaches for Challenges in Large Scale Biomedical Data

      Session Chair: Ran Dai, University of Nebraska Medical Center (USA)



      In an era defined by accelerating technological advances and an explosion of data, the intersection of big data and biomedical research is not just an opportunity; it is an imperative. Harnessing advanced data science approaches to understand and solve the challenges inherent in large-scale biomedical data offers a radical paradigm shift. From genomics to metabolomics, from electronic health records to medical imaging data, all the data we collect hold the potential to revolutionize our understanding of life sciences. On the other hand, these large-scale biomedical data also bring new challenges with their potentially large sample sizes, high dimensionality, complex structures and heterogeneity. In addition, we seek relationships more meaningful than associations, so causal inference results are highly desired. The speakers will present novel data science methodologies in transfer learning, causal inference and federated learning, as well as their application to large-scale biomedical data such as genetics data, electronic health record data and imaging data.

      Proposed Speakers & Discussant

      Hongzhe Li, University of Pennsylvania Perelman School of Medicine (USA)

      • Methods and applications of transfer learning in biomedical research

      Ali Shojaie, University of Washington (USA)

      • Statistical Machine Learning for Analyzing the Brain Connectome

      Mladen Kolar, University of Chicago Booth School of Business (USA)

      • Confidence sets for causal discovery

      Ran Dai, University of Nebraska Medical Center (USA)

      • Controlling FDR in selecting simultaneous signals from multiple data sources with application to the National Covid Collaborative Cohort data

      Recent Advances in Statistical Methods for Microbiome Data

      Session Chair: Gen Li, University of Michigan (USA)



      The human microbiome plays a critical role in human health and disease. Modern technology enables the cost-effective acquisition of high-throughput, high-dimensional microbiome data. Microbiome data have unique features that pose significant challenges to statistical analysis. For example, data are usually normalized as relative abundances in a simplex. Most standard analytical methods are not directly applicable. Moreover, data are typically zero-inflated, highly skewed (i.e., dominated by a few microbes), and over-dispersed. A tree structure between variables also exists that captures the phylogenetic or taxonomic information of the microbiome. Rigorous modeling of microbiome compositional data is challenging, and improper analysis may lead to low reproducibility and inconclusive results. To conquer these challenges, new statistical methods for microbiome data are burgeoning. The proposed session invites an excellent group of researchers and focuses on recent developments. The talks will address important issues such as association analysis, multiple testing, and integration of microbiome data with other omics and will have a profound public health impact. The session will raise awareness and stimulate research interests in the area.

      Proposed Speakers & Discussant

      Xiang Zhan, Peking University (China)

      • FDR-Controlled Variable Selection in Quantile Regression with High-dimensional Compositional Covariates

      Siyuan Ma, Vanderbilt University (USA)

      • Modelling the Joint Distribution of Compositional Microbiome Data

      Ekaterina Smirnova, Virginia Commonwealth University (USA)

      • Novel network-based models for multi-cohort longitudinal integrative studies

      Discussant: Gen Li, University of Michigan (USA)


      Modeling Environmental Chemical Mixtures and Health Outcomes

      Session Chair: Zhen Chen, National Institutes of Health (USA)



      Our proposal aims to highlight the development and application of novel methodologies for modeling chemical mixtures in environmental health research. The methods we will present encompass a machine learning approach for selecting exposure interactions, power and sample size analysis for studying exposure mixtures, a weighted quantile sum approach, and a Bayesian latent functional model. These methodologies are specifically designed and applied to environmental health datasets. A discussant will describe the latest research in modeling chemical mixtures and provide commentary on the above talks.

      Traditional approaches to environmental health have focused on assessing the risks associated with individual chemicals, which may fail to capture the complexity of our daily experiences, as we are jointly exposed to multiple chemicals known as chemical "mixtures." Accumulating evidence suggests that exposure to chemical mixtures poses a greater risk to health compared to exposure to a single chemical. Recent advancements in exposure assessment technology have enabled researchers to measure the internal and external chemical and biomarker exposures that exist in high-dimensional spaces. Yet, there are still gaps in the statistical methods used in environmental health research. It is desired that methodologies can effectively characterize chemical mixtures by considering extreme dose responses and identifying similarities in exposure profiles within a family. Furthermore, it is crucial to identify synergistic relationships between individual chemical exposures, estimate the burden of exposure, and account for the complexities associated with multiple exposures. Our speakers will present their innovative methodologies to tackle these research questions and other related concerns. By addressing these gaps in statistical methods, we aim to advance our understanding of the health effects resulting from chemical mixtures and contribute valuable insights to the field of environmental health research.

      Proposed Speakers & Discussant

      Stefano Renzetti, Universita degli Studi di Brescia (Italy)

      • A Weighted Quantile Sum (WQS) approach to assess environmental mixtures effect on human health

      Zhen Chen, National Institutes of Health (USA)

      • A variance-based approach to selecting interactions in chemical mixtures modeling

      Paul Albert, National Institutes of Health (USA)

      • A Latent Functional Approach for Modeling the Effects of Multi-dimensional Biomarker Exposures on Disease Risk Prediction

      Phuc Nguyen, University of Technology Sydney (Australia) 

      • Power Analysis of Exposure Mixture Studies via Monte Carlo Simulations

      Discussant: Louise Ryan, University of Technology Sydney (Australia) 


      Simulation Studies for Good Practice in Biostatistics: The STRATOS Perspective

      Session Chair: Cécile Proust-Lima, Bordeaux Population Health Research Center, Inserm (France)



      The primary aim of the STRATOS initiative (STRengthening Analytical Thinking for Observational Studies) is to promote appropriate methods and accurate interpretations in statistical analyses of observational studies. Simulation studies are absolutely essential for achieving this goal. We propose to give an overview of the thoughts on simulation studies and their applications within the STRATOS initiative with:

      - An introduction to the typical four phases of methodological research in biostatistics in analogy to the well-known phases of clinical research in drug development, and the different roles of simulation in each phase.

      - A discussion about simulation strategies in missing data research. When evaluating analytic approaches for handling missing data, one must first generate complete data and then simulate missing values. Different simulation approaches will be discussed with the pros and cons outlined, and some guidance will be given on when one approach may be favored over another.

      - Two examples of simulation studies designed to illustrate potential biases induced by measurement and classification errors and provide recommendations for epidemiological studies: (1) concerning the inclusion of time-varying covariates measured at sparse times with error in survival models; (2) concerning the use of predetermined latent classes in secondary statistical analyses.

      - A discussion of the concept of neutral simulations from the open science perspective. When comparing the performance of a newly proposed method to existing alternatives in a simulation study, there are several factors that may bias the results in favor of the former: besides the apparent wish to present the proposed method in a favorable light, the authors will, in general, have more expertise with the method they propose and they may naturally imagine a data generating mechanism that is in accordance with this method. Open science practices can address the resulting lack of neutrality in simulation studies.

      - The presentation of a novel design of collaborative simulations exemplified through the comparison of three methods for estimating non-linear relationships between an outcome and a covariate measured with error. The design involves a data generation team that is in charge of generating the simulation data, and one separate team per method that estimates the outcome-covariate relationship as well as it can and returns the results to the generation team for an impartial evaluation.

        Proposed Speakers

        Georg Heinze, Medical University of Vienna (Austria)

        • Phases of methodological research in biostatistics—Simulation as a tool to build the evidence base for new methods

        Katherine Lee, Murdoch Children's Research Institute & University of Melbourne (Australia)

        • Designing simulation studies for evaluating missing data approaches

        Cécile Proust-Lima, Bordeaux Population Health Research Center, Inserm (France)

        • Promoting good practice in handling measurement error and misclassification using simulations: two case examples

        Sabine Hoffmann, Ludwig Maximilian University of Munich (Germany)

        • Improving the neutrality of simulation studies through open science practices

        Laurence Freedman, Gertner Institute for Epidemiology and Health Policy Research (Israel)

        • A simulation study to compare methods of estimating non-linear relationships between an outcome and a covariate measured with error

        Analyzing Survival or EHR Data: Challenges, Estimation, and Deep Learning Approaches

        Session Chair: Grace Yi, Western University (Canada)



        Research on survival analysis has attracted extensive interest over the past four decades, and many modelling and inference methods have been developed to handle survival data with various features. However, with the advent of electronic health records and the abundance of big data sources, traditional methods may become cumbersome. Analysis of large-scale and complex survival data poses significant challenges that require innovative methods to overcome.

        This session brings together three distinguished experts to discuss the latest innovations addressing the challenges posed by those data. Jane-Ling Wang (UC Davis, USA) will deliver a talk on leveraging deep learning techniques to analyze event time measurements with censoring. She will address the interpretability and dimensionality limitations often associated with deep learning models. Ingrid Van Keilegom (KU Leuven, Belgium) will focus on estimating causal effects in the presence of endogeneity and right censoring. Her talk will introduce an instrumental variable approach and provide identification conditions for nonparametric estimation. Malka Gorfine (Tel Aviv University, Israel) will discuss the analysis of electronic health record (EHR) data, which are collected without specific phenotypes in mind. She will emphasize the importance of accounting for prevalent, incident, and censored observations, and present an approach to incorporate this information into the analysis.

        The proposed session is expected to generate great interest among researchers and practitioners in the field of survival analysis. It serves as a timely platform to exchange new research results, foster the discussion of emerging methodologies, and present fresh perspectives. By bringing together experts from diverse backgrounds and geographical locations, the session aims to deepen attendees' understanding of the challenges inherent in survival analysis, and potentially to spark new research methods and collaborations. As the field continues to evolve and confront the complexities of big data, we hope the insights shared in this session will encourage creativity and facilitate the development of innovative methods that can effectively analyze complex data. Such advancements have the potential to significantly improve decision-making across various domains, including healthcare, epidemiology, and beyond.

          Proposed Speakers & Discussant

          Jane-Ling Wang, University of California, Davis (USA)

          • Hypothesis Testing for the Deep Cox Model

          Ingrid Van Keilegom, KU Leuven (Belgium)

          • Instrumental variable estimation of dynamic treatment effects on a survival outcome

          Malka Gorfine, Tel Aviv University (Israel)

          • Unlocking Prevalent Information in EHRs - a Pairwise Pseudo-likelihood Approach

          Grace Yi, Western University (Canada)

          • Graphical Proportional Hazards Measurement Error Models

          Innovative Theory and Applications of Hidden Markov Models for Longitudinal Data in Medicine

          Session Chair: Paul Albert, National Cancer Institute Division of Cancer Epidemiology and Genetics (USA)



          The use of hidden Markov models (HMMs) to describe the natural history of disease continues to be a fruitful area for methodological and applications-oriented work. We have assembled an exciting set of talks that describe new theory and novel applications of HMMs that will have direct relevance to IBS members. The methodological contributions include extensions of HMMs to incorporate heterogeneity in longitudinal data, Bayesian methodology for estimating model parameters as well as the number of states, and the modeling of partially observed states, where the state is not directly measured and it is known only that the process is in one of a set of states. Each of the talks presents a different way to model heterogeneity of the HMM across individuals. The talks will focus on examples from different medical areas including respiratory disease, the natural history of Alzheimer's disease, and the screening of cervical cancer.

            Proposed Speakers & Discussant

            Jordan Aron, Regents of the University of Minnesota (USA)

            • Hidden mover-stayer model for disease progression accounting for misclassified and partially observed diagnostic tests: Application to the natural history of human papillomavirus and cervical precancer

            Jonathan Williams, NC State University (USA)

            • A Bayesian Approach to Multistate Hidden Markov Models: Application to Dementia Progression

            Francesco Bartolucci, University of Perugia (Italy)

            • Parameterizations of transition matrices for hidden Markov models with covariates

            Xinyuan Song, The Chinese University of Hong Kong (Hong Kong)

            • A Bayesian double-penalization procedure for heterogeneous hidden Markov models

            Getting the Most Bang for Your Buck: Resource Preserving Study Design for Biostatistical Research

            Session Chair: Jonathan Schildcrout, Vanderbilt University Medical Center (USA)



            With the widening availability of clinical trials data, cohort study data, and electronic medical records, there are enormous opportunities for medical and public health researchers to exploit existing resources to study novel scientific questions. In many cases, and often by design, covariate and outcome data are made available from a pre-existing study (e.g., a clinical trial); however, to address a novel hypothesis, available data must be augmented with biomarker data that are ascertained retrospectively. For example, one may need to assay stored biospecimens, originally collected at the time of study enrollment, to measure the novel biomarker. In many of these settings, study resources are limited and may only permit biomarker data collection on a subset of the original cohort. In such cases, efficient two-phase study designs (and data analysis procedures) allow researchers to use available outcome and covariate data (from the Phase 1 cohort) to identify an enriched subcohort (the Phase 2 cohort) that is most informative for estimating biomarker-outcome relationships. The goal of efficient two-phase study designs and analysis procedures is to maximize precision, to the extent possible, under a fixed budget.

            This session will feature four speakers who will discuss efficient study designs and estimation procedures for ordinal and continuous response data. The first two talks will focus on efficient two-phase study designs for ordinal response data. Jonathan Schildcrout will introduce a novel implementation of two-phase designs and analyses for scalar, ordinal response data. Chiara Di Gravio will then extend designs for ordinal data to the longitudinal ordinal data setting to estimate associations that evolve over time. Peter Mueller will then discuss two-phase designs and novel Bayesian analysis procedures for a relatively new class of semiparametric generalized linear models (SPGLM) that extends exponential family models to permit non-standard response distributions. Paul Rathouz will then extend two-phase designs for the SPGLM to address settings where multiple outcomes are of interest simultaneously.

              Proposed Speakers

              Jonathan Schildcrout, Vanderbilt University Medical Center (USA)

              • Two phase study designs for ordinal response data

              Chiara Di Gravio, Imperial College London (UK)

              • Analysis of Ordinal Longitudinal Data under case-control sampling: Studying the associations between glycocalyx degradation and mortality in critically ill patients

              Peter Mueller, The University of Texas at Austin (USA)

              • Two phase designs and Bayesian analysis for semiparametric generalized linear models

              Paul Rathouz, Dell Seton Medical Center at The University of Texas (USA)

              • Multi-outcome dependent sampling designs with semiparametric generalized linear models

              Recent Advances in Statistical Image Analysis in Biomedicine

              Session Chair: Cheng Cheng, St. Jude Children's Research Hospital (USA)



              The advances in both imaging and computing technologies have allowed biomedical researchers to obtain image data on a wide scale ranging from a single cell to the entire body. The wide range of scales and modalities has allowed the formulation of much more diverse and deeper scientific questions, yet at the same time has created many methodological challenges in analysis. It is timely to review, across diverse areas, which challenges have been met and which still remain. With four diverse presentations, we aim to stimulate conversations and foster collaborations, and thereby further advance the field of statistical image analysis.
              Signal co-localization in super high-resolution single-cell images represents the spatial co-distribution of proteins that critically affects cell function or survival. Dr. Sherry Liu will first speak on a novel unbiased quantification of signal co-localization in super high-resolution images. Then Dr. Ting Li will present a novel framework for image reconstruction and simulation intimately connected to ReLU activated neural networks. Application to an analysis of brain MRI data from UK Biobank to study the aging process in human brains has produced intriguing discoveries. Image analysis in oncology has gone far beyond diagnostics. Drs. Cai Li and Yimei Li will present work on statistical models for neurocognitive outcomes after treatment in children with medulloblastoma. In this study, longitudinal brain neuroimages indicate the treatment effect on changes in the brain that may in turn affect neurocognitive ability. This setting immediately creates a mediation analysis problem where the image mediator is of ultra-high dimensionality and observed longitudinally. A novel random field approach will be presented. Brain MRI is an effective tool to study the relationship between brain regions and intelligence, but the dimensionality of high-resolution images poses a significant challenge. We will close with a presentation by the Susan Dwight Bliss Professor of Biostatistics Dr. Heping Zhang on a novel approach using tensor quantile regression models to effectively deal with the ultra-high dimensionality. Application to the data collected from The Human Connectome Project has provided a few new findings.
              These four talks represent well the frontier of statistical image analysis, and will substantially help move this field forward.

                Proposed Speakers

                Cheng Cheng, St. Jude Children's Research Hospital (USA)

                • Recent Advances in Statistical Image Analysis in Biomedicine

                Xueyan Liu, University of New Orleans (USA)

                • Unbiased and robust analysis of co-localization in super-resolution images

                Ting Li, The Hong Kong Polytechnic University Faculty of Engineering (Hong Kong)

                • Conditional Stochastic Interpolation: A New Approach to Conditional Sampling

                Yimei Li, St. Jude Children's Research Hospital (USA)

                • Imaging Mediation Analysis for Longitudinal Outcomes

                Cai Li, St. Jude Children's Research Hospital (USA)

                • Imaging Mediation Analysis for Longitudinal Outcomes

                Heping Zhang, Yale University (USA)

                • Tensor Quantile Regression for Neuroimage Study of Human Intelligence

                Integrated Analysis Methods for Multiple Sources of High-Dimensional Data

                Session Chair: Anna Eames Seffernick, St. Jude Children's Research Hospital (USA)



                As technology improves and costs decrease, multiple sources of high-dimensional (high-throughput) data for a single study are increasingly available. This is often called multi-omics or multi-view data. While each data type can be analyzed separately, this limits the insight that the data can provide. Integrated analysis methods are statistically difficult, due to the high-dimensional, heterogeneous nature of the data, diversity of data modalities, and missingness present in heterogeneous patient samples (PMID: 36929070). However, these integration methods are essential to further our biological understanding of the human genome and of complex diseases such as cancer, human immunodeficiency virus (HIV), and cardiovascular disease (CVD).
                Integrated analysis methods for multi-view data increase power by combining data sources. They also allow for improved understanding of individual and combined contributions of high-dimensional data to classification or association with clinical outcomes. Identification of genes, pathways, and/or networks with multiple associated data types may provide more insight into biological mechanisms of disease. Additionally, these identified features might be useful diagnostic or prognostic biomarkers or targets for novel therapeutics.
                The integrated analysis of multi-view data is a rapidly growing area of statistical research. Advances have been made in dimension reduction and visualization (PMID: 33739448, PMID: 23745156, PMID: 31444786), clustering (PMID: 31999549, PMID: 36929074), classification (PMID: 30657866), association (PMID: 28482123, PMID: 23142963), variable selection (PMID: 29099853), and prediction (PMID: 28747816). These methods have been developed in many application areas such as pediatric cancer genomics (PMID: 19528086, PMID: 27766934), single-cell sequencing and spatial transcriptomics (PMID: 35258565), neuroimaging and genetics for drug dependence (PMID: 26484829), and environmental epidemiology (PMID: 29947894).
                This session will have four speakers present integrated analysis methods they have developed in short talks with time for audience questions. These experts have diverse statistical and application areas within the multi-view data field. This session will raise attendees’ awareness of the importance and utility of integrated analysis methods for multi-view data.

                  Proposed Speakers 

                  Anna Seffernick, St. Jude Children's Research Hospital (USA)

                  • BEAM: Bootstrap Evaluation of Association Matrices to Integrate Multiple Source of Omics Data with Multiple Clinical Endpoints

                  Sandra Safo, University of Minnesota Twin Cities (USA)

                  • Multi-task Deep JIVE: A Multi-task Deep Learning Method for Joint and Individual Variation Explained with Feature Selection

                  Qianxing (Quincy) Mo, Moffitt Cancer Center (USA)

                  • Molecular cancer subtype discovery by integrative clustering analysis

                  Veera Baladandayuthapani, University of Michigan (USA)

                  • Bayesian strategies for multi-study integration using biological hierarchies

                  Bayesian Methods in the Design and Analysis of Clinical Trials

                  Session Chair: Shirin Golchi, McGill University (Canada)



                  Traditionally, statistical methods in design and analysis of clinical trials have been predominantly frequentist. Despite the increasing popularity of Bayesian methods in clinical trials, the development and adoption of Bayesian methodology and computational techniques is faced with practical and regulatory challenges. It is therefore important to promote innovative Bayesian methodology that has proven to be advantageous in many respects in clinical trials.
                  In particular, a Bayesian framework facilitates:
                  • Inference with small sample sizes: decision criteria in Bayesian clinical trials commonly rely on posterior probabilities which, unlike p-values based on approximate large-sample distributions of test statistics, remain valid and interpretable regardless of the sample size.
                  • Sequential design and analysis in adaptive trials: the results at each interim analysis are obtained by updating the posterior distribution from the previous interim analysis.
                  • Incorporation of external information in the design and analysis: information from historical controls or real-world data can be incorporated via the prior distribution.
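The sequential updating and prior-based borrowing described above can be illustrated with a minimal sketch. It assumes a single-arm trial with a binary response and a conjugate Beta prior; the historical counts, the half-weight discounting, the cohort sizes and the 0.30 threshold are all hypothetical choices made for illustration, and the sketch does not represent any speaker's method.

```python
import random

def update_beta(prior, successes, failures):
    """Conjugate update: Beta(a, b) prior plus binomial data gives a Beta posterior."""
    a, b = prior
    return (a + successes, b + failures)

def prob_rate_exceeds(posterior, threshold, draws=20000, seed=1):
    """Monte Carlo estimate of P(response rate > threshold | data)."""
    rng = random.Random(seed)
    a, b = posterior
    hits = sum(rng.betavariate(a, b) > threshold for _ in range(draws))
    return hits / draws

# Hypothetical external information: 6 responders among 20 historical controls,
# down-weighted by one half and added to a flat Beta(1, 1) prior.
prior = (1 + 0.5 * 6, 1 + 0.5 * 14)  # Beta(4, 8)

# Hypothetical interim looks: (responders, non-responders) in each new cohort.
interim_cohorts = [(4, 6), (5, 5), (7, 3)]

posterior = prior
for responders, non_responders in interim_cohorts:
    # Each interim analysis starts from the posterior of the previous one.
    posterior = update_beta(posterior, responders, non_responders)
    print(posterior, round(prob_rate_exceeds(posterior, 0.30), 3))

# Updating look by look gives the same posterior as one analysis of the pooled data.
pooled = update_beta(prior, 16, 14)
assert posterior == pooled
```

In this conjugate setting the decision quantity at each look is a posterior probability, and no large-sample approximation is involved. Assessing how often such a rule would stop under a fixed true response rate would require simulating many trials, which this sketch does not do.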
                  These advantages have led to the common use of Bayesian methods in certain disease areas such as oncology and rare diseases. Although they hold regardless of context and can in many cases yield more efficient designs and more informative, reliable and interpretable analyses, the broad use of Bayesian methods is mainly restricted to cases where conventional trials fail to achieve the desired precision and power. The main obstacles to broad adoption of Bayesian methodology in trials are the inaccessibility of advanced Bayesian methodology and the requirement to assess the frequentist operating characteristics of Bayesian decision procedures.
                  The proposed session will highlight the advantages of employing Bayesian methodology in clinical trials and innovative developments to address the challenges. This session will bring together leading experts in Bayesian clinical trials. The proposed speakers are diverse with respect to career stage (1 senior researcher, 2 mid-career researchers, and 1 junior researcher), gender (2 women and 2 men) and geographical region representation (ENAR, WNAR, and ANed).

                    Proposed Speakers 

                    J. Jack Lee, The University of Texas MD Anderson Cancer Center (USA)

                    • Bayesian Model-Assisted Designs for Dose Optimization in Oncology Drug Development

                    Kelley Kidwell, University of Michigan (USA)

                    • Advancing rare disease clinical trials via Bayesian design and analysis

                    Joost van Rosmalen, Erasmus MC (Netherlands)

                    • Assessment of frequentist operating characteristics with historical controls

                    Shirin Golchi, McGill University (Canada)

                    • Modeling the sampling distribution of test statistics in Bayesian clinical trials

                    Prediction Under Hypothetical Interventions for Medical Decision-making

                    Session Chair: Ruth Keogh, London School of Hygiene and Tropical Medicine (UK)

                    SESSION INFORMATION


                    Standard clinical risk prediction models aim to predict a person’s risk of an outcome (e.g. mortality) given their observed characteristics. In clinical care it is often of interest to use risk predictions to inform whether a person should initiate a particular treatment. However, when standard clinical prediction models are developed in a population in which patients follow different treatment strategies, they may not be suitable for informing treatment decisions. Such decisions should instead ideally be based on predictions of what an individual’s risk of the outcome would be under the different treatment options under consideration. We refer to this as “prediction under hypothetical interventions”. Models for prediction under hypothetical interventions provide estimates of a person’s risk of an outcome if they were to follow a particular treatment strategy, taking into account patient characteristics that are predictive of the outcome. Predictions under hypothetical interventions are already in use to inform treatment decision-making in the cancer context and are likely to be increasingly used in clinical practice to inform personalised or stratified medicine.

                    Developing models or algorithms that provide predictions under hypothetical interventions requires concepts and tools from both causal inference and prediction. Innovations in methodology for developing such models are rapidly emerging in the statistical, causal inference and machine learning/AI literature. Developing predictions under hypothetical interventions requires data on the interventions of interest alongside individual prognostic factors for the outcome. Such data may come from randomized trials, but large-scale longitudinal observational data, such as those arising from routinely collected healthcare records, are increasingly being considered for this task.

                    This session will include three speakers and a discussant. It will cover methods for developing predictions under hypothetical interventions from the perspectives of both causal inference and machine learning/AI. Example applications will include liver transplantation and acute kidney injury. The discussion will focus on the strengths and limitations of different approaches and on the challenges of validation and uncertainty quantification.

                      Proposed Speakers & Discussant

                      Hein Putter, Leids Universitair Medisch Centrum (Netherlands)

                      • Sequential prediction of survival with and without liver transplantation

                      Pawel Morzywolek, Universiteit Gent (Belgium)

                      • Orthogonal learners for prediction under hypothetical interventions, with application in acute kidney injury

                      Amanda Coston, Carnegie Mellon University (USA)

                      • Doubly robust methods for counterfactual prediction and counterfactual evaluation

                      Discussant: Karla Diaz-Ordaz, University College London (UK)


                      Statistical Machine Learning for Neuroimaging and Brain Connectivity Analysis

                      Session Chair: Ali Shojaie, University of Washington (USA)

                      SESSION INFORMATION


                      Brain imaging and monitoring devices are increasingly used to study cognitive processes and their aberrations in neurodegenerative diseases. In addition to investigating changes in brain activities at different spatial resolutions, from individual voxels to regions of interest (ROIs), neuroscientists increasingly use these data to interrogate the brain connectome, i.e., the comprehensive map of neural connections in the brain.

                      This session features three invited talks by leading experts in statistical machine learning methods for analyzing the brain connectome using various imaging and monitoring modalities. The speakers, who have confirmed their availability for the conference, have been selected to represent diverse backgrounds in terms of gender, race/ethnicity, geographic location (US and international) and career stage (associate and full professors). The topics covered range from dimension reduction and tensor methods for brain networks to inferring Granger causality interactions in the brain (also known as effective connectivity). The session will conclude with a discussion of the presented papers in the context of the broader work in this area. Speaker affiliations and talk titles are given below.

                        Proposed Speakers & Discussant

                        Genevera Allen, Rice University (USA)

                        • Joint Semi-Symmetric Tensor PCA for Integrating Multi-modal Populations of Networks

                        Moo Chung, University of Wisconsin-Madison (USA)

                        • Topological Embedding of Dynamic Brain Networks in Twins

                        Hernando Ombao, King Abdullah University of Science and Technology (Saudi Arabia)

                        • Spectral Causation Entropy

                        Recent Advances in the Design and Analysis of Studies Reliant on Error-prone Data

                        Session Chair: Pamela A Shaw, Kaiser Permanente Washington Health Research Institute (USA)

                        SESSION INFORMATION


                        There is great interest in using routinely collected data, such as electronic health record (EHR) data, as a cost-effective resource to support biomedical research. These types of data enable investigations in real-world, practical settings with large, inexpensive datasets. However, their use raises serious concerns because they can be prone to error. For example, electronic health records are collected primarily for clinical and/or billing purposes, and not for research. Routine laboratory tests may have imperfect sensitivity and specificity. Readily available data can also be missing important information, and results based on the available data can be biased if analyses are not adjusted for the inadequacies of the observed data. In some settings, validation or gold-standard data may be available but, because of the expense, can only practically be obtained on a (phase 2) subset. Such data are necessary to allow for correct inference in the presence of measurement error, which in turn makes efficient validation (phase 2) study design and analysis methods imperative.
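To make the point about imperfect sensitivity and specificity concrete, the sketch below applies the classical Rogan-Gladen correction to an apparent prevalence, with sensitivity and specificity estimated from a validation subset. All counts are hypothetical, the validation subset is assumed to be a simple random sample, and this textbook estimator is shown only as an illustration; it is not one of the methods presented in the session.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Correct an apparent prevalence for misclassification by an imperfect test."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test must be better than chance (sensitivity + specificity > 1)")
    estimate = (apparent_prevalence + specificity - 1.0) / denom
    # Truncate to the valid range, since sampling noise can push the estimate outside [0, 1].
    return min(1.0, max(0.0, estimate))

# Hypothetical validation (phase 2) subset with gold-standard status.
true_pos_tested_pos, true_pos_total = 45, 50      # sensitivity = 0.90
true_neg_tested_neg, true_neg_total = 141, 150    # specificity = 0.94
sensitivity = true_pos_tested_pos / true_pos_total
specificity = true_neg_tested_neg / true_neg_total

# Hypothetical full (phase 1) cohort with only the error-prone test result.
apparent = 1800 / 10000  # 18% test positive

corrected = rogan_gladen(apparent, sensitivity, specificity)
print(f"apparent = {apparent:.3f}, corrected = {corrected:.3f}")
```

With these made-up numbers the naive prevalence overstates the corrected one, because false positives among the many true negatives outnumber the missed true positives. The sketch ignores the uncertainty in the estimated sensitivity and specificity, which a full analysis would need to propagate.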

                        In this session, we present statistical innovations that improve analytical studies reliant on error-prone data. Sarah Lotspeich will apply a measurement error framework to assess the associations between various health outcomes (coronary heart disease, diabetes, high blood pressure, and obesity) and neighborhood-level healthy food access, where information on food access is based on an easy-to-obtain but error-prone straight-line distance rather than a gold-standard map-based distance available only on a subset. Yong Chen will describe a study evaluating the long-term effectiveness of the BNT162b2 vaccine against various strains of the SARS-CoV-2 virus in a diverse pediatric population using a novel trial emulation pipeline that accounts for possible misclassification bias in vaccine documentation in EHRs. Pamela Shaw will discuss considerations for the efficient design of validation studies when there are multiple parameters/outcomes of interest. Sarah Ratcliffe will discuss the implications of the methods and research presented in the context of the broader challenges of analyzing error-prone data in biomedical settings.

                          Proposed Speakers 

                          Sarah Lotspeich, Wake Forest University (USA)

                          • Overcoming computational hurdles to quantify the impact of food access on health: A statistical approach

                          Yong Chen, University of Pennsylvania Perelman School of Medicine (USA)

                          • Estimation of real-world effectiveness of BNT162b2 in children and adolescents against infection and severe diseases with SARS-CoV-2: Causal inference under misclassification in treatment status

                          Pamela Shaw, Kaiser Permanente Washington Health Research Institute (USA)

                          • Efficient validation design targeting multiple outcomes for analysis of error prone data from electronic health records

                          Discussant: Sarah Ratcliffe, University of Virginia (USA)