Short Course Program

Four half-day and four full-day Short Course proposals have been selected for presentation just before the International Biometric Conference begins.  All Short Courses will take place on 8 December 2024.  These courses are taught by experienced professionals who are experts in their fields, so you do not want to miss out! 

Short Course Time Frames (Subject to change):
Full-Day Courses:  9:00 - 18:00
Morning Half-Day Courses:  9:00 - 13:00
Afternoon Half-Day Courses:  14:00 - 18:00

Full day courses

SC01 - Surrogate Endpoints in Clinical Trials: From Concept to Practice

 Geert Molenberghs & Ariel Alonso


Surrogate endpoints have gained increasing importance in clinical trials as they provide a shortcut to evaluate the efficacy of a treatment in a faster, more efficient, and cost-effective way. However, surrogate endpoints have also generated controversy due to their limitations, such as the lack of direct clinical relevance or the risk of false-positive results. Given the growing interest and debate surrounding surrogate endpoints, it is important to provide a comprehensive course that covers the fundamental concepts, applications, and challenges of surrogate endpoints in clinical research.

The proposed course aims to provide participants with a deep understanding of the principles behind the evaluation and application of surrogate endpoints in clinical trials. The course will start by introducing the definition and purpose of surrogate endpoints, the regulatory context, and a historical perspective about their use and methodological validation. In addition, it will cover several modern approaches for their evaluation within the two main schools of thought in this area, namely, the causal inference and meta-analytic schools. The course will also address the numerical implementation of these advanced methods in R and SAS.

The course will cover methodology developed within the meta-analytic and causal inference schools and will have an intermediate level of difficulty. Participants should have some prior knowledge and experience in the following areas:
--Basic statistical theory and applications
--Basic knowledge of meta-analysis and its applications
--Basic knowledge of causal inference methods
--Clinical trials

It would be helpful if participants also have experience using statistical software such as R or SAS for data analysis. Overall, the content covered will be specialized and applicable to researchers and professionals working in the field of clinical research and healthcare.

Learning Objectives
At the end of the course:
1. Participants will have a clear understanding of the potential and limitations related to the use of surrogate endpoints in clinical research
2. Participants will get familiar with the most modern methodologies proposed for the evaluation of surrogate endpoints
3. Participants will get familiar with the use of surrogate endpoints in several relevant clinical domains
4. Participants will get familiar with SAS packages and R libraries that implement the state-of-the-art methods for the evaluation of surrogate endpoints
It is recommended that participants bring their laptop with R and the Surrogate package installed.

About the Instructors
Geert Molenberghs is Professor of Biostatistics at the Universiteit Hasselt and KU Leuven in Belgium. He received a B.S. degree in mathematics (1988) and a Ph.D. in biostatistics (1993) from the Universiteit Antwerpen. Dr Molenberghs has published methodological work on surrogate markers in clinical trials, categorical data, longitudinal data analysis, and the analysis of non-response in clinical and epidemiological studies. He served as Joint Editor for Applied Statistics, Co-editor for Biometrics, Co-editor for Biostatistics, and Series Editor of Wiley Probability & Statistics and Wiley StatsRef. He is currently Executive Editor of Biometrics. He has acted, and continues to act, as Associate Editor for several journals and has undertaken numerous refereeing tasks (for journals, faculty member promotion, faculty member appointments, etc.). He was President of the International Biometric Society. He was elected Fellow of the American Statistical Association and received the Guy Medal in Bronze from the Royal Statistical Society. He has held visiting positions at the Harvard School of Public Health (Boston, MA). He is founding director of the Center for Statistics at Hasselt University and currently the director of the Interuniversity Institute for Biostatistics and statistical Bioinformatics, I-BioStat, a joint initiative of the Hasselt and Leuven universities. He has published, as editor and author, several books on longitudinal data analysis, possibly subject to missingness (with Geert Verbeke), and on surrogate endpoints.
He has (co-)taught nearly 200 short and longer courses on the topic in universities as well as industry, in Europe, North America, Latin America, and Australia. He has received research funding from FWO, IWT, the EU (FP7), U.S. NIH, U.S. NSF, UHasselt, and KU Leuven. He is a member of the Belgian Royal Academy of Medicine. Since the beginning of the SARS-CoV-2 induced pandemic, he has served as an advisor to the Belgian government and has been a member of several official scientific boards in his home country. He has also taken up roles in science communication to the general public in the context of the pandemic.

Ariel Alonso Abad holds a bachelor's and master's degree in mathematics from the University of Havana, Cuba, obtained in 1992 and 1997, respectively. He later pursued a master's and a PhD degree in Biostatistics from Hasselt University in Belgium, which he completed in 1998 and 2004, respectively. Prof. Alonso has extensive professional experience, having worked as a consultant at the National Coordination Center for Clinical Trials in Havana, Cuba, for over ten years, as an Assistant Professor at the University of Maastricht in the Netherlands for four years, and as a post-doctoral researcher and part-time professor at Hasselt University for six years. He is currently an Associate Professor at KU Leuven in Belgium, where he teaches introductory and advanced courses in various programs, including the Master in Bioinformatics, the Master in Nursing and Sexology, and the Master in Data Science.

Prof. Alonso's research focuses on the evaluation of surrogate endpoints, and he has developed information-theoretic approaches within the meta-analytic and causal inference paradigms. He is also the co-author of the R package Surrogate, which implements advanced methods for the evaluation of surrogate endpoints. He has also conducted research on the evaluation of rating scales, information theory, and inference under misspecification for hierarchical models. Prof. Alonso has published approximately 100 scientific papers and is the main author of the book Applied Surrogate Endpoint Evaluation with SAS and R. He has supervised 11 PhD students and 24 master students and taught 13 short courses globally. He has participated in about 50 national and international scientific congresses and has been an invited speaker at 20 of them. Additionally, he has given more than 20 seminars at scientific institutions worldwide.


Joseph G. Ibrahim


This full-day short course is designed to give biostatisticians and data scientists a comprehensive overview of informative prior elicitation from historical data, expert opinion, and other data sources, such as real-world data, prior predictions, estimates, and summary statistics. We focus on both Bayesian design and analysis, and examples will be presented for several types of applications, such as clinical trials, observational studies, and environmental studies, as well as other areas in biomedical research.

The first part of the course gives a brief but broad overview of Bayesian inference, examining concepts of Bayesian design and analysis such as i) Bayesian type 1 error and power, ii) calculation of posterior and predictive distributions, iii) MCMC sampling methods, iv) fundamental concepts in informative and non-informative prior elicitation, v) Bayesian point and interval estimation, and vi) Bayesian hypothesis testing. These topics will be presented in a general context as well as in several regression settings.

The second part of the course will focus broadly on advanced methods for informative prior elicitation, including i) informative prior elicitation from historical data using the power prior (PP) and its variations including the normalized power prior, the partial borrowing power prior, the asymptotic power prior, and the scale transformed power prior (STRAPP).
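
For readers unfamiliar with the power prior named above, its basic form can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the course materials: \theta denotes the model parameters, D_0 the historical data, L the likelihood, \pi_0 an initial prior, and a_0 a fixed discounting parameter.

```latex
\pi(\theta \mid D_0, a_0) \;\propto\; L(\theta \mid D_0)^{a_0}\, \pi_0(\theta),
\qquad 0 \le a_0 \le 1 .
```

With a_0 = 0 the historical data are discarded and the prior reduces to \pi_0; with a_0 = 1 the historical likelihood enters at full weight. The variations listed above modify this basic form.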

This course requires knowledge of graduate-level courses in statistics / biostatistics. Specifically, we assume knowledge of linear models (LMs), generalized linear models (GLMs), longitudinal data analysis (e.g., generalized linear mixed models [GLMMs]), and the analysis of time-to-event data. No prior knowledge of Bayesian statistics is assumed.

Learning Objectives
After this course, students should obtain theoretical and practical knowledge of Bayesian statistics, including basic Stan programming skills and interface with the R programming language, allowing course-takers to implement their own methods in practice. Students will acquire fundamental knowledge of the most used historical data priors, including the strengths and weaknesses of each prior. They will acquire the nomenclature necessary to communicate findings using Bayesian methods (e.g., credible intervals, Bayesian power, Bayesian type I error rates).

A laptop would be helpful to follow along, but not strictly required.

About the Instructors

Dr. Ethan Alt is an Assistant Professor in the Department of Biostatistics at the University of North Carolina at Chapel Hill. His research interests include informative prior elicitation, incorporation of historical data, and Bayesian methods for the design and analysis of clinical trials. He received his PhD from the University of North Carolina at Chapel Hill. He is the author of several R packages including bayescopulareg, bmabasket, and hdbayes.

Dr. Joseph G. Ibrahim, Alumni Distinguished Professor of Biostatistics at the University of North Carolina at Chapel Hill, is principal investigator of two National Institutes of Health (NIH) grants for developing statistical methodology related to cancer, imaging, and genomics research. Dr. Ibrahim is the Director of the Biostatistics Core at UNC Lineberger Comprehensive Cancer Center. He is the biostatistical core leader of a Specialized Program of Research Excellence in breast cancer from NIH. Dr. Ibrahim's areas of research focus are Bayesian inference, missing data problems, cancer, and genomics. He received his PhD in statistics from the University of Minnesota in 1988.

With over 30 years of experience working in cancer clinical trials, Dr. Ibrahim directs the UNC Laboratory for Innovative Clinical Trials (LICT). He is also the Director of Graduate Studies in UNC’s Department of Biostatistics, as well as the Program Director of the cancer genomics training grant in the department. Dr. Ibrahim has published over 350 research papers, most in top statistical journals. He has published graduate-level books on Bayesian survival analysis and Bayesian computation. He teaches courses in Bayesian Statistics, Advanced Statistical Inference, Theory and Applications of Linear and Generalized Linear Models, and Statistical Analysis with Missing Data.


Virginie Rondeau and Catherine Legrand


Joint models have become very common in different fields, such as medical research. The literature on extending joint models beyond the “classical” framework of a longitudinal biomarker and a survival time has exploded over the last decade, followed in recent years by an increased availability of software for analyzing more complex joint models.

We therefore offer in this course a broad overview of various available “innovative joint survival models”, in particular joint models for recurrent events, joint models to validate surrogate endpoints, joint models for an excess of zeros in the longitudinal part, joint models for a mixture of cured and uncured patients, and joint models for an ordinal biomarker (with at least one survival endpoint). The objective of this course is to address all these models from a practical perspective, with unified notation and a consistent level of explanation. The course will be illustrated with the analysis of several real-life datasets from medical research, using existing R packages.

The level of the course is adapted to academic statisticians and applied statisticians in medical research (but will also be accessible to statisticians in other fields of application), as well as graduate students in (bio)statistics who have already followed a course on classical survival analysis.

The two authors of this intensive course are specialists in the domain, with several international publications and books and extensive teaching experience.

    The course will provide a broad overview of various innovative joint survival models, with enough detail to understand how to apply them and interpret the results. It also aims at a good understanding of the principles of the estimation methods without going too deeply into technical details.

    Participants should have (at least) some basic knowledge of “classical” survival analysis, including in particular the concepts of (right-)censoring, the Kaplan-Meier estimator of the survival function, the logrank test and the proportional hazards model.
    Participants are also expected to have some basic knowledge of statistical data analysis and inference, and in particular a good background in standard maximum likelihood theory and standard regression models for a continuous endpoint. This comes with minimum prerequisites in mathematics (matrix algebra, the concept of a limit, derivatives and integrals, ...). We also assume basic familiarity with R, in particular with regard to data import, manipulation, and standard analysis techniques for continuous and time-to-event endpoints.

    Learning Objectives
    The objective of this course is to master modern statistical methods and to learn how and when to apply them to real-world clinical data from different settings.

    This course will give the audience a broad overview of different up-to-date innovative extensions of the classical joint model and the contexts in which they are relevant. In particular, it addresses what to do when faced with a non-Gaussian longitudinal biomarker or several biomarkers, when part of the population will never experience the event of interest (cure), or when a joint model is used to validate a surrogate marker.

    At the end of the course, the participants should be able to
    (i) recognize these situations and acknowledge the limitations of the classical approach in these situations,
    (ii) understand the important features, the estimation principle and how to apply various innovative joint models, and make an informed choice about the different models/methods available,
    (iii) identify and use an appropriate R package to perform the analyses,
    (iv) correctly interpret the results of their analysis.

    For participants whose objective is to pursue more methodological research on one of the topics covered by the course, we think that this course will provide a good introduction to the topic and a good overview of the available estimation/fitting techniques. References for more detailed methodological descriptions of the models discussed will be provided.

    Books by the authors (recommended, but not mandatory):
    1. D. Commenges, H. Jacqmin-Gadda, A. Amadou, P. Joly, B. Liquet, C. Proust-Lima, V. Rondeau, and R. Thiébaut. Dynamical Biostatistical Models, volume 86. CRC Press, 2015.
    2. T. Emura, S. Matsui, and V. Rondeau. Survival Analysis with Correlated Endpoints: Joint Frailty-Copula Models. JSS Research Series in Statistics, 2019.
    3. C. Legrand. Advanced Survival Models, 1st edition. Chapman and Hall/CRC, 2021. ISBN 9780367149673.

    The material of this course is largely based on the content of these books.

    Recommended but not necessary. 

    About the Instructors

    Virginie Rondeau has been director of research in Biostatistics at the INSERM institute in Bordeaux (France) since 2015. Her recent research has focused on joint models for recurrent events and competing risks (e.g., death) and on the development of joint models for longitudinal markers and/or multiple times-to-event.

    Catherine Legrand is Professor at the Institute of Statistics, Biostatistics and Actuarial Sciences (LIDAM/ISBA) of the Université catholique de Louvain (UCLouvain, Belgium). Her area of research includes survival data analysis with a particular focus on frailty models and cure models.


    Andrew Lawson



    R is now commonly used for advanced biostatistical applications. Bayesian spatial and spatio-temporal modeling of health data is an important topic which can be addressed using tools in R. This course is designed for those who want to cover mapping methods and the use of a variety of software and variants in application to small area health data. The course will include theoretical input, covering selected Bayesian spatial models, as well as practical elements, and participants will get hands-on experience using R, BRugs, Nimble, and CARBayes in disease mapping applications. Both human and veterinary examples will be covered in the course, as well as simple space-time modeling. Examples will range over county-level respiratory cancer incidence (spatial and spatio-temporal) and influenza and Covid-19 space-time modeling in South Carolina. The course is suitable for those with some R experience but limited experience of spatial modeling in health applications. A recent text on this topic, Lawson, A. B. (2021) Using R for Bayesian Spatial and Spatio-temporal Health Modeling, CRC Press, forms the basis of this course.


    • Some experience of using R for data handling and data processing
    • Some prior Bayesian modeling experience is beneficial but not essential
    • Some spatial data analysis experience is beneficial but not essential

    Learning Objectives

    • Familiarity with Bayesian spatial health models
    • Familiarity with the use of R in application to spatial health data
    • Some competence in applying Bayesian spatial models via MCMC to health data problems
    • Basic familiarity with Bayesian spatio-temporal (ST) modeling


    It is recommended that participants bring a laptop.

    About the Instructor
    Andrew Lawson is Professor of Biostatistics in the Division of Biostatistics and Bioinformatics, Department of Public Health Sciences, College of Medicine, MUSC, and is an MUSC Distinguished Professor Emeritus and ASA Fellow. His PhD was in Spatial Statistics from the University of St. Andrews, UK.

    He has over 200 journal papers on the subject of spatial epidemiology, spatial statistics and related areas. In addition to a number of book chapters, he is the author of 10 books in areas related to spatial epidemiology and health surveillance. Recent titles include Lawson, A.B. et al. (eds) (2016) Handbook of Spatial Epidemiology, CRC Press, New York, and, in 2018, the 3rd edition of Bayesian Disease Mapping: Hierarchical Modeling in Spatial Epidemiology, CRC Press. In 2021, a new volume entitled Using R for Bayesian Spatial and Spatio-temporal Health Modeling, CRC Press, appeared. He has acted as an advisor in disease mapping and risk assessment for the World Health Organization (WHO) and is founding editor of the Elsevier journal Spatial and Spatio-temporal Epidemiology. Dr Lawson has delivered many short courses in different locations over the last 20 years on Bayesian Disease Mapping with OpenBUGS, INLA, and Nimble, and on more general spatial epidemiology topics.

    Half day courses

    SC05 - Causal inference in drug development: Applications and methods

    Tianmeng Lyu, Dong Xi, and Frank Bretz



    Causal thinking and related inference methods are gaining increasing prominence in global drug development in light of the addendum to the E9 guideline on 'Statistical principles in clinical trials' by the International Council for Harmonisation (ICH, 2019) and the guideline on covariate adjustment by the U.S. Food and Drug Administration (FDA, 2023). These guidelines refer to terminology, concepts and methods from the causal inference literature, such as potential outcomes, principal stratification, non-collapsibility and standardization.

    Although widely underutilized in drug development, causal inference methods can add value in many settings, including those outlined in the two guidelines above, but also for the use of external control data through the target trial framework and understanding cause and effect in pharmacometric and pharmacovigilance applications. In this course, we focus on causal inference methods tailored to the challenges commonly encountered in randomized controlled trials occurring in drug development. We start with an introduction to the ICH E9(R1) estimand framework and basic causal inference concepts, followed by a detailed discussion of causal inference methods targeting hypothetical estimands, as well as conditional vs. marginal treatment effects. We illustrate the methods with case studies and provide code examples to facilitate implementation in practice.



    Participants should have basic knowledge of the fundamentals of statistics, including experience with common data types (continuous, binary, time-to-event) as well as the associated models and estimation methods, such as maximum likelihood, general linear models, and Cox models. Knowledge of causal inference is a plus, but not required. Moreover, participants are expected to have basic knowledge of clinical trial methodology and should be familiar with concepts and terms such as bias, randomization, and blinding.

    The difficulty level of the course is intermediate, at a second-year graduate course level. The focus will not be on the theoretical derivations and properties of the statistical methods, but on their application in clinical trial settings.

    Learning Objectives

    The learning objectives are four-fold:

    1. to describe basic concepts of causal inference (e.g., potential outcomes, main assumptions, confounders) and appreciate the role of causal inference in randomized clinical trials;
    2. to identify and apply common estimation methods of causal effects relevant to clinical trials in drug development;
    3. to implement appropriate analyses in practical settings; and
    4. to get an overview of basic functionality in R to design and analyze clinical trials.

    About the Instructors

    Dr. Tianmeng Lyu is an Associate Director Statistical Consultant in the Statistical Methodology group at Novartis Pharmaceuticals Corporation, based in East Hanover, NJ, USA. She received her PhD in Biostatistics from the University of Minnesota, Twin Cities, in 2018 and then joined Novartis. Her research interests include survival analysis, recurrent events, estimands and causal inference. She has supported several consulting and research projects on causal inference at Novartis and was the main contributor to several training initiatives on estimands and causal inference both within and outside Novartis.

    Dr. Dong Xi is a Statistical Advisor in the Biostatistics Innovation Group at Gilead Sciences. He has been supporting development and implementation of innovative statistical methodologies in multiplicity and causal inference for drug development in various therapeutic areas. Before joining Gilead, he was a statistician at Novartis, and he received his PhD in statistics from Northwestern University (USA).

    Prof. Frank Bretz is a Distinguished Quantitative Research Scientist at Novartis. He has supported methodological development in various areas of biostatistics, including dose finding, multiple comparisons, and adaptive designs. Frank currently holds adjunct professorial positions at the Hannover Medical School (Germany) and the Medical University of Vienna (Austria). He was a core member of the ICH E9(R1) working group on “Estimands and sensitivity analysis in clinical trials”. Frank was an Executive Board member of the International Biometric Society (IBS) and served as the President of the IBS Austro-Swiss Region (IBS-ROeS). He is a recipient of the Susanne-Dahms-Medal from the IBS German Region (IBS-DR) and a Fellow of the American Statistical Association.

    SC06 - Statistical and machine learning for big geospatial data

    Dr. Abhirup Datta


    Geospatial data, routinely encountered in environmental health, climate sciences, disease epidemiology, forestry, and ecology, have traditionally been analyzed using statistical models built on foundations of stochastic processes. Increasingly, practitioners are adopting machine learning methods for geospatial analysis. Should the decades of development in spatial statistics be abandoned for black-box machine learning? This short course demonstrates the pitfalls of naive machine learning for geospatial data and offers a tutorial on state-of-the-art hybrid machine learning methods for geospatial data that leverage well-established spatial statistics principles.

    We review traditional geospatial approaches like the linear mixed model using Gaussian processes (GPs), and non-linear machine learning methods like random forests and neural networks. We then present hybrid approaches that embed non-linear machine learning within traditional geospatial mixed effect models, relaxing the stringent assumption of linearity while preserving Gaussian process random effects, and thereby retaining interpretability, flexibility, and parsimony for estimation and prediction. We present recent methods with different choices of machine learning algorithms (random forests and neural networks) and different data types (continuous and binary), and provide demonstrations with published software on real and simulated data. The tutorial will equip practitioners with a suite of hybrid machine learning methods and software tailored for geospatial analysis.

    The course will cover topics of various levels of difficulty and is intended for an audience with diverse quantitative backgrounds. Much of the course will focus on application of the new hybrid geospatial machine learning methods using published software and live demonstrations using data examples. This content will be accessible to practitioners from a wide range of fields (environmental health, climate sciences, disease epidemiology, forestry, ecology) interested in learning modern tools for geospatial analysis. A thorough review will be provided to introduce the audience to both geostatistics and popular machine learning methods. Some of the advanced material, particularly the methodological and computational details of the new hybrid machine learning algorithms, will be taught at the level of a graduate course for students in statistics, biostatistics, computer science or related fields.

    While the course will primarily focus upon practical modeling, computing and data analysis, short course participants will benefit from some prior understanding of mathematical statistics and linear algebra at the undergraduate or advanced undergraduate level. We will not assume any significant previous exposure to geospatial methods or machine learning algorithms, although students with basic knowledge of the area will certainly face a gentler learning curve. All the computational tools and environments will also be introduced as necessary in the course, but some experience with the R-programming language for statistical analysis will be helpful. Experience with GIS or any other specialized software is not required.

    Learning Objectives
    From this course, the participants will acquire:
    1. understanding of the basics of geospatial data analysis and visualization
    2. understanding the strengths of spatial linear mixed effect models (interpretability, predictive capability) and their limitations (assumption of linearity)
    3. familiarity with popular non-linear machine learning algorithms like random forests and neural networks and the perils of naive applications of these methods for spatially correlated data
    4. understanding state-of-the-art hybrid methods that embed non-linear machine learning within spatial mixed models (random forests (RF-GLS) and neural networks (NN-GLS))
    5. familiarity with newly developed software for analyzing large geospatial datasets with RF-GLS and NN-GLS for different types of geospatial data

    This course is based on a 1.5-day course developed and presented internally at our company. No textbook is recommended or required, but familiarity with the ICH E9 addendum is recommended.

    About the Instructor

    Dr. Abhi Datta is Associate Professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health (JHSPH). He received his PhD in Biostatistics from the University of Minnesota in 2016 and was an Assistant Professor in the Department of Biostatistics at JHSPH from 2016 to 2021. Dr. Datta has published 45 peer-reviewed articles (23 as first or senior author). His publications have appeared in prestigious journals, including the Journal of the American Statistical Association, the Annals of Statistics, the Annals of Applied Statistics, Biometrika, Biometrics, Biostatistics, Atmospheric Environment, and the Proceedings of the National Academy of Sciences. His research as principal investigator has been supported by grants from the National Science Foundation (NSF), the Bill and Melinda Gates Foundation (BMGF), and the National Institute of Environmental Health Sciences (NIEHS R01). Dr. Datta’s research has been recognized via multiple national and international awards, including the Young Statistical Scientist Award (YSSA) by the International Indian Statistical Association, the Abdel El-Shaarawi Early Investigator's Award by The International Environmetrics Society, and the Early Investigator Award from the American Statistical Association Section on the Environment.

    SC07 - A practical course in difference-in-differences

    Laura Hatfield


    This course aims to provide participants with a solid grounding in methods for Difference-in-Differences (DID), a popular method for quasi-experimental causal inference in the social sciences.

    The course begins with a potential outcomes approach to constructing target estimands. Various approaches to construct the counterfactual will be discussed. Then we will explore DID methods in depth, including the required causal assumptions. Selecting appropriate comparison groups is crucial for ensuring the plausibility of these assumptions. The course will provide strategies for identifying suitable comparison groups, with special attention to matching, weighting, and regression approaches.

    Participants will learn how to align the estimation method with the target estimand. Analyses for staggered adoption will be discussed. We will also discuss inference, especially for small numbers of clusters and highlight approaches such as aggregation and permutation.
    For sensitivity analyses, we will cover non-inferiority/equivalence tests as an alternative to conventional parallel trends testing, placebo tests, event study plots, negatively correlated control groups, and worst-case differential trends. Related methods, such as synthetic controls, lagged dependent variables, and remixes of existing techniques, will also be introduced. The course concludes with a literature round-up, highlighting new developments and useful reviews.
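
    As a point of reference for the simplest setting (two groups, two periods), the DID estimate is the change in the treated group's mean outcome minus the change in the comparison group's mean outcome. The sketch below is illustrative only and is not taken from the course materials: the function name and the toy numbers are hypothetical, and the estimate has a causal interpretation only under the identifying assumptions discussed in the course (e.g., parallel trends).

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Canonical 2x2 difference-in-differences computed on group means."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical toy outcomes, made up purely for illustration.
treated_pre = [10.0, 12.0, 11.0]
treated_post = [15.0, 17.0, 16.0]
control_pre = [9.0, 11.0, 10.0]
control_post = [11.0, 13.0, 12.0]

estimate = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(estimate)
```

    In this toy example the treated group's mean rises by 5 and the comparison group's by 2, so the DID estimate is 3. Staggered adoption, covariates, and clustered inference require the more careful methods the course covers.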

    Attendees should be familiar with statistics at the level of a year-long graduate-level course in statistical modeling/inference, especially regression-based estimation and inference. Familiarity with basic concepts of causal inference (e.g., potential outcomes, confounding, target estimands, identification assumptions) will also be helpful. The course includes no programming, but the tools recommended to implement the techniques discussed in the course will primarily be in R.

    Learning Objectives
    This course is intended for researchers who want to be responsible users of difference-in-differences methods but do not have time to keep up with the deluge of new methodological developments. At the end of the course, participants will be able to
    1. Formally define the target estimand and the required causal assumptions of a difference-in-differences study
    2. Choose plausible comparison groups and address potential confounding
    3. Estimate the causal target using methods that are compatible with the causal assumptions
    4. Perform statistical inference using flexible methods
    5. Conduct principled sensitivity analyses
    6. Be familiar with recent methodological developments

    Not needed.

    About the Instructor
    Laura Hatfield, PhD, is an associate professor of health care policy (biostatistics) in the Department of Health Care Policy at Harvard Medical School. Her methods research focuses on causal inference in non-randomized settings, especially using difference-in-differences, and quantifying variation in health care utilization, outcomes, and quality using clustering and hierarchical Bayesian models. Hatfield earned her BS in genetics from Iowa State University and her MS and PhD in biostatistics from the University of Minnesota. Dr. Hatfield developed and taught this short course for the Harvard Catalyst program in Feb 2023; more than 430 people registered for that course.

    SC08 - Statistical and Computational Methods for Microbiome Data Analysis

    Gen Li


    This short course is motivated by the transformative impact the microbiome holds for human health. Understanding its role has the potential to revolutionize precision medicine. With vast amounts of microbiome data generated from sequencing techniques and curated in public databases, there is an urgent need for appropriate analysis to gain biological insights. The burgeoning field of microbiome data analysis has seen numerous method developments.

    This course will provide a comprehensive overview of statistical and computational methods for microbiome data analysis. It will cover data acquisition, processing, normalization, and visualization using state-of-the-art analytic pipelines. Detailed presentations will explore statistical and computational methods for tasks like differential abundance analysis, regression, generative models, and network analysis. We will also discuss emerging research topics like longitudinal data analysis and data integration.

    Participants will gain a strong understanding of existing methods and future directions, and will be able to perform basic microbiome data processing and analysis. The course emphasizes practical application by demonstrating the use of state-of-the-art software tools. By empowering participants to unlock the potential of microbiome data, this course aims to shape the future of microbiome research and its impact on human health.

    The course aims to be inclusive and welcomes researchers with diverse backgrounds and research experience. Its primary focus is to provide an overview of the current state of method development in microbiome research, rather than delving into technical details of specific topics or methods. Prior experience or knowledge in microbiome research is not necessary to participate.

    However, participants should have a basic understanding of multivariate analysis and R programming. Familiarity with high-dimensional data analysis is advantageous, although not mandatory. The course is designed to cater to a wide range of individuals interested in microbiome research, ensuring accessibility and encouraging interdisciplinary collaboration.

    Learning Objectives
    The course has the following learning objectives:

    1. Comprehensive Understanding of Microbiome Research. Participants will develop a solid understanding of the key concepts and foundational knowledge in microbiome research. By grasping these fundamental aspects, participants will be able to navigate the intricacies of working with microbiome data.

    2. Knowledge of State-of-the-Art Methods. Participants will become familiar with the latest statistical and computational methods used in microbiome research. By gaining knowledge of cutting-edge techniques, participants will be equipped with the tools to extract meaningful insights from microbiome data.

    3. Awareness of Advanced and Trending Research Topics. Participants will gain an understanding of the challenges, opportunities, and future directions in microbiome research.

    4. Practical Skills in Microbiome Data Analysis. The course will provide hands-on experience with implementing microbiome data analysis using R and relevant software packages. Participants will work with real data examples, learning how to preprocess, analyze, and interpret microbiome data. By acquiring these practical skills, participants will gain the confidence to independently conduct basic analysis of microbiome data.

    About the Instructor

    Dr. Gen Li is a tenured associate professor in the Department of Biostatistics at the University of Michigan. He has extensive experience in microbiome research. He has developed novel dimension reduction, association analysis, cluster analysis, and network analysis methods for microbiome data. His microbiome-related work has been published in top statistical journals (e.g., Biometrics and the Annals of Applied Statistics) and scientific journals (e.g., American Journal of Respiratory and Critical Care Medicine). His methodological research has been supported by several NIH-funded grants.

    Dr. Li is also a highly regarded instructor with a strong teaching track record. He has taught graduate-level courses at Columbia University and the University of Michigan, consistently receiving excellent evaluations. He recently taught a well-received short course on microbiome data analysis at the 2023 International Chinese Statistical Association (ICSA) Applied Statistics Symposium held in Ann Arbor, Michigan, on June 11, 2023. With Dr. Li's expertise and teaching proficiency, participants can expect to receive high-quality instruction and gain valuable insights into microbiome research during the course.