Spring 2025 Louisiana ASA Chapter Meeting

Monday 19 May 2025
Lions' Eye Center room 632
Louisiana State University Health Science Center
2020 Gravier St, New Orleans, Louisiana

Titles and Abstracts
(final - last updated 6 May 2025)

Two-sample bi-directional causality between two traits with some invalid IVs in both directions using GWAS
Siyi Chen
Department of Biostatistics and Data Science
Louisiana State University Health Science Center
New Orleans, Louisiana

Mendelian randomization (MR) is a widely used method for assessing causal relationships between risk factors and outcomes using genetic variants as instrumental variables (IVs). While traditional MR assumes uni-directional causality, bi-directional MR aims to identify the true causal direction. In uni-directional MR, invalid IVs due to pleiotropic effects can violate MR assumptions and introduce significant biases. In bi-directional MR, although traditional MR can be performed separately for each direction, the presence of invalid IVs poses even greater challenges than in the uni-directional case. Therefore, in this research, we introduce a new bi-directional MR method incorporating stepwise selection (Bidir-SW) designed to address these challenges. Our approach leverages public genome-wide association study (GWAS) datasets for two traits and uses model selection criteria (such as BIC and AIC) to identify invalid IVs iteratively through stepwise model selection. This method accounts for the potentially bi-directional nature of causality and the presence of common invalid IVs in both directions, even when only GWAS summary statistics are available. Through simulation studies, we demonstrate that our method outperforms traditional MR techniques, such as MR-Egger and IVW, with uncorrelated SNPs. We also provide additional simulations comparing our approach with existing transcriptome-wide association study (TWAS) methods to show its effectiveness. Finally, we apply the proposed method to genetic traits such as CRP levels and BMI to explore possible bi-directional relationships among these traits, and we use it to discover causal protein biomarkers for these traits. Our findings suggest that the Bidir-SW approach is a powerful tool for bi-directional MR or TWAS and can provide a valuable framework for future genetic epidemiology studies.
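The stepwise idea can be illustrated in miniature for a single causal direction. The sketch below is my own simplification, not the authors' Bidir-SW implementation (function names such as `stepwise_invalid_ivs` are invented): flagged-invalid IVs receive free pleiotropy parameters, the IVW causal effect is refit on the remaining SNPs, and SNPs are greedily flagged while BIC decreases.

```python
import numpy as np

def ivw_theta(bx, by, se_by):
    # inverse-variance weighted causal-effect estimate from summary stats
    w = 1.0 / se_by**2
    return np.sum(w * bx * by) / np.sum(w * bx**2)

def bic_for(bx, by, se_by, invalid):
    # flagged SNPs get their own pleiotropy parameter (zero residual);
    # theta is refit on the SNPs currently considered valid
    valid = ~invalid
    theta = ivw_theta(bx[valid], by[valid], se_by[valid])
    resid = (by[valid] - theta * bx[valid]) / se_by[valid]
    n, k = len(bx), 1 + invalid.sum()
    return np.sum(resid**2) + k * np.log(n), theta

def stepwise_invalid_ivs(bx, by, se_by):
    invalid = np.zeros(len(bx), dtype=bool)
    best, theta = bic_for(bx, by, se_by, invalid)
    while invalid.sum() < len(bx) - 2:
        # flag the single SNP whose removal most lowers BIC, if any
        trials = []
        for j in np.where(~invalid)[0]:
            t = invalid.copy(); t[j] = True
            trials.append((*bic_for(bx, by, se_by, t), j))
        b, th, j = min(trials)
        if b >= best:
            break
        best, theta = b, th
        invalid[j] = True
    return invalid, theta
```

On simulated summary statistics with a few pleiotropic SNPs, the loop flags those SNPs and recovers the causal effect; the real method additionally handles both causal directions and invalid IVs shared between them.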

A multi-bin rarefaction method for evaluating alpha diversities in TCR sequencing data
Mo Li
Department of Mathematics
University of Louisiana at Lafayette
Lafayette, Louisiana

T cell receptors (TCRs) constitute a major component of our adaptive immune system, governing the recognition of and response to internal and external antigens. Studying TCR repertoire diversity via sequencing technology is critical for a deeper understanding of immune dynamics. However, library sizes differ substantially across samples, hindering the accurate estimation and comparison of alpha diversities. To address this, researchers frequently use an overall rarefying approach in which all samples are sub-sampled to an even depth. Despite its pervasive application, its efficacy has never been rigorously assessed. In this paper, we develop an innovative “multi-bin” rarefaction approach that effectively controls the confounding effect of library size on alpha diversities and significantly reduces the loss of samples and of TCR sequence reads, particularly for samples with larger total read counts. Extensive simulations using real-world data highlight the inadequacy of the overall rarefying approach in controlling the confounding effect of library size. The proposed method outperforms competing rarefaction strategies by achieving better-controlled type-I error rates and enhanced statistical power in association tests.
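The contrast between overall rarefying and a binned variant can be sketched as follows. This is a toy illustration under my own simplified binning rule (equal-size library-size bins, each rarefied to its own minimum depth), not the paper's exact procedure:

```python
import numpy as np

def rarefy(counts, depth, rng):
    # sub-sample reads without replacement down to a fixed depth
    reads = np.repeat(np.arange(len(counts)), counts)
    picked = rng.choice(reads, size=depth, replace=False)
    return np.bincount(picked, minlength=len(counts))

def shannon(counts):
    # Shannon alpha diversity of a vector of clonotype counts
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

def overall_rarefy(samples, rng):
    # classic approach: every sample is cut to the smallest library size
    d = min(s.sum() for s in samples)
    return [rarefy(s, d, rng) for s in samples]

def multibin_rarefy(samples, n_bins, rng):
    # toy multi-bin variant: group samples into library-size bins and
    # rarefy within each bin to that bin's own minimum depth, so deep
    # samples are no longer cut down to the global minimum
    depths = np.array([s.sum() for s in samples])
    out = [None] * len(samples)
    for b in np.array_split(np.argsort(depths), n_bins):
        d = depths[b].min()
        for i in b:
            out[i] = rarefy(samples[i], d, rng)
    return out
```

The deep samples in the binned version retain far more reads, which is the loss the abstract refers to; the paper's method also addresses how to test associations across bins.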

Latent Classification of Multi-model Non-homogeneous Continuous-time Markov Chains
Joonha Chang
Department of Biostatistics and Data Science
Louisiana State University Health Science Center
New Orleans, Louisiana

Continuous-time Markov chain (CTMC) models are commonly used to analyze longitudinal categorical outcomes in health research, but their assumption of constant transition rates can be restrictive. To address this, we use non-homogeneous CTMC (NH-CTMC) models with closed-form transition probabilities, allowing rates to vary over time. We further introduce a latent class clustering approach to account for subject-level heterogeneity by grouping individuals with similar transition patterns. This approach offers greater flexibility for modeling time- and group-varying state transitions. We demonstrate its utility using data from an ambulatory hypertension monitoring study.
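One standard way to obtain closed-form transition probabilities in a non-homogeneous CTMC is to let all rates share a single time-varying scale, Q(t) = lam(t) * Q0, so that Q(t) commutes with itself across time and P(s, t) = expm(Q0 * integral of lam). The sketch below illustrates only this device; the talk's actual NH-CTMC parameterization may differ, and the matrices and intensity here are invented for illustration.

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad

# Baseline generator for a 2-state chain (rows sum to zero)
Q0 = np.array([[-1.0, 1.0],
               [0.5, -0.5]])

def lam(t):
    # assumed time-varying intensity; any positive function works
    return 1.0 + 0.5 * np.sin(t)

def transition_matrix(s, t):
    # P(s, t) = expm(Q0 * integral_s^t lam(u) du), valid because
    # Q(u) = lam(u) * Q0 commutes with itself at all times
    scale, _ = quad(lam, s, t)
    return expm(Q0 * scale)

P = transition_matrix(0.0, 2.0)
```

Each row of P is a proper probability distribution, and the formula reduces to the homogeneous expm(Q0 * (t - s)) when lam is constant at 1.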

Clustering Spatial Data with a Mixture of Skewed Regression Models
Junho Lee
Department of Experimental Statistics
Louisiana State University
Baton Rouge, Louisiana

A single regression model is unlikely to hold throughout a large and complex spatial domain. A finite mixture of regression models can address this issue by clustering the data and assigning a regression model to explain each homogeneous group. However, a typical finite mixture of regressions does not account for spatial dependencies. Furthermore, the number of components selected can be too high in the presence of skewed data and/or heavy tails. Here, we propose a mixture of regression models on a Markov random field with skewed distributions. The proposed model identifies the locations wherein the relationship between the predictors and the response is similar and estimates the model within each group as well as the number of groups. Overfitting is addressed by using skewed distributions, such as the skew-t or normal inverse Gaussian, in the error term of each regression model. Model estimation is carried out using an EM algorithm, and the performance of the estimators and model selection are illustrated through an extensive simulation study and two case studies.
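The non-spatial, Gaussian core of such a model can be fit with a short EM loop. The sketch below is a plain two-component mixture of linear regressions with no Markov random field and no skewed errors, so it is only the starting point for the model described above; all names and the initialization rule are my own.

```python
import numpy as np

def em_two_regressions(x, y, iters=100):
    # EM for a 2-component mixture of simple linear regressions with
    # Gaussian errors: E-step computes responsibilities, M-step does
    # weighted least squares per component
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    # deterministic init: split points by the sign of the overall OLS residual
    b_all = np.linalg.lstsq(X, y, rcond=None)[0]
    above = (y - X @ b_all) > 0
    beta = np.vstack([np.linalg.lstsq(X[g], y[g], rcond=None)[0]
                      for g in (above, ~above)])
    sig2 = np.ones(2)
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        mu = X @ beta.T                        # (n, 2) component means
        logp = (np.log(pi) - 0.5 * np.log(2 * np.pi * sig2)
                - (y[:, None] - mu) ** 2 / (2 * sig2))
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)      # responsibilities
        for k in range(2):
            w = r[:, k]
            Xw = X * w[:, None]
            beta[k] = np.linalg.solve(X.T @ Xw, X.T @ (w * y))
            sig2[k] = np.sum(w * (y - X @ beta[k]) ** 2) / w.sum()
        pi = r.mean(axis=0)
    return beta, sig2, pi
```

The spatial version replaces the global mixing weights pi with Markov-random-field priors over neighboring locations, and the Gaussian error density with a skew-t or normal inverse Gaussian density.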

Action-based phylogenetic likelihood calculations for large state-space models
Xiang Ji
Department of Mathematics
Tulane University
New Orleans, Louisiana

Computing the phylogenetic likelihood is the most time-consuming step for many important phylogenetics algorithms. When the rate matrix is sparse, computing the action of the matrix exponential on partial likelihood vectors using the algorithm of Al-Mohy and Higham is faster than separately computing the matrix exponential followed by matrix-vector multiplications. This is particularly useful when the state-space is large such that the latter becomes computationally infeasible. In this talk, I will introduce the action-based likelihood algorithm and its application in learning branch-specific selection pressure dynamics.
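SciPy exposes the Al-Mohy and Higham algorithm as `scipy.sparse.linalg.expm_multiply`, so the two routes can be compared directly on a toy sparse rate matrix (the matrix, vector, and branch length below are invented for illustration):

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import expm
from scipy.sparse.linalg import expm_multiply

# Sparse rate matrix for a toy 4-state model (rows sum to zero)
Q = sp.csr_matrix(np.array([
    [-2.0, 1.0, 1.0, 0.0],
    [1.0, -2.0, 0.0, 1.0],
    [1.0, 0.0, -2.0, 1.0],
    [0.0, 1.0, 1.0, -2.0],
]))
v = np.array([1.0, 0.0, 0.0, 0.0])  # partial likelihood vector at a node
t = 0.3                              # branch length

direct = expm(Q.toarray() * t) @ v   # dense route: form the full matrix exponential
action = expm_multiply(Q * t, v)     # action route: never forms expm(Qt) explicitly
```

For a 4x4 matrix the two are equally cheap; the action route wins when the state space is large enough that forming, or even storing, expm(Qt) is infeasible.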

Tests for Comparing Several Normal Quantiles with Applications
Justin Dunnam
Department of Mathematics
University of Louisiana at Lafayette
Lafayette, Louisiana

Comparing several groups or populations is an important problem in statistics and is commonly addressed by comparing the means of the groups. There are numerous solutions for testing the equality of means of several populations. However, the mean or the median alone does not determine the entire distribution: the means of several distributions may agree while the tails of the distributions differ. In this talk, we describe some existing and new statistical tests for detecting differences among percentiles of several normal populations. Tests for comparing means when the variances are unknown and arbitrary (the Behrens-Fisher problem) can be obtained as a special case. The proposed tests are evaluated in terms of error rates and power and compared with other existing tests. The methods are illustrated using data from a cancer study and latency data.
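To make the setting concrete: the p-th quantile of a N(mu, sigma^2) population is mu + z_p * sigma, so equal means with unequal variances give different tail percentiles. Below is a textbook large-sample Wald comparison of two normal quantiles, offered as my own illustration of the problem, not one of the tests proposed in the talk:

```python
import numpy as np
from scipy.stats import norm

def normal_quantile_test(x1, x2, p):
    # Wald-type two-sample comparison of the p-th quantiles of two
    # normal populations: qhat = xbar + z_p * s, with large-sample
    # variance s^2/n + z_p^2 * s^2 / (2(n-1)) under normality
    z = norm.ppf(p)
    def est(x):
        n, m, s = len(x), np.mean(x), np.std(x, ddof=1)
        return m + z * s, s**2 / n + z**2 * s**2 / (2 * (n - 1))
    q1, v1 = est(x1)
    q2, v2 = est(x2)
    stat = (q1 - q2) / np.sqrt(v1 + v2)
    return stat, 2 * norm.sf(abs(stat))
```

With p = 0.5 the quantile reduces to the mean, and the comparison becomes a Behrens-Fisher-type test of means with unknown, unequal variances.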

Characterizing Interlocus Gene Conversion in Segmentally-Duplicated Regions of Primate Genomes
Yufei Zou
Department of Mathematics
Tulane University
New Orleans, Louisiana

Interlocus gene conversion (IGC) is a mutation process that homogenizes repeated sequences in genomes. While previous work has illuminated the role of IGC in the evolution of multigene families following whole-genome duplication events, its impact on segmentally-duplicated regions, particularly in primates, remains largely unexplored. To address this, we apply a composite likelihood approach that accounts for the spatial correlations induced when multiple mutations experience the same IGC event and are copied simultaneously from the donor paralog to the equivalent region of the recipient paralog. We use the method to estimate the IGC initiation rate and average tract length in the evolution of several primate species. This work represents an early but promising step toward quantifying IGC’s role in primate genome evolution beyond whole genome duplication events.

An Association Test Based on Kernel-Based Neural Networks for Complex Genetic Association Analysis
Tingting Hou
Department of Experimental Statistics
Louisiana State University
Baton Rouge, Louisiana

The advent of artificial intelligence, especially the progress of deep neural networks, is expected to revolutionize genetic research, offering unprecedented potential to decode the complex relationships between genetic variants and disease phenotypes and marking a significant step toward improving our understanding of disease etiology. While deep neural networks hold great promise for genetic association analysis, limited research has focused on developing neural network-based tests to dissect complex genotype-phenotype associations. This difficulty arises from the opaque nature of neural networks and the absence of defined limiting distributions. We have previously developed a kernel-based neural network model (KNN) that synergizes the strengths of linear mixed models (LMM) with conventional neural networks. KNN adopts a computationally efficient minimum norm quadratic unbiased estimation (MINQUE) algorithm and uses a kernel-based neural network structure to capture the complex relationship between large-scale sequencing data and a disease phenotype of interest. Within the KNN framework, we introduce a MINQUE-based test to assess the joint association of genetic variants with the phenotype; the test accommodates non-linear and non-additive effects and follows a mixture of chi-square distributions. We also construct two additional tests to evaluate and interpret linear and non-linear/non-additive genetic effects, including interaction effects. Our simulations show that our method consistently controls the type I error rate under various conditions and achieves greater power than the commonly used sequence kernel association test (SKAT), especially in the presence of non-linear and interaction effects. When applied to real genetic data from the UK Biobank, our approach identified genes associated with hippocampal volume, which can be further replicated and evaluated for their role in the pathogenesis of Alzheimer's disease.
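The "mixture of chi-square distributions" null referenced above is standard for variance-component score tests. As background, here is a generic SKAT-style kernel score test with a Satterthwaite-matched null; this is a common textbook construction, not the MINQUE-based tests of the talk:

```python
import numpy as np
from scipy.stats import chi2

def kernel_score_test(y, X, K):
    # Variance-component score test: Q = r' K r with r the null-model
    # residuals. Under H0, Q follows a weighted sum of chi2_1 variables;
    # here that mixture is approximated by a scaled chi-square matched
    # to its mean and variance (Satterthwaite), a common shortcut.
    H = X @ np.linalg.solve(X.T @ X, X.T)
    P = np.eye(len(y)) - H                  # residual projection
    r = P @ y
    sigma2 = (r @ r) / (len(y) - X.shape[1])
    Q = r @ K @ r / sigma2
    PK = P @ K
    mean = np.trace(PK)
    var = 2.0 * np.trace(PK @ PK)
    scale, df = var / (2.0 * mean), 2.0 * mean**2 / var
    return Q, chi2.sf(Q / scale, df)
```

With a linear kernel K = G G' this is the usual burden-style SKAT setup; the talk's tests replace the fixed kernel with the kernel-based neural network structure and MINQUE estimation.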

A Joint Modeling Approach for Censored Predictors in Generalized Linear Models Due to Detection Limits, with Applications to Metabolite Data
Fengxue Li
Department of Biostatistics and Data Science
Celia Scott Weatherhead School of Public Health & Tropical Medicine at Tulane University
New Orleans, Louisiana

Censoring due to limits of detection is a frequent issue in biomarker research, where measured values often fall below assay thresholds. When censored biomarkers are used as predictors to study associations with health outcomes, common practices such as replacing censored values with the detection limit or deleting them outright often yield biased and/or inefficient estimates. The issue becomes more complex when left-censored values arise either from non-exposed individuals (true zeros) or from exposed individuals with undetectable levels, resulting in a mixed population that complicates statistical inference. In such cases, the mixed population must be disentangled for valid inference, as the two subpopulations may exhibit different relationships with health outcomes. In this talk, I will propose a joint modeling approach that addresses both the censoring issue and the mixed-population issue within the framework of generalized linear models (GLMs). We conduct extensive simulation studies to assess the performance of the proposed approach and apply it to examine the associations between plasma metabolites and hypertension in the Bogalusa Heart Study.
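The bias from the substitution practice is easy to exhibit in a few lines (a deliberately simple simulation of one censored predictor, not the proposed joint model; all settings are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
x = rng.normal(0.0, 1.0, n)                  # true biomarker level
y = 1.0 + 0.5 * x + rng.normal(0.0, 0.5, n)  # outcome model, true slope 0.5

lod = 0.0                                    # detection limit censors ~half the values
x_obs = np.where(x < lod, lod, x)            # substitution: report the LOD itself

slope_full = np.polyfit(x, y, 1)[0]          # what we would get without censoring
slope_sub = np.polyfit(x_obs, y, 1)[0]       # naive substitution estimate
```

Here substitution inflates the slope (roughly 0.73 in expectation versus the true 0.5), because compressing all sub-LOD values to a single point distorts the predictor's variance; a likelihood-based approach instead integrates over the censored region, and the joint model further separates true zeros from undetectably low values.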

Testing Multivariate Normality of a Large Number of Populations
Nurudeen Ajadi
Department of Mathematics
University of Louisiana at Lafayette
Lafayette, Louisiana

We provide a new approach to testing multivariate normality for K independent samples simultaneously. The number of samples, K, is allowed to be arbitrarily large and independent of the sample sizes. The proposed test is based on the energy test for multivariate normality of Szekely and Rizzo (2005). Asymptotic normality of the test statistic is established so that the test can be performed as K tends to infinity, and the test is shown to be consistent against all fixed non-normal alternatives. Simulation studies are conducted to examine the performance of the new K-sample test.
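For reference, the one-sample energy statistic has the form n(2 * E||x_i - Z|| - E||Z - Z'|| - avg||x_i - x_j||) for standardized data, with Z standard d-variate normal. The sketch below evaluates it with the cross term approximated by Monte Carlo rather than the exact closed-form series used by Szekely and Rizzo:

```python
import numpy as np
from math import gamma

def energy_normality_stat(x, rng, mc=5000):
    # One-sample energy statistic for d-variate standard normality;
    # large values indicate departure from N(0, I). The term
    # E||x_i - Z|| is Monte Carlo approximated here (the exact test
    # uses a closed-form series instead).
    n, d = x.shape
    z = rng.standard_normal((mc, d))
    term1 = np.mean(np.linalg.norm(x[:, None, :] - z[None, :, :], axis=2))
    term2 = 2.0 * gamma((d + 1) / 2) / gamma(d / 2)   # E||Z - Z'||
    term3 = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2).sum() / n**2
    return n * (2.0 * term1 - term2 - term3)
```

The K-sample extension of the talk aggregates such statistics across many samples and standardizes the sum, which is where the asymptotic normality in K enters.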

Bayesian Analyses and Design of Aggregated Group Sequential N-of-1 Clinical Trials
Md Abdullah Al-Mamun
Department of Biostatistics and Data Science
Louisiana State University Health Science Center
New Orleans, Louisiana

N-of-1 trials offer a personalized approach to clinical research, allowing for the evaluation of individualized treatments through repeated crossover designs. While aggregating multiple N-of-1 trials increases statistical power, existing methods often fail to account for treatment heterogeneity across individuals. Current Bayesian approaches primarily focus on hierarchical models, which assume a common distribution of treatment effects and may overlook distinct patient subgroups. To address heterogeneity among patient subgroups, we propose a Bayesian mixed modeling approach in N-of-1 trials that identifies subgroups of patients with similar treatment responses while allowing for individual variation. Our clustering approach dynamically groups patients based on treatment effects, while the mixed approach integrates hierarchical and clustering structures to enhance flexibility. We implement adaptive Markov Chain Monte Carlo methods, including Metropolis-Hastings and Gibbs sampling, for efficient posterior inference. To validate our methods, we propose to conduct extensive simulation studies under varying treatment effect scenarios. The hypothesis is that compared to the hierarchical method, the clustering and mixed approaches can improve the estimation of treatment effects by accurately detecting patient subgroups with similar responses. Adjusting grouping thresholds affects clustering accuracy, with the mixed approach consistently achieving the best balance between reducing bias and identifying subgroups. By addressing limitations in existing Bayesian N-of-1 trial models, this research can advance statistical methodologies for personalized treatment evaluation and provide a scalable framework for precision medicine.
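The hierarchical baseline that the clustering and mixed approaches extend can be written as d_i ~ N(delta_i, s^2), delta_i ~ N(mu, tau^2), with d_i a per-trial effect estimate. Below is a minimal Gibbs sampler for that baseline only, my own sketch with conjugate updates and a flat prior on mu; the proposed models add subgroup clustering on top of this:

```python
import numpy as np

def gibbs_hierarchical(d, s2, iters=3000, burn=500, seed=0):
    # d: per-patient N-of-1 effect estimates; s2: their known variance.
    # Model: d_i ~ N(delta_i, s2), delta_i ~ N(mu, tau2),
    # flat prior on mu, InvGamma(1, 1) prior on tau2.
    rng = np.random.default_rng(seed)
    I = len(d)
    mu, tau2 = d.mean(), d.var()
    mus = []
    for it in range(iters):
        # conjugate update for each individual effect delta_i
        prec = 1.0 / s2 + 1.0 / tau2
        mean = (d / s2 + mu / tau2) / prec
        delta = mean + rng.standard_normal(I) / np.sqrt(prec)
        # flat-prior update for the population mean mu
        mu = rng.normal(delta.mean(), np.sqrt(tau2 / I))
        # inverse-gamma update for the between-patient variance tau2
        a = 1.0 + I / 2.0
        b = 1.0 + 0.5 * np.sum((delta - mu) ** 2)
        tau2 = 1.0 / rng.gamma(a, 1.0 / b)
        if it >= burn:
            mus.append(mu)
    return np.mean(mus), np.std(mus)
```

The clustering approach replaces the single N(mu, tau^2) population with a mixture over patient subgroups, so each Gibbs sweep also updates subgroup memberships.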

Multivariate Modeling Techniques for Estimating Assistance Impacts from Daily Food Security Surveys
John Argentino
Tulane University
New Orleans, Louisiana

Data collection for food security monitoring in both developed and developing countries is increasingly conducted via phone-based surveys. In particular, the World Food Programme partners with polling agencies to contact a daily random selection of households in countries of interest. Respondents are asked a variety of questions related to demographics, diet, and other indicators of food insecurity, poverty, and household stability. While the standard application of these surveys is simply to track long-term trends in food security, the richness of the data also invites the development of multivariate models to assess the impacts of interventions and mitigation strategies. Conventional regression methods assume temporally independent residual noise on the grounds that the daily samples are drawn independently, but this ignores external factors that induce autocorrelation. Adding a latent time series to the model enables such temporal correlation to be estimated and extracted, thereby improving estimation of the model parameters. We demonstrate the utility of this approach in the analysis of survey data collected between 2020 and 2022.
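The payoff of modeling the autocorrelation can be seen even with the simplest latent-noise treatment: AR(1) regression errors fit by iterated Cochrane-Orcutt. This is a toy stand-in for the latent time-series model described above; the intervention indicator and all settings below are invented for illustration.

```python
import numpy as np

def cochrane_orcutt(X, y, iters=25):
    # regression with AR(1) errors: estimate rho from the residuals,
    # quasi-difference both sides, refit, and iterate
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rho = 0.0
    for _ in range(iters):
        e = y - X @ beta
        rho = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])
        beta = np.linalg.lstsq(X[1:] - rho * X[:-1],
                               y[1:] - rho * y[:-1], rcond=None)[0]
    return beta, rho
```

The quasi-differencing step removes the estimated autocorrelation before refitting, which is the same idea, in miniature, as extracting a latent series so that the intervention coefficient and its uncertainty are estimated from approximately independent noise.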