Louisiana ASA Chapter Fall 2022 Meeting - Titles and Abstracts

18 November 2022

Titles and Abstracts (in order of presentation)

  1. Joint modeling approaches for censored predictors due to detection limits
    Hua He
    Department of Biostatistics
    Tulane University School of Public Health & Tropical Medicine
    Abstract: Measurements of substance concentrations in urine, serum, or other biological matrices are often subject to an assay limit of detection. When concentrations fall below this limit, exact values cannot be obtained and are therefore left-censored. When censored data are used as predictors, common practices for handling them, such as replacing censored values with a constant or deleting them, often yield biased and/or inefficient estimates. The problem becomes more challenging when the censored data come from a heterogeneous population consisting of exposed and non-exposed subpopulations. When censored values come from non-exposed subjects, their measurements are not observed and thus form a latent class arising from a censoring mechanism different from that of exposed subjects. Because exposed and non-exposed subjects often differ in disease traits or in their relationships with the outcomes of interest, the two subpopulations must be disentangled for valid inference. However, there are no statistical methods available to address these issues when the censored data come from either an exposed population alone or a mixture of exposed and non-exposed populations. In this paper, we aim to fill this methodological gap by proposing joint modeling approaches for both the outcome and the censored predictor. Simulation studies are performed to assess the numerical performance of the new approaches with small to moderate sample sizes. Real data examples are also used to demonstrate the application of the methods.
  2. Cluster detection of regression coefficients for spatial data
    Junho Lee
    Department of Experimental Statistics
    Louisiana State University
    Abstract: Spatial cluster detection is an important problem in many scientific disciplines, such as environmental science, epidemiology, and sociology. The scan statistic and its variants have been popular approaches for finding geographical hot spots over the last three decades, and most of them are defined in terms of the responses. However, in regression analysis of spatial data, identifying clusters of spatial units with respect to a regression coefficient could provide insight into a distinct relationship between the response and covariates in certain subregions relative to the rest of the domain. Here, I introduce my recent work addressing the problem of detecting clusters of regression coefficients in spatial data. The proposed methods detect a potential cluster of regression coefficients through hypothesis testing against the null hypothesis that the regression coefficients are the same over the entire spatial domain. For illustration, the proposed methods are applied to a cancer mortality dataset and an air quality dataset.
  3. A New SVM-Based Promotion Time Cure Model
    Suvra Pal
    Department of Mathematics
    University of Texas at Arlington
    Abstract: In this talk, I will present a new promotion time cure model (PCM) that uses the support vector machine (SVM) to model the incidence part. The proposed model inherits the features of the SVM and provides flexibility in capturing non-linearity in the data. Furthermore, the new model can incorporate potentially high-dimensional covariates. For estimating the model parameters, I will discuss the steps of an expectation-maximization (EM) algorithm that uses the sequential minimal optimization technique together with the Platt scaling method. Next, I will present the results of a detailed simulation study and show that the proposed model outperforms the existing logistic regression-based PCM, particularly when the true classification boundary is non-linear. I will also show that the proposed model's ability to capture complex classification boundaries can improve the estimation results for the latency part. Finally, I will analyze data from a leukemia study and show that the proposed model achieves improved predictive accuracy.
  4. Hypothesis Testing for Two Sample Comparison of Network Data
    Han Feng
    Tulane Research Innovation for Arrhythmia Discovery
    Tulane University School of Medicine
    Abstract: Networks are a major type of object data with broad applications in various research fields, such as neuroimaging. Such data contain numeric, topological, and geometrical information and, given their unique statistical properties, may be appropriately parameterized in a non-Euclidean space. However, the development of statistical methodology for network data is challenging and still in its infancy; for instance, non-Euclidean counterparts of basic two-sample tests for network data are scarce in the literature. This study presents a novel framework, NEPTUNE, for comparing two independent samples of networks. Specifically, we propose an approximation of the quotient Euclidean distance and combine it with the network spectral distance to quantify both local and global dissimilarity between networks. PERMANOVA is then used to test the distributional equality of two independent groups of networks characterized by the proposed non-Euclidean distance. Comprehensive simulation studies and real applications demonstrate that our method outperforms alternative approaches. Asymptotic properties of the proposed test are investigated, and its high-dimensional extension is explored as well.
  5. Smooth coalescent prior for scalable Bayesian phylogenetic demographic inference
    Yuwei (Wei) Bao
    Department of Mathematics
    Tulane University
    Abstract: Coalescent-based inference methods are essential for estimating population genetic parameters directly from gene sequence data under a variety of scenarios. Over the last two decades, several nonparametric extensions of the coalescent model have been developed to treat demographic changes more flexibly. The Bayesian Skygrid model, currently the most popular nonparametric coalescent model, discretizes continuous changes in effective population size over an array of predefined time epochs. Within each epoch, the effective population size is constant and represented by a single parameter. Consequently, the change points between epochs introduce discontinuities with respect to time, which cause difficulties for samplers based on the numerical integration of dynamics, such as Hamiltonian Monte Carlo. In this talk, we review the original Skygrid coalescent prior, demonstrate these discontinuities, and present our preliminary ideas for resolving them with a new smoothed version of the Skygrid coalescent prior.
  6. High-Dimensional Multivariate Linear Regression with Weighted Nuclear Norm Regularization
    Li-Hsiang Lin
    Department of Experimental Statistics
    Louisiana State University
    Abstract: We consider a low-rank matrix estimation problem in which the data are assumed to be generated from a multivariate linear regression model. To induce a low-rank coefficient matrix, we employ the weighted nuclear norm (WNN) penalty, defined as the weighted sum of the singular values of the matrix. The weights are set in non-decreasing order, which makes the WNN objective function non-convex in the parameter space. Such objective functions have been used in many applications, but studies of the estimation properties of the resulting estimator are limited. We propose an efficient algorithm under the framework of the alternating direction method of multipliers (ADMM) to estimate the coefficient matrix. The estimator from the proposed algorithm converges to a stationary point of an augmented Lagrangian function. Under the orthogonal design setting, we derive the effects of the weights on estimating the singular values of the ground-truth coefficient matrix. Under the Gaussian design setting, we derive a minimax convergence rate for the estimation error. We also propose a generalized cross-validation (GCV) criterion for selecting the tuning parameter and an iterative algorithm for updating the weights. Simulations and a real data analysis demonstrate the competitive performance of the new method. (This talk is based on joint work with Dr. Namjoon Suh and Dr. Xiaoming Huo of the Georgia Institute of Technology.)
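Talk 4 uses PERMANOVA to test whether two groups of networks share the same distribution under a chosen distance. As a rough, hypothetical sketch (not the NEPTUNE implementation), the Python code below computes the standard PERMANOVA pseudo-F statistic from any precomputed distance matrix and obtains a permutation p-value by shuffling group labels. The function names `pseudo_f` and `permanova`, the permutation count, the seed, and the toy data are illustrative assumptions; in an actual network analysis the matrix `D` would hold whatever network distance is being studied.

```python
import numpy as np


def pseudo_f(D, labels):
    """PERMANOVA pseudo-F statistic computed from a square distance matrix D."""
    D2 = np.asarray(D, dtype=float) ** 2
    labels = np.asarray(labels)
    n = len(labels)
    groups = np.unique(labels)
    a = len(groups)
    # Total sum of squares: (1/N) * sum of squared distances over pairs i < j.
    ss_total = D2[np.triu_indices(n, k=1)].sum() / n
    # Within-group sum of squares: for each group g, (1/n_g) * sum over its pairs.
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        sub = D2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permanova(D, labels, n_perm=999, seed=0):
    """Observed pseudo-F and a permutation p-value obtained by shuffling labels."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    f_obs = pseudo_f(D, labels)
    # Count permutations whose statistic is at least as large as the observed one.
    hits = sum(pseudo_f(D, rng.permutation(labels)) >= f_obs for _ in range(n_perm))
    # Add-one correction so the p-value is never exactly zero.
    return f_obs, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Stand-in data: 1-D points with Euclidean distance, not network distances.
    x = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2, 5.3, 5.4])
    labels = ["A"] * 5 + ["B"] * 5
    D = np.abs(x[:, None] - x[None, :])
    f, p = permanova(D, labels)
    print(f"pseudo-F = {f:.1f}, permutation p-value = {p:.3f}")
```

With Euclidean distances on one-dimensional data, the pseudo-F coincides with the classical one-way ANOVA F statistic (2500 for the toy data here), which is a convenient sanity check before substituting a network distance.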
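Talk 6 defines the WNN penalty as a weighted sum of singular values with non-decreasing weights. The Python sketch below, which is not the speakers' ADMM algorithm, computes that penalty and applies weighted singular value soft-thresholding, a shrinkage step of the kind commonly used inside proximal or ADMM updates for nuclear-norm-type penalties. The helper names, the tuning parameter `lam`, and the choice of this particular update are illustrative assumptions; the GCV criterion and the weight-updating scheme from the talk are not reproduced.

```python
import numpy as np


def weighted_nuclear_norm(B, weights):
    """Weighted sum of the singular values of B.

    np.linalg.svd returns singular values in non-increasing order, so
    weights[k] multiplies the (k+1)-th largest singular value.
    """
    s = np.linalg.svd(B, compute_uv=False)
    w = np.asarray(weights, dtype=float)
    if w.shape != s.shape:
        raise ValueError("need exactly one weight per singular value")
    if np.any(np.diff(w) < 0):
        raise ValueError("weights must be non-decreasing")
    return float(w @ s)


def weighted_svt(Y, weights, lam=1.0):
    """Weighted singular value soft-thresholding (hypothetical helper).

    Shrinks the k-th largest singular value of Y by lam * weights[k] and
    truncates at zero. With non-decreasing weights, the smallest singular
    values are shrunk the most, which pushes the result toward low rank.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - lam * np.asarray(weights, dtype=float), 0.0)
    # U * s_shrunk scales column k of U by s_shrunk[k].
    return (U * s_shrunk) @ Vt


if __name__ == "__main__":
    B = np.diag([3.0, 2.0, 0.5])
    w = [0.1, 0.5, 1.0]  # non-decreasing, as described in the abstract
    print(weighted_nuclear_norm(B, w))  # 0.1*3 + 0.5*2 + 1.0*0.5, about 1.8
    print(np.linalg.svd(weighted_svt(B, w), compute_uv=False))
```

In this toy case the thresholded singular values are 2.9, 1.5, and 0, so the rank drops from 3 to 2: the smallest singular value receives the largest weight and is the one set to zero.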