2015 Roeling Conference Abstracts

Titles and Abstracts as of 7 November 2015

  • Comparison of Drug Dissolution Profiles: A Proposal Based on Tolerance Limits
    Thomas Mathew
    Department of Mathematics and Statistics, University of Maryland Baltimore County, Baltimore, MD

    Comparison of the dissolution profiles of the reference and test formulations of a drug is critical for assessing similarity between the two formulations. A dissolution profile is a vector consisting of the percentage of the active drug ingredient dissolved (based on one dosage unit) at multiple pre-specified time points. Dissolution profile comparison is required by regulatory authorities, and the criteria used for this include the widely used difference factor f1 and similarity factor f2 recommended by the FDA. The former is a function of the average absolute difference between the dissolution profile vectors, and the latter is a function of the average squared differences. In spite of their extensive use in practice, the two factors have been heavily criticized on various grounds; the criticisms include ignoring sampling variability and ignoring the correlations across time points when the criteria are applied. We aim to put f1 and f2 on a firm statistical footing by developing tolerance limits for the distributions of f1 and f2, so that both the sampling variability and the correlations over time points are taken into account. Both parametric and nonparametric approaches will be discussed, and a bootstrap calibration will be used to improve accuracy. The proposed methodology will be illustrated using the analysis of real dissolution data.
    This work is joint with Shuyan Zhai and Yi Huang.
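
    As a point of reference for the two factors discussed above, the sketch below (Python, with hypothetical mean profiles) computes the FDA difference factor f1 and similarity factor f2 from reference and test dissolution profiles; the tolerance-limit methodology proposed in the talk is not reproduced here.

        import numpy as np

        def f1_f2(ref, test):
            """FDA difference factor f1 and similarity factor f2 computed from
            mean dissolution profiles (percent dissolved at n common time points)."""
            ref, test = np.asarray(ref, float), np.asarray(test, float)
            diff = ref - test
            f1 = 100.0 * np.sum(np.abs(diff)) / np.sum(ref)                   # average absolute difference
            f2 = 50.0 * np.log10(100.0 / np.sqrt(1.0 + np.mean(diff ** 2)))   # average squared difference
            return f1, f2

        # Hypothetical mean profiles (% dissolved) at 15, 30, 45, and 60 minutes
        ref  = [38.0, 62.0, 79.0, 91.0]
        test = [42.0, 66.0, 82.0, 93.0]
        print(f1_f2(ref, test))   # f2 > 50 is the conventional similarity criterion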

  • Network analysis using protein-protein interaction data with results from Genome-wide Association Studies in Whites and African-Americans identifies common genes underlying late-onset Alzheimer's Disease
    Shubhabrata Mukherjee
    Department of Medicine, University of Washington, Seattle, WA

    Background: Recent genome-wide association studies (GWAS) have identified late-onset Alzheimer's disease (LOAD) susceptibility loci in Whites and African-Americans. Except for genes on chromosome 19, these studies have failed to discover genes associated with LOAD common to both Whites and African-Americans. We performed a network analysis incorporating human protein-protein interaction data mined from 12 different databases with summary GWAS results for LOAD to identify candidate genes or networks of interacting genes associated with LOAD in these populations.
    Methods: We performed separate gene-wide analyses using summary data from GWAS of LOAD obtained from the International Genomics of Alzheimer’s Project for Whites and the Alzheimer’s Disease Genetics Consortium (ADGC) for African-Americans. For each population, we combined gene-wide association results with human protein-protein interaction data using a dense module searching (DMS) method to identify candidate LOAD sub-networks. This approach identifies networks of interacting genes enriched with low p-values by searching the entire interactome and exhaustively examining the combined effect of multiple genes.
    Results: The network analysis identified several of the known LOAD risk loci, as well as other genes such as UBC and ALB, as strongly associated with LOAD in both populations.
    Conclusions: We were able to identify a significant module of interacting candidate genes, including some well-studied genes not detected in the single-marker analysis of GWAS for LOAD common to both Whites and African-Americans. This approach provides complementary data to a GWAS of a complex disease phenotype by incorporating biological knowledge derived from protein-protein interactions. Further functional enrichment analysis is needed to determine whether these novel loci may provide targets for interventions to ameliorate LOAD.
    This work is joint with David W. Fardo, Thomas J. Montine, and Paul K. Crane.

  • Multiple Comparison Procedures for Binomial Distributions
    Jie Peng
    Department of Finance, Economics and Decision Science, St. Ambrose University, Davenport, Iowa

    Multiple tests for comparing several binomial proportions are compared with respect to type I error rates and powers. For testing equality of several proportions, we apply the Simes (1986, Biometrika) approach and compare it with the usual chi-square test. Our comparison study indicates that the Simes method performs slightly better than the chi-square test in controlling type I error rates. For multiple comparison procedures, we compare the Bonferroni method, the Marascuilo approach, a Tukey-type approach, and the Holm method. Our numerical studies indicate that the Holm method and the Bonferroni method perform similarly with respect to type I error rates, and the former is slightly more powerful than the latter in some cases. Between the Marascuilo method and the Tukey-type approach, the former is overly conservative while the latter is too liberal in some cases. Overall, Holm's approach and the Bonferroni approach are preferable for multiple comparison purposes. The methods are illustrated using an example.
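
    A minimal sketch of the two global tests compared above is given below (Python, hypothetical counts). Here the Simes combination is applied to the p-values of all pairwise two-proportion z-tests, which is one plausible implementation rather than necessarily the one used in the talk.

        import numpy as np
        from itertools import combinations
        from scipy import stats

        def simes_global_p(pvals):
            """Simes (1986) combined p-value: min over i of k * p_(i) / i."""
            p = np.sort(np.asarray(pvals))
            k = len(p)
            return np.min(k * p / np.arange(1, k + 1))

        def pairwise_prop_pvals(x, n):
            """Two-sided z-test p-values for all pairwise comparisons of proportions."""
            pv = []
            for i, j in combinations(range(len(x)), 2):
                p_pool = (x[i] + x[j]) / (n[i] + n[j])
                se = np.sqrt(p_pool * (1 - p_pool) * (1 / n[i] + 1 / n[j]))
                z = (x[i] / n[i] - x[j] / n[j]) / se
                pv.append(2 * stats.norm.sf(abs(z)))
            return pv

        x = np.array([18, 25, 34])      # hypothetical successes in three groups
        n = np.array([100, 100, 100])   # group sizes

        # Usual chi-square test of homogeneity of the three proportions
        table = np.vstack([x, n - x])
        chi2, p_chi2, *_ = stats.chi2_contingency(table)

        # Simes combination of the pairwise p-values as a global test
        p_simes = simes_global_p(pairwise_prop_pvals(x, n))
        print(p_chi2, p_simes)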

  • Controlling for Systematic Bias in Allelic Imbalance Estimation Using a Negative Binomial Bayesian Model
    Luis Leon Novelo
    University of Texas School of Public Health-Health Science Center at Houston

    We say that a gene is in allelic imbalance (AI) when the two alleles of this gene have different levels of expression. We propose a Bayesian model to detect genes in AI using RNA next-generation sequencing data. The experiment consists of crossing two near-isogenic lines of Drosophila melanogaster: females from a naturally derived genotype (line) are crossed to W1118 male flies (tester). The F1 heterozygous female offspring are separated by sex immediately after eclosion, and half of the flies are kept virgin while the other half are mated to W1118. Three independent replicates were conducted for 69 lines. For every gene, the number of reads aligning to the tester allele, to the line allele, and to both alleles is recorded. The gene is in AI if the proportion of reads produced by the tester is different from the proportion of reads produced by the line. The AI can be present in mated only, virgin only, or both mated and virgin flies. The challenges are: (i) the sample proportions of reads aligning to the tester (line) are not necessarily unbiased estimates of the proportions of reads produced by the tester (line) due to potential systematic bias, (ii) the amount of information (coverage) can be very different between the mated and virgin environments, and (iii) RNA next-generation sequencing data are usually overdispersed (with respect to the Poisson distribution, i.e., the variance is greater than the mean). We address (i) by using simulation to estimate the degree of systematic bias and incorporating this information into the model, (ii) by including the information that the reads aligning to both alleles provide to adjust for the difference in coverage between mated and virgin flies, and (iii) by using a negative binomial sampling distribution to address overdispersion. Credible intervals for the proportion of reads coming from the tester are then estimated. A gene is flagged as in AI based on these credible intervals.
    This is joint work with Lauren McIntyre, Alison Gerken, Alison Morse, Justin Fear, and Sergey Nuzhdin.
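
    Challenge (iii) above, overdispersion relative to the Poisson, is easy to illustrate; the short sketch below (Python, synthetic counts) draws read counts from a negative binomial and shows the variance exceeding the mean, which is the feature the negative binomial sampling distribution in the model is meant to capture. The Bayesian model itself is not reproduced here.

        import numpy as np

        rng = np.random.default_rng(1)

        mu, size = 50.0, 5.0                 # mean and negative binomial dispersion ("size") parameter
        p = size / (size + mu)               # numpy's parameterization of the negative binomial
        counts = rng.negative_binomial(size, p, 10_000)

        # Under a Poisson model the variance equals the mean; here it is much larger.
        print(counts.mean(), counts.var())   # mean near 50, variance near mu + mu**2/size = 550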

  • Varying-Coefficient Single-Index Signal Regression
    Brian D. Marx
    Department of Experimental Statistics, Louisiana State University, Baton Rouge, LA

    The Penalized Signal Regression (PSR) approach to Multivariate Calibration (MVC) assumes a smooth vector of coefficients for weighting a signal or spectrum to predict the unknown concentration of a chemical component. P-splines (i.e., B-splines with roughness penalties based on differences) are used to estimate the coefficients. In this paper we allow the PSR coefficient vector to vary smoothly along a covariate (e.g. temperature), which results in a smooth surface on the wavelength-temperature domain. Estimation is performed using two-dimensional tensor product P-splines. As such, a slice of this surface effectively estimates the vector of coefficients at any arbitrary temperature. As an added generalization, we further relax the implicit assumption of an identity link function by allowing an unknown, but explicit, link function between the linear predictor and the response. Again, we allow the signal's link function to vary smoothly along a covariate, which produces a two-dimensional link surface. The unknown link surface is also estimated using two-dimensional P-splines and is sliced at the same arbitrary temperature to bend the prediction. Typically we use a common covariate (e.g. temperature) to vary the associated link function, as with the signal coefficients, but nothing prohibits the use of two different ones. We term our method varying single-index signal regression (VSISR). The methods presented are grounded in penalized regression, where difference penalties are placed on the rows and columns of the tensor product coefficients. Each row and column of each surface has its own tuning parameter. An application to ternary mixture data illustrates that both the varying-coefficient and varying-nonlinearity (due to the link) features are present. External prediction performance comparisons are made to both the identity-link varying-coefficient penalized signal regression (VPSR) and partial least squares (PLS).
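
    The one-dimensional building block of this approach, penalized B-spline (P-spline) regression with a difference penalty, can be sketched as follows (Python, synthetic signal; the knot placement and tuning parameter are illustrative assumptions). The talk's tensor product surfaces extend this construction to two dimensions.

        import numpy as np
        from scipy.interpolate import BSpline

        def bspline_basis(x, knots, degree=3):
            """Evaluate a B-spline basis (one column per basis function) at the points x."""
            n_basis = len(knots) - degree - 1
            B = np.empty((len(x), n_basis))
            for j in range(n_basis):
                coef = np.zeros(n_basis)
                coef[j] = 1.0
                B[:, j] = BSpline(knots, coef, degree)(x)
            return B

        rng = np.random.default_rng(0)
        x = np.linspace(0, 1, 200)                        # e.g. a normalized wavelength axis
        y = np.sin(6 * x) + rng.normal(0, 0.1, x.size)    # hypothetical noisy signal

        # Equally spaced knots with the usual clamped boundary padding for cubic splines
        inner = np.linspace(0, 1, 20)
        knots = np.r_[[0.0] * 3, inner, [1.0] * 3]
        B = bspline_basis(x, knots)

        D = np.diff(np.eye(B.shape[1]), n=2, axis=0)      # second-order difference penalty matrix
        lam = 1.0                                         # roughness-penalty tuning parameter
        beta = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
        fitted = B @ beta                                 # smooth P-spline fit to the signal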

  • Summation-Extendable Survival Functions
    Scott E. Smith
    Department of Mathematical Sciences, University of the Incarnate Word, San Antonio, Texas

    A note on the construction of Archimedean copulae provides the foundation for the classification of a type of survival function extendable through summation of scaled variables. A few examples, including extensions of the Weibull and Lomax distributions studied previously in the literature, and previously unstudied distributions such as a bivariate Gompertz distribution and a multivariate Log-Gamma-Gompertz distribution, are derived from this formulation. For all functions of this type, a common formula for stress-strength reliability, the minimum property, the distributions of order statistics, conditions for multivariate increasing failure rate and failure rate average, and a simulation method are shown.

  • Overlap, distance, and similarity measures
    Madhuri S. Mulekar
    Department of Mathematics & Statistics, University of South Alabama, Mobile, AL

    The idea of similarity is not a new one to scientists. Philosophers and mathematicians have argued over this idea for centuries. The overlap, distance, and similarity between different objects, observations, or distributions can be defined using different measures. These are distinct concepts, yet closely related to each other. As a result, a multitude of measures is available in the literature, applied to varied situations. Applications of such measures are seen from the fine arts and language arts to the social and physical sciences. Although many measures have been developed and applied in practice, the properties of many are yet to be established. This talk will focus on some such measures, their properties, and their usefulness in practice.

  • Survival model selection with missing data and correlated covariates
    Sydeaka P. Watson
    The University of Chicago, Biostatistics Laboratory

    In this talk, I will describe an algorithm used to develop a survival prediction equation for pulmonary arterial hypertension patients awaiting lung transplantation. The transplant registry dataset featured censored survival times, missing covariate data, and a large number of highly correlated candidate predictor variables. We used a novel combination of existing methods to select a subset of the candidate variables which could be used to predict survival probabilities for each patient. In our approach, we repeatedly applied penalized weighted least squares regression in bootstrap resamples of multiply imputed data and selected a parsimonious model that satisfied internal validation criteria of clinical interest. Simulation studies under various degrees of predictor variable missingness, survival time censoring, effect size, and proportion of variables unrelated to survival have shown that this method accurately recovers the true list of Cox regression predictor variables.

  • Graphical EDA for Ball Games in Sports
    Seongbaek Yi and Daeheung Jang
    Pukyong National University, Busan, Korea

    In this paper, graphical exploratory data analyses are proposed for ball games in sports. We show examples using the gold medal match results for men's volleyball and men's basketball at the London 2012 Summer Olympics.

  • Visualizing and Testing the Multivariate Linear Regression Model
    Lasanthi C. R. Pelawa Watagoda
    Department of Mathematics, Southern Illinois University, Carbondale, Illinois

    Recent results make the multivariate linear regression model much easier to use. This model has m >= 2 response variables. Results by Kakizawa (2009) and Su and Cook (2012) can be used to explain the large sample theory of the least squares estimator and of the widely used Wilks' Lambda, Pillai's trace, and Hotelling-Lawley trace test statistics. Kakizawa (2009) shows that these statistics have the same limiting distribution. This work reviews these results and gives two theorems to show that the Hotelling-Lawley test generalizes the usual partial F test for m = 1 response variable to m >= 1 response variables. Plots for visualizing the model are also given, and can be used to check goodness and lack of fit, to check for outliers and influential cases, and to check whether the error distribution is multivariate normal or from some other elliptically contoured distribution.
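
    For readers unfamiliar with the three statistics named above, the sketch below (Python, simulated data) shows how Wilks' Lambda, Pillai's trace, and the Hotelling-Lawley trace are computed from the hypothesis (H) and error (E) sums-of-squares-and-cross-products matrices for a partial test; the dimensions and the contrast are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        n, p, m = 100, 3, 2                                # cases, predictors (besides intercept), responses

        X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
        B_true = rng.normal(size=(p + 1, m))
        Y = X @ B_true + rng.normal(size=(n, m))

        B_hat = np.linalg.lstsq(X, Y, rcond=None)[0]       # multivariate least squares estimator
        resid = Y - X @ B_hat
        E = resid.T @ resid                                # error SSCP matrix

        # Partial test H0: the last predictor has no effect on any response
        L = np.zeros((1, p + 1)); L[0, -1] = 1.0
        M = L @ np.linalg.inv(X.T @ X) @ L.T
        H = (L @ B_hat).T @ np.linalg.solve(M, L @ B_hat)  # hypothesis SSCP matrix

        wilks  = np.linalg.det(E) / np.linalg.det(E + H)
        pillai = np.trace(np.linalg.solve(E + H, H))
        hotelling_lawley = np.trace(np.linalg.solve(E, H))
        print(wilks, pillai, hotelling_lawley)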

  • Bootstrapping Analogs of the Two Sample Hotelling's T^2 Test
    Hasthika S. Rupasinghe Arachchige Don
    Department of Mathematics, Southern Illinois University, Carbondale, Illinois

    Suppose there are two independent random samples from two populations or groups. A common multivariate two sample test of hypotheses is H0 : mu_1 = mu_2 versus H1 : mu_1 not= mu_2 where mu_i is a population location measure of the ith population for i = 1,2. The two sample Hotelling's T^2 test is the classical method, and is a special case of the one way MANOVA model if the two populations are assumed to have the same population covariance matrix. This work suggests using the Olive (2015a) bootstrapping technique to develop analogs of Hotelling's T^2 test. The new tests can have considerable outlier resistance, and the tests do not need the population covariance matrices to be equal.
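
    The sketch below (Python, simulated data) shows the classical two-sample Hotelling's T^2 test together with a generic bootstrap analog based on resampling a (possibly robust) location statistic within each group; it is meant only as an illustration of the general idea and does not reproduce the Olive (2015a) bootstrapping technique.

        import numpy as np
        from scipy import stats

        def hotelling_T2(x, y):
            """Classical two-sample Hotelling's T^2 with pooled covariance, plus its F-based p-value."""
            n1, n2, p = len(x), len(y), x.shape[1]
            d = x.mean(0) - y.mean(0)
            Sp = ((n1 - 1) * np.cov(x, rowvar=False) + (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)
            T2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(Sp, d)
            F = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * T2
            return T2, stats.f.sf(F, p, n1 + n2 - p - 1)

        def bootstrap_location_test(x, y, stat=lambda a: np.median(a, axis=0), B=2000, seed=0):
            """Generic bootstrap analog: resample each group, collect differences of a location
            statistic, and see how extreme the zero vector is relative to the bootstrap cloud."""
            rng = np.random.default_rng(seed)
            diffs = np.array([
                stat(x[rng.integers(0, len(x), len(x))]) - stat(y[rng.integers(0, len(y), len(y))])
                for _ in range(B)
            ])
            center, cov = diffs.mean(0), np.cov(diffs, rowvar=False)
            d0 = np.zeros(x.shape[1]) - center
            md2_zero = d0 @ np.linalg.solve(cov, d0)
            md2_boot = np.einsum('ij,ij->i', (diffs - center) @ np.linalg.inv(cov), diffs - center)
            return np.mean(md2_boot >= md2_zero)      # small values indicate evidence against H0

        rng = np.random.default_rng(1)
        x = rng.normal(0.0, 1.0, (40, 3))
        y = rng.normal(0.5, 1.0, (50, 3))
        print(hotelling_T2(x, y))
        print(bootstrap_location_test(x, y))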

  • Shakin' Things Up: Modeling Extreme Earthquake Activity
    Audrene Edwards
    Lamar University

    The study of extremes has attracted the attention of scientists, engineers, actuaries, policy makers, and statisticians for many years. Extreme Value Theory (EVT) deals with extreme deviations from the median of probability distributions and is used to study rare but extreme events. EVT's main results characterize the distribution of the sample maximum or the distribution of values above a given threshold. In this study, EVT has been used to construct a model of the extreme and rare earthquakes that occurred in the United States from 1700 to 2011. The primary goal of fitting such a model is to estimate the amount of losses due to those extreme events and the probabilities of such events. Several diagnostic methods (for example, the QQ plot and the mean excess plot) have been used to justify that the data set follows a generalized Pareto distribution (GPD). Three estimation techniques have been employed to estimate the parameters. The consistency and reliability of the estimated parameters have been examined for different threshold values. The purpose of this study is manifold. First, we investigate whether the data set follows a GPD, using graphical interpretation and hypothesis testing. Second, we estimate the GPD parameters using three different estimation techniques. Third, we compare the consistency and reliability of the estimated parameters for different threshold values. Last, we investigate the bias of the estimated parameters using a simulation study. The result is particularly useful because it can be used in many applications (for example, disaster management, engineering design, the insurance industry, hydrology, ocean engineering, and traffic management) with a minimal set of assumptions about the true underlying distribution of a data set.
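
    A bare-bones version of the peaks-over-threshold analysis described above can be sketched as follows (Python, with a synthetic stand-in for the earthquake data); the threshold choice, the diagnostic grid, and the return-level probability are illustrative assumptions.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        # Synthetic stand-in for the study's earthquake data (U.S. events, 1700-2011)
        magnitudes = rng.gumbel(loc=4.0, scale=0.6, size=2000)

        u = np.quantile(magnitudes, 0.95)            # candidate threshold
        exceed = magnitudes[magnitudes > u] - u      # excesses over the threshold

        # Maximum-likelihood fit of the generalized Pareto distribution to the excesses
        shape, loc, scale = stats.genpareto.fit(exceed, floc=0.0)

        # Mean excess (mean residual life) at a grid of thresholds, a standard GPD diagnostic
        grid = np.quantile(magnitudes, np.linspace(0.80, 0.99, 20))
        mean_excess = [np.mean(magnitudes[magnitudes > t] - t) for t in grid]   # plotted against grid in practice

        # Return level: the magnitude exceeded with a given small probability p
        p_exceed_u = np.mean(magnitudes > u)
        def return_level(p):
            # assumes a nonzero fitted shape; the shape -> 0 limit uses a logarithmic formula
            return u + scale / shape * ((p_exceed_u / p) ** shape - 1.0)

        print(shape, scale, return_level(1e-3))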

  • Robustness of Inference for One-way and Two-way ANOVA with Correlated Observations
    Avishek Mallick
    Marshall University

    In many experiments several observations are taken over time or with several treatments applied to each subject. These observations tend to be highly correlated, particularly those observed adjacent to each other in time. In this presentation I will talk about the effect of correlations among the observations in one-way and two-way ANOVA. A modification of the standard tests suitable for an AR(1) correlation structure is proposed and its properties are investigated. We also apply the approximations to the distribution of F tests suggested by Andersen, Jensen, and Schou (1981) and carry out the analysis. The modified procedure allows us to have better control of the nominal significance level. Consequently, multiple comparisons and multiple tests based on this modified procedure will lead to conclusions with better accuracy. This is joint work with Dr. Perla Subbaiah.
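
    The basic problem, that serial correlation inflates the type I error of the standard F test, can be seen in a small simulation like the one below (Python, hypothetical settings); the modified procedure proposed in the talk is not reproduced here.

        import numpy as np
        from scipy import stats

        def ar1_noise(n, rho, rng):
            """Generate n observations from a stationary AR(1) process with unit innovation variance."""
            e = rng.normal(size=n)
            x = np.empty(n)
            x[0] = e[0] / np.sqrt(1 - rho ** 2)
            for t in range(1, n):
                x[t] = rho * x[t - 1] + e[t]
            return x

        rng = np.random.default_rng(0)
        g, n, rho, nsim = 4, 20, 0.5, 2000        # hypothetical settings: 4 groups, 20 observations each
        rejections = 0
        for _ in range(nsim):
            groups = [ar1_noise(n, rho, rng) for _ in range(g)]   # null model: all group means equal
            _, p = stats.f_oneway(*groups)
            rejections += p < 0.05

        # With independent observations the rate would be near 0.05; AR(1) correlation inflates it.
        print(rejections / nsim)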

  • Robust analyses of over-dispersed counts with varying follow-up in small samples and rare diseases
    Frank Konietschke
    Department of Mathematical Sciences, The University of Texas at Dallas

    In this talk, we consider inference methods for count data, such as the number of relapses and magnetic resonance imaging (MRI) lesion counts in multiple sclerosis (MS), or exacerbations in chronic obstructive pulmonary disease (COPD). In such clinical trials, the number of exacerbations and the follow-up time are recorded for each patient. Due to the heterogeneity of patients, the number of exacerbations cannot be assumed to follow a Poisson distribution, and over-dispersion has to be taken into account for valid statistical inferences. We derive statistical inference methods for testing null hypotheses as well as for constructing confidence intervals for the underlying treatment effects. For small sample sizes, a studentized permutation approach will be investigated. Extensive simulation studies show that the permutation-based statistics tend to maintain the nominal type I error level or coverage probability very satisfactorily. A real data set illustrates the application of the proposed methods. The project is in cooperation with Professor Tim Friede, University of Göttingen, and Professor Markus Pauly, University of Ulm.
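
    One simple variant of a studentized permutation test for such data is sketched below (Python, synthetic trial with gamma-Poisson over-dispersed counts and varying follow-up); the exact effect measure and test statistic studied in the talk are not reproduced here.

        import numpy as np

        def studentized_stat(rates, labels):
            """Welch-type studentized difference in mean per-patient event rates between two arms."""
            a, b = rates[labels == 0], rates[labels == 1]
            return (a.mean() - b.mean()) / np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))

        rng = np.random.default_rng(0)
        n1, n2 = 15, 15                                     # small arms, as in rare diseases
        follow_up = rng.uniform(0.5, 2.0, n1 + n2)          # varying follow-up times (years)

        # Over-dispersed counts via a gamma-Poisson (negative binomial) mechanism
        frailty = rng.gamma(shape=2.0, scale=0.5, size=n1 + n2)
        rate = np.r_[np.full(n1, 1.0), np.full(n2, 1.0)]    # null case: equal underlying rates
        counts = rng.poisson(rate * frailty * follow_up)

        rates = counts / follow_up
        labels = np.r_[np.zeros(n1, int), np.ones(n2, int)]
        t_obs = studentized_stat(rates, labels)

        B = 5000
        perm_stats = np.array([studentized_stat(rates, rng.permutation(labels)) for _ in range(B)])
        p_value = np.mean(np.abs(perm_stats) >= abs(t_obs))
        print(p_value)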

  • On Analysis of Incomplete Field Failure Data
    Hon Keung Tony Ng
    Department of Statistical Science, Southern Methodist University

    Most commercial products in the marketplace are sold with warranties, and they are sold indirectly through dealers. This results in serious missing data problems in the analysis of field return data because the sales dates for unreturned units are unobserved. Our purpose here is to systematically investigate parametric inference for such data. This study considers a general setting for field failure data with unknown sales dates and a warranty limit. A stochastic expectation-maximization (SEM) algorithm is developed to estimate the distributions of the sales lag (the time between shipment to a retailer and sale to a customer) and the lifetime of the product under study. Extensive simulations are used to evaluate the performance of the SEM algorithm and to compare it with an imputation-based approach. Real examples are used to illustrate the proposed methodology. This work is in collaboration with Z. S. Ye (National University of Singapore).

  • Improving the Efficiency of Randomized Response Techniques by a Two-Stage Model for Binary Responses
    Husam I. Ardah and Evrim Oral
    LSUHSC, School of Public Health, Biostatistics Program

    It is known that respondents, when asked sensitive questions in surveys, tend to over-report socially desirable attitudes and under-report socially undesirable ones, which creates social desirability bias (SDB). Randomized response techniques (RRTs) are one of the solutions developed to elicit better estimates of the prevalence of sensitive behaviors. RRTs reduce the SDB by providing privacy protection for respondents; however, variances from RRTs are inflated with respect to the variances from the direct questioning technique (DQT). Thus, if the sensitive question of interest is not considered really sensitive by most of the respondents, using an RRT instead of the DQT will inflate the variance of the estimates unnecessarily. Motivated by this fact, we propose a two-stage technique in which one can accurately estimate the prevalence of the sensitive characteristic under study without paying the price of the inflated variance. We support our theoretical results with several simulations.
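
    The variance inflation that motivates the proposal can be illustrated with Warner's classical randomized response design (Python sketch below, with a hypothetical prevalence and design probability); the two-stage technique itself is not reproduced here.

        import numpy as np

        def warner_estimate(yes_responses, n, p):
            """Warner (1965) randomized-response estimator of a sensitive prevalence pi.
            Each respondent answers the sensitive question with probability p and its
            complement with probability 1 - p, so P(yes) = p*pi + (1 - p)*(1 - pi)."""
            lam_hat = yes_responses / n
            pi_hat = (lam_hat - (1 - p)) / (2 * p - 1)
            var_hat = lam_hat * (1 - lam_hat) / (n * (2 * p - 1) ** 2)
            return pi_hat, var_hat

        rng = np.random.default_rng(0)
        pi_true, p, n = 0.20, 0.7, 1000            # hypothetical prevalence and design probability

        # Direct questioning (assuming truthful answers): variance pi * (1 - pi) / n
        var_dqt = pi_true * (1 - pi_true) / n

        # Randomized response: simulate the randomization device and the answers
        sensitive = rng.random(n) < pi_true
        ask_sensitive = rng.random(n) < p
        yes = np.sum(np.where(ask_sensitive, sensitive, ~sensitive))
        pi_hat, var_rrt = warner_estimate(yes, n, p)

        print(pi_hat, var_rrt, var_dqt)            # the RRT variance is inflated relative to DQT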

  • Multivariate Inverse Prediction with Mixed Models
    Lynn R. LaMotte
    Biostatistics Program, LSUHSC School of Public Health, New Orleans, Louisiana

    Insects visit a dead body left outdoors. Their characteristics (measurements of size and development and combinations of species) can provide a biological clock useful in estimating the time since death. Given the multivariate measurement y* from a mystery specimen sampled at the scene, the objective is to devise reasonable and defensible statistical methodology to support an estimate of the age of the specimen. For that purpose, training data are available from rearing experiments for the species in question. They comprise independent observations on y at ages spanning the development cycle, under controlled (principally temperature) conditions. Central features of such data are that the y-age relation is not linear and the variance-covariance matrix evolves steeply with age.

    Inverse prediction, also known as calibration, has a reputation of being computationally difficult, particularly with a multivariate response. Methods for heteroscedastic multivariate responses are practically unknown. In the forensic sciences literature, most developments have modeled age as a function of y, in reverse cause-effect order, with multiple regression, and ignored the inconstant variance.

    These relations can be modeled within the context of mixed models, with separate models for the mean vector and the variance-covariance matrix in terms of age and temperature. At each potential age, comparison of y* to the model fit to the training data gives a p-value for the test of y* as a multivariate outlier at that age.
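
    A stripped-down version of this age-by-age outlier test is sketched below (Python, with hypothetical fitted means and covariances); it uses a chi-square reference for the squared Mahalanobis distance and ignores the estimation uncertainty that the mixed-model machinery accounts for.

        import numpy as np
        from scipy import stats

        def outlier_pvalue(y_star, mean_at_age, cov_at_age):
            """Approximate p-value for y* as a multivariate outlier at a candidate age,
            using the squared Mahalanobis distance referred to a chi-square distribution."""
            d = y_star - mean_at_age
            md2 = d @ np.linalg.solve(cov_at_age, d)
            return stats.chi2.sf(md2, df=len(y_star))

        # Hypothetical fitted mean vectors and covariance matrices over a grid of candidate ages
        ages = np.linspace(50, 400, 36)
        means = np.column_stack([2 + 0.02 * ages, 1 + 0.01 * ages])
        covs = [np.diag([0.05 + 0.001 * a, 0.03 + 0.0008 * a]) for a in ages]

        y_star = np.array([6.0, 3.2])                             # mystery specimen's measurements
        pvals = np.array([outlier_pvalue(y_star, m, S) for m, S in zip(means, covs)])
        plausible_ages = ages[pvals > 0.05]                       # ages not rejected at the 5% level
        print(plausible_ages.min(), plausible_ages.max())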

    The methodology and computations of mixed models are well-developed and widely available in standard statistical computing packages. In this talk, I shall illustrate the formulation and implementation of multivariate inverse prediction in terms of mixed models.

    Research reported in this talk was supported by Award 2013-DN-BX-K042, U. S. Department of Justice, National Institute of Justice.

  • Inference in Binary Regression with Stochastic Covariates
    Denise Danos and Evrim Oral
    LSUHSC, School of Public Health, Biostatistics Program

    In a simple generalized linear model (GLM), covariates are traditionally assumed to be non-stochastic. However, in numerous real-life applications, covariates are stochastic in nature. Extending the work of Oral (2006), Sazak et al. (2006) and Islam and Tiku (2010), we develop new estimators and test statistics for a binary GLM with bivariate stochastic covariates. We obtain simulated Type I error rates and power values of asymptotic likelihood ratio tests for various sample sizes. We compare the performance of the asymptotic test to parametric bootstrap tests at small sample sizes.

  • A statistical method for correcting cross-annotations in NGS metagenomic functional profiling
    Zhide Fang
    LSU Health Sciences Center at New Orleans

    Accurate functional profiling is one of the important steps of many metagenomic studies. Profiling approaches based on read counts from next-generation sequencing techniques may suffer from the problem of cross-annotation. We propose a statistical method to address the issue of cross-annotation in the functional profiling of a metagenome. Applications to in vitro-simulated metagenomic samples, samples simulated by a bioinformatic tool, and a real-world data set show that the method is successful in correcting cross-annotations.

  • Approximate small-sample tests of fixed effects in nonlinear mixed models
    Julia Volaufova (1) and Jeffrey Burton (2)
    (1) LSUHSC School of Public Health, New Orleans
    (2) Pennington Biomedical Research Center, LSU, Baton Rouge

    Nonlinear mixed effects models appear most frequently in pharmacokinetic and pharmacodynamic applications. The distributional challenge driven by nonlinearity in the random part leads to several possible approaches to solving the maximum likelihood estimation problem.

    Here we focus on investigating the performance of commonly applied tests of linear hypotheses about the fixed effects parameters under different approximations to the likelihood function and to the estimated covariance matrix of the estimators. Included are the first order approximation (FIRO), first order conditional approximation (FOCE), and adaptive Gaussian quadrature approximation (AGQ) estimation methods. There is no straightforward way to mimic the approximations and adjustments to the estimated covariance matrix taken in linear mixed models, such as the Kackar-Harville-Jeske-Kenward-Roger approach. By simulations, we illustrate the accuracy of p-values for the tests considered here. The observed results indicate that the FOCE and AGQ estimation methods outperform FIRO. The test with an adjustment coefficient that takes into consideration the number of sampling units and the number of fixed effects parameters (Gallant-type) seems to perform closest to the desired level, even for small sample sizes.

    The second possible approach is the two-stage approach for the case when the number of observations per sampling unit is large enough. The approximate F-test is developed based on a normal approximation to the distribution of nonlinear least squares estimates of subject-specific individual parameters, which constitute the response for the second stage. The second-stage model results in a mixed model with covariance matrix dependent on the unknown variance components as well as on the fixed effects population parameters.

    For the two-stage approach, we suggest the use of an approximate F-test based on approximate maximum likelihood estimates of all model parameters. Here we also focus on comparing the performance of approximate tests, the accuracy of p-values, for two types of pharmacokinetic models.

  • Minimum Distance Estimators in Linear Regression with Autoregressive Errors
    Jiwoong Kim
    Michigan State University

    This paper discusses the behavior of Koul's minimum distance estimators of the regression and autoregressive parameters in the linear regression model with symmetric autoregressive errors. Asymptotic distributional properties of these estimators are discussed. A simulation study that compares the performance of some of these minimum distance estimators with the generalized least squares and ordinary least squares estimators is also included. This simulation shows some superiority of the minimum distance estimator over the other estimators.

  • A nonparametric test on the dependence of two continuous variables
    Bin Li
    Louisiana State University

    It is important to test for dependence between two variables. We propose a distribution-free approach to testing the relationship between two continuous variables based on the ranks of the concomitants of order statistics. Using both simulated cases and two microarray datasets, we show that the proposed statistics can identify a wide range of associations, both functional and not, both linear and nonlinear, which could not be detected using conventional correlation measures.

  • Testing Monotonic Renewal Variance Residual Life Under Shock Models
    Mohammad Sepehrifar
    Mississippi State University

    The main objective of this work is to study the lifetime of a random phenomenon through the concept of the variance residual life distribution. In this study we provide a better understanding of the life-behavior of a system whose life is exposed to a random number of shocks governed by a homogeneous Poisson process. This investigation has broad applications in studying the behavior of economic and social science phenomena and many other practical situations. A U-statistic test procedure driven by the Laplace transform approach is introduced to test the hypothesis that the uncertainty of the remaining life of a system remains unchanged, against the alternative that the residual life of a random phenomenon has decreasing renewal uncertainty over time. Tabulated upper percentile values and simulated empirical powers of the proposed test statistic are presented as well.

  • Identification of SNP-SNP Interaction Patterns
    Hui-Yi Lin
    LSUHSC, School of Public Health, Biostatistics Program

    It is now generally recognized that targeting individual single nucleotide polymorphisms (SNPs) is not sufficient to explain the complexity of cancer causality. The predictive power for cancer risk of the SNPs identified in genome-wide association (GWA) studies is limited, with a median per-allele odds ratio of 1.2 based on a recent review. SNP-SNP interactions may be the key to overcoming bottleneck situations in genetic association studies. Although a growing number of studies evaluate SNP-SNP interactions to complement the findings from univariate analyses, statistical methods for detecting SNP-SNP interactions are still under-developed. The objective of this study is to propose a new statistical approach for evaluating 2-way SNP-SNP interactions. For a binary outcome, a logistic model with two main effects and their interaction is commonly used, and each SNP is treated as an additive effect based on the minor allele. However, this conventional approach is not sufficient. Our proposed method is composed of 45 interaction models, which take the SNP reference allele, inheritance mode, and model structure into consideration. The conservative Bonferroni method was applied to adjust for multiple comparisons. The best interaction pattern was selected as the model with the lowest value of the Bayesian information criterion (BIC). The power of SNP-SNP interaction identification was based on the Wald test p-value of the interaction term in the best pattern. A simulation study with different scenarios was conducted to compare our proposed method with the conventional approach. The results show that the new approach has higher power than the conventional approach and can overcome instability of interaction patterns for SNPs with a minor allele frequency close to 0.5. We also applied this new approach to a large-scale prostate cancer consortium data set.
    This work is joint with Dung-Tsa Chen, Po-Yu Huang, Chia-Ho Cheng, and Jong Y. Park.
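
    The general recipe, fitting several candidate codings of a SNP pair, selecting the coding with the lowest BIC, and reading off the Wald p-value of the interaction term, can be sketched as follows (Python with statsmodels, simulated genotypes); the full family of 45 interaction models is not reproduced here.

        import numpy as np
        import statsmodels.api as sm

        def recode(g, mode):
            """Recode minor-allele counts (0/1/2) under a given inheritance mode."""
            if mode == "additive":
                return g.astype(float)
            if mode == "dominant":
                return (g >= 1).astype(float)
            if mode == "recessive":
                return (g == 2).astype(float)
            raise ValueError(mode)

        rng = np.random.default_rng(0)
        n = 2000
        snp1 = rng.binomial(2, 0.30, n)
        snp2 = rng.binomial(2, 0.45, n)

        # Hypothetical truth: a dominant-by-dominant interaction affects case status
        eta = -1.0 + 0.8 * (snp1 >= 1) * (snp2 >= 1)
        y = rng.binomial(1, 1 / (1 + np.exp(-eta)))

        results = []
        for m1 in ("additive", "dominant", "recessive"):
            for m2 in ("additive", "dominant", "recessive"):
                a, b = recode(snp1, m1), recode(snp2, m2)
                X = sm.add_constant(np.column_stack([a, b, a * b]))
                fit = sm.Logit(y, X).fit(disp=False)
                results.append((fit.bic, m1, m2, fit.pvalues[-1]))   # last term is the interaction

        best = min(results)    # lowest BIC wins
        print(best)            # (BIC, mode for SNP1, mode for SNP2, Wald p-value of the interaction)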

  • Modeling Extreme Aviation Accidents Using Generalized Pareto Distribution
    Asim Dey and Kumer Pial Das
    University of Texas at Dallas and Lamar University

    Generally, air travel is considered a safe means of transportation. But when aviation accidents do occur, they often result in fatalities. Fortunately, the most extreme accidents occur rarely. However, 2014 was the deadliest year in the past decade, with 111 plane crashes; the worst four crashes caused 298, 239, 162, and 116 deaths. In this study we assess the risk of catastrophic aviation accidents by studying the number of fatalities from aviation accidents in the 33-year period from 1982 to 2014. Applying a generalized Pareto model to the distribution of extreme fatal injuries, we estimate the probabilities of serious aviation accidents. We also predict the maximum fatalities from an aviation accident in the future. The uncertainty in the inferences is quantified using simulated aviation accident series generated by bootstrap sampling.