Workshop on Big Data and Machine Learning
Thursday - Saturday, April 11 - 13, 2019
Maxim Doucet Hall
Department of Mathematics
University of Louisiana at Lafayette, Lafayette, Louisiana
This workshop will focus on recent developments related to big data and machine learning.
Big Data
Professor Erniel B. Barrios
School of Statistics
University of the Philippines Diliman, Philippines
There will be four sessions on big data.
- Topics in High Dimensional Data
- Models of Customer Survival Data
- Analyzing Multiple Time Series Data
- Some Open Topics
Machine Learning
Professor Gerhard Dikta
University of Applied Science
FH Aachen, Germany
There will be two sessions on machine learning.
- Probability of damage of electronic systems due to indirect lightning strikes
  A real application example I did years ago for the German insurance industry.
- Bootstrap approximations in model checks
  New model validation approaches that can be used in statistical learning.
Registration (required)
Register here. There is no registration fee. However, to aid in planning, please complete the registration form as soon as possible. The registration deadline is Monday, 8 April 2019.
Schedule
Thursday Afternoon, 11 April 2019
Maxim Doucet Hall room 211
Time | Topic and speaker |
---|---|
4:30 - 4:45 | Refreshments |
4:45 - 6:00 | Probability of damage to electronic systems due to indirect lightning strikes (Gerhard Dikta) |
Friday Afternoon, 12 April 2019
Maxim Doucet Hall room 211
Time | Topic and speaker |
---|---|
12:45 - 2:00 | Topics in High Dimensional Data (Erniel B. Barrios) |
2:00 - 2:30 | Refreshments |
2:30 - 3:45 | Bootstrap approximations to check parametric regression models (Gerhard Dikta) |
3:45 - 4:15 | Refreshments |
4:15 - 5:30 | Models of Customer Survival Data (Erniel B. Barrios) |
Saturday Morning, 13 April 2019
Maxim Doucet Hall room 211
Time | Topic and speaker |
---|---|
8:45 - 9:00 | Refreshments |
9:00 - 10:15 | Analyzing Multiple Time Series Data (Erniel B. Barrios) |
10:15 - 10:45 | Refreshments |
10:45 - 12:00 | Some Open Topics (Erniel B. Barrios) |
Abstracts
Probability of damage to electronic systems due to indirect lightning strikes
Gerhard Dikta
11 April 2019
German household insurance covers damage to an electronic system if the damage is caused by a lightning strike. In the years 2002-2005, a sharp increase in claims of this kind was observed among insurance companies. In response to this increase, the GDV (the German Insurance Association) supported a project with the aim of analyzing the distance between a lightning strike and the location where the damage occurred. In this lecture a model for the distribution of these distances is discussed and applied to real data from the insurance companies. The modelling is based on about 75,000 damage reports from the year 2005.
Topics in High Dimensional Data
Erniel Barrios
12 April 2019
The data-generating process that results in big data is often characterized by a complex dependence structure. The data exhibit heterogeneity as a result of pooling together data coming from different sources. Representing such data requires a large number of variables (features), so the data are often labelled high dimensional. Two approaches to dealing with high dimensional data will be discussed. The first is dimension reduction, where high dimensional features are translated into lower dimensions. A method that accounts for data characteristics arising from the heterogeneity of pooled data will be discussed. The second approach does away with dimension reduction as a pre-analysis step prior to modeling and instead develops a model that simultaneously selects features of the data while fitting a predictive model. This method is then applied to a quality of life index.
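As a rough illustration of selecting features while fitting a predictive model, the sketch below uses the lasso (L1-penalised least squares) solved by cyclic coordinate descent. The lasso is a well-known stand-in chosen here for illustration; it is not claimed to be the method presented in the talk. The function names, the penalty `lam`, and the assumption that the columns of `X` are standardised and `y` is centred are all choices made for this example.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z towards zero by t; values with |z| <= t become exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes (for this sketch) that X has centred, scaled columns and y is centred.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # per-feature curvature term
    resid = y - X @ b                   # current residual vector
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j
            rho = X[:, j] @ resid / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            # Keep the residual in sync with the updated coefficient
            resid += X[:, j] * (b[j] - new_bj)
            b[j] = new_bj
    return b

def standardise(X):
    """Centre and scale each column to mean 0 and standard deviation 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```

Coefficients that end up exactly zero correspond to features dropped from the model, so selection and fitting happen in a single step rather than as a separate pre-analysis.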
Bootstrap approximations to check parametric regression models
Gerhard Dikta
12 April 2019
Suppose we observe a series of binary data along with explanatory variables and we suspect that these observations belong to a parametric regression model. To verify this assumption, we use Kolmogorov-Smirnov and Cramér-von Mises type tests based on a maximum likelihood estimate of the parameter and a marked empirical process introduced by Stute. We determine the critical values for the tests with a special bootstrap procedure in which the resampling scheme is adapted to the parametric setup. The approach presented is discussed in the context of machine learning, together with how it can be applied to generalized linear models (GLMs), distinguishing between semi-parametric and parametric GLMs. Finally, this approach is applied to simulated and real data. In the latter case, we review parametric model assumptions for some right-censored data.
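The procedure described in this abstract can be sketched for the simplest case: one covariate and a logistic model. This is a minimal illustration under assumptions made for this example, not the speaker's implementation. The logistic link, the Newton solver, the function names, and the default of 200 bootstrap replicates are all hypothetical choices. The marked empirical process is taken as R_n(t) = n^(-1/2) * sum_i 1{x_i <= t} (y_i - m(x_i, theta_hat)), evaluated at the observed covariate values, and bootstrap responses are drawn from the fitted model with the covariates held fixed.

```python
import numpy as np

def predict(x, beta):
    """Fitted success probability m(x, beta) under the logistic model."""
    eta = np.clip(beta[0] + beta[1] * x, -30.0, 30.0)  # clip to avoid overflow
    return 1.0 / (1.0 + np.exp(-eta))

def fit_logistic(x, y, n_iter=50):
    """Maximum likelihood estimate of (intercept, slope) by Newton's method."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = predict(x, beta)
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        # Tiny ridge term keeps the linear solve stable
        step = np.linalg.solve(hess + 1e-8 * np.eye(2), grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def marked_process_stats(x, y, beta):
    """Kolmogorov-Smirnov and Cramer-von Mises statistics of the marked
    empirical process, evaluated at the sorted covariates (assumes no ties)."""
    n = len(x)
    order = np.argsort(x)
    resid = (y - predict(x, beta))[order]
    r = np.cumsum(resid) / np.sqrt(n)
    return np.max(np.abs(r)), np.mean(r ** 2)

def bootstrap_model_check(x, y, n_boot=200, seed=0):
    """Model-based bootstrap: resample y from the fitted model, refit,
    and recompute both statistics to approximate their null distribution."""
    rng = np.random.default_rng(seed)
    beta_hat = fit_logistic(x, y)
    ks_obs, cvm_obs = marked_process_stats(x, y, beta_hat)
    p_hat = predict(x, beta_hat)
    ks_star = np.empty(n_boot)
    cvm_star = np.empty(n_boot)
    for b in range(n_boot):
        y_star = (rng.random(len(x)) < p_hat).astype(float)
        beta_star = fit_logistic(x, y_star)
        ks_star[b], cvm_star[b] = marked_process_stats(x, y_star, beta_star)
    return {
        "ks": ks_obs,
        "cvm": cvm_obs,
        "p_ks": float(np.mean(ks_star >= ks_obs)),
        "p_cvm": float(np.mean(cvm_star >= cvm_obs)),
    }
```

A small bootstrap p-value suggests that the parametric model does not describe the data. Refitting the parameter inside every bootstrap replicate matters, because the null distribution of the statistics depends on the estimation step.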
Models of Customer Survival Data
Erniel Barrios
12 April 2019
In a highly competitive sector like the telecommunications industry, recruitment of new customers is more expensive than strategies that induce loyalty and patronage among existing customers. Survival models are used to characterize customer behavior; the models are then used in Customer Lifetime Valuation (CLV), which in turn informs the planning of loyalty incentives and offers. Various features of the data-generating process stimulated the development of the new statistical methods to be discussed.
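To make the link between survival models and CLV concrete, here is a minimal sketch that estimates a customer survival curve with the Kaplan-Meier estimator and converts it into a discounted value. It is a textbook-style illustration, not the methods of the talk. The function names, the per-period `margin`, the `discount_rate`, and the convention that the margin for period t is earned only from customers who survive that period are assumptions made for this example.

```python
import numpy as np

def kaplan_meier(durations, churned, horizon):
    """Kaplan-Meier estimate of S(t), the probability that a customer is
    still active after period t, for t = 1..horizon.

    churned[i] is 1 if churn was observed at durations[i], 0 if censored.
    """
    durations = np.asarray(durations)
    churned = np.asarray(churned)
    surv = np.empty(horizon)
    s = 1.0
    for t in range(1, horizon + 1):
        at_risk = np.sum(durations >= t)                    # still under observation
        events = np.sum((durations == t) & (churned == 1))  # churned in period t
        if at_risk > 0:
            s *= 1.0 - events / at_risk
        surv[t - 1] = s
    return surv

def customer_lifetime_value(surv, margin, discount_rate):
    """Discounted expected margin: sum over t of margin * S(t) / (1 + d)^t."""
    t = np.arange(1, len(surv) + 1)
    return float(np.sum(margin * surv / (1.0 + discount_rate) ** t))

# Six hypothetical customers observed for up to three periods
surv = kaplan_meier([1, 2, 2, 3, 3, 3], [1, 1, 0, 1, 0, 0], horizon=3)
clv = customer_lifetime_value(surv, margin=10.0, discount_rate=0.0)
```

Censored customers (those still active when observation ends) contribute to the at-risk counts without being treated as churned, which is the main reason survival methods are used here rather than simple averages of tenure.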
Analyzing Multiple Time Series Data
Erniel Barrios
13 April 2019
In addition to the telecommunications industry, credit card transactions, bank accounts, and financial markets also contributed to the early evolution of big data. Data from these sectors are often characterized by multiple time series. Multiple time series are distinguished from multivariate time series and from panel data. An estimation procedure for models of multiple time series data is proposed. Statistical methods are also developed to analyze other features of multiple time series data.
Some Open Topics
Erniel Barrios
13 April 2019
There are many themes describing features of big data. Some common topics of interest include varying frequencies and clustering. We present some statistical problems formulated from these topics, along with initial results generated so far, and discuss some open problems. Features of multiple time series are further extended to changepoint analysis and to the clustering of time series. Some initial work on text mining will also be discussed.
Information
Please direct any inquiries to Nabendu Pal or Bruce Wade.