Gini Distance Correlation and Feature Selection
Xin Dang, Professor of Mathematics, The University of Mississippi
21 March 2019
Big data is becoming ubiquitous in the biological, engineering, geological and social sciences, as well as in government and public policy. Building an interpretable model is an effective way to extract information and make predictions. However, this task becomes particularly challenging for big data, which are large-scale and ultra-high-dimensional, with mixed-type features that are both structured and unstructured. A common way to tackle this challenge is feature selection: reducing the number of features under consideration by choosing a subset that is "relevant" and useful. The work in this talk proposes a new dependence measure for feature selection. Features that have strong dependence with the response variable are selected as candidate features. We propose a new Gini correlation to measure dependence between a categorical response and numerical feature variables. Compared with existing dependence measures, the proposed one has both computational and statistical efficiency advantages, which improve the feature selection procedure and therefore the resulting prediction model.
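To make the idea of screening features by their dependence on a categorical response concrete, here is a minimal Python sketch. The abstract does not state the formula, so the sketch assumes one common formulation of a Gini correlation between a numerical feature X and a categorical response Y: (Δ − Σ p_k Δ_k) / Δ, where Δ is the Gini mean difference of X, Δ_k is the Gini mean difference of X within class k, and p_k is the proportion of class k. The function names (`gini_mean_difference`, `gini_correlation`, `rank_features`) and the plug-in estimator are illustrative assumptions, not the speaker's implementation.

```python
import numpy as np

def gini_mean_difference(x):
    """Plug-in estimate of E|X1 - X2|, computed in O(n log n) by sorting."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if n == 0:
        return 0.0
    # For sorted values, sum_{i<j} (x_j - x_i) = sum_i (2i - n + 1) * x_i (0-based i).
    weights = 2 * np.arange(n) - n + 1
    return 2.0 * np.dot(weights, x) / (n * n)

def gini_correlation(x, y):
    """Assumed form: (overall spread - weighted within-class spread) / overall spread."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    delta = gini_mean_difference(x)
    if delta == 0.0:
        return 0.0  # a constant feature carries no information about y
    within = 0.0
    for label in np.unique(y):
        mask = y == label
        within += mask.mean() * gini_mean_difference(x[mask])
    return (delta - within) / delta

def rank_features(X, y):
    """Return column indices of X ordered from strongest to weakest dependence on y."""
    X = np.asarray(X, dtype=float)
    scores = np.array([gini_correlation(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1], scores

# Toy data: column 0 separates the two classes, column 1 does not.
y = np.array(["a", "a", "b", "b"])
X = np.array([[0.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0]])
order, scores = rank_features(X, y)
```

In this toy example the first column receives the higher score and would be kept as a candidate feature. Under the assumed formulation, each score needs only a sort of the feature values, which is one way a measure of this kind can be cheap to compute.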
About the speaker
Xin Dang received a Bachelor of Science degree in Applied Mathematics from Chongqing University, China, in 1991, and Master's and PhD degrees in Statistics from the University of Texas at Dallas in 2003 and 2005, respectively. She is currently a professor in the Department of Mathematics at the University of Mississippi. Her research interests include robust and nonparametric statistics, statistical and numerical computing, and multivariate data analysis. In particular, she has focused on data depth and its applications, bioinformatics, machine learning, and the computation of robust procedures. Dr. Dang is a member of the IMS, ASA, ICSA and IEEE.
Maxim Doucet Hall 206
Contact Madi Angerdina to register for the luncheon.