ANNOUNCEMENTS

Dear Participants,

INVITED SPEAKERS

Prof. Dr. Hamparsum BOZDOGAN

Expected Volume Confidence Region Complexity (EVCR-COMP): A New Criterion for Model Selection in High-Dimensional Statistical Modeling

We introduce a novel model selection criterion for high-dimensional statistical modeling, termed Expected Volume Confidence Region Complexity (EVCR-COMP). Confidence intervals and regions are essential tools for quantifying uncertainty in high-dimensional data. The shape and volume of a confidence region depend on the underlying statistical model, the parameters being estimated, the sample size, and the desired confidence level. The expected volume reflects how uncertainty scales with the covariance structure of the model.
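As a rough illustration of how the volume of a confidence region depends on dimension, sample size, confidence level, and covariance, the sketch below computes the volume of the textbook confidence ellipsoid for a multivariate normal mean with known covariance. This is not the EVCR-COMP expression from the talk. The function names are hypothetical, the known-covariance setting is an assumption made for simplicity, and the chi-square quantile is passed in by the caller (a closed form is used only for the two-dimensional case).

```python
import math


def unit_ball_volume(p: int) -> float:
    """Volume of the unit ball in R^p: pi^(p/2) / Gamma(p/2 + 1)."""
    return math.exp(0.5 * p * math.log(math.pi) - math.lgamma(0.5 * p + 1.0))


def confidence_ellipsoid_volume(det_sigma: float, p: int, n: int,
                                chi2_quantile: float) -> float:
    """Volume of {mu : n (xbar - mu)' Sigma^{-1} (xbar - mu) <= chi2_quantile}.

    Assumption: Sigma is known, so the region is an ellipsoid whose volume
    is V_p * (chi2_quantile / n)^(p/2) * sqrt(det(Sigma)).
    """
    return unit_ball_volume(p) * (chi2_quantile / n) ** (p / 2) * math.sqrt(det_sigma)


def chi2_quantile_2d(level: float) -> float:
    """Exact chi-square quantile for 2 degrees of freedom: -2 ln(1 - level)."""
    return -2.0 * math.log(1.0 - level)


if __name__ == "__main__":
    q = chi2_quantile_2d(0.95)  # 95% confidence level, p = 2
    for n in (25, 100, 400):
        vol = confidence_ellipsoid_volume(det_sigma=1.0, p=2, n=n, chi2_quantile=q)
        print(f"n={n:4d}  volume={vol:.4f}")
```

In this simplified setting the volume shrinks like n^(-p/2) and grows with the square root of the covariance determinant, which is the kind of scaling the abstract refers to.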

Despite its importance, limited work has addressed the construction and evaluation of confidence regions and their expected volumes in high-dimensional settings. We derive a closed-form analytical expression for EVCR-COMP within multivariate high-dimensional models. For a given data dimensionality, confidence level, and sample size, we propose incorporating the expected volume of the confidence region (EVCR) as a penalty term in model selection. This approach helps control model complexity and mitigate overfitting.

By integrating lack-of-fit, the entropic complexity of the covariance matrix, and EVCR, we formulate EVCR-COMP, a new criterion that generalizes traditional methods such as Akaike’s Information Criterion (AIC) and Schwarz’s Bayesian Criterion (SBC, or BIC). Notably, our framework introduces the level of statistical significance into model selection, bridging the gap between model selection and classical statistical inference.
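For reference, the two classical criteria named above have the following standard forms, where \hat{L} is the maximized likelihood, k the number of estimated parameters, and n the sample size. Both combine a lack-of-fit term with a penalty term; the EVCR-COMP expression itself is not stated in this abstract and is not reproduced here.

```latex
\mathrm{AIC} = -2 \log \hat{L} + 2k,
\qquad
\mathrm{BIC} = -2 \log \hat{L} + k \log n .
```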

We demonstrate the utility of EVCR-COMP through several applications in machine learning: selecting predictor subsets in Bayesian regression using EM and Genetic Algorithms (GA); determining the number of factors in Sparse Factor Analytic (SFA) models; choosing the number of components in Sparse Probabilistic Principal Component Analysis (SPPCA); and detecting influential observations in Multivariate Predictive Regression (MVR) models using both real and simulated datasets. Other potential applications in supervised and unsupervised classification in healthcare analytics and in medicine will also be discussed.
