Title: Model-Free Feature Screening via Subsampling: A Unified Framework and Adaptive Two-Step Recovery of Mysterious Features
Speaker: Wang Qihua (王启华)
Time: Monday, September 15, 2025, 14:30
Venue: Conference Room 615, Comprehensive Building (综合楼)
About the speaker:
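Wang Qihua is a professor and doctoral supervisor. Professor Wang is a recipient of the National Science Fund for Distinguished Young Scholars, a Distinguished Professor of the Ministry of Education's Changjiang Scholars Program, a selectee of the Chinese Academy of Sciences "Hundred Talents Program", and a 素人 Distinguished Professor. Professor Wang's research focuses on incomplete data analysis, high-dimensional data analysis, and statistical inference for large-scale data. Professor Wang has led a National Science Fund for Distinguished Young Scholars project, key projects, and several general projects, and has published more than 150 papers in major international journals as well as three monographs. Some of these results have had a sustained academic influence for two decades.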
Abstract:
Feature screening is a common strategy in ultrahigh-dimensional data analysis. However, in the era of big data, classic feature screening methods face challenges in two respects. First, a very large sample size makes classic screening methods computationally difficult because of limits on data storage and computing resources. Second, covariates are often highly correlated in ultrahigh-dimensional data, in which case most existing screening methods may miss significant covariates that are marginally uncorrelated with the response. In this paper, we first develop a general subsampling-based feature screening framework via a sampling-with-replacement scheme. This framework accommodates a wide range of correlation measures, including model-free screening measures. The proposed general method enjoys the sure screening property under some mild assumptions and is computationally attractive. Furthermore, when strong dependence exists among covariates, we propose a two-step subsampling method based on the proposed general framework. In the first step, uniform sampling is used to select covariates that have strong marginal correlation with the response. In the second step, kernel-based non-uniform sampling is designed to recruit important features from the remaining covariates.
The two-step method can help recover significant covariates that have no marginal correlation with the response. The sure screening property is established for the two-step method. Simulation studies and a real data analysis are conducted to illustrate the empirical performance of the proposed methods.
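
To make the idea of subsampling-based screening concrete, the following is a minimal sketch and not the method presented in the talk. It assumes absolute Pearson correlation as the marginal measure, uniform sampling with replacement, and a simple average of the scores over repeated subsamples. The function name `subsample_screen` and the parameters `n_sub`, `n_rep`, and `d` are hypothetical, and the kernel-based non-uniform second step is not attempted here.

```python
import numpy as np


def subsample_screen(X, y, n_sub, n_rep, d, seed=0):
    """Rank features by subsample-averaged |Pearson correlation| with y.

    Illustrative only: the measure, the averaging rule, and all parameter
    names are assumptions, not the speaker's actual procedure.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    scores = np.zeros(p)
    for _ in range(n_rep):
        # Uniform sampling with replacement: draw n_sub row indices.
        idx = rng.integers(0, n, size=n_sub)
        Xs = X[idx] - X[idx].mean(axis=0)
        ys = y[idx] - y[idx].mean()
        denom = np.linalg.norm(Xs, axis=0) * np.linalg.norm(ys)
        # Constant columns get a score of 0 instead of a division error.
        scores += np.abs(Xs.T @ ys) / np.where(denom > 0, denom, np.inf)
    scores /= n_rep
    # Keep the d features with the largest averaged marginal score.
    keep = np.argsort(scores)[::-1][:d]
    return keep, scores


# Toy data: only features 0 and 5 drive the response.
rng = np.random.default_rng(1)
n, p = 5000, 200
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 5] + rng.standard_normal(n)

selected, scores = subsample_screen(X, y, n_sub=500, n_rep=10, d=10)
print(sorted(selected.tolist()))
```

Each pass touches only `n_sub` rows, which is where the computational saving over full-sample screening comes from. A purely marginal score like this one cannot detect covariates that are marginally uncorrelated with the response, which is the gap the two-step method in the abstract is designed to address.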