
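"Digital+" and Statistical Data Engineering Lecture Series (Lecture No. 108). Lecture Announcement: Professor Zhong Wei of Xiamen University to Speak on September 29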

Posted by: Shi Yuting    Posted on: 2025-09-28

Time: Monday, September 29, 2025, 16:20

Venue: Conference Room 644, Comprehensive Building

Title: Some Reflections and Explorations on Statistical Stability

About the Speaker:

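Zhong Wei is a Nanqiang Distinguished Professor, doctoral supervisor, and Chair of the Department of Statistics and Data Science at the Wang Yanan Institute for Studies in Economics and the School of Economics, Xiamen University. He received his Ph.D. in Statistics from the Pennsylvania State University in 2012. His honors include the National Science Fund for Excellent Young Scholars (2019), the Fujian Provincial Science Fund for Distinguished Young Scholars (2019), and selection as a leading talent in a national major talent program (Ministry of Education, 2024). His research focuses on statistical analysis of high-dimensional data and statistical learning algorithms. He has served as an Associate Editor (AE) for six journals, including JASA, the journal of the American Statistical Association, and has published (or had accepted) more than 40 papers in leading statistics journals in China and abroad, including the Annals of Statistics, Journal of the American Statistical Association, Biometrika, Journal of Machine Learning Research, and Scientia Sinica Mathematica. His awards include the Grand Prize in the Xiamen University Young Teachers' Teaching Skills Competition (2020), the Fok Ying Tung Education Foundation Young Scientist Award for Higher Education Institutions (2022), the 9th Award for Outstanding Achievements in Scientific Research of Higher Education Institutions (2024), the Baosteel Outstanding Teacher Award (2024), and, as team leader, the Xiamen University Excellent Teaching Team Award (2025).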

Abstract:

Ensuring that scientific findings remain consistent across data perturbations, modeling choices, and computational environments is now recognized as a prerequisite for trustworthy knowledge. Building on this consensus, we note that stability and reproducibility are essential considerations when statistical methods are applied to scientific research. False Discovery Rate (FDR) control methods provide a rigorous framework for limiting false signals in multiple testing and regression. However, some of these methods yield unstable results because of the inherent randomness of their algorithms. For instance, different constructions of knockoff copies can lead to different sets of selected variables. To enhance the stability and reproducibility of statistical outcomes, we propose FDR Stabilizer, a unified stability approach for feature selection and multiple testing algorithms with FDR control. Our method aggregates e-values based on rank statistics generated from multiple runs of the base algorithm to construct stabilized e-values, which are then processed with the eBH procedure. This approach guarantees FDR control and maintains power while also enhancing stability. It is flexible and can be applied to most existing FDR control methods. Moreover, we investigate the theoretical properties of the proposed method, including asymptotic FDR control, power, and stability guarantees. Extensive numerical experiments and applications to real datasets demonstrate that the proposed method generally outperforms existing alternatives.
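To make the final step of the pipeline concrete, here is a minimal sketch of the generic e-BH rule applied to e-values pooled over several runs. The abstract does not specify the rank-statistic-based aggregation used by FDR Stabilizer, so the sketch assumes a plain arithmetic mean of per-run e-values as a stand-in; the function names `stabilized_e_values` and `ebh` are hypothetical and this is not the speaker's implementation. Averaging is a safe illustrative choice because the mean of valid e-values (nonnegative, with null expectation at most 1) is itself a valid e-value. e-BH then sorts the m e-values in decreasing order, finds the largest k with e_(k) >= m / (alpha * k), and rejects those k hypotheses.

```python
import numpy as np


def stabilized_e_values(e_matrix):
    """Hypothetical aggregation: average e-values over R runs.

    e_matrix has shape (R, m): row r holds the e-values from run r of a
    randomized base method (e.g. one knockoff draw). This simple mean is an
    assumed stand-in, NOT the rank-based construction of FDR Stabilizer.
    """
    return np.asarray(e_matrix, dtype=float).mean(axis=0)


def ebh(e_values, alpha):
    """Generic e-BH rule: return sorted indices of rejected hypotheses.

    Rejects the k* hypotheses with the largest e-values, where
    k* = max{k : e_(k) >= m / (alpha * k)} and e_(1) >= ... >= e_(m).
    """
    e = np.asarray(e_values, dtype=float)
    m = e.size
    order = np.argsort(-e)               # indices by decreasing e-value
    sorted_e = e[order]
    k = np.arange(1, m + 1)              # candidate rejection counts 1..m
    passed = sorted_e >= m / (alpha * k)
    if not passed.any():
        return np.array([], dtype=int)   # no rejections
    k_star = np.nonzero(passed)[0].max() + 1
    return np.sort(order[:k_star])


# Two hypothetical runs over m = 5 hypotheses.
runs = [[70, 30, 0, 1, 0],
        [40, 30, 2, 0, 0]]
e_stab = stabilized_e_values(runs)       # [55, 30, 1, 0.5, 0]
print(ebh(e_stab, alpha=0.1))            # thresholds 50, 25, 16.7, ... -> rejects 0 and 1
```

With m = 5 and alpha = 0.1 the thresholds are 50, 25, 16.7, 12.5, and 10, so the two largest pooled e-values (55 and 30) clear their thresholds and hypotheses 0 and 1 are rejected. e-BH is known to control FDR at level alpha under arbitrary dependence among the e-values, which is why pooling across runs and then applying it is a natural fit for the design described above.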