【Masters' Forum】 Session 225: Build an end-to-end scalable and interpretable data science ecosystem by integrating statistics, ML, and domain sciences

2024-07-03, 09:30-11:00
Lecture Hall 300, Buildings 5 & 6, Science Complex

The data science ecosystem encompasses data fairness, statistical and ML methods and tools, interpretable data analysis, and trustworthy decision-making. Rapid advances in ML have revolutionized how data are used and have enabled machines to learn from data more effectively. Statistics, as the science of learning from data while accounting for uncertainty, plays a pivotal role in addressing complex real-world problems and in supporting trustworthy decision-making. In this talk, I will discuss the challenges and opportunities in building an end-to-end, scalable, and interpretable data science ecosystem that integrates statistics, ML, and domain science. I will illustrate the key points with analyses of whole-genome sequencing data and electronic health records. I will discuss several scalable and interpretable statistical and ML methods, tools, and data science resources that draw on large annotation databases, summary statistics, sparsity, and ensemble methods. The talk aims to spark proactive and thought-provoking discussion, foster collaboration, and cultivate open-minded approaches to advancing scientific discovery.

Speaker Introduction

Xihong Lin (林希虹)

Member, US National Academy of Sciences
Talk title: Build an end-to-end scalable and interpretable data science ecosystem by integrating statistics, ML, and domain sciences
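Xihong Lin is a member of the US National Academy of Sciences and the US National Academy of Medicine. At the Harvard School of Public Health, she is a tenured Professor and former Chair of the Department of Biostatistics and Director of the Program in Quantitative Genomics. She is also a tenured Professor in the Department of Statistics at Harvard University. Her research develops statistical and machine learning methods for massive genetic, health, and epidemiological data, and applies them to real-world problems. She has served as Chair of COPSS (Committee of Presidents of Statistical Societies) and as editor of Biometrics and of Statistics in Biosciences. Her honors include:
- COPSS Presidents' Award (2006)
- US National Cancer Institute Outstanding Investigator Award (2015)
- COPSS F. N. David Award (2017)
- US National Institute of Statistical Sciences Sacks Award for Cross-Disciplinary Research (2022)
- Harvard University Zelen Leadership Award in Statistical Science (2022)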