Master Lecture Series (大师讲坛), No. 225: Build an end-to-end scalable and interpretable data science ecosystem by integrating statistics, ML, and domain sciences
The data science ecosystem encompasses data fairness, statistical and ML methods and tools, interpretable data analysis, and trustworthy decision-making. Rapid advancements in ML have revolutionized data utilization and enabled machines to learn from data more effectively. Statistics, as the science of learning from data while accounting for uncertainty, plays a pivotal role in addressing complex real-world problems and facilitating trustworthy decision-making. In this talk, I will discuss the challenges and opportunities involved in building an end-to-end scalable and interpretable data science ecosystem that integrates statistics, ML, and domain science. I will illustrate key points through the analysis of whole genome sequencing data and electronic health records, discussing a few scalable and interpretable statistical and ML methods, tools, and data science resources, using large annotation databases, summary statistics, sparsity, and ensemble methods. This talk aims to ignite proactive and thought-provoking discussions, foster collaboration, and cultivate open-minded approaches to advance scientific discovery.