Academic Talk 1: Transfer Learning and its use in medical/biological applications
Speaker: Quan Long (龙泉), University of Calgary, Associate Professor
Abstract: Machine learning techniques, including deep learning, have been used successfully in many big-data fields. However, a limitation of many machine learning tools is that training a model with many parameters requires a very large sample size. This may prevent the broader use of machine learning in sample-sparse domains. For instance, in medical genetics, the number of patients with a particular disease available for a research project may be in the hundreds or even dozens, far below what many sample-hungry machine learning techniques require. To address this, researchers have developed a technique called “transfer learning”, which can retask an established general model (usually trained on a very large sample) to a specific target using tailored samples of limited size. Such transfer-learning models open the door to developing many tools tailored to specific tasks using small samples and nimble training procedures. In this talk, I will first explain the basic theory of transfer learning, followed by three projects that use transfer learning to characterize the genetic basis of complex diseases by retasking a large general model.
Speaker bio: I am an Associate Professor at the University of Calgary, hosted by the Dept. of Biochemistry and Molecular Biology. Additionally, I have a joint appointment in the Dept. of Medical Genetics and an adjunct appointment in the Dept. of Math and Stats. I am a member of the Alberta Children's Hospital Research Institute and the Hotchkiss Brain Institute. I graduated from Peking University with a PhD in Applied Mathematics (majoring in software engineering). After graduation I worked at IBM Research as a Staff R&D Engineer for a year, on repairing programming errors that lead to memory leaks. Then I entered the wonderful biological world, starting by serving pathfinding genomics projects including the human 1000 Genomes Project at the Wellcome Trust Sanger Institute, the plant 1001 Genomes Project at the Gregor Mendel Institute, and the Genotype-Tissue Expression Project at the Icahn School of Medicine at Mount Sinai. Along the way, I have developed various computational and statistical tools for genetics and genomics. Currently I lead a research group that develops computational and statistical tools, focusing on genomic problems with high-dimensional features and low sample sizes. We are also interested in theoretical problems in machine learning. My work has been published in leading journals including Nature, Science, Nature Genetics, Molecular Biology & Evolution, Genetics, PLoS Genet, Bioinformatics, PLoS Comp Biol, and Genetic Epidemiology, attracting 30,000+ citations (Google Scholar). I have served as a reviewer for established journals including Bioinformatics, Brief Bioinfo, Cell, Genome Biol, Mol Biol & Evol, Nature Biotech, Nature Comm, Nucleic Acids Res, PLoS Comp Biol, PLoS Genet, Science, Trends in Genet, and various journals in the field of statistics.
Academic Talk 2: Stabilized COre gene and Pathway Election uncovers pan-cancer shared pathways and a cancer-specific driver
Speaker: Qingrun Zhang (张清润), University of Calgary, Assistant Professor
Abstract: Approaches that systematically characterize interactions via transcriptomic data usually follow one of two strategies: (1) co-expression network analyses focusing on correlations between genes; (2) linear regressions (usually regularized) that select multiple genes jointly. Both suffer from instability: a slight change in parameterization or dataset can lead to dramatic alterations in outcomes. Here, we propose Stabilized Core gene and Pathway Election, or SCOPE, a tool integrating bootstrapped LASSO and co-expression analysis, leading to robust outcomes insensitive to variations in data. By applying SCOPE to six cancer expression datasets (BRCA, COAD, KIRC, LUAD, PRAD and THCA) in The Cancer Genome Atlas, we identified core genes capturing interaction effects in crucial pan-cancer pathways related to genome instability and DNA damage response. Moreover, we highlighted the pivotal role of CD63 as an oncogenic driver and a potential therapeutic target in kidney cancer. SCOPE enables stabilized investigation of complex interactions using transcriptome data. (This work has been accepted by Science Advances, a sister journal of Science with an impact factor of 14.)
Speaker bio: I am an Assistant Professor in Biostatistics in the Department of Mathematics and Statistics at the University of Calgary. I also have an adjunct appointment in the Department of Biochemistry and Molecular Biology, with membership in the Arnie Charbonneau Cancer Institute and the Alberta Children's Hospital Research Institute. I was trained in biotechnology (B.S. at Yunnan University) and statistical genetics (Ph.D. at the Chinese Academy of Sciences, Beijing Institute of Genomics, supervisor: Dr. Jurg Ott). Before moving to Calgary, I received my postdoctoral training at the Gregor Mendel Institute and the Icahn School of Medicine at Mount Sinai. Currently, I lead a small research group that develops machine learning algorithms for high-dimensional data, with applications to the integration of genomics and gene expression data to predict the risk and occurrence of diseases, as well as drug responses. Collaborating with colleagues in the Cumming School of Medicine, we develop computational tools tailored to different diseases, including cancers and neurodevelopmental disorders. Our research addresses the following questions: (1) How to learn sensible representations of high-dimensional data. (2) How to identify association and causality using data mining techniques for complex multi-scale data. (3) How to carry out statistical inference in noisy and biased data, in particular single-cell sequencing data. Based on Google Scholar citation reports, my papers have been cited 20,000+ times.