Syllabus, Readings and Lecture Notes
Course Overview
Motif and cis-Regulatory Module (CRM) Modeling
- topics: learning motif models, learning models of cis-regulatory modules, Gibbs sampling, Dirichlet priors, parameter tying, sequence entropy, mutual information
- required reading
- T. Bailey and C. Elkan. The value of prior knowledge in discovering motifs with MEME. In Proceedings of the 3rd International Conference on Intelligent Systems for Molecular Biology, pp. 21-29, 1995.
- C. Lawrence, S. Altschul, M. Boguski, J. Liu, A. Neuwald, and J. Wootton. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262:208-214, 1993.
- O. Elemento, N. Slonim, and S. Tavazoie. A universal framework for regulatory element discovery across all genomes and data types. Molecular Cell 28(2):337-350, 2007. (Supplemental materials contain key methodological details.)
- optional reading
- optional viewing
- lecture notes
- Learning Sequence Motif Models using EM (PDF, PPTX) (1/28, 2/2)
- Learning Sequence Motif Models and Gibbs Sampling (PDF, PPTX, Gamma example, Dirichlet example) (2/4, 2/9)
- Inferring Models of cis-Regulatory Modules using Information Theory (PDF, PPTX) (2/11, 2/16)
Genotype Analysis
- topics: haplotype inference, genome-wide association studies (GWAS),
quantitative trait loci (QTL) mapping, multiple hypothesis testing, convolutional neural networks, interpreting noncoding genetic variants
- required reading
- optional reading
- lecture notes
- Linking Genetic Variation to Phenotypes
(PDF, PPTX) (2/18)
- GWAS, multiple testing correction and QTLs
(PDF, PPTX) (2/23, 2/25)
- Interpreting noncoding variants (PDF, PPTX) (3/2, 3/4)
Epigenomics
- topics: epigenomic data types, DNase I hypersensitivity, Gaussian processes
- required reading
- R.I. Sherwood, T. Hashimoto, C.W. O'Donnell, S. Lewis, A.A. Barkal, J.P. van Hoff, V. Karun, T. Jaakkola, and D.K. Gifford. Discovery of directional and nondirectional pioneer transcription factors by modeling DNase profile magnitude and shape. Nat Biotechnol 32(2):171-178, 2014.
- J. Lever, M. Krzywinski, and N. Altman. Points of Significance: Classification evaluation. Nat Methods 13(8):603-604, 2016.
- optional reading
- lecture notes
Network Biology
- topics: biological network analysis, protein interactions, pathway identification, linear programming, min cost flow
- required reading
- E. Yeger-Lotem, L. Riva, L.J. Su, A.D. Gitler, A.G. Cashikar, O.D. King, P.K. Auluck, M.L. Geddie, J.S. Valastyan, D.R. Karger, S. Lindquist, and E. Fraenkel. Bridging high-throughput genetic and transcriptional data reveals cellular responses to alpha-synuclein toxicity. Nat Genet 41(3):316-323, 2009.
- optional reading
- T. Ideker and R. Nussinov. Network approaches and applications in biology. PLoS Comput Biol 13(10):e1005771, 2017.
- D-Y. Cho, Y-A. Kim, and T.M. Przytycka. Chapter 5: Network Biology Approach to Complex Diseases. PLoS Comput Biol 8(12):e1002820, 2012.
- A.-L. Barabási and Z.N. Oltvai. Network biology: understanding the cell's functional organization. Nat Rev Genet 5:101-113, 2004.
- J.W. Chinneck. Practical Optimization: A Gentle Introduction.
- lecture notes
- Network biology (PDF, PPTX) (3/18, 3/23)
Applied Machine Learning Part I
- topics: unsupervised learning, partitioning vs. hierarchical clustering, classification, support vector machines
- required reading
- optional reading
- lecture notes
- Applied Machine Learning Part I (PDF, PPTX) (3/25, 3/30)
RNA-seq Analysis and Gene Discovery
- topics: RNA-seq technology, transcript quantification, gene finding, interpolated Markov models
- required reading
- B. Li, V. Ruotti, R.M. Stewart, J.A. Thomson, and C.N. Dewey. RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics 26(4): 493-500, 2010.
- S. Salzberg, A. Delcher, S. Kasif, and O. White. Microbial gene identification using interpolated Markov models. Nucleic Acids Research 26(2):544-548, 1998.
- Sections 3.1, 3.5 in Durbin et al.
- optional reading
- L.H. LeGault and C.N. Dewey. Inference of alternative splicing from RNA-Seq data with probabilistic splice graphs. Bioinformatics 29(18): 2300-2310, 2013.
- A. Conesa, P. Madrigal, S. Tarazona, D. Gomez-Cabrero, A. Cervera, A. McPherson, M.W. Szcześniak, D.J. Gaffney, L.L. Elo, X. Zhang, and A. Mortazavi. A survey of best practices for RNA-seq data analysis. Genome Biology 17:13, 2016.
- Sections 3.4, 4.1 in Durbin et al.
- C. Burge and S. Karlin. Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology 268(1):78-94, 1997.
- I. Korf, P. Flicek, D. Duan, and M. Brent. Integrating genomic homology into gene structure prediction. Bioinformatics 17(Suppl. 1):S140-S148, 2001.
- lecture notes
- RNA-Seq analysis and gene discovery (PDF, PPTX) (4/1, 4/6)
Applied Machine Learning Part II
- topics: multi-view learning, dimensionality reduction, more supervised and unsupervised learning approaches
- reading
- N. Nguyen and D. Wang. Multiview learning for understanding functional multiomics. PLoS Comput Biol 16(4):e1007677, 2020.
- O. Alter, P.O. Brown, and D. Botstein. Singular value decomposition for genome-wide expression data processing and modeling. PNAS 97(18):10101-10106, 2000.
- C. Kingsford and S.L. Salzberg. What are decision trees? Nat Biotechnol 26:1011-1013, 2008.
- K. Yan et al. OrthoClust: an orthology-based network framework for clustering data across multiple species. Genome Biol 15:R100, 2014.
- D. Chicco. Ten quick tips for machine learning in computational biology. BioData Mining 10:35, 2017.
- C. Angermueller, T. Pärnamaa, L. Parts, and O. Stegle. Deep learning for computational biology. Mol Syst Biol 12(7):878, 2016.
- lecture notes
- Applied Machine Learning Part II (PDF, PPTX) (4/8, 4/13)
Single Cell Omics
- topics: single cell sequencing data processing and analysis (scRNA-seq, scATAC-seq), cell-type regulatory networks, single cell deconvolution
- reading
- M. Luecken and F. Theis. Current best practices in single-cell RNA-seq analysis: a tutorial. Mol Syst Biol 15:e8746, 2019.
- A. Pratapa et al. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat Methods 17(2):147-154, 2020.
- B. Van de Sande et al. A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nat Protoc 15(7):2247-2276, 2020.
- T. Stuart and R. Satija. Integrative single-cell analysis. Nat Rev Genet 20:257-272, 2019.
- F. Avila Cobos et al. Benchmarking of cell type deconvolution pipelines for transcriptomics data. Nat Commun 11:5650, 2020.
- P. Ståhl et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353(6294):78-82, 2016.
- lecture notes
- Single Cell Omics (PDF, PPTX) (4/15, 4/20)
Advanced Topics in Bioinformatics
- topics: challenges for machine learning applications, spatial transcriptomics, imaging genetics, artificial intelligence in drug discovery
- reading
- M. Libbrecht and W.S. Noble. Machine learning applications in genetics and genomics. Nat Rev Genet 16:321-332, 2015.
- P. Ståhl et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353(6294):78-82, 2016.
- L. Elliott et al. Genome-wide association studies of brain imaging phenotypes in UK Biobank. Nature 562:210-216, 2018.
- J. Vamathevan et al. Applications of machine learning in drug discovery and development. Nat Rev Drug Discov 18:463-477, 2019.
- R. Roscher et al. Explainable Machine Learning for Scientific Insights and Discoveries. IEEE Access 8:42200-42216, 2020.
- lecture notes
- Advanced Topics in Bioinformatics (PDF, PPTX) (4/22)
Lecture Notes
Thank you to Professors Mark Craven, Tony Gitter, and Colin Dewey for providing lecture material. These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Mark Craven, Colin Dewey, Anthony Gitter, and Daifeng Wang.