Syllabus, Readings and Lecture Notes
Course Overview
Motif and cis-Regulatory Module (CRM) Modeling
- topics: learning motif models, learning models of cis-regulatory modules, Gibbs sampling, Dirichlet priors, parameter tying, sequence entropy, mutual information
- required reading
- T. Bailey and C. Elkan. The value of prior knowledge in discovering motifs with MEME. In Proceedings of the 3rd International Conference on Intelligent Systems for Molecular Biology, pp. 21-29, 1995.
- C. Lawrence, S. Altschul, M. Boguski, J. Liu, A. Neuwald, and J. Wootton. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262:208-214, 1993.
- O. Elemento, N. Slonim, and S. Tavazoie. A universal framework for regulatory element discovery across all genomes and data types. Molecular Cell 28(2):337-350, 2007. (The supplemental materials contain key methodological details.)
- optional reading
- optional viewing
- lecture notes
- Learning Sequence Motif Models using EM (PDF, PPTX) (1/23, 1/28)
- Learning Sequence Motif Models and Gibbs Sampling (PDF, PPTX, Gamma example, Dirichlet example) (1/30)
- Inferring Models of cis-Regulatory Modules using Information Theory (PDF, PPTX) (2/4, 2/6)
Genotype Analysis
- topics: haplotype inference, genome-wide association studies (GWAS), quantitative trait loci (QTL) mapping, multiple hypothesis testing, convolutional neural networks, interpreting noncoding genetic variants
- required reading
- optional reading
- lecture notes
- Linking Genetic Variation to Phenotypes (PDF, PPTX) (2/11)
- GWAS, multiple testing correction and QTLs (PDF, PPTX) (2/13, 2/18)
- Interpreting noncoding variants (PDF, PPTX) (2/18, 2/20)
Epigenomics
- topics: epigenomic data types, DNase I hypersensitivity, Gaussian processes
- required reading
- R.I. Sherwood, T. Hashimoto, C.W. O'Donnell, S. Lewis, A.A. Barkal, J.P. van Hoff, V. Karun, T. Jaakkola, and D.K. Gifford. Discovery of directional and nondirectional pioneer transcription factors by modeling DNase profile magnitude and shape. Nat Biotechnol 32(2):171-178, 2014.
- J. Lever, M. Krzywinski, and N. Altman. Points of Significance: Classification evaluation. Nat Methods 13(8):603-604, 2016.
- optional reading
- lecture notes
Mass Spectrometry
- topics: peptide and protein identification with mass spectrometry
- required reading
- optional reading
- lecture notes
- Mass spectrometry (PDF, PPTX) (3/10)
Biological Network Analysis
- topics: protein interactions, pathway identification, linear programming, min cost flow
- required reading
- E. Yeger-Lotem, L. Riva, L.J. Su, A.D. Gitler, A.G. Cashikar, O.D. King, P.K. Auluck, M.L. Geddie, J.S. Valastyan, D.R. Karger, S. Lindquist, and E. Fraenkel. Bridging high-throughput genetic and transcriptional data reveals cellular responses to alpha-synuclein toxicity. Nat Genet 41(3):316-323, 2009.
- optional reading
- lecture notes
- Identifying signaling pathways (PDF, PPTX) (3/24, 3/26)
Machine Learning Modeling
- topics: unsupervised learning, partitioning vs. hierarchical clustering, classification, support vector machines
- required reading
- optional reading
- lecture notes
- Machine learning applications in Bioinformatics (PDF, PPTX) (3/31, 4/2)
RNA-seq and Transcript Assembly
- topics: RNA-seq technology, transcript quantification, alternative splicing, splice graphs, transcript assembly
- required reading
- optional reading
- C. Trapnell, B.A. Williams, G. Pertea, A. Mortazavi, G. Kwan, M.J. van Baren, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28:511-515, 2010.
- Z. Wang, M. Gerstein, and M. Snyder. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10(1): 57-63, 2009.
- A. Conesa, P. Madrigal, S. Tarazona, D. Gomez-Cabrero, A. Cervera, A. McPherson, M.W. Szcześniak, D.J. Gaffney, L.L. Elo, X. Zhang, and A. Mortazavi. A survey of best practices for RNA-seq data analysis. Genome Biology 17:13, 2016.
- M. Pertea, G.M. Pertea, C.M. Antonescu, T.-C. Chang, J.T. Mendell, and S.L. Salzberg. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol 33:290-295, 2015.
- lecture notes
- Transcript quantification and Analysis of alternative splicing with RNA-Seq (PDF, PPTX) (4/7, 4/9)
RNA Structure Analysis
- topics: predicting RNA secondary structure, dynamic programming, Nussinov/energy-minimization algorithms, stochastic context-free grammars
- required reading
- optional reading
- lecture notes
Gene Finding
- topics: gene finding, interpolated Markov models, generalized HMMs, pair HMMs
- required reading
- optional reading
- lecture notes
- Gene Finding (PDF, PPTX) (4/21, 4/23)
Large-Scale Sequence Alignment
- topics: large-scale alignment, suffix trees, k-mer tries, longest increasing subsequence problem, MUMmer
- required reading
- A. Delcher, S. Kasif, R. Fleischmann, J. Peterson, O. White, and S. Salzberg. Alignment of Whole Genomes. Nucleic Acids Research 27(11):2369-2376, 1999.
- M. Brudno, C. Do, G. Cooper, M. Kim, E. Davydov, NISC Comparative Sequencing Program, E. Green, A. Sidow, and S. Batzoglou. LAGAN and Multi-LAGAN: Efficient Tools for Large-Scale Multiple Alignment of Genomic DNA. Genome Research 13:721-731, 2003.
- optional reading
- lecture notes
- Alignment of Long Sequences (PDF, PPTX) (4/28, 4/30)
Lecture Notes
Thank you to Professors Mark Craven, Tony Gitter and Colin Dewey for providing lecture material. These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Mark Craven, Colin Dewey, Anthony Gitter and Daifeng Wang.