BMI/CS 776 Course Project

Incomplete List of Project Suggestions

Design and evaluate a motif-finding method that takes into account dependencies between non-adjacent positions.
Implement and empirically compare motif-finding methods that vary in the types of dependencies they can represent (e.g. zeroth- and first-order Markov chains, Bayes nets, MDD).
Design, implement and evaluate an algorithm for identifying cis-regulatory modules (arrangements of binding site motifs that regulate a set of genes under certain conditions).
Compare the time to convergence and the resulting accuracy when EM and Gibbs sampling are used in a model with hidden state. The model could be a MEME-style motif model, another type of hidden Markov model, a stochastic context-free grammar, a mixture model, etc.
Implement a method for gene finding that employs multiple genomes. Investigate how the accuracy of the predictions is affected by how closely related the informant genome is (e.g. you might use mouse, zebrafish, and fruit fly as the informant genomes).
Implement and compare generative and discriminative probabilistic methods for a given task, such as gene finding.
Randomize a traditional filter for finding highly similar local regions in sequences. Use an algorithm based on this randomization to efficiently find all high-scoring local alignments in a set of sequences.
Implement and experiment with an SCFG-based approach for identifying RNA genes via cross-genome comparisons.
Extend the method of Bockhorst and Craven for refining the structure of a context-free grammar. Devise a new operator and an appropriate heuristic for applying it. Evaluate the method using a terminator data set.
Design an algorithm for the alignment of protein networks.
Implement and experiment with the module network approach of Segal et al.
Design, implement and evaluate a method that clusters genes using multiple sources of evidence, such as gene-expression data and text associated with the genes.
Design, implement and evaluate an algorithm for discovering themes in a set of scientific articles.
Devise a grammar for some type of biological named entity (e.g. gene/protein names). Implement and evaluate a model, based on the grammar, for recognizing entities of this type in scientific articles.
Design, implement and evaluate a method for classifying scientific articles according to their relevance to specific categories (e.g. whether they discuss tumor biology or not).