Computational Molecular Biology

Computer Science 88/188
Winter, 2006

MWF 1:45-2:50
Place: 213 Sudikoff
Professor Bruce Randall Donald
113 Sudikoff Lab, x6-3173
www.cs.dartmouth.edu/~brd/Teaching/Bio/current


Overview | Schedule | Bibliography | Some Relevant WWW Links
How to give a good talk | Projects | Reports | Grading



Overview

Some of the most challenging and influential opportunities for computer science arise in developing and applying information technology to understand the molecular machinery of the cell. Recent work shows that many algorithmic techniques may be fruitfully applied to the challenges of computational molecular biology. This research may lead to computer systems and algorithms that are useful in structural molecular biology, proteomics, and rational drug design.

Concomitantly, a wealth of interesting computational problems arises in proposed methods for discovering new pharmaceuticals. Among these problems are: identifying the low-energy conformations of molecules, interpreting protein NMR (nuclear magnetic resonance) and X-ray data, inferring constraints on the shape of active drug molecules based on measurements of activity of related drug molecules, and docking candidate drug molecules to known protein targets.

This seminar is open to graduate students, and advanced undergraduates with a background in both algorithms and systems (at least CS 25 and CS 23). A background in biology is useful but not required. Students should be interested in doing some outside reading in biochemistry and biophysics. Students will be required to present papers in the seminar, and to do a project. Non-CS students (e.g., in biology and chemistry) with an interest in computational issues are invited as well; please speak with me about your background first though.

If you took my previous CS-Bio seminar, I estimate that the papers we will read will have only about 20% overlap. I plan for us to read a largely different corpus, reading new papers.

Motivation

Computational biology is at the core of scientific computation: it both solves real biological problems and contributes back to computer science. We use and extend computational techniques including statistical methods, provable interleaving strategies, AI techniques, numerical methods, optimization, branch and bound algorithms, expectation/maximization, stochastic labeling and Markov random field paradigms. In this field, computational techniques are central, and the applications present intriguing problems to computer scientists who design algorithms and implement systems. We will develop both upper and lower bounds in the setting of novel algorithms for biophysical problems. For example, to quote Richard Karp, "The Celera whole-genome shotgun sequencing algorithm is an instance of a general approach to combinatorial problem solving in which constraints on the solution are enforced in an order determined by the strength of the evidence for them. Should this approach be studied within theoretical computer science?" (Keynote address, Computational Systems Bioinformatics Conference, 2003).

You may wish to read about research at Dartmouth in this area: http://www.cs.dartmouth.edu/~brd/Research/Bio/

You may wish to see a list of papers we have read in previous offerings of this course: http://www.cs.dartmouth.edu/~brd/Teaching/ and the 2002 course.


Here is the collection of all lecture notes.



How to Give a Good Talk

If you are scheduled to give a talk, I've prepared a set of hints for giving a good talk. Follow every atom of every letter of every word of advice in these rules.

Here is a list of ways to give a terrible talk, that you should read, and then avoid, evade, elude, shun, and eschew (avoid stresses forethought and caution in keeping clear of danger or difficulty; evade implies adroitness, ingenuity, or lack of scruple in escaping or avoiding; elude implies a slippery or baffling quality in the person or thing that escapes; shun often implies an avoiding as a matter of habitual practice or policy and may imply repugnance or abhorrence; eschew implies an avoiding or abstaining from as unwise or distasteful).

For your talk, make slides (either by hand or electronically). Do not use the board during your talk. The reason is that all students can learn, in the course of this class, to give a good talk using slides. To give a good talk using the board -- that is, to teach board technique -- is much more difficult, and beyond the scope of this course. Do not go back and forth between your slides and the board during your talk. If there's something that you need to explain that you plan to use the board for -- don't! Instead, put this material on your slides!

The one exception is that if you get a question from the audience, you may use the board to answer it. However, you should try to anticipate ahead of time the questions you expect from the audience, and make extra slides to answer them, to have just in case.

In your talk, try to go into some technical detail. Your goal should be to show the class something technical -- to teach them something concrete and technical rather than skimming over everything. You want to go into some depth -- describe at least two algorithms in detail, show two theorems in detail, etc. Applications are good, but only if you've covered something technical -- an algorithm, a theorem, etc. -- first.

Be prepared for your talk. If there are things that you don't understand, you need to read more papers on the subject to fill in the holes. This class is not just about reading the assigned papers -- you need to do some background reading if there are things that you don't understand. One strategy is to search for related papers that answer the question, or to back-chain from the references in the papers you are assigned. The mind-set to have is: pretend this is research. If there is something you don't understand, you cannot just say 'I don't know.' You have to do some research -- i.e., reading and thinking -- to figure it out, just as you would for your thesis!

Occasionally students want to include a figure from the PDF of the paper in their talk. This is okay -- so long as it is not overdone -- but if you do this be sure to use the "snapshot tool" in Adobe Acrobat. Before using the snapshot tool, increase the size of the image to the maximum possible -- this will make sure that the resolution is sufficient so the image is not blurry in your presentation.

Under no circumstances should you use the "Mac Grab" feature available on a Macintosh -- the resulting PowerPoint will not be machine independent and will only work on a Mac.

Here is an example of how to overdo this business of grabbing images from the paper to use in your talk. Never make these errors!



Projects

Students will be required to do a project. Pick something in computational biology you are interested in, and (a) implement it, (b) analyze it, (c) improve it, (d) extend it, or (e) apply it.

A 4-5 page written project proposal is due on January 30.

Final projects are due on the last day of class. You must

  1. Turn in a written report,
  2. Make a web page about your project, and
  3. Prepare a short presentation for the class on what you did. Make slides for your presentation.



Reports

You may be assigned one or more reports to do during this class. This section discusses what is entailed in a report.

(Borrowed, with thanks, from Greg Gangor's description of the reviews used in his class at CMU). Your reports should:

We do not want a book report or a repeat of the paper's abstract. Rather, we want your considered opinions about the key points indicated above. Of course, if you have an insight that doesn't fit the above format, please include it as well. Your reports will be graded on content, not length. For most of the papers we read, one or two well thought-out paragraphs should be sufficient. You are, of course, welcome to write as much as you want.
If you were not assigned to do an in-class presentation, you must, in addition to the project, write a critique (report) on one of the papers we read. Your critique should be a detailed analysis of the methods presented, their flaws, strengths, and weaknesses. You should consider improvements and extensions in your critique. Reports should be about 10 pages single-spaced.

You must

  1. Turn in a written critique, and
  2. Make a web page about your critique.



Recommended Textbooks

Here is a list of recommended textbooks.

How to Exchange Files

We share a common file system so it is criminal to send enclosures. Never send enclosures for anything related to this course. If you have an account on the CS Unix Filesystems, send a pointer to the filename. Or, put it on the web and send the URL.



Grading

Grading: Grades will be based upon (a) your presentations in class, (b) your project, (c) class participation/discussion, and (d) assigned homeworks. If you are not giving a presentation, "(a)" will be graded based on your report.



Schedule and Readings

Each week, we will meet twice out of our three (M,W,F) slots. Which two days we use will vary. Some weeks we may meet three times. Please keep all three days open.

Please check this webpage and the schedule frequently, since I will post new papers, readings, and assignments as we proceed through the term.

Please note: These dates and times might move some (see "The Queue", below), as we adapt to the time required to discuss the papers, or if I am unexpectedly called to Washington, etc.

*Papers that are not available online (below) have been handed out on paper.

*RECOMB papers (Proceedings of the Nth Annual International Conference on Computational Molecular Biology (N=1,2,3,4,...)) are available online via the ACM Digital Library.

In case the links at ACM, PNAS, etc. are down, here is a local copy of many of the papers.

A few papers will be handed out in class. If you miss class, you can copy them from a classmate.

Announcements will be made in class. I will try to post them here, so consult this website.

Here is a useful bibliography of papers (and PDFs) in the area of this course.


Begin schedule
  • F 1/6 and M 1/9
    Presenting: Bruce.
    [Lecture Notes ]
    Proteins and NMR Structural Biology

    Reading:
  • W 1/11
    Presenting: Bruce Donald.
    [Lecture Notes ]
    Computational Protein Design
    Abstract


  • Date: F 1/13
    Presenting: Bruce
    [Lecture Notes ]
    Residual Dipolar Couplings (RDCs) in NMR Structural Biology

    Reading:
  • Date: W 1/18 and Th 1/19 (1pm)
    Presenting: Bruce
    [Lecture Notes ]
    Topic: Nuclear Vector Replacement

    Reading:
  • Date: M 1/23
    Presenting: Serkan
    [Slides (PPT) ] [Lecture Notes ]
    Topic: Protein Flexibility (1): FIRST and NMA Basics

    Reading:
  • Date: W 1/25
    Presenting: John MacMaster
    [Slides (PPT) ] [Lecture Notes ]
    Topic: Protein Flexibility (2): Loop Closure and Inverse Kinematics

    Reading:
  • Date: F 1/27
    Presenting: John Thomas
    [Slides (PPT) ] [Lecture Notes ]
    Topic: Protein Flexibility (3): Using FIRST to Explore Flexibility using ROCK (and Applications in Ligand-Protein Binding) and FRODA

    Reading:
  • Date: M 1/30
    Presenting: Fei
    [Slides (PPT) ] [Lecture Notes ]
    Topic: Protein Flexibility (4): Applications of NMA to Protein-Protein and Ligand-Protein Binding

    Reading:
  • Date: W 2/1
    Presenting: Jianyang (Michael)
    [Slides (PPT) ] [Lecture Notes ]
    Topic: Distance Geometry with Orientational Restraints

    Reading:
  • Date: F 2/3
    Presenting: Tony
    [The slides can be downloaded from the Unix file system: ~donaldclass/Bio/Slides06/yan/NOE.RDC.Presentation.07.class/]
    Topic: Solving the Structure Of Membrane Proteins

    Reading:
  • Date: M 2/6
    Presenting: Ivelin
    [Slides (PPT) ] [See also: ~donaldclass/Bio/Slides06/GraphCuts1.ppt] [Lecture Notes ]
    Topic: Graph Cuts for Nuclear Vector Replacement and Structure-Based NMR Assignment (1)

    Reading:
  • Date: W 2/8
    Presenting: Chittu
    [Slides (PPT) ] [See also: ~donaldclass/Bio/Slides06/graphCuts_chittu_v1.ppt] [Lecture Notes ]
    Topic: Graph Cuts for Nuclear Vector Replacement and Structure-Based NMR Assignment (2)

    Reading:
  • Date: F 2/10
    Presenting: Rahul
    [Slides (PPT) ] [Lecture Notes ]
    Topic: Protein Unfolding by Using Residual Dipolar Couplings

    Reading:
  • Date: M 2/13
    Presenting: Bruce
    Molecular Replacement, Protein Design, and Proteomics

    Reading:
    Queue

    What follows below is a queue of papers we will read next.


  • Date: W 2/15
    Presenting: Lincong
    [The slides can be downloaded from the Unix file system: ~donaldclass/Bio/Slides06/cs88_Protein-Ligand01_Lincong.ppt] [Lecture Notes ]
    Topic: Protein-Ligand Binding

    Reading:
  • Date: F 2/17
    Presenting: Lincong
    Topic: Protein-folding and Enzyme Dynamics

    Reading:
  • Date: W 2/20
    Presenting: Chittu
    Topic: More on Graph Cuts for Nuclear Vector Replacement and Structure-Based NMR Assignment (2)

    Reading:
  • Date: M 2/27
    Presenting: Igor
    [Slides (PPT) ] [See also: ~donaldclass/Bio/Slides06/CompBio2006_v2.ppt] [Lecture Notes ]
    Topic: Analyzing Protein Structure by Using Ensemble Representation

    Reading:
  • Date: W 3/1
    Presenting: Xiaoduan
    [Slides (PPT) ] [Lecture Notes ]
    Topic: NMR Resonance Assignment Assisted by Mass Spectrometry

    Reading:
  • Date: F 3/3
    Presenting: John Thomas
    [Slides (PPT) ] [Lecture Notes ]
    Topic: Automated NMR Resonance Assignment

    Reading:
  • Date: M 3/6
    Presenting: Tony
    [Slides (PPT) ] [Lecture Notes ]
    Topic: Enzyme Redesign by SVM

    Reading:
  • Date: W 3/8
    Presenting: John MacMaster
    [Slides (PPT) ] [Lecture Notes ]
    Topic: Flexible Ligand-Protein Docking

    Reading: Class projects are due today in class. You must turn in a hardcopy by 1:45 Weds 3/8.
  • Date: TBA
    Presenting: Fei
    Topic: Receptor Flexibility in Ligand Design and Docking

    Reading:
  • Date: TBA
    Presenting: Ivelin
    Topic: Computational Enzyme Design

    Reading:
  • Date: TBA
    Presenting: Chittu
    Topic: Protein-Protein Docking with Multiple Residue Conformations

    Reading:
  • Date: TBA
    Presenting: Xiaoduan
    Topic: Molecular Motions by Using Residual Dipolar and Hydrogen-Bond Scalar Couplings

    Reading:
  • Date: TBA
    Presenting: Jianyang (Michael)
    Topic: Statistical Coil Model of the Unfolded State

    Reading:
  • Date: TBA
    Presenting: John MacMaster
    Topic: Automated NMR Assignment and Structure Determination

    Reading:
    Papers that we read last year included the following. We will read different papers this year, but this is to give you an idea of the kind of papers we may read:

  • Date: TBA
    Medline (PubMed) Example
    Presenting: TBA
    RDCs, Dynamics and Ensembles

    [Slides]
    Reading:
  • Date: TBA
    Presenting: TBA
    [Slides]
    Protein Structure Determination using Residual Dipolar Couplings

    Reading:


  • Date: TBA
    Presenting: TBA
    [Slides] DNA Self-Assembly and Computation

    Reading:
  • Date: TBA
    Presenting: TBA
    [Slides]
    Distance Geometry

    Reading:
  • Date: W 10/27
    Presenting: Bruce
    Proteomics and Computational Structural Biology

    Reading:
  • Date: TBA
    Guest lecture: TBA
    Note unusual place: 006 Steele
    Chiral Mutagenesis of Insulin. Foldability and Function are Inversely Regulated by a Stereospecific Switch in the B Chain

    Michael Weiss, Professor and Chairman of the Biochemistry Department at the Case Western Reserve University Medical School, will be visiting next Thursday (10/28) and giving the Chemistry Department Colloquium. His talk is entitled: "Chiral Mutagenesis of Insulin. Foldability and Function are Inversely Regulated by a Stereospecific Switch in the B Chain". His research focuses on the structural biology of proteins and enzymes, and the regulation of gene expression. He is an expert in the use of high field NMR spectroscopy to address questions on protein structure and function. For more information, please see his CWRU website.
  • Date: TBA
    Presenting: TBA
    [Slides]
    Distance Geometry, continued (*).

    Reading:
  • Date: TBA
    Presenting: TBA
    Protein design

    [Slides]
    Reading:
  • Date: TBA
    Presenting: TBA
    Enzyme design

    [Slides]
    Reading:
  • Date: TBA
    Presenting: TBA
    Bayesian Assignment and Direct Methods for NMR

    [Slides]
    Reading:
  • Date: TBA
    Presenting: TBA
    [Slides]
    Rotating or Spinning Samples in order to Scale RDCs

    Reading:
  • Date: TBA
    Presenting: TBA
    Topic: Minimized DEE and using A* Search to approximate K*

    Main reading:
  • Date:
    No class: official Dartmouth holiday.

  • Date: TBA
    Presenting: TBA
    Class Project: MEMS and Nanotechnology Techniques for Aligning Proteins in Solution

    Reading:
  • Date: W 12/1
    Presenting: John Thomas, John MacMaster, Xiaoduan
    Last day of class.
    Class Projects



  • Date: TBA
    Presenting: TBA
    Topic

    Reading:

    Syllabus

    For an example of the kind of papers we will read, please see this page.

    Supplementary material and links

  • Some other papers you may read
    1. Here is a useful bibliography of papers (and PDFs) in the area of this course.

    2. Whitepaper on Advanced Computational Structural Genomics (read the long version, not the "lite" version).
    3. Fast detection of common substructure in proteins, P. Chew, K. Kedem, J. Kleinberg, and D. Huttenlocher (RECOMB'99).
    4. Rick Lathrop Lab
    Other papers we may read include:
    5. Date: TBA


    Some Relevant WWW Links

  • AMMP.
  • Read the white paper on Advanced Computational Structural Genomics
  • Computational biology research at Dartmouth.
  • Check out Donald Lab Papers at
  • RECOMB'99
  • Intelligent Systems in Molecular Biology (ISMB) (all meetings).
  • Dartmouth M.D.-Ph.D. Program
  • Web sites of interest to structural biologists.
  • A large resource page on computational biology at George Mason University.
  • A large resource page on bioinformatics at the Institut Pasteur.
  • CARB Biocomputing Resources.
  • A list of protein folding groups on the web.
  • The WWW Virtual Library page on biomolecules.
  • Donald Lab.
  • The Journal of Computer-Aided Molecular Design
  • Some resources and descriptions of problems in Computational Biology.


    Notes
    Related Resources on the World Wide Web

    General Notes

    Muscle-Specific Regulation of Transcription: A Catalog of Regulatory Elements by Laura L. López and James W. Fickett presents a summary of published information on muscle-specific transcriptional regulation.

    Pedro's BioMolecular Research Tools is a collection of WWW links to information and services useful to molecular biologists. It provides links to molecular biology search and analysis tools; bibliographic, text, and Web search services; guides and tutorials; and biological and biochemical journals and newsletters.

    The World Wide Web Virtual Library: Biosciences points to virtual library pages for Biomolecules, and Biochemistry and Molecular Biology. Each of these pages presents a long list of Web resources. The World Wide Web Virtual Library Biomolecules covers molecular sequence and structure databases, metabolic pathway databases, and other lists of Web resources. The World Wide Web Virtual Library: Biochemistry and Molecular Biology is a list of resources listed by provider.

    Cell & Molecular Biology Online is a well-organized list of Web resources for cell and molecular biologists. For each resource, a brief description is provided.

    CSUBIOWEB, the California State University Biological Sciences Web server, provides links to other Web sites on cell biology and molecular biology.

    The Dictionary of Cell Biology (London: Academic Press, 1995) defines transcription, leucine zipper, and other terms used in this research commentary.

    Biotech Life Science Dictionary is a free resource that defines terms in biochemistry, biotechnology, botany, cell biology, and genetics, including terms used in this research commentary.

    Protein Synthesis is a tutorial on the processes involved in Protein Synthesis, starting from the genetic information in DNA, through transcription to produce messenger RNA, and translation of mRNA to a polypeptide. This tutorial is a section of Principles of Protein Structure Using the Internet, a Birkbeck College (University of London) accredited Advanced Certificate course.

    Numbered Notes

    1. Reading the Messages in Genes describes transcription and provides a diagram. This page is a unit of Access Excellence, a national educational program sponsored by Genentech that provides high school biology teachers access to their colleagues, scientists, and critical sources of new scientific information via the Web.

    2. The MIT Biology Hypertextbook is a Web-based textbook developed for introductory biology courses at MIT. Central Dogma provides an illustrated description of the process of transcription.

    3. DNA binding proteins, enhancers, and the control of gene expression describes transcription and transcription factors. This page was developed by Ronald R. D. Croy as a component of Course Notes for Molecular Genetics I Lectures.

    4. Control of Gene Expression in Eukaryotes by Phillip McClean is a tutorial on gene regulation. The Transcription Complex provides a brief discussion of transcription factors.

    5. The Mechanisms of Gene Regulation are outlined in Microbial Genetics Lecture Notes, developed by L. S. Pierson III and C. Kennedy for a class at the University of Arizona.

    6. The Wolberger Lab lists publications of Cynthia Wolberger and her co-workers.

    7. Introduction to the Metazoa describes the metazoan phyla. This introduction is a chapter of The Phylogeny of Life, an online exhibit developed by the University of California Museum of Paleontology.

    8. Protein Zippers describes the leucine zipper and provides an illustration.

    9. Barbara Graves' research is described and selected publications are listed on the Huntsman Cancer Institute Web page at the University of Utah.

    Some Useful References for the Course

    Protein Science

    Biochemistry

    Cell Biology

    Hypertextbooks

  • BioComputing, for the VSNS-Biocomputing Division Course
  • Biology, developed by Shane Crotty, MIT
  • Course/Tutorial on Cell Biology, Mark Dalton, Cray Research
  • Principles of Biochemistry, Horton, Moran, Ochs, Rawn, Scrimgeour
