Computational Molecular Biology
Computer Science 88/188
Winter, 2006
Overview
"Strictly speaking, molecular biology is not a new discipline, but
rather a new way of looking at organisms as reservoirs and
transmitters of information. This new vision opened up possibilities
of action and intervention that were revealed during the growth of
genetic engineering."
- Michel Morange,
"A History of Molecular Biology," Harvard
University Press (1998).
Some of the most challenging and influential opportunities for
computer science arise in developing and applying information
technology to understand the molecular machinery of the cell. Recent
work shows that many algorithmic techniques may be fruitfully applied
to the challenges of computational molecular biology. This research
may lead to computer systems and algorithms that are useful in
structural molecular biology, proteomics, and rational drug design.
Concomitantly, a wealth of interesting computational problems arises in
proposed methods for discovering new pharmaceuticals. Among these
problems are: identifying the low-energy conformations of molecules,
interpreting protein NMR (nuclear magnetic resonance) and X-ray data,
inferring constraints on the shape of active drug molecules based on
measurements of activity of related drug molecules, and docking
candidate drug molecules to known protein targets.
This seminar is open to graduate students and advanced undergraduates
with a background in both algorithms and systems (at least CS 25 and
CS 23). A background in biology is useful but not required. Students
should be interested in doing some outside reading in biochemistry and
biophysics. Students will be required to present papers in the
seminar, and to do a project. Non-CS students (e.g., in biology and
chemistry) with an interest in computational issues are invited as
well, but please speak with me about your background first.
If you took my previous CS-Bio seminar, I estimate that the papers
we will read will overlap by only about 20%. I plan for us to read a
largely new corpus.
Motivation
Computational biology is at the core of scientific computation: it
both solves real biological problems and contributes back to computer
science. We use and extend computational techniques including
statistical methods, provable interleaving strategies, AI techniques,
numerical methods, optimization, branch and bound algorithms,
expectation/maximization, stochastic labeling and Markov random field
paradigms. In this field, computational techniques are central, and
the applications present intriguing problems to computer scientists
who design algorithms and implement systems. We will develop both
upper and lower bounds in the setting of novel algorithms for
biophysical problems. For example, to quote Richard Karp, "The Celera
whole-genome shotgun sequencing algorithm is an instance of a general
approach to combinatorial problem solving in which constraints on the
solution are enforced in an order determined by the strength of the
evidence for them. Should this approach be studied within theoretical
computer science?" (Keynote address, Computational Systems
Bioinformatics Conference, 2003).
You may wish to read about research at Dartmouth in this area:
http://www.cs.dartmouth.edu/~brd/Research/Bio/
You may wish to see a list of papers we have read in previous
offerings of this course:
http://www.cs.dartmouth.edu/~brd/Teaching/
and the 2002 course.
Here is the collection of all lecture notes.
How to Give a Good Talk
If you are scheduled to give a talk, I've prepared a set of hints for giving a
good talk. Follow every atom of every letter of every word of
advice in these rules.
Here is a list of
ways to give a terrible talk, that you should read, and then avoid,
evade, elude, shun, and eschew (avoid stresses forethought
and caution in keeping clear of danger or difficulty; evade
implies adroitness, ingenuity, or lack of scruple in escaping or
avoiding; elude implies a slippery or baffling quality in the
person or thing that escapes; shun often implies an avoiding
as a matter of habitual practice or policy and may imply repugnance or
abhorrence; eschew implies an avoiding or abstaining from as
unwise or distasteful).
For your talk, make slides (either by hand, or electronically).
Do not use the board during your talk. The reason for
this is that all students can learn, in the course of this class, to
give a good talk using slides. To give a good talk using the board --
that is, to teach board technique -- is much more difficult, and beyond
the scope of this course. Do not go back and forth between your slides
and the board during your talk. If there's something that you need to
explain that you plan to use the board for -- don't! Instead, put
this material on your slides!
The one exception is that if you get a question from the audience, you
may use the board to answer it. However, what you should do is try to
anticipate what questions you expect from the audience ahead of time, and
make extra slides to answer these, to have just in case.
In your talk, try to go into some technical detail. Your goal should
be to show the class something technical -- and teach them something
concrete and technical rather than skim over everything. You want
to go into some depth -- describe at least two algorithms in detail,
show two theorems in detail, etc. Applications are good, but only if
you've covered something technical -- an algorithm, a theorem etc. --
first.
Be prepared for your talk. If there are things that you don't
understand, you need to read more papers on the subject to fill in the
holes. This class is not just about reading the assigned papers -- you
need to do some background reading if there are things that you
don't understand. One strategy is to search for
related papers that answer the question, or back-chain from the
references in the papers you are assigned. The mind-set to have is:
pretend this is research. If there is something you don't understand,
you cannot just say 'I don't know.' You have to do some research --
i.e. reading and thinking -- to figure it out, just as you would for
your thesis!
Occasionally students want to include a figure from the PDF of the
paper in their talk. This is okay -- so long as it is not overdone --
but if you do this be sure to use the "snapshot tool" in Adobe
Acrobat. Before using the snapshot tool, increase the size of the
image to the maximum possible -- this will make sure that the
resolution is sufficient so the image is not blurry in your
presentation.
Under no circumstances should you use the "Mac Grab" feature available
on a Macintosh -- the resulting PowerPoint will not be machine
independent and will only work on a Mac.
Here is an example
of how to overdo this business of grabbing images from the paper to
use in your talk. Never make these errors!
Projects
Students will be required to do a project. Pick something in
computational biology you are interested in, and (a) implement it, (b)
analyze it, (c) improve it, (d) extend it, or (e) apply it.
A 4-5 page written project proposal is due on January 30.
Final projects are due on the last day of class. You must
1. Turn in a written report,
2. Make a web page about your project, and
3. Prepare a short presentation for the class on what you did. Make
slides for your presentation.
Notes:
- (1) and (2) can be the same document.
- Put your webpage in the following place. If your username is
"erdmann", then put it at
"http://www.cs.dartmouth.edu/~erdmann/cs104/index.html".
- Your final report can be in html or PDF from pdflatex. If you
want to use another format, ask me first.
- I suggest your final report (and slides) should contain
illustrative pictures and figures.
- If you wrote code, I would like to see it. Please include the
code with your writeup, and link to it from your webpage.
- Some students will want to do projects close to their thesis
area. If your thesis area is not molecular or structural
biology, there is a danger here:
- If your thesis area is not molecular or structural
biology, then to make sure your project proposal is acceptable, make
sure that what you *mostly* write about in the proposal/project are
the computational biology algorithms you invent, use, and implement,
and how they worked. What we don't want is a really long description
that's 90 percent about your (non-biology) research area and only 10
percent about the important stuff: the computational biology
algorithms, how they work, what you did that is innovative, etc. It
should be more like 5% background vs. 95% computational biology
algorithms and systems.
- It is important that this project exercise the kind of techniques
we're studying in this course. I would not want to see a project that
was essentially and exclusively on your (non-biological) thesis, that
did not use and explore algorithms from computational biology and
chemistry with some extensiveness.
- If your thesis is on a topic in computational molecular
biology, then I expect that your project would extend or innovate in
some way at an appropriate scale for one term -- for example I don't
want a project that is simply your last paper, written up for this
class. However, the project could be on your next paper -- and in the
past, several class projects for this class have turned into papers
that were published at prestigious conferences and journals in
computational biology and chemistry!!
Reports
You may be assigned one or more reports to do during this class. This
section discusses what is entailed in a report.
(Borrowed, with thanks, from Greg Ganger's description of the reviews used
in his class at CMU).
Your reports should:
- State at least three important things the paper says. These could
be some combination of their motivations, observations, interesting
parts of the design, or clever parts of their implementation.
- Describe at least one deficiency in the paper. Every paper has
some fault. Perhaps an experiment was poorly designed or the main idea
had a narrow scope or applicability. Being able to assess weaknesses
as well as strengths is an important skill for this course and beyond.
- Describe what conclusion(s) you draw from the paper as to
how to build and analyze computational biology algorithms and systems
in the future. Most of the assigned papers have been significant
to the computational biology and/or computer science community and
have had some lasting impact on the area.
We do not want a book report or a repeat of the paper's abstract. Rather,
we want your considered opinions about the key points indicated above. Of
course, if you have an insight that doesn't fit the above format,
please include it as well.
Your reports will be graded on content, not length. For most of the papers
we read, one or two well thought-out paragraphs should be sufficient.
You are, of course, welcome to write as much as you want.
If you were not assigned to do an in-class presentation, you must, in
addition to the project, write a critique (report) on one of the
papers we read. Your critique should be a detailed analysis of the
methods presented, their flaws, strengths, and weaknesses. You should
consider improvements and extensions in your critique. Reports should
be about 10 pages single-spaced.
You must
1. Turn in a written critique, and
2. Make a web page about your critique.
Notes:
- (1) and (2) can be the same document.
- Email me the URL for the webpage for your critique. E.g.,
"http://www.cs.dartmouth.edu/~hood/compbio/critique.html".
- Your critique can be in PDF, html, PostScript from LaTeX, or
PDF from LaTeX. If you want to use another format, ask me first.
Recommended Textbooks
Here is a list of recommended textbooks.
How to Exchange Files
We share a common file system so it is criminal to send
enclosures. Never send enclosures for anything related to this
course. If you have an account on the CS Unix Filesystems, send a
pointer to the filename. Or, put it on the web and send the URL.
Grading
Grades will be based upon (a) your presentations in class,
(b) your project, (c) class
participation/discussion, and (d) assigned homework. If you are not
giving a presentation, "(a)" will be graded based on your report.
Schedule and Readings
Each week, we will meet twice out of our three (M,W,F)
slots. Which two days we use will vary. Some weeks we may meet three
times. Please keep all three days open.
Please check this webpage and the schedule frequently, since I will
post new papers, readings, and assignments as we
proceed through the term.
Please note: These dates and times might move some (see "The Queue", below), as we adapt to the time
required to discuss the papers, or if I am unexpectedly called to
Washington, etc.
The Queue
Student presentations will proceed in a strict rotation, ordered as a
queue. The queue order is:
We will not assign exact dates to presentations but only an order in
which the papers will be presented. This means that if you're
planning ahead, your presentation might be moved to the next class, if
our discussion takes longer. It will not be possible to plan to give
your presentation on a precise day for this reason. However, the
order of the presentations should be relatively stable, and, in
general you will not be asked to present earlier than the order
dictated by the queue. Moreover, in general, the paper you are
presenting will be determined well ahead of time so you can prepare.
*Papers that are not available online (below) have been handed out
on paper.
*RECOMB papers (Proceedings of the Nth Annual
International Conference on Computational Molecular Biology
(N=1,2,3,4,...))
are available online via the
ACM Digital Library.
In case the links at ACM, PNAS, etc. are down, here is a local copy of many of the
papers.
A few papers will be handed out in class. If you miss class, you can
copy them from a classmate.
Announcements will be made in class. I will try to post them here, so
consult this website.
Here is a useful bibliography of
papers (and PDFs) in the area of this course.
Begin schedule
Date: F 1/6 and M 1/9
Presenting: Bruce.
[Lecture Notes ]
Proteins and NMR Structural Biology
Reading:
- Background Reading:
- Cavanagh et al, chapter 8.
- Reference: Protein NMR Spectroscopy: Principles and Practice by John
Cavanagh, Arthur G. Palmer III, Wayne Fairbrother (Contributor), Nick Skelton
(Contributor). Hardcover, 587 pages (April 1996), Academic Press; ISBN:
0121644901
- Refer to Wüthrich as needed for reference
- Reference: NMR of Proteins and Nucleic Acids by Kurt Wüthrich. Hardcover, 320
pages (September 1986), John Wiley & Sons; ISBN: 0471828939
- Online Tutorials, Notes, and References on NMR
Date: W 1/11
Presenting: Bruce Donald.
[Lecture Notes ]
Computational Protein Design
Abstract
- Computational approaches to Protein Design:
- Dead-end elimination
- Dynamic programming
- Branch & Bound
- Energy minimization
- Parallelization
Date: F 1/13
Presenting: Bruce
[Lecture Notes ]
Residual Dipolar Couplings (RDCs) in NMR Structural Biology
Reading:
- Martin Blackledge, Dipolar Couplings in Partially Aligned Macromolecules - New Directions in Structure Determination using Solution State NMR. EMBO practical course 2003: Structure determination of biological macromolecules by solution NMR. PDF
- Residual Dipolar Couplings in Structure Determination of
Biomolecules,
J. H. Prestegard et al, Chem Rev 2004.
[PDF]
- C. Langmead and B. R. Donald. An expectation/maximization
nuclear vector replacement algorithm for automated NMR resonance
assignments. Jour. Biomolecular NMR, 29(2):111-138, 2004. [PDF]
- Annual Review of Biophysics and Biomolecular Structure, Vol. 33:
387-413 (June 2004) (doi:10.1146/annurev.biophys.33.110502.140306)
Residual Dipolar Couplings In NMR Structure Analysis, Rebecca
S. Lipsitz and Nico Tjandra. PDF
- Losonczi, J. A., Andrec, Michael, Fischer, Mark, Prestegard,
James H. Order Matrix Analysis of Residual Dipolar Couplings Using
Singular Value Decomposition. Journal of Magnetic Resonance, Vol. 138,
1999: 334-342. PDF
- Here is a wonderful textbook
that covers the Singular Value Decomposition (SVD), Penrose
pseudo-inverse, and other useful numerical methods.
- L. Wang and B. R. Donald.
Exact solutions for internuclear vectors and backbone dihedral angles
from NH residual dipolar couplings in two media, and their application in a
systematic search algorithm for determining protein backbone structure.
Jour. Biomolecular NMR, 29(3):223-242, 2004.
[PDF]
Date: W 1/18 and Th 1/19 (1pm)
Presenting: Bruce
[Lecture Notes ]
Topic: Nuclear Vector Replacement
Reading:
- An Expectation/Maximization Nuclear Vector Replacement Algorithm for Automated NMR Resonance Assignments. Journal of Biomolecular NMR 2004; 29(2):111-138.
PDF
- 3D Structural Homology Detection via Unassigned Residual Dipolar
Couplings, (with C. Langmead) Proc. IEEE Computational Systems
Bioinformatics Conference (CSB), Stanford University, Palo Alto
(August 10, 2003) pp. 209-217. ISBN 0-7695-2000-6.
PDF
- High-Throughput 3D Structural Homology Detection via NMR
Resonance Assignment. The IEEE Computational Systems Bioinformatics
Conference (CSB), Stanford CA, (August, 2004) pp. 278-289.
PDF
Date: M 1/23
Presenting: Serkan
[Slides (PPT) ]
[Lecture Notes ]
Topic: Protein Flexibility (1): FIRST and NMA Basics
Reading:
- D.J. Jacobs, A.J.Rader, L.A. Kuhn, and M.F. Thorpe. Protein Flexibility Predictions Using Graph Theory. Proteins: Structure, Function, and Genetics 2001; 44:150-165.
PDF
- K. Hinsen. Normal Mode Theory and Harmonic Potential Approximation.
PDF
Date: W 1/25
Presenting: John MacMaster
[Slides (PPT) ] [Lecture Notes ]
Topic: Protein Flexibility (2): Loop Closure and Inverse Kinematics
Reading:
- R. Singh, B. Berger. ChainTweak: Sampling from the Neighbourhood of a Protein Conformation. Pacific Symposium on Biocomputing 2005: 54-65. PDF
- K. Noonan, D. O'Brien, and J. Snoeyink. Probik: Protein Backbone Motion by Inverse Kinematics. The International Journal of Robotics Research 2005; 24(11): 971 - 982. PDF
- http://www4.cs.umanitoba.ca/~jacky/Teaching/Courses/74.795-Humanoid-Robotics/ReadingList/chap3-forward-kinematics.pdf PDF
Date: F 1/27
Presenting: John Thomas
[Slides (PPT) ]
[Lecture Notes ]
Topic: Protein Flexibility (3): Using FIRST to Explore Flexibility using ROCK (and Applications in
Ligand-Protein Binding) and FRODA
Reading:
- M.I. Zavodszky, M. Lei, M.F. Thorpe, A.R. Day, and L.A. Kuhn. Modeling Correlated Main-Chain Motions in Proteins for Flexible Molecular Recognition. Proteins: Structure, Function, and Bioinformatics 2004; 57(2):243-261. PDF
- S. Wells, S. Menor, B. Hespenheide, and M.F. Thorpe. Constrained Geometric Simulation of Diffusive Motion in Proteins. Phys. Biol. 2 (2005) S127-S136. PDF
Date: M 1/30
Presenting: Fei
[Slides (PPT) ]
[Lecture Notes ]
Topic: Protein Flexibility (4): Applications of NMA to Protein-Protein and Ligand-Protein Binding
Reading:
- D. Tobi and I. Bahar. Structural Changes Involved in Protein Binding Correlate with Intrinsic Motions of Proteins in the Unbound State. PNAS 2005; 102(52):18908-18913. PDF
- C.N. Cavasotto, J.A. Kovacs, R.A. Abagyan. Representing Receptor Flexibility in Ligand Docking through Relevant Normal Modes. J Am Chem Soc. 2005 Jul 6;127(26):9632-40. PDF
Date: W 2/1
Presenting: Jianyang (Michael)
[Slides (PPT) ]
[Lecture Notes ]
Topic: Distance Geometry with
Orientational Restraints
Reading:
- M. Badoiu, E.D. Demaine, M.T. Hajiaghayi, and P. Indyk. Low-Dimensional Embedding with Extra Information. Proceedings of the Twentieth Annual Symposium on Computational Geometry, 2004. PDF
- J.B. Saxe. Embeddability of weighted graphs in k-space is strongly NP-hard. Proceedings of the 17th Allerton Conference on Communications, Control, and Computing, pages 480-489, 1979. PDF
- L. Wang and B. R. Donald. Exact solutions for internuclear vectors and backbone dihedral angles from NH residual dipolar couplings in two media, and their application in a systematic search algorithm for determining protein backbone structure.
Jour. Biomolecular NMR, 29(3):223-242, 2004. [PDF]
- Bernard Chazelle, Carl Kingsford, Mona Singh: A Semidefinite Programming Approach to Side Chain Positioning with New Rounding Strategies. INFORMS Journal on Computing 16(4): 380-392 (2004). PDF
- P. Biswas, T.-C. Liang, T.C. Wang and Y. Ye. Semidefinite Programming for Ad Hoc Wireless Sensor Network Localization. Appeared in IPSN 2004, to appear in ACM Transactions on Sensor Networks (2006). PDF
Date: F 2/3
Presenting: Tony
[The slides can be downloaded from the Unix file system: ~donaldclass/Bio/Slides06/yan/NOE.RDC.Presentation.07.class/]
Topic: Solving the Structure Of Membrane Proteins
Reading:
- S. Potluri, A.K. Yan, J.J. Chou, B.R. Donald, and C. Bailey-Kellogg. Structure Determination of Symmetric Homo-oligomers by a Complete Search of Symmetry Configuration Space using NMR Restraints and van der Waals Packing. Submitted. [The draft can be downloaded from the Unix file system: ~donaldclass/Bio/Papers/yan/]
- J.A. Losonczi, M. Andrec, M.W. Fischer, J. H. Prestegard. Order matrix analysis of residual dipolar couplings using singular value decomposition. J Magn Reson. 1999 Jun;138(2):334-42. PDF
- A.K. Yan and B.R. Donald. Symmetry, Goniometers, and RDC's. Pre-print. [The draft can be downloaded from the Unix file system: ~donaldclass/Bio/Papers/yan/]
- L. Wang and B.R. Donald. Exact Solutions for Internuclear Vectors and Backbone Dihedral Angles from NH Residual Dipolar Couplings in Two Media, and Their Application in a Systematic Search Algorithm for Determining Protein Backbone Structure. Journal of Biomolecular NMR 2004; 29(3):223-242. PDF
- Figure 1 from [Exact solutions for chemical bond orientations from residual dipolar
couplings. William J. Wedemeyer, Carol A. Rohl, Harold A. Scheraga. Journal of Biomolecular NMR, 22: 137-151, 2002.] PDF
Date: M 2/6
Presenting: Ivelin
[Slides (PPT) ]
[See also: ~donaldclass/Bio/Slides06/GraphCuts1.ppt]
[Lecture Notes ]
Topic: Graph Cuts for
Nuclear Vector Replacement and Structure-Based NMR Assignment (1)
Reading:
- Computing Visual Correspondence with Occlusions using Graph Cuts (Kolmogorov and Zabih, ICCV '01) PDF. See also: http://www.cs.cornell.edu/~rdz/graphcuts.html
- Markov Random Fields with Efficient Approximations (Boykov, Veksler and Zabih, CVPR '98) PDF.
- An Expectation/Maximization Nuclear Vector Replacement Algorithm for Automated NMR Resonance Assignments. Journal of Biomolecular NMR 2004; 29(2):111-138.
PDF
Date: W 2/8
Presenting: Chittu
[Slides (PPT) ]
[See also: ~donaldclass/Bio/Slides06/graphCuts_chittu_v1.ppt]
[Lecture Notes ]
Topic: Graph Cuts for
Nuclear Vector Replacement and Structure-Based NMR Assignment (2)
Reading:
- Spatially Coherent Matching and Bayesian Recognition (Boykov and Huttenlocher, CVPR '99) PDF. See also: http://www.cs.cornell.edu/~rdz/graphcuts.html
- Spatially Coherent Clustering with Graph Cuts (Zabih and Kolmogorov, CVPR '04)
PDF.
- What Energy Functions can be Minimized via Graph Cuts? (Kolmogorov
and Zabih, ECCV '02/PAMI '04) PDF.
- An Expectation/Maximization Nuclear Vector Replacement Algorithm for Automated NMR Resonance Assignments. Journal of Biomolecular NMR 2004; 29(2):111-138.
PDF
Date: F 2/10
Presenting: Rahul
[Slides (PPT) ]
[Lecture Notes ]
Topic: Protein Unfolding by Using Residual Dipolar Couplings
Reading:
- Bernado P, Bertoncini CW, Griesinger C, Zweckstetter M, Blackledge M. Defining Long-Range Order and Local Disorder in Native alpha-Synuclein Using
Residual Dipolar Couplings. J Am Chem Soc. 2005 Dec 28;127(51):17968-17969. PDF
- Bernado P, Blanchard L, Timmins P, Marion D, Ruigrok RW, Blackledge M. A structural model for unfolded proteins from residual dipolar couplings and
small-angle x-ray scattering. Proc Natl Acad Sci U S A. 2005 Nov 22;102(47):17002-7. Epub 2005 Nov 11. PDF
- Jean-Christophe Hus, Dominique Marion, and Martin Blackledge. Determination of Protein Backbone Structure Using Only Residual Dipolar Couplings. J. Am. Chem. Soc. 123, 1541-1542, 2001. PDF
Date: M 2/13
Presenting: Bruce
Molecular Replacement, Protein Design, and
Proteomics
Reading:
- How do we determine homo- or hetero-oligomeric
protein crystal structures using
X-ray diffraction?
An Introduction to Molecular Replacement with Non-Crystallographic Symmetry.
A Subgroup Algorithm to Identify Cross-Rotation Peaks Consistent
with Non-Crystallographic Symmetry. Dartmouth Computer Science
Department, Acta Crystallographica D: Biological Crystallography 2004;
D60, 1057-1067. [PDF]
- How do we redesign enzymes to have novel function?
A Novel Ensemble-Based Scoring and Search Algorithm for Protein
Redesign, and its Application to Modify the Substrate Specificity of
the Gramicidin Synthetase A Phenylalanine Adenylation Enzyme.
R. Lilien, B. Stevens, A. Anderson, and B. R. Donald.
Journal of Computational Biology 2005; 12(6-7):740-761.
- How do we discover protein targets and biomarkers?
"Probabilistic Disease Classification of Expression-Dependent
Proteomic Data from Mass Spectrometry of Human Serum," Journal of Computational Biology,
10(6) 2003, pp. 925-946.
Queue
What follows below is a queue of papers we will read next.
Date: W 2/15
Presenting: Lincong
[The slides can be downloaded from the Unix file system: ~donaldclass/Bio/Slides06/cs88_Protein-Ligand01_Lincong.ppt]
[Lecture Notes ]
Topic: Protein-Ligand Binding
Reading:
- Davis AM, Teague SJ, Kleywegt GJ. Application and limitations of X-ray crystallographic data in structure-based ligand and drug design. Angewandte Chemie International Edition, Jun 23;42(24):2718-36, 2003. PDF
- Erickson JA, Jalaie M, Robertson DH, Lewis RA, Vieth M. Lessons in molecular recognition: the effects of ligand and protein flexibility on molecular docking accuracy. J. Med. Chem., 47 (1), 45-55, 2004. PDF
- Halperin I, Ma B, Wolfson H, Nussinov R. Principles of docking: An overview of search algorithms and a guide to scoring functions. Proteins. 2002 Jun 1;47(4):409-43. PDF
- Additional reading:
- Claussen H, Buning C, Rarey M, Lengauer T. FlexE: efficient molecular docking considering protein structure variations. J Mol Biol. 2001 Apr 27;308(2):377-95. PDF
- Knegtel RM, Kuntz ID, Oshiro CM. Molecular docking to ensembles of protein structures. J Mol Biol. 1997 Feb 21;266(2):424-40. PDF
- Taylor RD, Jewsbury PJ, Essex JW. FDS: flexible ligand and receptor docking with a continuum solvent model and soft-core energy function. J Comput Chem. 2003 Oct;24(13):1637-56. PDF
- McGann MR, Almond HR, Nicholls A, Grant JA, Brown FK. Gaussian docking functions. Biopolymers. 2003 Jan;68(1):76-90. PDF
- Peters KP, Fauck J, Frommel C. The automatic search for ligand binding sites in proteins of known three-dimensional structure using only geometric criteria. J Mol Biol. 1996 Feb 16;256(1):201-13. PDF
- Jones G, Willett P, Glen RC, Leach AR, Taylor R. Development and validation of a genetic algorithm for flexible docking. J Mol Biol. 1997 Apr 4;267(3):727-48. PDF
- Simonson T, Archontis G, Karplus M. Free energy simulations come of age: protein-ligand recognition. Acc Chem Res. 2002 Jun;35(6):430-7. PDF
- Mangoni M, Roccatano D, Di Nola A. Docking of flexible ligands to flexible receptors in solution by molecular dynamics simulation. Proteins. 1999 May 1;35(2):153-62. PDF
- Verkhivker GM, Bouzida D, Gehlhaar DK, Rejto PA, Arthurs S, Colson AB, Freer ST, Larson V, Luty BA, Marrone T, Rose PW. Deciphering common failures in molecular docking of ligand-protein complexes. J Comput Aided Mol Des. 2000 Nov;14(8):731-51. PDF
- Shoichet BK, Leach AR, Kuntz ID. Ligand solvation in molecular docking. Proteins. 1999 Jan 1;34(1):4-16. PDF
Date: F 2/17
Presenting: Lincong
Topic: Protein-folding and Enzyme Dynamics
Reading:
- L. Wang, and B.R. Donald. The Conformation Ensemble of Protein in the Denatured State. Pre-print. [The draft can be downloaded from the Unix file system: ~donaldclass/Bio/Papers/]
- L. Wang, Y. Pang, T. Holder, J. R. Brender, A. V. Kurochkin and E. R. P. Zuiderweg (2001) Functional Dynamics in the Active Site of the Ribonuclease Binase. Proceedings of the National Academy of Sciences, USA, 98, 7684-7689. PDF
Date: W 2/20
Presenting: Chittu
Topic: More on Graph Cuts for
Nuclear Vector Replacement and Structure-Based NMR Assignment (2)
Reading:
- Spatially Coherent Matching and Bayesian Recognition (Boykov and Huttenlocher, CVPR '99) PDF. See also: http://www.cs.cornell.edu/~rdz/graphcuts.html
- Spatially Coherent Clustering with Graph Cuts (Zabih and Kolmogorov, CVPR '04)
PDF.
- What Energy Functions can be Minimized via Graph Cuts? (Kolmogorov
and Zabih, ECCV '02/PAMI '04) PDF.
- An Expectation/Maximization Nuclear Vector Replacement Algorithm for Automated NMR Resonance Assignments. Journal of Biomolecular NMR 2004; 29(2):111-138.
PDF
Date: M 2/27
Presenting: Igor
[Slides (PPT) ]
[See also: ~donaldclass/Bio/Slides06/CompBio2006_v2.ppt]
[Lecture Notes ]
Topic: Analyzing Protein Structure by Using Ensemble
Representation
Reading:
- Zagrovic B, Pande VS. How does averaging affect protein structure comparison on the ensemble level? Biophys J. 2004 Oct;87(4):2240-6.
PDF
- L. Wang, and B.R. Donald. The Conformation Ensemble of Protein in the Denatured State. Pre-print. [The draft can be downloaded from the Unix file system: ~donaldclass/Bio/Papers/]
Date: W 3/1
Presenting: Xiaoduan
[Slides (PPT) ]
[Lecture Notes ]
Topic: NMR Resonance Assignment Assisted by Mass Spectrometry
Reading:
- Feng L, Orlando R, Prestegard JH. Mass spectrometry assisted assignment of NMR resonances in 15N labeled proteins. J Am Chem Soc. 2004 Nov 10;126(44):14377-9.
PDF
- Megan A. Macnaughtan, Austin M. Kane, and James H. Prestegard. Mass Spectrometry Assisted Assignment of NMR Resonances in Reductively 13C-Methylated Proteins. J. Am. Chem. Soc., 127 (50), 17626-17627, 2005. PDF
Date: F 3/3
Presenting: John Thomas
[Slides (PPT) ]
[Lecture Notes ]
Topic: Automated NMR Resonance Assignment
Reading:
- Masse JE, Keller R. AutoLink: automated sequential resonance assignment of biopolymers from NMR data
by relative-hypothesis-prioritization-based simulated logic. J Magn Reson. 2005 May;174(1):133-51.
PDF
Date: M 3/6
Presenting: Tony
[Slides (PPT) ]
[Lecture Notes ]
Topic: Enzyme Redesign by SVM
Reading:
- Christian Rausch, Tilmann Weber, Oliver Kohlbacher, Wolfgang Wohlleben and Daniel H. Huson. Specificity prediction of adenylation domains in nonribosomal peptide synthetases (NRPS) using transductive support vector machines (TSVMs). Nucleic Acids Research 2005 33(18):5799-5808.
PDF
Date: W 3/8
Presenting: John MacMaster
[Slides (PPT) ]
[Lecture Notes ]
Topic: Flexible Ligand-Protein Docking
Reading:
- Murphy KP. Predicting binding energetics from structure: looking beyond DeltaG. Med Res Rev. 1999 Jul;19(4):333-9.
PDF
- Gervasio FL, Laio A, Parrinello M. Flexible docking in solution using metadynamics. J Am Chem Soc. 2005 Mar 2;127(8):2600-7.
PDF
Class projects are due today in class. You must
turn in a hardcopy by 1:45 Weds 3/8.
Date: TBA
Presenting: Fei
Topic: Receptor Flexibility in Ligand Design and Docking
Reading:
- Alberts IL, Todorov NP, Dean PM. Receptor flexibility in de novo ligand design and docking. J Med Chem. 2005 Oct 20;48(21):6585-96.
PDF
Date: TBA
Presenting: Ivelin
Topic: Computational Enzyme Design
Reading:
- Wilson C, Mace JE, Agard DA. Computational method for the design of enzymes with altered substrate
specificity. J Mol Biol. 1991 Jul 20;220(2):495-506. PDF
- Chakrabarti R, Klibanov AM, Friesner RA. Computational prediction of native protein ligand-binding and enzyme active site
sequences. Proc Natl Acad Sci U S A. 2005 Jul 19;102(29):10153-8. Epub 2005 Jul 5.
PDF
Date: TBA
Presenting: Chittu
Topic: Protein-Protein Docking with Multiple Residue Conformations
Reading:
- Lorber DM, Udo MK, Shoichet BK. Protein-protein docking with multiple residue conformations and residue
substitutions. Protein Sci. 2002 Jun;11(6):1393-408.
PDF
Date: TBA
Presenting: Xiaoduan
Topic: Molecular Motions by Using Residual Dipolar and
Hydrogen-Bond Scalar Couplings
Reading:
- Bouvignies G, Bernado P, Meier S, Cho K, Grzesiek S, Bruschweiler R, Blackledge
M. Identification of slow correlated motions in proteins using residual dipolar and
hydrogen-bond scalar couplings. Proc Natl Acad Sci U S A. 2005 Sep 19.
PDF
Date: TBA
Presenting: Jianyang (Michael)
Topic: Statistical Coil Model of the Unfolded State
Reading:
- Jha AK, Colubri A, Freed KF, Sosnick TR. Statistical coil model of the unfolded state: Resolving the reconciliation
problem. Proc Natl Acad Sci U S A. 2005 Aug 30.
PDF
Date: TBA
Presenting: John MacMaster
Topic: Automated NMR Assignment and Structure Determination
Reading:
- Grishaev A, Steren CA, Wu B, Pineda-Lucena A, Arrowsmith C, Llinas M. SABACUS, a direct method for protein NMR structure computation via assembly of
fragments. Proteins. 2005 Aug 3;61(1):36-43.
PDF
- Hamid R. Eghbalnia, Arash Bahrami, Liya Wang, Amir Assadi, and John L. Markley. Probabilistic Identification of Spin Systems and their Assignments including Coil-Helix Inference as Output (PISTACHIO). Journal of Biomolecular NMR, Volume 32, Number 3, Pages 219-233, July 2005.
PDF
Papers that we read last year included the following. We will read
different papers this year, but this list gives you an idea of the kind
of papers we may read:
Date: TBA
Medline
(PubMed) Example
Presenting: TBA
RDCs, Dynamics and Ensembles
[Slides]
Reading:
- 1. Reconstruction of interatomic vectors by principle component analysis
of nuclear magnetic resonance data in multiple alignments,
Jean-Christophe Hus and Rafael Bruschweiler,
The Journal of Chemical Physics Vol 117(3) pp. 1166-1172. July 15,
2002 [PDF]
-
Dynamic and Structural Analysis of Isotropically
Distributed Molecular Ensembles,
PROTEINS: Structure, Function, and Genetics 46:177-189 (2002),
Jeanine J. Prompers and Rafael Bruschweiler
[PDF]
- Journal of Computational Biology,
Volume 10, Numbers 3/4, 2003,
pp. 617-634. Understanding Protein Flexibility through Dimensionality
Reduction. Miguel L. Teodoro, George N. Phillips, Jr., and Lydia
E. Kavraki [PDF]
- Here is a wonderful textbook
that covers the Singular Value Decomposition (SVD), Penrose
pseudo-inverse, and other useful numerical methods.
Date: TBA
Presenting: TBA
[Slides]
Protein Structure Determination using Residual Dipolar Couplings
A 4-5 page written project proposal is due on January 30.
Reading:
-
L. Wang and B. R. Donald.
Exact solutions for internuclear vectors and backbone dihedral angles
from NH residual dipolar couplings in two media, and their application in a
systematic search algorithm for determining protein backbone structure.
Jour. Biomolecular NMR, 29(3):223-242, 2004.
[PDF]
Date: TBA
Presenting: TBA
[Slides]
DNA Self-Assembly and Computation
Reading:
-
C. Mao, T.H. LaBean, J.H. Reif, and N.C. Seeman, Logical Computation Using
Algorithmic Self-Assembly of DNA Triple-Crossover Molecules, Nature,
vol. 407, Sept. 28 2000, pp. 493-495. Erratum: Nature 408,
750 (2000). [PDF1]
[PDF2]
Date: TBA
Presenting: TBA
[Slides]
Distance Geometry
Reading:
- Journal of Global Optimization
22: 365-375, 2002.
A linear-time algorithm for solving the molecular
distance geometry problem with exact inter-atomic
distances.
PDF
- B. Hendrickson, "Conditions For Unique Graph Realizations." SIAM
Journal on Computing, Vol. 21, No. 1, February 1992, pp. 65-84.
- B. Hendrickson, "The Molecule Problem: Exploiting Structure in Global
Optimization." SIAM Journal on Optimization, Vol. 5, No. 4, November
1995, pp. 835-857.
Date: W 10/27
Presenting: Bruce
Proteomics and Computational Structural Biology
Reading:
- How do we discover protein targets and biomarkers?
"Probabilistic Disease Classification of Expression-Dependent
Proteomic Data from Mass Spectrometry of Human Serum," Journal of Computational Biology,
10(6) 2003, pp. 925-946.
- How do we determine protein crystal structures using X-ray diffraction?
A Subgroup Algorithm to Identify Cross-Rotation Peaks Consistent
with Non-Crystallographic Symmetry. Dartmouth Computer Science
Department, Acta Crystallographica D: Biological Crystallography 2004;
D60, 1057-1067. [PDF]
- How do we redesign enzymes to have novel function?
A Novel Ensemble-Based Scoring and Search Algorithm for Protein
Redesign, and its Application to Modify the Substrate Specificity of
the Gramicidin Synthetase A Phenylalanine Adenylation Enzyme.
R. Lilien, B. Stevens, A. Anderson, and B. R. Donald.
Journal of Computational Biology 2005; 12(6-7):740-761.
Date: TBA
Guest lecture: TBA
Note unusual place: 006 Steele
Chiral
Mutagenesis of Insulin. Foldability and Function are Inversely Regulated by a
Stereospecific Switch in the B Chain
Michael Weiss, Professor and Chairman of the Biochemistry Department at the
Case Western Reserve University Medical School, will be visiting next Thursday
(10/28) and giving the Chemistry Department Colloquium. His talk
is entitled: "Chiral
Mutagenesis of Insulin. Foldability and Function are Inversely Regulated by a
Stereospecific Switch in the B Chain". His
research focuses on the structural biology of proteins and enzymes, and the
regulation of gene expression. He is an expert in the use of high field NMR
spectroscopy to address questions on protein structure and function. For more
information, please see his CWRU website:
here.
Date: TBA
Presenting: TBA
[Slides]
Distance Geometry, continued (*).
Reading:
- Journal of Global Optimization
22: 365-375, 2002.
A linear-time algorithm for solving the molecular
distance geometry problem with exact inter-atomic
distances.
PDF
- B. Hendrickson, "Conditions For Unique Graph Realizations." SIAM
Journal on Computing, Vol. 21, No. 1, February 1992, pp. 65-84.
- B. Hendrickson, "The Molecule Problem: Exploiting Structure in Global
Optimization." SIAM Journal on Optimization, Vol. 5, No. 4, November
1995, pp. 835-857.
-
@inproceedings{Saxe,
author = "Saxe, J. B.",
title = "Embeddability of weighted graphs in $k$-space is strongly {NP}-hard",
booktitle = "Proceedings of the 17th Allerton Conference on Communications, Control, and Computing",
year = "1979",
pages = "480--489",
}
Below are the parts of Garey and Johnson's book that I found useful in
the context of this class.
1) Strong NP-completeness: pp. 95-107, chapter 5
2) Applying NP-completeness to Approximation Problems: pp. 137-148,
chapter 6
[ Computers and Intractability:
A Guide to the Theory of NP-Completeness
- Michael R. Garey & David S. Johnson
ISBN: 0-7167-1045-5]
However, it is worth reading through chapter 1 of the book once to become
familiar with its terminology. Chapters 5 and 6 are also
a good overall resource on NP issues for interested readers.
Date: TBA
Presenting: TBA
Protein design
[Slides]
Reading:
- Design of a Novel Globular
Protein Fold with
Atomic-Level Accuracy
Brian Kuhlman, Gautam Dantas, Gregory C. Ireton,
Gabriele Varani, Barry L. Stoddard, David Baker.
PDF
- Computational design of protein-protein interactions
Tanja Kortemme, and David Baker.
PDF
Date: TBA
Presenting: TBA
Enzyme design
[Slides]
Reading:
- J Mol Biol. 2001 Mar 16;307(1):429-45.
Generalized dead-end elimination algorithms make large-scale protein side-chain
structure prediction tractable: implications for protein design and structural
genomics.
Looger LL, Hellinga HW. PDF
- Computational Design of a
Biologically Active Enzyme
Mary A. Dwyer, Loren L. Looger, Homme W. Hellinga.
PDF
- See also:
Looger, Loren L., Dwyer, Mary A., Smith, James J., Hellinga, Homme
W. Computational design of receptor and sensor proteins with novel
functions. Nature, Vol. 423, May 8, 2003: 185-190. PDF
Date: TBA
Presenting: TBA
Bayesian Assignment and Direct Methods for NMR
[Slides]
Reading:
- J Biomol NMR. 2004 Jan;28(1):1-10.
BACUS: A Bayesian protocol for the identification of protein NOESY
spectra via unassigned spin systems. Grishaev A, Llinas M. PDF
- Grishaev, Alexander, Llinas, Miguel. CLOUDS, a protocol for
deriving a molecular proton density via NMR. PNAS, Vol. 99, No. 10,
May 14, 2002: 6707-6712. PDF
- Grishaev, Alexander, Llinas, Miguel. Protein structure
elucidation from NMR proton densities. PNAS, Vol. 99, No. 10, May 14,
2002: 6713-6718.
PDF
Date: TBA
Presenting: TBA
Slides
Rotating or Spinning Samples in order to Scale RDCs
Reading:
- 2004, Volume 29, Issue 3, J.Biomol.NMR.
Lancelot et al.
Measurement of Scaled Residual Dipolar Couplings in Proteins Using
Variable-angle Sample Spinning
PDF
- There are 12 more papers at
~brd/Bio/Papers/NMR/Residual-dipolar-coupling/Spinning/
on the unix file system.
Date: TBA
Presenting: TBA
Topic: Minimized DEE and using A* Search to approximate K*
Main reading:
Date:
No Class: official Dartmouth Holiday.
Date: TBA
Presenting: TBA
Class Project: MEMS and Nanotechnology Techniques for Aligning Proteins in
Solution
Reading:
- Plantenga, T.M. et al, "13C NMR
Molecules Partially Aligned by Electric Field: A New Method for
Determining the Orientation of the Dipole Moment", Chem. Phys. 48
(1980) 359-560.
- Gaemers S. and A. Bax, "Morphology of Three Lyotropic Liquid Crystalline
Biological NMR Media Studied by Translational Diffusion Anisotropy", J.
Am. Chem. Soc. 2001, 123, 12343-12352. PDF;
Supporting material
- Also review: Residual Dipolar Couplings in Structure Determination of
Biomolecules,
J. H. Prestegard et al, Chem Rev 2004.
[PDF]
Date: W 12/1
Presenting: John Thomas, John
MacMaster, Xiaoduan
Last day of class.
Class Projects
Date: TBA
Presenting: TBA
Topic
Reading:
Syllabus
For an example of the kind of papers we will read, please see this
page. If you took my previous CS-Bio seminar, I estimate
that the papers we will read will have only about 20% overlap. I plan
for us to read a largely different corpus, reading new papers.
Supplementary material and links
Some other papers you
may read
- Here is a useful bibliography of
papers (and PDFs) in the area of this course.
- Whitepaper on Advanced Computational
Structural Genomics (read the long version, not the "lite" version).
- Fast detection of
common substructure in proteins, P. Chew, K. Kedem, J. Kleinberg,
and D. Huttenlocher (RECOMB'99).
- Rick Lathrop Lab
Other papers we may read include:
- Date: TBA
Presenting: TBA
Protein Similarity
Reading:
Some Relevant
WWW Links
AMMP.
AMMP is a modern full-featured molecular mechanics, dynamics and
modeling program. It can manipulate both small molecules and
macromolecules including proteins, nucleic acids and other
polymers. In addition to standard features, like numerically stable
molecular dynamics, fast multipole method for including all atoms in
the calculation of long range potentials and robust structural
optimizers, it has a flexible choice of potentials and a simple yet
powerful ability to manipulate molecules and analyze individual energy
terms. One major advantage over many other programs is that it is easy
to introduce non-standard polymer linkages, unusual ligands or
non-standard residues. Adding missing hydrogen atoms and completing
partial structures, which are difficult for many programs, are
straightforward in AMMP.
Read the white paper on Advanced Computational
Structural Genomics
Computational biology research at Dartmouth.
Check out
Donald Lab Papers at
RECOMB'99
Intelligent
Systems in Molecular Biology (ISMB) (all meetings).
Dartmouth M.D.-Ph.D. Program
Web sites of interest
to structural biologists.
A
large resource page on computational biology at George Mason University.
A
large resource page on bioinformatics at the Institut Pasteur.
CARB Biocomputing
Resources.
A
list of protein folding groups on the web.
The
WWW Virtual Library page on biomolecules.
Donald Lab.
The Journal of Computer-Aided Molecular Design
Some
resources and descriptions of problems in Computational Biology.
Notes
Related Resources on the World Wide Web
General Notes
Muscle-Specific Regulation of Transcription: A Catalog of Regulatory
Elements by Laura L. López
and James W. Fickett presents a summary of published information on
muscle-specific transcriptional regulation.
Pedro's BioMolecular Research Tools
is a collection of
WWW links to information and services useful to molecular biologists. It
provides links to molecular biology search and analysis tools;
bibliographic, text, and Web search services; guides and tutorials; and
biological and biochemical journals and newsletters.
The World Wide Web Virtual Library: Biosciences
points to virtual library pages for
Biomolecules, and
Biochemistry and Molecular Biology. Each of these pages
presents a long list of Web resources. The World Wide Web Virtual Library
Biomolecules covers molecular sequence and structure databases,
metabolic pathway databases, and other lists of Web resources. The World
Wide Web Virtual Library: Biochemistry and Molecular Biology is a list of
resources listed by provider.
Cell & Molecular Biology Online is a
well-organized list of Web resources for cell and molecular biologists. For
each resource, a brief description is provided.
CSUBIOWEB, the California State
University Biological Sciences Web server, provides links to other Web sites
on cell biology and molecular biology.
The Dictionary of Cell Biology (London: Academic Press, 1995) defines transcription, leucine zipper, and
other terms used in this research commentary.
Biotech Life Science Dictionary is a free resource
that defines terms in biochemistry, biotechnology, botany, cell biology, and
genetics, including terms used in this research commentary.
Protein Synthesis is a tutorial on the processes involved in Protein Synthesis, starting from
the genetic information in DNA, through transcription to produce messenger
RNA, and translation of mRNA to a polypeptide. This tutorial is a section
of Principles of Protein Structure Using the Internet, a Birkbeck College
(University of London) accredited Advanced Certificate course.
Numbered Notes
Reading the Messages in Genes describes
transcription and provides a diagram. This page is a unit of Access
Excellence, a national educational
program sponsored by Genentech that provides high school biology
teachers access to their colleagues, scientists, and critical sources of new
scientific information via the Web.
The MIT Biology Hypertextbook is a Web-based textbook developed for introductory biology courses at MIT. Central
Dogma provides an
illustrated description of the process of transcription.
DNA binding proteins, enhancers, and the control of gene expression describes
transcription and transcription factors. This page was developed by Ronald
R. D. Croy as a component of Course Notes for Molecular Genetics I Lectures.
Control of Gene Expression in Eukaryotes
by Phillip McClean is a tutorial on gene regulation. The Transcription
Complex provides a brief discussion of transcription factors.
The Mechanisms of Gene Regulation are outlined in Microbial Genetics Lecture Notes, developed by L.
S. Pierson III and C. Kennedy for a class at the University of Arizona.
The Wolberger Lab lists
publications of Cynthia Wolberger and her co-workers.
Introduction to the Metazoa
describes the metazoan phyla. This introduction is a chapter of The
Phylogeny of Life, an
online exhibit developed by the University of California Museum of
Paleontology.
Protein Zippers describes the leucine zipper and provides an illustration.
Barbara Graves' research is described and selected publications are
listed on the Huntsman Cancer Institute Web page at
the University of Utah.
Some Useful References for the Course
Protein Science
- Introduction to Protein Structure, Branden, C. and Tooze, J. (1991) Garland
Publishing, New York
- Proteins, Creighton, T.E. (1993) 2nd edition, W.H. Freeman & Co., New York
- Principles of Protein Structure, Schulz, G.E. and Schirmer, R.H. (1979) Springer-Verlag, New York
- Protein Structure - New Approaches to Disease and Therapy, Perutz, M. (1992) W.H. Freeman & Co., New York
- Enzyme Structure and Mechanism, Fersht, A.R. (1976) 2nd ed., pub. W.H.Freeman & Co., New York
Biochemistry
- Biochemistry, Stryer, L., (1995) 4th edition, W.H. Freeman & Co., New York
- Biochemistry, Voet, D. and Voet, J.G. (1995) 2nd edition, John Wiley & Sons, New York
- Principles of Biochemistry, Zubay, G.L., Parson, W.W. and Vance, D.E. (1995) Wm. C. Brown, Dubuque, Iowa
Cell Biology
- Molecular Cell Biology, Darnell, J., Lodish, H. and Baltimore, D. (1995) 3rd edition,
W.H. Freeman & Co., New York
- Molecular Biology of The Cell, Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K. and Watson, J.D.
(1994) 3rd edition, Garland Publishing, New York
Hypertextbooks
BioComputing, for the VSNS-Biocomputing Division Course
Biology, developed by Shane Crotty, MIT
Course/Tutorial on Cell Biology, Mark Dalton, Cray Research
Principles of Biochemistry, Horton, Moran, Ochs, Rawn, Scrimgeour