[Figure panels: sequence similarity across organisms; structure of HIV-1 protease + drug [1hpv]; possible functional interactions [Jeong+]]

Course description

Computation is vital for modern molecular biology, helping scientists to model, predict, and control the behavior of the molecular machinery of the cell. This course will study algorithmic challenges in analyzing biomolecular sequences (what genes encode an organism, and how are genes related across organisms?), structures (what do the proteins corresponding to these genes look like, and what does that tell us about how they work?), and functions (what do these things do, and how do they interact with each other in doing it?). The course is application-driven, but focused on the underlying algorithms and information processing techniques, employing approaches from search, optimization, pattern recognition, and so forth.

For a beautiful visualization of the underlying molecules, see The Inner Life of the Cell, with either musical accompaniment or narration.

Administrative info

Instructor
Chris Bailey-Kellogg | 250 Sudikoff | office hours: Mon 10am-noon, Th after class, Fri noon-2pm or by appointment
Teaching Assistants
Jun Gong | 063 Sudikoff | Sun 7-9pm, Mon 6-9pm
Rui Lu | 249 Sudikoff | Mon 3-6pm, Fri 3-6pm
In addition to office hours for the instructor and TAs, help will be available via Piazza (accessed through Canvas). See postings on Piazza for more info. I strongly encourage you to ask and answer questions there.
2A Period | TuTh 2:00-4:00 | 200 Life Sciences Center

One of the primary benefits of lectures, as opposed to books and videos, is the opportunity to interact. We will all enjoy the experience more, and everyone will learn more, if you do ask questions. It can of course be intimidating, but chances are that if you have a question, then at least one other student — and possibly many more — has the same question. You're doing the other students a favor by asking!

Laptops and phones are distracting, not just to you, but to everyone around you (it's human nature to wonder what's up over there). There is recent research that attests to the negative impacts of multitasking on learning and retention. It has also been shown that writing notes by hand rather than on a laptop engages different cognitive processes and has direct (positive) consequences for learning. Since class notes are made available anyway, there's really no need to type. So do me and everyone around you a favor by abstaining from electronic devices for a few hours each week.

No textbook is required. While there are several great books out there covering some of the material, I haven't found one that comprehensively introduces the topics we'll cover (biomolecular sequence, structure, and function) from a computational perspective. Thus I'll distribute lecture notes and provide references to the literature.


The course seeks to provide broad exposure to some important algorithmic challenges and approaches in bioinformatics. The lectures and readings will provide a fairly general (for a ten-week course) survey of the field. As this is an introductory course primarily developed for undergraduate and graduate students in computer science, a background in biology is not required. However, students should be interested in learning some basic molecular biology and biochemistry, in context.

The course is cross-listed as CS 75 for undergraduate students and CS/QBS 175 for graduate students. Please make sure you are registered for the right one! Grades will be handled separately, and each assignment specifies additional requirements for graduate students.

Students from other related disciplines are encouraged to take the course; it has been approved as an elective for MCB students and to fulfill the applied math requirement for Thayer students. Naturally, a basic understanding of the underlying computational techniques is expected. The homework assignments require implementing bioinformatics algorithms, so programming experience is necessary. Please contact me to discuss your background and interests.

The coursework consists of the following:

Homeworks (75%)
Since one of the best ways to learn something is by doing it, the assignments for roughly the first half of the term will offer the opportunity to implement and apply some core bioinformatics algorithms. To standardize the work and enable us to provide some infrastructure, labs will be done in Python (version 2) with assistance from SciPy libraries. Since CS 1 is (recursively) a prereq for this course, I assume you know Python, but I will hold a tutorial session in case you're rusty or satisfied the prereq some other way.

For the latter portion of the term, in order to free up more time for your project work, homeworks will be shorter written assignments.

Assignments will be posted and submitted via Canvas. They are to be turned in before class on the due date. Late penalties: less than 8 hours late, 10%; less than 24 hours late, 20%; less than 48 hours late, 40%; later than that, no credit. You are allowed at most one late submission (up to 48 hours) with no penalty; no excuse is required. Indicate in your submission that you are electing to use your free pass; the choice cannot be undone.
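
For those who prefer to read a policy as code, the late-penalty schedule above can be sketched as a small Python function. This is only an illustration, not the official grading script: the function names `late_penalty` and `score_after_penalty` are made up for this example, the handling of the exact 8-, 24-, and 48-hour boundaries is an assumption, and the one-time free pass is not modeled.

```python
def late_penalty(hours_late):
    """Fraction of credit deducted for a submission that is hours_late hours late.

    Assumption: each threshold is a strict "less than", so a submission
    exactly 8 hours late falls into the 20% tier, and one exactly
    48 hours late receives no credit.
    """
    if hours_late <= 0:
        return 0.0   # on time
    if hours_late < 8:
        return 0.10
    if hours_late < 24:
        return 0.20
    if hours_late < 48:
        return 0.40
    return 1.0       # no credit


def score_after_penalty(raw_score, hours_late):
    """Apply the late penalty to a raw score (hypothetical helper)."""
    return raw_score * (1.0 - late_penalty(hours_late))
```

For instance, under these assumptions a homework that earns 90 points and arrives 10 hours late would be recorded as `score_after_penalty(90, 10)`, i.e. 90 reduced by 20%.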

Final project (25%)
The final project provides the opportunity to more freely and deeply explore a topic of interest, choosing an appropriate mixture of research, implementation, and application. A project proposal and a project update will ensure that we are on the same page; a project poster presentation will make for a fun class-wide exchange.

If you would like more of a challenge than the basic homework provides, please do find ways to enhance or more broadly apply your code. I'll offer some suggestions for you to contemplate, but feel free to be creative. You can work together on such additions; just be sure to note the team effort. The resulting extra credit will be handled very qualitatively. We will test your contribution and assess its "degree of difficulty", and simply make a note for end-of-term grade assignment. If your final grade is borderline between two discrete values (e.g., B+ and A-, or at the top end of P almost to HP), then consistent, high-quality extra work will result in the grade being rounded higher. Note that this also means your grade cannot be hurt by others doing extra credit.

Honor code

Dartmouth's honor code applies to this course, and academic misconduct policies will be strictly enforced. If you have questions, ask!

You may discuss the programming labs with other current CS 75/175 students, but your submission must be entirely your own. As part of a discussion, you may show (in person) another student your work. However, your submission must be created and documented by you alone. You may not copy anything directly from another student's work. For example, copying a portion of someone else's solution onto a piece of paper would violate the honor code, even if you eventually turn in a different answer. Similarly, e-mailing a portion of your code to another student, or posting it on-line for them to see would violate the honor code. Although all students must create and type in their own code, you may help other students debug their programs once you and they have already written your programs. Discussion of the labs is encouraged, subject to these rules; such discussion will be most useful when both students have already made serious attempts to solve the problems on their own.

Written problems are to be done individually, without sharing any details of solutions.

As mentioned above, we will use Piazza as a shared help system. You may not post your code publicly; if you want to share it with the course staff, make the post private. It will be more effective in either case to start with a high-level, general question which will enable others to share insights without having to dive into the depths of your code. The same goes for the written problems — ask and answer general conceptual questions without revealing your (partial) answer.

If your solution makes use of any code that is not your own, you must clearly attribute the source of the code with clear comments in the code that you submit. You do not need to acknowledge discussion with other students on your submitted work. Proper respect for copyright laws as applied to printed materials and software products is subsumed by Dartmouth College's Computing Policies.

The final project is to be performed in a group, all of whose members will receive the same grade. Building on existing code is acceptable, subject to the above comments regarding attribution.


Students with disabilities who are enrolled in this course and may need disability-related classroom accommodations are encouraged to make an appointment to see the instructor before the end of the second week of the term. All discussions will remain confidential, although the Student Accessibility Services office may be consulted to discuss appropriate implementation of any accommodation requested.