Exploits of a mom (from XKCD) https://xkcd.com/327/

Course description

ORC. This course studies the management of large bodies of data or information. This includes schemes for the representation, manipulation, and storage of complex information structures as well as algorithms for processing these structures efficiently and for retrieving the information they contain. This course will teach the student techniques for storage allocation and deallocation, retrieval (query formulation), and manipulation of large amounts of heterogeneous data. Students are expected to program and become involved in a project in which they study important aspects of a database system: ways to organize a distributed database shared by several computers; transactions that are processed locally and globally; robustness guarantees of the stored data against failure; security and data integrity guarantees from unauthorized access; privacy; object-oriented schemes for multimedia data; indexing, hashing, concurrency control, data mining, data warehousing, mobile databases and storage file structures.

Learning objectives
The ultimate goal of this course is to equip you to use data to make smart decisions rather than making decisions on gut feel or guesswork. After completing this course you will be able to:
  1. Query existing relational databases for insight. We will spend the first few weeks of the course learning the standard database query language called Structured Query Language (aka SQL, aka 'sequel'). This will give you the tools necessary to query existing databases for insight into the data they hold.
  2. Design your own efficient databases. We will spend the next few weeks examining how to structure your own databases, laying out tables, and considering factors such as redundancy, reliability, and speed.
  3. Understand database internals. Next we will explore how databases operate, retrieving data quickly and accurately, even with multiple users accessing and updating data simultaneously.
  4. Describe alternative database technologies. Finally we will look at new database technologies such as NoSQL databases.

Prerequisite: CS 50. I will also assume you are familiar with Python.

Who, when, where

Instructor
Tim Pierson | ECSC 222
office hours: most weeks Fri 3:30 pm - 4:30 pm (confirm via Canvas calendar), and by appointment.
Graduate teaching assistants
Noah Schaffer and Ruize Xu
office hours: maintained on Canvas.
Lectures
10A-hour | Tu/Th 10:10 am - 12:00 pm | Cummings 100
I do not plan to use x-hours regularly, but I may sometimes use them for missed classes, to catch up on material, or for optional, informal sessions to work through examples. Make sure to keep this time slot free in case we need to use it.
We will frequently have in-class exercises to try out new concepts on a live network. Google and StackOverflow will be your friends; do not hesitate to use them (unless instructed otherwise)!
One of the primary benefits of lectures, as opposed to books and videos, is the opportunity to interact. We will all enjoy the experience more, and everyone will learn more, if you do ask questions. It can of course be intimidating, but chances are that if you have a question, then at least one other student — and possibly many more — has the same question. You're doing the other students a favor by asking!
Help: Slack
Expect an invite to a Slack channel after the first day of class. I strongly encourage you to ask and answer questions there. DO NOT post code on Slack (I do not intend for Slack to be a group debugging tool!).
You can DM me on Slack, but unless I happen to be sitting at my computer when you do, I won't get your message until Slack sends me an email at some point in the future. You're better off emailing me directly rather than DM'ing me on Slack. Please don't message me via Canvas!
Announcements
Monitor Canvas for periodic course-wide announcements.
Textbook
Database System Concepts, 7th edition, by Silberschatz, Korth, and Sudarshan. I highly recommend the ebook version instead of the paper version (the paper version is not even bound — it is a collection of loose leaf papers!).
While the Silberschatz book will be our primary textbook, and all assigned reading will be from that textbook, another useful resource is Database Systems: Design, Implementation, & Management, 14th edition, by Coronel and Morris.
Another great free online resource is https://www.mysqltutorial.org.

Assessment

Grades in this class will be a combination of a term-long project, several lab assignments, two midterm exams, and class participation. A total score of at least 60% is required to pass.

Assessment summary

Class participation 5%
Labs 40%
Exams 30%
Final project 25%
Total 100%
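
As a rough illustration of how the weights above combine into a total, here is a minimal Python sketch. It assumes each category is first scored on a 0-100 scale and then weighted; the function name, dictionary keys, and the example scores are hypothetical and not part of the official grading process.

```python
# Category weights from the assessment summary, in percent of the final grade.
WEIGHTS = {
    "participation": 5,
    "labs": 40,
    "exams": 30,
    "project": 25,
}
PASSING_TOTAL = 60  # minimum total score (percent) required to pass


def total_score(category_percent):
    """Return the weighted total (0-100) from per-category scores (each 0-100)."""
    # Sanity check: the weights must account for the whole grade.
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * category_percent[name] for name in WEIGHTS) / 100


# Hypothetical student: these four scores are made up for illustration.
example = {"participation": 80, "labs": 90, "exams": 70, "project": 85}
total = total_score(example)
print(total, total >= PASSING_TOTAL)
```

With these made-up scores the weighted total is 82.25, which is above the 60% needed to pass.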

Class participation (5%)

Participating in the classroom discussion benefits everyone. I hope you will contribute! I will award participation points with this rubric:

  • Everyone starts with 4 points out of 5
  • -1 for each day I *notice* you were not in class
  • +1 if you have participated in the class discussion
  • +2 if you participate regularly
  • +3 if you participate frequently
  • Score is clipped between 0 and 5.
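
One way to read this rubric as arithmetic is sketched below in Python. It assumes the three bonus tiers are mutually exclusive (only the highest tier you reach applies); that reading, the function name, and the tier labels are my own assumptions, not something the rubric states.

```python
# Bonus points per participation tier. Assumption: tiers are mutually
# exclusive, so only the highest tier reached is counted.
BONUS = {"none": 0, "once": 1, "regular": 2, "frequent": 3}


def participation_points(noticed_absences, level):
    """Return participation points under one reading of the rubric."""
    raw = 4 - noticed_absences + BONUS[level]  # everyone starts with 4 of 5
    return max(0, min(5, raw))                 # clip the score to [0, 5]


print(participation_points(0, "none"))      # never absent, never spoke up
print(participation_points(2, "frequent"))  # two noticed absences, frequent participation
```

Under this reading, a student who is never noticed absent and never participates keeps the starting 4 points, and frequent participation can offset a few noticed absences before the clip at 5 applies.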

In addition to participating in the classroom discussion, most classes will have a hands-on portion where we will work through a series of problems. At the end of this portion of class you may be randomly selected (with replacement) to present your solution. If you are unable to attend class for a medical or academic reason, you must let me know before class begins (or risk getting randomly selected). I will certainly notice your absence if you are randomly selected but are not present!

Labs (40%)

There will be four lab assignments (aside from Lab 0, which is simply to gather information) that together account for 40% of the grade in this course. The points for each lab are:
  • Lab 1: 5%
  • Lab 2: 10%
  • Lab 3: 15%
  • Lab 4: 10%
Requirements for lab submissions:
Labs are designed to be completed outside of class and must be submitted electronically via Canvas before the deadline indicated on Canvas. Even when a lab has some written exercises, you are required to either type your answers in a file or scan your written work and submit it electronically. To submit output from your program, submit a copy-pasted file in PDF format and/or a screenshot, as appropriate. For plain text, you can use a program like TextEdit, NotePad, or Emacs, or even Word, but be sure to save as a PDF. For a screenshot, you can use Preview on Mac (under the "File" menu) or the PrntScrn button on Windows.

You may work with one partner on these lab assignments (see Collaboration below). In addition:

  • If you worked with a partner, include your name and your partner's name, or if you worked alone, state "no partner" in a comment in your submission.
  • If you worked with a partner and ended up with a single shared solution, indicate that in a comment in the solution and on the submission. Each partner should submit the same solution. The solution will then be graded once with the same grade assigned to both partners.
  • If you worked with a partner but wrote separate code, indicate that you collaborated but have different submissions. Indicate this in both the code and on the submission. Each of you should upload your own solution, and each of you will get a separate grade.
  • Collect all your code files into a single zip file and upload that zip, rather than many separate files.
Late policy
Labs are due via Canvas on the date and time noted on the Canvas assignment. Penalties: < 8 hours late: 10%; < 24 hours: 20%; < 48 hours: 40%; more: no credit.
You are allowed at most one late submission (up to 48 hours) with no penalty; no excuse required. Indicate in your submission that you are electing to use your free pass; there is no undoing the choice. This cannot be combined with a penalty (e.g., you can't take an 8-hour penalty on top of the 48-hour free pass). If you are working with a partner, this counts as the free pass for both of you.
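
The late-penalty tiers can be sketched as a small Python function. This is only an illustration under stated assumptions: the function name and parameters are hypothetical, the penalty is treated as a fraction of credit deducted, and the handling of the exact boundaries (for example, exactly 48 hours late) is my reading of the policy rather than an official rule.

```python
def late_penalty(hours_late, use_free_pass=False):
    """Return the fraction of credit deducted (0.0 = none, 1.0 = no credit)."""
    if hours_late <= 0:
        return 0.0  # submitted on time
    if use_free_pass:
        # Assumption: the free pass covers up to and including 48 hours,
        # and cannot be stacked with a penalty tier beyond that.
        return 0.0 if hours_late <= 48 else 1.0
    if hours_late < 8:
        return 0.10
    if hours_late < 24:
        return 0.20
    if hours_late < 48:
        return 0.40
    return 1.0  # 48 hours or more: no credit


print(late_penalty(3))                       # a few hours late
print(late_penalty(30))                      # more than a day late
print(late_penalty(30, use_free_pass=True))  # same lateness, free pass used
```

Because the free pass cannot be combined with a penalty, the sketch gives no credit for a submission more than 48 hours late whether or not the pass is used.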
Grading
Specific grading rubrics will be provided for each lab.

Exams (30%)

There will be two midterms, each worth 15% of the final grade (there is no final exam; your project counts as the final). You are allowed to use one 8.5 x 11 inch note page for each exam, but you must not include answers or code from prior CS61 exams unless they were explicitly provided by the instructor or part of the material covered in class.

If you have questions about your exam score, or would like a question re-graded, see your TA within one week from the date that the exam was returned to the class. If you request a re-grade of a particular question, we reserve the right to re-grade your entire exam.


Project (25%)

Over the span of the term you will work on a database-related project of your choosing with three other students. Details of the project's requirements are here.

Collaboration

Much of the learning in this course comes from doing the programming exercises. Sometimes learning can happen more effectively when you can hash things out with someone else, so working with a partner will be allowed on lab assignments. You may work jointly with one other person on a given lab. If you choose to work with someone else, you and your partner must both submit the same joint assignment with both names on it, and you must work with the same person for the entire assignment (you cannot work with one person for some parts of an assignment and a different person for other parts).

If you work with a partner you are still responsible for understanding the entire assignment. That means that splitting the coding into pieces, doing your part, and never looking at your partner's parts is not a good idea. You can learn a lot by reading your partner's code and figuring out how it works, whether it is correct, and how it might be improved. You can also catch things like poor or missing comments that could cost you style points when the assignment is graded.

When working with a partner, I suggest that you borrow a practice from Extreme Programming, a method of writing code that many businesses find quite effective. One person (the driver) sits at the keyboard. The other person (the navigator) looks at the (virtual) screen as the driver types, asking questions, making suggestions, and catching errors. Both of you will understand the code better if you discuss it as it is written than if you just write it (or read it) by yourself. Regularly trade off who is driver and who is navigator.

The usual reaction to this idea is, "that will take twice as long!" In practice it is usually faster than each person programming alone. The reason is that errors are caught earlier, and the time saved when debugging more than makes up for the lack of parallelism in code writing. Also, the code tends to be better written. These are the reasons that this idea has been adopted in industry.

Honor code

Dartmouth's honor principle applies to this course, as do the Arts and Sciences Academic Honor Policy for Undergraduates and the Academic Honor Policy for Graduate and Professional Students under the Guarini School of Graduate and Advanced Studies. Academic misconduct policies will be strictly enforced. I will report suspected cases of cheating to the Undergraduate Judicial Affairs Officer. I also reserve the right to assign a failing grade for an assignment if I conclude that the honor principle has been violated, regardless of the finding from the Committee on Standards. If you have questions, ask!

Special note on Artificial Intelligence-based code generators
AI-based tools such as ChatGPT, CoPilot, Code Llama, and others can generate code for you based on natural language prompts that you provide. For this class, I do not consider it to be an honor principle violation for you to use these tools to create or debug your lab solutions. However, I strongly urge you to write the solutions yourself, rather than relying on these tools. Most of the true mastery of this course's material happens from striving to create correct and efficient code yourself, not from simply reading and copying an AI-based tool's code.
If you choose to use an AI-based tool you must:
  • Cite the tool you used for each method or function or SQL command created or debugged with one of these tools, even if you modify the tool-produced code
  • Be able to explain every line of code in your solution; specifically what the line does and why you included it.
If you choose to use an AI-based tool you must not:
  • Share the prompts you used with anyone other than your partner
  • Share the tool's output with anyone other than your partner. Other students must use the tool themselves and must evaluate the tool's output relative to their solution.
Remember: just because code compiles and runs does not mean it is correct, efficient, or secure. Also, be aware that these tools typically store and analyze your prompts, potentially building a profile of you.

Attendance

You are expected to attend class in person unless you have made alternative arrangements due to illness or other medical reasons. For the health and safety of our class community, please do not attend class when you are sick or when you have been instructed by Student Health Services to stay home. You will be able to view recordings of class in Canvas if you are unable to attend due to illness.

Accessibility Needs

Students requesting disability-related accommodations and services for this course are required to register with Student Accessibility Services (SAS; Apply for Services webpage; student.accessibility.services@dartmouth.edu; 1-603-646-9900) and to request that an accommodation email be sent to me in advance of the need for an accommodation. Then, students should schedule a follow-up meeting with me to determine relevant details such as what role SAS or its Testing Center may play in accommodation implementation. This process works best for everyone when completed as early in the quarter as possible. If students have questions about whether they are eligible for accommodations or have concerns about the implementation of their accommodations, they should contact the SAS office. All inquiries and discussions will remain confidential.

Mental Health

The academic environment at Dartmouth is challenging, our terms are intensive, and classes are not the only demanding part of your life. There are a number of resources available to you on campus to support your wellness, including your undergraduate dean, Counseling and Human Development, and the Student Wellness Center.

Religious Observances

Dartmouth has a deep commitment to support students’ religious observances and diverse faith practices. Some students may wish to take part in religious observances that occur during this academic term. If you have a religious observance that conflicts with your participation in the course, please meet with me as soon as possible—before the end of the second week of the term at the latest—to discuss appropriate course adjustments. To assist with calendar planning and awareness of our diverse religious and spiritual community, please refer to the Tucker Center for Spiritual and Ethical Life’s holy day calendar. The list represents major holy days which may impact campus events in general, as well as student course attendance, exams, Commencement, and participation in activities in the coming year. If you have any questions about these dates or other concerns, please contact Rev. Nancy Vogele, chaplain and director of the Tucker Center.