## CS 33 (Spring 2011): Homework 5: Normalization

### Algorithms for BCNF testing and 3NF synthesis

For this homework, you are asked to implement two of the normalization algorithms outlined in the textbook.

1. Implement the 3NF synthesis algorithm in Figure 8.12 (page 352) of the textbook, including the part where redundant relations are removed. Note that this requires an implementation of the canonical cover algorithm, which in turn requires an implementation of the attribute closure algorithm. Nowhere in your implementation should you be computing F+, the closure of a set F of functional dependencies.

Your algorithm should read in a text file (the file name will be supplied on the Unix command line) describing a set of functional dependencies among attributes in a single table. It should compute a 3NF decomposition of this table and output a list of new tables, with their corresponding attribute sets, that make up the decomposed database.

2. Implement the BCNF testing algorithm informally described at the bottom of page 349. As before, you should not be computing F+ for any set of functional dependencies.

Combine your algorithm with the previous one, so that it tests the decomposition output by the 3NF algorithm and reports whether or not it is in BCNF. If it is not in BCNF, this algorithm should output the name of a table and a functional dependency that causes a BCNF violation in that table.

### Input and output formats

• The input file will consist of one or more lines, with each line describing one functional dependency. Each line will have the following format:
`      left_1 left_2 ... left_m -> right_1 right_2 ... right_n `
where each left_i and right_i is the name of an attribute. An attribute name is a string consisting of letters, digits, and underscores not beginning with a digit (just like a variable name in C or C++). The attributes of the initial table are exactly the set of all attributes named in the input file.

• Note that the separator between attribute names, and between names and the -> sign, is simply white space, and there may be extra white space anywhere in the input file.

• Any line in the input file that does not contain "->" is to be ignored (treat such a line as a comment).

• The output should name the decomposed tables (for Part 1) as "Table 1", "Table 2", etc. For each output table, print one line on standard output in the format
`      Table i: attrib_1 attrib_2 ... attrib_k`
i.e., the table name, followed by ":", followed by a space-separated list of attributes of the table.

• The output should then give the result of the BCNF test (for Part 2), in the following format. If the decomposed database is in BCNF, then simply print the following line:
`      The above decomposition is in BCNF.`
Otherwise, produce two lines of output as follows:
```      The above decomposition has a BCNF violation in Table i:
left_1 left_2 ... left_p -> right_1 right_2 ... right_q ```
where "Table i" is the name of the violated table and the last line describes a functional dependency among attributes of this table that causes a BCNF violation.

### Example runs

1. Suppose the input file is called "class.txt" and its contents are as follows.
```      // Functional dependencies for class scheduling database
// Based on the example on page 351

course_id         -> title dept_name credits
building room_num -> capacity

// Each term, course sections must be scheduled to avoid time/room conflicts
course_id sec_id semester year -> building room_num time_slot ```
Suppose you write your code in Python and your script is called "hw5.py". We will probably run it as follows
`      python hw5.py class.txt`
and here is one acceptable result for the run:
```      Table 1: course_id title dept_name credits
Table 2: building room_num capacity
Table 3: course_id sec_id semester year building room_num time_slot
The above decomposition is in BCNF.```

2. Take another input file "advising.txt" that has the following contents.
```      ### This is our famous "BCNF-trouble" example
prof_id -> dept_name
student_id dept_name -> prof_id```
Suppose you write your code in Java and call the main file "Decomposition.java". We might run it like this
```      javac Decomposition.java
and here is one valid output that we could expect:
```      Table 1: prof_id student_id dept_name
The above decomposition has a BCNF violation in Table 1:
prof_id -> dept_name ```

Here is how we will grade the various parts of your implementation. The maximum score is 70 points.

1. Correct implementation of attribute closure algorithm
[10 points]

2. Correct implementation of canonical cover algorithm
[10 points]

3. Correct implementation of basic 3NF synthesis
[10 points]

4. Correct removal of redundant relations before output
[5 points]

5. Proper handling of input and output formats, as described above
[5 points]

6. For BCNF checking, correct and complete checking of conditions
[10 points]

7. Correct identification of violating table and f.d., if one exists, and proper output formatting
[5 points]

[5 points]

9. Quality of example runs, showing all facets of your implementation (see below)
[10 points]

### What to turn in

Using the homework submission form, turn in a single zip (or tar/gzip) archive consisting of the following files:

• All of your source code. Do not submit executable code.

• At least three example input files that together illustrate the working of all parts of your implementation. For maximum credit, at least one input file should try to model a complex enough "real life" situation.

• One README file with instructions for compiling and running your program. This file should also give the expected output for each of your supplied input files. (Note that we will run your code on some of our own input files.)