COSC 8 -- Problem Solving with CS

CS 8, Fall 2009

Problem Set 2: Graphs and Automata


General Instructions

Before you do anything, please read the entire assignment carefully. It contains many suggestions which can save you a lot of struggle later on.

You should print out hard copy of your code and any example outputs, and either turn them in at lecture or in the TA's mailbox at Sudikoff by class time on Wed. Oct. 14, 2009. Late assignments will be strongly penalized, so please start early.

For this assignment, you may work either alone, or with one (1) other partner. You may not share your code with anyone other than your partner, nor may you look at anyone else's code. See the Course Information for a review of the policies on joint work.

What to Turn In

For CS 8 programming assignments, such as this one, you must turn in a hard copy listing of any procedures you wrote yourself, or modified from the example code provided with the assignment. You must also turn in example output for each section of the problem set, as applicable.

You should not turn in printouts of any code we provided you which you did not make any changes to, and please be sure to label everything carefully, so that the graders will know what they are looking at.

You must also e-mail an electronic version of your code to cs8hw@cs.dartmouth.edu with the subject "CS 8 Problem Set 2", by no later than 2:00 am on the date it is due. Please organize your code in one or more plain text files and attach them as enclosures to your message. Do not include your code directly in the body of the e-mail message! If you are sending more than one file, please make sure the files have descriptive names, and that you include comments that will let the graders know what they are looking at.

Code submitted electronically must be sent as plain text, in one or more files with the filename extension ".hs" or ".lhs" (e.g., mycode.hs). Do not send your programs as Word documents, RTF, or other "formatted" types of files.

Output

For each part of any assignment, we require output which demonstrates that your code correctly solves the problem it is supposed to solve. If you turn in code with no output, you may receive only partial credit, or no credit, for that part of the assignment.

Some problems will give specific test cases you need to provide output for. However, even if specific test cases are not provided, and even if it is not listed under "What to Turn In" for a particular problem, you must provide some sample output that shows how your code behaves on typical inputs.

If you are unable to get your code working, please turn in whatever you do have, so that we can determine if you should receive partial credit. Please do not include output which suggests your code works if it doesn't -- however, if your code works for some of the test cases, and not for others, you should turn in whatever output you are able to generate, even if it is incorrect. You will get credit for the "Testing" part of the score, even if the tests indicate that the program is incorrect.

Output is not required to be submitted electronically, although you are welcome to do so if you wish. If you do choose to submit output electronically, put your output samples in a separate file from your code submissions. Output may be submitted in any reasonable file format, though plain text is easiest to read.


Background Information

Graphs

In this problem set, we will be investigating graphs and finite automata. What is a graph? It is a mathematical structure consisting of two components -- a set of points, or vertices together with a set of lines connecting pairs of these points, which are called called edges. (Incidentally, the singular form of "vertices" is "vertex", not "vertice"). A graph turns out to be a fantastically useful abstraction for a wide variety of problems, including pattern matching, game playing, path finding, and automated reasoning. We will be looking at a specific application to finding regular patterns in sequences of input data such as strings or lists.

Formally, a graph can be described as an ordered pair G = (V, E) where V is a set of vertices and E is a set of edges. A vertex can be described by giving it a name, which is a unique symbol that serves to identify the vertex. Each edge is represented by an ordered pair of vertex names. Edges may have a direction, in which case we say that the graph is a directed graph, or not, in which case the graph is said to be an undirected graph.

Suppose u and v are vertices in some graph G. If G is a directed graph, the edge (u, v) represents an edge from a source vertex u to a destination vertex v, and is distinct from an edge (v, u) which goes in the opposite direction. In an undirected graph, (u, v) and (v, u) are considered the same edge. We'll be focusing our attention on directed graphs (digraphs) in this problem set.

Directed graph example Here is a graphical example of a directed graph. The vertices are A, B, C, and D; in set notation, V = { A, B, C, D }. The arrows in the diagram indicate the edges, and the arrowheads indicate their directions, so in this example we have E = { (A, B), (B, C), (B, D), (D, C) }. Note that the order in which the edges are listed in the set does not matter.

In many cases each edge has a label associated with it. The type of label depends on the application. It could be a distance in a graph of cities, a road name in a graph representing a road map, or whatever else is convenient.

Implementation of Graphs

A graph can be viewed as an abstract data type, with the six operations given below. You will implement a module which provides types and functions for this abstract data type, using Haskell's built-in list type as the data structure that holds directed graphs. We will see an alternate (faster) implementation later in the term.

A template that you will complete to create this module is provided in DigraphTemplate.hs. The form of the data structure that you will implement is called an adjacency list structure, because each vertex is paired with a list of all adjacent vertices. There are three predefined types in the code, AdjList, Digraph, and Edge. The Digraph type is a list of pairs, each of which consists of a vertex and an adjacency list. The vertex is the source of all the edges leaving it. The adjacency list (of type AdjList) is a list of of all destination vertices, each paired with the label on the corresponding edge. An <Edge> is a triple consisting of the source vertex, the destination vertex, and a label.

The declarations are below. Note that the type declarations allow arbitrary types for vertices and edge labels.


type AdjList v e = [(v, e)]

type Digraph v e = [(v, AdjList v e)]

type Edge v e    = (v, v, e)

There are also six functions whose names and type signatures are provided. Note that the vertex type must be in type class Eq, which allows vertices to be compared using ==.

empty            :: Digraph v e

insertVertex     :: (Eq v) => v -> Digraph v e -> Digraph v e 
                    
insertEdge       :: (Eq v) =>  Edge v e -> Digraph v e -> Digraph v e
   
makeDigraph      :: (Eq v) => [v] -> [Edge v e] -> Digraph v e

getAdj           :: (Eq v) => v -> Digraph v e -> Adjlist v e

vertexInGraph    :: (Eq v) => v -> Digraph v e -> Bool

The function empty returns an empty diagraph. (This is very easy with this implementation, but we will later look at alternate implementations where it is a bit more interesting.)

The function insertVertex inserts a new vertex into the digraph with an empty adjacency list. You need not verify that the vertex is not already present.

The function insertEdge inserts a new edge into the digraph. You should call error if the source vertex has not been inserted. You need not verify that the destination vertex is present.

The function makeDigraph constructs a complete digraph when given a list of vertices and a list of edges. For example, the call

makeDigraph ['a', 'b', 'c'] 
            [('a', 'b', 0), ('a', 'c', 1), ('c', 'b', 2)]

should produce a digraph equivalent to

[('a',[('b',0),('c',1)]),
 ('b',[]),
 ('c',[('b',2)])]

The order of elements with lists does not matter.

The function makeDigraph can be written very compactly given insertVertex and insertEdge. Can you see a way to use foldl or foldr?

The function getAdj should return the adjacency list paired with the vertex that is its parameter. You should report an error if the vertex is not in the graph.

The function vertexInGraph returns True if the given vertex is in the given graph and False otherwise.

You should feel free to write additional utility functions as part of your solution, as you see fit. Many of the built-in functions on lists may also be useful.

Finite Automata

A finite automaton, or FA, is a directed graph in which each edge carries a label, which is an object associated with the edge. As we will see, finite automata are a useful device for recognizing regular patterns in sequences. Note that the word "automaton" is distinct from the word "automation" -- you can hear the pronunciation of "automaton" here if you are unsure of how to say it aloud).

The set of objects that are used as labels is called the alphabet of the automaton. In a finite automaton, it is customary to call the vertices states and the labelled edges transitions.

A finite automaton can be thought of as a rule describing how to scan a sequence of input symbols, one at a time, and decide whether or not the sequence matches a certain criterion. If the sequence matches the criterion, we say that the automaton "accepts" the sequence; otherwise, we say that the automaton "rejects" the sequence. In a sense, a finite automaton is like a simple machine for dividing sequences into two categories -- an operation that is at the heart of common pattern-matching tasks such as spell-checking, find-and-replace operations, and parsing the basic tokens of a programming language so they can be compiled.

One analogy for a finite automaton is to think of a road-map, in which the states of the automaton represent towns or cities, and the edges represent one-way roads connecting them. One of the "towns" (states) is designated as the start state, and some subset of the states are designated as final states. Your goal is to travel along (traverse) the roads, beginning at the start state, and hopefully ending at one of the final states.

Each of the roads has a "toll" associated with it, and different toll booths require different kinds of tokens. At the outset, you are given a tube of tokens (the input), and you may only remove the top token from the tube at each leg of the journey (you have to use them in order). At each city, you take the next token out of the tube, and if there is a road whose toll booth accepts that token, you traverse that road to the next city. If you ever get to a point where there is no road that will take your next token, you are "stuck" in that city, and you "lose" the game.

At some point, you will run out of tokens. If, at that point, you are in one of the final states, you "win"; otherwise, you "lose". In terms of the finite automaton, we say that a sequence of tokens that makes you win is "accepted" by the automaton, and sequences which make you lose are said to be "rejected" by the automaton. The power of a finite automaton comes from the types of sequences they can accept.

Finite automata may be deterministic or non-deterministic. A deterministic finite automaton, or DFA, is a finite automaton in which you have exactly one choice for each possible symbol at each state. In a non-deterministic automaton, you may have no choices, or more than one choice, at each state.

Formally, a finite automaton is specified by the following five quantities:

Q The set of states. This corresponds to the set of vertices of the graph.
S The input alphabet, the set of labels that can be associated with edges in the graph.
d The transition relation, which takes a state and an input symbol as input, and returns new states. In other words, this corresponds to the edges of the graph.
q0 The start state. This must be an element of Q. Traversal of the FA starts in this state.
F The set of final states. This is a subset of Q, and specifies which states of the graph are considered "accepting" states.

For simplicity and clarity, our examples in this problem set will always denote states by letters and use the fixed input alphabet {0, 1}. However, the code you write should permit an arbitrary choice of state labels and input alphabet.


Programming Exercises

There are five problems in this assignment.


Problem 1: Completing the Digraph module

Part 1: Implement the six functions in the Digraph module whose type signatures are given. The descriptions are given above.

What to Turn In

For Problem 1, turn in your code for the Digraph module. Also present a transcript of a run demonstrating that your program works. In particular, show that

makeDigraph ['a', 'b', 'c'] [('a', 'b', 0), ('a', 'c', 1), ('c', 'b', 2)]
produces the correct digraph and that the other five functions work correctly. Include the two error cases that you are asked to test (insertEdge when the source vertex is not in the graph and getAdj on a vertex not in the graph).

Problem 2: Create an FA module

Define a new module FA. Define the Automaton type

data Automaton a b = FA {graph :: Digraph a b, start :: a, final :: [a]}
     deriving Show

Note that this data type has only one constructor, which is allowed. We could have defined a type as was done for Digraph. However, using field labels (see p. 216 in the textbook) makes it easier to access the various fields of the finite automaton and makes code more readable.

Write the function makeAutomaton, which takes as parameter a list of states, a list of labeled edges, a start state, and a list of final states. It creates an Automaton. The type signature is given below.

makeAutomaton :: (Eq a, Eq b) => [a] -> [Edge a b] -> a -> [a] -> Automaton a b
Note:
For this exercise, you do not need to check the graph to make sure it is deterministic; you may simply assume that it is, as a precondition for correctness.
Test makeAutomaton by adding the following code to your module:
dfa1 = makeAutomaton ['a', 'b', 'c', 'd'] 
         [('a', 'a', 0), ('a', 'b', 1), ('b', 'a', 1), ('b', 'c', 0), ('c', 'b', 0), 
          ('c', 'c', 1), ('d', 'a', 0), ('d', 'b', 1)]
         'd' 
         ['a']

This DFA, as you will see, accepts sequences of 0's and 1's representing binary numbers that are divisible by 3. You should draw it out on a piece of paper and convince yourself that this is true (step through it using a few numbers expressed in binary as the input).

What to Turn In

For Problem 2, turn in your implementation of makeAutomaton as part of your completed module. Demonstrate that your constructor works correctly. Include printing out dfa1 as defined above.


Problem 3: DFA Traversal

Now that we have defined most of the underlying structure for a DFA, only one thing is still missing -- the notion of the transition relation, which takes a state and an input symbol, and returns a new state. The encoding for this relation is in the labeled edges of the graph we are using to represent a DFA.

Write a function stepDFA that takes an Automaton, a state name, and an input symbol, and returns the state of the DFA which is reachable from the given state with the given symbol. If there is no such state, the function should report an error. Your function should be strict about enforcing determinism -- if there is any more than one state reachable from the given state, your function should report an error.

What to Turn In

For Problem 3, turn in your definition of stepDFA. Show that works correctly by giving various states and labels to dfa1. Include giving a state not in the dfa.


Problem 4: DFA Simulation

Now that we have step-dfa, we can simulate the operation of a DFA on an entire input list. Define a function called simulateDFA which takes an automaton and an input sequence. Your function should return True if the sequence drives the DFA to a final state (i.e., the sequence is accepted), or False otherwise.

Example outputs:

simulateDFA dfa1 [1, 0, 0, 1]
=> True

simulateDFA dfa1 [1, 0, 1, 1]
=> False

What to Turn In

For Problem 4, turn in your definition for simulateDFA and test data (including the examples above) demonstrating that it works.


Problem 5: Non-Determinism

A nondeterministic finite automaton, or NFA, has the advantage that you can have any number of transitions for each state and each symbol (including none). A traversal through the machine requires you to choose which transition to take, whenever there is more than one possibility. An NFA accepts its input if any path from the start state, whose edges are labelled by the corresponding input symbols, leads to some final state.

To implement nondeterminism, the basic structure of the finite automaton does not change, since we have not implemented anything that forces the automaton to be deterministic. In this problem, you will begin with the functions you wrote for Problem 3 and Problem 4, and modify them to interpret nondeterminism correctly.

Write stepNFA and simulateNFA as analogues of the stepDFA and simulateDFA functions you wrote previously. Don't discard the earlier versions of the functions, but feel free to copy them, rename them, and perform surgery on them in order to get your solutions for this problem. Note that after a step you will in general have a number of states where you could be. You could be in any of these states, so when you next do a step you need to see where you can get to from all of the states. Thus stepNFA will have to take a list of input states and return a list of output states. Again, you may find it useful to write helper functions.

Test your functions on the DFA examples from above (since every DFA is also an NFA, they ought to work just the same). Also try them on the following NFA, which recognizes sequences of 0's and 1's that contain two consecutive zeroes, two consecutive ones, or both.

nfa1 = makeAutomaton ['a', 'b',  'c',  'd', 'e']
         [('a', 'a', 0), ('a', 'a', 1), ('a', 'b', 1), ('a', 'c', 0), ('b', 'd', 1),
          ('c', 'e', 0), ('d', 'd', 0), ('d', 'd', 1), ('e', 'e', 0), ('e', 'e', 1)]
         'a' 
         ['d', 'e']

Nondeterministic finite automaton example

State A is the starting state, while the double-outlined states D and E are the final states. Transition arrows labelled "0, 1" are a shorthand for two transitions which have the same endpoints, but different labels. There should be two separate edges in the digraph.

What to Turn In

For Problem 5, turn in your code for the functions stepNFA and simulateNFA. Also turn in a set of test cases that show that your functions work correctly.


Extra Credit: Error Testing (optional)

In Problem 1, you were told that you did not need to verify that a new vertex was not already in the digraph and that the destination of an edge had been inserted into the digraph. Check for these errors.

Recall that in a deterministic automaton, each state is require to have exactly one transition for each possible input symbol.

Write a new constructor makeDFA which takes the same parameters as makeAutomaton, and which constructs a DFA. Unlike makeAutomaton, however, makeDFA should report an error if the arguments it is given do not specify a deterministic automaton.

Positive example for DFA verification Negative example for DFA verification

What to Turn In

If you choose to do the extra credit problem, turn in the code for the modified insertVertex and insertEdge functions and your makeDFA function, along with outputs demonstrating that your code works correctly. You might try it out on the examples given above, for instance -- but if necessary, make up your own test cases to show that your code works.

The rubric for grading can be found here.



Department of Computer Science, Dartmouth College, Hanover, New Hampshire, USA

Last modified: 05-October-2009