COSC 8 -- Problem Solving with CS

CS 8, Fall 2009

Short Assignment 8: Parsing

Instructions

You should print out your solutions to this assignment, and either turn them in at lecture, or leave them in the TA's mailbox at Sudikoff by class time on Fri. Nov. 6, 2009.

Please also e-mail an electronic version of your code to cs8hw@cs.dartmouth.edu, with the subject "CS 8 Short Assignment 8". Please attach your code as one or more enclosures to the message, and do not include it directly in the body of the e-mail message.

You must work alone on this short assignment.


Assignment

This assignment asks you to extend Parser.hs, which we saw in class. It then asks you to use the extensions to improve on the tree and sentence parsing examples.

  1. Write #++

    The function #: combines a Parser a with a Parser [a] and returns a Parser [a]. This is useful when the left-hand parser returns a Char and the right-hand parser returns a String. But cases arise in where it is useful to combine two parsers of type Parser [a] and return a parser of type Parser [a] that appends together the results of the two calls. One of these is when processing a number in the expression parser Expressions.hs. We want to process the digits before the "." and return them as a list, and then process the decimal point and trailing digits and return them as a list. We need to append these lists to get the whole number. Write an operator #++ that does this operation.

  2. Write takeRepeatParse

    The parser takeWhileParse is useful, but it is valid for it to return an empty list as the parse if the parser m does not succeed the first time it is applied. Sometimes this is exactly what you want (e.g. when you are trying to skip over whitespace). Other times it is not appropriate, because you want to fail if you don't get at least one parse (e.g. when you want to get a word, meaning the largests sequence of alphabetic characters possible).

    Write takeRepeatParse m which is like takeWhileParse, except that it fails if m fails the first time it is applied.

  3. Write parseSpaces

    Write a parser parseSpaces that will eat up and return all leading whitespace (things accepted by the Parser space) and return it as the parse value. An empty string is an acceptable thing to return.

  4. Write parseWord

    Write a parser parseWord that will eat up any leading whitespace, will parse all consecutive alphabetic characters (things accepted by parser letter), and then will eat up any trailing white space. It should return the alphabetic characters as the parse, but should fail if it cannot parse at least one alphabetic character. In writing this I wrote a function elimSpacesAround which when passed a parser creates a new parser that eliminates spaces before and after the thing parsed.

  5. Modify word

    The current sentence parser has a problem. It parses strings like "scotsits" and "thehappypersonwatchesahungrycat". The problem is with word. It looks for word w by determining if the first length w characters match w. Re-write word so that it instead finds the first actual word (eliminating white space before and after) and then checks to see if it is w. Show that after this change the parser s will fail to parse the examples above, but will parse them if there is whitespace (even lots of whitespace) between individual words.

  6. Modify parseTree

    The current parseBinary2 only works on parenthesized "*", and allows no spaces anywhere. The output from the hierarchical clustering of stock data looked more like: ((CLX FDC) (UPS (ASN HLT))) Note that this data has words at the leaves, and has whitespace between leaves. Modify Tree to allow strings at the leaves (this is the version of Tree on p. 82 of SOE and in PS 3) and write a parseTree that allows white space anywhere except in the middle of a word. Your parseTree should be able to parse: " ( ( CLX FDC ) ( UPS ( ASN HLT ) ) ) "

Hand in

Hand in a listing of your modifed Parser.hs, along with test runs that show that the new and modified parsers assigned in the six parts above work correctly.