Topic: More Parsing
Date: Nov. 2, 2009
Number: 19
Examples: Expressions.hs
Reading: SA 8 out.

--- Natural Language Parsing ---

First, we build up tools to recognize words, white space, etc.

    -- Parses a single alphabetical character
    letter :: Parser Char
    letter []     = Nothing
    letter (c:cs) = if isAlpha c then Just (c,cs) else Nothing

    -- Parses a single whitespace character (space, return, tab)
    space :: Parser Char
    space []     = Nothing
    space (c:cs) = if isSpace c then Just (c,cs) else Nothing

Once again, this gets boring. There is a common pattern here: parse something, and if that succeeds, apply a predicate to the result to decide whether to accept it or return Nothing.

    -- Parses using m, but only succeeds if m succeeds and p is true
    -- of its result.
    infix 7 ?
    (?) :: Parser a -> (a -> Bool) -> Parser a
    (m ? p) cs = case m cs of
                   Nothing        -> Nothing
                   Just (a, rest) -> if p a then Just (a, rest) else Nothing

Example:

    -- Parses a single character
    char :: Parser Char
    char []     = Nothing
    char (c:cs) = Just (c,cs)

then:

    letter2 = char ? isAlpha
    space2  = char ? isSpace

But a parser for letters is most useful if you can repeat it. We will want to combine the first letter with the list of things found by a recursive call. If we use # we get a pair of type (Char, [Char]), so it is useful to have a way to combine the two into one list:

    -- Parses two things and conses their results into one list
    infixl 6 #:
    (#:) :: Parser a -> Parser [a] -> Parser [a]
    m #: n = m # n >-> (\(x, xs) -> x : xs)

Then we can say:

    -- Like take: apply parser m exactly i times
    takeParse :: Parser a -> Int -> Parser [a]
    takeParse m 0 = returnParse []
    takeParse m i = m #: takeParse m (i-1)

    -- Like takeWhile: apply parser m as long as it succeeds
    takeWhileParse :: Parser a -> Parser [a]
    takeWhileParse m = m #: takeWhileParse m ! returnParse []

Note that while m succeeds, the first alternative is taken. When m fails, the second alternative (returnParse []) succeeds, which ends the list. Two useful helper parsers (returnParse is used above, failParse is needed later):

    -- Returns a parser that always succeeds, returning a fixed value.
    returnParse :: a -> Parser a
    returnParse v cs = Just (v,cs)

    -- A parser that always fails
    failParse :: Parser a
    failParse cs = Nothing

NOTE: Why are these needed? Especially in the second case, why not just use Nothing? Because Nothing is not a Parser, and our combinators can only combine Parsers. So these are parsers that produce the needed results.

Given this, we can extract a single specified word (and eliminate the whitespace that follows it):

    word :: String -> Parser String
    word w = (takeParse letter (length w) ? (==w)) #- (takeWhileParse space)

This fails if the first (length w) characters are not w.

Next we need a type to hold the parse of a sentence. Compare it to the grammar we saw at the beginning of class.

--------------------

    data Parse = S Parse Parse
               | NP Parse Parse Parse
               | VP Parse Parse
               | Name String
               | Det String
               | Adj String
               | Noun String
               | IVerb String
               | TVerb String
               deriving Show

Now we can create a parser for each entity:

    -- Each parses the thing named (s is a sentence)
    s     = np # vp >-> (\(n, v) -> S n v)
    np    = name ! (det # adj # noun >-> (\((d, a), n) -> NP d a n))
    name  = wordChoice ["scot", "chris"] >-> Name
    det   = wordChoice ["a", "the"] >-> Det
    adj   = wordChoice ["happy", "hungry", "blue"] >-> Adj
    noun  = wordChoice ["person", "cat"] >-> Noun
    vp    = iverb ! (tverb # np >-> (\(t, n) -> VP t n))
    iverb = wordChoice ["sits", "jogs"] >-> IVerb
    tverb = wordChoice ["eats", "watches"] >-> TVerb

We use the following to handle a choice among a list of words (instead of writing out lots of ! operations):

    -- Matches the first word in the list that succeeds
    wordChoice :: [String] -> Parser String
    wordChoice []     = failParse
    wordChoice (w:ws) = word w ! wordChoice ws

Note: this method of finding a word has a problem. It happily parses sentences like:

    "thehappypersonwatchesahungrycat"

It should find the word first (everything up to a space) and THEN check whether it matches. We will deal with this in the short assignment.

--- Expression parsing ---

I want to show an example of a more realistic parsing program.
It parses expressions. The BNF (Backus-Naur Form) grammar for expressions is:

    expr     = term [{'+' | '-'} term]...
    term     = factor [{'*' | '/'} factor]...
    factor   = number | variable | '(' expr ')'
    number   = digits | digits '.' digits
    variable = letter [letter | digit]...
    digits   = digit [digit]...

I am introducing several new notations:

1) Curly braces "{ }" group things, the way you would normally use parentheses.
2) Square brackets "[ ]" mean that whatever is inside is optional.
3) "[ ]..." means repeat what is inside 0 or more times.

So an expression is a "term" followed by 0 or more "+ term" or "- term". (We could have written

    expr = term | term {'+' | '-'} expr

and generated the same language, but then the operators associate to the right, so 9 - 4 - 3 would mean 9 - (4 - 3). The form above is easier to translate into a parser.)

Why separate expr, term, and factor? It is a matter of precedence. This gives * and / higher precedence than + and -, because a term swallows up all available * and / operations before we get to the next + or -.

We store the result in a modification of the expression type from Chapter 7 of SOE (The Haskell School of Expression), adding Var for variables:

    data Expr = Const String
              | Var String
              | Expr :+ Expr
              | Expr :- Expr
              | Expr :* Expr
              | Expr :/ Expr
              deriving Show

Then our parser becomes:

    -- Parses a single digit
    digit :: Parser Char
    digit = char ? isDigit

This is very similar to letter.

    -- Parses a maximal sequence of digits. There must be at least one.
    parseDigits :: Parser String
    parseDigits = takeRepeatParse digit

Compare to:  digits = digit [digit]...

    -- Parses a legal variable name
    variable :: Parser Expr
    variable = elimSpacesAround (letter #: takeWhileParse (letter ! digit)) >-> Var

Compare to:  variable = letter [letter | digit]...

So there must be a letter, which is consed onto as many letters or digits as we can find. Finally, the string is converted to an Expr by the Var constructor.

Note: we could instead have defined

    -- Parses a single letter or digit
    alphaNum :: Parser Char
    alphaNum = char ? isAlphaNum

(isAlphaNum is a function in Data.Char) and written takeWhileParse alphaNum in variable.
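To make the remark about association concrete, here is a small sketch that parses the same input with both grammars. It is written with plain pattern matching rather than the combinators so that it runs on its own; the cut-down Expr type (single-digit constants and '-' only) and the names numberP, exprRight, exprLeft and evalE are hypothetical, chosen just for this illustration.

```haskell
import Data.Char (digitToInt, isDigit)

-- Same shape as the lecture's Parser type.
type Parser a = String -> Maybe (a, String)

-- Hypothetical cut-down Expr: integer constants and subtraction only.
data Expr = Const Int | Expr :- Expr
  deriving (Show, Eq)

-- Parses one digit as a constant (single digits keep the sketch short).
numberP :: Parser Expr
numberP (c:cs) | isDigit c = Just (Const (digitToInt c), cs)
numberP _                  = Nothing

-- Right-recursive grammar:  expr = term | term '-' expr
-- The recursive call builds the right-hand subtree first,
-- so "9-4-3" becomes 9 - (4 - 3).
exprRight :: Parser Expr
exprRight cs = case numberP cs of
    Just (l, '-':rest) -> case exprRight rest of
      Just (r, rest') -> Just (l :- r, rest')
      Nothing         -> Just (l, '-':rest)
    other -> other

-- Repetition grammar:  expr = term ['-' term]...
-- Each new term is folded into the tree built so far,
-- so "9-4-3" becomes (9 - 4) - 3.
exprLeft :: Parser Expr
exprLeft cs = case numberP cs of
    Nothing        -> Nothing
    Just (l, rest) -> Just (go l rest)
  where
    go acc ('-':rest) | Just (r, rest') <- numberP rest = go (acc :- r) rest'
    go acc rest = (acc, rest)

-- Evaluates the cut-down Expr.
evalE :: Expr -> Int
evalE (Const n) = n
evalE (a :- b)  = evalE a - evalE b
```

On "9-4-3" the repetition grammar gives (9 - 4) - 3 = 2, the usual reading, while the right-recursive grammar gives 9 - (4 - 3) = 8. Both accept exactly the same strings; only the shape of the tree differs, which is why the notes prefer the [ ]... form.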
    -- Parses a number, eliminating whitespace on both sides
    number :: Parser Expr
    number = elimSpacesAround (parseDigits #++ (lit '.' #: parseDigits) ! parseDigits) >-> Const

This allows a number to be digits, a decimal point, and digits, or just digits. Order is important: we must check for the decimal point first! Because ! commits to the first alternative that succeeds, trying plain parseDigits first would accept "3" from "3.14" and leave ".14" unparsed. Whatever we get, we apply the Const constructor to it.

Compare to:  number = digits | digits '.' digits

Note: an alternate way to define number would be:

    number = digits ['.' digits]

How could this be implemented?

    number = elimSpacesAround (parseDigits #++ ((lit '.' #: parseDigits) ! returnParse [])) >-> Const

The idea is that optional things can simply be left out; here an empty list indicates that nothing was parsed.

    -- Parses a factor
    factor :: Parser Expr
    factor = number ! variable ! elimSpacesAround (lit '(' -# expr #- lit ')')

Compare to:  factor = number | variable | '(' expr ')'

This is straightforward: a factor is a number, a variable, or an expression surrounded by parentheses. Note that -# and #- keep only the expr.

Both expr and term have a similar form:

    expr = term [{'+' | '-'} term]...
    term = factor [{'*' | '/'} factor]...

They call a parser repeatedly, separated by particular operators. We abstract this idea out:

    -- Parses a series of operations of the same precedence,
    -- associating to the left.
    -- Passed a parser for the operand expressions and a parser
    -- for the operators.
    parseSeries :: Parser Expr -> Parser (Expr -> Expr -> Expr) -> Parser Expr
    parseSeries m opParser =
        elimSpacesAround (m # takeWhileParse (opParser # m) >-> convertPairList)

m is the operand parser (term or factor), and opParser returns the constructor associated with the operator symbol. So:

    -- Parses a term
    term :: Parser Expr
    term = parseSeries factor (lit '*' >-> const (:*) !
                               lit '/' >-> const (:/))

    -- Parses an expression
    expr :: Parser Expr
    expr = parseSeries term (lit '+' >-> const (:+) !
                             lit '-' >-> const (:-))

Inside parseSeries, the parser m # takeWhileParse (opParser # m) returns a rather messy data structure:

    (Expr, [((Expr -> Expr -> Expr), Expr)])

We need to convert this into a single expression, and we do it with a foldl. The accumulated left-hand side starts as the first Expr; after that, the latest result is passed along. At each step it is combined with the next pair of a constructor and a right-hand expression:

    -- Converts a list of (operation, Expr) pairs into an expression
    convertPairList :: (Expr, [((Expr -> Expr -> Expr), Expr)]) -> Expr
    convertPairList (first, exprList) = foldl (\l (op, r) -> l `op` r) first exprList

Notice how similar this code is to the BNF above: "|" becomes "!", "[ ]..." becomes takeWhileParse, and sequencing becomes # (or one of its variants).

-----

This does the parsing. We already saw how to evaluate expressions containing only constants. Adding variables is done by passing in an association list (aList, a list of (key, value) pairs) in which to look up the variables:

    -- Evaluates an Expr, given an association list of variable bindings.
    -- Modification of evaluate from SOE.
    eval :: Expr -> [(String, Float)] -> Float
    eval ex aList = evaluate ex
      where
        evaluate :: Expr -> Float
        evaluate (Const x)  = read x
        evaluate (Var name) = case lookup name aList of
                                Nothing -> error ("Unassigned variable " ++ name)
                                Just x  -> x
        evaluate (a :+ b)   = evaluate a + evaluate b
        evaluate (a :- b)   = evaluate a - evaluate b
        evaluate (a :* b)   = evaluate a * evaluate b
        evaluate (a :/ b)   = evaluate a / evaluate b

    -- Examples
    e1 = "x + 5*y*factor2 - x/2 + (x - factor2 + 10.25)*(y - 2)"
    (Just (ex1, "")) = expr e1
    v1 = eval ex1 [("x",4), ("y",6), ("factor2",2)]   -- evaluates to 111.0
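For readers who want to try the whole example, here is a minimal self-contained sketch that gathers the code above into one file. The Parser type and the combinators !, #, >->, lit, -#, #-, #++, takeRepeatParse and elimSpacesAround come from earlier lectures and are not shown in these notes, so the definitions and fixity declarations below are assumptions reconstructed from how they are used here, not the course's own library. Expr also derives Eq (an addition) so that results can be compared, and variable uses the isAlphaNum shortcut mentioned above.

```haskell
import Data.Char (isAlpha, isAlphaNum, isDigit, isSpace)

-- ASSUMED: Parser type and core combinators from earlier lectures,
-- reconstructed for illustration only.
type Parser a = String -> Maybe (a, String)

infixl 3 !
infixl 5 >->
infixl 6 #, #:, #++
infix  7 ?
infixl 7 -#, #-

-- Parses any single character
char :: Parser Char
char []     = Nothing
char (c:cs) = Just (c, cs)

-- Succeeds only if m succeeds and its result satisfies p
(?) :: Parser a -> (a -> Bool) -> Parser a
(m ? p) cs = case m cs of
  Just (a, rest) | p a -> Just (a, rest)
  _                    -> Nothing

-- Tries m; if m fails, tries n on the same input (no backtracking after success)
(!) :: Parser a -> Parser a -> Parser a
(m ! n) cs = case m cs of
  Nothing -> n cs
  result  -> result

-- Runs m then n on the remaining input, pairing their results
(#) :: Parser a -> Parser b -> Parser (a, b)
(m # n) cs = do
  (a, rest)  <- m cs
  (b, rest') <- n rest
  Just ((a, b), rest')

-- Applies f to the result of a successful parse
(>->) :: Parser a -> (a -> b) -> Parser b
(m >-> f) cs = fmap (\(a, rest) -> (f a, rest)) (m cs)

returnParse :: a -> Parser a
returnParse v cs = Just (v, cs)

(#:) :: Parser a -> Parser [a] -> Parser [a]
m #: n = m # n >-> uncurry (:)

(#++) :: Parser [a] -> Parser [a] -> Parser [a]
m #++ n = m # n >-> uncurry (++)

-- Keep only the right-hand / left-hand result
(-#) :: Parser a -> Parser b -> Parser b
m -# n = m # n >-> snd

(#-) :: Parser a -> Parser b -> Parser a
m #- n = m # n >-> fst

takeWhileParse :: Parser a -> Parser [a]     -- zero or more
takeWhileParse m = m #: takeWhileParse m ! returnParse []

takeRepeatParse :: Parser a -> Parser [a]    -- one or more
takeRepeatParse m = m #: takeWhileParse m

lit :: Char -> Parser Char
lit c = char ? (== c)

elimSpacesAround :: Parser a -> Parser a
elimSpacesAround m = spaces -# m #- spaces
  where spaces = takeWhileParse (char ? isSpace)

-- The expression parser and evaluator from these notes
data Expr = Const String | Var String
          | Expr :+ Expr | Expr :- Expr | Expr :* Expr | Expr :/ Expr
          deriving (Show, Eq)

digit, letter :: Parser Char
digit  = char ? isDigit
letter = char ? isAlpha

parseDigits :: Parser String
parseDigits = takeRepeatParse digit

number, variable, factor, term, expr :: Parser Expr
number   = elimSpacesAround (parseDigits #++ (lit '.' #: parseDigits) ! parseDigits) >-> Const
variable = elimSpacesAround (letter #: takeWhileParse (char ? isAlphaNum)) >-> Var
factor   = number ! variable ! elimSpacesAround (lit '(' -# expr #- lit ')')
term     = parseSeries factor (lit '*' >-> const (:*) ! lit '/' >-> const (:/))
expr     = parseSeries term   (lit '+' >-> const (:+) ! lit '-' >-> const (:-))

parseSeries :: Parser Expr -> Parser (Expr -> Expr -> Expr) -> Parser Expr
parseSeries m opParser =
  elimSpacesAround (m # takeWhileParse (opParser # m) >-> convertPairList)

convertPairList :: (Expr, [(Expr -> Expr -> Expr, Expr)]) -> Expr
convertPairList (first, rest) = foldl (\l (op, r) -> l `op` r) first rest

eval :: Expr -> [(String, Float)] -> Float
eval ex aList = evaluate ex
  where
    evaluate (Const x)  = read x
    evaluate (Var name) = maybe (error ("Unassigned variable " ++ name)) id (lookup name aList)
    evaluate (a :+ b)   = evaluate a + evaluate b
    evaluate (a :- b)   = evaluate a - evaluate b
    evaluate (a :* b)   = evaluate a * evaluate b
    evaluate (a :/ b)   = evaluate a / evaluate b
```

Loaded into GHCi, expr "1 + 2*3" should produce the tree Const "1" :+ (Const "2" :* Const "3") with no input left over, showing that * binds tighter than +, and expr "9 - 4 - 3" evaluates to 2, showing left association.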