Topic: Recursive Functions and Pattern Matching
Date: Sep. 25, 2009
Number: 3
Examples: recursiveExamples.hs, dna2proteins.hs, wordLength.hs

-- Built-in functions:

There are many built-in functions provided for you, e.g.,

    +  -- addition                   abs   -- absolute value
    -  -- subtraction                sqrt  -- square root
    *  -- multiplication             exp   -- exponential (e^x)
    /  -- division                   log   -- natural logarithm
    ^  -- raise to integral power

On Ints and Integers: div, mod (also quot, rem)

    The two pairs differ on negative numbers. div rounds down
    (so div 5 (-2) => -3) while quot rounds toward 0
    (so quot 5 (-2) => -2). mod and rem are then defined so that

        (div x y) * y + (mod x y) = x
    and
        (quot x y) * y + (rem x y) = x

    So you should use either the div, mod pair or the quot, rem pair.
    Don't mix them.

sin, cos, tan, asin, ...  -- Trig functions

See p. 342 for list of numerical functions.

-- Recursion in Haskell

The standard pattern for recursion in Java was to use an if statement to
select between the base case (or cases) and the recursive case (or cases).
We can do the same in Haskell, except that instead of if statements we
have conditional expressions:

    if predicate then expr1 else expr2

We will see these conditional expressions later, but less often than you
might expect. Instead of using conditional expressions, we often use
pattern matching.

The natural recursion for a list is to handle the first item directly and
the rest of the list recursively. Here we use that decomposition to
compute the length of a string.

    -- Computes the length of a string
    strLength :: [Char] -> Int
    strLength [] = 0
    strLength (c:rest) = 1 + strLength rest

The recursive idea is simple: the string is either empty (length 0) or it
is a list with a first character and the rest. In the second case the
length is 1 plus strLength rest.

    []        matches the empty string
    (c:rest)  matches a character (c) and the rest of the list (rest).

Sort of like saying:

    c = head lst
    rest = tail lst

where lst is a parameter. But all done at once!

This sort of matching can be done on constructors. Here the constructor
is ":".
The pattern match says, "Call the LHS of the last list construction
operation c and the RHS of the last list construction operation rest."

Note - Function calls bind tighter than operators! So we don't need ()
around strLength rest.

Note - MUST have () around the (c:rest) pattern on the left.

    -- listSum implemented recursively (from SOE)
    listSum :: [Double] -> Double
    listSum [] = 0.0
    listSum (num:rest) = num + listSum rest

Same idea, except instead of 1 we add in the value of num, the first item
in the list.

-- Reverse a string

Look at the reverse function. The natural way to think of this is to take
the first item off the list, reverse what is left, and put the first item
at the end. Saw in CS 5. This leads to a straightforward definition:

    reverseString :: [Char] -> [Char]
    reverseString [] = []
    reverseString (x:xs) = reverseString xs ++ [x]

Note: Book likes to use x as the first item of a list and xs as the rest
of a list. We will often follow this convention.

Note: ++ concatenates two lists, making one list out of them. We also say
the second list is appended to the first. Note the difference from the
":" list constructor, which combines a single element with a list of
elements.

Unfortunately, this takes O(n^2) time: we concatenate to the end of lists
of length 0, 1, 2, ..., n-1, and each concatenation takes time
proportional to the length of its left-hand list. This is because the
only way to get to the end of the linked list representing the list is to
walk all the way to the end. (There is no tail pointer, and even if there
were, lists are immutable, like Java Strings. Draw linked list to show
what happens.)

How can we do better? Reverse is easy with a stack - pile stuff up on the
top, and you end up with a stack in the reverse of the order that things
came in. So use this - create an auxiliary function with a parameter to
accumulate the reversed list.

    reverseStringFast xs = rev [] xs
      where rev acc [] = acc
            rev acc (x:xs) = rev (x:acc) xs

Keep taking the first thing off the list and putting it on the front of
acc, returning acc when done. O(n). Demo by hand.

NOTE: "where". Can be used when defining a function.
Of form:

    functionName parm1 parm2 ... = expression
      where bindings

Alternate: "let"

    let bindings
    in expression

"let" can be used anywhere an expression can (e.g. after an "="). Both
forms allow you to create local names and bind them to values (function
definitions are also values). "rev" can't be seen outside of the
function. Show not defined at top level.

Also, indentation is significant! You can write Haskell with ";" at the
end of statements and "{}" surrounding things like the stuff inside a
where or let. But normally it is position-sensitive, with the rules:

  1) If you have a statement, lines indented further are part of the
     same statement. Ones indented the same or less start a new
     statement.

  2) If you have where, let, do, ... that allow multiple statements,
     the first word after the where, let, ... defines the indentation
     level for the list of statements. The first line indented less
     ends the list.

  3) The "if", "then", and "else" can normally line up (have the same
     amount of indentation). HOWEVER, within a "do" you get a parse
     error unless the "then" and "else" are indented further than the
     "if". (The "then" and "else" can have the same indentation level.)
     There are a number of things (e.g. "=") that work differently
     within a "do", and apparently if-then-else is one of them.
     (Haskell 2010 relaxed this rule, so recent versions of GHC accept
     either layout.)

HINT: DO NOT use tabs in your Haskell programs. The number of spaces
shown in the editor may be different than the number assumed by the
compiler, so groupings can get messed up.

This idea of using an auxiliary helper function with an extra parameter
as an accumulator is one we will see often.

-- Convert DNA sequence to amino acids

DNA has 4 bases: A, C, T, and G. These are used to encode proteins. Each
possible triple of bases encodes an amino acid (except for "TAA", "TAG",
and "TGA", which encode nothing). Most amino acids can be encoded in
multiple ways.

We will use a lookup table, called an "association list" or "a-list". It
is a list of (key, value) pairs. There is a built-in function lookup that
will do this lookup, which we will see later.
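As a small preview of the built-in lookup mentioned above, here is a
minimal sketch on a three-entry a-list. The names sampleCodes and
codonToAcid are made up for this illustration and are not part of
dna2proteins.hs. lookup returns a Maybe value instead of throwing an
exception, so the sketch uses a case expression and, as an assumption of
this example only, substitutes '?' when the key is missing.

```haskell
-- Hypothetical three-entry a-list, just for illustration.
sampleCodes :: [([Char], Char)]
sampleCodes = [("ATG", 'M'), ("AAA", 'K'), ("TGG", 'W')]

-- Uses the Prelude's lookup, which returns Nothing when the key is absent.
-- The '?' fallback is an assumption made for this sketch.
codonToAcid :: [Char] -> Char
codonToAcid codon =
    case lookup codon sampleCodes of
        Just acid -> acid
        Nothing   -> '?'
```

With these definitions, codonToAcid "AAA" gives 'K' and codonToAcid "CCC"
gives '?'.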
So we need to take triples of bases, look them up in the list, and put
the corresponding amino acid in the output list. How do we get a triple?
Pattern match the first three items:

    -- Converts a sequence of bases in DNA to a sequence of amino acids
    dna2proteins :: [Char] -> [Char]
    dna2proteins (b1:b2:b3:rest) =
        lookupSure [b1, b2, b3] codes : dna2proteins rest
      where
        codes :: [([Char], Char)]
        codes = [("ATA", 'I'), ("ATC", 'I'), ("ATT", 'I'), ("ATG", 'M'),
                 ("ACA", 'T'), ("ACC", 'T'), ("ACG", 'T'), ("ACT", 'T'),
                 ("AAC", 'N'), ("AAT", 'N'), ("AAA", 'K'), ("AAG", 'K'),
                 ("AGC", 'S'), ("AGT", 'S'), ("AGA", 'R'), ("AGG", 'R'),
                 ("CTA", 'L'), ("CTC", 'L'), ("CTG", 'L'), ("CTT", 'L'),
                 ("CCA", 'P'), ("CCC", 'P'), ("CCG", 'P'), ("CCT", 'P'),
                 ("CAC", 'H'), ("CAT", 'H'), ("CAA", 'Q'), ("CAG", 'Q'),
                 ("CGA", 'R'), ("CGC", 'R'), ("CGG", 'R'), ("CGT", 'R'),
                 ("GTA", 'V'), ("GTC", 'V'), ("GTG", 'V'), ("GTT", 'V'),
                 ("GCA", 'A'), ("GCC", 'A'), ("GCG", 'A'), ("GCT", 'A'),
                 ("GAC", 'D'), ("GAT", 'D'), ("GAA", 'E'), ("GAG", 'E'),
                 ("GGA", 'G'), ("GGC", 'G'), ("GGG", 'G'), ("GGT", 'G'),
                 ("TCA", 'S'), ("TCC", 'S'), ("TCG", 'S'), ("TCT", 'S'),
                 ("TTC", 'F'), ("TTT", 'F'), ("TTA", 'L'), ("TTG", 'L'),
                 ("TAC", 'Y'), ("TAT", 'Y'), ("TAA", '_'), ("TAG", '_'),
                 ("TGC", 'C'), ("TGT", 'C'), ("TGA", '_'), ("TGG", 'W')]
    dna2proteins _ = []   -- Note 1 or 2 bases at end not converted.

NOTE - "_" matches anything. So the order of the two definitions is
important! Use "_" when you won't use the matched value.

    -- Looks up a key in a list of (key, datum) pairs.
    -- Throws an exception if the key is not found.
    lookupSure :: [Char] -> [([Char], Char)] -> Char
    lookupSure str ((key, datum) : rest) =
        if str == key
        then datum
        else lookupSure str rest
    lookupSure str _ = error (str ++ " not found")

Note: error throws an exception whose message is the string that follows
it. Also, ++ works for string concatenation, much as + does in Java. But
it also means "append these two lists into a single list".
(Remember that a string is nothing but a list of characters.)

Here is a case where we use a conditional expression. Also, "then" and
"else" can line up with the "if". Or all three can be on a single line.

Note how pattern matching simplifies taking apart the list and the
ordered pair! The equivalent conditional expression for lookupSure would
be:

    lookupSure2 :: [Char] -> [([Char], Char)] -> Char
    lookupSure2 str aList =
        if aList == []
        then error (str ++ " not found")
        else if str == fst (head aList)
             then snd (head aList)
             else lookupSure2 str (tail aList)

Note: fst and snd are functions to get the first and second items in an
ordered pair. ONLY for pairs; won't work on longer tuples.

--- Finding average word length in a piece of text.
(Will do more clever things later.)

What is needed? Want to
    1) Break into words
    2) Compute the length of each word
    3) Sum the lengths
    4) Compute the average length

To break into words when the text has punctuation, etc., we will:
    a) replace newline '\n', return '\r', tab '\t' by spaces.
       whitespace2spaces does this.
    b) eliminate all non-letters (keep spaces): filter on isLetterOrSpace
    c) accumulate a word until we get to a space; make a list of words.

    -- Replaces newlines, returns, and tabs by spaces
    whitespace2spaces :: [Char] -> [Char]
    whitespace2spaces ('\n' : rest) = ' ' : whitespace2spaces rest
    whitespace2spaces ('\t' : rest) = ' ' : whitespace2spaces rest
    whitespace2spaces ('\r' : rest) = ' ' : whitespace2spaces rest
    whitespace2spaces (ch : rest) = ch : whitespace2spaces rest
    whitespace2spaces [] = []

    -- Returns true if the character is a letter or a space
    isLetterOrSpace :: Char -> Bool
    isLetterOrSpace ch =
        elem ch "abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ"

NOTE: elem determines if an item appears in a list. Could have written
our own, but...

Also, could have written whitespace2spaces more briefly as an if
expression with an elem call as its predicate. How?
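One possible answer to the question above, as a sketch and not the
version in wordLength.hs: test each character with elem against the
string "\n\t\r" and pick the replacement with an if expression. The name
whitespace2spaces2 is made up here to avoid clashing with the original.

```haskell
-- Replaces newlines, returns, and tabs by spaces.
-- Same behavior as whitespace2spaces, with one recursive case instead of four.
whitespace2spaces2 :: [Char] -> [Char]
whitespace2spaces2 [] = []
whitespace2spaces2 (ch : rest) =
    (if elem ch "\n\t\r" then ' ' else ch) : whitespace2spaces2 rest
```

For example, whitespace2spaces2 "a\tb\nc" gives "a b c". The parentheses
around the if expression are needed; without them the else branch would
swallow the rest of the line.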
    -- Breaks the input string into a list of words
    breakWords :: [Char] -> [[Char]]
    breakWords str =
        breakHelper [] (filter isLetterOrSpace (whitespace2spaces str))

    -- Breaks the input string into words using an extra parameter.
    -- The first parameter is the word accumulated so far,
    -- the second is the remainder of the input text.
    -- Returns a list of words
    -- This version is not as efficient as it could be
    breakHelper :: [Char] -> [Char] -> [[Char]]
    breakHelper [] [] = []
    breakHelper word [] = [word]
    breakHelper [] (' ' : rest) = breakHelper [] rest
    breakHelper word (' ' : rest) = word : breakHelper [] rest
    breakHelper word (letter : rest) = breakHelper (word ++ [letter]) rest

Somewhat tricky. Easiest to read from the bottom up.

The last line takes a letter off of the input and adds it to the end of
the accumulated word.

But if the next letter is a space, then we should make the current word
the first item in the list, follow it by the rest of the words in the
text, and start a new (empty) word. The second to last line does this.

The rest handles special cases. If you get two spaces in a row you don't
want the empty "current word" to be added as a word. You want to ignore
the space and continue. The third line does this.

Finally, we want to deal with the end of the text. If we have a word
accumulated, that is the only word in the list. If the accumulated word
is empty, we just want an empty list. The first two lines handle this.

    -- Computes the average word length of a piece of text.
    averageWordLength :: [Char] -> Double
    averageWordLength str = totalLength / intToDouble (length wordList)
      where wordList = breakWords str
            totalLength = intToDouble (foldl (+) 0 (map length wordList))

NOTE: wordList is computed once, used two places.

Note that expressions in where or let can refer to other names defined in
the same where or let. Can even define mutually recursive functions this
way!
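To illustrate that last point, here is a minimal sketch of two mutually
recursive functions defined in a single where. The names isEvenLength,
evenLen, and oddLen are invented for this example; it uses nothing beyond
the pattern matching shown earlier.

```haskell
-- Returns True if the list has an even number of items.
-- evenLen and oddLen call each other and are visible only inside isEvenLength.
isEvenLength :: [a] -> Bool
isEvenLength xs = evenLen xs
  where evenLen []       = True
        evenLen (_:rest) = oddLen rest
        oddLen []        = False
        oddLen (_:rest)  = evenLen rest
```

For example, isEvenLength "ab" is True and isEvenLength "abc" is False.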
    -- Converts an Int to a Double
    intToDouble :: Int -> Double
    intToDouble n = fromInteger (toInteger n)

Here is a case where you would like Haskell's type system to be more
forgiving. Can't do / on a pair of Integers or Ints, so have to convert.
Will see later why this works, but each number type has a fromInteger
which converts an Integer into that type, and type system is smart enough
to figure out what type you are converting to (here, the type signature
tells us that we need to return a Double). Also, we must convert from an
Int to an Integer to use fromInteger.
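The Prelude also provides fromIntegral, which converts from any integral
type (such as Int or Integer) to any numeric type in one step. As a
sketch, the conversion above could be written with it. The name
intToDouble2 is made up here so that it does not clash with the original,
and averageOfLengths is a hypothetical helper that shows it in use.

```haskell
-- Converts an Int to a Double using fromIntegral,
-- which behaves like fromInteger (toInteger n).
intToDouble2 :: Int -> Double
intToDouble2 n = fromIntegral n

-- Hypothetical helper: average length of the words in a non-empty list.
averageOfLengths :: [[Char]] -> Double
averageOfLengths ws =
    intToDouble2 (sum (map length ws)) / intToDouble2 (length ws)
```

For example, averageOfLengths ["ab", "abcd"] gives 3.0.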