Topic: Streams
Date: Nov. 16, 2009
Number: 24
Examples: streams.hs
Reading: Chapter 14

Finish type classes from last class.

----

Streams are basically infinite lists.  We have seen them already, in the forms:

    repeat 0
    [1..]

But we haven't created them recursively and dealt with them as infinite objects.  So to start:

    ones = 1 : ones

How does this work?  Well, let's unroll it, repeatedly substituting the right side of the definition of "ones" for "ones":

    ones = 1 : ones
         = 1 : (1 : ones)
         = 1 : 1 : (1 : ones)

and so on forever.  So take 10 ones gives us the first 10 of them.  (demo)

Can do something a bit more complicated:

    integersFrom :: Integer -> [Integer]
    integersFrom n = n : integersFrom (n + 1)

Then can define:

    naturals = integersFrom 0

But this is not our only option.  A stranger way to do it:

    naturals2 = 0 : zipWith (+) ones naturals2

This seems very strange.  The stream naturals2 is defined in terms of itself!  How can this work?

Important point: the first value IS supplied (0).  It gives us something to bootstrap from.

First, see that it makes sense:

    ones        = 1 1 1 1 1 1 ...
    naturals2   = 0 1 2 3 4 5 ...
    zipWith (+) = 1 2 3 4 5 6 ...

Note that the zipWith row computes the value that naturals2 will need for the NEXT zipWith step.  So we stay one ahead of ourselves.

Draw diagram like on p. 194 of SOE.  Circle for : (in ones a loop), square for zipWith.

So zipWith can work on infinite lists.  What else?  How about map?

    naturals3 = 0 : map (+1) naturals3

Here naturals3 is defined without needing another stream - it builds itself.  At each step we add one to the previous item.  Note that it is very important that we started with 0:, so that the first item is available.

How about filter?  Why not?

    odds = filter (\x -> x `mod` 2 == 1) naturals

foldl will always fail on an infinite list and foldr will usually fail (why?).  But scanl works:

    naturals4 = scanl (+) 0 ones

Or more useful:

    triangular = scanl (+) 0 (tail naturals)

scanr works in the cases where foldr works.
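The stream definitions so far can be collected into one loadable file.  This is a minimal sketch, not the course's streams.hs: the type signatures are added here as an assumption (the notes leave most of them off), the odds filter keeps the numbers with remainder 1, and firstOver is a hypothetical helper that is not in the notes.  It shows one case where foldr does terminate on a stream, because the combining function stops looking at the rest of the fold once it has an answer.

```haskell
-- Stream definitions from the notes, with explicit type signatures added.
ones :: [Integer]
ones = 1 : ones

integersFrom :: Integer -> [Integer]
integersFrom n = n : integersFrom (n + 1)

naturals :: [Integer]
naturals = integersFrom 0

-- Self-referential: the leading 0 is what lets zipWith get started.
naturals2 :: [Integer]
naturals2 = 0 : zipWith (+) ones naturals2

-- Builds itself: each item is one more than the previous one.
naturals3 :: [Integer]
naturals3 = 0 : map (+ 1) naturals3

-- Keeps the numbers whose remainder mod 2 is 1.
odds :: [Integer]
odds = filter (\x -> x `mod` 2 == 1) naturals

-- Running sums of ones: 0, 1, 2, 3, ...
naturals4 :: [Integer]
naturals4 = scanl (+) 0 ones

-- Running sums of 1, 2, 3, ...: 0, 1, 3, 6, 10, ...
triangular :: [Integer]
triangular = scanl (+) 0 (tail naturals)

-- Hypothetical helper (not from the notes): foldr can finish on an
-- infinite list when the function ignores "rest" after finding a result.
firstOver :: Integer -> [Integer] -> Maybe Integer
firstOver limit = foldr (\x rest -> if x > limit then Just x else rest) Nothing
```

In GHCi, take 5 triangular gives [0,1,3,6,10] and firstOver 10 naturals gives Just 11.  The second call returns only because the lambda never touches rest once x exceeds the limit; a sum written with foldr would need every element and so never finish.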

Look at book example of Fibonacci numbers:

    fib = 1 : 1 : zipWith (+) fib (tail fib)

Here we include not one but two recursive references, one of them to the tail.  Note that the first zipWith result adds (head fib) and (head (tail fib)).  That means that we need TWO initial values.  But that is also the case for the recursive definition of the Fibonacci numbers.

Book points out an important point - fib is a single named list, so both occurrences on the right side refer to the same structure and each element is computed only once.  It in effect creates a "where" clause.  Thus we can compute fib very efficiently.

Demo fib !! 1000 and fib !! 10000

This contrasts with the recursive version of fib:

    fibRec :: Integer -> Integer
    fibRec n = if n < 2 then 1 else fibRec (n-1) + fibRec (n-2)

This version takes exponential time.  Here Haskell does not share any work between the calls fibRec (n-1) and fibRec (n-2), even though the first repeats everything the second does.  It makes fib !! n calls to the base case alone (can prove by induction), and fib !! n grows like (golden ratio)^n.

There is an approach called memoization, which remembers all of the previous values, but doing that requires state in one form or another.  The memoized version takes O(n) time to compute fibRec n, just like fib !! n does.

---------

Another idea - find primes via the sieve method.  Invented by the Greek Eratosthenes before 200 BC.  (He is also credited with being the first person to calculate the circumference of the Earth.)

Idea - start with all integers > 1.  Declare 2 a prime, then cross off all multiples of 2.  Next thing left is 3, so declare 3 a prime and cross off all multiples of 3.  Next thing left is 5, etc.

    primes = sieve [2..]

    sieve :: [Integer] -> [Integer]
    sieve (p:lst) = p : sieve (filter (\b -> b `rem` p /= 0) lst)

Here the recursion is a bit trickier.  It is like the recursion we used in uniques.  The first thing in the list is always a prime, and then we filter its multiples out of what remains.

Note that after the kth prime is computed there are k filters running down the rest of the stream!  When we ask for the next prime, each filter in turn tests the next candidate (until one of them rejects it).
(Lazy evaluation - they didn't need to test it any earlier.)

Lazy evaluation makes all of this work.  We should discuss it in more detail.  The rule is always to evaluate the OUTERMOST function first, and to evaluate its parameters only when their values are needed (for a pattern match, say), working left to right.

Thus the first sieve call is able to take the 2 without looking at the rest of [2..].  The next time it has to call filter to test 3, which is then passed on to the p in the sieve pattern.  The third time there will be two filters.  4 fails to get through the 2 filter, but 5 gets through both.  6 is stopped by the 2 filter, 7 gets tested by the filters for 2, 3, and 5, etc.

-- Hamming's problem

So far we have used streams to find alternate ways of doing things that could easily be done without them (although without streams the sieve would need to know in advance what range you wanted to sieve).  Here is an example of something that is a good deal trickier without streams:

Hamming proposed the following problem: generate, in increasing order, all positive integers whose only prime factors are 2, 3, and 5.

How can we do that?  First, it seems clear that if we want this in increasing order we need some sort of merge function, and one that eliminates duplicates:

    -- Both arguments are assumed infinite, so there is no case for [].
    merge :: Ord a => [a] -> [a] -> [a]
    merge xs@(x:xt) ys@(y:yt) =
        if x < y then x : merge xt ys
        else if y < x then y : merge xs yt
        else x : merge xt yt

Now how should the stream work?  Hamming starts with 1.  For every number n in hamming, 2*n, 3*n, and 5*n are also in hamming.  Gives us:

    -- Computes all numbers of form 2^i * 3^j * 5^k
    hamming = 1 : merge (map (*2) hamming)
                        (merge (map (*3) hamming)
                               (map (*5) hamming))

Do the bubble diagram and show how it works.  Show how the three streams are created, merged.
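
The Hamming definitions can be tried out as a small file.  This is a sketch under a couple of assumptions: the two extra merge clauses for empty lists are an addition that is not in the notes (the infinite streams never reach them, they only make merge total on finite lists for testing), the comparison is written with compare rather than nested ifs, and the type signature on hamming is added here.

```haskell
-- Merge two increasing lists into one increasing list, dropping duplicates.
merge :: Ord a => [a] -> [a] -> [a]
merge xs@(x:xt) ys@(y:yt) =
  case compare x y of
    LT -> x : merge xt ys
    GT -> y : merge xs yt
    EQ -> x : merge xt yt   -- same value in both: keep only one copy
-- Added for finite lists only (not in the notes); streams never get here.
merge xs [] = xs
merge [] ys = ys

-- All numbers of the form 2^i * 3^j * 5^k, in increasing order.
-- The leading 1 is what the three mapped streams bootstrap from.
hamming :: [Integer]
hamming = 1 : merge (map (* 2) hamming)
                    (merge (map (* 3) hamming)
                           (map (* 5) hamming))
```

In GHCi, take 15 hamming gives [1,2,3,4,5,6,8,9,10,12,15,16,18,20,24].  Note that 6 appears once even though it arrives as both 2*3 and 3*2; that is the EQ branch of merge at work.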