Topic: Representing Shapes and Simple IO
Date: Jan. 9, 2009
Number: 4
Examples: Shape.hs, lhs2hs.hs
Reading: SOE Chap. 2, begin Chap. 3

Announce - SA 2 due Monday

-- Modules

A module is a way of bundling a group of related functions so that they can be used in (imported into) other programs. In particular, it gives a way of controlling which functions, user-defined types, etc. are visible outside of the module.

SOE introduces a Shape module to represent various shapes: rectangles, ellipses, right triangles, and polygons.

The SOE/src code is in Literate Haskell, which allows text to be written around the code. The suffix is ".lhs" rather than ".hs".

  -- The lines of code to be compiled start with ">".
  -- Lines beginning with "<" are in the text, but not necessarily executable.
  -- Lines beginning with "|" are also in the text, but are often just expressions or code fragments.

I have written a program lhs2hs that can be used to turn .lhs files into .hs files. It is linked as an example program. Use it now; we may look at how it works later.

Let's look at the Shape module. The first thing is the header:

    module Shape ( Shape (Rectangle, Ellipse, RtTriangle, Polygon),
                   Radius, Side, Vertex,
                   square, circle, distBetween, area
                 ) where

This gives the module name (the file should have the SAME name), which must be capitalized. Rule - all module names and type names are capitalized!

If you just say "module Foo where", everything in the module is exported. Otherwise only the listed things are visible in programs that import the module. Above we export a data type (Shape), its constructors (in parentheses), types, and functions.

-- Type declarations create a new name (type synonym) for an existing type.

Examples:

    type Radius = Float
    type Side   = Float
    type Vertex = (Float,Float)

Two reasons for doing this:

  1) Documentation - can choose names that explain how the data is used rather than what its internal representation is.
  2) Flexibility. If we want to change the underlying data structure (e.g.
use Double instead of Float), we can do it in one place, and it will work correctly everywhere.

Note: Standard Prelude defines:

    type String = [Char]

so can use String as a type.

-- Data type declarations create something new

    data Shape = Rectangle Side Side
               | Ellipse Radius Radius
               | RtTriangle Side Side
               | Polygon [Vertex]
         deriving Show

What do we have? First, the keyword data. Then the name of the data type, which must be capitalized. Finally, a list of constructors. The constructor names must be capitalized, to distinguish them from normal functions.

Note that we get some polymorphism this way - a Shape variable or parameter could be any of Rectangle, Ellipse, RtTriangle, and Polygon.

Each constructor has its own set of data that is used to define it and is stored. So a Rectangle has two Sides. (Could have said "Rectangle Float Float". This is clearer.) Ellipse has two radii. Polygon has a list of Vertices, each a pair of Floats.

"deriving Show" says to automatically create a "show" function, which is equivalent to the Java "toString" function. This function is needed for the print function to print the data type. We will see more about "deriving" later.

Note that a Polygon has coordinates, so is located in the plane. However, the other three have only size information, not location information. Seems strange, if we want to draw them. We will see later how SOE handles this.

Define one or more of each, show what prints when you print it. See that the constructor is part of the data stored.

Note that we can define functions for other shapes in terms of the shapes we have:

    square s = Rectangle s s
    circle r = Ellipse r r

However, these are regular functions that call constructors, not constructors themselves. Still only 4 different kinds of Shape (constructors) to deal with.

Define a square and circle, show what prints.

So next we can create functions that have data types as parameters.
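
To try the "define one and print it" suggestions above outside the interpreter, here is a minimal self-contained sketch. The type and function definitions are the ones from these notes; the Main module wrapper, the type signatures on square and circle, and the sample values are assumptions added for illustration.

```haskell
module Main where

type Radius = Float
type Side   = Float
type Vertex = (Float, Float)

data Shape = Rectangle Side Side
           | Ellipse Radius Radius
           | RtTriangle Side Side
           | Polygon [Vertex]
  deriving Show

-- Regular functions that call constructors; they are not constructors.
square :: Side -> Shape
square s = Rectangle s s

circle :: Radius -> Shape
circle r = Ellipse r r

main :: IO ()
main = do
  print (Rectangle 2 3)                     -- Rectangle 2.0 3.0
  print (Polygon [(0, 0), (1, 0), (0, 1)])  -- Polygon [(0.0,0.0),(1.0,0.0),(0.0,1.0)]
  print (square 2)                          -- Rectangle 2.0 2.0
  print (circle 1)                          -- Ellipse 1.0 1.0
```

Note that square 2 prints as a Rectangle: the constructor name is part of the stored data, and square itself leaves no trace.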
First, we give the type signature for the function area:

    area :: Shape -> Float

Sort of like OO programming, but while DATA is saved in the data type, the FUNCTIONS aren't. They are free-standing, but need to be defined for all of the constructors. We have "dispatch on constructor type", but done via pattern matching. Given a Shape variable, we can safely call area on it no matter which kind of shape it is. The area function behaves differently for different constructors (and the associated data).

Look at the easy ones first:

    area (Rectangle s1 s2)  = s1*s2
    area (RtTriangle s1 s2) = s1*s2/2
    area (Ellipse r1 r2)    = pi*r1*r2

Note that we are pattern-matching on the constructor and the associated data. Actually, not so different from what we did before. [] and (:) are constructors for the List type, with "syntactic sugar".

How about a Polygon? Not so easy. First approach: recursive decomposition. Cut off an "ear". (Draw, show we get a triangle and a smaller polygon.) So what is left? Compute the area of the triangle, add it to the area of the smaller polygon. For a convex polygon, any three consecutive vertices form an ear, so:

WRITE:

    area (Polygon (v1 : v2 : v3 : vs))
        = triArea v1 v2 v3 + area (Polygon (v1 : v3 : vs))   -- Left out v2
    area (Polygon _) = 0    -- < 3 vertices must have 0 area.

This is fine, but it is kind of awkward. We take apart and rebuild polygons at each step. All start with vertex v1. So if we do this recursively: (Draw a few levels, show that we eventually get a triangle for each edge, with v1 as the vertex opposite the edge.)

Book takes advantage of this:

    area (Polygon (v1:vs)) = polyArea vs
        where polyArea :: [Vertex] -> Float
              polyArea (v2:v3:vs') = triArea v1 v2 v3 + polyArea (v3:vs')
              polyArea _           = 0

Note polyArea is a locally defined function. Not visible outside of area. But that is not the main reason for not doing it separately. Note that the call to triArea uses v1. But v1 is not a parameter to polyArea. Where does it come from?
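
A sketch to keep at hand while working through that question: it collects the area equations into one loadable file. The definitions of triArea and distBetween are the ones given later in these notes. The Main wrapper, the sample shapes, and the extra equation for an empty Polygon are assumptions added for illustration and are not part of the book's code.

```haskell
module Main where

type Radius = Float
type Side   = Float
type Vertex = (Float, Float)

data Shape = Rectangle Side Side
           | Ellipse Radius Radius
           | RtTriangle Side Side
           | Polygon [Vertex]
  deriving Show

area :: Shape -> Float
area (Rectangle s1 s2)  = s1 * s2
area (RtTriangle s1 s2) = s1 * s2 / 2
area (Ellipse r1 r2)    = pi * r1 * r2
area (Polygon (v1:vs))  = polyArea vs
  where
    -- v1 is not a parameter of polyArea; it is bound by the pattern
    -- on the left-hand side of this equation of area.
    polyArea :: [Vertex] -> Float
    polyArea (v2:v3:vs') = triArea v1 v2 v3 + polyArea (v3:vs')
    polyArea _           = 0
area (Polygon [])       = 0  -- assumption: added so that area covers every case

-- Heron's formula for the area of a triangle.
triArea :: Vertex -> Vertex -> Vertex -> Float
triArea v1 v2 v3 =
  let a = distBetween v1 v2
      b = distBetween v2 v3
      c = distBetween v3 v1
      s = 0.5 * (a + b + c)
  in sqrt (s * (s - a) * (s - b) * (s - c))

distBetween :: Vertex -> Vertex -> Float
distBetween (x1, y1) (x2, y2) = sqrt ((x1 - x2)^2 + (y1 - y2)^2)

main :: IO ()
main = do
  print (area (Rectangle 4 3))
  -- The same 4-by-3 rectangle, given as a (convex) polygon.
  print (area (Polygon [(0, 0), (4, 0), (4, 3), (0, 3)]))
```

Both calls should report an area of 12 (up to floating-point rounding), since the polygon is split into two 3-4-5 right triangles that share the vertex v1.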
Everything in the "where" construct can see all of area's parameters. So by including it inside the area definition we save passing v1 everywhere. polyArea gets a chain of vertices, takes off successive edges, and pairs them with v1.

What about triArea? Heron's formula, known to the Greeks (but not many people these days):

    triArea :: Vertex -> Vertex -> Vertex -> Float
    triArea v1 v2 v3 =
        let a = distBetween v1 v2
            b = distBetween v2 v3
            c = distBetween v3 v1
            s = 0.5*(a+b+c)
        in sqrt (s*(s-a)*(s-b)*(s-c))

    distBetween :: Vertex -> Vertex -> Float
    distBetween (x1,y1) (x2,y2) = sqrt ((x1-x2)^2 + (y1-y2)^2)

Go through the idea of saving partial calculations - s to avoid recalculating, a, b, c to avoid long formulas. What would this look like as a where clause?

Problem with this - only works for convex polygons. Actually it works for polygons that are star-shaped from v1 (meaning that the whole polygon interior is visible from v1 without crossing edges). How to do it more generally? Exercise 2.5. Use trapezoids. Show how it works, using a horizontal line through the min y value. See SA 2 for more details.

-- Haskell I/O

I/O is the Achilles' heel of functional languages. A major strength of functional languages is the substitution rule - you can replace a name by its value anywhere, or go the other way. If you call a function ten times with the same parameters you get the same answer ten times. There are no side effects. If you have to evaluate two values it doesn't matter what order you evaluate them in (as long as you have to evaluate them both).

Unfortunately, I/O doesn't fit this very well. If you call putStr "Hello, world" once it is very different from calling putStr "Hello, world" ten times. Each time you read a character from a file you want to get a different character. And when printing two lines the order DOES matter.

Therefore I/O functions are a bit different from normal functions. Haskell creates a special value called an "action".
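
A small sketch of the contrast just described. The function double and the names d and hello are hypothetical, chosen only for this illustration; the do and let forms used in main are explained later in these notes and appear here only to run the demonstration.

```haskell
module Main where

-- A pure function: the same argument always gives the same result.
double :: Int -> Int
double x = x + x

main :: IO ()
main = do
  -- Substitution rule: the name d and the expression double 5 are
  -- interchangeable, so both lines print the same number.
  let d = double 5
  print (d + d)                -- 20
  print (double 5 + double 5)  -- 20

  -- An action is also a value that can be named, but performing it
  -- twice is observably different from performing it once.
  let hello = putStrLn "Hello, world"
  hello
  hello
```

The two print lines cannot be told apart from their output, while the number of times hello is performed (and the order) shows up directly on the screen.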
When you go to evaluate an expression whose value is an action, instead of just evaluating and using the result (which is the usual result of evaluating an expression) Haskell performs the action, whatever it is. Expressions that evaluate to actions are called "commands".

The type of an Input/Output action is IO <type>. (Will later see that IO is only one type of possible action.)

Some commands just have side effects (like printing or drawing) and return nothing. They are void functions, in Java terminology. In Haskell, they have type IO (). The type () is called the unit type and has one value, also written (). The simplest command whose type is IO () is "return ()". It does nothing. That is sometimes very useful.

Other commands return values - e.g. getLine reads and returns a line from the input. Its type is IO String (or IO [Char] - String is a type synonym for [Char]). The getChar command reads and returns a character, so has type IO Char. If x is an Integer, then "return x" has type IO Integer.

-- Lazy Evaluation

Commands reveal something that has previously been somewhat hidden - Haskell uses lazy evaluation. You have seen something like it in Java:

    if (x != 0 && 1/x > 5)
        y = 1/x;
    else
        y = x;

When evaluating "x != 0 && 1/x > 5" Java does not evaluate both inequalities and && them. It evaluates the first, and only evaluates the second if it needs it to determine the value of the &&. Thus we avoid a divide-by-zero error.

Haskell evaluates EVERYTHING lazily. That is why when we say

    let message = putStrLn "Hello, world"

nothing happens. Haskell can bind the command to the name without performing the command. (More precisely: an action is an ordinary value, and giving it a name is not the same as performing it.) When you then type

    message

the expression has to be evaluated, so that is when it prints.

Note that you can say:

    let first = putStrLn "First message"
    let second = putStrLn "Second message"
    let msglist = [first, second]

No actions have been performed yet.
Then:

    first
    second
    head msglist
    head (tail msglist)

Normally typing an expression to the interpreter evaluates that expression and prints the result. For actions, what happens is the action is performed. All of these result in messages being printed. But:

    msglist

gives an error.

To understand this error, we need to know more about what the interpreter does. Each time you type an expression in the interpreter, the interpreter evaluates it and calls a function "print". This function first calls "show" to convert the value to a string (think "toString" in Java), and uses putStrLn to print it out.

msglist evaluates to a list of commands of type IO (). Haskell doesn't have a version of show that converts something of type IO () to a string. It makes sense to perform a command, and its value is the action. That action may be to print something, or to draw something, or to do nothing. But it doesn't make sense to convert that action to a string! Thus we get an error message.

So what can we do?

    sequence_ msglist

does what we want. Note that the book says:

    sequence_ :: [IO a] -> IO ()

What is "IO a"? Will see in Chap. 5 that it is IO of an arbitrary type. Hudak got ahead of himself.

What if I don't want to make a list of actions, but want to perform them one after the other? The "do" keyword is like where or let, but it lets you do a sequence of commands with the last being IO <type>, and the whole collection is of the same type as the last command. That is because the "value" of the "do" is the value of the last command. The book claims all statements in a "do" must be of type IO (), but this is not true. However, for our uses of the "do" construct we will almost always have the last line be IO (), at least until late in the course.

So

    do first; second

performs each command IN ORDER! Important. The let and where don't say things are performed in order. (Using ";" to separate because I can't use separate lines in the interpreter.)
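
The interpreter session above can also be sketched as a file. The names first, second, and msglist are the ones used in these notes; the name inOrder and the main wrapper are assumptions added for illustration.

```haskell
module Main where

-- Naming an action does not perform it.
first, second :: IO ()
first  = putStrLn "First message"
second = putStrLn "Second message"

-- A list of actions is an ordinary list; it has no Show instance,
-- so it cannot be printed, only performed.
msglist :: [IO ()]
msglist = [first, second]

-- A do block performs its commands in order.
inOrder :: IO ()
inOrder = do first; second

main :: IO ()
main = do
  sequence_ msglist    -- performs first, then second
  head (tail msglist)  -- performs only second
  inOrder              -- performs first, then second
```

Running main prints five lines: First, Second, Second, First, Second.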
Also, do

    :type first

to discover that first is indeed of type IO (), and

    :type do first; second

to discover that the whole command is of type IO ().

What about doing input? There is a getLine command whose type is IO String. (Show this by using :t, and then by typing "getLine" into the interpreter, typing a string, and seeing that it prints the string.) Note that "show" of a string returns the string in quotes, so the value of the getLine command that is printed out is a string surrounded by quotes.

To see that the value of a "do" is the value of the last command in the do, consider:

    do getLine; getLine

When you run this the value printed is the value of the last line typed. The other value is thrown away.

You might think that you can read a line by saying

    let line = getLine

but nothing happens, and line is of type IO String. If I now type "line" the interpreter waits for me to type an input line, then prints it as a string (in quotes).

So how can I save the value read? Use "<-":

    line <- getLine

Echoes the same, but now line has the value of the input string. Note that line is of type String, not IO String. The "<-" converts a value of type IO <type> into a value of type <type> and binds the value to the name on the left side of the "<-". It is actually a bit more complicated, but this simplification works for now.

Note: We will learn more about this "<-" when we study the "IO monad". It is really "syntactic sugar" for a more complicated monad operation called "bind" that we will learn about later. The interpreter converts it to this other expression before using it. Because of this, "line <- getLine" is not considered an expression, and it has no type. (Use :t to demonstrate this.)

So can do:

    do msg <- getLine; putStrLn ("You entered: " ++ msg)

You can only use <- within a do, and not as the last thing. The last thing must be IO <type>. Thus:

    do msg1 <- getLine; msg2 <- getLine

is invalid, but

    do msg1 <- getLine; msg2 <- getLine; putStrLn (msg1 ++ msg2)

works fine.
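
Written in a file rather than on one interpreter line, the last example might look like the sketch below. The helper describe and the "You entered: " prefix on the two-line version are assumptions added for illustration; the use of <- and getLine follows the notes.

```haskell
module Main where

-- A pure helper (hypothetical name) that builds the reply from two
-- plain Strings.
describe :: String -> String -> String
describe msg1 msg2 = "You entered: " ++ msg1 ++ msg2

main :: IO ()
main = do
  msg1 <- getLine  -- getLine :: IO String, but msg1 :: String
  msg2 <- getLine
  -- The last command is not a "<-" binding; its type, IO (), is the
  -- type of the whole do.
  putStrLn (describe msg1 msg2)
```

The program waits for two lines of input and then prints them joined together after the prefix. Keeping describe separate shows which part is ordinary String manipulation and which part is I/O.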
An additional point - can't do a normal "=" within a "do". But can use a special version of let to bind a value that will last until the end of the do. So the following is possible:

    do let x = 5
       let y = 2*x
       print (x + y)

In fact, the whole top level of the interpreter is basically one big do statement! So need to use "let" when assigning values in the interpreter. (GHCi 8.0 and later also accept a bare "x = 4".)

Show:

    let x = 4
    let y = 2*x
    print (x + y)

Also, show without (). Error because it parses as (print x) + y (function application binds tighter than any operator).
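
A minimal sketch of the two forms of let side by side. The name sumDemo and the Integer annotations are assumptions added for illustration.

```haskell
module Main where

-- Ordinary let ... in: an expression with local bindings.
sumDemo :: Integer
sumDemo =
  let x = 5
      y = 2 * x
  in x + y

main :: IO ()
main = do
  -- let inside a do has no "in"; the bindings last to the end of the do.
  let x = 5 :: Integer
  let y = 2 * x
  print (x + y)  -- 15
  print sumDemo  -- 15
  -- print x + y   -- rejected by the type checker: parsed as (print x) + y
```

Uncommenting the last line reproduces the error mentioned above, because the application print x is grouped before the + is considered.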