=====================[ Readings ]=====================

Please read Chapters 4--6 of "Real World Haskell".

-----------------[ Chapter highlights ]----------------

Chapter 4 introduces more elements of Haskell style. Do the examples to
continue getting used to the syntax; the chapter also gives some idioms for
interacting with files (without explaining them).

In Chapter 5,
http://book.realworldhaskell.org/read/writing-a-library-working-with-json-data.html
pay attention to how defining a new type that represents JSON at the same
time sets up the patterns for any functions working on it, by defining a
data constructor for every possible JSON value (JString, JNumber, JBool,
JNull, JObject, JArray). This is almost like the _grammar_ for JSON. (You
may want to know that this basic JSON grammar belongs to the context-free
class, which requires only a simple pushdown automaton to parse valid JSON
inputs and reject invalid ones. We believe that this simple syntax makes
JSON-based systems less prone to bugs than systems built on more complex
data formats.)

Note that the JObject constructor encloses "a list of pairs, each of which
contains a String and another JValue". This list can be empty, or as long
as you like (as far as Haskell knows, it can even be infinite!). So you
have a _model_ of your JSON data structures: how these objects are allowed
to nest and/or repeat. (A sketch of the type appears below, after the
Chapter 6 notes.)

Note the use of Maybe right from the start (e.g., in getString)! It is a
staple of Haskell code, for reasons we discussed in class today (see
below).

Also, the chapter shows you what goes into a module. This will be handy
when we read standard modules like Data.List, GHC.List, GHC.Types,
Data.Maybe, etc.

--------

Chapter 6, http://book.realworldhaskell.org/read/using-typeclasses.html,
explains how functions like "show" can be called for all types, and how
+, -, *, / can be called for all numeric types. Without the *typeclasses*
explained in Ch. 6, we'd need not just separate implementations of these
functions for different types---we have to write those anyway, since
printing a list is not like printing a number, and adding floating-point
numbers is not the same as adding 4-byte integers, nor like adding bignum
integers---but also a different _name_ for each type's version!

This might come as a surprise. Why can't we just define a single function
like "show" or "+", and then keep adding rules to it for each new type we
define, matching on each data constructor of each type to do the thing
appropriate for that type? After all, isn't this what pattern-matching is
for? It'd be like Ruby mix-ins, which can add any method to any class at
any time. But then, _what argument type would this function have?_
Inferring anything about its arguments would be a nightmare---with new
cases being added (to what type?), some things that type-checked before
would no longer type-check, and vice versa. If a more general pattern had
been added earlier, a more restrictive one added later would never be
reached when matching; and so on. Type inference on this function would be
useless, because who knows what operations its arguments would support,
and when.

Thus we define *typeclasses* up front, as collections of the
functions/operations they support, explicitly typed, and then allow
defining specialized *instances* of these functions for each type. In type
inference, we use the typeclasses as constraints on the context:
"Show a =>" means "a must support show, i.e., for a, show must be defined
as a valid function accepting a as an argument"; "Num a =>" means that a
must support +, -, etc.
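As promised above, here is (approximately) the JValue type from Chapter 5,
together with getString---a sketch, not the book's full module:

  -- One data constructor per kind of JSON value: the "grammar" of JSON.
  data JValue = JString String
              | JNumber Double
              | JBool Bool
              | JNull
              | JObject [(String, JValue)]  -- may be empty; nests arbitrarily
              | JArray [JValue]
                deriving (Eq, Ord, Show)

  -- Maybe right from the start: extraction fails cleanly when the
  -- JValue is not actually a string.
  getString :: JValue -> Maybe String
  getString (JString s) = Just s
  getString _           = Nothing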
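Our simple in-class example defined a "Point a" type and made it an
instance of the Show typeclass: see making-a-point-type.txt. A minimal
sketch along the same lines (the exact in-class definition may differ):

  data Point a = Point a a

  -- The instance says how to show a Point, provided its coordinates
  -- themselves support show.
  instance (Show a) => Show (Point a) where
    show (Point x y) = "(" ++ show x ++ ", " ++ show y ++ ")"

  -- ghci> Point 1 2
  -- (1, 2)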
==============[ Theory: type inference via Unification ]==============

The way that Haskell defines its types (i.e., both builds up its own
internal types like lists and the different integers, and allows the
programmer to construct new types with _data_, alias them with _type_, and
pattern-match on them) is not accidental. It is founded on a powerful
mathematical result that proves this type system optimal with respect to
the algorithm by which Haskell does its type inference. It is one of those
things about Haskell that were discovered rather than invented (quoting
Philip Wadler yet again).

Type inference in Haskell is done via the algorithm known as Unification.
This algorithm accumulates and applies type variable substitutions until
no more substitutions are possible for any expression in the program. Then
the inference is done and the algorithm stops, returning the inferred
types (i.e., constraints on all variables in the program). This algorithm
is what gets us the types on function definitions in Haskell---and the
type errors, when it fails to unify the types.

This algorithm is fairly simple, but on particular type systems---
including Haskell's---its output is *the most general type expression*
that satisfies the requirements the program's functions/operations place
on their arguments. It is called a "principal type". This means that *any*
type for which these requirements hold can be obtained from this output by
substituting some type variables with type expressions. (This is called
"specialization": for example, [Int] is a specialization of [a], obtained
by substituting Int for a; (a,a) is a specialization of (a,b); (Int,Int)
is a specialization of (a,a); and so on.)

Note that this type does not depend on the order in which the various
substitutions are done! This is an important mathematical property that
tells us the result is well-defined, i.e., all who apply the unification
process will get the same answer, no matter in what order they unify
expressions.

Haskell, OCaml, ML, and other similar languages use the Hindley-Milner
type system, which is proven to behave as above under unification. This is
the reason this type system is considered one of the most significant
achievements in PL theory, and this is why we use types like Maybe and
Either to express "sums" and "unions" of types ("this type's value is
either this kind of value or the special value Nothing"; "it is either
this or that kind of value") instead of trying some heuristics. Such
heuristics might work for some cases, but H.-M. and Unification provably
work for _all_ possible cases.

A nice overview of Unification and H.-M. can be found in
https://cs.brown.edu/courses/cs173/2002/Lectures/2002-11-13.pdf
Step through the algorithm to get a feel for it!
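To see principal types and their specializations concretely, you can ask
GHCi for the types it infers (an illustrative session; prompts and
type-variable names may differ in your GHC):

  ghci> pairUp x y = (x, y)
  ghci> :t pairUp
  pairUp :: a -> b -> (a, b)        -- the principal, most general type
  ghci> :t pairUp 'c'
  pairUp 'c' :: b -> (Char, b)      -- a specialization: Char for a
  ghci> swapEq (x, y) = (y, x) == (x, y)
  ghci> :t swapEq
  swapEq :: Eq a => (a, a) -> Bool  -- unification forced both components
                                    -- to one type and added an Eq constraint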
==============[ Implementation of Num ]======================

What makes + and - work for all numeric types? Let's look up where these
are defined:

  Prelude Data.Maybe Data.List> :i (+)
  class Num a where
    (+) :: a -> a -> a
    ...
        -- Defined in ‘GHC.Num’
  infixl 6 +

So https://hackage.haskell.org/package/base-4.6.0.1/docs/src/GHC-Num.html
defines class Num:

  class Num a where
      (+), (-), (*) :: a -> a -> a
      -- | Unary negation.
      ...

And here's where the special case of fixed-size Int is handled:

  instance Num Int where
      I# x + I# y = I# (x +# y)
      I# x - I# y = I# (x -# y)
      negate (I# x) = I# (negateInt# x)
      I# x * I# y = I# (x *# y)
      abs n = if n `geInt` 0 then n else negate n

      signum n | n `ltInt` 0 = negate 1
               | n `eqInt` 0 = 0
               | otherwise   = 1

      {-# INLINE fromInteger #-}  -- Just to be sure!
      fromInteger i = I# (integerToInt i)

The # at the end of a name is a convention, to signal that the type (or
function) is "primitive", built in. These identifiers are allowed by the
-XMagicHash option when running GHC. There is no meaning to them beyond
the convention that they refer to primitive types; no scope or semantics
are changed, just the permissions on name syntax. The pattern-matching,
however, is standard---values of type Int are constructed with the data
constructor I#, and only with it; when this pattern matches, a special
plus +# is then called. Same for -# and *#, and also for the comparisons.

The "bignum" Integer is here too, and (as we'd fully expect) it uses
different functions, defined elsewhere---locate them!

  instance Num Integer where
      (+) = plusInteger
      (-) = minusInteger
      (*) = timesInteger
      negate = negateInteger
      fromInteger x = x
      abs = absInteger
      signum = signumInteger

The declarations for Int and I#, D#, F#, W# are in GHC.Types:
https://downloads.haskell.org/~ghc/7.4.1/docs/html/libraries/ghc-prim-0.2.0.0/src/GHC-Types.html

  import GHC.Prim

  infixr 5 :

  data [] a = [] | a : [a]

  data Bool = False | True

  data Char = C# Char#
  data Int = I# Int#
  data Float = F# Float#
  data Double = D# Double#
  ...

Finally, GHC.Prim defines Int#, +# and suchlike:
https://downloads.haskell.org/~ghc/7.2.2/docs/html/libraries/ghc-prim-0.2.0.0/src/GHC-Prim.html

  data Int#

  (+#) :: Int# -> Int# -> Int#
  (+#) = let x = x in x

  (-#) :: Int# -> Int# -> Int#
  (-#) = let x = x in x

This is madness, with let x = x in x in *all* definitions---or is it? Not
really. These definitions exist just for typechecking: their bodies are
the "bottom" value, whose inferred type matches all others. Bottom is the
value of a recursive expression that, if evaluated, loops forever---no
constraints are placed on its type, so it is taken to match _every_ type
variable. This way, errors don't force us to create a separate type for
them, and things like "head []" that result in exceptions do not change
head's normal type signature, [a] -> a. The actual functions +#, -#, etc.
all have compiled bodies that the compiler pastes into the actual code.
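The same mechanism is open to our own types. Here is a minimal sketch (the
Cents type is made up for illustration) of writing a Num instance that
dispatches to Int's operations underneath, much as instance Num Int
dispatches to +#:

  newtype Cents = Cents Int deriving (Eq, Show)

  instance Num Cents where
    Cents x + Cents y = Cents (x + y)   -- the + on the right is Int's (+)
    Cents x - Cents y = Cents (x - y)
    Cents x * Cents y = Cents (x * y)
    negate (Cents x)  = Cents (negate x)
    abs (Cents x)     = Cents (abs x)
    signum (Cents x)  = Cents (signum x)
    fromInteger i     = Cents (fromInteger i)

  -- ghci> Cents 250 + 3    -- the literal 3 becomes fromInteger 3 :: Cents
  -- Cents 253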
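You can also check the "matches every type variable" claim yourself; a
small demonstration (the names loop, anInt, aString are ours):

  loop :: a              -- the most general signature there is...
  loop = let x = x in x  -- ...and this circular body typechecks at it

  anInt :: Int
  anInt = loop           -- typechecks; evaluating it would never terminate

  aString :: String
  aString = loop         -- also typechecks, at a completely different type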
The following are explanations from Stackoverflow:

-------------------- begin Stackoverflow paste ---------------------

http://stackoverflow.com/questions/21767336/questionable-code-in-ghc-prim

Incidentally, let x = x in x is equivalent to undefined, and is valid for
any data type; x isn't constrained by being defined in any way other than
self-referentially, so can be any type. These functions are "primitive"
meaning operations so basic they're not defined in Haskell code, and quite
possibly just translated directly to machine instructions.

undefined is defined in the prelude, which imports the primitives, not the
other way round.

Actually, undefined throws an error: undefined = error "Prelude.undefined"
and error s = throw (ErrorCall s), whereas let x = x in x is just a loop.

https://hackage.haskell.org/package/base-4.8.2.0/docs/src/GHC.Err.html:

  undefined :: a
  undefined = error "Prelude.undefined"

--------

http://stackoverflow.com/questions/15893524/what-is-the-meaning-of-let-x-x-in-x-and-data-float-in-ghc-prim-in-haskell

It's magic :) These are the "primitive operators and operations". They are
hardwired into the compiler, hence there are no data constructors for
primitives and all functions are bottom since they are necessarily not
expressible in pure Haskell. (Bottom represents a "hole" in a Haskell
program, an infinite loop or undefined are examples of bottom.)

To put it another way: these data declarations/functions are there to
provide access to the raw compiler internals. GHC.Prim exists to export
these primitives; it doesn't actually implement them or anything (e.g.,
its code isn't actually useful). All of that is done in the compiler. It's
meant for code that needs to be extremely optimized. If you think you
might need it, some useful reading about the primitives in GHC

--------

A brief expansion of jozefg's answer [above]: Primops are precisely those
operations that are supplied by the runtime because they can't be defined
within the language (or shouldn't be, for reasons of efficiency, say). The
true purpose of GHC.Prim is not to define anything, but merely to export
some operations so that Haddock can document their existence.

The construction let x = x in x is used at this point in GHC's codebase
because the value undefined has not yet been, um, "defined". (That waits
until the Prelude.) But notice that the circular let construction, just
like undefined, is both syntactically correct and can have any type. That
is, it's an infinite loop with the semantics of ⊥, just as undefined is.

... and an aside [read more at the link!]

-------------------- end Stackoverflow paste ---------------------

==============[ The great usefulness of Maybe ]==============

Reporting that a function or a pipeline of functions has _not_ been able
to come up with a successful result is crucial. Consider the case of the
IP geolocation service whose API failed to define the type of such a
result, and used "special" coordinates instead:
http://fusion.net/story/287592/internet-mapping-glitch-kansas-farm/
Misled by its API, its clients (including Internet giants and federal
agencies!) assumed those "special" coordinates were real, with
embarrassing results.

Haskell allows the programmer to "join" one or more special values to the
main result type of a function or expression (or even a pipeline of
functions), and to announce this in the type signature. Moreover, it
auto-generates the code that handles these values automatically, passing
them harmlessly through the pipeline, no matter at which stage the failure
occurred.

Compare this with how you'd define such functions in C or Ruby. In C, you
would likely use a tagged union of structs, with one common tag field in
each struct of the union; the tag would be an enum of constants, one for
each case of different contents. Whenever handling a value, you'd look up
this ptr->tag and interpret the rest of the struct differently based on
it, in a switch statement, case by case, recasting the ptr to the proper
type. In Ruby, you would explicitly check what .class returns, and act
accordingly. But in each case you would need to write all these checks
explicitly, in your mainline code.

See maybe-and-data-list-log.txt for the in-class example.
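Here is a minimal sketch of such a pipeline (the helper names safeDiv and
parseNat are made up for illustration). Each stage returns a Maybe; the
do-block desugars to (>>=), which is exactly the auto-generated plumbing
that passes a failure through, no matter which stage produced it:

  import Data.Char (isDigit)

  -- Failure is announced in the type, not by an in-band "special" value.
  safeDiv :: Int -> Int -> Maybe Int
  safeDiv _ 0 = Nothing
  safeDiv x y = Just (x `div` y)

  parseNat :: String -> Maybe Int
  parseNat s | not (null s) && all isDigit s = Just (read s)
             | otherwise                     = Nothing

  -- The pipeline itself contains no explicit error checks: a Nothing from
  -- any stage flows harmlessly through to the overall result.
  pipeline :: String -> String -> Maybe Int
  pipeline xs ys = do x <- parseNat xs
                      y <- parseNat ys
                      safeDiv x y

  -- ghci> pipeline "42" "6"     ==> Just 7
  -- ghci> pipeline "42" "0"     ==> Nothing  (the division failed)
  -- ghci> pipeline "42" "many"  ==> Nothing  (the parse failed)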