=====================[ Readings ]=====================

Please read Chapters 4--6 of "Real World Haskell".

-----------------[ Chapter highlights ]----------------

Chapter 4 introduces more elements of Haskell style. Do the examples to
continue getting used to the syntax; the chapter also gives some idioms for
interacting with files (without explaining them).

In Chapter 5,
http://book.realworldhaskell.org/read/writing-a-library-working-with-json-data.html
pay attention to how defining a new type that represents JSON at the same
time sets up the patterns for any functions working on it, by defining a
data constructor for every possible JSON value (JString, JNumber, JBool,
JNull, JObject, JArray). This is almost like the _grammar_ for JSON. (You
may want to know that this basic JSON grammar belongs to the context-free
class, which requires only a simple pushdown automaton to parse valid JSON
inputs and reject invalid ones. We believe that this simple syntax makes
JSON-based systems less prone to bugs than systems built on more complex
data formats.)

Note that the JObject constructor encloses "a list of pairs, each of which
contains a String and another JValue". This list can be empty, or as long
as you like (as far as Haskell knows, it can even be infinite!). So you
have a _model_ of your JSON data structures: how these objects are allowed
to nest and/or repeat. (A sketch of the type appears below, after the
Chapter 6 notes.)

Note the use of Maybe right from the start (e.g., in getString)! It is a
staple of Haskell code, for reasons we discussed in class today (see
below).

Also, the chapter shows you what goes into a module. This will be handy
when we read standard modules like Data.List, GHC.List, GHC.Types,
Data.Maybe, etc.

--------

Chapter 6, http://book.realworldhaskell.org/read/using-typeclasses.html,
explains how functions like "show" can be called for all types, and how
+, -, *, / can be called for all numeric types. Without the *typeclasses*
explained in Ch. 6, we'd need not just separate implementations of these
functions for different types---we have to write those anyway, since
printing a list is not like printing a number, and adding floating-point
numbers is not the same as adding 4-byte integers, nor like adding bignum
integers---but also a different _name_ for each type's version!

This might come as a surprise. Why can't we just define a single function
like "show" or "+", and then keep adding rules to it for each new type we
define, matching on each data constructor of each type to do the thing
appropriate for that type? After all, isn't this what pattern-matching is
for? It'd be like Ruby mix-ins, which can add any method to any class at
any time. But then, _what argument type would this function have?_
Inferring anything about its arguments would be a nightmare---with new
cases being added (to what type?), some things that type-checked before
would no longer type-check, and vice versa. If a more general pattern had
been added earlier, a more restrictive one added later would never be
reached when matching; and so on. Type inference on this function would be
useless, because who knows what operations its arguments would support,
and when.

Thus we define *typeclasses* up front, as collections of the
functions/operations they support, explicitly typed, and then allow
defining specialized *instances* of these functions for each type. In type
inference, we use the typeclasses as constraints on the context:
"Show a =>" means "a must support show, i.e., for a, show must be defined
as a valid function accepting a as an argument"; "Num a =>" means that a
must support +, -, etc.
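As promised above, here is (approximately) the JValue type from Chapter 5,
together with getString---a sketch, not the book's full module:

  -- One data constructor per kind of JSON value: the "grammar" of JSON.
  data JValue = JString String
              | JNumber Double
              | JBool Bool
              | JNull
              | JObject [(String, JValue)]  -- may be empty; nests arbitrarily
              | JArray [JValue]
                deriving (Eq, Ord, Show)

  -- Maybe right from the start: extraction fails cleanly when the
  -- JValue is not actually a string.
  getString :: JValue -> Maybe String
  getString (JString s) = Just s
  getString _           = Nothing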
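Our simple in-class example defined a "Point a" type and made it an
instance of the Show typeclass: see making-a-point-type.txt. A minimal
sketch along the same lines (the exact in-class definition may differ):

  data Point a = Point a a

  -- The instance says how to show a Point, provided its coordinates
  -- themselves support show.
  instance (Show a) => Show (Point a) where
    show (Point x y) = "(" ++ show x ++ ", " ++ show y ++ ")"

  -- ghci> Point 1 2
  -- (1, 2)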
==============[ Theory: type inference via Unification ]==============

The way that Haskell defines its types (i.e., both builds up its own
internal types like lists and the different integers, and allows the
programmer to construct new types with _data_, alias them with _type_, and
pattern-match on them) is not accidental. It is founded on a powerful
mathematical result that proves this type system optimal with respect to
the algorithm by which Haskell does its type inference. It is one of those
things about Haskell that were discovered rather than invented (quoting
Philip Wadler yet again).

Type inference in Haskell is done via the algorithm known as Unification.
This algorithm accumulates and applies type variable substitutions until
no more substitutions are possible for any expression in the program. Then
the inference is done and the algorithm stops, returning the inferred
types (i.e., constraints on all variables in the program). This algorithm
is what gets us the types on function definitions in Haskell---and the
type errors, when it fails to unify the types.

This algorithm is fairly simple, but on particular type systems---
including Haskell's---its output is *the most general type expression*
that satisfies the requirements the program's functions/operations place
on their arguments. It is called a "principal type". This means that *any*
type for which these requirements hold can be obtained from this output by
substituting some type variables with type expressions. (This is called
"specialization": for example, [Int] is a specialization of [a], obtained
by substituting Int for a; (a,a) is a specialization of (a,b); (Int,Int)
is a specialization of (a,a); and so on.)

Note that this type does not depend on the order in which the various
substitutions are done! This is an important mathematical property that
tells us the result is well-defined, i.e., all who apply the unification
process will get the same answer, no matter in what order they unify
expressions.

Haskell, OCaml, ML, and other similar languages use the Hindley-Milner
type system, which is proven to behave as above under unification. This is
the reason this type system is considered one of the most significant
achievements in PL theory, and this is why we use types like Maybe and
Either to express "sums" and "unions" of types ("this type's value is
either this kind of value or the special value Nothing"; "it is either
this or that kind of value") instead of trying some heuristics. Such
heuristics might work for some cases, but H.-M. and Unification provably
work for _all_ possible cases.

A nice overview of Unification and H.-M. can be found in
https://cs.brown.edu/courses/cs173/2002/Lectures/2002-11-13.pdf
Step through the algorithm to get a feel for it!
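To see principal types and their specializations concretely, you can ask
GHCi for the types it infers (an illustrative session; prompts and
type-variable names may differ in your GHC):

  ghci> pairUp x y = (x, y)
  ghci> :t pairUp
  pairUp :: a -> b -> (a, b)        -- the principal, most general type
  ghci> :t pairUp 'c'
  pairUp 'c' :: b -> (Char, b)      -- a specialization: Char for a
  ghci> swapEq (x, y) = (y, x) == (x, y)
  ghci> :t swapEq
  swapEq :: Eq a => (a, a) -> Bool  -- unification forced both components
                                    -- to one type and added an Eq constraint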
==============[ Implementation of Num ]======================

What makes + and - work for all numeric types? Let's look up where these
are defined:

  Prelude Data.Maybe Data.List> :i (+)
  class Num a where
    (+) :: a -> a -> a
    ...
        -- Defined in ‘GHC.Num’
  infixl 6 +

So https://hackage.haskell.org/package/base-4.6.0.1/docs/src/GHC-Num.html
defines class Num:

  class Num a where
      (+), (-), (*) :: a -> a -> a
      -- | Unary negation.
      ...

And here's where the special case of fixed-size Int is handled:

  instance Num Int where
      I# x + I# y = I# (x +# y)
      I# x - I# y = I# (x -# y)
      negate (I# x) = I# (negateInt# x)
      I# x * I# y = I# (x *# y)
      abs n = if n `geInt` 0 then n else negate n

      signum n | n `ltInt` 0 = negate 1
               | n `eqInt` 0 = 0
               | otherwise   = 1

      {-# INLINE fromInteger #-}  -- Just to be sure!
      fromInteger i = I# (integerToInt i)

The # at the end of a name is a convention, to signal that the type (or
function) is "primitive", built in. These identifiers are allowed by the
-XMagicHash option when running GHC. There is no meaning to them beyond
the convention that they refer to primitive types; no scope or semantics
are changed, just the permissions on name syntax. The pattern-matching,
however, is standard---values of type Int are constructed with the data
constructor I#, and only with it; when this pattern matches, a special
plus +# is then called. Same for -# and *#, and also for the comparisons.

The "bignum" Integer is here too, and (as we'd fully expect) it uses
different functions, defined elsewhere---locate them!

  instance Num Integer where
      (+) = plusInteger
      (-) = minusInteger
      (*) = timesInteger
      negate = negateInteger
      fromInteger x = x
      abs = absInteger
      signum = signumInteger

The declarations for Int and I#, D#, F#, W# are in GHC.Types:
https://downloads.haskell.org/~ghc/7.4.1/docs/html/libraries/ghc-prim-0.2.0.0/src/GHC-Types.html

  import GHC.Prim

  infixr 5 :

  data [] a = [] | a : [a]

  data Bool = False | True

  data Char = C# Char#
  data Int = I# Int#
  data Float = F# Float#
  data Double = D# Double#
  ...

Finally, GHC.Prim defines Int#, +# and suchlike:
https://downloads.haskell.org/~ghc/7.2.2/docs/html/libraries/ghc-prim-0.2.0.0/src/GHC-Prim.html

  data Int#

  (+#) :: Int# -> Int# -> Int#
  (+#) = let x = x in x

  (-#) :: Int# -> Int# -> Int#
  (-#) = let x = x in x

This is madness, with let x = x in x in *all* definitions---or is it? Not
really. These definitions exist just for typechecking: their bodies are
the "bottom" value, whose inferred type matches all others. Bottom is the
value of a recursive expression that, if evaluated, loops forever---no
constraints are placed on its type, so it is taken to match _every_ type
variable. This way, errors don't force us to create a separate type for
them, and things like "head []" that result in exceptions do not change
head's normal type signature, [a] -> a. The actual functions +#, -#, etc.
all have compiled bodies that the compiler pastes into the actual code.
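The same mechanism is open to our own types. Here is a minimal sketch (the
Cents type is made up for illustration) of writing a Num instance that
dispatches to Int's operations underneath, much as instance Num Int
dispatches to +#:

  newtype Cents = Cents Int deriving (Eq, Show)

  instance Num Cents where
    Cents x + Cents y = Cents (x + y)   -- the + on the right is Int's (+)
    Cents x - Cents y = Cents (x - y)
    Cents x * Cents y = Cents (x * y)
    negate (Cents x)  = Cents (negate x)
    abs (Cents x)     = Cents (abs x)
    signum (Cents x)  = Cents (signum x)
    fromInteger i     = Cents (fromInteger i)

  -- ghci> Cents 250 + 3    -- the literal 3 becomes fromInteger 3 :: Cents
  -- Cents 253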
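You can also check the "matches every type variable" claim yourself; a
small demonstration (the names loop, anInt, aString are ours):

  loop :: a              -- the most general signature there is...
  loop = let x = x in x  -- ...and this circular body typechecks at it

  anInt :: Int
  anInt = loop           -- typechecks; evaluating it would never terminate

  aString :: String
  aString = loop         -- also typechecks, at a completely different type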
The following are explanations from Stackoverflow:

-------------------- begin Stackoverflow paste ---------------------

http://stackoverflow.com/questions/21767336/questionable-code-in-ghc-prim

Incidentally, let x = x in x is equivalent to undefined, and is valid for
any data type; x isn't constrained by being defined in any way other than
self-referentially, so can be any type. These functions are "primitive"
meaning operations so basic they're not defined in Haskell code, and quite
possibly just translated directly to machine instructions.

undefined is defined in the prelude, which imports the primitives, not the
other way round.

Actually, undefined throws an error: undefined = error "Prelude.undefined"
and error s = throw (ErrorCall s), whereas let x = x in x is just a loop.

https://hackage.haskell.org/package/base-4.8.2.0/docs/src/GHC.Err.html:

  undefined :: a
  undefined = error "Prelude.undefined"

--------

http://stackoverflow.com/questions/15893524/what-is-the-meaning-of-let-x-x-in-x-and-data-float-in-ghc-prim-in-haskell

It's magic :) These are the "primitive operators and operations". They are
hardwired into the compiler, hence there are no data constructors for
primitives and all functions are bottom since they are necessarily not
expressible in pure Haskell. (Bottom represents a "hole" in a Haskell
program, an infinite loop or undefined are examples of bottom.)

To put it another way: these data declarations/functions are there to
provide access to the raw compiler internals. GHC.Prim exists to export
these primitives; it doesn't actually implement them or anything (e.g.,
its code isn't actually useful). All of that is done in the compiler. It's
meant for code that needs to be extremely optimized. If you think you
might need it, some useful reading about the primitives in GHC

--------

A brief expansion of jozefg's answer [above]: Primops are precisely those
operations that are supplied by the runtime because they can't be defined
within the language (or shouldn't be, for reasons of efficiency, say). The
true purpose of GHC.Prim is not to define anything, but merely to export
some operations so that Haddock can document their existence.

The construction let x = x in x is used at this point in GHC's codebase
because the value undefined has not yet been, um, "defined". (That waits
until the Prelude.) But notice that the circular let construction, just
like undefined, is both syntactically correct and can have any type. That
is, it's an infinite loop with the semantics of ⊥, just as undefined is.

... and an aside [read more at the link!]

-------------------- end Stackoverflow paste ---------------------

==============[ The great usefulness of Maybe ]==============

Reporting that a function or a pipeline of functions has _not_ been able
to come up with a successful result is crucial. Consider the case of the
IP geolocation service whose API failed to define the type of such a
result, and used "special" coordinates instead:
http://fusion.net/story/287592/internet-mapping-glitch-kansas-farm/
Misled by its API, its clients (including Internet giants and federal
agencies!) assumed those "special" coordinates were real, with
embarrassing results.

Haskell allows the programmer to "join" one or more special values to the
main result type of a function or expression (or even a pipeline of
functions), and to announce this in the type signature. Moreover, it
auto-generates the code that handles these values automatically, passing
them harmlessly through the pipeline, no matter at which stage the failure
occurred.

Compare this with how you'd define such functions in C or Ruby. In C, you
would likely use a tagged union of structs, with one common tag field in
each struct of the union; the tag would be an enum of constants, one for
each case of different contents. Whenever handling a value, you'd look up
this ptr->tag and interpret the rest of the struct differently based on
it, in a switch statement, case by case, recasting the ptr to the proper
type. In Ruby, you would explicitly check what .class returns, and act
accordingly. But in each case you would need to write all these checks
explicitly, in your mainline code.

See maybe-and-data-list-log.txt for the in-class example.
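Here is a minimal sketch of such a pipeline (the helper names safeDiv and
parseNat are made up for illustration). Each stage returns a Maybe; the
do-block desugars to (>>=), which is exactly the auto-generated plumbing
that passes a failure through, no matter which stage produced it:

  import Data.Char (isDigit)

  -- Failure is announced in the type, not by an in-band "special" value.
  safeDiv :: Int -> Int -> Maybe Int
  safeDiv _ 0 = Nothing
  safeDiv x y = Just (x `div` y)

  parseNat :: String -> Maybe Int
  parseNat s | not (null s) && all isDigit s = Just (read s)
             | otherwise                     = Nothing

  -- The pipeline itself contains no explicit error checks: a Nothing from
  -- any stage flows harmlessly through to the overall result.
  pipeline :: String -> String -> Maybe Int
  pipeline xs ys = do x <- parseNat xs
                      y <- parseNat ys
                      safeDiv x y

  -- ghci> pipeline "42" "6"     ==> Just 7
  -- ghci> pipeline "42" "0"     ==> Nothing  (the division failed)
  -- ghci> pipeline "42" "many"  ==> Nothing  (the parse failed)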