========================[ Readings ]========================

Read Chapter 3 of Ruby Under the Microscope. Pay special attention to the
cut-ins that discuss the source code of the Ruby implementation. Follow the
book's discussion with the code open in your favorite editor.

Your objective is to see how Ruby code gets compiled to bytecode from
s-expressions and how the bytecode gets evaluated by the virtual machine,
and to notice the patterns of recursion, handling of different scopes, and
creating closures of functions and blocks over lexical scopes.

It also helps to note the tricks with which master programmers like
Yukihiro Matsumoto organize their C code. You will notice plenty of macros
that treat C pointers as objects or symbols, creating a de-facto type
system and a macro language not unlike LISP. In effect, a project such as
Ruby creates a new language on top of C---deliberately or unwittingly. In
the latter case, as the joke goes,

  Any sufficiently complicated C or Fortran program contains an ad hoc,
  informally-specified, bug-ridden, slow implementation of half of Common
  Lisp.
      -- https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule

In fact, many PL researchers chose to simply implement their ideas in
Scheme :)

Compare the style of the Ruby implementation with that of the Emacs Elisp
virtual machine:
https://github.com/emacs-mirror/emacs/blob/master/src/bytecode.c

Read ahead: Chapters 4 and 5.

-------[ Getting Ruby sources & setting up browsing ]-------

Get the sources from https://github.com/ruby/ruby .

I use the ctags and etags programs to browse code in Vim and Emacs
respectively. Both editors let you jump around the tags and return to the
places in the code you detoured from to look up a definition of a macro or
a function. "ctags" is a separate package under MacPorts; etags is a part
of the "emacs" package. The "man ctags" page's "HOW TO" section gives a
brief intro for both editors.

I use Vim and "view" a lot, usually as follows: "alias v=view +'syn on'",
then "v -t function_var_or_macro_name" from the command line to find a tag.
From within Vim/view, pressing Ctrl+] on a name and Ctrl+t to get back work
best for me, but there are many others:
http://vim.wikia.com/wiki/Browsing_programs_with_tags

With Emacs, I use M-. and M-* , occasionally C-u M-. (but I have to look it
up as often as not). If you wonder why "go see a tag" is M-. and "go back"
is M-*, so do I :) At one point it occurred to me that a useful mnemonic
might be to think of an archery arrow facing away from you, with . for the
tip and * for the fletching. It works for me :)

To index specific files, do "ctags *.h *.c" or some such. To index an
entire project directory recursively, do "ctags -R ." in it. Use etags for
Emacs :)

Another good command-line tool is cscope. If you prefer using your own web
browser, then OpenGrok + a local webserver are a good bet. If you find a
free GUI that's nice and not heavy like Eclipse and does nice tab
completion (unlike Visual Studio), let me know :)

----------[ Getting the .inc files generated ]-------

The actual code for VM bytecode execution is only generated when you "make"
ruby. In Ruby's source directory, see README.md; I had to do "autoconf",
then "./configure", then "make". The .inc files such as vm.inc get made
(with Ruby! I had to switch my default old ruby 1.8 to a newer one for it
to succeed) among the first. You need not wait till the end of the
compilation if you only want to browse the code, not build and install
ruby2.4.
=================[ Why not just eval the AST? ]=================

In principle, an interpreted language could work straight off of the parsed
AST, evaluating it recursively from the top: call the evaluator function on
the top node. If it represents a value, return it. If it represents some
operation on the child nodes, call the evaluator recursively to evaluate
the child nodes and replace them with the resulting values, then perform
the operation.

Ruby 1.8 did just that, and so did early LISPs. This approach works, and
the evaluator functions for it come out naturally recursive & nice;
however, such recursion can be quite heavy, because it essentially
traverses the AST both from the top down and then back upward. Thus, faster
representations of code were eventually devised.

Still, recursive evaluation of the AST is a fundamental programming pattern
that you must master---because the very same pattern is now used by
compilers, as Chapter 3 of the Ruby book explains. It's easier to learn it
in LISP, though (also, see the above point about Scheme long being the
language of choice for PL research).

Suggestion: Write such a recursive evaluator for arithmetic expressions
given as S-expressions, in LISP. For example:

  (arithm-eval '(+ 2 (* 2 3)))  => 8
  (arithm-eval 1)               => 1
  (arithm-eval '(- 5 3))        => 2
  etc.

It should be a very simple function with a few cases for the actual
arithmetic operations, and writing such a function should come easily to
you. Remember, trees and recursion are made for each other: you almost
never need to write a function that does something for the whole tree; you
only need to handle a node and call the function on its immediate children
if any---and then take some action on the return values if needed.

Suggestion: write a (recursive) function that takes s-expressions ("sexps",
for short) of arithmetic expressions as above, and prints the corresponding
arithmetic expression in the ordinary notation. For example:

  (arithm-print '(+ 2 (* 2 3)))  => 2 + 2 * 3
  (arithm-print 1)               => 1
  (arithm-print '(- 5 3))        => 5 - 3
  etc.

Note that this function will be somewhat similar to _flatten_ from OnLisp's
Fig. 4.3 if done efficiently---except flatten gets the symbols in the wrong
order.

Note that an "arithm-read" that would take strings of ordinary infix-syntax
arithmetic operations and make a sexp out of them _with the proper order of
operation precedence_ is a more complex problem than either of the above.
Bison and similar tools solve it with variations of what's known as the
shift-reduce algorithm:
http://blog.reverberate.org/2013/07/ll-and-lr-parsing-demystified.html
(or https://en.wikipedia.org/wiki/Shift-reduce_parser)

Personally, I prefer the Reverse Polish notation; it's a bytecode ready for
execution by a stack machine ( "3 2 * 2 +" , read and execute left to
right; you can think of execution as pushing each number onto a stack, and,
whenever you encounter an operation, consuming the two top tokens from the
stack and replacing them with the op's result). My first programmable
computer was a clone of the Texas Instruments microcalculator that did just
that; the top of the stack was what showed on the calculator's numeric
display. It was glorious :)

Suggestion: write a (recursive) evaluator for a list of tokens representing
a Reverse Polish notation arithmetic expression, e.g.:

  (rev-polish-eval '(3 2 * 2 +))  => 8

The "accumulator" argument in this case will hold the current contents of
the stack.
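For reference, here is one minimal Common Lisp sketch of what these three
functions could look like---one possible shape, not the only one; the
arithm-print sketch handles binary operators only and ignores precedence
and parentheses, matching the example output above:

;; A number evaluates to itself; otherwise, recursively evaluate the
;; children, then apply the operator symbol to the resulting values.
(defun arithm-eval (expr)
  (if (atom expr)
      expr
      (apply (car expr) (mapcar #'arithm-eval (cdr expr)))))

;; Print an arithmetic sexp in ordinary infix notation.
(defun arithm-print (expr)
  (if (atom expr)
      (format t "~a" expr)
      (progn (arithm-print (cadr expr))
             (format t " ~a " (car expr))
             (arithm-print (caddr expr)))))

;; Evaluate a list of Reverse Polish tokens; the optional STACK argument
;; is the accumulator holding the stack machine's stack.
(defun rev-polish-eval (tokens &optional stack)
  (cond ((null tokens) (car stack))
        ((numberp (car tokens))              ; a number: push it
         (rev-polish-eval (cdr tokens) (cons (car tokens) stack)))
        (t                                   ; an op: pop two, push the result
         (rev-polish-eval (cdr tokens)
                          (cons (funcall (car tokens) (cadr stack) (car stack))
                                (cddr stack))))))

CL-USER> (arithm-eval '(+ 2 (* 2 3)))     ; => 8
CL-USER> (rev-polish-eval '(3 2 * 2 +))   ; => 8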
=================[ So why bytecode and JITs? ]=================

Compare this with a CPU's linear fetch-decode-execute traversal of its
instruction stream. If any part of the code executes in a tight
loop---which is not infrequent in real programs---the up-and-down cost of
evaluating the AST is paid on every iteration. For the economics facet of
CS, this is too much---and this part happens to pay many PL researchers'
salaries.

Thus the solution is to push as much of the work needed to "interpret" the
program---i.e., to translate it to faster, simpler operations---up front,
and then, in the tight loops, to only run the results of this translation.
It turns out that the gains of translation can be huge (and this is,
perhaps, one of the most important lessons of PL practice to date).

This is why the recursive patterns now mostly apply to _compilation_ of
ASTs rather than their direct execution. The ASTs are compiled into
simpler-to-execute stack machine bytecode, such as in Elisp, Ruby's YARV,
Python's, etc. The compilation recursively visits all of the AST's nodes,
as described in the Ch. 3 treatment of compile.c and iseq_compile_each() on
p. 42. This is what the book means when it speaks of the tight relationship
between the compiler and the VM. (A toy version of this AST-to-bytecode
step, using the arithmetic sexps from the previous section, is sketched
below.)

The only hard limits here are the CPU's speed at its own elementary
instructions, and---most importantly---the limits of knowledge about values
that we have at translation time vs run time. For example, when we make
functions on the fly, like the lambdas returned by _make-adder_ and
_make-multiplier_, we *cannot* know the state (e.g., the values of
variables) that these functions would close over---that state will only be
prepared/computed at runtime.

Thus, a major challenge of PL has been to push as much program
interpretation work as possible forward of the tight loops, to "front-load"
it up to the limit of the knowledge that we cannot have until actual
runtime. We compile as much of a program as possible to either bytecode or
native code, and parametrize this code with what we don't know. This
parametrization can take the shape of referencing these unknown-till-runtime
values in some table, stack frame, or environment struct, where they reside
at known positions or offsets; it can also take the shape of
references-by-name in a hash table (such as with LISP's global symbol
tables for dynamic scoping). Whatever it is, its point is to separate what
we certainly know at compile time---plus whatever clues the programmer
provides---from those things we cannot possibly know.

The bytecodes of VMs such as Elisp's, Perl's, Python's, Ruby's, and Java's
various implementations, from the stack-based JVM to Android's
register-based Dalvik and the newer ART, are really bets on this
compile-vs-runtime separation and parametrization, taken on their
designers' intuition.

Other modern environments such as Node.js actually de-emphasize the
language clues in favor of just-in-time (JIT) compilation of whatever code
seems to be in most need of it---e.g., they compile the code or bytecode
snippets that appear to be executed repeatedly into native binary code such
as x86. With enough repetition, the cost of this compilation is amortized;
also, the actual execution environment may feature more constants than
variables, and may also exclude some execution paths, which makes for
simpler native assembly. This approach bets on the runtime being able to
better guess where a program's bottlenecks are than the programmer; we'll
see how this bet pans out.
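To make the compile-then-execute split concrete with the toy arithmetic
sexps from the previous section, here is a minimal sketch (the name
arithm-compile is mine, for illustration) of a recursive "compiler" that
flattens the AST into Reverse Polish tokens, which the rev-polish-eval
stack machine sketched above can then run as many times as needed without
revisiting the tree:

;; Emit code for the operands first (recursively), then the operator---
;; the postorder walk that naturally produces stack-machine code.
(defun arithm-compile (expr)
  (if (atom expr)
      (list expr)                                   ; a literal: push it
      (append (mapcan #'arithm-compile (cdr expr))  ; code for the children
              (list (car expr)))))                  ; then the operator

CL-USER> (arithm-compile '(+ 2 (* 2 3)))
(2 2 3 * +)
CL-USER> (rev-polish-eval (arithm-compile '(+ 2 (* 2 3))))
8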
These JITs are getting very fast; the problem is debugging them, or your
code, when something goes wrong. In the meantime, understanding the
language's internal VM bytecode remains important---and this is what we are
trying for with LISP and Ruby.

=================[ Tail-call optimization and Tail Recursion ]=================

Recursion is great to write, as it follows the natural structure of the
data, such as lists or trees, and thus minimizes the amount of code and
helps eliminate bugs in "utility" code like for-loop counter handling by
eliminating the need for that code---but it tends to cost.

There is one form of recursion, though, where its primary cost---creating
stack frames and traversing them backwards---can be optimized away. It is
called tail recursion. In tail recursion, the last operation of a recursive
function is the recursive call to itself, without any operations left to
perform or any values left to look up for these operations in the current
stack frame. In this case, climbing back up the stack has no point, and the
recursion ends with the last recursive call. This means that the stack need
not keep any local information in the function's frames, nor does it need
to keep the return addresses except the very first one from where the
function was called. This, in turn, means that the function's call frames
need not be stacked, but can instead simply reuse the same frame, in which
the return address is saved from the first call and copied onward. A
tail-recursive function, so optimized, can never run out of stack space.

This is the nature of Tail Call Optimization (TCO). Where a non-TCO program
runs out of stack, a TCO-optimized program will succeed, with the help of
the TCO-performing compiler or interpreter.

---------------[ A tail-recursive factorial ]---------------

Our example in class was the factorial in Steel Bank Common Lisp (SBCL).
Consider

CL-USER> (defun fact (x)
           (cond ((< x 2) 1)
                 (t (* x (fact (- x 1))))))

This function fails for a sufficiently large argument, as it is not
tail-recursive. After its call to self returns, there's still work to be
done: multiplying by the saved value of x. This needs both space to store
x, and space to store the position in the pending computation
(multiplication by x) to be returned to.

CL-USER> (fact 100000)

Control stack guard page temporarily disabled: proceed with caution
; Evaluation aborted on #.

It's actually quite amazing that it succeeds for (fact 10000) !

But the trick of using the accumulator (see OnLisp Fig. 4.3) allows us to
rewrite this function tail-recursively, by adding one argument, the
accumulator, which saves all the state that we want to pass forward. Then
backing up through the frames becomes unnecessary, and the SBCL compiler
can apply its TCO:

CL-USER> (defun fact (x res)
           (cond ((< x 2) res)
                 (t (fact (- x 1) (* res x)))))
WARNING: redefining COMMON-LISP-USER::FACT in DEFUN

Then

CL-USER> (fact 100000 1)

succeeds (but slowed down my Emacs to a crawl---indeed, the resulting
factorial, written as a decimal number, is over 450K digits long, all in
one line!)

Scheme mandates that all compliant implementations MUST be properly
tail-recursive. Common Lisps differ in this: read
http://0branch.com/notes/tco-cl.html for SBCL and GCL in particular. Python
repudiates TCO in principle, on the grounds that it complicates debugging
by collapsing the stack traces (of course, TCO is exactly about eliminating
unneeded stack frames!). Ruby takes a middle-ground position: it doesn't
mandate TCO, but allows and optionally implements it.
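As a small aside (my sketch, not from the book): the extra accumulator
argument can be hidden behind an &optional default of 1, so callers can
still write (fact 100000) with a single argument; the recursive call
remains in tail position, so the same optimization should still apply. This
mirrors the Ruby definition with a default argument in the next section.

CL-USER> (defun fact (x &optional (res 1))   ; res is the accumulator
           (cond ((< x 2) res)
                 (t (fact (- x 1) (* res x)))))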
---------------[ TCO in Ruby, by request to internals ]---------------

By default, Ruby has no TCO (but Ruby 2.3 has an impressively deep stack,
which suffices at least up to fact(10000)).

$ cat fact.rb
def fact n, acc = 1
  if n.zero?
    acc
  else
    fact n - 1, acc * n
  end
end

puts fact 1000

$ ruby2.0 fact.rb
40238726007709377354370243392300398571937486421071463254379991042993851239862902059204420848696940480047998861019719605863166687299480855890132382966994459099742450408707375991882362772718873251977950595099527612087497546249704360141827809464649629105639388743472000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

$ cat fact.rb
def fact n, acc = 1
  if n.zero?
    acc
  else
    fact n - 1, acc * n
  end
end

puts fact 10000

$ ruby2.0 fact.rb
fact.rb:2: stack level too deep (SystemStackError)
$ ruby2.3 fact.rb

However, when we go to 1000000, even that fails:

$ ruby2.3 fact.rb
fact.rb:5:in `fact': stack level too deep (SystemStackError)
        from fact.rb:5:in `fact'
        from fact.rb:5:in `fact'
        from fact.rb:5:in `fact'
        from fact.rb:5:in `fact'
        from fact.rb:5:in `fact'
        from fact.rb:5:in `fact'
        from fact.rb:5:in `fact'
        from fact.rb:5:in `fact'
         ... 10067 levels...
        from fact.rb:5:in `fact'
        from fact.rb:5:in `fact'
        from fact.rb:5:in `fact'
        from fact.rb:9:in `<main>'
But not if we ask Ruby to do TCO explicitly!

$ cat tco.rb
# http://stackoverflow.com/questions/824562/does-ruby-perform-tail-call-optimization
# See also http://nithinbekal.com/posts/ruby-tco/
# and http://blog.tdg5.com/tail-call-optimization-in-ruby-deep-dive/

source = <<-SOURCE
def fact n, acc = 1
  if n.zero?
    acc
  else
    fact n - 1, acc * n
  end
end
fact 100000
SOURCE

# This should print a very large number. If you don't set the TCO option
# to true, you should get the "stack too deep" error.
i_seq = RubyVM::InstructionSequence.new source, nil, nil, nil,
          tailcall_optimization: true, trace_instruction: false
#         ^^^^^^^^^^^^^^^^^^^^^^^^^^
# Indeed, prints "#" for 2.0--2.2
# For 2.3, prints the number for 10,000 anyway, but for 100000
# behaves as above

#puts i_seq.disasm

begin
  value = i_seq.eval
  p value
rescue SystemStackError => e
  p e
end

$ ruby2.3 tco.rb
$ ruby2.3 tco.rb | wc
       1       1  456575

The same works for 2.0:

$ ruby2.0 tco.rb | wc
       1       1  456575

Read the above links for the technical details of the TCO implementation;
note the differences between SBCL and GCL in the Common Lisp analysis of
TCO.