===========[ Tail-call Optimization and Tail Recursion ]=========== Recursion is great to write, as it follows the natural structure of the data such such lists or trees, and thus minimizes the amount of code and helps eliminate bugs in "utility" code like for-loop counter handling by eliminating the need for that code---but it tends to cost. There is one form of recursion, though, where its primary cost---creating stack frames and traversing them backwards---can be optimized away. It is called tail-call recursion. In tail-call recursion, the last operation of a recursive function is the recursive call to itself, without any operations left to perform or any values left to look up for these operations in the current stack frame. In this case, climbing back up the stack has no point, and recursion ends with the last recursive call. This means that the stack need not keep any local information in the function's frames, nor does it need to keep the return addresses except the very first one from where the function was called. This, in turn, means, that the function's call frames need not be stacked, but can instead simply reuse the same frame, in which the return address is saved from the first call and copied onward. A properly optimized tail-recursive function would never run out of stack space. This is the nature of the Tail Call Optimization (TCO). Where a non-TCO program runs out of stack, a TCO-optimized program will succeed, with the help of the TCO-performing compiler or interpreter. ---------------[ A tail-recursive factorial ]--------------- Our example in class was the factorial in Steel Bank Common Lisp (SBCL). Consider CL-USER> (defun fact (x) (cond ((< x 2) 1) (t (* x (fact (- x 1)))))) This function fails for a sufficiently large argument, as it is not tail-recursive. After its call to self returns, there's still work to be done: multiplying by the saved value of x. This needs both space to store x, and the space to store the position into the pending computation (multiplication by x) to be returned to. CL-USER> (fact 100000) Control stack guard page temporarily disabled: proceed with caution ; Evaluation aborted on #. It's actually quite amazing that it succeeds for (fact 10000) ! But, the trick of using the accumulator (see OnLisp Fig. 4.3) allows us to rewrite this function recursively, by adding one argument, the accumulator, which saves all the state that we want to pass forward. Then backing up through the frames becomes unnecessary, and the SBCL compiler can use its TCO optimization: CL-USER> (defun fact (x res) (cond ((< x 2) res) (t (fact (- x 1) (* res x))))) WARNING: redefining COMMON-LISP-USER::FACT in DEFUN Then CL-USER> (fact 100000 1) succeeds (but slowed down my Emacs to a crawl---indeed, the resulting factorial when written as a decimal number is over 480K long, all in one line!) Scheme mandates that all compliant implementations of it MUST be tail-recursive. Common Lisps differ in this: read http://0branch.com/notes/tco-cl.html for SBCL and GCL in particular. Python repudiates TCO in principle, as complicating debugging by collapsing the stack traces (of course, TCO is exactly about eliminating unneeded stack frames!) Ruby takes a middle-ground position: it doesn't mandate TCO, but allows and optionally implements it. ---------------[ TCO in Ruby, by request to internals ]--------------- By default, Ruby has no TCO (but Ruby2.3 has an impressively deep stack, which suffices at least till 10000) $ cat fact.rb def fact n, acc = 1 if n.zero? acc else fact n - 1, acc * n end end puts fact 1000 $ ruby2.0 fact.rb 40238726007709377354370243392300398571937486421071463254379991042993851239862902059204420848696940480047998861019719605863166687299480855890132382966994459099742450408707375991882362772718873251977950595099527612087497546249704360141827809464649629105639388743472000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 $ cat fact.rb def fact n, acc = 1 if n.zero? acc else fact n - 1, acc * n end end puts fact 10000 $ ruby2.0 fact.rb fact.rb:2: stack level too deep (SystemStackError) $ ruby2.3 fact.rb However, when we go to 1000000, even that fails: $ ruby2.3 fact.rb fact.rb:5:in `fact': stack level too deep (SystemStackError) from fact.rb:5:in `fact' from fact.rb:5:in `fact' from fact.rb:5:in `fact' from fact.rb:5:in `fact' from fact.rb:5:in `fact' from fact.rb:5:in `fact' from fact.rb:5:in `fact' from fact.rb:5:in `fact' ... 10067 levels... from fact.rb:5:in `fact' from fact.rb:5:in `fact' from fact.rb:5:in `fact' from fact.rb:9:in `
' But not if we ask Ruby to do TCO explicitly! $ cat tco.rb # http://stackoverflow.com/questions/824562/does-ruby-perform-tail-call-optimization # See also http://nithinbekal.com/posts/ruby-tco/ # and http://blog.tdg5.com/tail-call-optimization-in-ruby-deep-dive/ source = <<-SOURCE def fact n, acc = 1 if n.zero? acc else fact n - 1, acc * n end end fact 100000 SOURCE # This should print a very large number. If you don't set TCO option # to true, you should get the "stack too deep" error. i_seq = RubyVM::InstructionSequence.new source, nil, nil, nil, tailcall_optimization: true, trace_instruction: false # ^^^^^^^^^^^^^^^^^^^^^^^^^^ # Indeed, prints "#" for 2.0--2.2 # For 2.3, prints the number for 10,000 anyway, but for 100000 # behaves as above #puts i_seq.disasm begin value = i_seq.eval p value rescue SystemStackError => e p e end $ ruby2.3 tco.rb $ ruby2.3 tco.rb | wc 1 1 456575 The same works for 2.0: $ ruby2.0 tco.rb | wc 1 1 456575 Read the above links for the technical details of TCO implementation; note the differences between SBCL and GCL in the Common Lisp analysis of TCO.