Notating Sort-Merge

It is easy to "speak" the sort-merge algorithm. Given a sequence, break it into two parts, sort each of them, and merge the results. Here, as best as can be rendered in HTML, is a subscript-style merge algorithm from a popular algorithms book.

MERGE(A, p, q, r)

  1. n1 = q - p + 1
  2. n2 = r - q
  3. create arrays L[1..n1+1] and R[1..n2+1]
  4. for i = 1 to n1
  5.     do L[i] = A[p+i-1]
  6. for j = 1 to n2
  7.     do R[j] = A[q+j]
  8. L[n1+1] = infinity
  9. R[n2+1] = infinity
  10. i = 1
  11. j = 1
  12. for k = p to r
  13.     do if L[i] <= R[j]
  14.         then A[k] = L[i]
  15.              i = i + 1
  16.         else A[k] = R[j]
  17.              j = j + 1

Here is a list-style sort-merge as "spoken" above in an ad hoc notation:

sort(a) =
  if length(a) <= 1
  then a
  else merge(sort(left(a)), sort(right(a)))

merge(b, c) =
  if length(b+c) <= 1
  then b+c
  elseif head(b+infinity) <= head(c+infinity)
  then head(b)+merge(tail(b), c)
  else merge(c, b)

All of the second presentation depends on you "understanding" some notation. First off, you need to know that "+" means list concatenate and that the right has to know what the left did.

left(a)+right(a) = a

Secondly you must think about the partially defined nondeterministic choice functions left and right. One does not want either to produce the empty string (why?). In fact, the subscript-style merge algorithm insisted on them having the same length (within one). What if they don't?

So, the question is left to you. Which notation is better? You might consider that the first is more precise and leads directly to a C implementation. But then do you understand the algorithm better from the "speaking" of it, the "blizzard" of subscripts, or the freewheeling list-oriented presentation? Or, do you want all three?

One thing we often value in a notation is what it prevents us from writing. Notationally, the subscript-style merge can place any value anywhere. The list-style merge only examines the head of the list. Thus there are lots of bad things that cannot happen, and therefore do not need to be understood.

Would you prefer an iterative merge instead of the recursive one? It takes a little more notational machinery.

merge(b, c) =
  d=empty;
  while length(b+c) >= 1
    if head(b+infinity) <= head(c+infinity)
    then
      d = d+head(b);
      b = tail(b);
    else
      d = d+head(c);
      c = tail(c);
  return d;

Conclusion

There may be a scientifically valid answer to the questions I raise here. I do not know of one. I find myself expressing preferences that are clearly related to my education and interests. Winning notations somehow generalize the personal -- they manage to occupy a Darwinian niche in the clashes of individuals and nationalities and disciplines. They are forever insecure, replacable by something luckier or perhaps better adapted to a new environment (such as HTML or UNICODE or PDF). Such is life.


Created: Tuesday, April 16, 2002
Last modified: May 22, 2002
email: McKeeman{at}Mathworks{dot}COM