CS118 Programming Languages

A Simple Grammar Notation

Grammars are a major topic in this course. Here we are concerned only with a clean notation for writing down concrete grammars. At its simplest, a grammar is a set of substitution rules. Each rule has a left-hand side (naming the symbol being defined) and a right-hand side (giving the definition).

Grammars for binary and hex integers, Roman numerals and decimal integers serve as examples. Any contiguous sequence of non-blank characters is a symbol. A symbol that appears on the left margin is being defined (for instance, in the binary integer grammar referenced above, the symbol binary-integer is defined). Any symbol that is not defined anywhere in the grammar stands for itself (a literal). The definitions, represented as sequences of symbols separated by whitespace, appear immediately below the defined symbol and are indented. An empty definition is allowed and is represented by a blank line. Empty lines at the end of a sequence of definitions are ignored.

Referring again to the grammar for binary integers, there are four definitions. The first two give the simplest cases. The other two extend the simple cases, one binary digit at a time.
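The layout rules above are mechanical enough to sketch as a small reader. The following is one possible reading, not a reference implementation: the function name parse_grammar and the dictionary representation are assumptions made for illustration. It treats a blank line as an empty definition only when another indented definition follows it, which is one way to reconcile the two rules about blank lines.

```python
def parse_grammar(text):
    """Read the indented notation into {symbol: [definition, ...]}.

    Each definition is a list of symbols; [] is an empty definition.
    """
    grammar = {}
    current = None       # symbol currently being defined
    pending_blanks = 0   # blank lines seen but not yet classified

    for line in text.splitlines():
        if not line.strip():
            pending_blanks += 1
            continue
        if line[0].isspace():
            # Indented line: one definition of the current symbol.
            if current is None:
                raise ValueError("definition appears before any defined symbol")
            # Blank lines followed by another definition are empty definitions.
            grammar[current].extend([] for _ in range(pending_blanks))
            grammar[current].append(line.split())
        else:
            # Left-margin line: a new defined symbol begins here.
            # Blank lines before it were trailing ones and are ignored.
            current = line.split()[0]
            grammar.setdefault(current, [])
        pending_blanks = 0
    return grammar


SAMPLE = """\
binary-integer
   binary-digit
   binary-integer binary-digit

binary-digit
   0
   1
"""

if __name__ == "__main__":
    for symbol, definitions in parse_grammar(SAMPLE).items():
        print(symbol, definitions)
```

Every symbol that never becomes a key of the resulting dictionary is a literal.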

Note the arbitrary difference in style between the grammars for binary and hex integers. For binary integers, one could instead have given:

binary-integer
   binary-digit
   binary-integer binary-digit

binary-digit
   0
   1

One grammar is slightly shorter than the other -- there is no other significant difference.
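The claim that the two styles differ only in length can be spot-checked by enumerating the short strings each grammar produces. The sketch below is a bounded check, not a proof. The FLAT grammar is an assumption: it is reconstructed from the description of the four-definition version, which is not reproduced in these notes. The function name language_up_to is likewise hypothetical.

```python
from collections import deque

# The two-level grammar with a separate binary-digit symbol.
TWO_LEVEL = {
    "binary-integer": [["binary-digit"], ["binary-integer", "binary-digit"]],
    "binary-digit": [["0"], ["1"]],
}

# Assumed form of the four-definition grammar (a reconstruction).
FLAT = {
    "binary-integer": [["0"], ["1"],
                       ["binary-integer", "0"], ["binary-integer", "1"]],
}


def language_up_to(grammar, start, max_len):
    """Return every literal string of at most max_len symbols derivable from start.

    Assumes the grammar has no empty definitions, so a sequence that is
    already longer than max_len can never shrink and is safely dropped.
    """
    results = set()
    seen = {(start,)}
    queue = deque(seen)
    while queue:
        form = queue.popleft()
        # Locate the leftmost defined symbol, if there is one.
        index = next((i for i, s in enumerate(form) if s in grammar), None)
        if index is None:
            results.add("".join(form))   # only literals remain
            continue
        for definition in grammar[form[index]]:
            new_form = form[:index] + tuple(definition) + form[index + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return results


if __name__ == "__main__":
    a = language_up_to(TWO_LEVEL, "binary-integer", 4)
    b = language_up_to(FLAT, "binary-integer", 4)
    print(a == b, sorted(a, key=lambda s: (len(s), s)))
```

Substituting only the leftmost defined symbol is enough here, because the order of substitutions does not change which literal strings can be reached.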

Operationally, a grammar starts with some selected symbol. If there are one or more definitions for that symbol, any one of them may be substituted for the selected symbol, possibly increasing the number of defined symbols in the sequence. Given the resulting sequence, one may repeat such substitutions, in any order, for as long as the sequence contains a symbol with a definition. In the case of binary integers, one can easily see that

  1. The substitutions may continue indefinitely.
  2. Whenever there are no more substitutions to make (only literals remain), the result is, in fact, a binary integer.
  3. There is a sequence of substitutions leading to every finite binary integer.
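The three observations can be tried out with a short simulation. This is a sketch under assumptions: it uses the two-level binary-integer grammar, and the names derive and derivation_for, along with the max_steps cutoff, are introduced here for illustration only. The cutoff stands in for observation 1, since a run that keeps choosing the recursive definition never finishes on its own.

```python
import random

GRAMMAR = {
    "binary-integer": [["binary-digit"], ["binary-integer", "binary-digit"]],
    "binary-digit": [["0"], ["1"]],
}


def derive(grammar, start, rng, max_steps=50):
    """Substitute randomly chosen definitions for randomly chosen defined symbols.

    Returns the final sequence of literals, or None if max_steps
    substitutions were made and defined symbols still remain.
    """
    form = [start]
    steps = 0
    while True:
        positions = [i for i, s in enumerate(form) if s in grammar]
        if not positions:
            return form          # only literals remain
        if steps == max_steps:
            return None          # gave up: the run could go on indefinitely
        i = rng.choice(positions)
        form[i:i + 1] = rng.choice(grammar[form[i]])
        steps += 1


def derivation_for(bits):
    """List the sequences in one derivation of the given binary integer."""
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("expected a non-empty string of 0s and 1s")
    form = ["binary-integer"]
    steps = [list(form)]
    # Grow the sequence to the right length with the recursive definition.
    for _ in range(len(bits) - 1):
        form[0:1] = ["binary-integer", "binary-digit"]
        steps.append(list(form))
    form[0:1] = ["binary-digit"]
    steps.append(list(form))
    # Replace each binary-digit with the wanted literal, left to right.
    for i, bit in enumerate(bits):
        form[i] = bit
        steps.append(list(form))
    return steps


if __name__ == "__main__":
    print(derive(GRAMMAR, "binary-integer", random.Random()))
    for step in derivation_for("101"):
        print(" ".join(step))
```

The function derive illustrates observation 2: whenever it returns a sequence, that sequence consists only of the literals 0 and 1. The function derivation_for illustrates observation 3 by constructing a derivation for any requested binary integer.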

The defined symbols are called non-terminal symbols; the literals, which have no definitions, are called terminal symbols.

Grammars have many uses. Most computer languages have grammars. For C, the grammar is part of an international standard. For Java 1, the grammar is a proprietary standard.

Regular Expressions

tbd


Created: Tuesday, August 17, 1999
Last modified: March 27, 2002
email: McKeeman{at}Mathworks{dot}COM