File Analysis.html    Author McKeeman    Copyright © 2007    index

Analysis of Programming Languages

`The question is,' said Alice
`whether you can make words mean so many different things.'
`The question is,' said Humpty Dumpty,
`which is to be master -- that's all.'
-- Lewis Carroll, Alice in Wonderland

Kinds of Analysis

When a text is written down there must be an agreement so that the readers understand the writers. The agreement includes the form and the meaning of what is written.

From the viewpoint of a compiler, a user program is a sequence of characters; to the programmer it is a much richer entity. The compiler must inspect the input text, deduce the content intended by the programmer, and synthesize a corresponding executable.

Programming language text has two levels of structure. The first level defines symbols analogous to the words and punctuation of natural language. The second level defines phrase structure, relating nearby sequences of symbols. The phrase structure of natural language contains concepts such as subject, object and predicate. The phase structure of an imperative programming language instead uses phrase names such program, statement and expression.

Once the shape of the program is determined, the compiler must find the meaning of the symbols and their relations. Information about meaning can be inherent in the symbol (such as '+'), or provided explicitly by the programmer, for example, in the form of declarations. In X, the programmer-intended meaning of each non-reserved symbol is inferred from its use.

When the compiler cannot analyze its input, it issues a diagnostic to assist the programmer in making the input acceptable. Locating the exact point of failure is part of the diagnostic process.

Front-end Components

  • Grammars. Grammars, in particular context-free grammars and some extensions thereto, are the basis of programming language description. Mathematical notation is defined and used; it is also summarized separately.
  • Lexing. Separating out the words and punctuation of computer programs.
  • Parsing. Discovering the phrase structure of computer programs.
  • Tree. Building the syntax tree or abstract syntax tree from the canonical parse.
  • Symbols. Associating attributes with names found in computer programs.