12.3 Syntax parsing: GPC's Parser

The file parse.y contains the “bison” source code of GNU Pascal's parser. This stage of the compilation analyzes and checks the syntax of your Pascal program, and it generates an intermediate, language-independent code which is then passed to the GNU back-end.

The bison language is essentially a machine-readable form of the Backus-Naur Form, the symbolic notation for the grammars of computer languages. “Syntax diagrams” are a graphical variant of the Backus-Naur Form.

For details about the “bison” language, see the Bison manual (see Top (bison)). What follows is a short overview of how to find the information you might need for programming.

Suppose you have forgotten how a variable is declared in Pascal. After some searching in parse.y you have found the following:

     simple_decl_1:
         ...
       | p_var variable_declaration_list
           { [...] }
       ;
     
     variable_declaration_list:
         variable_declaration { }
       | variable_declaration_list variable_declaration
       ;

Translated into English, this means: “A declaration can (among other things like types and constants, omitted here) consist of the keyword (lexical token) var followed by a `variable declaration list'. A `variable declaration list' in turn consists of one or more `variable declarations'.” (The latter explanation requires that you understand the recursive nature of the definition of variable_declaration_list: the second alternative refers to the rule itself, so a list is either a single declaration or a shorter list followed by one more declaration.)
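
The recursion can be made concrete with a small C sketch. It is not taken from GPC, and it is not how the table-driven parser generated by bison works internally; it only shows which token sequences the two rules accept. The token kinds and function names are hypothetical, and TOK_DECL stands in for one complete variable_declaration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds, invented for this sketch. GPC's real tokens
   come from its lexical analyzer. TOK_DECL abbreviates one complete
   variable_declaration. */
enum token { TOK_VAR, TOK_DECL, TOK_END };

/* variable_declaration_list:
       variable_declaration
     | variable_declaration_list variable_declaration

   A left-recursive rule of this shape accepts "one or more" items, which
   a hand-written recognizer can express as a loop. Returns the number of
   declarations matched starting at toks[*pos]; 0 means no match. */
int match_variable_declaration_list(const enum token *toks, size_t *pos)
{
    int count = 0;
    while (toks[*pos] == TOK_DECL) {
        ++*pos;
        ++count;
    }
    return count;
}

/* The alternative of simple_decl_1 shown above:
       p_var variable_declaration_list
   Returns the number of declarations, or -1 if the input does not match. */
int match_var_section(const enum token *toks)
{
    size_t pos = 0;
    int n;

    assert(toks != NULL);
    if (toks[pos] != TOK_VAR)
        return -1;              /* the keyword var is mandatory */
    ++pos;
    n = match_variable_declaration_list(toks, &pos);
    return n > 0 ? n : -1;      /* an empty list is not allowed */
}
```

With this sketch, the sequence TOK_VAR TOK_DECL TOK_DECL is accepted as a list of two declarations, while TOK_VAR on its own is rejected, because the grammar has no alternative for an empty list.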

Now we can go on and search for variable_declaration.

     variable_declaration:
         id_list_limited ':' type_denoter_with_attributes
           { [...] }
         absolute_or_value_specification optional_variable_directive_list ';'
           { [...] }
       ;

The [...] are placeholders for some C statements, the semantic actions which (for the most part) aren't important for understanding GPC's grammar.

From this you can see that a variable declaration in GNU Pascal consists of an identifier list, followed by a colon, a “type denoter with attributes”, an “absolute or value specification” and an “optional variable directive list”, terminated by a semicolon. Some of these parts are easy to understand; the others you can look up in parse.y in the same way. Remember that the reserved word var precedes all this.

Now you know how to get the exact grammar of the GNU Pascal language from the source.

The semantic actions, not shown above, are in some sense the most important part of the bison source, because they are responsible for the generation of the intermediate code of the GNU Pascal front-end, the so-called tree nodes (which are used to represent most things in the compiler). For instance, the C code in “type denoter” returns (assigns to $$) information about the type in a variable of type tree.

The “variable declaration” gets this and other information in the numbered arguments ($1 etc.) and passes it to some C functions declared in the other source files. Generally, those functions do the real work, while the main job of the C statements in the parser is to call them with the right arguments.
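
The flow of values between semantic actions can be sketched in plain C. Everything in this sketch is an assumption made for illustration: the struct layout, the node codes and the function names are invented and do not reproduce GPC's real tree type or its helper functions. The first function plays the role of an action that assigns to $$; the second plays the role of a worker function that a later action calls with $1, $3 and so on.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the compiler's tree type. The real tree
   nodes carry far more information. */
enum tree_code { TYPE_NODE, VAR_DECL_NODE };

typedef struct tree_node {
    enum tree_code code;
    const char *name;           /* type name or variable name */
    struct tree_node *type;     /* for VAR_DECL_NODE: the variable's type */
} *tree;

/* Hypothetical helper that allocates and fills one node. */
tree make_node(enum tree_code code, const char *name, tree type)
{
    tree t = malloc(sizeof *t);
    assert(t != NULL);
    t->code = code;
    t->name = name;
    t->type = type;
    return t;
}

/* Plays the role of the action in a "type denoter" rule. In parse.y the
   result would be assigned to $$ instead of being returned. */
tree type_denoter_action(const char *type_name)
{
    return make_node(TYPE_NODE, type_name, NULL);
}

/* Plays the role of a worker function called from the action of a
   "variable declaration" rule, which would pass $1 (the identifier)
   and $3 (the type produced above) as arguments. */
tree declare_variable(const char *id, tree type)
{
    assert(id != NULL && strlen(id) > 0);
    assert(type != NULL && type->code == TYPE_NODE);
    return make_node(VAR_DECL_NODE, id, type);
}
```

In this sketch, calling type_denoter_action("Integer") and passing the result to declare_variable("Counter", ...) yields a declaration node whose type field points at the type node. That mirrors the division of labour described above: the action only forwards values, and the called function does the work.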

This, the parser, is the place where it becomes Pascal.