                ZEBU -- AN LALR(1) PARSER SYSTEM IN SCHEME
                                Version .9

                Copyright (C) 1989, by William M. Wells III
                            All Rights Reserved
        Permission is granted for unrestricted non-commercial use.





     INTRODUCTION

     ZEBU is a lalr(1) parser generator package.  It facilitates the
     automatic construction of syntax driven processors within the
     SCHEME [2] programming environment.  Zebu bears resemblance to
     the UNIX program YACC [1].

     Zebu reads grammar files that contain the description of a
     language as well as the actions to be taken as the language is
     parsed.  The language is described by production rules, while the
     actions are described by associated lambda expressions.

     Zebu generates parse table files which describe an lr parsing
     machine that processes the specified language.

     Zebu has a parser driver component which processes the specified
     language using the parse table file.

     A good discussion of lalr(1) parser generators and syntax direct-
     ed translation appears in PRINCIPLES OF COMPILER DESIGN [3].




     I.  Sample Session

     Zebu is best introduced by way of example.  The following is a
     transcript of an interaction with Zebu.

          ;;; Zebu is used to generate the parse table file "ex1.tab"
          ;;; from grammar file "ex1.grm"

          1 ]=> (compile-slr-grammar "ex1.grm" "ex1.tab")
          reading grammar from ex1.grm, start symbols is: E
          7 productions,  11 symbols
          Dumping parse tables to ex1.tab
          ;No value

          ;;; The resulting parse table file is loaded into Zebu's
          ;;; parser driver:
          1 ]=> (load-parse-tables "ex1.tab")
          ;No value

          ;;; Zebu's parser driver is used to parse an expression
          ;;; from the language described in grammar file "ex1.grm"

          1 ]=> (list-parser '(ned "+" jed))
          ;Value: (EXPRESSION (EXPRESSION (TERM (FACTOR NED)))




                                  -- 1 --






            "+" (TERM (FACTOR JED)))

          1 ]=> (list-parser '(fred "+" ned "*" jed))
          ;Value: (EXPRESSION (EXPRESSION (TERM (FACTOR FRED)))
            "+" (TERM (TERM (FACTOR NED)) "*" (FACTOR JED)))




     II. About Grammar Files

     This is the grammar file for the example above.


          (E E "+" T)
          (LAMBDA (EE PLUS TT) (LIST 'expression EE PLUS TT))

          (E T)
          (LAMBDA (TT) (LIST 'expression TT))

          (T T "*" F)
          (LAMBDA (TT TIMES F) (LIST 'term TT TIMES F))

          (T F)
          (LAMBDA (F) (LIST 'term F))

          (F "(" E ")")
          (LAMBDA (LP EE RP) (LIST 'factor LP EE RP))

          (F IDENTIFIER)
          (LAMBDA (ID) (list 'factor id))



     The file contains alternating encodings of grammar productions
     and associated actions to be performed at parsing time.

     The productions are encoded in scheme lists.  The first element
     of the list is the left hand side of the production; the rest of
     the list is the right hand side of the production.  Thus a pro-
     duction which normally might be written as:


          E -> E + T

     is encoded as:

          (E E "+" T)  .

     Grammar symbols may be written either as scheme symbols or scheme
     strings.  For Zebu's parsers, tokens are classified according to
     their relation to the grammar symbols in the sense of (equal?) .

     Zebu assumes that the start symbol of the grammar appears on the
     left hand side of the first production.




                                  -- 2 --






     The associated lambdas are applied during parsing.  As the lr
     parser processes the input text it performs "reductions" using
     the productions of the grammar.  When a reduction is performed,
     the action (lambda) associated with the production is applied.

     The arguments to the lambdas arise in two ways.  The arguments to
     the lambda are in correspondence with the right hand side of the
     production.  If an argument corresponds to a terminal grammar
     symbol, then the parser driver supplies the instantiation of the
     grammar symbol as seen by the tokenizer.  If an argument
     corresponds to a non-terminal symbol, then the driver supplies
     the result of the application of the lambda that is associated
     with the production from which the non-terminal was reduced.
     This is not as complicated as it sounds.  The value returned by
     the parser is the value given by the application of the lambda
     associated with the production which is used to do the final
     reduction of the input text to the start symbol.

     The lambdas in the above example are arranged to construct a
     parse tree corresponding to the parse of the input text.

     The lambdas may use side effects, if desired.  The sequencing of
     the application of the lambdas is governed by the ordering of the
     reduce operations of the lr parser driver.

     The lambdas are used only at parsing time, not table generation
     time.  If the Zebu parser driver isn't going to be used, tables
     may be built without supplying meaningful lambdas.  As far as
     parse table construction is concerned, the lambdas may be re-
     placed with arbitrary scheme expressions as place holders (such
     as #f).

     The format of Zebu's parse table files is described in the source
     file dump.scm.





     III.  LALR(1) and SLR(1)

     Zebu has two parser generating algorithms. (COMPILE-SLR-GRAMMAR
     ...) requires an slr(1) grammar, while (COMPILE-LALR1-GRAMMAR
     ...) requires a lalr(1) grammar.

     Although all slr(1) languages are lalr(1), the reverse is not
     true. (compile-slr-grammar ...) is faster than (compile-lalr1-
     grammar ...) and so may be preferred if the language at hand is
     slr(1).

     The example grammar file "ex6-2.grm" is an example of a grammar
     which is lalr(1) but not slr(1).







                                  -- 3 --








     IV.  Using Ambiguous Grammars

     A simple feature allows the Zebu system to be used with some am-
     biguous grammars.

     If the variable ALLOW-CONFLICTS is true, then when conflicts ar-
     ise at table construction time, ths system simply prefers the ac-
     tions already in the tables.  This feature works for the "dan-
     gling else" problem.

     Here is an example of building parse tables for a "dangling else"
     grammar:

          1 ]=> (set! allow-conflicts #t)
          ;Value: ()
          1 ]=> (compile-slr-grammar "dangelse.grm" "dangelse.tab")
          reading grammar from dangelse.grm, start symbols is: S
          4 productions,  7 symbols

          ACTION CONFLICT!!! -- state: 6  old entry: (5 s 2)
          new entry: (5 r 2)
          WARNING: continuing to build tables despite conflicts...
          Will prefer old entry: (5 s 2)
          Dumping parse tables to dangelse.tab
          ;No value

          1 ]=> (load-parse-tables "dangelse.tab")

          ;Value: (("+" . 4) ("*" . 6) ("(" . 8) (")" . 9) (IDENTIFIER . 10))

          1 ]=> (list-parser '(i i a e a))

          ;;; Note that the tables "do the right thing" with the dangling
          ;;; else construct. (The else is associated with the nearest
          ;;; un-elsed if).

          ;Value: (I (I (A) E (A)))

     The reported action conflict is a "shift-reduce" conflict that
     has been resolved in favor of shifting.

     When conflicts arise, it may be useful to use (PRINT-COLLECTION
     #F) or (PRINT-COLLECTION #T) which dump the lr0 item sets (which
     correspond to the states of the parser) in order to understand
     the problem the parser is facing.  (See the source file
     tables.scm for details.)

     Obtaining useful parse tables from ambiguous grammars requires
     care.  PRINCIPLES OF COMPILER DESIGN [2] has a good discussion
     about using ambiguous grammars.







                                  -- 4 --







     V.  The Parser Driver and Tokenizers

     Zebu has two components: the parser generator (the bulk of the
     code) and a lr parser driver (contained in the file driver.scm).
     The parser generator is needed only when parse table files are
     being generated.

     The parser driver is quite simple.  It contains a lr parsing en-
     gine (LR-PARSE) which may be used with arbitrary user supplied
     tokenizers.  (LR-PARSE) is configured using (LOAD-PARSE-TABLES).

     Two parsers are supplied which use (LR-PARSE).  (LIST-PARSER)
     will parse a token stream appearing as a scheme list.  (READ-
     PARSER) will parse a token stream gotten via the scheme function
     (READ).

     For some applications, the scheme function (READ) may not be a
     suitable tokenizer.  In these cases, a custom tokenizer should be
     constructed for use with (LR-PARSE).

     The two parsers (LIST-PARSER) and (READ-PARSER) have special case
     behavior on the terminal symbols NUMBER and IDENTIFIER.  If the
     terminal symbol NUMBER is present in the grammar, then the token
     classifier will categorize objects that satisfy the scheme func-
     tion (NUMBER?) as instances of the terminal symbol NUMBER. Other
     objects which match terminal symbols in the grammar will be
     categorized as such.  Otherwise objects will be categorized as
     instances of the terminal symbol IDENTIFIER, if it is present in
     the grammar.



     APPENDIX 1. Loading Zebu

     To load the Zebu system, first load the file Zebu.scm from the
     directory where Zebu is stored:

          (load "...  /zebu.scm")


     then use the function

          (load-zebu)

     to load Zebu.



     APPENDIX 2. Function Summary

          ;;; Load the zebu bootstrap file:
          (load "...  /zebu.scm")

          ;;; Load the Zebu system.




                                  -- 5 --






          (load-zebu)

          ;;; Compile an slr(1) grammar:
          (compile-slr-grammar <grammar file> <parse table file>)

          ;;; Compile an lalr(1) grammar:
          (compile-lalr1-grammar <grammar file> <parse table file>)

          ;;; Load a parse table file into Zebu's parsing engine:
          (load-parse-tables <parse-table-file>)

          ;;; Parse some tokens from a list:
          (list-parser '< a list of tokens ... >)

          ;;; Parse some tokens from the scheme function (read)
          ;;; (this will detect end of tokens at end of file or upon reading
          ;;; the scheme symbol :eof
          (read-parser)



     APPENDIX 3. Interactive Calculator Example

     Here is a sample session that demonstrates an interactive calcu-
     lator implemented with Zebu.

          ;;; Load up the parse table file describing the calculator.
          1 ]=> (load-parse-tables "calc.tab")
          ;Value: ((I . 4) (E . 5) (A . 6))

          ;;; Load the file containing the rest of the calculator.
          1 ]=> (load "calc.scm")
          ;Value: CALC

          ;;; Do some calc-ulating...
          1 ]=> (calc)
          ? 1 + 2 * 3 :
          7
          ? 5 -> jed :
          5
          ? jed :
          5
          ? 1 + 3 -> fred -> ned :
          4
          ? ned * jed :
          20
          ? :eof
          Bye now.
          ;No value


     Here is the content of calc.scm:







                                  -- 6 --






          (define bindings '())

          (define (calc)
            (display "? ")
            (read-parser))



     And here is the grammar file describing the calculator:

          ;;; This grammar and the associated lambdas define
          ;;; a simple interactive calculator.


          ;;; A session is a sequence of transactions with the user.
          ;;; When we reduce by this production, we're all done.
          (session transaction-sequence)
          (lambda (ignore) (display "Bye now."))

          ;;; A transaction-sequence is a sequence of transactions separated
          ;;; by colons
          (transaction-sequence transaction-sequence transaction :)
          (lambda (a b c) 'dont-care)

          ;;; There might be no transactions.
          (transaction-sequence)
          (lambda () 'not-used)

          ;;; When the parser carries out the reduction by this
          ;;; production, it causes the value of the expression
          ;;; to be printed, and a new prompt to be shown.

          (transaction expression)
          (lambda (value) (display value) (newline) (display "? "))



          ;;; The assignment operator.  Store the built up value of
          ;;; the expression under the identifier.  The id argument
          ;;; to the lambda is supplied with the symbol seen in the
          ;;; token stream.  The lambda arranges to store away the
          ;;; expression value under the symbol (in an alist).
          (expression expression -> identifier)
          (lambda (exp arrow id)
            (set! bindings (cons `(,id . ,exp) bindings))
            exp)


          ;;; Implement addition:  When reduction occurs by this
          ;;; production, we perform addition on the previously
          ;;; built up sub expression values.
          (expression expression + term)
          (lambda (exp plus trm) (+ exp trm))

          ;;; Similar to addition.




                                  -- 7 --






          (expression expression - term)
          (lambda (exp minus trm) (- exp trm))

          ;;; Arrange for typical distributive properties...
          (expression term)
          (lambda (x) x)

          ;;; Implement multiplication.  Like addition.
          (term term * factor)
          (lambda (trm mult fac) (* trm fac))

          ;;; Division.
          (term term / factor)
          (lambda (trm div fac) (/ trm fac))

          (term factor)
          (lambda (x) x)


          ;;; Do paren style grouping using < and > .
          (factor < expression > )
          (lambda (open-bracket exp close-bracket)
            exp)

          ;;; The terminal symbol number.  When the parser driver
          ;;; applies this lambda it supplies the value of the number
          ;;; as seen in the token stream.
          (factor number)
          (lambda (x) x)

          ;;; The terminal symbol "identifier".  When the parser driver
          ;;; applies this lambda, it supplies the symbol as seen in the
          ;;; token stream.  The value associated with this name is looked
          ;;; up by this lambda.
          (factor identifier)
          (lambda (id)
            (cdr (assq id bindings)))



     REFERENCES

     [1] S.C. Johnson. YACC -- Yet Another Compiler Compiler.  Comput-
     ing Science Technical Report 32, AT&T Bell Laboratories, Murray
     Hill, NJ, 1975

     [2] J. Rees and W. Clinger (eds). Revised^3 Report on the Algo-
     rithmic Language Scheme.  MIT AI Laboratory AI Memo 848a, 1986

     [3] A.V. Aho and J.D. Ullman. PRINCIPLES OF COMPILER DESIGN,
     Addison Wesley 1979








                                  -- 8 --


