;\c	    Copyright (C) 1990 Pertti Kellomaki
;\c	 
;\c	 This file is part of Taurus, a parser generator producing Scheme
;\c	 
;\c	 Taurus is free software; you can redistribute it and/or modify
;\c	 it under the terms of the GNU General Public License as published by
;\c	 the Free Software Foundation; either version 1, or (at your option)
;\c	 any later version.
;\c	 
;\c	 Taurus is distributed in the hope that it will be useful,
;\c	 but WITHOUT ANY WARRANTY; without even the implied warranty of
;\c	 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;\c	 GNU General Public License for more details.
;\c	 
;\c	 You should have received a copy of the GNU General Public License
;\c	 along with Taurus; see the file COPYING.  If not, write to
;\c	 the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
;
;\node The Error Recovery Scheme, , , 
;\comment  Node-Name,  Next,  Previous,  Up
;\chapter{The Error Recovery Scheme}
;
;Taurus uses an error recovery scheme described in
;\cite{waite:compiler-constr}, pp.  311--317, but slightly modified to
;suit the EBNF form Taurus uses. The designated productions are
;replaced by synchronizing tokens, but conceptually the algorithm is
;the same.
;
;When a syntax error is encountered, a set of allowable tokens
;{}(the \emph{anchor set}) is constructed.  Tokens are then deleted from
;the input stream until an allowable token is found. After that,
;synchronizing tokens are inserted into the input stream so that the
;parser can continue parsing with the remaining input. The number of
;tokens deleted or inserted may be zero.
;
;The anchor set is constructed by using
;\code{call-with-current-continuation}. When a syntax error is
;encountered, the state of computation (\emph{current continuation}) is
;saved in a variable. The behavior of the parser is then changed so
;that the anchor set and synchronizing tokens are collected. When they
;have been collected, error recovery takes place and the saved
;continuation is invoked.  With languages without first class
;continuations this is not possible in a recursive descent parser, but
;instead the anchor set and synchronizing tokens must always be kept
;current. When saved continuations are used, the sets are only
;constructed when a syntax error is encountered.
;
;While the parser is collecting recovery information, semantic actions
;are disabled. No tokens are read from the input stream, but instead the
;routines that normally check the presence of tokens are used to collect
;the anchor set and the synchronizing tokens.
;
;\node Synchronizing Tokens And The Grammar, , , 
;\comment  Node-Name,  Next,  Previous,  Up
;\section{Synchronizing Tokens And The Grammar}
;
;The synchronizing tokens that the parser may insert into the input
;stream are used to drive the parser automaton towards terminating the parse.
;Consuming a synchronizing token should thus not lead to recursion.
;
;Synchronizing tokens are derived from the grammar. For a sequence, the
;synchronizing tokens are simply the tokens that are needed to drive the
;parser thru the sequence. For an iteration, they are either the tokens
;that are needed to drive the parser thru the iterand
;{}(\code{nonzero-iteration}) or none (\code{zero-iteration}).
;
;For an alternative expression, one of the branches must be
;selected.  There is no obvious reason why one of the branches should be
;preferred over the others, so the last one is arbitrarily selected. This is
;partly due to the way the anchor set is collected. The last branch of an
;alternative expression should thus be the ``simplest'' one.
;
;\node After Error Recovery, , , 
;\comment  Node-Name,  Next,  Previous,  Up
;\section{After Error Recovery}
;
;After error recovery has taken place, the input stream may contain
;synchronizing tokens that have no semantic meaning. They should thus not
;be used in semantic actions. This is accomplished by enabling semantic
;actions only after the generated tokens have been consumed.
;If the inserted tokens were only ``syntactic sugar'', semantic actions
;are not disabled.
;While parsing inserted tokens, all nonterminals return the semantic
;value that was specified for their \code{error} value. If none was
;specified, they return \code{undefined}. All inserted tokens have also
;\code{undefined} as their semantic value.
;
;\node Code Generation, , , 
;\comment  Node-Name,  Next,  Previous,  Up
;\chapter{Code Generation}
;
;This section describes how the grammar is transformed into an
;executable parser. \xref{A Sample Grammar}
;for an example of a grammar and \pxref{A Generated Parser} for the
;parser that was generated from the grammar. These should give a fairly
;good idea about what is going on.
;
;These procedures form the module
;
(module tcodegen)

;\node Overall Structure Of The Parser
;\comment  Node-Name,  Next,  Previous,  Up
;\section{Overall Structure Of The Parser}
;
;The parser consists of four parts:
;\begin{enumerate}
;\item
;local variables
;\item
;a fixed set of routines 
;\item
;routines generated from the grammar
;\item
;call to the starting nonterminal of the grammar plus some setup and
;cleanup code
;\end{enumerate}
;
;Local variables are used as follows:
;\begin{description}
;\item[*pushback-tokens*]
;holds  tokens that have been read from the input stream but pushed back
;\item[*anchor-set*]
;collects the anchor set during error recovery
;\item[*recovery-info*]
;records recovery information during error recovery
;\item[*continuation*]
;holds the continuation where parsing is to resume after error recovery
;\item[*lookahead*, *semantic-lookahead*]
;hold the lookahead token and its semantic value, respectively
;\item[*line-number*, *input-file*]
;hold the line number and file name of the lookahead
;\item[*mode*]
;tells the mode of operation of the parser automaton
;\item[*inserted-tokens*, *deleted-tokens*]
;hold the inserted and deleted tokens after error recovery, used for
;issuing error messages
;\item[*sugar*]
;holds the tokens that are purely syntactic sugar, used for issuing
;error messages
;\end{description}
;
;The fixed routines are the same for every parser. The names of all the
;routines start with the prefix \code{tg-}. \xref{Fixed Parts} for more
;information. 
;
;\code{generate-code} is the procedure that is called in order to
;produce executable code from \code{grammar}. 
;Note the check after the call to the starting nonterminal in the
;template of \code{parse} in \code{generate-code}. It is central to the
;way recovery information is collected. When the call to the starting
;nonterminal returns, the parser has recognised a legal sentence. If the
;parser is in normal mode (\code{*mode*} is \code{parsing}), everything
;is ok and the result of the call can be passed on. If, however, the
;parser is in recovery mode (\code{*mode*} is \code{collecting}), it has
;now collected the anchor set and synchronizing tokens needed in
;recovering. The input stream is then altered by \code{tg-recover}. After
;that, the state that was saved in \code{*continuation*} is invoked with
;argument \code{#t} and parsing continues.
;
;The procedure \code{parse}, which does the actual parsing, is wrapped
;within the procedure \code{parser}. The idea is, that the parser is a
;self-contained entity, and any interfacing is done in message passing
;style. Currently the parser responds to two messages: \code{parse}
;and \code{get-token-maker}, which start the parser and return the
;internal procedure that is used for constructing tokens.
;
;The plan is to make it possible to customize the parser by installing
;user defined error handlers etc.  This has not been implemented, yet.
;
;Note that the parser uses the \code{error} procedure of \file{scc},
;which differs from that in R3RS. If you plan to use some other Scheme,
;replace it by the \code{error} procedure that your implementation uses.
;
;\findex{generate-code}
(define (generate-code grammar sugar)
  (set! *attribute-count* 0)
  `(define parser
     (let ((*mode* 'parsing)
	   (*error-level* #f)
	   (*debug* #f)
	   (*lookahead* 'undefined)
	   (*semantic-lookahead* 'undefined)
	   (*line-number* 0)
	   (*input-file* "")
	   (*pushback-tokens* '())
	   (*continuation* #f)
	   (*recovery-info* '())
	   (*anchor-set* '())
	   (*inserted-tokens* '())
	   (*deleted-tokens* '())
	   (*sugar* ',sugar))
       ,@(fixed-code)
       ,@(code-for-rules grammar)
     (define (parse)
       (set! *mode* 'parsing)
       (set! *lookahead* 'undefined)
       (set! *semantic-lookahead* 'undefined)
       (set! *line-number* 0)
       (set! *input-file* "")
       (set! *pushback-tokens* '())
       (set! *continuation* #f)
       (set! *recovery-info* '())
       (set! *anchor-set* '())
       (set! *inserted-tokens* '())
       (set! *deleted-tokens* '())
       (set! *sugar* ',sugar)
       (tg-get-token)
       (let ((result
	     (,(nonterminal-name (rule-nonterminal
				  (car grammar))))))
	 (if (eq? *mode* 'collecting)
	     (begin
	       (tg-recover)
	       (tg-issue-error-message)
	       (*continuation* #f))
	     result)))

     (define (dispatch op . args)
       (case op
	 ((parse) (apply parse args))
	 ((token-maker) tg-make-token)
	 ((start-debugging) (set! *debug* #t))
	 ((stop-debugging) (set! *debug* #f))
	 (else
	  (taurus-error
		 "parser: Unknown message --" op))))

     dispatch)))

;\node Code Generated For Grammar Rules
;\comment  Node-Name,  Next,  Previous,  Up
;\section{Code Generated For Grammar Rules}
;
;For each rule, a corresponding procedure is generated:
;
;\findex{code-for-rules}
(define (code-for-rules grammar)
    	 (map (lambda (rule)
		;(display ";Generating code for " stderr-port)
                ;(display (nonterminal-name
		;	  (rule-nonterminal rule))
		;	 stderr-port)
		;(newline stderr-port)
                (code-for-a-rule rule grammar))
              grammar))

;Grammar rules map to procedure definitions.  Functions are named after
;the corresponding nonterminals.
;
;The semantic values of terminals and nonterminals are stored in local
;variables \code{$1}, \code{$2} etc., generated by \code{attribute-holder-bindings}.
;
;When the procedure is about to be exited, the state of the
;parser automaton is checked. If \code{*mode*} is \code{parsing}, the
;automaton is parsing normally, and the semantic value can be returned.
;Otherwise, a syntax error has been encountered and no error message has
;yet been issued. The variable \code{*error-level*} is checked, and if
;it is the same as the current nonterminal, an error message is issued.
;
;\findex{code-for-a-rule}
(define (code-for-a-rule rule grammar)
  `(define (,(nonterminal-name (rule-nonterminal rule)))
     ;(if *debug*
     ;	 (format #t
     ;		 "~S, entering ~S~%"
     ;		 *mode*
     ;		 ',(nonterminal-name (rule-nonterminal rule))))
     (let (($$
	    (let (,@(attribute-holder-bindings
		     (rule-expr rule)))
	      (begin
		,@(if (and (sequence? (rule-expr rule))
			   (nonterminal?
			    (car (sequence-elements (rule-expr rule)))))
		      (build-check-for (first-set (rule-expr rule)
						  grammar)
				       (nonterminal-name
					(rule-nonterminal rule)))
		      '())
		,@(expression-code (rule-expr rule)
				   rule
				   grammar
				   '())))))
      ; (if *debug*
      ;	   (format #t "~S, ~S returns ~S~%"
      ;		   *mode*
      ;		   ',(nonterminal-name (rule-nonterminal rule))
      ;		   $$))
       (cond ((eq? *mode* 'parsing) $$)
	     ((eq? *mode* 'collecting))
	     ((equal? *error-level*
		      ',(nonterminal-name
			 (rule-nonterminal rule)))
	      (tg-issue-severe-error *line-number*
				     *input-file*
				     ',(nonterminal-name
					(rule-nonterminal rule))
				     *lookahead*
				     *semantic-lookahead*
				     *PUSHBACK-TOKENS*
				     *deleted-tokens*
				     *inserted-tokens*)
	      (set! *mode* 'parsing)
	      (set! *error-level* #f)
	      ,(rule-error-value rule))
	     (else
	      ,(rule-error-value rule))))))

;\node Generating Error Messages, , , 
;\comment  Node-Name,  Next,  Previous,  Up
;\section{Generating Error Messages}
;
;The parser automatically issues error messages. There are a few
;different cases, for which the parser issues different error messages.
;
;The simplest case is pure deletion, ie. the parser has discarded a
;number of tokens and inserted none. In this case, the user is notified
;but no further actions are taken. 
;Pure insertion is also handled as a special case if the inserted tokens
;carry no semantic information, ie. they are purely syntactic sugar.
;Both of these cases are handled in 
;\code{tg-issue-error-message} (\pxref{Overall Structure Of The Parser}) right
;after error recovery has taken place.
;
;If recovering from an error has involved generating tokens with
;semantic values, a different error message is issued. In this case, an
;error message of the form {\tt ``\var{nonterminal} expected''}. If
;tokens have been deleted, they are reported, and if any generated
;tokens remain in the input stream, they are also reported. \xref{Fixed
;Parts} for more information.
;  
;\node Generating Code For Expressions
;\comment  Node-Name,  Next,  Previous,  Up
;\section{Generating Code For Expressions}
;
;The body of each generated procedure consists of Scheme forms that
;are generated from the expression defining the nonterminal. The
;procedure \code{expression-code} handles the generation of these
;forms. The \code{follow-set} that is passed along in the routines is the
;FOLLOW set of the expression \emph{within the rule\/}. It is used in
;some places to catch syntax errors before they are really encountered,
;to improve error recovery.
;
;\findex{expression-code}
(define (expression-code expr rule grammar follow-set)
    (cond
        ((terminal? expr)
	 (terminal-code expr rule grammar))
	((action? expr)
	 (action-code expr rule grammar))
	((nonterminal? expr)
	 (nonterminal-code expr rule grammar))
	((empty? expr)
	 (empty-symbol-code expr rule grammar))
	((sequence? expr)
	 (sequence-code expr rule grammar follow-set))
	((alternative? expr)
	 (alternative-code expr rule grammar follow-set))
	((zero-iteration? expr)
	 (zero-iteration-code expr rule grammar follow-set))
	((nonzero-iteration? expr)
	 (nonzero-iteration-code expr rule grammar follow-set))
	(else (taurus-error "expression-code: bad expression "
			    expr))))

;\node Code Generated For Terminals
;\comment  Node-Name,  Next,  Previous,  Up
;\section{Code Generated For Terminals}
;
;Terminals map to code that tests that the lookahead token is the
;desired one.  If the test succeeds, the next token is read to the
;lookahead. Otherwise, error recovery is invoked. The semantic value
;returned by the token is stored in a local variable. The procedure
;\code{tg-expect} (\pxref{Fixed Parts}) handles the error recovery if
;necessary.
;
;\findex{terminal-code}
(define (terminal-code expr rule grammar)
    `((set! ,(attribute-holder)
            (tg-expect ',(terminal-name expr)
		       ',(nonterminal-name
			  (rule-nonterminal rule))))))

;\node empty
;\comment  node-name,  next,  previous,  up
;\section{empty}
;\comment \section{\code{empty}}
;
;\c WORK HERE: is this used ?
;For the empty symbol, no actual code is produced.
;
;\findex{empty-symbol-code}
(define (empty-symbol-code expr rule grammar)
    '('the-empty-alternative))

;\node Code Generated For Nonterminals
;\comment  Node-Name,  Next,  Previous,  Up
;\section{Code Generated For Nonterminals}
;
;Nonterminals map to a call to the procedure generated from the
;nonterminal.
;
;\findex{nonterminal-code}
(define (nonterminal-code expr definition grammar)
    `((set! ,(attribute-holder)
            (,(nonterminal-name expr)))))


;\node Code Generated For Actions
;\comment  Node-Name,  Next,  Previous,  Up
;\section{Code Generated For Actions}
;
;Actions map to code that checks the state of the parser and executes
;the user supplied semantic action if appropriate.
;
;\findex{action-code}
(define (action-code expr definition grammar)
  `((cond ((eq? *mode* 'parsing)
	   (set! ,(attribute-holder)
		 ,(action-source-code expr))))))

;\node Code Generated For Sequences
;\comment  Node-Name,  Next,  Previous,  Up
;\section{Code Generated For Sequences}
;
;Sequences map to sequences of Scheme expressions.  Note: it is
;important to generate code for the elements left to right, because
;\code{attribute-holder} generates a new symbol each time it is called.
;This way the attribute holder symbols are correctly numbered.
;
;\findex{sequence-code}
(define (sequence-code expr rule grammar follow-set)
  (let loop ((elements (sequence-elements expr))
	     (code '()))
    (if (null? elements)
	code
	(loop (cdr elements)
	      (append code
		      (expression-code
		       (car elements)
		       rule
		       grammar
		       (append follow-set
			       (first-set
				(make-sequence (cdr elements))
				grammar))))))))

;\node Code Generated For Alternatives
;\comment  Node-Name,  Next,  Previous,  Up
;\section{Code Generated For Alternatives}
;
;Alternatives map to conditional statements that test the lookahead
;token.  If one of the tests succeeds, the corresponding piece of code
;is executed. Prior to entering the conditional statement, the
;lookahead token is examined with \code{tg-check-for}, and if the
;lookahead token can not start any of the expressions, error recovery
;is invoked. If the FIRST set of the alternative expression includes
;@code{empty}, also the symbols in @code{follow-set} are considered. If
;also @code{follow-set} includes @code{empty}, the test is omitted.
;This is done because error recovery would otherwise miss the allowed
;tokens that can start the different branches of the alternative
;expression.
;
;The last branch in the \code{cond}
;is made the default by appending an \code{else}. When parsing, the
;\code{else} is superfluous, because \code{tg-check-for} has already made
;sure that one of the branches can be selected. During recovery, however,
;the situation is different, because \code{tg-next} only records recovery
;information and returns \code{#f}.
;
;\code{alternative-code} builds the \code{cond} form, preceded by a call
;to \code{tg-check-for}. \code{build-cond-branch} is used to
;build one branch of the \code{cond} form.
;
;\findex{alternative-code}
(define (alternative-code expr rule grammar follow-set)

  (define (build-cond-branch expr)
    (if (empty? expr)
	'(#f)
	`((tg-next ',(map terminal-name
			  (first-set expr grammar)))
	  ,@(expression-code expr rule grammar follow-set))))

  (let* ((starters (first-set expr grammar))
	 (cond-clause
	  (list
	   (let loop ((choices (alternative-choices expr))
		      (result '(cond)))
	     (if (null? choices)
		 (reverse (cons (cons 'else (car result))
				(cdr result)))
		 (loop (cdr choices)
		       (cons (build-cond-branch (car choices))
			     result)))))))
    (append
     (cond ((member (make-empty) starters)
	    (build-check-for (set-union starters follow-set)
			     (nonterminal-name
			      (rule-nonterminal rule))))
	   (else
	    (build-check-for starters
			     (nonterminal-name
			      (rule-nonterminal rule)))))
     cond-clause)))


;\findex{build-check-for}
(define (build-check-for set calling-nonterminal)
  (if (member (make-empty) set)
      '()
      `((tg-check-for ',(map nonterminal-name set)
		      ',calling-nonterminal))))


;\node Code Generated For Iterations
;\comment  Node-Name,  Next,  Previous,  Up
;\section{Code Generated For Iterations}
;
;Iterations map to loop constructs.
;The only difference between iteration zero or more times and one or
;more times is the placement of the exit test.
;The loops are implemented with named let.
;Prior to each loop, the lookahead is examined to see if it can start
;either a new iteration or the next expression, similarily as with
;alternative expressions.
;
;Within each iteration, the special symbol \code{$iter} refers to the
;value that was returned as the semantic value of the last element of
;the iterand the last time it was iterated. When the iteration starts,
;the value of \code{$iter} is the empty list. This feature can used for
;example when collecting the value of a parameter list or a similar
;construct. 
;
;\findex{zero-iteration-code}
(define (zero-iteration-code expr rule grammar follow-set)
  (let ((starters (first-set (iteration-iterand expr)
			     grammar)))
    `((let tg-loop (($iter '()))
	(cond ((tg-next ',(map terminal-name
			       starters))
	       (let ((tg-value (begin
				 ,@(expression-code
				  (iteration-iterand expr)
				  rule
				  grammar
				  follow-set))))
		 ,@(build-check-for
		    (set-union starters follow-set)
		    (nonterminal-name
		     (rule-nonterminal rule)))
		 (tg-loop tg-value))))))))

;\findex{nonzero-iteration-code}
(define (nonzero-iteration-code expr rule grammar follow-set)
  (let ((starters (first-set (iteration-iterand expr)
			     grammar)))
    `((let tg-loop (($iter '()))
	(let ((tg-value (begin
			  ,@(expression-code (iteration-iterand expr)
					   rule
					   grammar
					   follow-set))))
	  ,@(build-check-for (set-union starters follow-set)
			     (nonterminal-name
			      (rule-nonterminal rule)))
	  (cond ((tg-next ',(map terminal-name starters))
		 (tg-loop tg-value))))))))
  
;Names for local variables that hold the semantic values returned by
;the constituents of an expression are generated using
;\code{attribute-holder-bindings}. The global varible
;\code{*attribute-count} is used for storing the count.
;
;\strong{Note!} the \code{number->string} procedure differs a bit from
;implementation to implementation. MIT's \file{cscheme} needs a third
;argument \code{'(heur)} to work properly while Digital's
;\file{scheme2c} barfs at it.
;
;\findex{*attribute-count*}
(define *attribute-count* 0)

;\findex{attribute-holder}
(define (attribute-holder)
  (set! *attribute-count* (+ *attribute-count* 1))
  (string->symbol
   (string-append "$" (number->string *attribute-count*))))


;\findex{attribute-holder-bindings}
(define (attribute-holder-bindings expr)
  (set! *attribute-count* 0)
  (let loop ((count (number-of-constituents expr))
             (bindings '()))
    (if (= count 0)
	(begin
	  (set! *attribute-count* 0)
	  bindings)
        (loop (- count 1)
              `((,(attribute-holder) '())
		,@bindings)))))

;To be able to generate the right number of local variables, we need
;know how many terminals and nonterminals there are in the expression.
;
;\findex{number-of-constituents}
(define (number-of-constituents expr)
  (cond
   ((null? expr) 0)
   ((empty? expr) 0)
   ((action? expr) 1)
   ((terminal? expr) 1)
   ((nonterminal? expr) 1)
   ((zero-iteration? expr)
    (number-of-constituents (iteration-iterand expr)))
   ((nonzero-iteration? expr)
    (number-of-constituents (iteration-iterand expr)))
   ((sequence? expr)
    (apply +
	   (map (lambda (element)
		  (number-of-constituents element))
		(sequence-elements expr))))
   ((alternative? expr)
    (apply +
	   (map (lambda (choice)
		  (number-of-constituents choice))
		(alternative-choices expr))))
   (else
    (taurus-error "number-of-constituents: bad expression "
		  expr))))
