* Implementation plan for GenKit

1. The implementation is divided into two parts:

   Part I. Front-end: a. Grammar formalism design and parser implementation
                      b. User interface (shell & socket-based), including
                         the debugging interface

   Part II. Kernel: a. Unification engine
                    b. Generation algorithm

   Erik is responsible for Part I and Ben for Part II.

2. The implementation will be in C++. Unicode will be adopted for better
   internationalization (i18n).

   Q: Is Unicode support a compile-time or a run-time decision?

3. We plan to work separately for the next 1.5 months to implement both
   modules. At that point we will discuss the interface between the two;
   basically, we need to agree on a set of interface data structures. The
   front-end will instantiate these structures, which are then fed into the
   kernel.

   We plan to have another 1.5 months for integration, testing and debugging.
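   A minimal sketch of what the shared interface structures might look like,
   assuming a C++17 compiler. All names (genkit_iface, FNode, make_atom,
   Equation, Rule) are hypothetical placeholders, not an agreed interface.
   Feature-structure nodes are held through std::shared_ptr so the kernel can
   represent reentrancy, and each rule carries its equations as attribute
   paths in the order the grammar writer wrote them.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

namespace genkit_iface {  // hypothetical namespace for the shared types

// A feature-structure node: either an atomic value or a set of
// attribute -> node pairs. Children are held through shared_ptr so that two
// paths can point at the same node (reentrancy).
struct FNode {
    std::string atom;  // non-empty for atomic values
    std::map<std::string, std::shared_ptr<FNode>> attrs;
    bool is_atomic() const { return !atom.empty(); }
};

// Convenience constructor for an atomic node.
inline std::shared_ptr<FNode> make_atom(const std::string& value) {
    assert(!value.empty() && "atomic values must be non-empty");
    auto node = std::make_shared<FNode>();
    node->atom = value;
    return node;
}

// One equation attached to a rule, e.g. ((x0 subj) = x1).
struct Equation {
    std::vector<std::string> lhs_path;  // e.g. {"x0", "subj"}
    std::vector<std::string> rhs_path;  // e.g. {"x1"}; empty if the rhs is an atom
    std::string rhs_atom;               // used only when rhs_path is empty
    bool is_constraint = false;         // checks a value instead of defining one
};

// A grammar rule as the front-end would hand it to the kernel.
struct Rule {
    std::string lhs;                  // e.g. "S"
    std::vector<std::string> rhs;     // e.g. {"NP", "VP"}
    std::vector<Equation> equations;  // in the order written by the grammar writer
};

}  // namespace genkit_iface
```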

4. Issues for the front-end. Most of these issues need to be sorted out with
   the grammar writers (Donna).

a. We will use ANTLR to generate a C++ parser for the grammar formalism. ANTLR
   is a parser generator capable of generating parsers in both C++ and Java.
   Using a parser generator also makes the formalism easier to maintain in the
   future.

b. We have decided *not* to implement any call-out mechanism at this point.
   Instead, the formalism should provide a set of frequently used constraint
   value types (e.g., *numbers*, *integers*) and a flexible way to do table
   lookup.

   Q: Do we need to implement simple arithmetic capabilities? Cf. Tomita &
      Knight, Pseudo-Unification and Full-Unification, CMT MEMO 88, p. 7.
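   One way the typed values and table lookup might be exposed, sketched under
   the assumption that atomic values are stored as strings. The names
   ValueType, matches_type and LookupTable are hypothetical; the type checks
   rely only on std::strtol and std::strtod from the C++ standard library, and
   the table is a plain key-to-value map that grammar writers could fill
   declaratively.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <optional>
#include <string>

// Built-in value types the formalism could offer (hypothetical set).
enum class ValueType { Number, Integer, Symbol };

// True if the atomic value `v` is a well-formed instance of type `t`.
// std::strtol/std::strtod skip leading whitespace (and strtod also accepts
// forms such as "inf"); the end-pointer check rejects trailing garbage
// such as "12abc".
bool matches_type(const std::string& v, ValueType t) {
    if (v.empty()) return false;
    char* end = nullptr;
    switch (t) {
        case ValueType::Integer:
            std::strtol(v.c_str(), &end, 10);
            return end != v.c_str() && *end == '\0';
        case ValueType::Number:
            std::strtod(v.c_str(), &end);
            return end != v.c_str() && *end == '\0';
        case ValueType::Symbol:
            return true;
    }
    return false;
}

// A declarative lookup table, e.g. mapping a word form to a feature value.
class LookupTable {
public:
    void add(const std::string& key, const std::string& value) {
        assert(!key.empty() && "keys must be non-empty");
        rows_[key] = value;
    }
    std::optional<std::string> find(const std::string& key) const {
        auto it = rows_.find(key);
        if (it == rows_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<std::string, std::string> rows_;
};
```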

c. Modularity: is there a more elegant way to design the name-scoping system?
   One option is a ::TAG suffix on nonterminals (NTs). The objection is that
   every NT (especially on the RHS) would then need an explicit tag. This can
   probably be solved by defining the semantics of untagged NTs, so that NP
   matches NP::SHARED or NP::MED.

   The old name-scoping system is simply too chaotic: all grammar writers need
   to agree on a naming scheme before development starts. The front-end can
   issue a warning such as "NT xx in grammar_1 is redefined in grammar_2", but
   the grammar writer then has to change each and every occurrence of the
   offending NT.

   C++ has an analogous concept: namespaces.
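   A possible reading of the untagged-NT semantics, sketched with hypothetical
   helper names (split_tag, nt_matches, resolve_nt). It assumes a tagged
   reference such as NP::MED matches only that definition, while a bare NP
   matches every definition whose base name is NP. This is one candidate
   semantics, not a settled design.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Split "NP::MED" into {"NP", "MED"}; an untagged name yields an empty tag.
std::pair<std::string, std::string> split_tag(const std::string& nt) {
    auto pos = nt.find("::");
    if (pos == std::string::npos) return {nt, ""};
    return {nt.substr(0, pos), nt.substr(pos + 2)};
}

// A tagged reference matches exactly; an untagged one matches any tag.
bool nt_matches(const std::string& ref, const std::string& def) {
    auto [ref_base, ref_tag] = split_tag(ref);
    auto [def_base, def_tag] = split_tag(def);
    if (ref_base != def_base) return false;
    return ref_tag.empty() || ref_tag == def_tag;
}

// All definitions a reference on a rule's RHS could resolve to.
std::vector<std::string> resolve_nt(const std::string& ref,
                                    const std::vector<std::string>& defs) {
    assert(!ref.empty());
    std::vector<std::string> out;
    for (const auto& d : defs)
        if (nt_matches(ref, d)) out.push_back(d);
    return out;
}
```

   With this reading, a grammar that only ever writes NP keeps working when
   modules start defining NP::SHARED and NP::MED, and an explicit tag is
   needed only where the writer wants to restrict the match.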

d. We plan to add a few LFG-motivated expressions (e.g., functional
   uncertainty, inside-out function application). The goal is to allow the
   front-end to be extended later to comply fully with the LFG formalism.

   The Xerox LFG workbench is a complete parsing implementation for LFG (in
   Lisp). We can model certain notations after it.

   Q: We need to confirm that there is no conflict between the existing GenKit
      formalism and the LFG formalism.
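   Functional uncertainty lets an equation's path contain a regular
   expression, as in (↑ COMP* OBJ) = ↓, where COMP* means zero or more COMP
   attributes. The sketch below (hypothetical names FStruct, Step,
   resolve_uncertainty) handles only a Kleene star over a single attribute, not
   alternation or other regular operators, and returns every f-structure the
   path can reach. The visited set drops duplicates and keeps the starred step
   from looping on cyclic structures.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct FStruct {
    std::map<std::string, std::shared_ptr<FStruct>> attrs;
};
using FPtr = std::shared_ptr<FStruct>;

// One path step: an attribute name, optionally under a Kleene star.
struct Step {
    std::string attr;
    bool star = false;
};

// Returns every f-structure reachable from `root` along `path`, where a
// starred step matches zero or more consecutive occurrences of its attribute.
std::vector<FPtr> resolve_uncertainty(const FPtr& root,
                                      const std::vector<Step>& path) {
    assert(root != nullptr);
    std::vector<FPtr> current{root};
    for (const Step& step : path) {
        std::vector<FPtr> next;
        std::set<const FStruct*> seen;  // drops duplicates, stops on cycles
        auto add = [&](const FPtr& f) {
            if (seen.insert(f.get()).second) next.push_back(f);
        };
        if (!step.star) {
            for (const FPtr& f : current) {
                auto it = f->attrs.find(step.attr);
                if (it != f->attrs.end()) add(it->second);
            }
        } else {
            for (const FPtr& f : current) add(f);  // zero occurrences
            // Breadth-first: keep following the attribute until nothing new.
            for (std::size_t i = 0; i < next.size(); ++i) {
                auto it = next[i]->attrs.find(step.attr);
                if (it != next[i]->attrs.end()) add(it->second);
            }
        }
        current = std::move(next);
    }
    return current;
}
```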

e. Uniformity between analysis and generation grammars: make them as similar
   as possible (or at least mechanically convertible in both directions).

f. The code should run on both UNIX/Linux and Windows.


5. Issues for the kernel

a. Most of the following issues concern making the kernel extensible toward
   the LFG formalism.

b. We plan to implement both full (true) unification and pseudo-unification.

   Q: Is it possible to make it a run-time switch?
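   If both modes share one code path, a run-time switch is straightforward.
   The sketch below assumes, purely for illustration, that the observable
   difference is reentrancy: in Full mode an equation relating two attributes
   leaves both pointing at one shared node, while in Pseudo mode each
   attribute receives its own copy of the unified value. This is a
   simplification, not a faithful rendering of Tomita & Knight's
   pseudo-unification. Node, UnifyMode, deep_copy, unify and apply_equation
   are hypothetical names, and structures are assumed to be acyclic.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct Node {
    std::string atom;  // non-empty => atomic value
    std::map<std::string, std::shared_ptr<Node>> attrs;
};
using NodePtr = std::shared_ptr<Node>;

enum class UnifyMode { Full, Pseudo };  // chosen at run time

// Deep copy of an acyclic structure (internal sharing is not preserved).
NodePtr deep_copy(const NodePtr& n) {
    auto c = std::make_shared<Node>();
    c->atom = n->atom;
    for (const auto& [key, child] : n->attrs) c->attrs[key] = deep_copy(child);
    return c;
}

bool is_empty(const NodePtr& n) { return n->atom.empty() && n->attrs.empty(); }

// Non-destructive unification; returns nullptr on a clash.
NodePtr unify(const NodePtr& a, const NodePtr& b) {
    assert(a != nullptr && b != nullptr);
    if (is_empty(a)) return deep_copy(b);  // unconstrained value
    if (is_empty(b)) return deep_copy(a);
    if (!a->atom.empty() || !b->atom.empty()) {
        if (a->atom == b->atom) return deep_copy(a);
        return nullptr;  // atom clash, or atom vs. complex value
    }
    NodePtr out = deep_copy(a);
    for (const auto& [key, child] : b->attrs) {
        auto it = out->attrs.find(key);
        if (it == out->attrs.end()) {
            out->attrs[key] = deep_copy(child);
        } else {
            NodePtr merged = unify(it->second, child);
            if (!merged) return nullptr;
            it->second = merged;
        }
    }
    return out;
}

// Apply (parent lhs) = (parent rhs) under the selected mode.
bool apply_equation(const NodePtr& parent, const std::string& lhs,
                    const std::string& rhs, UnifyMode mode) {
    NodePtr& l = parent->attrs[lhs];  // map references stay valid on insert
    NodePtr& r = parent->attrs[rhs];
    if (!l) l = std::make_shared<Node>();
    if (!r) r = std::make_shared<Node>();
    NodePtr merged = unify(l, r);
    if (!merged) return false;
    l = merged;
    r = (mode == UnifyMode::Full) ? merged : deep_copy(merged);
    return true;
}
```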

c. The order in which equations are executed differs between GenKit and LFG.
   In GenKit, constraint equations are executed in the order they are
   specified; in LFG, they are executed after the unification equations. How
   can we accommodate both at the same time?
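   Within a single rule, both orderings could be served by one scheduler with
   a mode flag, sketched here with hypothetical names (EqKind, Eq, Ordering,
   schedule). GenKit mode returns the equations as written; LFG mode moves the
   defining (unification) equations ahead of the constraint equations using
   std::stable_partition, which preserves the written order within each group.
   In full LFG, constraint equations are checked against the complete
   f-structure, so a per-rule reordering like this is only an approximation.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum class EqKind { Defining, Constraint };
struct Eq {
    std::string text;  // the equation as written, for tracing
    EqKind kind;
};
enum class Ordering { GenKit, LFG };

// GenKit: execute equations in the order they are written.
// LFG: execute all defining (unification) equations first, then the
// constraint equations; stable_partition keeps the written order within
// each group.
std::vector<Eq> schedule(std::vector<Eq> eqs, Ordering ord) {
    if (ord == Ordering::LFG) {
        auto is_defining = [](const Eq& e) { return e.kind == EqKind::Defining; };
        std::stable_partition(eqs.begin(), eqs.end(), is_defining);
        assert(std::is_partitioned(eqs.begin(), eqs.end(), is_defining));
    }
    return eqs;
}
```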

d. Smart generation algorithm: is optimization possible (e.g., by caching)?

   Q: How does LFG address the generation problems?
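   Caching could take the form of a memo table keyed on the nonterminal and
   the input feature structure, so that identical subproblems are generated
   only once. A minimal sketch, assuming feature structures can be serialized
   to a canonical string (equal structures give equal strings);
   GenerationCache and get_or_generate are hypothetical names. A generator
   that requests the same key recursively from inside `generate` would recurse
   without terminating, so a real implementation would need to detect such
   cycles.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Memo table: surface strings produced for a (nonterminal, feature structure).
class GenerationCache {
public:
    using Key = std::pair<std::string, std::string>;  // (NT, serialized FS)
    using Result = std::vector<std::string>;

    const Result& get_or_generate(const std::string& nt, const std::string& fs,
                                  const std::function<Result()>& generate) {
        assert(generate && "a generator callback is required");
        Key key{nt, fs};
        auto it = memo_.find(key);
        if (it != memo_.end()) {
            ++hits_;
            return it->second;
        }
        ++misses_;
        return memo_.emplace(std::move(key), generate()).first->second;
    }
    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    std::map<Key, Result> memo_;  // std::map references survive later inserts
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};
```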