* Implementation plan for GenKit

1. The implementation is divided into two parts:

   Part I. Front-end:
     a. Grammar formalism design and parser implementation
     b. User interface (shell and socket-based), including the
        debugging interface

   Part II. Kernel:
     a. Unification engine
     b. Generation algorithm

   Erik is on Part I and Ben is on Part II.

2. The implementation will be in C++. Unicode will be adopted for
   better i18n.

   Q: Is Unicode support a compile-time or a run-time decision?

3. We plan to work separately for the next 1.5 months to implement
   both modules. At that point we will discuss the interface between
   the two: basically, we need to agree on a set of interface data
   structures. The front-end will instantiate these structures, which
   are then fed into the kernel. We plan another 1.5 months for
   integration, testing, and debugging.

4. Issues for the front-end (most of these issues need to be sorted
   out with the grammar writers (Donna)):

   a. We will use ANTLR to generate a C++ parser from the grammar.
      ANTLR is a parser generator that can emit parsers in both C++
      and Java. This should make the grammar formalism easier to
      maintain in the future.

   b. We have decided *not* to implement any call-out mechanism at
      this point. Instead, the formalism should provide a set of
      frequently used constraint type values (e.g., *numbers*,
      *integers*) and a flexible way to do table lookup.

      Q: Do we need to implement simple arithmetic capability?
      Cf. Tomita & Knight, Pseudo-Unification and Full-Unification,
      CMT MEMO 88, p. 7.

   c. Modularity concern: is there a more elegant way to design the
      name scoping system? The ::TAG suffix? The objection is that
      this way the tag has to be made explicit for all NTs (esp. on
      the RHS). But this can probably be solved by defining the
      semantics of NTs without tags (so that NP can be matched to
      NP::SHARED or NP::MED). The old name scoping system is simply
      too chaotic: a bunch of grammar writers need to agree on the
      naming scheme before starting development.
      It's true that the front-end can issue a warning such as
      "NT xx in grammar_1 is redefined in grammar_2", but the grammar
      writer then needs to change each and every instance where the
      offending NT occurs. C++ has a comparable concept, the
      "namespace".

   d. We plan to add a few LFG-motivated expressions (e.g., functional
      uncertainty, inside-out function application). The goal is to
      allow future extension of the front-end to fully comply with
      the LFG formalism. The Xerox LFG workbench is a complete parsing
      implementation for LFG (in LISP). We can model certain
      notations after it.

      Q: We need to confirm that there is no conflict between the
      existing GenKit formalism and the LFG formalism.

   e. Uniformity between analysis and generation grammars: make them
      as similar as possible (or at least mechanically convertible
      back and forth).

   f. The code should run on both UNIX/Linux and Windows.

5. Issues for the kernel

   a. Most of the following issues concern making the kernel
      extensible toward the LFG formalism.

   b. We plan to implement both true and pseudo unification.

      Q: Is it possible to make this a run-time switch?

   c. The order of equation execution differs between GenKit and LFG.
      In the former, the constraint equations are executed in the
      order they are specified; in the latter, the constraint
      equations are executed after the unification equations. How do
      we accommodate the two at the same time?

   d. Smart generation algorithm: is optimization possible (maybe by
      caching)?

      Q: How does LFG address the generation problems?
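
Note on 5c: one possible way to accommodate both equation orderings is to decide the execution order at run time, before any equation is executed. The sketch below is only an illustration under assumptions: the names Equation, EqKind, OrderMode and schedule are hypothetical and not part of any existing GenKit code, and the equations are reduced to plain labels. In the GenKit-style mode the equations are left in the order written; in the LFG-style mode the unification equations are moved ahead of the constraint equations, keeping the written order inside each group.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// All names here are hypothetical, for illustration only.
enum class EqKind { Unification, Constraint };

// AsWritten:       GenKit-style, execute in the order specified.
// ConstraintsLast: LFG-style, constraint equations run after all
//                  unification equations.
enum class OrderMode { AsWritten, ConstraintsLast };

struct Equation {
    EqKind kind;
    std::string label;  // placeholder for the real equation body
};

// Returns the order in which a rule's equations would be executed.
// The input is taken by value, so the rule itself is not modified.
std::vector<Equation> schedule(std::vector<Equation> eqs, OrderMode mode) {
    if (mode == OrderMode::ConstraintsLast) {
        // stable_partition moves unification equations to the front
        // and preserves the written order within each group.
        std::stable_partition(
            eqs.begin(), eqs.end(),
            [](const Equation& e) { return e.kind == EqKind::Unification; });
    }
    return eqs;
}
```

With a rule written as unify-1, constraint-1, unify-2, schedule(rule, OrderMode::AsWritten) leaves the order unchanged, while schedule(rule, OrderMode::ConstraintsLast) yields unify-1, unify-2, constraint-1. Whether a per-rule or per-grammar mode is the right granularity is left open here.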