NOTE: This paper is not finished, but there's a lot of worthwhile stuff.

It is probably a good idea to browse through the Lily source code before
reading this document.
------------------------------------------------------------------
                            Lily User's Guide

                              Roger Sheldon

                              September 1993


Overview
--------

Artificial Intelligence programs require tremendous amounts of list
processing, making Lisp a popular language for their implementation.
Lisp is also chosen for rapid prototyping efforts which make use of
its weak type checking as well as the powerful programming
environments built around it.  A C++ class library, Lily, has been
designed which provides a subset of the functionality of Lisp.  Lisp
data types are implemented as classes each of which is derived from a
single base class that provides default behavior (typically an error)
through virtual functions.  Additional data types can be easily added
by deriving a new class from the base class and overriding the default
behavior as needed.  A simple reference counting garbage collection
technique is used which is made transparent to the programmer by
exploiting the automatic construction and destruction of temporary
objects.  This paper describes the design of Lily, presents the
advantages and disadvantages of the design, and concludes with ideas
for enhancing Lily.

1. Background
-------------

1.1 Introduction
----------------

C++ is enjoying a great deal of popularity.  While languages such
as Ada (Booch 83, Holden 90), Smalltalk (Goldberg 83), Lisp (Steele
84), Eiffel (Meyer 89), and others are being used, they are certainly
not gaining in popularity as fast as C++.  This is due in part to the
fact that C++ allows very efficient implementations and an easy
migration path for the large amounts of existing C software and
programmers (Jordan 90).

Similarly, Artificial Intelligence (AI) software is gaining acceptance
as an effective approach for solving various problems.  Though there
are numerous programming languages and expert systems shells to
facilitate the development of AI software, the delivery of a system
often requires porting Lisp to C or purchasing special-purpose
hardware.

Lily is a C++ class library which provides many of the list processing
capabilities of the Lisp language.  Listing 1 compares the Lisp
implementation of the shove-pl function (Winston 84, pg. 261) to the
C++/Lily implementation.  The translation of Lisp to C++ code using Lily is
straightforward (assuming the Lisp code doesn't use features of Lisp
which aren't supported by Lily).

                              Listing 1.

(defun shove-pl (variable item a-list)
  (cond ((null a-list) (list (list variable (list item))))
        ((equal variable (caar a-list))
         (cons (list variable (append (cadar a-list) (list item)))
               (cdr a-list)))
        (t (cons (car a-list) (shove-pl variable item (cdr a-list))))))

LObject shove_pl(LObject &variable, LObject &item, LObject &alist) {
    if (!alist)
        return list(list(variable, list(item)));
    else if (variable == caar(alist))
        return cons(list(variable, Append(cadar(alist), list(item))), 
                    cdr(alist));
    else
        return cons(car(alist), shove_pl(variable, item, cdr(alist))); 
}

This paper describes the Lily C++ class library.  In this first section, the
structure of heterogeneous lists is described.  Next, related work is
reviewed and compared to Lily.  The second section describes the architecture
of the Lily library.  The third and final section presents Lily from the
programmer's point of view.  The paper concludes with notes
on the current implementation, acknowledgements, thoughts on future
enhancements to Lily, and finally, conclusions.

1.2 Heterogeneous Lists
-----------------------

Lisp is a powerful language whose primary data structure is a cons
cell.  Using Winston's box notation for depicting list structures, figures
1 and 2 show the representation of two lists.
The left and right sides of each box are called the "car" and
"cdr".  The car contains a pointer to some data object, often a
sublist.  The cdr contains a pointer to the next cons cell in the
list.  A list is terminated by the special symbol nil (depicted as a
slash ('/') in figures 1 and 2.)  Lisp lists are said to be heterogeneous
because the car of each cell can be of any data type.  (Footnote: the cdr
can be of any type as well, but this point will not be addressed in this
paper.)  Heterogeneous lists can be easily developed in any
object-oriented programming language which supports inheritance and
polymorphism.

                 +-------+    +-------+    +-------+
                 | * | * |--->| * | * |--->| * | / |
                 +-------+    +-------+    +-------+
                   |            |            |
                   v            v            v
                   A            B            C

         Figure 1.  Box notation diagram of the list (A B C).

               +-------+   +-------+   +-------+
               | * | * |-->| * | * |-->| * | / |
               +-------+   +-------+   +-------+
                 |           |           |
                 v           v           v
               +-------+     B         +-------+
               | * | / |               | * | / |
               +-------+               +-------+
                 |                       |
                 v                       v
                 A                     +-------+   +-------+
                                       | * | * |-->| * | / |
                                       +-------+   +-------+
                                         |           |
                                         v           v
                                         C           D

   Figure 2.  Box notation diagram of the list ((A) B ((C D))).
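
As a rough illustration of how inheritance and polymorphism give
heterogeneous lists, the following sketch builds the list of figure 2.
The names Node, Sym, Cell, and figure2 are hypothetical and are not part
of Lily; dotted lists are ignored, so the cdr is always another cell or
a null pointer standing in for nil.

```cpp
#include <cassert>
#include <string>

// Hypothetical names throughout; this is not Lily's actual code.
struct Node {                        // common base for every list element
    virtual ~Node() {}
    virtual std::string print() const = 0;
};

struct Sym : Node {                  // an atom such as A, B, or C
    std::string name;
    explicit Sym(const std::string &n) : name(n) {}
    std::string print() const { return name; }
};

struct Cell : Node {                 // a cons cell
    const Node *car;                 // any Node: an atom or a sublist
    const Cell *cdr;                 // rest of the list; null plays nil
    Cell(const Node *a, const Cell *d) : car(a), cdr(d) {}
    std::string print() const {
        std::string out = "(";
        for (const Cell *c = this; c != 0; c = c->cdr) {
            out += c->car->print();  // virtual call: atom or sublist
            if (c->cdr != 0) out += " ";
        }
        return out + ")";
    }
};

// Builds ((A) B ((C D))) from figure 2 and returns its printed form.
std::string figure2() {
    Sym a("A"), b("B"), c("C"), d("D");
    Cell cd2(&d, 0), cd1(&c, &cd2);  // (C D)
    Cell wrap(&cd1, 0);              // ((C D))
    Cell la(&a, 0);                  // (A)
    Cell third(&wrap, 0), second(&b, &third), first(&la, &second);
    return first.print();
}
```

Because the car is typed as a pointer to the common base, the same cell
type holds atoms and sublists alike.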


1.3 Related Work
----------------

The Free Software Foundation's libg++ class library (Lea 89) contains
a parameterized list class written by Doug Lea (dl@rocky.oswego.edu).
Here is an excerpt from the Gnu Emacs Info node on Libg++ Lists:

    "Files `g++-include/List.hP' and `g++ include/List.ccP' provide
    pseudo-generic Lisp-type List classes.  These lists are homogeneous
    lists, more similar to lists in syntactically typed functional
    languages like ML than Lisp, but support operations very similar to
    those found in Lisp.  Any particular kind of list class may be
    generated via the `genclass' shell command.  However, the
    implementation assumes that the base class supports an equality
    operator `=='.  All equality tests use the `==' operator, and are thus
    equivalent to the use of `equal', not `eq' in Lisp.

    All list nodes are created dynamically, and managed via reference
    counts. `List' variables are actually pointers to these list nodes.
    Lists may also be traversed via Pixes.

    Supported operations are mirrored closely after those in Lisp.
    Generally, operations with functional forms are constructive,
    functional operations, while member forms (often with the same name)
    are sometimes procedural, possibly destructive operations.

    As with Lisp, destructive operations are supported.  Programmers are
    allowed to change head and tail fields in any fashion, creating
    circular structures and the like. However, again as with Lisp, some
    operations implicitly assume that they are operating on pure lists,
    and may enter infinite loops when presented with improper lists. Also,
    the reference-counting storage management facility may fail to
    reclaim unused circularly-linked nodes.

    Several Lisp-like higher order functions are supported (e.g., `map').
    Typedef declarations for the required functional forms are provided
    in the `.h' file."

Another related work is Hu's book "C/C++ for Expert Systems" (Hu 89).
Hu's Lisp utilities, like most of the utilities described in the book,
were written in C, not C++.  Hu's C++ examples give a simple
illustration of object-oriented programming in C++ and do not address
any artificial intelligence problems.

The use of a reference counting garbage collection mechanism combined with
temporary objects has been described by Bruce Eckel in "Who's Minding the
Store? Reference Counting in C++", Computer Language, May 1992.

2. Inside Lily
--------------

2.1 Lily Class Hierarchy
------------------------

Lily currently supports 4 basic data types: symbols, functions,
numbers, and lists.  These data types are represented as C++ classes
derived from a single base class as shown in figure 3.  The LObject
(Lily Object) class encapsulates a Base object.  Users of the Lily
library work only with instances of the LObject class.  All the other
classes shown in the hierarchy are internal to Lily.  The LObject and
Base classes work together to provide a transparent garbage collection
mechanism -- more on this later.  The Base class takes care of
reference counting and error handling.  The Cons class provides
heterogeneous lists.  The empty list, nil, is an instance of class
Null which is derived from the Cons class.  The Real and Integer
classes provide floating point and integer numbers.  The Symbol class
provides strings.  (In Lisp symbols serve as both data and variables
whereas in Lily symbols serve as data and LObjects are variables.)
Logical true, t, is an instance of the class special_Symbol which is
derived from class Symbol.  Function objects are specialized Symbols with an
additional data member for a pointer to a function.

                        - - -LObject- - -
                        |     Base      |
                        - - - -|- - - - -
                               |
                  ----------------------------
                  |        |        |        |
                 Cons     Real   Integer   Symbol
                  |                          |
                 Null                -----------------
                                     |               |
                               special_Symbol     Function

                     Figure 3.  Lily Class Hierarchy.


In the remainder of this section we will explore all the internal Lily
classes.  The next section will describe how the LObject class works.

2.2 Base Class
--------------

The Base class defines virtual functions for all the possible Lily (Lisp)
functions:

  class Base {
  protected:
    short   refs;       // number of references to this object
  public:
    Base()                                  { refs = 0; }
    virtual ~Base()                         { }
    virtual operator    int();  // conversion for logical expressions
    virtual Base &      Atom()              { return *t.value; }
    virtual Base &      Car()               { DE; return *nil.value; }
    virtual Base &      Cdr()               { DE; return *nil.value; }
    virtual Base &      Copy()              { DE; return *nil.value; }
    virtual Base &      Copy_list()         { DE; return *nil.value; }
    ...
  };

Where DE is #defined as:

  #define DE cerr << "ERROR: " << __FILE__ << " line: " << __LINE__ << "\n"

Thus, if a derived class does not override one of the virtual functions, the
Base class ensures that an error message is issued.  For example, the
Cons class provides the Car() member function but the Integer class does not.
The DE macro tells the programmer the file and line number where an error
occurred.  (It does not, unfortunately, tell the programmer which function
failed.  It would be a simple matter to add a 'function' argument to the
DE macro which would be used to tell the programmer which function failed.)
Each class derived from Base overrides the virtual functions as necessary and
adds any necessary data members.
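
One way the 'function' argument mentioned above might look is sketched
below.  The macro name DE_FN and the helper report_car_failure are
hypothetical; the macro also takes the output stream as a parameter,
which is an assumption made here so that the message can be inspected.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical variant of DE that also names the failing function.
// The stream is a parameter so callers can pass std::cerr or a buffer.
#define DE_FN(stream, function)                                 \
    ((stream) << "ERROR: " << (function) << " in " << __FILE__  \
              << " line: " << __LINE__ << "\n")

// Captures the message that a failing Car() would produce.
std::string report_car_failure() {
    std::ostringstream log;
    DE_FN(log, "Car");
    return log.str();
}
```

In a default virtual function the call would read DE_FN(std::cerr, "Car").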

The operator int() member function is used for evaluating logical
expressions.  In Lisp, logical false is represented by nil; anything else
implies logical true.  Lily works the same way.  When a Lily object appears
in a logical expression, the operator int() member function returns 1 unless
the object is nil, in which case 0 is returned.

2.3 The Integer and Real Classes
--------------------------------

Here is the Integer class:

  class Integer : public Base {
    int value;
  public:
    Integer(int i);
    ~Integer();
    void    Decr(Base &a)           { value -= a.Integer_value(); }
    void    Incr(Base &a)           { value += a.Integer_value(); }
    int     Integer_value();
    Base &  Numberp()               { return *t.value; }
    ostream & Print(ostream &s = cout);
    Base &  Equal(Base &a);
    LObject_type Type();
    Base &  Typep(LObject_type);
    operator int()                  { return value; }
    // Non-Lisp utilities
    Base &  Copy();
  };

The Integer class adds an int data member.  Most of the Integer member
functions are straightforward.  

The Integer class is the only class which complicates logical expressions.
If an Integer object appears in an expression we don't know if the expression
is a logical expression or possibly a mathematical expression.  In the
case of a logical expression we want an Integer to behave like all other
Lily objects; that is, return 1 to indicate it's not nil.  In the 
case of a mathematical expression we want the Integer to evaluate to its value.

To be consistent, the Integer::operator int() member function provides
the semantics of logical true/false.  To get the actual int value of the
object you must invoke the Integer_value() method.
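
The distinction between an Integer's truth value and its numeric value
can be sketched as follows.  This is a miniature written for this guide,
following the prose description above; the class names Obj, Nil, and
Int and the function count_true are hypothetical.

```cpp
#include <cassert>

// Hypothetical miniature of the scheme; not Lily's actual classes.
class Obj {
public:
    virtual ~Obj() {}
    virtual operator int() const { return 1; }       // objects are true
    virtual int integer_value() const { return 0; }  // real code would
                                                     // report an error
};

class Nil : public Obj {
public:
    operator int() const { return 0; }               // only nil is false
};

class Int : public Obj {
    int value;
public:
    explicit Int(int i) : value(i) {}
    operator int() const { return 1; }               // truth, not number
    int integer_value() const { return value; }      // the number itself
};

// Counts how many of the two objects are logically true.
int count_true(const Obj &a, const Obj &b) {
    int n = 0;
    if (a) ++n;   // uses operator int()
    if (b) ++n;
    return n;
}
```

With this arrangement an integer holding zero is still logically true,
as it would be in Lisp, and arithmetic must go through integer_value().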

2.4 The Symbol Classes
----------------------

-------------------------------------------------------------------------------

                  READ THIS !!!

EVERYTHING FROM THIS POINT ON IS CHICKEN-SCRATCH; READ AT YOUR OWN FRUSTRATION
-------------------------------------------------------------------------------

class Symbol : public Base {
protected:
    char    *value;
public:
    Symbol(char *s);
    ~Symbol();

    Base &      Atom();
    Base &      Copy();
    Base &      Equal(Base &a);
    ostream&    Print(ostream& s=cout);
//  Base &      Print();
    char *      Symbol_name();
    LObject_type Type();
    Base &      Typep(LObject_type);
};

class special_Symbol : public Symbol {  // to make special Symbol t
public:
    special_Symbol(char *s);
    Base *  Ref();
    void    Deref();
};

Since the cons cell is the most commonly used data structure in Lisp, it
follows that the Cons class overrides most of the virtual functions:
from reference maintenance:

 class LObject {
  Base *value;
  // ...
 };
----------

The signature of each Lily function is:

	LObject foo(LObject &, ...);

where ... means 'however many arguments foo requires', not variable
arguments.  Later we shall see that this particular signature is
responsible for reclaiming memory immediately after it becomes
garbage.

3.1 Constructing LObjects
-------------------------

Programmers using the Lily class library manipulate Lisp data structures
through instances of the class LObject.  It is often the case that the LObject
class can provide convenience constructors for creating Lisp objects.
For example:

  class LObject {
    ...
  public:
    LObject(Base &);
    ...
  };

Objects can be created by instantiating one of the Lisp object types and
passing the Lisp object to the LObject class constructor:

	LObject a_symbol(*new Symbol("foo"));

  class LObject {
    ...
  public:
    LObject();
    LObject(LObject &);
    LObject(Base &);
    LObject(char *);
    LObject(int);
    LObject(float);
    // ...
  };

  LObject::LObject(int i) {
    value = new Integer(i);
    value->Ref();
  }

  LObject::LObject(char *s) {
    value = new Symbol(s);
    value->Ref();
  }

These convenience constructors provide a certain amount of abstraction
to the coding effort.  Similar to Lisp, the type of the data object is
determined implicitly, permitting programmers to write:

  LObject i(20);       // creates an Integer
  LObject sym("foo");  // creates a Symbol

It is not necessary to use a convenience constructor.  The LObject class
provides the constructor:

  LObject::LObject(Base &a) { value = a.Ref(); }

The user could use this constructor to achieve the
same result as before:

LObject i(*new Integer(20));
LObject sym(*new Symbol("foo"));
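
The two construction styles can be compared in a small self-contained
sketch.  The names Var, Data, IntData, SymData, and Kind are
hypothetical stand-ins for LObject, Base, Integer, Symbol, and
LObject_type.  Reference counting is left out here, so the handle
simply owns its object and cannot be copied; that is a simplification,
not how Lily manages memory.

```cpp
#include <cassert>
#include <string>

enum Kind { KIND_INTEGER, KIND_SYMBOL };

struct Data {                                   // stands in for Base
    virtual ~Data() {}
    virtual Kind kind() const = 0;
};
struct IntData : Data {                         // stands in for Integer
    int value;
    explicit IntData(int i) : value(i) {}
    Kind kind() const { return KIND_INTEGER; }
};
struct SymData : Data {                         // stands in for Symbol
    std::string name;
    explicit SymData(const char *s) : name(s) {}
    Kind kind() const { return KIND_SYMBOL; }
};

class Var {                                     // stands in for LObject
    Data *value;
    Var(const Var &);                           // copying omitted to keep
    Var &operator=(const Var &);                // the sketch short
public:
    Var(Data &d) : value(&d) {}                 // general form, owns d
    Var(int i) : value(new IntData(i)) {}       // int becomes an Integer
    Var(const char *s) : value(new SymData(s)) {}  // text becomes a Symbol
    ~Var() { delete value; }
    Kind kind() const { return value->kind(); }
};
```

Writing Var i(20) and Var j(*new IntData(20)) yields objects of the
same kind; the first form merely lets the argument type pick the class.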

In most cases the programmer can take advantage of the convenience
constructors.  Sometimes, though, the constructors for the Lisp data
classes have identical arguments.  For example, if a String class were
added, its constructor, String::String(char *), would conflict with
the constructor for class Symbol, Symbol::Symbol(char *), because
LObject(char *) can create only one of the two types.  Thus no
convenience constructor for Strings could be provided and the user
must tediously write:

LObject str(*new String("mystring"));

Interface Functions
-------------------

In addition to convenience constructors, a set of functions is used to
interface between users' code and Lily's virtual functions (described
in the next section).  An example of an interface function is:

LObject assoc(LObject &item, LObject &a_list) {
 return a_list.value->Assoc(*item.value);
}

Interface functions provide a layer of abstraction on top of the
virtual functions.  Without them, a statement like:

LObject ans = assoc(item, alist);

would have to be written as:

LObject ans =
 alist.value->Assoc(*item.value);

The former style is quicker to write and easier to comprehend than the
latter and protects the user from the internal representation.  In
fact, the latter style is not permitted because, as seen in Listing 1,
the value data member of class LObject is private, accessible only by
friends (all interface functions are friends of class LObject).  Since
interface functions are declared inline, there is no runtime overhead
incurred by their use.
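
The friend arrangement can be sketched in isolation.  The names Impl,
Triple, Ref, and length below are hypothetical; Ref plays the part of
LObject, Impl the part of Base, and length is an interface function
forwarding to a virtual Length().

```cpp
#include <cassert>

class Impl {                             // stands in for Base
public:
    virtual ~Impl() {}
    virtual int Length() const { return 0; }   // default for non-lists
};

class Triple : public Impl {             // stands in for a 3-element list
public:
    int Length() const { return 3; }
};

class Ref;
inline int length(const Ref &r);         // the interface function

class Ref {                              // stands in for LObject
    Impl *value;                         // private: user code cannot
                                         // write r.value->Length()
    friend int length(const Ref &r);     // interface functions are friends
public:
    explicit Ref(Impl &i) : value(&i) {}
};

// Forwards to the virtual function while hiding the representation.
inline int length(const Ref &r) { return r.value->Length(); }
```

User code calls length(a) and never mentions the value member.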

Virtual Functions and Error Handling
------------------------------------

Many Lisp operations are illegal for some argument types.  For example:

(car 'a) ; error

is an error because the argument to car must be a list.  Inheritance
can be used to implicitly perform type-checking of arguments.  The
base class provides virtual functions whose default behavior is to
invoke an error handler.  Derived classes override the virtual methods
when appropriate.  For example, Base::Car() is defined as:

Base & Base::Car()
 { DEexit; return *nil.value; }

where DEexit is a macro which prints the file name and line number and
exits the program.  (A friendlier error handler would be required for
interactive programs such as an interpreter, which must abort the current
operation but continue with the program.)  There is no Symbol::Car()
method to override Base::Car(), so the following will invoke
Base::Car():

LObject foo(*new Symbol("a"));
LObject bar = car(foo); // error

However, a Cons object will handle Car() messages with:

obj Cons::Car() { return _car; }
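
The following sketch shows the same dispatch with a handler that does
not exit, as an interpreter would need.  It uses C++ exceptions, which
is an assumption of this sketch rather than something Lily does; the
names Thing, Atom, Pair, and car_is_legal are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>

class Thing {                                // stands in for Base
public:
    virtual ~Thing() {}
    virtual Thing &Car() {                   // default: illegal operation
        throw std::logic_error("car: argument is not a list");
    }
};

class Atom : public Thing {};                // no Car(): inherits the error

class Pair : public Thing {                  // stands in for Cons
    Thing &first;
public:
    explicit Pair(Thing &f) : first(f) {}
    Thing &Car() { return first; }           // legal for lists
};

// Returns true when Car() succeeds on x, false when the default fires.
bool car_is_legal(Thing &x) {
    try {
        x.Car();
        return true;
    } catch (const std::logic_error &) {
        return false;                        // caller may carry on
    }
}
```

The type check is implicit: no code inspects a type tag, and the
virtual call alone decides whether the operation is permitted.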

Immediate Garbage Collection
----------------------------

One of Lisp's most convenient features is automatic garbage collection (Allen
78).  Consider the following Lisp code:

    (setq baz '(a b))
    (setq baz (cdr baz))

When baz is assigned the cdr of itself, the car, 'a', is left
unreferenced.  The Lisp garbage collector will recognize this and
reclaim the memory so it can be reused.  Later we will examine how Lily
achieves automatic garbage collection.
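
As a preview, a minimal sketch of reference counting through a handle
class is given below.  It assumes a design along the lines of the
Ref()/Deref() members shown earlier; the names Counted, Handle,
live_objects, and live_after_rebinding are hypothetical, and the
counter exists only to make the reclamation observable.

```cpp
#include <cassert>

static int live_objects = 0;             // instrumentation for the sketch

class Counted {                          // stands in for Base
    int refs;
public:
    Counted() : refs(0) { ++live_objects; }
    virtual ~Counted() { --live_objects; }
    Counted *Ref() { ++refs; return this; }
    void Deref() { if (--refs == 0) delete this; }
};

class Handle {                           // stands in for LObject
    Counted *value;
public:
    explicit Handle(Counted &c) : value(c.Ref()) {}
    Handle(const Handle &h) : value(h.value->Ref()) {}
    Handle &operator=(const Handle &h) {
        Counted *incoming = h.value->Ref();  // Ref first, so that
        value->Deref();                      // self-assignment is safe
        value = incoming;
        return *this;
    }
    ~Handle() { value->Deref(); }
};

// Rebinding baz drops the last reference to its old object, which is
// reclaimed at once, much like (setq baz (cdr baz)) above.
int live_after_rebinding() {
    Handle baz(*new Counted);
    Handle rest(*new Counted);
    baz = rest;                  // old object of baz is deleted here
    return live_objects;         // the object shared by baz and rest
}
```

When the handles go out of scope their destructors release the
remaining object, so no explicit delete appears in user code.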


Why enclose pointers to Lily objects in the LObject class?  Object-oriented
software usually involves manipulation of objects directly.

Variable Arguments
------------------

t, nil


Winston's Matching Program
--------------------------

ljdf

Availability
------------

Lily has only been tested with GNU's g++ (REF!)  compiler (version
1.37) at this time.  An attempt to compile with AT&T's C++ 2.0 failed
because it couldn't handle temporary objects in certain expressions.

GNU's Bison (REF!) and Flex (REF!) programs are used to parse input
for the read() function.  The corresponding yacc and lex programs in
UNIX can be used as well, but require a little hacking since they
don't generate ANSI C as Bison and Flex do.


Future Work
-----------

Disadvantages: - numbers are class objects -- Lily could be modified
to support ints and floats, but this would contaminate the design.

Acknowledgements
----------------

Carol Whitney, Stuart Weinstein, FSF, Stroustrup

Conclusions
-----------


References
----------

Allen, J. (1978). Anatomy of Lisp. New York, NY: McGraw-Hill.

Booch, G. (1983). Software Engineering with Ada.  Menlo Park, CA:
Benjamin/Cummings.

Ellis, M., & Stroustrup, B. (1990). The Annotated C++ Reference
Manual. Reading, MA: Addison-Wesley.

Goldberg, A., & Robson, D. (1983). Smalltalk-80: The Language and Its
Implementation. Reading, MA: Addison-Wesley.

Holden, T. (1990). C++: The Language Ada Was Supposed to Be?, The C++
Report, Volume 2, Number 4, April 1990.

Hu, D. (1989). C/C++ for Expert Systems. Portland, OR: MIS Press.

Jordan, D. (1990). Implementation Benefits of C++ Language Mechanisms,
Communications of the ACM, Volume 33, Number 9, September 1990.

Lea, D. (dl@rocky.oswego.edu) (1990). Libg++ List Class Description,
an online help document accessed via the GNU Emacs Info utility, File:
libg++, Node: List. Free Software Foundation, Inc., 675 Mass Ave,
Cambridge, MA 02139, USA.

Meyer, B. (1989). Eiffel: The Language. Tech. Rep.  TR-E1-17/RM,
Interactive Software Engineering Inc., Santa Barbara, CA.

Stallman, R. (1987). GNU Emacs Manual, Sixth Edition, Version 18. Free
Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139.

Tiemann, M. (1990). User's Guide to GNU C++ for version 1.37.1. Free
Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139.

Stallman, R., Tiemann, M., Lea, D., et al. (1987). g++, Version 18. Free
Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139.

Steele, G. (1984). Common Lisp: The Language.  Burlington, MA: Digital
Press.

Winston, P., & Horn, B. (1984). Lisp Second Edition. Reading, MA:
Addison-Wesley.




Listing 1
Comparison of Lisp to C++.

typedef struct _cons {  /* a cons-cell */
   union {
     struct _cons *p;
     char         *s;
   } car;
   struct _cons  *cdr;  /*usually points to a sublist */
   unsigned char type;  /* the types of the pointers in the cells */
} cons;

#define CAR_STRING 1
#define CAR_INTEGER 2
#define CAR_LIST  4
#define CDR_STRING 8
#define CDR_INTEGER 16
#define CDR_LIST  32
...
killcons(p)
cons *p;
{
  if (p != NULL) {
    if ((p->type & CAR_LIST) == CAR_LIST) {
      killcons(p->car.p);
      killcons(p->cdr);
    } else if ((p->type & CAR_STRING) == CAR_STRING) {
      free(p->car.s);  /* get rid of the string */
    }
    free(p);
  }
}

Listing 3.
Lily's Base and Interface Classes

  The box-and-arrow notation in figure 1 is taken from (Winston 84).
  In this paper we ignore dotted lists, which are lists terminated by a
non-nil atom.
  Of course, the LObject::LObject(char *) constructor could be changed to
create Strings rather than Symbols.

Summary
-------

Eventually, Lily will support many other Lisp data types including
arrays, strings, complex and float numbers, bit-vectors, etc.
Additional data types can easily be added by deriving from class Base.
Note however that, aside from data types, many of Lisp's features,
such as packages and macros, are not planned to be supported.  The
goal is to provide list processing capabilities for C++ programmers,
not build another Lisp.
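
To suggest how little is needed to add a data type, here is a sketch of
a String class derived from a reduced stand-in for Base.  The Stringp()
predicate, the "#<object>" default output, and the printed() helper are
hypothetical and are not taken from Lily.

```cpp
#include <cassert>
#include <sstream>
#include <string>

class Base {                                 // reduced stand-in for Base
public:
    virtual ~Base() {}
    virtual bool Stringp() const { return false; }   // default answer
    virtual std::ostream &Print(std::ostream &s) const {
        return s << "#<object>";
    }
};

class String : public Base {                 // hypothetical new data type
    std::string text;
public:
    explicit String(const std::string &t) : text(t) {}
    bool Stringp() const { return true; }    // override only what differs
    std::ostream &Print(std::ostream &s) const {
        return s << '"' << text << '"';
    }
};

// Prints any object through the common interface.
std::string printed(const Base &b) {
    std::ostringstream out;
    b.Print(out);
    return out.str();
}
```

Every operation the new class does not override keeps the default
behavior inherited from the base class.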
