Common Lisp the Language, 2nd Edition


22.1.5. The Readtable

Previous sections describe the standard syntax accepted by the read function. This section discusses the advanced topic of altering the standard syntax either to provide extended syntax for Lisp objects or to aid the writing of other parsers.

There is a data structure called the readtable that is used to control the reader. It contains information about the syntax of each character equivalent to that in table 22-1. It is set up exactly as in table 22-1 to give the standard Common Lisp meanings to all the characters, but the user can change the meanings of characters to alter and customize the syntax of characters. It is also possible to have several readtables describing different syntaxes and to switch from one to another by binding the variable *readtable*.

Even if an implementation supports characters with non-zero bits and font attributes, it need not (but may) allow for such characters to have syntax descriptions in the readtable. However, every character of type string-char must be represented in the readtable.

X3J13 voted in March 1989 (CHARACTER-PROPOSAL) to remove the type string-char and to replace the bits and font attributes with the notion of implementation-defined attributes. If any implementation-defined attributes are supported, an implementation may (but need not) allow for such characters to have syntax descriptions in the readtable. Characters that do not have non-standard values for any implementation-defined attribute must be represented in the readtable.


The value of *readtable* is the current readtable. The initial value of this is a readtable set up for standard Common Lisp syntax. You can bind this variable to temporarily change the readtable being used.
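As a minimal sketch of such a temporary binding, the helper below reads one object from a string with a fresh standard readtable in effect and leaves the caller's readtable untouched. The name read-with-standard-syntax is hypothetical, chosen only for this illustration.

```lisp
;; Hypothetical helper: read one object from STRING using standard syntax,
;; whatever the current readtable happens to be.
(defun read-with-standard-syntax (string)
  (let ((*readtable* (copy-readtable nil)))   ; binding lasts only for this LET
    (read-from-string string)))

;; Example call; the global value of *readtable* is the same before and after.
(read-with-standard-syntax "(a b c)")
```

Because the binding is dynamic, any function called inside the let (here read-from-string) sees the temporary readtable, and the previous value is restored on exit, even on a non-local exit.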

To program the reader for a different syntax, a set of functions is provided for manipulating readtables. Normally, you should begin with a copy of the standard Common Lisp readtable and then customize the individual characters within that copy.

copy-readtable &optional from-readtable to-readtable

A copy is made of from-readtable, which defaults to the current readtable (the value of the global variable *readtable*). If from-readtable is nil, then a copy of a standard Common Lisp readtable is made. For example,

(setq *readtable* (copy-readtable nil))

will restore the input syntax to standard Common Lisp syntax, even if the original readtable has been clobbered (assuming it is not so badly clobbered that you cannot type in the above expression!). On the other hand,

(setq *readtable* (copy-readtable))

will merely replace the current readtable with a copy of itself.

If to-readtable is unsupplied or nil, a fresh copy is made. Otherwise, to-readtable must be a readtable, which is destructively copied into.
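The two modes of copy-readtable can be sketched as follows. The function names copy-is-independent-p and reset-to-standard are hypothetical, and the sketch assumes only that ! has no macro definition in the standard readtable.

```lisp
;; A fresh copy can be customized without disturbing its source.
(defun copy-is-independent-p ()
  (let* ((original (copy-readtable nil))
         (copy (copy-readtable original)))
    ;; Make ! a macro character in COPY only.
    (set-macro-character #\!
                         (lambda (stream char)
                           (declare (ignore stream char))
                           :bang)
                         nil copy)
    (and (get-macro-character #\! copy)               ; defined in the copy
         (null (get-macro-character #\! original))))) ; absent from the source

;; Supplying TO-READTABLE overwrites an existing readtable in place
;; and returns that same readtable.
(defun reset-to-standard (readtable)
  (copy-readtable nil readtable))
```

The destructive form is useful when other code already holds a reference to the readtable object and should see the restored syntax.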

readtablep object

readtablep is true if its argument is a readtable, and otherwise is false.

(readtablep x) == (typep x 'readtable)

set-syntax-from-char to-char from-char &optional to-readtable from-readtable

This makes the syntax of to-char in to-readtable be the same as the syntax of from-char in from-readtable. The to-readtable defaults to the current readtable (the value of the global variable *readtable*), and from-readtable defaults to nil, meaning to use the syntaxes from the standard Lisp readtable.

X3J13 voted in January 1989 (ARGUMENTS-UNDERSPECIFIED) to clarify that the to-char and from-char must each be a character.

Only attributes as shown in table 22-1 are copied; moreover, if a macro character is copied, the macro definition function is copied also. However, attributes as shown in table 22-3 are not copied; they are "hard-wired" into the extended-token parser. For example, if the definition of S is copied to *, then * will become a constituent that is alphabetic but cannot be used as an exponent indicator for short-format floating-point number syntax.

It works to copy a macro definition from a character such as " to another character; the standard definition for " looks for another character that is the same as the character that invoked it. It doesn't work to copy the definition of ( to {, for example; it can be done, but it lets one write lists in the form {a b c), not {a b c}, because the definition always looks for a closing parenthesis, not a closing brace. See the function read-delimited-list, which is useful in this connection.
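One way to obtain properly matched braces, sketched here under the assumption that a plain list is the desired result, is to give { its own macro function built on read-delimited-list and to copy only the syntax of ) to }. The name make-brace-readtable is hypothetical.

```lisp
(defun make-brace-readtable ()
  (let ((rt (copy-readtable nil)))
    ;; { reads objects up to the matching } and returns them as a list.
    (set-macro-character #\{
                         (lambda (stream char)
                           (declare (ignore char))
                           (read-delimited-list #\} stream t))
                         nil rt)
    ;; } takes the syntax of ), so it terminates a preceding token
    ;; instead of being absorbed into it.
    (set-syntax-from-char #\} #\) rt)
    rt))

;; Example: read a brace-delimited list with the customized readtable.
(let ((*readtable* (make-brace-readtable)))
  (read-from-string "{a b c}"))
```

Copying the syntax of ) to } is safe here because } is never asked to find its own partner; it only needs to be a terminating macro character so that read-delimited-list sees it as a separate token.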

X3J13 voted in January 1989 (RETURN-VALUES-UNSPECIFIED) to specify that the set-syntax-from-char function returns t.


set-macro-character char function &optional non-terminating-p readtable
get-macro-character char &optional readtable

set-macro-character causes char to be a macro character that when seen by read causes function to be called. If non-terminating-p is not nil (it defaults to nil), then it will be a non-terminating macro character: it may be embedded within extended tokens. set-macro-character returns t.

get-macro-character returns the function associated with char and, as a second value, returns the non-terminating-p flag; it returns nil if char does not have macro-character syntax. In each case, readtable defaults to the current readtable.
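The two return values can be inspected as in the following sketch. The function describe-macro-character and the keywords it returns are hypothetical, introduced only to show how the values are interpreted.

```lisp
(defun describe-macro-character (char &optional (readtable *readtable*))
  (multiple-value-bind (function non-terminating-p)
      (get-macro-character char readtable)
    (cond ((null function) :not-a-macro-character) ; no macro syntax at all
          (non-terminating-p :non-terminating)     ; may appear inside tokens
          (t :terminating))))                      ; always ends a token
```

In a readtable with standard syntax, the single quote is a terminating macro character, # is a non-terminating one, and a letter such as a is not a macro character.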

X3J13 voted in January 1989 (GET-MACRO-CHARACTER-READTABLE) to specify that if nil is explicitly passed as the second argument to get-macro-character, then the standard readtable is used. This is consistent with the behavior of copy-readtable.

The function is called with two arguments, stream and char. The stream is the input stream, and char is the macro character itself. In the simplest case, function may return a Lisp object. This object is taken to be that whose printed representation was the macro character and any following characters read by the function. As an example, a plausible definition of the standard single quote character is:

(defun single-quote-reader (stream char) 
  (declare (ignore char)) 
  (list 'quote (read stream t nil t))) 

(set-macro-character #\' #'single-quote-reader)

(Note that t is specified for the recursive-p argument to read; see section 22.2.1.) The function reads an object following the single-quote and returns a list of the symbol quote and that object. The char argument is ignored.

The function may choose instead to return zero values (for example, by using (values) as the return expression). In this case, the macro character and whatever it may have read contribute nothing to the object being read. As an example, here is a plausible definition for the standard semicolon (comment) character:

(defun semicolon-reader (stream char) 
  (declare (ignore char)) 
  ;; First swallow the rest of the current input line. 
  ;; End-of-file is acceptable for terminating the comment. 
  (do () ((char= (read-char stream nil #\Newline t) #\Newline))) 
  ;; Return zero values. 
  (values))

(set-macro-character #\; #'semicolon-reader)

(Note that t is specified for the recursive-p argument to read-char; see section 22.2.1.)
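To see the effect of returning zero values inside a larger expression, consider a sketch in which ! discards the object that follows it. The readtable constructor make-skip-readtable and the choice of ! are hypothetical.

```lisp
(defun make-skip-readtable ()
  (let ((rt (copy-readtable nil)))
    (set-macro-character #\!
                         (lambda (stream char)
                           (declare (ignore char))
                           (read stream t nil t)   ; consume the next object
                           (values))               ; and contribute nothing
                         nil rt)
    rt))

;; Example: the object after ! does not appear in the list that is read.
(let ((*readtable* (make-skip-readtable)))
  (read-from-string "(a !b c)"))
```

The enclosing list reader simply continues with the next object, so the discarded form leaves no trace in the result.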

The function should not have any side effects other than on the stream. Because of backtracking and restarting of the read operation, front ends (such as editors and rubout handlers) to the reader may cause function to be called repeatedly during the reading of a single expression in which the macro character only appears once.

Compatibility note: The ability to return either zero or one value is the closest Common Lisp macro characters come to the splicing macro characters of MacLisp or the splice macro characters of Interlisp. The Common Lisp definition does not allow the splicing of arbitrarily many values, but it does allow a macro-character function to decide after it is invoked whether or not to yield a value, an option not possible in MacLisp or Interlisp.

MacLisp has nothing equivalent to non-terminating macro characters. The Interlisp equivalents of terminating and non-terminating macro characters are macro characters with the ALWAYS or FIRST option, respectively. Common Lisp has nothing equivalent to the Interlisp ALONE macro-character option.

Here is an example of a more elaborate set of read-macro characters that I used in the implementation of the original simulator for Connection Machine Lisp [44,57], a parallel dialect of Common Lisp. This simulator was used to gain experience with the language before freezing its design for full-scale implementation on a Connection Machine computer system. This example illustrates the typical manner in which a language designer can embed a new language within the syntactic and semantic framework of Lisp, saving the effort of designing an implementation from scratch.

Connection Machine Lisp introduces a new data type called a xapping, which is simply an unordered set of ordered pairs of Lisp objects. The first element of each pair is called the index and the second element the value. We say that the xapping maps each index to its corresponding value. No two pairs of the same xapping may have the same (that is, eql) index. Xappings may be finite or infinite sets of pairs; only certain kinds of infinite xappings are required, and special representations are used for them.

A finite xapping is notated by writing the pairs between braces, separated by whitespace. A pair is notated by writing the index and the value, separated by a right arrow (or an exclamation point if the host Common Lisp has no right-arrow character).

Remark: The original language design used the right arrow; the exclamation point was chosen to replace it on ASCII-only terminals because it is one of the six characters [ ] { } ! ? reserved by Common Lisp to the user.

While preparing the TeX manuscript for this book I made a mistake in font selection and discovered that by an absolutely incredible coincidence the right arrow has the same numerical code (octal 41) within TeX fonts as the ASCII exclamation point. The result was that although the manuscript called for right arrows, exclamation points came out in the printed copy. Imagine my astonishment!

Here is an example of a xapping that maps three symbols to strings:

{moe->"Oh, a wise guy, eh?" larry->"Hey, what's the idea?" 
  curly->"Nyuk, nyuk, nyuk!"}

For convenience there are certain abbreviated notations. If the index and value for a pair are the same object x, then instead of having to write "x->x" (or, worse yet, "#43=x->#43#") we may write simply x for the pair. If all pairs of a xapping are of this form, we call the xapping a xet. For example, the notation

{baseball chess cricket curling bocce 43-man-squamish}

is entirely equivalent in meaning to

{baseball->baseball curling->curling cricket->cricket 
 chess->chess bocce->bocce 43-man-squamish->43-man-squamish}

namely a xet of symbols naming six sports.

Another useful abbreviation covers the situation where the indices of the n pairs of a finite xapping are integers, collectively covering a range from zero to n-1. This kind of xapping is called a xector and may be notated by writing the values between brackets in ascending order of their indices. Thus

[tinker evers chance]

is merely an abbreviation for

{0->tinker 1->evers 2->chance}

There are two kinds of infinite xapping: constant and universal. A constant xapping {->z} maps every object to the same value z. The universal xapping {->} maps every object to itself and is therefore the xet of all Lisp objects, sometimes called simply the universe. Both kinds of infinite xapping may be modified by explicitly writing exceptions. One kind of exception is simply a pair, which specifies the value for a particular index; the other kind of exception is simply k-> followed by whitespace, indicating that the xapping does not have a pair with index k after all. Thus the notation

{sky->blue grass->green idea-> glass-> ->red}

indicates a xapping that maps sky to blue, grass to green, and every other object except idea and glass to red. Note well that the presence or absence of whitespace on either side of an arrow is crucial to the correct interpretation of the notation.

Here is the representation of a xapping as a structure:

(defstruct 
  (xapping (:print-function print-xapping) 
           (:constructor xap 
             (domain range &optional 
              (default ':unknown defaultp) 
              (infinite (and defaultp :constant)) 
              (exceptions '())))) 
  domain 
  range 
  (default ':unknown) 
  (infinite nil :type (member nil :constant :universal)) 
  (exceptions '()))

The explicit pairs are represented as two parallel lists, one of indexes (domain) and one of values (range). The default slot is the default value, relevant only if the infinite slot is :constant. The exceptions slot is a list of indices for which there are no values. (See the end of section 22.3.3 for the definition of print-xapping.)

Here, then, is the code for reading xectors in bracket notation:

(defun open-bracket-macro-char (stream macro-char) 
  (declare (ignore macro-char)) 
  (let ((range (read-delimited-list #\] stream t))) 
    (xap (iota-list (length range)) range))) 

(set-macro-character #\[ #'open-bracket-macro-char) 
(set-macro-character #\] (get-macro-character #\) )) 

(defun iota-list (n)     ;Return list of integers from 0 to n-1
  (do ((j (- n 1) (- j 1)) 
       (z '() (cons j z))) 
      ((< j 0) z)))

The code for reading xappings in the more general brace notation, with all the possibilities for xets (or individual xet pairs), infinite xappings, and exceptions, is a bit more complicated; it is shown in table 22-5. That code is used in conjunction with the initializations

(set-macro-character #\{ #'open-brace-macro-char) 
(set-macro-character #\} (get-macro-character #\) ))

Table 22-5: Macro Character Definition for Xapping Syntax

(defun open-brace-macro-char (s macro-char) 
  (declare (ignore macro-char)) 
  ;; #\! stands in for the right arrow (written -> in the text), 
  ;; as permitted on hosts that lack a right-arrow character. 
  (do ((ch (peek-char t s t nil t) (peek-char t s t nil t)) 
       (domain '())  (range '())  (exceptions '())) 
      ((char= ch #\}) 
       (read-char s t nil t) 
       (xap (reverse domain) (reverse range))) 
    (cond ((char= ch #\!) 
           (read-char s t nil t) 
           (let ((nextch (peek-char nil s t nil t))) 
             (cond ((char= nextch #\}) 
                    (read-char s t nil t) 
                    (return (xap (reverse domain) 
                                 (reverse range) 
                                 nil :universal exceptions))) 
                   (t (let ((item (read s t nil t))) 
                        (cond ((char= (peek-char t s t nil t) #\}) 
                               (read-char s t nil t) 
                               (return (xap (reverse domain) 
                                            (reverse range) 
                                            item :constant 
                                            exceptions))) 
                              (t (reader-error s 
                                   "Default -> item must be last")))))))) 
          (t (let ((item (read-preserving-whitespace s t nil t)) 
                   (nextch (peek-char nil s t nil t))) 
               (cond ((char= nextch #\!) 
                      (read-char s t nil t) 
                      (cond ((member (peek-char nil s t nil t) 
                                     '(#\Space #\Tab #\Newline)) 
                             (push item exceptions)) 
                            (t (push item domain) 
                               (push (read s t nil t) range)))) 
                     ((char= nextch #\}) 
                      (read-char s t nil t) 
                      (push item domain) 
                      (push item range) 
                      (return (xap (reverse domain) (reverse range)))) 
                     (t (push item domain) 
                        (push item range))))))))




make-dispatch-macro-character char &optional non-terminating-p readtable

This causes the character char to be a dispatching macro character in readtable (which defaults to the current readtable). If non-terminating-p is not nil (it defaults to nil), then it will be a non-terminating macro character: it may be embedded within extended tokens. make-dispatch-macro-character returns t.

Initially every character in the dispatch table has a character-macro function that signals an error. Use set-dispatch-macro-character to define entries in the dispatch table.
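The two-step setup can be sketched as follows, assuming a hypothetical dispatching character ! whose sub-character Q quotes the following object. Neither the name make-bang-readtable nor the !Q syntax is part of Common Lisp.

```lisp
(defun make-bang-readtable ()
  (let ((rt (copy-readtable nil)))
    ;; Step 1: turn ! into a dispatching macro character in RT.
    (make-dispatch-macro-character #\! nil rt)
    ;; Step 2: only now may entries be added to its dispatch table.
    (set-dispatch-macro-character
     #\! #\Q
     (lambda (stream sub-char arg)
       (declare (ignore sub-char arg))
       (list 'quote (read stream t nil t)))
     rt)
    rt))

;; Example: !Qfoo reads as (quote foo) under the customized readtable.
(let ((*readtable* (make-bang-readtable)))
  (read-from-string "!Qfoo"))
```

Every other sub-character of ! keeps the initial entry, so reading an undefined combination under this readtable signals an error.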

X3J13 voted in January 1989 (ARGUMENTS-UNDERSPECIFIED) to clarify that char must be a character.


set-dispatch-macro-character disp-char sub-char function &optional readtable
get-dispatch-macro-character disp-char sub-char &optional readtable

set-dispatch-macro-character causes function to be called when the disp-char followed by sub-char is read. The readtable defaults to the current readtable. The arguments and return values for function are the same as for normal macro characters except that function gets sub-char, not disp-char, as its second argument and also receives a third argument that is the non-negative integer whose decimal representation appeared between disp-char and sub-char, or nil if no decimal integer appeared there.

The sub-char may not be one of the ten decimal digits; they are always reserved for specifying an infix integer argument. Moreover, if sub-char is a lowercase character (see lower-case-p), its uppercase equivalent is used instead. (This is how the rule is enforced that the case of a dispatch sub-character doesn't matter.)

set-dispatch-macro-character returns t.

get-dispatch-macro-character returns the macro-character function for sub-char under disp-char, or nil if there is no function associated with sub-char.

If the sub-char is one of the ten decimal digits 0 1 2 3 4 5 6 7 8 9, get-dispatch-macro-character always returns nil. If sub-char is a lowercase character, its uppercase equivalent is used instead.

X3J13 voted in January 1989 (GET-MACRO-CHARACTER-READTABLE) to specify that if nil is explicitly passed as the readtable argument (the third argument) to get-dispatch-macro-character, then the standard readtable is used. This is consistent with the behavior of copy-readtable.

For either function, an error is signaled if the specified disp-char is not in fact a dispatch character in the specified readtable. It is necessary to use make-dispatch-macro-character to set up the dispatch character before specifying its sub-characters.

As an example, suppose one would like #$foo to be read as if it were (dollars foo). One might say:

(defun |#$-reader| (stream subchar arg) 
  (declare (ignore subchar arg)) 
  (list 'dollars (read stream t nil t))) 

(set-dispatch-macro-character #\# #\$ #'|#$-reader|)
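The example above ignores the infix numeric argument. A variant that uses it might look like the sketch below, which installs the definition in a copy of the standard readtable rather than in the current one. The three-element result (dollars foo n) and the name make-dollars-readtable are hypothetical extensions for illustration only.

```lisp
(defun make-dollars-readtable ()
  (let ((rt (copy-readtable nil)))
    (set-dispatch-macro-character
     #\# #\$
     (lambda (stream sub-char arg)
       (declare (ignore sub-char))
       ;; ARG is the integer written between # and $, or NIL if none.
       (list 'dollars (read stream t nil t) (or arg 1)))
     rt)
    rt))

;; Example: the digit between # and $ arrives as the third argument.
(let ((*readtable* (make-dollars-readtable)))
  (read-from-string "#3$foo"))
```

Here # is already a dispatching macro character in the standard readtable, so no call to make-dispatch-macro-character is needed before defining the sub-character.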

Compatibility note: This macro-character mechanism is different from those in MacLisp, Interlisp, and Lisp Machine Lisp. Recently Lisp systems have implemented very general readers, even readers so programmable that they can parse arbitrary compiled BNF grammars. Unfortunately, these readers can be complicated to use. This design is an attempt to make the reader as simple as possible to understand, use, and implement. Splicing macros have been eliminated; a recent informal poll indicates that no one uses them to produce other than zero or one value. The ability to access parts of the object preceding the macro character has been eliminated. The MacLisp single-character-object feature has been eliminated because it is seldom used and trivially obtainable by defining a macro.

The user is encouraged to turn off most macro characters, turn others into single-character-object macros, and then use read purely as a lexical analyzer on top of which to build a parser. It is unnecessary, however, to cater to more complex lexical analysis or parsing than that needed for Common Lisp.
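A rough sketch of this style of use follows. It turns three operator characters into single-character-object macros and then calls read repeatedly to produce a token list. The names make-lexer-readtable and tokenize, the choice of operators, and the use of keywords as operator tokens are all assumptions of this illustration; a real lexer would also disable the standard macro characters it does not want.

```lisp
(defun make-lexer-readtable ()
  (let ((rt (copy-readtable nil)))
    (dolist (char '(#\+ #\* #\=))
      ;; Each operator character reads as a keyword named by that character.
      (set-macro-character char
                           (lambda (stream c)
                             (declare (ignore stream))
                             (intern (string c) :keyword))
                           nil rt))
    rt))

(defun tokenize (string)
  (let ((*readtable* (make-lexer-readtable))
        (*read-eval* nil))                 ; do not evaluate #. in input
    (with-input-from-string (in string)
      (loop for token = (read in nil in)   ; IN itself serves as the EOF marker
            until (eq token in)
            collect token))))

;; Example: operators separate tokens even without surrounding whitespace.
(tokenize "x=y+2*z")
```

Because the operators are terminating macro characters, they end the preceding token, which is what lets read act as a lexical analyzer for infix input. A parser can then work on the resulting list of symbols, numbers, and keywords.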


readtable-case readtable

X3J13 voted in June 1989 (READ-CASE-SENSITIVITY) to introduce the function readtable-case to control the reader's interpretation of case. It provides access to a slot in a readtable, and may be used with setf to alter the state of that slot. The possible values for the slot are :upcase, :downcase, :preserve, and :invert; the readtable-case for the standard readtable is :upcase. Note that copy-readtable is required to copy the readtable-case slot along with all other readtable information.

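Once the reader has accumulated a token as described in section 22.1.1, if the token is a symbol, "replaceable" characters (unescaped uppercase or lowercase constituent characters) may be modified under the control of the readtable-case of the current readtable:

:upcase replaces lowercase characters with the corresponding uppercase characters.

:downcase replaces uppercase characters with the corresponding lowercase characters.

:preserve leaves the case of all characters unchanged.

:invert changes the case of every replaceable character only if all the replaceable characters in the token have the same case; a mixed-case token is left unchanged.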

As an illustration, consider the following code.

(let ((*readtable* (copy-readtable nil))) 
  (format t "READTABLE-CASE  Input   Symbol-name~%") 
  (dolist (readtable-case '(:upcase :downcase :preserve :invert)) 
    (setf (readtable-case *readtable*) readtable-case) 
    (dolist (input '("ZEBRA" "Zebra" "zebra")) 
      (format t ":~A~16T~A~24T~A~%" 
                (string-upcase readtable-case) 
                input 
                (symbol-name (read-from-string input))))))

The output from this test code should be

READTABLE-CASE  Input   Symbol-name 
:UPCASE         ZEBRA   ZEBRA 
:UPCASE         Zebra   ZEBRA 
:UPCASE         zebra   ZEBRA 
:DOWNCASE       ZEBRA   zebra 
:DOWNCASE       Zebra   zebra 
:DOWNCASE       zebra   zebra 
:PRESERVE       ZEBRA   ZEBRA 
:PRESERVE       Zebra   Zebra 
:PRESERVE       zebra   zebra 
:INVERT         ZEBRA   zebra 
:INVERT         Zebra   Zebra 
:INVERT         zebra   ZEBRA

The readtable-case of the current readtable also affects the printing of symbols (see *print-case* and *print-escape*).
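The effect on printing can be observed with a sketch such as the following, which prints a symbol under a given readtable-case and checks that reading the printed text back yields the same symbol. The function name print-read-round-trip is hypothetical, and a keyword is used in the example so that the result does not depend on the current package.

```lisp
(defun print-read-round-trip (symbol mode)
  (let ((*readtable* (copy-readtable nil))
        (*print-case* :upcase))
    (setf (readtable-case *readtable*) mode)
    (let ((text (prin1-to-string symbol)))   ; PRIN1 prints with escapes on
      ;; First value: the printed text.  Second value: true if reading
      ;; that text under the same readtable gives back SYMBOL.
      (values text (eq symbol (read-from-string text))))))

;; Example: under :invert, a symbol whose name is all uppercase
;; is printed in lowercase.
(print-read-round-trip :zebra :invert)
```

With :invert the printer inverts the case of a single-case symbol name, which is exactly what the reader will undo, so the round trip succeeds without escape characters.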
