22.2.1. Input from Character Streams

Common Lisp the Language, 2nd Edition

Next: Input from Binary Up: Input Functions Previous: Input Functions

22.2.1. Input from Character Streams

Many character input functions take optional arguments called input-stream, eof-error-p, and eof-value. The input-stream argument is the stream from which to obtain input; if unsupplied or nil it defaults to the value of the special variable *standard-input*. One may also specify t as a stream, meaning the value of the special variable *terminal-io*.

The eof-error-p argument controls what happens if input is from a file (or any other input source that has a definite end) and the end of the file is reached. If eof-error-p is true (the default), an error will be signaled at end of file. If it is false, then no error is signaled, and instead the function returns eof-value.

change_begin
X3J13 voted in January 1989 (ARGUMENTS-UNDERSPECIFIED) to clarify that an eof-value argument may be any Lisp datum whatsoever.
change_end

Functions such as read that read the representation of an object rather than a single character will always signal an error, regardless of eof-error-p, if the file ends in the middle of an object representation. For example, if a file does not contain enough right parentheses to balance the left parentheses in it, read will complain. If a file ends in a symbol or a number immediately followed by end-of-file, read will read the symbol or number successfully and when called again will see the end-of-file and only then act according to eof-error-p. Similarly, the function read-line will successfully read the last line of a file even if that line is terminated by end-of-file rather than the newline character. If a file contains ignorable text at the end, such as blank lines and comments, read will not consider it to end in the middle of an object. Thus an eof-error-p argument controls what happens when the file ends between objects.

Many input functions also take an argument called recursive-p. If specified and not nil, this argument specifies that this call is not a ``top-level'' call to read but an imbedded call, typically from the function for a macro character. It is important to distinguish such recursive calls for three reasons.

First, a top-level call establishes the context within which the #n= and #n# syntax is scoped. Consider, for example, the expression

(cons '#3=(p q r) '(x y . #3#))

If the single-quote macro character were defined in this way:

(set-macro-character #\' 
                     #'(lambda (stream char) 
                         (declare (ignore char)) 
                         (list 'quote (read stream))))

then the expression could not be read properly, because there would be no way to know when read is called recursively by the first occurrence of ' that the label #3= would be referred to later in the containing expression. There would be no way to know because read could not determine that it was called by a macro-character function rather than from ``top level.'' The correct way to define the single quote macro character uses the recursive-p argument:

(set-macro-character #\' 
                     #'(lambda (stream char) 
                         (declare (ignore char)) 
                         (list 'quote (read stream t nil t))))

Second, a recursive call does not alter whether the reading process is to preserve whitespace or not (as determined by whether the top-level call was to read or read-preserving-whitespace). Suppose again that the single quote had the first, incorrect, macro-character definition shown above. Then a call to read-preserving-whitespace that read the expression 'foo would fail to preserve the space character following the symbol foo because the single-quote macro-character function calls read, not read-preserving-whitespace, to read the following expression (in this case foo). The correct definition, which passes the value t for the recursive-p argument to read, allows the top-level call to determine whether whitespace is preserved.

Third, when end-of-file is encountered and the eof-error-p argument is not nil, the kind of error that is signaled may depend on the value of recursive-p. If recursive-p is not nil, then the end-of-file is deemed to have occurred within the middle of a printed representation; if recursive-p is nil, then the end-of-file may be deemed to have occurred between objects rather than within the middle of one.

[Function]
read &optional input-stream eof-error-p eof-value recursive-p

read reads in the printed representation of a Lisp object from input-stream, builds a corresponding Lisp object, and returns the object.

Note that when the variable *read-suppress* is not nil, then read reads in a printed representation as best it can, but most of the work of interpreting the representation is avoided (the intent being that the result is to be discarded anyway). For example, all extended tokens produce the result nil regardless of their syntax.

[Variable]
*read-default-float-format*

The value of this variable must be a type specifier symbol for a specific floating-point format; these include short-float, single-float, double-float, and long-float and may include implementation-specific types as well. The default value is single-float.

*read-default-float-format* indicates the floating-point format to be used for reading floating-point numbers that have no exponent marker or have e or E for an exponent marker. (Other exponent markers explicitly prescribe the floating-point format to be used.) The printer also uses this variable to guide the choice of exponent markers when printing floating-point numbers.

[Function]
read-preserving-whitespace &optional in-stream eof-error-p eof-value recursive-p

Certain printed representations given to read, notably those of symbols and numbers, require a delimiting character after them. (Lists do not, because the close parenthesis marks the end of the list.) Normally read will throw away the delimiting character if it is a whitespace character; but read will preserve the character (using unread-char) if it is syntactically meaningful, because it may be the start of the next expression.

change_begin
X3J13 voted in January 1989 (PEEK-CHAR-READ-CHAR-ECHO) to clarify the interaction of unread-char with echo streams. These changes indirectly affect the echoing behavior of read-preserving-whitespace.
change_end

The function read-preserving-whitespace is provided for some specialized situations where it is desirable to determine precisely what character terminated the extended token.

As an example, consider this macro-character definition:

(defun slash-reader (stream char) 
  (declare (ignore char)) 
  (do ((path (list (read-preserving-whitespace stream)) 
             (cons (progn (read-char stream nil nil t) 
                          (read-preserving-whitespace 
                             stream)) 
                   path))) 
      ((not (char= (peek-char nil stream nil nil t) #\/)) 
       (cons 'path (nreverse path))))) 
(set-macro-character #\/ #'slash-reader)

(This is actually a rather dangerous definition to make because expressions such as (/ x 3) will no longer be read properly. The ability to reprogram the reader syntax is very powerful and must be used with caution. This redefinition of / is shown here purely for the sake of example.)

Consider now calling read on this expression:

(zyedh /usr/games/zork /usr/games/boggle)

The / macro reads objects separated by more / characters; thus /usr/games/zork is intended to be read as (path usr games zork). The entire example expression should therefore be read as

(zyedh (path usr games zork) (path usr games boggle))

However, if read had been used instead of read-preserving-whitespace, then after the reading of the symbol zork, the following space would be discarded; the next call to peek-char would see the following /, and the loop would continue, producing this interpretation:

(zyedh (path usr games zork usr games boggle))

On the other hand, there are times when whitespace should be discarded. If a command interpreter takes single-character commands, but occasionally reads a Lisp object, then if the whitespace after a symbol is not discarded it might be interpreted as a command some time later after the symbol had been read.

Note that read-preserving-whitespace behaves exactly like read when the recursive-p argument is not nil. The distinction is established only by calls with recursive-p equal to nil or omitted.

[Function]
read-delimited-list char &optional input-stream recursive-p

This reads objects from stream until the next character after an object's representation (ignoring whitespace characters and comments) is char. (The char should not have whitespace syntax in the current readtable.) A list of the objects read is returned.

To be more precise, read-delimited-list looks ahead at each step for the next non-whitespace character and peeks at it as if with peek-char. If it is char, then the character is consumed and the list of objects is returned. If it is a constituent or escape character, then read is used to read an object, which is added to the end of the list. If it is a macro character, the associated macro function is called; if the function returns a value, that value is added to the list. The peek-ahead process is then repeated.

change_begin
X3J13 voted in January 1989 (PEEK-CHAR-READ-CHAR-ECHO) to clarify the interaction of peek-char with echo streams. These changes indirectly affect the echoing behavior of the function read-delimited-list.
change_end

This function is particularly useful for defining new macro characters. Usually it is desirable for the terminating character char to be a terminating macro character so that it may be used to delimit tokens; however, read-delimited-list makes no attempt to alter the syntax specified for char by the current readtable. The user must make any necessary changes to the readtable syntax explicitly. The following example illustrates this.

Suppose you wanted #{a b c ... z} to be read as a list of all pairs of the elements a, b, c, ..., z; for example:

#{p q z a}  reads as  ((p q) (p z) (p a) (q z) (q a) (z a))

This can be done by specifying a macro-character definition for #{ that does two things: read in all the items up to the }, and construct the pairs. read-delimited-list performs the first task.

change_begin
Note that mapcon allows the mapped function to examine the items of the list after the current one, and that mapcon uses nconc, which is all right because mapcar will produce fresh lists.
change_end

(defun |#{-reader| (stream char arg) 
  (declare (ignore char arg)) 
  (mapcon #'(lambda (x) 
              (mapcar #'(lambda (y) (list (car x) y)) (cdr x))) 
          (read-delimited-list #\} stream t))) 

(set-dispatch-macro-character #\# #\{ #'|#{-reader|) 

(set-macro-character #\} (get-macro-character #\) nil))

(Note that t is specified for the recursive-p argument.)

It is necessary here to give a definition to the character } as well to prevent it from being a constituent. If the line

(set-macro-character #\} (get-macro-character #\) nil))

shown above were not included, then the } in

#{p q z a}

would be considered a constituent character, part of the symbol named a}. One could correct for this by putting a space before the }, but it is better simply to use the call to set-macro-character.

Giving } the same definition as the standard definition of the character ) has the twin benefit of making it terminate tokens for use with read-delimited-list and also making it illegal for use in any other context (that is, attempting to read a stray } will signal an error).

Note that read-delimited-list does not take an eof-error-p (or eof-value) argument. The reason is that it is always an error to hit end-of-file during the operation of read-delimited-list.

[Function]
read-line &optional input-stream eof-error-p eof-value recursive-p

read-line reads in a line of text terminated by a newline. It returns the line as a character string (without the newline character). This function is usually used to get a line of input from the user. A second returned value is a flag that is false if the line was terminated normally, or true if end-of-file terminated the (non-empty) line. If end-of-file is encountered immediately (that is, appears to terminate an empty line), then end-of-file processing is controlled in the usual way by the eof-error-p, eof-value, and recursive-p arguments.

The corresponding output function is write-line.

[Function]
read-char &optional input-stream eof-error-p eof-value recursive-p

read-char inputs one character from input-stream and returns it as a character object.

The corresponding output function is write-char.

change_begin
X3J13 voted in January 1989 (PEEK-CHAR-READ-CHAR-ECHO) to clarify the interaction of read-char with echo streams (as created by make-echo-stream). A character is echoed from the input stream to the associated output stream the first time it is seen. If a character is read again because of an intervening unread-char operation, the character is not echoed again when read for the second time or any subsequent time.
change_end

[Function]
unread-char character &optional input-stream

unread-char puts the character onto the front of input-stream. The character must be the same character that was most recently read from the input-stream. The input-stream ``backs up'' over this character; when a character is next read from input-stream, it will be the specified character followed by the previous contents of input-stream. unread-char returns nil.

One may apply unread-char only to the character most recently read from input-stream. Moreover, one may not invoke unread-char twice consecutively without an intervening read-char operation. The result is that one may back up only by one character, and one may not insert any characters into the input stream that were not already there.

change_begin
X3J13 voted in January 1989 (UNREAD-CHAR-AFTER-PEEK-CHAR) to clarify that one also may not invoke unread-char after invoking peek-char without an intervening read-char operation. This is consistent with the notion that peek-char behaves much like read-char followed by unread-char.
change_end

Rationale: This is not intended to be a general mechanism, but rather an efficient mechanism for allowing the Lisp reader and other parsers to perform one-character lookahead in the input stream. This protocol admits a wide variety of efficient implementations, such as simply decrementing a buffer pointer. To have to specify the character in the call to unread-char is admittedly redundant, since at any given time there is only one character that may be legally specified. The redundancy is intentional, again to give the implementation latitude.

change_begin
X3J13 voted in January 1989 (PEEK-CHAR-READ-CHAR-ECHO) to clarify the interaction of unread-char with echo streams (as created by make-echo-stream). When a character is ``unread'' from an echo stream, no attempt is made to ``unecho'' the character. However, a character placed back into an echo stream by unread-char will not be re-echoed when it is subsequently re-read by read-char.
change_end

[Function]
peek-char &optional peek-type input-stream eof-error-p eof-value recursive-p

What peek-char does depends on the peek-type, which defaults to nil. With a peek-type of nil, peek-char returns the next character to be read from input-stream, without actually removing it from the input stream. The next time input is done from input-stream, the character will still be there. It is as if one had called read-char and then unread-char in succession.

If peek-type is t, then peek-char skips over whitespace characters (but not comments) and then performs the peeking operation on the next character. This is useful for finding the (possible) beginning of the next printed representation of a Lisp object. The last character examined (the one that starts an object) is not removed from the input stream.

If peek-type is a character object, then peek-char skips over input characters until a character that is char= to that object is found; that character is left in the input stream.

change_begin
X3J13 voted in January 1989 (PEEK-CHAR-READ-CHAR-ECHO) to clarify the interaction of peek-char with echo streams (as created by make-echo-stream). When a character from an echo stream is only peeked at, it is not echoed at that time. The character remains in the input stream and may be echoed when read by read-char at a later time. Note, however, that if the peek-type is not nil, then characters skipped over (and therefore consumed) by peek-char are treated as if they had been read by read-char, and will be echoed if read-char would have echoed them.
change_end

[Function]
listen &optional input-stream

The predicate listen is true if there is a character immediately available from input-stream, and is false if not. This is particularly useful when the stream obtains characters from an interactive device such as a keyboard. A call to read-char would simply wait until a character was available, but listen can sense whether or not input is available and allow the program to decide whether or not to attempt input. On a non-interactive stream, the general rule is that listen is true except when at end-of-file.

[Function]
read-char-no-hang &optional input-stream eof-error-p eof-value recursive-p

This function is exactly like read-char, except that if it would be necessary to wait in order to get a character (as from a keyboard), nil is immediately returned without waiting. This allows one to efficiently check for input availability and get the input if it is available. This is different from the listen operation in two ways. First, read-char-no-hang potentially reads a character, whereas listen never inputs a character. Second, listen does not distinguish between end-of-file and no input being available, whereas read-char-no-hang does make that distinction, returning eof-value at end-of-file (or signaling an error if no eof-error-p is true) but always returning nil if no input is available.

[Function]
clear-input &optional input-stream

This clears any buffered input associated with input-stream. It is primarily useful for clearing type-ahead from keyboards when some kind of asynchronous error has occurred. If this operation doesn't make sense for the stream involved, then clear-input does nothing. clear-input returns nil.

[Function]
read-from-string string &optional eof-error-p eof-value &key :start :end :preserve-whitespace

The characters of string are given successively to the Lisp reader, and the Lisp object built by the reader is returned. Macro characters and so on will all take effect.

The arguments :start and :end delimit a substring of string beginning at the character indexed by :start and up to but not including the character indexed by :end. By default :start is 0 (the beginning of the string) and :end is (length string). This is the same as for other string functions.

The flag :preserve-whitespace, if provided and not nil, indicates that the operation should preserve whitespace as for read-preserving-whitespace. It defaults to nil.

As with other reading functions, the arguments eof-error-p and eof-value control the action if the end of the (sub)string is reached before the operation is completed; reaching the end of the string is treated as any other end-of-file event.

read-from-string returns two values: the first is the object read, and the second is the index of the first character in the string not read. If the entire string was read, the second result will be either the length of the string or one greater than the length of the string. The parameter :preserve-whitespace may affect this second value.

(read-from-string "(a b c)") => (a b c) and 7

[Function]
parse-integer string &key :start :end :radix :junk-allowed

This function examines the substring of string delimited by :start and :end (which default to the beginning and end of the string). It skips over whitespace characters and then attempts to parse an integer. The :radix parameter defaults to 10 and must be an integer between 2 and 36.

If :junk-allowed is not nil, then the first value returned is the value of the number parsed as an integer or nil if no syntactically correct integer was seen.

If :junk-allowed is nil (the default), then the entire substring is scanned. The returned value is the value of the number parsed as an integer. An error is signaled if the substring does not consist entirely of the representation of an integer, possibly surrounded on either side by whitespace characters.

In either case, the second value is the index into the string of the delimiter that terminated the parse, or it is the index beyond the substring if the parse terminated at the end of the substring (as will always be the case if :junk-allowed is false).

Note that parse-integer does not recognize the syntactic radix-specifier prefixes #O, #B, #X, and #nR, nor does it recognize a trailing decimal point. It permits only an optional sign (+ or -) followed by a non-empty sequence of digits in the specified radix.

Next: Input from Binary Up: Input Functions Previous: Input Functions

AI.Repository@cs.cmu.edu