Common Lisp the Language, 2nd Edition

next up previous contents index
Next: Non-standard Characters Up: Characters Previous: Standard Characters

2.2.2. Line Divisions

The treatment of line divisions is one of the most difficult issues in designing portable software, simply because there is so little agreement among operating systems. Some use a single character to delimit lines; the recommended ASCII character for this purpose is the line feed character LF (also called the new line character, NL), but some systems use the carriage return character CR. Much more common is the two-character sequence CR followed by LF. Frequently line divisions have no representation as a character but are implicit in the structuring of a file into records, each record containing a line of text. A deck of punched cards has this structure, for example.

Common Lisp provides an abstract interface by requiring that there be a single character, #\Newline, that within the language serves as a line delimiter. (The language C has a similar requirement.) An implementation of Common Lisp must translate between this internal single-character representation and whatever external representation(s) may be used.

Implementation note: How the character called #\Newline is represented internally is not specified here, but it is strongly suggested that the ASCII LF character be used in Common Lisp implementations that use the ASCII character encoding. The ASCII CR character is a workable, but in most cases inferior, alternative.

When the first edition was written it was not yet clear that UNIX would become so widely accepted. The decision to represent the line delimiter as a single character has proved to be a good one.

The requirement that a line division be represented as a single character has certain consequences. A character string written in the middle of a program in such a way as to span more than one line must contain exactly one character to represent each line division. Consider this code fragment:

(setq a-string "This string 
forty-two characters.")

Between g and c there must be exactly one character, #\Newline; a two-character sequence, such as #\Return and then #\Newline, is not acceptable, nor is the absence of a character. The same is true between s and f.

When the character #\Newline is written to an output file, the Common Lisp implementation must take the appropriate action to produce a line division. This might involve writing out a record or translating #\Newline to a CR/LF sequence.

Implementation note: If an implementation uses the ASCII character encoding, uses the CR/LF sequence externally to delimit lines, uses LF to represent #\Newline internally, and supports #\Return as a data object corresponding to the ASCII character CR, the question arises as to what action to take when the program writes out #\Return followed by #\Newline. It should first be noted that #\Return is not a standard Common Lisp character, and the action to be taken when #\Return is written out is therefore not defined by the Common Lisp language. A plausible approach is to buffer the #\Return character and suppress it if and only if the next character is #\Newline (the net effect is to generate a CR/LF sequence). Another plausible approach is simply to ignore the difficulty and declare that writing #\Return and then #\Newline results in the sequence CR/CR/LF in the output.

next up previous contents index
Next: Non-standard Characters Up: Characters Previous: Standard Characters