Common Lisp the Language, 2nd Edition

13. Characters

Common Lisp provides a character data type; objects of this type represent printed symbols such as letters.

In general, characters in Common Lisp are not true objects; eq cannot be counted upon to operate on them reliably. In particular, it is possible that the expression

(let ((x z) (y z)) (eq x y))

may be false rather than true, if the value of z is a character.

Rationale: This odd breakdown of eq in the case of characters allows the implementor enough design freedom to produce exceptionally efficient code on conventional architectures. In this respect the treatment of characters exactly parallels that of numbers, as described in chapter 12.


------------------------------------------------------------------------------------------
Table 13-1: Standard Character Labels, Glyphs, and Descriptions

                               SM05  @  commercial at         SD13  `  grave accent
SP02  !  exclamation mark      LA02  A  capital A             LA01  a  small a
SP04  "  quotation mark        LB02  B  capital B             LB01  b  small b
SM01  #  number sign           LC02  C  capital C             LC01  c  small c
SC03  $  dollar sign           LD02  D  capital D             LD01  d  small d
SM02  %  percent sign          LE02  E  capital E             LE01  e  small e
SM03  &  ampersand             LF02  F  capital F             LF01  f  small f
SP05  '  apostrophe            LG02  G  capital G             LG01  g  small g
SP06  (  left parenthesis      LH02  H  capital H             LH01  h  small h
SP07  )  right parenthesis     LI02  I  capital I             LI01  i  small i
SM04  *  asterisk              LJ02  J  capital J             LJ01  j  small j
SA01  +  plus sign             LK02  K  capital K             LK01  k  small k
SP08  ,  comma                 LL02  L  capital L             LL01  l  small l
SP10  -  hyphen or minus sign  LM02  M  capital M             LM01  m  small m
SP11  .  period or full stop   LN02  N  capital N             LN01  n  small n
SP12  /  solidus               LO02  O  capital O             LO01  o  small o
ND10  0  digit 0               LP02  P  capital P             LP01  p  small p
ND01  1  digit 1               LQ02  Q  capital Q             LQ01  q  small q
ND02  2  digit 2               LR02  R  capital R             LR01  r  small r
ND03  3  digit 3               LS02  S  capital S             LS01  s  small s
ND04  4  digit 4               LT02  T  capital T             LT01  t  small t
ND05  5  digit 5               LU02  U  capital U             LU01  u  small u
ND06  6  digit 6               LV02  V  capital V             LV01  v  small v
ND07  7  digit 7               LW02  W  capital W             LW01  w  small w
ND08  8  digit 8               LX02  X  capital X             LX01  x  small x
ND09  9  digit 9               LY02  Y  capital Y             LY01  y  small y
SP13  :  colon                 LZ02  Z  capital Z             LZ01  z  small z
SP14  ;  semicolon             SM06  [  left square bracket   SM11  {  left curly bracket
SA03  <  less-than sign        SM07  \  reverse solidus       SM13  |  vertical bar
SA04  =  equals sign           SM08  ]  right square bracket  SM14  }  right curly bracket
SA05  >  greater-than sign     SD15  ^  circumflex accent     SD19  ~  tilde
SP15  ?  question mark         SP09  _  low line
------------------------------------------------------------------------------------------

If two objects are to be compared for "identity," but either might be a character, then the predicate eql is probably appropriate.
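As a minimal sketch of this advice, the following compares characters with eql and with char=, both of which are defined on characters however an implementation chooses to represent them. The helper name same-object-p is hypothetical and is used only for illustration.

```lisp
;; Hypothetical helper: compare two objects for "identity" in a way that
;; stays reliable when either one happens to be a character.
(defun same-object-p (x y)
  (eql x y))

(let ((z #\a))
  (let ((x z) (y z))
    ;; (eq x y) is permitted to be false here; eql is required to be true.
    (list (same-object-p x y)   ; true
          (char= x y))))        ; true: char= is the character-specific test
```

When both arguments are known to be characters, char= states the intent more directly; eql is the safer choice when the objects may be of any type.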

X3J13 voted in March 1989 (CHARACTER-PROPOSAL) to approve the following definitions and terminology for use in discussing character facilities in Common Lisp.

A character repertoire defines a collection of characters independent of their specific rendered image or font. (This corresponds to the mathematical notion of a set, but the term character set is avoided here because it has been used in the past to mean both what is here called a repertoire and what is here called a coded character set.) Character repertoires are specified independent of coding and their characters are identified only with a unique character label, a graphic symbol, and a character description. As an example, table 13-1 shows the character labels, graphic symbols, and character descriptions for all of the characters in the repertoire standard-char except for #\Space and #\Newline.

Every Common Lisp implementation must support the standard character repertoire as well as repertoires named base-character, extended-character, and character. Other repertoires may be supported as well. X3J13 voted in June 1989 (MORE-CHARACTER-PROPOSAL) to specify that names of repertoires may be used as type specifiers. Such types must be subtypes of character; that is, in a given implementation the repertoire named character must encompass all the character objects supported by that implementation.
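The use of repertoire names as type specifiers might be sketched as follows. The example sticks to standard-char and character, since the spelling of the other repertoire names may differ between implementations; the helper all-standard-chars-p is hypothetical.

```lisp
;; Repertoire names used as type specifiers.
(typep #\a 'standard-char)            ; true: #\a appears in table 13-1
(typep #\a 'character)                ; true: every character object is a character
(subtypep 'standard-char 'character)  ; first value is true

;; Hypothetical helper: does every character of STRING belong to the
;; standard-char repertoire?
(defun all-standard-chars-p (string)
  (every (lambda (c) (typep c 'standard-char)) string))
```

A call such as (all-standard-chars-p "Hello, world!") tests whether a string stays within the repertoire that every implementation must support.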

A coded character set is a character repertoire plus an encoding that provides a bijective mapping between each character in the set and a number (typically a non-negative integer) that serves as the character representation. There are numerous internationally standardized coded character sets.
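The mapping between characters and numbers can be observed with char-code and code-char. The sketch below assumes only that standard characters survive a round trip through their codes; the helper code-round-trips-p is hypothetical, and the actual code values depend on the coded character set the implementation uses.

```lisp
;; Hypothetical helper: check that encoding a character and decoding the
;; resulting number yields the same character again.
(defun code-round-trips-p (ch)
  (eql ch (code-char (char-code ch))))

(char-code #\A)                        ; a non-negative integer; its value is
                                       ; implementation dependent
(every #'code-round-trips-p "Az09 ?")  ; true for these standard characters
```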

A character may be included in one or more character repertoires. Similarly, a character may be included in one or more coded character sets.

To ensure that each character is uniquely defined, we may use a universal registry of characters that incorporates a collection of distinguished repertoires called character scripts that form an exhaustive partition of all characters. That is, each character is included in exactly one character script. (Draft ISO 10646 Coded Character Set Standard, if eventually approved as a standard, may become the practical realization of this universal registry.)

(X3J13 voted in June 1989 (MORE-CHARACTER-PROPOSAL) to specify that an implementation must document the character scripts it supports. For each script the documentation should discuss character labels, glyphs, and descriptions; any canonicalization processes performed by the reader that result in treating distinct characters as equivalent; any canonicalization performed by format in processing directives; the behavior of char-upcase, char-downcase, and the predicates alpha-char-p, upper-case-p, lower-case-p, both-case-p, graphic-char-p, alphanumericp, char-equal, char-not-equal, char-lessp, char-greaterp, char-not-greaterp, and char-not-lessp for characters in the script; and behavior with respect to input and output, including coded character sets and external coding schemes.)

In Common Lisp a character data object is identified by its character code, a unique numerical code. Each character code is composed from a character script and a character label. The convention by which a character script and character label compose a character code is implementation dependent. [X3J13 did not approve all parts of the proposal from its Subcommittee on Characters. As a result, some features that were approved appear to have no purpose. X3J13 wished to support the standardization by ISO of character scripts and coded character sets but declined to design facilities for use in Common Lisp until there has been more progress by ISO in this area. The approval of the terminology for scripts and labels gives a hint to implementors of likely directions for Common Lisp in the future.]

A character object that is classified as graphic, or displayable, has an associated glyph. The glyph is the visual representation of the character. All other character data objects are classified as non-graphic.
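The graphic versus non-graphic distinction can be tested with the predicate graphic-char-p. The following is a small sketch; the helper classify-character and its keyword results are hypothetical names chosen for illustration.

```lisp
;; Hypothetical helper: report how a character is classified.
(defun classify-character (ch)
  (if (graphic-char-p ch) :graphic :non-graphic))

(classify-character #\a)        ; :graphic, since #\a has a glyph
(classify-character #\Space)    ; :graphic, since the space character counts as graphic
(classify-character #\Newline)  ; :non-graphic
```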

This terminology assigns names to Common Lisp concepts in a manner consistent with related concepts discussed in various ISO standards for coded character sets and provides a demarcation between standardization activities. For example, facilities for manipulating characters, character scripts, and coded character sets are properly defined by a Common Lisp standard, but Common Lisp should not define standard character sets or standard character scripts.
