CLHS: Issue CHAR-NAME-CASE Writeup

Issue: CHAR-NAME-CASE

Forum: Editorial

References: CHAR-NAME (p242), NAME-CHAR (p243), #\ (p353)

Category: Clarification

Edit history: 01-Mar-91, Version 1 by Pitman

08-Apr-91, Version 2 by Pitman

Status: X3J13 passed option X3J13-MAR-91 on a 9-3 vote, March 1991.

Problem Description:

In what case is a char name? Specifically,

1. Which of (NAME-CHAR "SPACE"), (NAME-CHAR "Space"), (NAME-CHAR "space")

is a legitimate way to name the Space character? [CLtL seems to

make this clear, but then later says things that make one wonder.]

2. Does (CHAR-NAME #\Space) yield "Space", "SPACE", or "space"?

CLtL, p353, says:

#\name reads in a character object whose name is name

(actually, whose name is (string-upcase name); therefore the

syntax is case insensitive).

This would seem to imply that (CHAR-NAME #\Space) => "SPACE"

and further that only because of the syntax does lookup of

#\Space succeed, which would imply that (NAME-CHAR "Space")

should fail.

CLtL, p243 (for NAME-CHAR), says:

If the name is the same as the name of a character object (as

determined by STRING-EQUAL), that object is returned.

CLtL, p242 (for CHAR-NAME), says:

Graphic characters may or may not have names.

This would seem to `license' an implementation to give names to chars

like "A" and "$" if they wanted. But that can't really work in

general because "A" and "a" if compared by STRING-EQUAL will seem

the same, when in fact two different characters are almost surely

implied.
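
The questions above can be probed directly in a given implementation.
The following is a minimal sketch using only CHAR-NAME and NAME-CHAR;
the function name PROBE-SPACE-NAMES is hypothetical, and the values
it returns depend on how the implementation has read the passages
quoted above.

```lisp
(defun probe-space-names ()
  "Collect what this implementation does with the three spellings of Space."
  (list :char-name   (char-name #\Space)     ; "Space", "SPACE", or "space"?
        :upper       (name-char "SPACE")     ; #\Space or NIL?
        :capitalized (name-char "Space")
        :lower       (name-char "space")))
```

Calling (PROBE-SPACE-NAMES) in two implementations and comparing the
results is one way to observe the divergence this issue is about.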

Proposal (CHAR-NAME-CASE:X3J13-MAR-91):

Specify that all character names are two or more characters long.

Specify that CHAR-NAME returns names of characters in the case

mentioned in the description of CHAR-NAME (i.e., "Tab", "Page",

"Rubout", "Linefeed", "Return", "Backspace", "Newline", "Space").

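The two requirements of this proposal can be phrased as checks on an
implementation. This is only a sketch: the function names are
hypothetical, and only "Space" and "Newline" are examined because
the other names listed above need not be supported everywhere.

```lisp
(defun long-name-or-nil-p (char)
  "True if CHAR has no name, or a name at least two characters long."
  (let ((name (char-name char)))
    (or (null name)
        (>= (length name) 2))))

(defun required-names-in-listed-case-p ()
  "True if CHAR-NAME returns \"Space\" and \"Newline\" in exactly that case."
  (and (equal (char-name #\Space) "Space")
       (equal (char-name #\Newline) "Newline")))
```

EQUAL is used rather than STRING-EQUAL so that the check is sensitive
to case, which is the point of the second requirement.
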
Proposal (CHAR-NAME-CASE:ANY-IN-CAPITALIZE-OUT):

1. Define that NAME-CHAR is case-sensitive for names one character long,

but case-insensitive for longer names.

2. Define that if CHAR-NAME returns a string, that string is either

a long name (a string of length greater than one) in

capitalized (i.e., uppercase initial) format

OR a short name (a string one character long) which contains only

one character, which has the same character code as the argument

to CHAR-NAME.

Test case:

1. a. (NAME-CHAR "SPACE") => #\Space

b. (NAME-CHAR "Space") => #\Space

c. (NAME-CHAR "space") => #\Space

d. (MEMBER (NAME-CHAR "A") '(NIL #\A)) => true

e. (MEMBER (NAME-CHAR "a") '(NIL #\a)) => true

2. a. (EQUAL (CHAR-NAME #\Space) "Space") => true

b. (MEMBER (CHAR-NAME #\a) '(NIL "a") :test #'EQUAL) => true

c. (MEMBER (CHAR-NAME #\A) '(NIL "A") :test #'EQUAL) => true
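
The behavior described by ANY-IN-CAPITALIZE-OUT can be sketched with
a small lookup table. Everything named here (*LONG-NAMES*,
SKETCH-NAME-CHAR, SKETCH-CHAR-NAME) is hypothetical and stands in for
whatever an implementation does internally. The sketch also assumes
that every character without a long name is given a short name, which
the proposal permits but does not require.

```lisp
(defparameter *long-names*
  '(("Space" . #\Space) ("Newline" . #\Newline))
  "Hypothetical table of long names, stored in capitalized form.")

(defun sketch-name-char (name)
  "Case-sensitive for one-character names, case-insensitive for longer ones."
  (if (= (length name) 1)
      (char name 0)                       ; "A" and "a" stay distinct
      (cdr (assoc name *long-names* :test #'string-equal))))

(defun sketch-char-name (char)
  "Return the capitalized long name if there is one, else a short name."
  (or (car (rassoc char *long-names* :test #'char=))
      (string char)))
```

With these definitions (SKETCH-NAME-CHAR "SPACE") and
(SKETCH-NAME-CHAR "space") both yield #\Space, while
(SKETCH-NAME-CHAR "A") and (SKETCH-NAME-CHAR "a") yield different
characters.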

Rationale:

A primary purpose of these operators is to support input

(e.g., the #\ reader macro) and output (e.g., PRIN1 and

FORMAT's ~@C) of characters.

1. It would be confusing for "Space" and "SPACE" to map to different

characters, and just plain frustrating for one to be valid

while the other was useless.

2. It is most useful for CHAR-NAME to return a known case so that users

who care about the presentation are saved from calling STRING-UPCASE,

STRING-CAPITALIZE, or STRING-DOWNCASE even when the result might

already come back that way. The most common thing to do with a name

is to print it (e.g., when printing a character), so capitalization

makes the most sense.
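
Without a guaranteed case, a caller who cares about presentation has
to normalize the result defensively, along the lines of this sketch.
DISPLAY-CHAR-NAME is a hypothetical helper, not part of any proposal.

```lisp
(defun display-char-name (char)
  "Return CHAR's name in capitalized form, or NIL if CHAR has no name."
  (let ((name (char-name char)))
    ;; STRING-CAPITALIZE is harmless but redundant when the name is
    ;; already capitalized.
    (and name (string-capitalize name))))
```

If CHAR-NAME is specified to return capitalized names, the call to
STRING-CAPITALIZE becomes unnecessary.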

Proposal (CHAR-NAME-CASE:CASE-SENSITIVE):

1. Define that NAME-CHAR is case-sensitive.

2. Define that CHAR-NAME returns the names "Space" and "Newline"

(and "Rubout", "Page", "Tab", "Backspace", "Return", and "Linefeed"

if they are supported by the implementation) in uppercase initial.

3. Define that #\ is case-sensitive.

4. Define that the names "SPACE" and "space" are synonyms for "Space".

Define that the names "NEWLINE" and "newline" are synonyms for "Newline".

And likewise for "Rubout", "Page", "Tab", "Backspace", "Return",

and "Linefeed" if they are supported.
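
Clauses 1, 2, and 4 can be sketched as a case-sensitive table in which
the synonyms are listed explicitly. The names
*CASE-SENSITIVE-NAMES*, CS-NAME-CHAR, and CS-CHAR-NAME are
hypothetical, and only "Space" and "Newline" are included for brevity.

```lisp
(defparameter *case-sensitive-names*
  '(("Space" . #\Space) ("SPACE" . #\Space) ("space" . #\Space)
    ("Newline" . #\Newline) ("NEWLINE" . #\Newline) ("newline" . #\Newline))
  "Hypothetical table; the first entry for each character is its primary name.")

(defun cs-name-char (name)
  "Case-sensitive lookup: only the spellings in the table are recognized."
  (cdr (assoc name *case-sensitive-names* :test #'string=)))

(defun cs-char-name (char)
  "Return the primary (uppercase-initial) name of CHAR, or NIL."
  (car (rassoc char *case-sensitive-names* :test #'char=)))
```

Because the comparison is STRING=, a mixed-case spelling such as
"SpAcE" is not recognized, which leaves room for other names that
differ only in case to denote different characters.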

Test case:

1. a. (NAME-CHAR "SPACE") => #\Space

b. (NAME-CHAR "Space") => NIL

c. (NAME-CHAR "space") => NIL

d. (MEMBER (NAME-CHAR "A") '(NIL #\A)) => true

e. (MEMBER (NAME-CHAR "a") '(NIL #\a)) => true

2. a. (EQUAL (CHAR-NAME #\Space) "SPACE") => true

b. (MEMBER (CHAR-NAME #\a) '(NIL "a") :test #'EQUAL) => true

c. (MEMBER (CHAR-NAME #\A) '(NIL "A") :test #'EQUAL) => true

Rationale:

1. This leaves flexibility for other languages or notations

which have long names for glyphs that have some kind of case distinction.

e.g.,

(NAME-CHAR "ALPHA")

might want to be distinguished from

(NAME-CHAR "alpha")

in Greek, with the former representing an uppercase character

and the latter representing a lowercase character.

The choice of uppercase initial means that when printing #\xxx,

no case conversion is needed, saving the time and, in some

implementations, the space required to do case conversion.

2. This assures that (NAME-CHAR (CHAR-NAME x)) => x,

when (CHAR-NAME x) is not NIL.

3. This is necessary in order for #\ALPHA and #\alpha to be

distinguishable.

4. This keeps compatibility concerns for existing usages of

#\space, #\SPACE, #\newline, and #\NEWLINE from becoming

the overwhelming issue.
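
The round-trip property in item 2 can be tested mechanically. This is
a sketch with hypothetical function names; the default limit of 128
character codes is an arbitrary choice made for illustration.

```lisp
(defun name-round-trips-p (char)
  "True if CHAR has no name, or if its name maps back to CHAR."
  (let ((name (char-name char)))
    (or (null name)
        (eql (name-char name) char))))

(defun round-trip-failures (&optional (limit 128))
  "Collect characters with codes below LIMIT whose names do not round-trip."
  (loop for code from 0 below limit
        for char = (code-char code)
        ;; CODE-CHAR may return NIL if no character has this code.
        when (and char (not (name-round-trips-p char)))
          collect char))
```

An empty list from (ROUND-TRIP-FAILURES) indicates that the property
holds for every character examined.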

Current Practice:

Symbolics Genera implements ANY-IN-CAPITALIZE-OUT.

Cost to Implementors:

Both proposals are probably relatively cheap to implement.

Cost to Users:

ANY-IN-CAPITALIZE-OUT is probably very low cost because it is

more tolerant of deviations which almost surely already exist between

implementations, and among users who have made various assumptions

about what implementations might or should do.

CASE-SENSITIVE is a slightly incompatible change, but

probably a lot of programs don't use either of these functions.

Of those that do, some usages don't depend on anything other

than that (NAME-CHAR (CHAR-NAME x)) will work. In any case, most

applications that use these are probably fairly easy to check

by hand even though mechanical checking probably would not work

very well.

Cost of Non-Adoption:

Divergence among implementations. Potential portability problems

from skews of interpretations. New implementations would not know

clearly what their rights and responsibilities were.

Benefits:

A clearer specification.

Aesthetics:

ANY-IN-CAPITALIZE-OUT is less aesthetic because it has a special

case for one-character strings.

CASE-SENSITIVE is generally more aesthetic.

Discussion:

KMP thinks that mostly it's a good idea to resolve this clearly one

way or another, and isn't incredibly fussy about the details. He has

a mild preference for proposal CASE-SENSITIVE because it leaves the

greatest flexibility to the international community which has the

English alphabet forced upon them all too often already.