Newsgroups: alt.lang.design,comp.lang.c++,comp.lang.lisp
Path: cantaloupe.srv.cs.cmu.edu!nntp.club.cc.cmu.edu!hudson.lm.com!news.pop.psu.edu!psuvax1!uwm.edu!spool.mu.edu!olivea!charnel.ecst.csuchico.edu!waldorf.csc.calpoly.edu!kestrel.edu!mcdonald
From: mcdonald@kestrel.edu (Jim McDonald)
Subject: Re: Comparing productivity: LisP against C++ (was Re: Reference Counting)
Message-ID: <1995Jan6.023011.7884@kestrel.edu>
Sender: mcdonald@saker.kestrel.edu (Jim McDonald)
Nntp-Posting-Host: saker.kestrel.edu
Organization: Kestrel Institute, Palo Alto, CA
References: <19941203T221402Z.enag@naggum.no> <3danhm$fqi@xmission.xmission.com> <3dc3ur$fsc@wariat.wariat.org> <3dd145$gnl@xmission <19950104.155848.591924.NETNEWS@UICVM.UIC.EDU> <3efgh9$icq@celebrian.otago.ac.nz>
Date: Fri, 6 Jan 1995 02:30:11 GMT
Lines: 52
Xref: glinda.oz.cs.cmu.edu comp.lang.c++:106086 comp.lang.lisp:16287

In article <3efgh9$icq@celebrian.otago.ac.nz>, nmein@bifrost.otago.ac.nz (Nick Mein) writes:
|> David Hanley (dhanley@matisse.eecs.uic.edu) wrote:
|> : Fergus Henderson (fjh@munta.cs.mu.OZ.AU) wrote:
|> 
|> : :  It quite simply does not make
|> : : sense to "add" two characters together.
|> 
|> :         Granted, adding 'a' to 'a' does not make too much sense.  
|> 
|> Although subtracting one character from another is perfectly sensible:
|> 
|> char x;
|> //...
|> if (('a' <= x) && (x <= 'z'))
|>    x += 'A' - 'a';

Careful!  In an EBCDIC character code there are gaps between 'i' and 'j',
and between 'I' and 'J', etc.   I.e. the code for 'i' is 0x89 while the
code for 'j' is 0x91, so seven unrelated codes sit between them.

By accident, the code above would probably work for letters, because the
offset between any lowercase letter and its uppercase version is the same
(though the range test also lets through the non-letter codes that sit in
the gaps).  If you did other operations you could get into trouble.

For example, (('j' - 'i') == ('b' - 'a')) could be false!

And '1' - 'A' might be positive on one machine and negative on another.
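
To make the two claims above concrete without needing an EBCDIC machine,
here is a minimal sketch in C that hard-codes the usual EBCDIC letter
layout (as in code page 037).  The names ebcdic_lower, ebcdic_upper and
the enum constants are made up for illustration; they are not part of
any library.

```c
/* Hard-coded code points; nothing here depends on the host character set. */
enum {
    EBCDIC_DIGIT_1 = 0xF1,  /* '1' in EBCDIC */
    ASCII_DIGIT_1  = 0x31,  /* '1' in ASCII  */
    ASCII_UPPER_A  = 0x41   /* 'A' in ASCII  */
};

/* EBCDIC code for the lowercase letter at index 0..25 ('a'..'z').
   The letters come in three runs, with gaps after 'i' and after 'r'. */
int ebcdic_lower(int index)
{
    if (index < 9)
        return 0x81 + index;           /* a..i -> 0x81..0x89 */
    if (index < 18)
        return 0x91 + (index - 9);     /* j..r -> 0x91..0x99 */
    return 0xA2 + (index - 18);        /* s..z -> 0xA2..0xA9 */
}

/* Each uppercase letter sits exactly 0x40 above its lowercase version. */
int ebcdic_upper(int index)
{
    return ebcdic_lower(index) + 0x40;
}
```

With these, ebcdic_lower(9) - ebcdic_lower(8) (i.e. 'j' - 'i') is 8 while
ebcdic_lower(1) - ebcdic_lower(0) (i.e. 'b' - 'a') is 1, and
EBCDIC_DIGIT_1 - ebcdic_upper(0) is positive while
ASCII_DIGIT_1 - ASCII_UPPER_A is negative.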

If you don't care about EBCDIC, there are the various double-byte code sets 
(e.g. for Kanji) that could have similar gotchas.

Or do the C and C++ standards somewhere specify ASCII as the only legal 
character set?  (Maybe they do, I'm not a C/C++ lawyer.)

For what it's worth, Common Lisp takes these issues into account in its 
specification of the meaningful operations and comparisons that can be 
performed on characters, e.g. by specifying the explicit partial orders 
on characters that a user can depend on.

It provides CHAR-INT to convert from characters to integers, with a comment
to the user that char-int is provided mainly for the purposes of hashing 
characters.  (The inverse function, int-char, that converts integers to
characters, existed in previous versions but was dropped from the standard,
probably because it has no clean definition and serves no useful purpose.)

CL also provides CHAR-CODE for getting numbers to use when accessing arrays, 
CHAR-UPCASE and CHAR-DOWNCASE for doing your trick above more portably, 
GRAPHIC-CHAR-P to tell whether a character is printable rather than a
formatting character, plus about 30 similar *standard* functions.  A good
compiler can recognize calls to these and compile them inline.
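
For comparison on the C side, the nearest standard counterpart to
CHAR-UPCASE is toupper from <ctype.h>, which leaves the code-point layout
to the implementation.  A minimal sketch follows; the wrapper name
upcase_char is made up for illustration.

```c
#include <ctype.h>

/* Uppercase one character without assuming anything about how the
   letters are laid out in the execution character set. */
char upcase_char(char c)
{
    /* Convert through unsigned char first: passing a negative value
       (other than EOF) to toupper is undefined behaviour. */
    return (char)toupper((unsigned char)c);
}
```

Characters that have no uppercase version, such as digits, come back
unchanged, so no range test is needed.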


