Newsgroups: alt.usage.english,sci.lang
Path: cantaloupe.srv.cs.cmu.edu!nntp.club.cc.cmu.edu!goldenapple.srv.cs.cmu.edu!rochester!cornellcs!newsstand.cit.cornell.edu!portc01.blue.aol.com!cyclic.gsl.net!news.gsl.net!EU.net!CERN.ch!sp065!flavell
From: "Alan J. Flavell" <flavell@mail.cern.ch>
Subject: Re: Character sets (was: A.D.)
In-Reply-To: <ct9nUvj030n@sktb.demon.co.uk>
X-Sender: flavell@sp065
X-Nntp-Posting-Host: sp065.cern.ch
Content-Type: TEXT/PLAIN; charset=US-ASCII
Message-ID: <Pine.A41.3.95a.970331153847.25756A-100000@sp065>
Sender: news@news.cern.ch (USENET News System)
Organization: speaking for myself and not for CERN
Comment: I hate unsolicited commercial email - boycott companies that use it - and reserve the right to bill for use of resources.
References: <5gkvm7$fbh@airdmhor.gen.nz> <E7EJoA.8HE@nonexistent.com> <E7s2K4.8q2@acli.interlog.com> <cTMbbaj030n@sktb.demon.co.uk> <5hk32l$glq@thoth.portal.ca> <cTrh1Wj030n@sktb.demon.co.uk> <5hkrsn$2ps@thoth.portal.ca> <ct9nUvj030n@sktb.demon.co.uk>
Mime-Version: 1.0
Date: Mon, 31 Mar 1997 14:03:16 GMT
Lines: 68

On Sun, 30 Mar 1997, Paul L. Allen wrote:

(along with an excellent survey of the situation, for sure)

> It's not valid on web sites because although HTTP is an 8-bit transport the
> specifications (neglecting the drafts for internationalization which are
> still only proposals) limit the character set to the printable characters of
> ISO 8859/1. 

The HTTP protocol explains clearly how to send out an HTML document with
a different transfer coding (which is confusingly called "charset",
although it emphatically does _not_ specify the Document Character Set -
a sad lapse in choice of terminology... yes, they felt they had to
follow MIME usage in this respect...). However, you are right to say
that the current HTML specifications do not actually require browsers to
handle this correctly; nor do the current mass-market browsers handle
this correctly in all respects, although they are slowly getting to
where, for example, the Alis Tango browser already is.
(http://www.alis.com/ for product info; follow the Babel link for useful
background information).  [disclaimer: I am not even a customer: I take
an interest in the topic, though, and was impressed by what I saw.]
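
To make the "charset" point concrete, here is a minimal sketch (in
Python, purely as an illustration) of how a client might honour the
charset parameter of a Content-Type header when decoding a body. The
function names are my own invention, and falling back to iso-8859-1
when no charset is given is an assumption based on the HTTP default
discussed above, not a statement of what any particular browser does.

```python
from email.message import Message


def charset_of(content_type, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type value, lowercased.

    Falls back to `default` (an assumption, see the text) when the
    header carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def decode_body(body, content_type):
    """Decode the raw bytes of a body using the advertised charset."""
    return body.decode(charset_of(content_type))


# The same four bytes read differently depending on the charset label.
raw = b"\xb3\xf3d\xbc"
as_latin2 = decode_body(raw, "text/html; charset=ISO-8859-2")
as_latin1 = decode_body(raw, "text/html")
```

The bytes on the wire are identical in both calls; only the label
changes how they are interpreted, which is why a browser that ignores
the label will show the wrong characters.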

...

> In fact, on web pages it is entirely valid to enter top-bit-set characters
> directly as HTTP is 8-bit clean.  

Well, that statement is accurate on one of two conditions:

- either: the platform uses a code that contains at least iso-8859-1
(that could be e.g. Unix, MS-Windows, etc., in a Latin-1 locale at least)

- or: the platform uses a code that is different but fully compatible
with iso-8859-1 (e.g. DOS CP850, EBCDIC CECP1047) _AND_ the documents are
served out by a server that understands how to map the platform's own
code into iso-8859-1 for transfer over the 'net.
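
The second condition amounts to a transcoding step in the server. A
minimal sketch of that step, assuming a DOS platform whose files are
stored in CP850 (the function name and the default codec are my own
choices for illustration, not taken from any real server):

```python
def to_wire(platform_bytes, platform_codec="cp850"):
    """Map bytes in the platform's own code to iso-8859-1 for transfer.

    Decoding gives the abstract characters; re-encoding gives the byte
    values that iso-8859-1 assigns to them. Encoding is strict, so a
    character with no place in iso-8859-1 (a CP850 box-drawing
    character, say) raises UnicodeEncodeError instead of being
    silently mangled.
    """
    text = platform_bytes.decode(platform_codec)
    return text.encode("iso-8859-1")


# "cafe" with an e-acute: byte 0x82 on the DOS side, 0xE9 on the wire.
wire = to_wire(b"caf\x82")
```

Without this step the server would send 0x82 unchanged, which falls in
the 128-159 range that has no printable meaning in iso-8859-1.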

The Mac can be used for this purpose by using a modified version
of the Mac native code (a modification that's also used for Mac <-> 'net
code mapping in fine Internet programs like Fetch).  Fourteen characters
of the Mac native code are swapped in favour of fourteen that are needed
to make up the iso-8859-1 repertoire.
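
For anyone who wants to see which characters are involved, a small
sketch follows that lists the iso-8859-1 upper-half characters a given
code cannot represent. It relies on Python's "mac_roman" codec, which
follows a later Apple mapping table than the one in use here, and it
counts every position from 160 to 255, so the number it reports need
not come out at exactly fourteen. The helper name is hypothetical.

```python
# Every character in the upper half of iso-8859-1 (positions 160-255).
LATIN1_UPPER = [chr(cp) for cp in range(0xA0, 0x100)]


def missing_from(codec):
    """Return the iso-8859-1 upper-half characters `codec` cannot encode."""
    missing = []
    for ch in LATIN1_UPPER:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


mac_gaps = missing_from("mac_roman")
print(len(mac_gaps), " ".join(mac_gaps))
```

The characters it turns up (the multiplication sign, the vulgar
fractions, eth and thorn, and so on) are the ones a modified Mac code
has to make room for.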

>  Characters in the range 128-159 are invalid on web pages
> and if used will give different results on different platforms because
> the character set specified for use with HTML does not have printable
> characters in those positions.

Quite right.  One oddity: 127 is a control code in the lower half, but 
its partner 255 is a valid displayable character in the upper half.
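
The asymmetry is easy to check mechanically. A minimal sketch, using
the Unicode general category as a stand-in for "control code" (that
choice of test is mine; the first 256 Unicode code points coincide
with iso-8859-1, so chr() serves as the decoder here):

```python
import unicodedata


def is_control(byte_value):
    """True if this iso-8859-1 position is a control code (category Cc)."""
    return unicodedata.category(chr(byte_value)) == "Cc"


# Positions in the upper half that are control codes, not printable.
upper_controls = [b for b in range(128, 256) if is_control(b)]
```

With this, is_control(127) is true and is_control(255) is false, and
upper_controls covers exactly the range 128-159 mentioned in the
quoted text.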

> All of which is rather off-topic for either group it's posted to except for
> the fact that people who read sci.lang may have been tempted to put up web
> pages exploiting the additional characters available in the invalid
> positions.

I can't blame them for wanting.  I've been waiting at least 3 years to
see some useful progress in actual deployed software, viz. to the point
where I as an author can feel confident using this i18n stuff.  The
Alis Tango browser serves as a "proof by example", but, when authoring
for the WWW, it's not polite to harangue one's readers into getting a
better browser. 

If people are interested, I have some pointers for further reading at 
http://ppewww.ph.gla.ac.uk/~flavell/iso8859/

(all usual disclaimers etc.)

