Newsgroups: sci.lang
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!newsfeed.pitt.edu!godot.cc.duq.edu!newsgate.duke.edu!news.mathworks.com!newsfeed.internetmci.com!in2.uu.net!psinntp!psinntp!psinntp!commpost!usenet
From: pardoej@lonnds.ml.com (Julian Pardoe LADS LDN X1428)
Subject: Re: Languages: Hard, Harder, Hardest
Message-ID: <Dv57o7.BF1@tigadmin.ml.com>
Sender: usenet@tigadmin.ml.com (News Account)
Reply-To: pardoej@lonnds.ml.com
Organization: Merrill Lynch Europe
References: <7f7mrv56qc.fsf@wisdom.cs.hku.hk>
Date: Fri, 26 Jul 1996 09:00:07 GMT
Lines: 48

In article <7f7mrv56qc.fsf@wisdom.cs.hku.hk>, sdlee@cs.hku.hk (Lee Sau Dan ~{@nJX6X~}) writes:
-->>>>>> "Patrick" == Patrick Juola <patrick@gryphon.psych.ox.ac.uk> writes:
-->
-->    Patrick> As you point out, "greatly" is of course a matter of
-->    Patrick> opinion.  Please let me provide some numbers.  Consulting
-->    Patrick> my sources, I have on-line translations of the Bible in
-->    Patrick> the following languages (with sizes) :
-->
-->    Patrick>     English (NRV)   4,379,692 bytes
-->    Patrick>     Russian         3,575,074   "
-->    Patrick>     Dutch           4,542,254   "
-->    Patrick>     French          4,311,550   "
-->    Patrick>     Finnish         4,229,221   "
-->    Patrick>     Maori           4,639,731   "
-->
-->    Patrick> The maximum difference in this case is less than 30%
-->    Patrick> (Maori/Russian), and notably is a difference between the
-->    Patrick> language with the smallest character set and the largest,
-->    Patrick> again indicating "conservation of complexity."  On the
-->    Patrick> basis of this data, representing at least four maximally
-->    Patrick> independent linguistic groups(*) by the way, I think it's
-->    Patrick> reasonable to conclude that (scholastic translations of)
-->    Patrick> Bibles don't vary greatly in size.
-->
-->Do you have  the figure for Chinese?  I've  once read from a book that
-->the Chinese  version of the piles  of official documents in the United
-->Nation is much thinner than  the English, French, Spanish, Russian and
-->Arabic  counterparts.  I'd like  to know  if it is   also true for the
-->bible.

Note what Patrick says about character-set sizes.  One would expect a
language using a larger character set to require fewer characters.  One
could imagine encoding English using a scheme like that of Japanese.
The Bible would then require far fewer characters.  However, the nature
of English as a language wouldn't have changed.  Furthermore, we would
still be no nearer to deciding whether an alphabetic or
ideo/logo/morpho-graphic system is "better" (whatever that means).

(You don't even need to go so far as to use a system like that of Japanese.
One could just encode frequent words and letter-combinations as new
letters.  The size of the Bible in English would then shrink dramatically.
So what?)
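
To make the point concrete, here is a rough sketch in Python of that
kind of re-encoding.  It is only an illustration under my own
assumptions: the function names are made up, the "new letters" are
taken from the Unicode private-use area (so the input is assumed not to
contain any already), and the greedy pair-merging is one simple way of
picking frequent letter-combinations, not the only one.

```python
from collections import Counter

def merge_frequent_pairs(text, rounds):
    """Repeatedly replace the most frequent adjacent pair with one new letter.

    Returns the re-encoded text and a table mapping each new letter
    back to the ordinary characters it stands for.
    """
    symbols = list(text)
    table = {}
    next_code = 0xE000  # private-use code points serve as the new letters
    for _ in range(rounds):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing repeats any more, so no further saving
        new = chr(next_code)
        next_code += 1
        # Store the full expansion, since a or b may itself be a new letter.
        table[new] = table.get(a, a) + table.get(b, b)
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                merged.append(new)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return "".join(symbols), table

def decode(encoded, table):
    """Expand every new letter back into ordinary characters."""
    return "".join(table.get(ch, ch) for ch in encoded)

sample = "in the beginning was the word and the word was with the word " * 3
encoded, table = merge_frequent_pairs(sample, 20)
print(len(sample), "characters before,", len(encoded), "after")
```

The character count drops while decode() gives back exactly the same
English text: the count depends on the script, not on the language.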

This isn't to say that Patrick's figures are meaningless.  When talking
about the "size" of some document in various languages we probably want
to be counting morphemes.  For the languages he cites, character count
is probably a good proxy for morpheme count, as they all use similar
alphabetic systems.
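
For what it's worth, the "less than 30%" figure is easy to check from
the sizes Patrick quotes.  A small Python sketch follows; the variable
names are mine, and it simply treats the byte counts as given.

```python
# Sizes in bytes, as quoted from Patrick's article.
sizes = {
    "English (NRV)": 4_379_692,
    "Russian": 3_575_074,
    "Dutch": 4_542_254,
    "French": 4_311_550,
    "Finnish": 4_229_221,
    "Maori": 4_639_731,
}

largest = max(sizes, key=sizes.get)
smallest = min(sizes, key=sizes.get)

# How much bigger the largest text is than the smallest, as a fraction.
spread = sizes[largest] / sizes[smallest] - 1

print(f"{largest} is {spread:.1%} larger than {smallest}")
```

The largest and smallest texts are indeed Maori and Russian, and the
gap comes out just under 30%.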

-- jP --

