Newsgroups: sci.lang
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!newsfeed.pitt.edu!newsflash.concordia.ca!news.nstn.ca!ott.istar!istar.net!van.istar!west.istar!n1van.istar!van-bc!nntp.portal.ca!news.bc.net!info.ucla.edu!newsfeed.internetmci.com!in3.uu.net!psinntp!psinntp!psinntp!commpost!usenet
From: pardoej@lonnds.ml.com (Julian Pardoe LADS LDN X1428)
Subject: Re: Languages: Hard, Harder, Hardest
Message-ID: <Dv7GI2.AGF@tigadmin.ml.com>
Sender: usenet@tigadmin.ml.com (News Account)
Reply-To: pardoej@lonnds.ml.com
Organization: Merrill Lynch Europe
References: <rte-2407961026550001@135.25.40.118>
Date: Sat, 27 Jul 1996 14:06:02 GMT
Lines: 39

In article <rte-2407961026550001@135.25.40.118>, rte@elmo.lz.att.com (Ralph T. Edwards) writes:
-->In article <7fd91mffp2.fsf@wisdom.cs.hku.hk>, sdlee@cs.hku.hk (Lee Sau Dan
-->~{@nJX6X~}) wrote:
-->
-->> >>>>> "Ralph" == Ralph T Edwards <rte@elmo.lz.att.com> writes:
-->> 
-->> 
-->>     Ralph> Because Russian has pretty much a one-symbol-for-one-sound
-->>     Ralph> system of representation, fewer characters are required
-->>     Ralph> than in, say, Dutch, English or Finnish, which often use
-->>     Ralph> digraphs like th, ea, aa. This is an artifact of the
-->>     Ralph> writing system and does not reflect the complexity of the
-->>     Ralph> real (spoken) language.  French has many silent letters.
-->> 
-->> Shall we also consider the size of the alphabet?  English has an
-->> alphabet size of 26.  So each letter is worth log_2(26) [the base 2
-->> logarithm of 26] bits of computer storage.  So we shall multiply the
-->> figure for English by log_2(26) to find out how many bits are
-->> actually required to store the English version.  Since Russian has 33
-->> (I may be wrong; correct me please) letters in the alphabet, each
-->> letter embeds more information than an English letter.  Each Russian
-->> letter is worth log_2(33) bits.  Its figure shall be multiplied by
-->> log_2(33) before making a "fair" comparison.
-->
-->I think more direct would be coming up with a phoneme to symbol average
-->for both languages, correcting for that, and then applying your theory to
-->the phoneme count.  Of course then one has the problem of how to count
-->phonemes.  Is Russian (y)e one or two?  Is a Finnish long vowel a separate
-->phoneme or vowel + length?
-->Is English ou, ay one or two?  It would be nice to come up with a theory
-->that didn't depend on arbitrary decisions or arbitrary alphabets.
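
For what it's worth, the bits-per-letter correction quoted above can be
sketched in a few lines of Python.  This is only an illustration: the
function names and the letter counts are made up for the example, and it
assumes every letter of an alphabet is equally likely, which overstates
the information a real letter carries.

```python
import math

def bits_per_letter(alphabet_size: int) -> float:
    """Bits needed per letter if all letters are equally likely."""
    return math.log2(alphabet_size)

def storage_bits(letter_count: int, alphabet_size: int) -> float:
    """Letter count weighted by log2 of the alphabet size."""
    return letter_count * bits_per_letter(alphabet_size)

# Hypothetical letter counts for the same text in each language;
# these numbers are placeholders, not measurements.
english_bits = storage_bits(1000, 26)   # 26-letter alphabet
russian_bits = storage_bits(1000, 33)   # 33-letter alphabet

print(f"English: {english_bits:.0f} bits, Russian: {russian_bits:.0f} bits")
```

With equal letter counts the Russian figure comes out larger, since
log_2(33) is about 5.04 against about 4.70 for log_2(26).  The same
weighting could be applied to phoneme or morpheme inventories instead.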

I think that one wants to count morphemes, not phonemes.  (Of course,
deciding what counts as a morpheme will involve a degree of arbitrariness.
Is it <morpheme> or <morph><eme>? -- people seem to feel pretty free to attach
the ending "-eme" to just about anything.)

-- jP --

