Newsgroups: sci.lang
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!gatech!news-feed-1.peachnet.edu!news.netins.net!internet.spss.com!markrose
From: markrose@spss.com (Mark Rosenfelder)
Subject: Re: Protoworld: tank = liquid container
Message-ID: <D95pAG.DE@spss.com>
Sender: news@spss.com
Organization: SPSS Inc
References: <3otjap$sfo@news.ccit.arizona.edu> <3ptupv$5jp@ixnews4.ix.netcom.com> <3q2i3o$gpe@news.ccit.arizona.edu>
Date: Thu, 25 May 1995 23:01:28 GMT
Lines: 64

In article <3q2i3o$gpe@news.ccit.arizona.edu>,
Hung J Lu <hlu@GAS.UUG.Arizona.EDU> wrote:
>Let us consider a simplified model of world languages.
>(Of course, this is just a model, but it shows a point.)
>Let us suppose that there are N sounds to signify N
>objects/concepts, and that different languages match
>these N sounds to the N objects/concepts differently.
>(N is just a number, potentially a very large number.)
>
>Question: pick two languages at random, how many
>	  words in their vocabularies would have
>	  the same sound meaning the same object/concept?
>
>	  In particular, if N increases, do you expect
>	  two languages to share more, or less, identical
>	  words with identical meanings?
>
>Answer: ONE.
>
>Yes, no matter how large N is, on the average, we would expect
>two different languages to share only one word with identical
>sound and identical meaning.

Uh, please explain the reasoning, if any, underlying that conclusion.

Here's something I posted here a few years ago on this question:

We'd like to know how many word-meaning matches we can expect to find
between languages based solely on chance.  There are all sorts of
complications here: how close do the words have to be?  what do you count
as a word?  as a meaning?

I think we can ignore most of the complications, based on an annoying
fact about discussions like this: people are awfully lenient about possible
correspondences.  In [one sci.lang] discussion about "mama", for instance,
some people seem to be saying that any nasal sound in the word for mother
is a suspicious coincidence.  In other words, nobody's being very exact here.

So let's move merrily ahead.  Let's suppose that all languages contain
10,000 words, each with a different meaning.  (Obviously most languages
contain many more words; but the additional words are often technical and
obviously borrowed anyway.)  For each word in language A, find the
word that most sounds like it in language B.  (This procedure eliminates
the need to estimate how likely it is for languages to have the "same word"
and, again, matches people's low standards for matches in games like this.)
Now, what is the chance that this word has the same meaning?  As we've
defined the problem, it's obviously 1 in 10,000.
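
For anyone who'd rather simulate than calculate, here's a rough sketch of
the procedure in Python.  It assumes something the argument above only
implies: that the best-sounding partner in language B carries a meaning
picked uniformly at random, independently for each word.  The function
names, the seed, and the smaller vocabulary of 1,000 words (chosen only so
it runs quickly) are my own inventions for illustration.

```python
import random

def count_chance_matches(n, rng):
    """For each of n words in language A, assume its best-sounding partner
    in language B has a uniformly random meaning (an assumption of this
    sketch), and count how often that meaning is the same one."""
    return sum(1 for meaning in range(n) if rng.randrange(n) == meaning)

def average_matches(n=1000, trials=200, seed=1995):
    """Average number of chance matches over many random language pairs."""
    rng = random.Random(seed)
    total = sum(count_chance_matches(n, rng) for _ in range(trials))
    return total / trials

print(average_matches())
```

Under these assumptions the average should hover around one match per
pair of languages, whatever vocabulary size you plug in.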

But there's more words where that came from.  What's the chance that, among
*all ten thousand words*, we'll find some word-meaning matches?
Well, the chance that you'd find r matches is, if I'm not mistaken,
(10000!/(r!(10000-r)!)) (.0001^r) (.9999^(10000-r)).

It turns out that there's about a 63% chance that you'll find between
1 and 5 matches with this procedure.  In other words, it would be quite
surprising if there were *no* matches.
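
If you want to check the arithmetic, here's a minimal sketch in Python.
It just evaluates the binomial formula for 10,000 words and a 1 in 10,000
chance per word, then adds up the terms for 1 through 5 matches.  The
function names are mine, and the direct use of math.comb is only meant
for small r like these (for large r the binomial coefficient gets too big
to convert to a float).

```python
from math import comb

def match_probability(r, n=10000):
    """Chance of exactly r matches among n words, each matching with
    probability 1/n (binomial distribution). Intended for small r."""
    p = 1.0 / n
    return comb(n, r) * p**r * (1.0 - p)**(n - r)

def prob_between(lo, hi, n=10000):
    """Chance that the number of matches falls between lo and hi inclusive."""
    return sum(match_probability(r, n) for r in range(lo, hi + 1))

print(f"no matches:     {match_probability(0):.0%}")
print(f"1 to 5 matches: {prob_between(1, 5):.0%}")
```

The remaining sliver of probability, for six or more matches, is well
under 1%.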

Curiously enough, this probability doesn't change much if you pick a
higher number than 10,000.  The probability that any one word matches goes
down, but you get more chances; the chance of at least one match stays
close to 1 - 1/e, or about 63%.
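
To see how little the vocabulary size matters, here's a small sketch that
computes the chance of at least one match for several sizes.  Note the
simplification: it looks at "one or more" matches rather than "between 1
and 5", which differs only by the tiny chance of six or more.  The sizes
tried and the function name are arbitrary choices of mine.

```python
from math import exp

def prob_at_least_one(n):
    """Chance of one or more matches among n words, each matching
    independently with probability 1/n."""
    return 1.0 - (1.0 - 1.0 / n) ** n

for n in (1_000, 10_000, 100_000, 1_000_000):
    print(n, round(prob_at_least_one(n), 4))

# Value the probabilities approach as n grows: 1 - 1/e
print("limit", round(1.0 - exp(-1.0), 4))
```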

Once you've got your matches, of course, your work has only begun.  Are
they due to chance, to shared descent, to borrowing, to onomatopoeia,
to a mistake in your judgment of similarity, or what?
