Newsgroups: comp.speech
Path: pavo.csi.cam.ac.uk!doc.ic.ac.uk!warwick!zaphod.crihan.fr!jussieu.fr!centre.univ-orleans.fr!univ-lyon1.fr!swidir.switch.ch!scsing.switch.ch!xlink.net!howland.reston.ans.net!europa.eng.gtefsd.com!library.ucla.edu!ihnp4.ucsd.edu!pacbell.com!att-out!walter!speech!dk
From: dk@speech.bellcore.com (Dan Kahn)
Subject: US dialectal variation and ASR databases
Message-ID: <CMIDpy.7CB@walter.bellcore.com>
Sender: Dan Kahn
Nntp-Posting-Host: speech.bellcore.com
Organization: Bellcore, Morristown, NJ
Date: Fri, 11 Mar 1994 16:27:33 GMT
Lines: 51

A recent poster said:
>    I'm collecting speech data over the telephone for 
...
> automatic speech recognition product.
...
> 3.  The system will prompt you to say a list of about 60 words.
> 4.  You will be asked one question:  what state did you grow up in?

This "what state" question is probably included for the purpose of
dialect classification.  (I can't be certain that's true in this
case, but the "grow up" reference strongly suggests it.)  In any
event, it's a question often asked by collectors of speech
recognition databases to classify a speaker's "dialect."

In fact, however, the answer to that question provides far less
useful information than one might at first guess.

For one thing, dialects do not respect state boundaries.  It's hard
to find American dialects more different than those of Boston and
the Berkshires, both in Massachusetts, or Brooklyn and Buffalo, both
in New York.  Many of the other Eastern and Southern states offer
similar examples.  On the other hand, there are states separated
by a thousand miles whose speech differences are negligible compared
to those just cited.  "Grew up in same state" is neither a necessary
nor a sufficient condition for dialect similarity.

Even more important is that sociolinguistic factors introduce so
much variance that they largely swamp geographic differences.
Forget _states_ - you can find two neighborhoods in the same
_town_ whose born-and-bred residents differ much more from each
other in dialect than either group does from a sociolinguistically
matched neighborhood literally thousands of miles away.

Then there's the sampling problem.  From the universe of friends,
colleagues and internet frequenters you could build a database
of dozens of speakers from each of the 50 states and still not
scratch the surface of American dialectal variation.

I was going to conclude, jokingly, that as a means of speech-pattern
classification you'd do little worse by asking "how tall are you?"
than "what state are you from?"  But it occurs to me that if you
were allowed only a single question, the height data might well
correlate _better_ than the state data with commonly extracted
speech parameters, because the height distribution and the
F0/formant distributions are both bimodal along male/female lines.
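
The height-versus-state comparison can be sketched in a few lines of
Python.  This is only a toy illustration of the mechanism: the
speaker records, the state labels "A" and "B", and the function names
below are all invented for the example, so the output says nothing
about real American speech.  It compares how much of the variance in
mean F0 is associated with height (squared Pearson correlation) and
with state (correlation ratio, eta squared).

```python
from math import sqrt

# Hypothetical speakers: (state, sex, height_cm, mean_f0_hz).
# Every value here is invented purely for illustration.
SPEAKERS = [
    ("A", "F", 162, 210), ("A", "F", 168, 195),
    ("A", "M", 175, 125), ("A", "M", 183, 110),
    ("B", "F", 160, 220), ("B", "F", 166, 200),
    ("B", "M", 178, 120), ("B", "M", 181, 105),
]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def eta_squared(labels, values):
    """Share of the variance in `values` lying between label groups."""
    grand = sum(values) / len(values)
    total = sum((v - grand) ** 2 for v in values)
    groups = {}
    for label, v in zip(labels, values):
        groups.setdefault(label, []).append(v)
    between = sum(
        len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values()
    )
    return between / total


states = [s[0] for s in SPEAKERS]
sexes = [s[1] for s in SPEAKERS]
heights = [s[2] for s in SPEAKERS]
f0s = [s[3] for s in SPEAKERS]

height_r2 = pearson_r(heights, f0s) ** 2   # variance shared with height
state_eta2 = eta_squared(states, f0s)      # variance between states
sex_eta2 = eta_squared(sexes, f0s)         # variance between sexes

print(f"height r^2   = {height_r2:.3f}")
print(f"state eta^2  = {state_eta2:.3f}")
print(f"sex eta^2    = {sex_eta2:.3f}")
```

In this made-up sample height tracks F0 closely only because both
split along male/female lines, while the state label, which is
balanced across the sexes, is associated with almost none of the F0
variance.  A real test would need measured data.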

Hope this is of some help to planners of speech databases meant
to be truly representative of American speech.

Dan Kahn
Bellcore
