Newsgroups: comp.speech
Path: pavo.csi.cam.ac.uk!doc.ic.ac.uk!mrccrc!news.dcs.warwick.ac.uk!warwick!zaphod.crihan.fr!univ-lille1.fr!ciril.fr!news.imag.fr!univ-lyon1.fr!swidir.switch.ch!scsing.switch.ch!xlink.net!howland.reston.ans.net!pipex!uunet!zib-berlin.de!netmbx.de!Germany.EU.net!EU.net!sun4nl!sci.kun.nl!psych4.psych.kun.nl!user
From: hartsuiker@nici.kun.nl (Rob Hartsuiker)
Subject: Re: integrating psycholinguistics with machine speech recognition?
Message-ID: <hartsuiker-010294092443@psych4.psych.kun.nl>
Followup-To: comp.speech,sci.lang,sci.psychology
Sender: news@sci.kun.nl (News owner)
Nntp-Posting-Host: psych4.psych.kun.nl
Organization: NICI 
References: <2ikor6$gs3@hobbes.cc.uga.edu>
Date: Tue, 1 Feb 1994 08:27:44 GMT
Lines: 60

In article <2ikor6$gs3@hobbes.cc.uga.edu>, pbrunk@aisun2.ai.uga.edu (Paul
Brunk [MSAI]) wrote:
> 
> Hi all:
> 
> I'm working on an MS in artificial intelligence.  The topic of this thesis
> is the actual and possible theoretical interface between psycholinguistic
> research and automatic speech recognition research.  In particular, I'm
> interested in how human linguistic knowledge could be integrated into an
> automatic speech recognition so as to prune the lexical search space in an
> efficient manner  (..)
> 
> My question: does anyone know of research (articles, systems, etc) which
> addresses this theoretical interface?

(..)

A few weeks ago, Dr. Anne Cutler gave a lecture called 'Lexical access from
continuous speech input'. As the title says, her research is concerned with
how we retrieve lexical items from a continuous speech stream, in which
there are no clear markers for word beginnings or endings. She contrasted
three kinds of models:
1. Segmentation models that use prosody (i.e. 'strong vs. weak vowels') as
the most important cue - this is her own work
2. Sequential recognition models, like Marslen-Wilson's Cohort model
3. Competition models
   - 'TRACE' by McClelland and Elman (see the PDP volumes)
   - 'SHORTLIST' by Norris & Cutler

To summarize and simplify Anne Cutler's talk: the first kind of model can
account for segmentation in an extremely large number of cases - this is
based on studies in which a corpus of 30,000 English words was analyzed
statistically.
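For concreteness, here is a toy sketch (in Python, and not from the talk
itself) of the prosody-based idea: hypothesize a word boundary at every
strong syllable. The strength labels and the example utterance are my own
inventions; in real speech, strength would have to be derived from vowel
quality.

```python
# Toy sketch of a prosody-driven segmenter in the spirit of Cutler's
# strategy: start a new word hypothesis at every strong syllable.
# Syllable strength labels are assumed given (invented for this example).

def segment_on_strong(syllables):
    """syllables: list of (syllable, is_strong) pairs.
    Returns candidate words, opening a new word at each strong syllable."""
    words, current = [], []
    for syl, strong in syllables:
        if strong and current:      # strong syllable -> new word onset
            words.append("".join(current))
            current = []
        current.append(syl)
    if current:
        words.append("".join(current))
    return words

# 'lettuce salad' -> LET-tuce SAL-ad (strength labels invented)
utterance = [("let", True), ("tuce", False), ("sal", True), ("ad", False)]
print(segment_on_strong(utterance))
```

Note that this strategy missegments weak-initial words (e.g. 'conduct' as
a verb, con-DUCT), which is why it is a heuristic cue rather than a
complete solution.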
Sequential recognition models like Cohort run into trouble, because
monosyllabic words can often be continued (star -> startled, started),
most content words are monosyllabic, and most polysyllabic words contain
other words embedded within them.
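The embedding problem can be illustrated with a toy Python sketch of
Cohort-style sequential pruning (the mini-lexicon is invented):

```python
# Toy sketch of Cohort-style sequential recognition: as each phoneme
# (here simply a letter) arrives, prune the cohort of candidates down
# to those still consistent with the input. Lexicon is invented.

LEXICON = ["star", "start", "started", "startled", "car", "cart"]

def cohort(input_so_far):
    """Return all lexical candidates compatible with the input so far."""
    return [w for w in LEXICON if w.startswith(input_so_far)]

for prefix in ["s", "st", "star", "start"]:
    print(prefix, "->", cohort(prefix))
# Even after hearing all of 'star', the cohort still contains 'start',
# 'started' and 'startled': reaching a word's end is not a reliable
# recognition point, which is exactly the embedding problem.
```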
The third kind of model fares better, on Cutler's account. The TRACE model
has an extremely implausible architecture (it replicates the entire
lexicon at every time step). The SHORTLIST model, however, does better.
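A toy Python sketch of the competition idea (not the actual SHORTLIST
algorithm - the lexicon and the coverage-based scoring below are my own
simplifications): first generate a shortlist of candidate words embedded
anywhere in the input, then let overlapping candidates compete, keeping
the non-overlapping set that covers the most input.

```python
# Toy sketch in the spirit of a competition model: candidates that
# overlap in the input compete; the winning parse covers the most input.
# Lexicon and scoring are invented simplifications.

LEXICON = ["ship", "shipment", "men", "mend", "dinner", "inn"]

def shortlist(signal):
    """All (start, end, word) candidates embedded in the signal."""
    return [(i, i + len(w), w)
            for w in LEXICON
            for i in range(len(signal) - len(w) + 1)
            if signal[i:i + len(w)] == w]

def compete(signal):
    """Dynamic programming stand-in for lateral inhibition: keep the
    non-overlapping candidate set covering the most of the signal."""
    n = len(signal)
    best = [(0, [])] * (n + 1)   # best[j]: (letters covered, words) up to j
    for j in range(1, n + 1):
        best[j] = best[j - 1]                       # option: cover nothing here
        for (i, end, w) in shortlist(signal):
            if end == j and best[i][0] + len(w) > best[j][0]:
                best[j] = (best[i][0] + len(w), best[i][1] + [w])
    return best[n][1]

print(compete("shipmendinner"))
# 'mend' loses to 'men' + 'dinner' because the latter pair covers more
# of the input - an overlapping candidate is beaten by its competitors.
```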

So, as you can see, work has been done on integrating prosodic and
lexical knowledge with speech recognition. However, I am unaware of any
attempt to include top-down influences: syntactic, semantic and pragmatic
constraints. It's all too obvious, for example, that knowing what is being
talked about greatly helps in understanding an ambiguous acoustic signal.
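To illustrate what such a top-down constraint might look like
computationally, here is a toy Python sketch (all numbers and the tiny
'topic model' are invented): re-rank acoustically ambiguous word
hypotheses with a topic-conditional prior.

```python
# Toy sketch of a top-down semantic constraint: choose among acoustically
# ambiguous hypotheses by maximizing P(acoustics|word) * P(word|topic).
# All probabilities below are invented for illustration.

# acoustic likelihoods for an ambiguous stretch of signal (homophones tie)
acoustic = {"sale": 0.5, "sail": 0.5}

# P(word | topic): a stand-in for knowing what is being talked about
topic_prior = {
    "shopping": {"sale": 0.9, "sail": 0.1},
    "boating":  {"sale": 0.1, "sail": 0.9},
}

def recognize(topic):
    """Pick the word maximizing acoustic score times topical prior."""
    return max(acoustic, key=lambda w: acoustic[w] * topic_prior[topic][w])

print(recognize("shopping"))   # -> sale
print(recognize("boating"))    # -> sail
```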

Hope this is of some use,
	Rob


> --
> Paul Brunk, system administrator
(..)

Robert J. Hartsuiker
NICI, KU Nijmegen      
hartsuiker@nici.kun.nl      
phone: +31 80 612608       
                            
