Associate Teaching Professor
Language Technologies Institute www.cs.cmu.edu/~max
6413 Gates Hillman Complex Tel. 412 268-3858
I am chair of SLaTE, the special interest group on Speech and Language Technology for Education of ISCA (the International Speech Communication Association). You can find SLaTE at: www.sigslate.org
GOALS | INTERESTS | PROJECTS | STUDENTS | PUBLICATIONS | TEACHING | CARNEGIE SPEECH COMPANY | RESOURCES
Understanding the factors that affect the variability of the speech signal. Creating automatic systems (automatic speech recognition and synthesis) that draw on this knowledge and provide real benefit to end users. This endeavor means studying groups of speakers, input conditions, and styles of speech, and detecting the acoustic and higher-level indices that are indicative of these variants.
I am interested in the variability of the speech signal, its sources and its manifestations: differences among groups of speakers, variation in the manner in which they speak, and the conditions in which they find themselves. Non-native speech is one particular interest within this area, as is speaking style.
I am also interested in the manner in which a foreign language can be taught effectively, either by a human or by a computer. This involves how the information is presented, which information is presented, and how that information is chosen. One present interest here is a Gestaltist approach to teaching the new sounds of a second language. Another interest is teaching culture using “pinpointing” as it was developed in my work on non-native pronunciation error detection, where the specific error is shown in context and the corrective help offered is specific to the error.
That system to detect and correct foreign speakers’ pronunciation errors in English is called Fluency, and the basic algorithms developed in that project have been spun off into the NativeAccent™ product sold by the company I started, Carnegie Speech™. So, I am very interested in seeing research results in use in real life! The Let’s Go system has been answering the phone for the Port Authority of Allegheny County every evening since the beginning of March 2005. And I hope that the REAP system will also get into the hands of many students, native and non-native, who want to learn to read better.
Fluency – a project to use automatic speech
recognition to detect pronunciation errors and to provide appropriate
correction information – contact “max at cs dot cmu
dot edu” for more information.
Let’s Go – a project using a spoken
dialogue system to expand access to such systems to the elderly and to
non-native speakers. http://www.speech.cs.cmu.edu/letsgo/
REAP – a project to retrieve appropriate,
individuated texts for students learning to read http://hartford.lti.cs.cmu.edu/Reap/
LTI PhD – Antoine Raux
Masters in Language Technology (MLT) – Jonathan Brown (g), James Sanders, Michael Heilman, Carol Sisson
Master of Computer-Assisted Language Learning (MCALL) – Jie Hu (g)
Computer-Assisted
Language Learning
Pronunciation
Eskenazi, M., (1999) Issues in the use
of speech recognition for foreign language tutors, invited paper: Language
Learning and Technology Journal (online) Vol. 2, No. 2, January 1999, pp.
62-76. http://llt.msu.edu/vol2num2/article3/index.html
Probst, K., Ke, Y., Eskenazi, M.,
2002, Enhancing foreign language tutors - in search of the golden speaker,
Speech Communication, 37/3-4 pp. 161-173.
Eskenazi, M., Pelton,
G. 2002, Pinpointing pronunciation errors in children’s speech: examining the
role of the speech recognizer, Proposed to the Pronunciation Modeling and
Lexicon Adaptation for Spoken Language Technology Workshop, Sept 2002,
Colorado. .pdf file
Eskenazi, M., Ke, M.,
Albornoz, J., Probst, K., 2000. Update on the Fluency
Pronunciation Trainer, In: Proceedings of InSTIL
2000,
Mayfield Tomokiyo, L., Wang, L.,
Eskenazi, M., 2000, An Empirical Study of the Effectiveness of
Speech-Recognition-based Pronunciation Training, Proc. ICSLP 2000,
Eskenazi, M., Hansma, S., 1998, The
Fluency Pronunciation Trainer, Proc. STiLL Workshop on Speech Technology in
Language Learning, Marholmen, May. .pdf file
Eskenazi, M., Hansma, S., Semp, M.,
Warner, R., 1998, By ear and by eye - adaptive tutoring for foreign language
pronunciation training, in Proc. STiLL Workshop on Speech Technology in
Language Learning, Marholmen. .pdf file
Callan, J., Eskenazi, M., Perfetti, C., 2006, Progress in Providing Reader-Specific Lexical Practice for Improved Reading Comprehension, presented at IES 2006 research conference, June 15-16 2006, Washington DC
Juffs, A., Eskenazi, M., Wilson, L., Pelletreau, T., Sanders, J., Callan, J., Brown, J., 2006, Promoting robust learning of vocabulary through computer assisted language learning, Proc. Joint conference of AAAL and ACLA/CAAL 2006, Montreal, June 2006.
Juffs, A.,
Brown, J., Eskenazi, M., 2006,
Using Simulated Students for the Assessment of Authentic Document Retrieval,
ITS2006,
Heilman,
M., Eskenazi, M., 2006, Language Learning: Challenges for Intelligent Tutoring
Systems, Workshop on Ill-defined Domains in Intelligent Tutoring,
Heilman, M., Collins-Thompson, K., Callan, J., Eskenazi, M., 2006, Classroom Success of an Intelligent Tutoring System for Lexical Practice and Reading Comprehension, Proc. Interspeech2006, Pittsburgh September 2006. .pdf file
J. Brown, G. Frishkoff,
and M. Eskenazi. (2005). "Automatic question generation for vocabulary
assessment." In Proceedings of HLT/EMNLP 2005.
J. Brown and M. Eskenazi. (2005).
"Student, text and curriculum modeling for reader-specific document
retrieval." In Proceedings of the IASTED International Conference on
Human-Computer Interaction 2005.
Brown, J., Eskenazi, M., 2004, Retrieval of
Authentic Documents for Reader-Specific Lexical Practice, Proceedings INSTIL
2004,
Non-native speech
Eskenazi, M., Raux, A., Harris, E., 2006, “Using speech recognition for
just-in-time language learning,” J. Acoust. Soc. Am., vol. 120, no. 5, pt. 2, p. 3138. THE SLIDES FROM MY
TALK ARE HERE
Raux, A., Eskenazi, M., 2004,
Using Task-Oriented Spoken Dialogue Systems for Language Learning: Potential,
Practical Applications and Challenges, Proceedings INSTIL 2004,
Raux, A., Eskenazi, M., 2004,
Non-native users in the Let’s Go!! Spoken Dialogue System: Dealing with
Linguistic Mismatch, Proceedings HLT 2004,
Raux, A., Langner, B.,
Black, A., Eskenazi, M., 2003, LET’S GO: Improving Spoken Dialog Systems for the Elderly and
Non-natives, Proc. Eurospeech 2003,
Spoken
Dialogue
Raux,
A., Bohus, D., Langner, B.,
Black, A., Eskenazi, M., 2006, Doing Research on a Deployed Spoken Dialogue
System: One Year of Let’s Go! Experience, Proc. Interspeech 2006,
A. Raux, B. Langner, D. Bohus, A. W Black and M. Eskenazi, Let's Go Public! Taking
a Spoken Dialog System to the Real World, Interspeech 2005
Eskenazi, M.,
1998, User Come Back, DARPA Communicator Compare and Contrast Meeting, June
16-17, 1998. .pdf file
Ravishankar, M. and Eskenazi, M.,
1997, Automatic Generation of Context-dependent Pronunciations, Proc.
Eurospeech ’97,
Placeway, P., Chen, S., Eskenazi, M., Jain, U., Parikh, V., Raj, B., Ravishankar, M.,
Rosenfeld, R., Seymore, K., Siegler, M., Stern, R., Thayer, E., 1997, The 1996 HUB-4 Sphinx-3 System, Proc. DARPA
Speech Recognition Workshop, Chantilly, Virginia, Morgan Kaufmann Publishers.
Seymore, K., Chen, S., Eskenazi, M., Rosenfeld, R., 1997, Language and Pronunciation Modelling in the CMU 1996 HUB-4 Evaluation, Proc. DARPA
Speech Recognition Workshop, Chantilly, Virginia, Morgan Kaufmann Publishers.
Elderly
speech
Eskenazi, M., Black, A., Simmons, R., 2002, Elderly Perception of Speech from a Computer,
Meeting of the
Acoustical Society of
Eskenazi, M., Black, A., 2001. A study
on speech over the telephone and aging, Proc. Eurospeech01,
Speaking
Styles
Eskenazi, M. 1993. Trends in Speaking Style
Research, Keynote speech, Proceedings Eurospeech’93,
Eskenazi, M., 1995, Hot Topics in
Speaking Style Research, in European Studies in Phonetics and Speech
Communication, Bloothooft, Hazan, Huber, Llisterri, eds., OTS Publications, The
Netherlands, pp. 58-62.
Eskenazi, M., Lacheret,
A., 1991, Exploration of individual strategies in continuous speech, Speech Communication,
vol. 10 no. 3.
Eskenazi, M. 1992. Changing speech
styles, speakers’ strategies in read speech and careful and casual spontaneous
speech. Proceedings of the International Conference on Spoken Language
Processing,
Data collection and
assessment
Eskenazi, M., Rudnicky, A., Gregory,
K., Constantinides, P., Brennan, R., Bennett, C.,
Allen, J., 1998, Data Collection and Processing in the Carnegie Mellon
Communicator, in Proc. ESCA Eurospeech 98. .pdf file
Eskenazi, M., 1996, KIDS: A Database of Children's Speech, in Proc. 3rd joint
Meeting: Acoustical Societies of America and
Eskenazi, M., Hogan, C., Allen, J.,
Frederking, R., 1998, Issues in database design: Recording and processing
speech from new populations, Proc. LREC Assessment and Database
Lamel,
L., Gauvain,
JL., Eskenazi, M., 1991, BREF, a Large Vocabulary Spoken Corpus for French, in
Proc. EUROSPEECH-91
AFNOR, 1990, norme experimentale S 31-115, Evaluation de systemes
de traitement automatique de la parole Partie 1: Definitions
et methode d'evaluation de systemes de reconnaissance automatique de la parole - systemes de reconnaissance globale.
Cochlear
Implants
Eskenazi, M., Vormes,
E., Monguillot, G., Frachet,
B., 1993, A new training and assessment technique for cochlear implants, in Advances
in Cochlear Implants, Hochmair-Desoyer and Hochmair eds., International Science Seminars, Vienna,
Austria, p. 572-577.
* Speech class: 11-752 Production, Prosody
and Synthesis taught with Alan Black 11-752 course
description
* Language Technologies: 11-717 Language
Technologies for Computer-Assisted Language Learning taught with Lori Levin and
Teruko Mitamura 11-717 course
description
A
book chapter on the material in 11-717:
Eskenazi, M., Brown, J., 2006, Teaching the Creation of Software that Uses Speech Recognition, in Teacher Education in CALL, P. Hubbard and M. Levy Eds., Language Learning and Language Teaching series, John Benjamins Publishing.
Here
is a NEW informal course I am teaching on how to write a scientific paper.
In 2001, Jaime Carbonell and I started the Carnegie Speech Company. The company
produces software for teaching and assessing ESL. It has received funding from Innovation Works
and from the state of Pennsylvania, and it has had SBIR grants from the US
Department of Education and the National Science
Foundation, as well as a prestigious Advanced Technology Program award from NIST
at the Department of Commerce. We have people in various places around the
world using our products! You can find out all about it at: www.carnegiespeech.com
And
see: http://www.postgazette.com/pg/09013/941345-298.stm?cmpid=newspanel1
Here are some things that may be of interest
to you.
1. THE PITTSBURGH SCIENCE OF LEARNING CENTER: Through an NSF SLC award, cognitive scientists, language technologists,
psychologists and others work together in this center to explore robust
learning. You can find out more about our Center at learnlab.org
2. GRAPHEME-TO-PHONEME DICTIONARY: I am one of the people who have worked on CMUDICT, a large grapheme-to-phoneme
dictionary containing over 130,000 entries. CMUDICT can be used for a variety
of applications and research topics. It is distributed as open source software
and can be found at: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
3. AUTOMATIC SPEECH RECOGNIZER: The first time I used
SPHINX II, I was astounded at how robust it could be. It is not perfect; none are, as we all know.
But with an understanding of the strong and weak points of the recognizer and some
smart engineering, it is possible to modify it to perform nicely in
well-defined applications (like Carnegie Speech™’s NativeAccent™). It is also open source software and can be found
at: http://www.cmusphinx.org .
An important element in getting the recognizer to work well in a new application
is to train it with data that are representative of the speakers who will use
the application and of the language they will use to express themselves. Carnegie
Speech™ sells licenses to YOUTH, a database of children’s speech
that we put together during our Department of Education SBIR. This can be used
to train the recognizer for applications for kids from about 6 to 11. Several
commercial applications successfully use this data in their products.
4. AN AUTOMATIC SPEECH RECOGNITION (ASR) DIALOGUE SYSTEM: One of the precursors of our Let’s Go dialogue system, and one of the
best known, is the Galaxy system from MIT. http://www.sls.csail.mit.edu/GALAXY.html
5. NEW FINDINGS IN LANGUAGE LEARNING: One of the most promising
directions that I know of for language learning is the one that started with D.
Pisoni and R. Yamada (ATR). They create pairs of
sounds (R and L for Japanese learners of English) and acoustically “pull them
apart” until the student can hear the difference between them. Students for
whom this training works can often pronounce a new sound without having had
pronunciation training on it. At CMU, J. McClelland in the CNBC is working on
this. You can check out: http://www.cnbc.cmu.edu/~jlm/papers/
6. LANGUAGE
TECHNOLOGIES FOR LANGUAGE LEARNING CONFERENCE: The last INSTiL conference took place in
7. AUTHORING NEW TUTORING SYSTEMS SUMMER SCHOOL: The great people who have made some of the most advanced and
successful intelligent tutoring systems that exist hold a summer school each
year where you can come and use their authoring tools to create your own tutor.
You can find it here: http://learnlab.org/opportunities/summer/
8. FAUX AMIS: I compiled a list of words that appear to be the same, but have very
different meanings in French and English. The list is available here and if you
find other entries or would like to suggest modifications or corrections,
please download and let me know. I will be glad to post your comments and make
changes to the list for all to use. .doc file
Eventually I will add the meanings!
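For readers who want to try the grapheme-to-phoneme dictionary listed in item 2, here is a minimal Python sketch of how its plain-text entries might be read. It assumes the commonly distributed layout: one entry per line, the word followed by ARPAbet phonemes, a stress digit on each vowel, and alternate pronunciations marked with a parenthesized number. The sample lines and the function names `parse_cmudict` and `strip_stress` are illustrative assumptions, not part of the CMUDICT distribution.

```python
import re
from collections import defaultdict

# Illustrative sample in the assumed CMUdict text layout (not copied from the real file):
# WORD, whitespace, then ARPAbet phonemes; vowels carry a stress digit (0, 1 or 2).
# Alternate pronunciations are assumed to be marked WORD(2), WORD(3), ...
SAMPLE = """\
;;; comment lines start with three semicolons
SPEECH  S P IY1 CH
READ  R EH1 D
READ(2)  R IY1 D
"""


def parse_cmudict(text):
    """Map each lowercase word to a list of pronunciations (lists of phonemes)."""
    entries = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        head, *phones = line.split()
        # Drop the "(n)" suffix so alternates are grouped under the same word.
        word = re.sub(r"\(\d+\)$", "", head).lower()
        entries[word].append(phones)
    return dict(entries)


def strip_stress(phones):
    """Remove the trailing stress digit from each phoneme."""
    return [p.rstrip("012") for p in phones]


lexicon = parse_cmudict(SAMPLE)
for word, prons in lexicon.items():
    for pron in prons:
        print(word, " ".join(pron))
```

To use it with the real dictionary, read the downloaded file into a string and pass it to `parse_cmudict`; check the file's own header for its exact conventions, since they may differ from the assumptions made here.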