Arthur R. Toth
Ph.D. Language and Information Technologies, School of Computer Science, Carnegie Mellon University, May 2009
"Using Articulatory Position Data to Improve Voice Transformation"
Advisor: Alan W Black
M.S. Language Technologies, School of Computer Science, Carnegie Mellon University, May 2001
A.B. Mathematics, Harvard University, June 1993
Teaching Assistant Positions
15-453: Formal Languages, Automata, and Computation, Spring 2003
11-682/15-492: Intro to IR, NLP, MT, and Speech, Fall 2002
I have been working as a research scientist at
Yap, Inc. since September 21, 2009.
I received my Ph.D. on May 17th, 2009.
I continued working with Dr. Tanja Schultz for one
month (May 2009), this time from Pittsburgh. My task was to construct an
on-line system that converts electromyographical data to speech. The
surface electromyographical data we used was collected by attaching
probes to a person's face in order to measure the activation
potentials of certain muscles which are used during speech. As this
data could also be collected while a person pantomimes speech, we were
investigating its use for silent speech interfaces which take this
data and produce speech from it. The goal of the on-line system was
to serve as a demonstration and proof-of-concept of a silent speech
interface based on certain machine learning and signal processing
techniques.
From February through April 2009, I worked with Dr. Tanja Schultz in the Cognitive Systems Lab at the University of Karlsruhe. I
worked with her group to apply voice transformation techniques to
synthesize speech from electromyographical data that they had
collected and previously used for speech recognition experiments.
This work led to two paper submissions to Interspeech 2009. During
this time, Tanja and I also continued our collaboration with Dr. Alan W Black and Dr. Qin Jin. I constructed
some human listening evaluations on various types of de-identified
speech to determine how difficult it was for people to identify
speakers when we tried to obscure who was speaking. This work was
combined with some other work we had performed and was part of another
paper we submitted to Interspeech 2009 and part of an article we
submitted to IEEE Transactions on Audio, Speech, and Language
Processing.
From 2005 until January 2009, I worked with Dr. Alan W Black on the
TRANSFORM project. My primary work was on trying to use articulatory
position data, more specifically the MOCHA database, to improve voice
transformation. We also investigated and implemented Harmonic plus
noise and Harmonic Stochastic models for speech signals. In our last
year-and-a-half, we collaborated with Dr. Qin Jin and Dr. Tanja Schultz, pitting our
voice transformation systems against their speaker identification
systems. We investigated security issues, such as whether voice
transformation could be used to fool speaker identification
systems, and privacy issues, such as whether voice
transformation could be used to obscure the identity of speech
presented to speaker identification systems.
From September 2002 until 2005, I worked with Dr. Alan W Black on the
Storyteller project. I worked primarily on the automatic detection of
prosodic boundaries in speech, especially in the context of
multi-sentence recordings that are longer than what is typically used
for constructing concatenative speech synthesizers.
Previously, from August 1999 through August 2002, I worked with Dr. Roni Rosenfeld on
Statistical Language Modeling and the Universal Speech Interface
project.
Refereed Conference and Workshop Papers
- Arthur R. Toth, Bhiksha Raj, Kaustubh Kalgaonkar, Tony Ezzat. Synthesizing Speech From Doppler Signals. Proc. ICASSP 2010.
- Qin Jin, Arthur R. Toth, Tanja Schultz, Alan W Black. Speaker De-Identification Via Voice Transformation. Proc. ASRU 2009.
- Arthur R. Toth, Michael Wand, Tanja Schultz. Synthesizing Speech from Electromyography using Voice Transformation Techniques. Proc. Interspeech 2009.
- Michael Wand, Arthur R. Toth, Szu-Chen (Stan) Jou, Tanja Schultz. Impact of Different Speaking Modes on EMG-based Speech Recognition. Proc. Interspeech 2009.
- Qin Jin, Arthur R. Toth, Tanja Schultz, Alan W Black. Voice Convergin: Speaker De-Identification by Voice Transformation. Proc. ICASSP 2009.
- Arthur R. Toth, Alan W Black. Incorporating Durational Modification in Voice Transformation. Proc. Interspeech 2008.
- Qin Jin, Arthur R. Toth, Alan W Black, Tanja Schultz. Is Voice Transformation a Threat to Speaker Identification? Proc. ICASSP 2008.
- Kishore Prahallad, Arthur R. Toth, Alan W Black. Automatic Building of Synthetic Voices from Large Multi-Paragraph Speech Databases. Proc. Interspeech 2007.
- Alan W Black, Christina L. Bennett, Benjamin C. Blanchard, John Kominek, Brian Langner, Kishore Prahallad, Arthur Toth. CMU Blizzard 2007: A Hybrid Acoustic Unit Selection System from Statistically Predicted Parameters. Blizzard 2007.
- Arthur R. Toth, Alan W Black. Using Articulatory Position Data in Voice Transformation. Sixth ISCA Workshop on Speech Synthesis. 2007.
- Arthur R. Toth and Alan W Black. Visual Evaluation of Voice Transformation Based on Knowledge of Speaker. In Proc. ICASSP 2006.
- Arthur R. Toth and Alan W Black. Cross-Speaker Articulatory Position Data for Phonetic Feature Prediction. In Proc. Interspeech 2005.
- John Kominek, Christina Bennett, Brian Langner, Arthur Toth. The Blizzard Challenge 2005 CMU Entry: a method for improving speech synthesis systems. In Proc. Interspeech 2005.
- Arthur R. Toth. Forced Alignment for Speech Synthesis Databases Using Duration and Prosodic Phrase Breaks. In Proc. 5th ISCA Speech Synthesis Workshop. June 2004.
- Jason Y Zhang, Arthur R. Toth, Kevin Collins-Thompson, and Alan W Black. Prominence Prediction for Super-Sentential Prosodic Modeling Based on a New Database. In Proc. 5th ISCA Speech Synthesis Workshop. June 2004.
- Arthur R. Toth, Thomas K. Harris, James Sanders, Stefanie Shriver and Roni Rosenfeld. Towards Every-Citizen's Speech Interface: An Application Generator for Speech Interfaces to Databases. In Proc. ICSLP 2002.
- Stefanie Shriver, Roni Rosenfeld, Xiaojin Zhu, Arthur Toth, Alex Rudnicky, Markus Flueckiger. Universalizing Speech: Notes from the USI Project. In Proc. Eurospeech 2001.
- Stefanie Shriver, Arthur Toth, Xiaojin Zhu, Alex Rudnicky, Roni Rosenfeld. A Unified Design for Human-Machine Voice Interaction. In Proc. CHI 2001.
- Ronald Rosenfeld, Xiaojin Zhu, Stefanie Shriver, Arthur Toth, Kevin Lenzo, Alan W Black. Towards a Universal Speech Interface. In Proc. ICSLP 2000.
Refereed Journal Article
- Stefanie Tomko, Thomas K. Harris, Arthur Toth, James Sanders, Alexander Rudnicky, Roni Rosenfeld. Towards Efficient Human Machine Speech Communication: The Speech Graffiti Project. ACM Transactions on Speech and Language Processing. Vol. 2, No. 1. February 2005.