Arthur R. Toth

Graduated May 17, 2009 with a Ph.D.
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
email: atoth@cs.cmu.edu

Education

  • Ph.D. Language and Information Technologies, School of Computer Science, Carnegie Mellon University, May 2009
    "Using Articulatory Position Data to Improve Voice Transformation"
    Advisor: Alan W Black
  • M.S. Language Technologies, School of Computer Science, Carnegie Mellon University, May 2001
  • A.B. Mathematics, Harvard University, June 1993

Teaching Assistant Positions

  • 15-453: Formal Languages, Automata, and Computation, Spring 2003
  • 11-682/15-492: Intro to IR, NLP, MT, and Speech, Fall 2002

Research

I have been working as a research scientist at Yap, Inc. since September 21, 2009.

I received my Ph.D. on May 17, 2009.

I continued working with Dr. Tanja Schultz for one month (May 2009), but from Pittsburgh. My task was to construct an on-line system that converts electromyographic data to speech. The surface electromyographic data we used was collected by attaching probes to a person's face to measure the activation potentials of certain muscles used during speech. Because this data can also be collected while a person pantomimes speech, we were investigating its use for silent speech interfaces, which take this data as input and produce speech from it. The on-line system was intended as a demonstration and proof-of-concept of a silent speech interface based on certain machine learning and signal processing concepts.

From February through April 2009, I worked with Dr. Tanja Schultz in the Cognitive Systems Lab at the University of Karlsruhe. I worked with her group to apply voice transformation techniques to synthesize speech from electromyographic data that they had collected and previously used for speech recognition experiments. This work led to two paper submissions to Interspeech 2009. During this time, Tanja and I also continued our collaboration with Dr. Alan W Black and Dr. Qin Jin. I conducted human listening evaluations on various types of de-identified speech to determine how difficult it was for people to identify speakers when we tried to obscure who was speaking. This work was combined with other work we had performed and became part of another paper we submitted to Interspeech 2009 and of an article we submitted to IEEE Transactions on Audio, Speech, and Language Processing.

From 2005 until January 2009, I worked with Dr. Alan W Black on the TRANSFORM project. My primary work was on using articulatory position data, specifically the MOCHA database, to improve voice transformation. We also investigated and implemented Harmonic plus Noise and Harmonic/Stochastic models for speech signals. In the last year and a half of the project, we collaborated with Dr. Qin Jin and Dr. Tanja Schultz, pitting our voice transformation systems against their speaker identification systems. We investigated security issues, such as whether voice transformation could fool speaker identification systems, and privacy issues, such as whether voice transformation could obscure the identity of speech presented to speaker identification systems.

From September 2002 until 2005, I worked with Dr. Alan W Black on the Storyteller project. I worked primarily on the automatic detection of prosodic boundaries in speech, especially in multi-sentence recordings that are longer than those typically used for constructing concatenative speech synthesizers.

Previously, from August 1999 through August 2002, I worked with Dr. Roni Rosenfeld on Statistical Language Modeling and the Universal Speech Interface project.


Publications