11-756 THEORY AND PRACTICE OF SPEECH RECOGNITION SYSTEMS

Instructor: Bhiksha Raj, co-instructed by Rita Singh and Mosur Ravishankar

COURSE NUMBER -- ECE: 18799D, LTI: 11756
Credits: 12
Timings: 4:30 p.m. -- 5:50 p.m.
Days: Mondays and Wednesdays
Location: GHC 4101

Prerequisites:
Mandatory:  Linear Algebra. Basic Probability Theory.
Recommended:  Signal Processing.
Coding Skills:  This course will require significant programming from the students. Students must be able to program fluently in at least one language (C, C++, Java, Python, LISP, and Matlab are all acceptable).


PROJECTS PAGE

Speech recognition systems invoke concepts from a variety of fields including speech production, algebra, probability and statistics, information theory, linguistics, and various aspects of computer science. Speech recognition has therefore largely been viewed as an advanced science, typically meant for students and researchers who possess the requisite background and motivation.

In this course we take an alternative approach. We present speech recognition systems from the perspective of a novice. Beginning with the very simple problem of matching two strings, we present the algorithms and techniques as a series of intuitive and logical increments, until we arrive at a fully functional continuous speech recognition system.
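
As a rough illustration of that starting point, the sketch below matches two strings with the standard dynamic-programming edit distance. It is not taken from the course materials: the function name `edit_distance` and the unit costs for insertion, deletion, and substitution are assumptions made for this example.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance between two strings (unit costs assumed)."""
    # previous[j] holds the cost of aligning the source prefix processed
    # so far with the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        # Aligning i source characters with an empty target costs i deletions.
        current = [i]
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Only two rows of the cost table are kept at a time, which is enough to compute the distance. Recovering the alignment itself would need the full table or stored backpointers.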

Following the philosophy that the best way to understand a topic is to work on it, the course will be project oriented, combining formal lectures with required hands-on work. Students will be required to work on a series of projects of increasing complexity. Each project will build on the previous one, so that each increment in complexity is small and eminently doable. By the end of the course, merely by completing the series of projects, students will have built their own fully functional speech recognition systems.

In this edition of the course we will also introduce the theory of Weighted Finite State Transducers (WFSTs). In the latter half of the course students will learn to build their own WFST systems, and use open-source tools to compose their own WFST recognizers.

Grading will be based on project completion and presentation.


                                                                                                                        
Class  Date         Topic                                                Materials
1      23 Jan 2012  Introduction                                         Slides
2      25 Jan 2012  Data capture                                         Slides, assignment 1a
3      30 Jan 2012  Feature Computation                                  Slides, assignment 1b
4      1 Feb 2012   Dynamic programming for string alignment             Slides, assignment 2
5      6 Feb 2012   Finite state automata (John McDonough)               Slides
6      8 Feb 2012   Assignment 1 presentations
7      13 Feb 2012  DTW to recognize speech                              Slides
8      15 Feb 2012  Assignment 2 presentations                           assignment 3
9      20 Feb 2012  DTW to HMMs, part 1                                  Slides
10     22 Feb 2012  HMMs                                                 Slides
11     27 Feb 2012  Assignment 3 presentations                           assignment 4
12     29 Feb 2012  Continuous speech                                    Slides
13     5 Mar 2012   Grammars                                             Slides
14     7 Mar 2012   Backpointer tables, training with continuous speech  Slides
15     19 Mar 2012  Project Presentations                                assignment 5
16     21 Mar 2012  Ngrams                                               Slides, assignment 6
17     26 Mar 2012  FSA, John M.                                         Slides, John's syllabus
18     28 Mar 2012  FSA, part 2, John M.                                 Slides
19     2 Apr 2012   Class canceled (instructor sick)
20     4 Apr 2012   Ngrams, part 2                                       Slides
21     9 Apr 2012   FSA, part 3, John M.                                 Slides
22     11 Apr 2012  Subword units                                        Slides
23     16 Apr 2012  Subword units, part 2                                Slides
24     18 Apr 2012  Parameter sharing                                    Slides
25     23 Apr 2012  Homework presentations                               Slides, assignment 7, assignment 8