11-756 THEORY AND PRACTICE OF SPEECH RECOGNITION SYSTEMS

Instructor: Bhiksha Raj

COURSE NUMBER

ECE: 18799D

LTI: 11756

Credits:	12
Timings:	4:30 p.m. -- 5:50 p.m.
Days:	Mondays and Wednesdays
Location:	GHC 4102

Prerequisites:

Mandatory: Linear Algebra. Basic Probability Theory.

Recommended: Signal Processing.

Coding Skills: This course will require significant programming from the students. Students must be able to program fluently in at least one language (C, C++, Java, Python, LISP, and Matlab are all acceptable).

Speech recognition systems invoke concepts from a variety of fields including speech production, algebra, probability and statistics, information theory, linguistics, and various aspects of computer science. Speech recognition has therefore largely been viewed as an advanced science, typically meant for students and researchers who possess the requisite background and motivation.

In this course we take an alternative approach. We present speech recognition systems from the perspective of a novice. Beginning from the very simple problem of matching two strings, we present the algorithms and techniques as a series of intuitive and logical increments, until we arrive at a fully functional continuous speech recognition system.
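
To give a flavor of the starting point, here is a minimal sketch of string matching, assuming it is posed as computing the edit (Levenshtein) distance between two strings with unit costs for insertion, deletion, and substitution. The function name `edit_distance` and the unit costs are illustrative assumptions; the formulation used in the lectures may differ.

```python
def edit_distance(source: str, target: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn source into target.
    Unit costs are an assumption made for this illustration."""
    # prev[j] is the cost of matching the source prefix processed so far
    # against the first j characters of target.
    prev = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        # Matching i source characters against an empty target
        # requires i deletions.
        curr = [i]
        for j, t_char in enumerate(target, start=1):
            substitution = prev[j - 1] + (s_char != t_char)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

The same table-filling idea, with characters replaced by feature vectors and unit costs replaced by distances, is one common way to motivate dynamic time warping.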

Following the philosophy that the best way to understand a topic is to work on it, the course will be project oriented, combining formal lectures with required hands-on work. Students will be required to work on a series of projects of increasing complexity. Each project will build on the previous one, so that the added complexity of each new project is small and eminently manageable. At the end of the course, merely by completing the series of projects, students will have built their own fully functional speech recognition systems.

In this edition of the course we will also introduce the theory of Weighted Finite State Transducers (WFSTs). In the latter half of the course students will learn to build their own WFST systems, and use open-source tools to compose their own WFST recognizers.

Grading will be based on project completion and presentation.


Class	Date	Topic	Slides	Assignment
Class 1	23 Jan 2013	Introduction	Slides
Class 2	28 Jan 2013	Data capture	Slides
Class 3	30 Jan 2013	Feature computation	Slides	assignment 1
Class 4	4 Feb 2013	String matching	Slides
Class 5	6 Feb 2013	DTW	Slides	assignment 2
Class 6	11 Feb 2013	Assignment 1 presentations
Class 7	13 Feb 2013	DTW to HMMs	Slides
Class 8	18 Feb 2013	HMMs, part 1	Slides
Class 9	23 Feb 2013	Assignment 2 presentations		assignment 3
Class 10	25 Feb 2013	HMM part 2	Slides
Class 11	27 Feb 2013	Recognizing continuous speech	Slides	assignment 4
Class 12	4 Mar 2013	Grammars	Slides
Class 13	6 Mar 2013	Homework presentations
Class 14	20 Mar 2013	Homework presentations HW4
Class 15	22 Mar 2013	Backpointer tables; training from continuous recordings	Slides	assignment 5
Class 16	25 Mar 2013	No class (instructor away)
Class 17	27 Mar 2013	No class (instructor away)
Class 18	1 Apr 2013	Assignment 5 presentations
Class 19	3 Apr 2013	N-gram models	Slides	assignment 6
Class 20	8 Apr 2013	N-gram models, contd.	Slides
Class 21	10 Apr 2013	Subword units	Slides
Class 23	15 Apr 2013	Subword units continued.	Slides
Class 24	17 Apr 2013	Assignment presentation
Class 25	22 Apr 2013	Tying states	Slides	assignment 7
Class 26	24 Apr 2013	Inexact Search	Slides
Class 27	29 Apr 2013	Lattices and rescoring	Slides	assignment 8	assignment 9