11-756 / 18799D Design and Implementation of ASR Systems

11-756/18799D ASR: Assignment 1, Data Capture and Feature Computation

This homework consists of two parts. In the first you will write a program to capture speech and endpoint it. In the second you will compute features from the captured data.

Part 1

Write a program to capture speech data. It must include the following:

Suggestion: You can use PortAudio for the audio capture. PortAudio is a well-established, cross-platform audio capture package.
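The following is a minimal sketch of one common way to endpoint captured audio: a simple energy threshold with a hangover count. It is not the required method for this assignment. Everything in it is an assumption made for illustration: the function names (`frame_energies_db`, `find_endpoints`), the 160-sample frame (10 ms at 16 kHz), the 15 dB threshold, the 5-frame hangover, and the idea that the first few frames contain only background noise. It operates on a list of samples that has already been captured, so it does not depend on any audio library.

```python
import math

def frame_energies_db(samples, frame_len):
    """Log energy (in dB) of consecutive, non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(s * s for s in frame) / frame_len
        # Small floor avoids log(0) on digital silence.
        energies.append(10.0 * math.log10(power + 1e-10))
    return energies

def find_endpoints(samples, frame_len=160, threshold_db=15.0,
                   hangover=5, noise_frames=5):
    """Return (first_speech_frame, last_speech_frame), or None if no speech.

    Assumption: the first `noise_frames` frames are background noise and
    can be averaged to estimate the noise floor.
    """
    energies = frame_energies_db(samples, frame_len)
    if not energies:
        return None
    lead = energies[:noise_frames]
    noise_floor = sum(lead) / len(lead)

    start = None
    last_speech = None
    for i, energy in enumerate(energies):
        if energy > noise_floor + threshold_db:
            if start is None:
                start = i          # first frame that looks like speech
            last_speech = i
        elif start is not None and i - last_speech >= hangover:
            break                  # enough trailing silence: stop here
    if start is None:
        return None
    return start, last_speech
```

In a live recorder the same per-frame decision would be made inside the capture loop, so that recording stops once the hangover count is reached. The thresholds above are placeholders and will need tuning for your microphone and room.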

Part 2

Write a routine for computing MFCCs from audio.

Some suggestions

You are allowed to use code from the web.

However, we recommend writing your own code if you can.

Regardless of what you use, the feature computation code must be integrated with the audio capture routine.
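For orientation, the usual MFCC pipeline for Part 2 can be sketched as: pre-emphasis, framing, windowing, power spectrum, Mel filterbank, log, DCT. The sketch below uses only the Python standard library, so it is slow and intended for reading rather than production use. All parameter values (16 kHz sampling, 400-sample frames, 160-sample hop, 512-point FFT, 40 filters, 13 cepstra, 0.97 pre-emphasis) and all function names are assumptions chosen for illustration, not requirements of the assignment.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale (rows over FFT bins)."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):
            row[k] = (k - lo) / (mid - lo)   # rising edge
        for k in range(mid, hi):
            row[k] = (hi - k) / (hi - mid)   # falling edge
        bank.append(row)
    return bank

def dct2(x, n_out):
    """Unnormalized DCT-II, keeping only the first n_out coefficients."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n_out)]

def mfcc(samples, sample_rate=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=40, n_ceps=13, preemph=0.97):
    """Return one list of n_ceps cepstral coefficients per frame."""
    if not samples:
        return []
    emphasized = [samples[0]] + [samples[i] - preemph * samples[i - 1]
                                 for i in range(1, len(samples))]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]          # Hamming window
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frame = [emphasized[start + i] * window[i] for i in range(frame_len)]
        frame += [0.0] * (n_fft - frame_len)      # zero-pad to FFT size
        power = [abs(c) ** 2 for c in fft(frame)[: n_fft // 2 + 1]]
        log_mel = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
                   for row in bank]               # floor avoids log(0)
        features.append(dct2(log_mel, n_ceps))
    return features
```

The intermediate `log_mel` vectors are the Mel-log spectrum; keeping them alongside the cepstra makes it easier to check each stage when you integrate the routine with your capture code.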

How to visualize the spectrogram represented by cepstra

The Mel-log spectrum can be directly visualized as a matrix.

The cepstrum, by contrast, is a dimensionality-reduced and transformed version of the log spectrum, and it is not visually meaningful on its own. The truncated cepstrum can, however, be converted back to a log spectrum by zero-padding it to 64 or 128 points and computing an inverse DCT (if you used a DCT to derive the cepstra from the log spectra). This IDCT-derived log spectrum is what the cepstrum really represents.
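The zero-pad-and-invert step can be sketched as follows. This assumes the cepstra were produced with an unnormalized DCT-II over `n_orig` Mel-log values (40 here); if your forward DCT uses a different type or normalization, the inverse and its scale factor must be changed to match. The names `dct2`, `cepstrum_to_log_spectrum`, and `n_orig` are illustrative, not part of the assignment.

```python
import math

def dct2(x, n_out):
    """Unnormalized DCT-II, keeping only the first n_out coefficients."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n_out)]

def cepstrum_to_log_spectrum(cepstrum, n_points=64, n_orig=40):
    """Zero-pad a truncated cepstrum to n_points and apply the inverse DCT.

    Dividing by n_orig (the forward DCT length) keeps the result on the
    same scale as the original log spectrum.
    """
    padded = list(cepstrum) + [0.0] * (n_points - len(cepstrum))
    out = []
    for i in range(n_points):
        acc = padded[0]
        for k in range(1, n_points):
            acc += 2.0 * padded[k] * math.cos(math.pi * k * (i + 0.5) / n_points)
        out.append(acc / n_orig)
    return out
```

Applying this to every frame and stacking the results as the columns of a matrix gives a smoothed spectrogram that can be displayed in the same way as the Mel-log spectrum.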

Due: Wednesday, 6 Feb 2013.