Kaldi+PDNN -- Building DNN-based ASR Systems with Kaldi and PDNN

News

----------------------------------------------------------------------------------------------

Nov 2014. A new version is ready. Check the change log for the list of updates.

Nov 2014. Kaldi+PDNN is moved to GitHub for better code management and community participation.

Nov 2014. Multi-task Learning is added to PDNN. This enables DNN training over multiple languages, domains, dialects, etc.

Jul 2014. Speaker adaptive training (SAT) recipes for DNN systems are added.

Apr 2014. A new version is released. Check the change log for the list of updates.

About

----------------------------------------------------------------------------------------------

Kaldi+PDNN builds state-of-the-art DNN acoustic models using the open-source Kaldi and PDNN toolkits. The pipeline has 3 stages:

1. Initial GMM models are built with the existing Kaldi recipes

2. DNN/DCN acoustic models are trained by PDNN

3. Trained DNN/DCN models are ported back to Kaldi for decoding or tandem system building

Highlights of Kaldi+PDNN include:

Model diversity. Deep neural networks, deep convolutional networks, bottleneck-feature tandem systems

PDNN toolkit. Easy to use, fast to implement new ideas.

Open license. All the code is released under Apache 2.0, the same license as Kaldi

Consistency with Kaldi. Scripts follow the Kaldi style and can be integrated with any of the existing example setups.

Requirements

----------------------------------------------------------------------------------------------

1. A GPU is recommended. If none is available, PDNN falls back to the CPU.

2. Initial GMM model building should be done with the existing Kaldi recipes

3. Install Theano. Refer to the Theano installation instructions for more details. If you are running Ubuntu Linux, following the steps in this document will set up Theano for you.

4. Install pfile_utils-v0_51. This script installs it automatically. Add pfile_utils-v0_51/bin to the PATH environment variable if it is NOT installed under the Kaldi tools folder.
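The requirements above can be checked from the shell before running any recipe. The following is a minimal sketch, not part of Kaldi+PDNN: the helper names (add_to_path, check_tool, check_theano) and the default PFILE_BIN location are hypothetical, and pfile_create is assumed to be one of the binaries that pfile_utils installs.

```shell
# Hypothetical install location; point this at your actual pfile_utils build.
PFILE_BIN="${PFILE_BIN:-$HOME/pfile_utils-v0_51/bin}"

# Append a directory to PATH only if it is not already listed.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                # already present, nothing to do
    *) PATH="$PATH:$1" ;;
  esac
  export PATH
}

# Report whether a command can be found on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Report whether the Theano Python module can be imported.
check_theano() {
  if python -c 'import theano' >/dev/null 2>&1; then
    echo "theano: found"
  else
    echo "theano: missing"
  fi
}

add_to_path "$PFILE_BIN"
check_tool nvidia-smi     # GPU driver utility; absent on CPU-only machines
check_tool pfile_create   # assumed to ship with pfile_utils
check_theano
```

A "missing" line for nvidia-smi is not fatal, since PDNN can run on the CPU; a missing pfile_create or Theano needs to be fixed before training.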

Download

----------------------------------------------------------------------------------------------

Kaldi+PDNN is publicly available from GitHub. Go to your Kaldi setup (e.g., egs/wsj/s5) and check out the latest version.

    svn co https://github.com/yajiemiao/kaldipdnn/trunk/run_wsj run_wsj
    svn co https://github.com/yajiemiao/pdnn/trunk pdnn
    svn co https://github.com/yajiemiao/kaldipdnn/trunk/steps_pdnn steps_pdnn

The scripts and RESULTS appear under run_wsj. Kaldi+PDNN currently supports the following datasets:

run_timit       --  TIMIT
run_wsj         --  Wall Street Journal
run_swbd        --  Switchboard (the complete 300-hour setup)
run_swbd_110h   --  Switchboard (the 110-hour setup)
run_tedlium     --  TED-LIUM (transcribing TED talks)
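In case the Subversion endpoints are no longer served (GitHub announced that it would retire its Subversion support), the same directories can be fetched with git. This is a sketch under assumptions: the repository URLs are inferred from the svn URLs above, kaldipdnn_src is a hypothetical scratch directory, and the run and fetch_recipe helpers are illustrative names. By default the commands are only printed; set DRY_RUN=0 to execute them.

```shell
# Repository URLs inferred from the svn URLs (assumption).
KALDIPDNN_REPO="https://github.com/yajiemiao/kaldipdnn"
PDNN_REPO="https://github.com/yajiemiao/pdnn"

# Print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Fetch PDNN, the shared steps_pdnn scripts, and one recipe directory.
fetch_recipe() {
  recipe="$1"
  case "$recipe" in
    run_timit|run_wsj|run_swbd|run_swbd_110h|run_tedlium) ;;
    *) echo "unknown recipe: $recipe" >&2; return 1 ;;
  esac
  run git clone "$PDNN_REPO" pdnn
  run git clone "$KALDIPDNN_REPO" kaldipdnn_src
  run cp -r kaldipdnn_src/steps_pdnn steps_pdnn
  run cp -r "kaldipdnn_src/$recipe" "$recipe"
}

# Run from inside the Kaldi setup, e.g. egs/wsj/s5.
fetch_recipe run_wsj
```

Copying only steps_pdnn and the chosen recipe directory mirrors what the svn commands check out, so the resulting layout inside the Kaldi setup should be the same.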

Benchmark Results

----------------------------------------------------------------------------------------------

Systems with *TBA* are being verified, and their numbers will be updated soon.

TIMIT, PER (%)

System                dev     test
run-dnn.sh            18.8    20.2
run-bnf-tandem.sh     16.3    17.8
run-dnn-fbank.sh      20.2    21.6
run-cnn.sh            19.0    19.7
run-dnn-maxout.sh     17.5    19.0

Wall Street Journal, WER (%)

System                dev93   eval92
run-dnn.sh            7.18    4.08
run-bnf-tandem.sh     6.72    3.81
run-dnn-fbank.sh      7.38    4.27
run-cnn.sh            7.27    4.29

Switchboard (the 300-hour setup), WER (%)

System                Hub'00-SWB   HUB'00
run-dnn.sh            15.4         21.4
run-bnf-tandem.sh     15.0         21.7
run-dnn-fbank.sh      TBA          TBA

Switchboard (the 110-hour setup), WER (%)

System                    Hub'00-SWB   HUB'00
run-dnn.sh                19.2         25.6
run-bnf-tandem.sh         18.0         25.0
run-dnn-fbank.sh          21.7         28.2
run-cnn.sh                19.5         25.6
run-bnf-fbank-tandem.sh   19.6         27.7

TED-LIUM, WER (%)

System                dev     test
run-dnn.sh            23.3    20.4
run-bnf-tandem.sh     22.0    19.3
run-dnn-fbank.sh      24.5    21.4
run-cnn.sh            22.7    19.7
run-dnn-maxout.sh     22.9    19.7
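When comparing systems in these tables, a relative error reduction is often easier to read than the absolute difference. The helper below is an illustrative sketch (rel_reduction is a hypothetical name, not part of Kaldi+PDNN) that computes 100 * (baseline - system) / baseline from two error rates taken from the tables.

```shell
# Relative error reduction (%) of a system over a baseline:
#   100 * (baseline - system) / baseline
rel_reduction() {
  awk -v base="$1" -v sys="$2" \
    'BEGIN { printf "%.1f\n", 100 * (base - sys) / base }'
}

# WSJ eval92: run-dnn.sh (4.08) vs. run-bnf-tandem.sh (3.81)
rel_reduction 4.08 3.81

# TIMIT test: run-dnn.sh (20.2) vs. run-bnf-tandem.sh (17.8)
rel_reduction 20.2 17.8
```

With the numbers above, the tandem system reduces the error relative to the fMLLR hybrid DNN by about 6.6% on WSJ eval92 and about 11.9% on the TIMIT test set.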

Systems

----------------------------------------------------------------------------------------------

Group        Script                    Description
Core         run-dnn.sh                Hybrid model with DNN and fMLLR features
             run-bnf-tandem.sh         Tandem system with deep bottleneck features trained over fMLLRs
             run-dnn-fbank.sh          Hybrid model with DNN and filterbanks
             run-cnn.sh                Hybrid model with CNN and filterbanks

Extensions   run-dnn-maxout.sh         Hybrid model with deep maxout networks and fMLLRs
             run-bnf-fbank-tandem.sh   Tandem system with deep bottleneck features trained over filterbanks
             SAT for DNNs              Various speaker adaptive training (SAT) recipes for DNNs

Contacting us

----------------------------------------------------------------------------------------------

You can post your questions, suggestions, and discussions to GitHub Issues.

You can also email Yajie Miao (yajiemiao AT gmail.com).

Reference

----------------------------------------------------------------------------------------------

Please cite the following manuscript if you use Kaldi+PDNN in your papers/publications:

Yajie Miao, "Kaldi+PDNN: Building DNN-based ASR Systems with Kaldi and PDNN," arXiv:1401.6984, 2014.