Kaldi+PDNN - Building DNN-based ASR Systems with Kaldi and PDNN





News
----------------------------------------------------------------------------------------------
   
Nov 2014. A new version is ready. Check the change log for the list of updates. 
   
Nov 2014. Kaldi+PDNN is moved to GitHub for better code management and community participation.
   
Nov 2014. Multi-task Learning is added to PDNN. This enables DNN training over multiple languages, domains, dialects, etc.
   
Jul 2014. Speaker adaptive training (SAT) recipes for DNN systems are added.
   
Apr 2014. A new version is released. Check the change log for the list of updates.



About
----------------------------------------------------------------------------------------------
    
Kaldi+PDNN builds state-of-the-art DNN acoustic models using the open-source Kaldi and PDNN toolkits. The pipeline has 3 stages:
   
1. Initial GMM models are built with the existing Kaldi recipes
     
2. DNN/DCN acoustic models are trained by PDNN
     
3. Trained DNN/DCN models are ported back to Kaldi for decoding or tandem system building


Highlights of Kaldi+PDNN include:
   
Model diversity. Deep neural networks, deep convolutional networks, bottleneck-feature tandem systems
     
PDNN toolkit. Easy to use, and fast for implementing new ideas.
     
Open license. All the code is released under Apache 2.0, the same license as Kaldi
     
Consistency with Kaldi. Scripts follow the Kaldi style and can be integrated with any of the existing example setups.



Requirements
----------------------------------------------------------------------------------------------
    
1. A GPU is recommended. If none is available, PDNN will run on CPUs.
     
2. Initial GMM model building should be done with the existing Kaldi recipes
    
3. Install Theano. Refer to the Theano installation guide for more details. If you are running Ubuntu Linux, following the steps in this document will set up Theano for you.
    
4. Install pfile_utils-v0_51. This script installs it automatically. Add pfile_utils-v0_51/bin to the PATH environment variable if it is NOT installed under the Kaldi tools folder.
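
As a minimal sketch of requirements 1 and 4, the snippet below selects Theano flags and defines a helper for extending PATH. It assumes that finding nvidia-smi on the PATH is a good-enough sign of a usable GPU. The function names add_to_path and choose_theano_flags are made up for this illustration and are not part of Kaldi+PDNN.

```shell
# Append a directory to PATH only if it is not already listed.
# (Hypothetical helper, not shipped with Kaldi+PDNN.)
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                  # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

# Choose Theano flags: GPU if an NVIDIA driver tool is visible, CPU otherwise.
# Assumption: nvidia-smi on PATH is a reasonable proxy for a working GPU.
choose_theano_flags() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo "device=gpu,floatX=float32"
    else
        echo "device=cpu,floatX=float32"
    fi
}

THEANO_FLAGS="$(choose_theano_flags)"
export THEANO_FLAGS
```

If pfile_utils was installed outside the Kaldi tools folder, something like `add_to_path "$HOME/tools/pfile_utils-v0_51/bin"` would then make its binaries visible. That location is only an example, so substitute your own install path.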




Download
----------------------------------------------------------------------------------------------
  
Kaldi+PDNN is publicly available from GitHub. Go to your Kaldi setup (e.g., egs/wsj/s5) and check out the latest version.
    
   
svn co https://github.com/yajiemiao/kaldipdnn/trunk/run_wsj run_wsj
   
svn co https://github.com/yajiemiao/pdnn/trunk pdnn
   
svn co https://github.com/yajiemiao/kaldipdnn/trunk/steps_pdnn steps_pdnn
    
The scripts and RESULTS appear under run_wsj. Kaldi+PDNN currently supports the following datasets:
   
run_timit       --   TIMIT

run_wsj         --   Wall Street Journal

run_swbd        --   Switchboard (the complete 300-hour setup)

run_swbd_110h   --   Switchboard (the 110-hour setup)

run_tedlium     --   TED-LIUM (transcribing TED talks)
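
The checkout commands above are written for WSJ. A rough sketch for the other setups follows, assuming that every run_* directory listed here sits under the same kaldipdnn trunk as run_wsj. The helper name pdnn_checkout_cmds is invented for this example. It only prints the commands, so you can inspect them before running anything.

```shell
# Print (do not run) the svn checkout commands for one dataset setup.
# Assumption: each run_* directory lives next to run_wsj in the kaldipdnn trunk.
pdnn_checkout_cmds() {
    setup="$1"    # e.g. run_timit, run_swbd, run_swbd_110h, run_tedlium
    base="https://github.com/yajiemiao"
    echo "svn co $base/kaldipdnn/trunk/$setup $setup"
    echo "svn co $base/pdnn/trunk pdnn"
    echo "svn co $base/kaldipdnn/trunk/steps_pdnn steps_pdnn"
}
```

For example, `pdnn_checkout_cmds run_timit` prints three commands, and `pdnn_checkout_cmds run_timit | sh` would execute them from inside the matching Kaldi setup directory.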



Benchmark Results
----------------------------------------------------------------------------------------------
    
Systems with *TBA* are being verified, and their numbers will be updated soon.

TIMIT -- PER(%)

  System                     dev     test
  run-dnn.sh                 18.8    20.2
  run-bnf-tandem.sh          16.3    17.8
  run-dnn-fbank.sh           20.2    21.6
  run-cnn.sh                 19.0    19.7
  run-dnn-maxout.sh          17.5    19.0

   
Wall Street Journal -- WER(%)

  System                     dev93   eval92
  run-dnn.sh                 7.18    4.08
  run-bnf-tandem.sh          6.72    3.81
  run-dnn-fbank.sh           7.38    4.27
  run-cnn.sh                 7.27    4.29
 

  
Switchboard (the 300-hour setup) -- WER(%)

  System                     Hub'00-SWB   Hub'00
  run-dnn.sh                 15.4         21.4
  run-bnf-tandem.sh          15.0         21.7
  run-dnn-fbank.sh           TBA
  
Switchboard (the 110-hour setup) -- WER(%)

  System                     Hub'00-SWB   Hub'00
  run-dnn.sh                 19.2         25.6
  run-bnf-tandem.sh          18.0         25.0
  run-dnn-fbank.sh           21.7         28.2
  run-cnn.sh                 19.5         25.6
  run-bnf-fbank-tandem.sh    19.6         27.7

  
TED-LIUM -- WER(%)

  System                     dev     test
  run-dnn.sh                 23.3    20.4
  run-bnf-tandem.sh          22.0    19.3
  run-dnn-fbank.sh           24.5    21.4
  run-cnn.sh                 22.7    19.7
  run-dnn-maxout.sh          22.9    19.7




Systems
----------------------------------------------------------------------------------------------
  
Core

run-dnn.sh                --   Hybrid model with DNN and fMLLR features
run-bnf-tandem.sh         --   Tandem system with deep bottleneck features trained over fMLLRs
run-dnn-fbank.sh          --   Hybrid model with DNN and filterbanks
run-cnn.sh                --   Hybrid model with CNN and filterbanks

   
Extensions

run-dnn-maxout.sh         --   Hybrid model with deep maxout networks and fMLLRs
run-bnf-fbank-tandem.sh   --   Tandem system with deep bottleneck features trained over filterbanks
SAT for DNNs              --   Various speaker adaptive training (SAT) recipes for DNNs
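
Before launching a recipe, it can help to confirm that the scripts you expect were actually checked out. The sketch below is one way to do that. The helper name check_recipes is hypothetical, and it assumes the scripts sit directly inside the setup directory (for example run_wsj/run-dnn.sh).

```shell
# Report which recipe scripts exist in a checked-out setup directory.
# Returns 0 if all are present, 1 if any is missing.
# (Hypothetical helper, not shipped with Kaldi+PDNN.)
check_recipes() {
    dir="$1"
    shift
    missing=0
    for script in "$@"; do
        if [ -f "$dir/$script" ]; then
            echo "found    $dir/$script"
        else
            echo "missing  $dir/$script"
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call from the Kaldi setup directory would be `check_recipes run_wsj run-dnn.sh run-bnf-tandem.sh run-dnn-fbank.sh run-cnn.sh`. Not every setup provides every system (the benchmark tables list which ones were run per dataset), so a "missing" line is not necessarily an error.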




Contacting us
----------------------------------------------------------------------------------------------
 
You can post your questions, suggestions, and discussions to GitHub Issues.
   
You can also email Yajie Miao (yajiemiao AT gmail.com).



Reference
----------------------------------------------------------------------------------------------
    
Please cite the following manuscript if you use Kaldi+PDNN in your papers/publications:
     
Yajie Miao, "Kaldi+PDNN: Building DNN-based ASR Systems with Kaldi and PDNN," arXiv:1401.6984, 2014.