cmds/run_SdA.py -- Training Stacked Denoising Autoencoders

-------------------------------------------------------------------------------------------------------------------

Arguments

--train-data
    training data specification
    required

--nnet-spec
    network architecture, in the format "d:h(1):h(2):...:h(n):s",
    e.g. 250:1024:1024:1024:1024:1920
    required. d: input dimension; h(i): size of the i-th hidden layer;
    s: number of targets

--output-file
    path to save the resulting net
    required

--wdir
    working directory
    required

--param-output-file
    path to save model parameters in the PDNN format
    by default "": doesn't output the PDNN-formatted model

--cfg-output-file
    path to save the model config
    by default "": doesn't output the model config

--kaldi-output-file
    path to save the Kaldi-formatted model
    by default "": doesn't output the Kaldi-formatted model

--corruption-level
    corruption factor for binary random masking; see the first sketch
    after this list
    by default 0.2

--learning-rate
    learning rate, kept constant throughout training
    by default 0.01

--epoch-number
    number of training epochs
    by default 10

--batch-size
    mini-batch size during training
    by default 128

--momentum
    momentum factor
    by default 0.5

--ptr-layer-number
    number of layers to be pre-trained
    by default all the hidden layers are pre-trained

--sparsity
--sparsity-weight
    together, these two parameters turn each layer of the SdA into a sparse
    autoencoder. The implementation follows this paper; sparsity and
    sparsity-weight correspond to rho on page 14 and beta on page 15 of the
    paper. See the second sketch after this list.
    by default both parameters are None: no sparsity is imposed

--hidden-activation
    hidden activation function
    by default sigmoid

--1stlayer-reconstruct-activation
    reconstruction activation function for the first layer; sigmoid and
    tanh are currently supported
    by default sigmoid. If your inputs are mean (and sometimes variance)
    normalized, use tanh: sigmoid reconstruction is bounded to (0, 1) and
    cannot reproduce the negative values that such normalization produces.
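
A minimal numpy sketch of what --corruption-level controls, assuming the
standard binary-masking denoising scheme (the function and variable names are
illustrative, not PDNN's actual code):

import numpy as np

rng = np.random.RandomState(0)

def corrupt(x, corruption_level=0.2):
    # Binary random masking: zero out each input element with
    # probability corruption_level, keep it otherwise.
    mask = rng.binomial(n=1, p=1.0 - corruption_level, size=x.shape)
    return mask * x

x = rng.rand(4, 330)       # a mini-batch of 330-dim features
x_tilde = corrupt(x, 0.2)  # roughly 20% of the entries are now zero

Each layer is then trained to reconstruct the clean x from the corrupted
x_tilde.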


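Roughly, --sparsity is the target average activation rho of each hidden unit,
and --sparsity-weight is the weight beta of the KL-divergence penalty that
pushes the observed average activations toward rho. A minimal numpy sketch
under those assumptions (the function name and shapes are illustrative, not
PDNN's API):

import numpy as np

def sparsity_penalty(hidden, rho, beta):
    # hidden: (batch_size, n_hidden) sigmoid activations in (0, 1)
    rho_hat = hidden.mean(axis=0)  # average activation of each hidden unit
    kl = (rho * np.log(rho / rho_hat)
          + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))
    return beta * kl.sum()         # added to the layer's reconstruction cost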

Example

python pdnn/cmds/run_SdA.py --train-data "train.pickle.gz,partition=600m,stream=true,random=true" \
                            --nnet-spec "330:1024:1024:1024:1024:1901" \
                            --wdir ./ --ptr-layer-number 4 \
                            --1stlayer-reconstruct-activation tanh \
                            --param-output-file sda.mdl

Your application may not have targets, for example in purely unsupervised training. In that case you still need to put a target number in --nnet-spec (1901 in this example), but it can be an arbitrary placeholder: the layers that actually get trained are determined by --ptr-layer-number. The example above pre-trains only the first 4 hidden layers and never touches the final softmax layer.
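
For instance, a purely unsupervised run could pre-train three hidden layers
with an arbitrary placeholder of 10 targets, since the softmax layer is never
trained (a hypothetical variant of the command above, using only the flags
documented here):

python pdnn/cmds/run_SdA.py --train-data "train.pickle.gz" \
                            --nnet-spec "330:1024:1024:1024:10" \
                            --wdir ./ --ptr-layer-number 3 \
                            --param-output-file sda.mdl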