cmds/run_DNN.py  --  Training Deep Neural Networks
---------------------------------------------------------------------------------------------------------------------
Arguments

--train-data
    training data specification. Required.

--valid-data
    validation data specification. Required.

--nnet-spec
    network structure, "d:h(1):h(2):...:h(n):s", e.g. 250:1024:1024:1024:1024:1920. Required.
    d = input dimension; h(i) = size of the i-th hidden layer; s = number of targets.
    (A short sketch after this table shows how the spec maps to weight-matrix shapes.)

--wdir
    working directory. Required.

--param-output-file
    path to save model parameters in the PDNN format.
    By default "": no PDNN-formatted model is output.

--cfg-output-file
    path to save the model config.
    By default "": no model config is output.

--kaldi-output-file
    path to save the Kaldi-formatted model.
    By default "": no Kaldi-formatted model is output.

--model-save-step
    number of epochs between model saving.
    By default 1: the temporary model is saved after every epoch.

--ptr-file
    pre-trained model file.
    By default "": no pre-training.

--ptr-layer-number
    number of layers to initialize with the pre-trained model.
    Required if --ptr-file is provided.

--lrate
    learning rate.
    By default D:0.08:0.5:0.05,0.05:15

--batch-size
    mini-batch size for SGD.
    By default 256.

--momentum
    momentum.
    By default 0.5.

--activation
    activation function: sigmoid, tanh, rectifier, or maxout:${group_size}.
    By default sigmoid.
    When using maxout, you need to specify the group size, i.e., the number of units in
    each max-pooling group. More details can be found at the bottom of this page and in
    the linked paper.

--input-dropout-factor
    dropout factor for the input layer (features).
    By default 0: no dropout is applied to the input features.

--dropout-factor
    comma-delimited dropout factors for the hidden layers; they must match the network
    structure given by --nnet-spec, e.g. --dropout-factor 0.2,0.2,0.2,0.2.
    By default "": no dropout is applied. Setting all factors to 0 is equivalent but runs
    slower, so "--dropout-factor 0,0,0,0" is NOT recommended.

--l1-reg
    L1-norm regularization weight:
    train_objective = cross_entropy + l1_reg * [L1 norm of all weight matrices]
    By default 0.

--l2-reg
    L2-norm regularization weight:
    train_objective = cross_entropy + l2_reg * [L2 norm of all weight matrices]
    By default 0.

--max-col-norm
    the maximum value of the norm of gradients; usually used with dropout and maxout.
    By default none: not applied.


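To make the --nnet-spec convention concrete, here is a small Python sketch (not part of PDNN; the helper name is made up for illustration) that parses a spec string and prints the weight-matrix shapes it implies:

# layer_shapes() is a hypothetical helper, not a PDNN function.
def layer_shapes(nnet_spec):
    """Return the (rows, cols) shape of each weight matrix implied by "d:h(1):...:h(n):s"."""
    sizes = [int(x) for x in nnet_spec.split(":")]
    # Consecutive sizes give the input/output dimensions of each fully-connected layer.
    return list(zip(sizes[:-1], sizes[1:]))

print(layer_shapes("250:1024:1024:1024:1024:1920"))
# [(250, 1024), (1024, 1024), (1024, 1024), (1024, 1024), (1024, 1920)]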

Example

python pdnn/cmds/run_DNN.py  --train-data "train.pickle.gz,partition=600m,stream=true,random=true" \
                             --valid-data "valid.pickle.gz,partition=600m,stream=true,random=true" \
                             --nnet-spec "330:1024:1024:1024:1024:1901" \
                             --ptr-file dnn.ptr --ptr-layer-number 4 \
                             --activation sigmoid --wdir ./ \
                             --param-output-file nnet.mdl --cfg-output-file nnet.cfg

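The --train-data and --valid-data arguments point at gzipped pickle files. Assuming a simple (feature matrix, label vector) tuple layout (check PDNN's data-format documentation for the exact convention; the file name and dimensions here are placeholders), such a file could be written roughly like this:

import gzip
import pickle
import numpy

# Toy data: 1000 random 330-dimensional frames with labels in [0, 1901).
# The (features, labels) tuple layout is an assumption for illustration only;
# consult the PDNN data-format page for the exact convention.
feats = numpy.random.randn(1000, 330).astype(numpy.float32)
labels = numpy.random.randint(0, 1901, size=1000).astype(numpy.int32)

with gzip.open("train.pickle.gz", "wb") as f:
    pickle.dump((feats, labels), f)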

Maxout

When using maxout as the activation, h(i) in nnet-spec means the number of maxout units; each unit has group_size linear pieces, so a hidden layer's weight matrix has h(i)*group_size columns. In the following example (group size 3), the sizes of the weight matrices, from the lowest to the highest layer, are 330x1200, 400x1200, 400x1200, ... (the sketch at the end of this section spells out the arithmetic).

python pdnn/cmds/run_DNN.py --train-data "train.pickle.gz,partition=600m,stream=true,random=true" \
                            --valid-data "valid.pickle.gz,partition=600m,stream=true,random=true" \
                            --nnet-spec "330:400:400:400:400:1901" \
                            --lrate "D:0.008:0.5:0.05,0.05:15" \
                            --activation "maxout:3" --wdir ./ \
                            --param-output-file nnet.mdl --cfg-output-file nnet.cfg

Some of our observations: (1) a much smaller learning rate should be used than with sigmoid networks; (2) pre-training doesn't help much.
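The weight-matrix sizes quoted above follow mechanically from nnet-spec and the group size. The sketch below is not PDNN code; it derives the shapes and shows the max-pooling step a maxout layer performs (how PDNN groups the columns internally may differ):

import numpy

def maxout_shapes(nnet_spec, group_size):
    # Each of the h(i) maxout units has group_size linear pieces, so a hidden layer's
    # weight matrix has h(i) * group_size columns; the softmax layer stays h(n) x s.
    sizes = [int(x) for x in nnet_spec.split(":")]
    hidden = [(sizes[i], sizes[i + 1] * group_size) for i in range(len(sizes) - 2)]
    return hidden + [(sizes[-2], sizes[-1])]

print(maxout_shapes("330:400:400:400:400:1901", 3))
# [(330, 1200), (400, 1200), (400, 1200), (400, 1200), (400, 1901)]

def maxout_layer(x, W, b, group_size):
    # Linear projection, then take the max within each group of group_size columns.
    z = x.dot(W) + b                               # (batch, h * group_size)
    z = z.reshape(x.shape[0], -1, group_size)      # (batch, h, group_size)
    return z.max(axis=2)                           # (batch, h)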