A nice explanation of LSTMs.
It's expensive to compute the softmax layer over the vocabulary to compute p(word|context). Three solutions which have been shown to work are: hierarchical softmax (Goodman 2001), noise contrastive estimation (Gutmann and Hyvarinen 2010), self-normalizing neural networks (Devlin et al. 2014).
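A rough sketch of why a hierarchy helps, using a two-level, class-based factorization (the idea behind hierarchical softmax): p(word|context) = p(class|context) * p(word|class, context), so each query normalizes over the classes plus the words in one class rather than over the whole vocabulary. The vocabulary, class assignment, and scores below are made-up toy values, not taken from any of the cited papers.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def two_level_prob(word, word_to_class, class_scores, word_scores):
    """p(word|context) = p(class|context) * p(word|class, context).

    class_scores: dict class -> score (hypothetical, would depend on context)
    word_scores:  dict class -> dict word -> score (hypothetical)
    """
    cls = word_to_class[word]
    classes = sorted(class_scores)
    p_class = softmax([class_scores[c] for c in classes])[classes.index(cls)]
    members = sorted(word_scores[cls])
    p_word = softmax([word_scores[cls][w] for w in members])[members.index(word)]
    return p_class * p_word

# Toy vocabulary of four words split into two classes (illustrative only).
word_to_class = {"cat": "animal", "dog": "animal", "run": "verb", "eat": "verb"}
class_scores = {"animal": 1.0, "verb": 0.0}
word_scores = {
    "animal": {"cat": 0.5, "dog": -0.5},
    "verb": {"run": 0.0, "eat": 0.0},
}

probs = {w: two_level_prob(w, word_to_class, class_scores, word_scores)
         for w in word_to_class}
total = sum(probs.values())  # the factorized distribution still sums to 1
```

With roughly sqrt(|V|) classes of roughly sqrt(|V|) words each, the per-word normalization cost drops from about |V| to about 2*sqrt(|V|) scores.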
Chris Dyer's blog.
a paper that compares "off the shelf" dependency parsers.
a tutorial and a blog post on spectral clustering.
Andrew Jones' brief explanation for Xavier Glorot's initialization of neural network parameters.
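For quick reference, a minimal sketch of the uniform variant of Glorot (Xavier) initialization: weights are drawn from U[-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)), which gives a variance of 2 / (fan_in + fan_out). The function name and the layer sizes are my own placeholders.

```python
import math
import random

def glorot_uniform(fan_in, fan_out, rng=random):
    """Return a fan_in x fan_out weight matrix (list of lists) drawn from
    U[-limit, limit], where limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

# Hypothetical layer: 300 inputs, 100 outputs; seeded for reproducibility.
rng = random.Random(0)
W = glorot_uniform(300, 100, rng)
```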
Sergey Sundukovskiy's slides on prototypes, minimum viable products (MVPs), etc.
Brendan's tool for visualizing syntactic trees (parseviz).
A bunch of monolingual corpora.
Compress: tar -czvf compressed-output.tar.gz uncompressed-input-files.* OR tar -cvf mystuff.tar foo.tex fig1.eps fig2.eps && gzip mystuff.tar
Decompress: tar -xzvf compressed-input.tar.gz [-C uncompressed-output-dir]
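The same round trip can be scripted with Python's standard `tarfile` module. This is only a sketch: the helper names are mine, the file names are placeholders, and `extractall` should only be pointed at archives you trust.

```python
import os
import tarfile
import tempfile

def compress(archive_path, paths):
    """Create a gzip-compressed tar archive containing the given files."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for p in paths:
            # Store each file under its base name rather than its full path.
            tar.add(p, arcname=os.path.basename(p))

def decompress(archive_path, out_dir):
    """Extract a gzip-compressed tar archive into out_dir."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(out_dir)

# Round trip in a throwaway directory (placeholder file names).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "foo.tex")
    with open(src, "w") as f:
        f.write("hello")
    archive = os.path.join(tmp, "mystuff.tar.gz")
    compress(archive, [src])
    out_dir = os.path.join(tmp, "out")
    os.makedirs(out_dir)
    decompress(archive, out_dir)
    with open(os.path.join(out_dir, "foo.tex")) as f:
        roundtrip = f.read()
```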
Boyd and Vandenberghe's book "Convex Optimization"
to initialize submodules after a `git clone', execute `git submodule update --init' at the root directory (reference).
Groups, rings, fields, and vector spaces
Unicode points for Math symbols, Greek letters, Math operators (handy for preparing slides).
Tips and tricks in stochastic gradient descent land.
How to use stochastic gradient descent with L1 regularization? prox-grad, dual averaging, FTRL
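A minimal sketch of the prox-grad option, assuming a squared-error loss on a linear model: take an ordinary SGD step on the loss, then apply the proximal operator of the L1 penalty, which is soft-thresholding by lr * lam. The toy data, function names, and hyperparameters are illustrative, not tuned.

```python
import random

def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def prox_sgd_lasso(data, dim, lam=0.1, lr=0.05, epochs=50, seed=0):
    """Proximal SGD for 0.5 * (w.x - y)^2 + lam * ||w||_1."""
    rng = random.Random(seed)
    data = list(data)
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            # Gradient step on the smooth part of the objective.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            # Proximal step for the L1 penalty.
            w = [soft_threshold(wi, lr * lam) for wi in w]
    return w

# Toy problem: y depends only on the first feature; the second is irrelevant.
gen = random.Random(1)
xs = [[gen.uniform(-1, 1), gen.uniform(-1, 1)] for _ in range(200)]
data = [(x, 2.0 * x[0]) for x in xs]
w = prox_sgd_lasso(data, dim=2)
```

The relevant weight ends up somewhat below its true value of 2 because of the L1 shrinkage, while the irrelevant weight is held at or near zero.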
installing standard R packages, custom packages in R, and what to do when cpp compilation fails while installing custom R packages
locality sensitive hashing (LSH)
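One concrete LSH family, sketched under my own choice of names and parameters: random-hyperplane hashing for cosine similarity. Each bit records which side of a random hyperplane a vector falls on, so vectors separated by a small angle agree on most bits.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(vec, planes):
    """Bit i is 1 if vec lies on the non-negative side of hyperplane i."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

def hamming(a, b):
    """Number of positions where two signatures differ."""
    return sum(x != y for x, y in zip(a, b))

planes = make_hyperplanes(dim=3, n_bits=64)
a = [1.0, 0.9, 0.1]
b = [1.0, 1.0, 0.0]     # nearly parallel to a
c = [-1.0, -1.0, 0.1]   # nearly opposite to a
sig_a, sig_b, sig_c = (signature(v, planes) for v in (a, b, c))
```

For a single bit, the probability that two vectors disagree is their angle divided by pi, so the Hamming distance between signatures gives an estimate of the angle.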
history of deep learning
count-min sketches (a cool data structure that approximates the counts of elements in a stream using sublinear memory)
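A minimal sketch of the data structure, with a hypothetical class name and SHA-256 standing in for the pairwise-independent hash functions used in the analysis: keep depth rows of width counters, increment one counter per row on every update, and answer a query with the minimum over rows. Collisions can only inflate a counter, so the estimate never undercounts.

```python
import hashlib

class CountMinSketch:
    """Approximate counter; estimates are always >= the true count."""

    def __init__(self, width=1000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash function per row, simulated by prefixing the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))

cms = CountMinSketch()
for _ in range(100):
    cms.add("the")
cms.add("cat", 3)
```

Larger width reduces the overcounting error; larger depth reduces the chance that every row collides.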
style guidelines for python
an introduction to GCC
simulations of beta (and other) distribution density
evaluating clusterings (a ps version of the paper which I like more)
sequence labeling tutorial
./configure && make && make install
step-by-step example for using GDB within Emacs to debug a C or C++ program. See this for more GDB commands.
gentle tutorial on using valgrind to find memory problems in C++ code
using screen to survive dropped ssh connections while running your jobs
productivity tips for using ssh
blacklight frontend machine blacklight.psc.teragrid.org
learning topic models; beyond SVD. slides, paper
MIT's matrix cookbook, and Tom Minka's awesome writeup on matrix derivatives.
Why are the objectives of logistic regression and CRF models convex?
LaTeX on blogger
Eigen: a C++ matrix library