Learning Tree Structures for Conditional Random Fields (CRFs)
This page provides code from my project on learning tree structures
for Conditional Random Fields (CRFs). This code is a somewhat improved
version of the code used for this paper:
Joseph K. Bradley and Carlos Guestrin (2010).
"Learning Tree Conditional Random Fields."
International Conference on Machine Learning (ICML).
Project Overview
Main goals:
- Learn structured models of conditional distributions P(Y|X).
- Learn tractable structures (trees).
- Make use of local inputs, i.e., local dependencies between subsets of Y variables and subsets of X variables.
- Use scalable methods.
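As a rough illustration of the first three goals, a tree-structured CRF with local inputs can be written in the following standard form. The notation is an assumption made for this sketch and is not taken from the code: T is the set of tree edges over the Y variables, X_{ij} is the small subset of X attached to edge (i, j), \psi_{ij} is a non-negative edge factor, and Z(X) is the normalizer.

```latex
P(Y \mid X) \;=\; \frac{1}{Z(X)} \prod_{(i,j) \in T} \psi_{ij}\left(Y_i, Y_j, X_{ij}\right),
\qquad
Z(X) \;=\; \sum_{y} \prod_{(i,j) \in T} \psi_{ij}\left(y_i, y_j, X_{ij}\right)
```

Because T is a tree, Z(X) and the marginals can be computed exactly by message passing, which is what makes the structure tractable.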
Approach:
- Do variable selection beforehand; i.e., for each Yi, select a small set of variables in X which Yi directly depends on.
- For all pairs (Yi, Yj), compute an edge weight. Choosing good edge weights was our main contribution.
- Choose a max spanning tree.
- Using this tree structure for the CRF, do parameter learning.
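The edge-scoring and spanning-tree steps above can be sketched as follows. This is a minimal Python illustration, not the project's C++ code: the function name `max_spanning_tree`, the `edge_weight` callback, and the toy weights are all hypothetical, and the actual edge scores from the paper are not reproduced here. The tree is found with Kruskal's algorithm and a small union-find.

```python
from itertools import combinations


def max_spanning_tree(num_vars, edge_weight):
    """Return the edges (i, j) of a maximum-weight spanning tree.

    Variables are indexed 0..num_vars-1. `edge_weight(i, j)` is a
    hypothetical scoring callback standing in for whatever edge score
    is used; it is called once per pair with i < j.
    """
    parent = list(range(num_vars))

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Score every pair, then visit the heaviest edges first (Kruskal).
    scored = sorted(
        ((edge_weight(i, j), i, j) for i, j in combinations(range(num_vars), 2)),
        reverse=True,
    )

    tree = []
    for _, i, j in scored:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding (i, j) does not create a cycle
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


# Toy weights for four Y variables (made-up numbers, for illustration only).
toy_weights = {
    (0, 1): 0.9, (0, 2): 0.2, (0, 3): 0.1,
    (1, 2): 0.8, (1, 3): 0.3, (2, 3): 0.7,
}
tree_edges = max_spanning_tree(4, lambda i, j: toy_weights[(i, j)])
```

With these toy weights the selected tree is the chain 0-1-2-3. Parameter learning would then be run on a CRF restricted to those edges.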
CRF Learning Code
Download: The code is available as a gzipped tar file.
Code Overview
Main parts of code:
- Factors: table factors, Gaussian factors, conditional versions for CRFs
- Models: decomposable (junction tree), Bayes nets, CRFs, synthetic models
- Inference: exact for tractable models, sampling and belief propagation (BP) for intractable models
- Datasets: datasets, synthetic data
- Parameter learning: learning for factors, learning for models via gradient methods
- Structure learning: Chow-Liu for generative models, my MST-based methods for CRFs
- Discriminative learning: regression, decision trees/stumps, boosting
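For readers unfamiliar with Chow-Liu, mentioned in the structure learning item: it weights each pair of variables by their empirical mutual information and then takes a maximum spanning tree over those weights. The sketch below shows only the weight computation for discrete samples. It is a standalone Python illustration with a hypothetical function name, not the interface of the C++ code.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete samples.

    `xs` and `ys` are equal-length sequences of observed values for two
    variables. Hypothetical helper, for illustration only.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)

    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written in terms of counts.
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy samples: b copies a, while c is independent of a in this sample.
a = [0, 0, 1, 1]
b = [0, 0, 1, 1]
c = [0, 1, 0, 1]
mi_ab = mutual_information(a, b)  # equals log(2)
mi_ac = mutual_information(a, c)  # equals 0
```

Pairs with higher mutual information are preferred as tree edges, so in this toy sample the edge between a and b would be chosen over the edge between a and c.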
About the code:
- The code is C++, with a few Matlab scripts to process/plot results.
- We use CMake to build our code.
- Our main dependencies are IT++ (matrix/vector library) and Boost (C++ libraries).
- The code is well documented, with Doxygen-generated HTML documentation.
Installation and Getting Started
The code distribution includes detailed instructions on how to install the
necessary dependencies and build our code.
Once you download the code, look at these files in the home directory:
- README: introduction, installation, getting started
- LICENSE.txt: licensing information
- AUTHORS.txt: list of contributors
- TREECRFS.txt: information on duplicating my experiments
- doc/html/index.html: Doxygen-generated documentation
Licensing
Most of this code is released under the GNU General Public License (GPL).
However, a few files are released under the GNU Lesser General Public
License (LGPL).
See the LICENSE.txt file for more details.
The code is a subset of the SELECT Lab's larger codebase.
We are planning to release the entire codebase under a more permissive
license before long.
If You Have Questions
If you have questions, get weird results, etc.,
please feel free to contact me; my email is listed at
the top of my homepage.
If you find bugs, definitely contact me! :)