Mary's Cascade code
(c) Mary McGlohon, 2010
mmcgloho@cs.cmu.edu
***********************************************

This is a small package for generating and processing cascades from a list of edges. It is a work in progress.

It is also research code that I originally wrote without intending to share with anyone. Before posting I've cleaned up and documented a few things, and removed some of the really horrible artifacts from deadline crunch time. But it's still got some rough edges.

CLASSES
--------
Cascade.java

This is the baseline Cascade class. A cascade is a subgraph of nodes and edges. It is typically a tree, but does not have to be (e.g. our blogs work in SDM 2007). This class keeps track of nodes, along with their authors and timestamps, and can test for isomorphism (either just structures or structures with authors).
---------
BuildCascadesEfficient.java

This will take a list of edges in the format
    src,dest,time,srcauthor
and make a file where each line is one cascade (connected component). If it's important to you to have the authors and timestamps for root nodes, you will want to have lines for root nodes in the format
    rootid,-1,roottime,rootauthor
The -1 will not be stored as a node.
-------
CascadeIterator.java

This just takes a file and lets you fetch Cascades from it line by line.
-------
CascadeUtils.java and CascadeViz.java

These are just utility files (currently pretty small) that are used by the other classes.
--------
CascadeMarkovModel.java

"Yo dawg, I herd you liek graphs, so I made you a graph of graphs."

This is a tree of cascades. It begins with a cascade of a single node, which then has one child: a cascade of two nodes. That then has two possible children, depending on where the next node attaches. In the case of an authored version, these will also have all possible combinations of different authors. From there, you can read in cascades one by one, and it will keep track of the paths these cascades took, that is, the order in which nodes were added.
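
To make the idea of a "path" through the tree of cascades concrete, here is a minimal sketch for the unauthored case. It is not the package's implementation: the class GrowthPathSketch, its methods, and the parenthesis encoding of shapes are all hypothetical, and CascadeMarkovModel may represent shapes quite differently. The sketch assumes a tree-shaped cascade whose nodes are listed in arrival order, each with the index of the node it attached to.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not part of the package.
class GrowthPathSketch {

    // Canonical string for the rooted, unordered tree below `node`:
    // a node is written as "(" + its sorted child encodings + ")".
    // Two trees get the same string exactly when they have the same shape.
    static String canonical(List<List<Integer>> children, int node) {
        List<String> parts = new ArrayList<>();
        for (int child : children.get(node)) {
            parts.add(canonical(children, child));
        }
        Collections.sort(parts);
        return "(" + String.join("", parts) + ")";
    }

    // parents[i] is the index of the node that node i attached to.
    // Node 0 is the root and parents[0] is ignored.
    // Assumption: nodes are in arrival order, so parents[i] < i for i > 0.
    // Returns the shape of the cascade after each node is added.
    static List<String> growthPath(int[] parents) {
        List<List<Integer>> children = new ArrayList<>();
        List<String> path = new ArrayList<>();
        for (int i = 0; i < parents.length; i++) {
            children.add(new ArrayList<>());
            if (i > 0) {
                children.get(parents[i]).add(i);
            }
            path.add(canonical(children, 0));
        }
        return path;
    }

    public static void main(String[] args) {
        // Third node replies to the second node: a chain.
        System.out.println(growthPath(new int[] {-1, 0, 1}));
        // Third node replies to the root: a star.
        System.out.println(growthPath(new int[] {-1, 0, 0}));
    }
}
```

Both cascades share the first two shapes (one node, then two nodes) and diverge at the third, which corresponds to the two possible children of the two-node cascade described above.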
It may be more explanatory if you just go on and look at
uk.sci.weather-cascades.pdf or uk.sci.weather-authored-cascades.pdf
----------
LearnCascadeMarkovModel.java

This is what you'll actually run. It takes in some arguments and spits out an appropriate CascadeMarkovModel in graphviz format.

**************************************
TESTS

I have a small test of the cascade isomorphisms. It does not currently test on cascades where multiple parents are allowed.

To run the test, you need JUnit installed. Then compile and run
    java org.junit.runner.JUnitCore CascadeTest

***************************************
INPUT/OUTPUT FILES

The input file you will need is
    uk.sci.weather-all
It is a list of all the edges (including root posts) in the sample of this Usenet group.

The output files that will be generated are:

uk.sci.weather-temp
    Intermediate file in BuildCascadesEfficient

uk.sci.weather-authored-cascades
uk.sci.weather-cascades
    Output files from BuildCascadesEfficient, depending on whether authors used

uk.sci.weather-cascades.dot
uk.sci.weather-authored-cascades.dot
    Graphviz files from LearnCascadeMarkovModel

uk.sci.weather-cascades.pdf
uk.sci.weather-authored-cascades.pdf
    Dot output from building the .dot files
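
To make the edge-list format described under BuildCascadesEfficient.java concrete, here is a minimal sketch of grouping edge lines into connected components. It is not the code in BuildCascadesEfficient: the class EdgeGroupingSketch and its methods are hypothetical, the sample ids and authors are made up, and the union-find approach is just one way to do it. It only shows how src,dest,time,srcauthor lines and rootid,-1,roottime,rootauthor lines might be read, with the -1 never stored as a node. The time and author fields are ignored here, and the layout of the real output file is not reproduced.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; not part of the package.
class EdgeGroupingSketch {
    // Union-find forest over node ids; each id maps to its parent id.
    private final Map<String, String> parent = new HashMap<>();

    // Returns the representative of x's component, registering x if new.
    private String find(String x) {
        parent.putIfAbsent(x, x);
        String root = x;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        // Path compression: point every node on the way directly at the root.
        while (!parent.get(x).equals(root)) {
            String next = parent.get(x);
            parent.put(x, root);
            x = next;
        }
        return root;
    }

    private void union(String a, String b) {
        parent.put(find(a), find(b));
    }

    // Each line is "src,dest,time,srcauthor"; a dest of -1 marks a root line.
    static Collection<Set<String>> components(List<String> lines) {
        EdgeGroupingSketch uf = new EdgeGroupingSketch();
        for (String line : lines) {
            String[] fields = line.trim().split(",");
            String src = fields[0];
            String dest = fields[1];
            if (dest.equals("-1")) {
                uf.find(src);        // register the root; -1 is not a node
            } else {
                uf.union(src, dest); // src and dest belong to one cascade
            }
        }
        Map<String, Set<String>> groups = new TreeMap<>();
        for (String node : new ArrayList<>(uf.parent.keySet())) {
            groups.computeIfAbsent(uf.find(node), k -> new TreeSet<>()).add(node);
        }
        return groups.values();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "1,-1,100,alice", "2,1,105,bob", "3,1,110,carol",
            "10,-1,200,dave", "11,10,201,erin");
        for (Set<String> cascade : components(lines)) {
            System.out.println(cascade);
        }
    }
}
```

Because only connectivity matters for finding components, the sketch does not depend on which end of an edge is the reply and which is the parent.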