In this thesis, we propose two novel tools, AutoSplit and random walk with restarts (RWR), for mining patterns from video and mixed-media data sets.
AutoSplit and its variants adopt ideas from independent component analysis (ICA). AutoSplit is more flexible than principal component analysis (PCA) and models real-world, non-Gaussian data sets better. AutoSplit achieves video classification accuracy comparable to that of previous work based on hand-picked features. A proposed variant, Clustering-AutoSplit, separates text from image background more clearly than one of the best PCA-based methods.
For mixed-media data sets, we propose a graph-based data representation and apply RWR to find cross-modal associations. On image captioning, RWR outperforms previous methods in accuracy, with an improvement of 12.8 percentage points (a relative improvement of 58%). RWR also supports general multi-modal queries and retrieval.
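To make the idea of RWR on a mixed-media graph concrete, the following is a minimal sketch, not the implementation used in this thesis. It assumes an undirected, unweighted graph in which images, image regions, and caption words are all nodes; the toy graph, the node names, the function name `random_walk_with_restarts`, and the restart probability of 0.15 are hypothetical choices made for illustration only. The walk starts at a query node, moves to a random neighbor at each step, and jumps back to the query node with a fixed probability; the steady-state probability of visiting a node is read as its affinity to the query.

```python
def random_walk_with_restarts(edges, start, restart_prob=0.15, iters=100):
    """Approximate the steady-state visiting probability of every node.

    Illustrative sketch: `edges` is a list of undirected (u, v) pairs and
    `start` is the query node. Parameter values are assumptions.
    """
    # Build an undirected adjacency list.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # Begin with all probability mass on the query node.
    scores = {node: 0.0 for node in neighbors}
    scores[start] = 1.0

    for _ in range(iters):
        nxt = {node: 0.0 for node in neighbors}
        for node, mass in scores.items():
            # With probability 1 - restart_prob, step to a uniformly
            # chosen neighbor of the current node.
            share = (1.0 - restart_prob) * mass / len(neighbors[node])
            for nb in neighbors[node]:
                nxt[nb] += share
        # With probability restart_prob, jump back to the query node.
        nxt[start] += restart_prob
        scores = nxt
    return scores


# Hypothetical toy graph: two captioned images and one uncaptioned query image.
edges = [
    ("img1", "region_a"), ("img1", "word:sky"),
    ("img2", "region_b"), ("img2", "word:grass"),
    ("query", "region_q"),
    ("region_q", "region_a"),  # assumed similarity link between regions
    ("region_a", "region_b"),  # assumed similarity link between regions
]

scores = random_walk_with_restarts(edges, "query")

# Rank caption words by their affinity to the query image.
words = sorted((n for n in scores if n.startswith("word:")),
               key=scores.get, reverse=True)
print(words)
```

In this toy graph, the query image's region links most directly to a region of `img1`, so the word attached to `img1` receives a higher score than the word attached to `img2`. The same scoring can be started from any node type, which is one way to read the general multi-modal queries mentioned above.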
For future work, we propose to extend AutoSplit and RWR and to apply them to other tasks. We also plan to distill our experience into guidelines for matching data mining tasks to tools.