Lin and Sundaram, ICME 2007
Blog Antenna: Summarization of Personal Blog Temporal Dynamics Based on Self-Similarity Factorization
This paper presents a novel framework for summarizing personal blog activity. Blogs are characterized by the temporal dynamics of their content and by high posting volume, and the authors argue that this kind of activity is difficult to capture with static tag clouds or time-based tag streams. Their analytical framework has three parts. First, they represent the blog's temporal sequence of posts with self-similarity matrices, defined using the histogram intersection similarity measure on the content and link attributes of posts. Second, they extract the temporal relationships between posts by clustering them with symmetric non-negative matrix factorization of the self-similarity matrices, and they measure clustering quality with a modularity function described later in the paper. Third, they summarize the blog's temporal dynamics with a "blog antenna" visualization built from the factorization results.
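The first part, building a self-similarity matrix with histogram intersection, can be sketched in Python. This is a minimal sketch, assuming each post is already a non-negative feature vector over a shared vocabulary and that vectors are L1-normalized before comparison so that similarities fall in [0, 1]; the paper's exact normalization may differ, and the function names are hypothetical.

```python
from typing import List, Sequence


def l1_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a non-negative vector so its entries sum to 1 (a zero vector stays zero)."""
    total = sum(vec)
    return [v / total for v in vec] if total > 0 else [0.0] * len(vec)


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum of element-wise minima; in [0, 1] when both histograms are L1-normalized."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def self_similarity_matrix(posts: Sequence[Sequence[float]]) -> List[List[float]]:
    """Entry (i, j) is the histogram intersection of posts i and j.

    Posts are assumed to be listed in posting order, so row i describes how
    post i resembles every earlier and later post.
    """
    hists = [l1_normalize(p) for p in posts]
    return [[histogram_intersection(hi, hj) for hj in hists] for hi in hists]


# Hypothetical feature vectors for three posts, in posting order.
S = self_similarity_matrix([[2.0, 0.0, 2.0], [1.0, 0.0, 1.0], [0.0, 3.0, 0.0]])
```

Here the first two posts have identical normalized histograms and so get similarity 1, while the third shares no features with them and gets 0. Histogram intersection is symmetric, so the resulting matrix is symmetric, which is what a symmetric factorization expects.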
The self-similarity matrices are defined on two kinds of attributes:
- the blog content: essentially the tf-idf vectors of the words in each post, and
- the post links: the tf-idf vectors of the links in each post.
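Both attributes can be turned into vectors with the same tf-idf routine, treating each outgoing link as a "term". This is a minimal sketch, assuming posts are already tokenized and using raw counts for tf and log(N/df) for idf, which is one common tf-idf variant and not necessarily the one used in the paper. The posts and link targets below are made-up illustrations.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def tfidf_vectors(docs: Sequence[Sequence[str]]) -> Tuple[List[str], List[List[float]]]:
    """Return (vocabulary, vectors) with tf = raw count and idf = log(N / df)."""
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    df: Counter = Counter()
    for c in counts:
        df.update(c.keys())  # each term counts once per document
    vocab = sorted(df)
    idf: Dict[str, float] = {t: math.log(n_docs / df[t]) for t in vocab}
    vectors = [[c[t] * idf[t] for t in vocab] for c in counts]
    return vocab, vectors


# Hypothetical posts: each has a bag of words and a list of outgoing link targets.
posts = [
    {"words": ["election", "poll", "vote", "poll"],
     "links": ["news.example.com", "polls.example.org"]},
    {"words": ["vote", "ballot", "election"], "links": ["news.example.com"]},
    {"words": ["recipe", "pasta", "dinner"], "links": ["food.example.net"]},
]

word_vocab, content_vectors = tfidf_vectors([p["words"] for p in posts])
link_vocab, link_vectors = tfidf_vectors([p["links"] for p in posts])
```

Each attribute yields its own set of vectors and hence its own self-similarity matrix. Note that a term occurring in every post gets idf 0 under this variant and therefore contributes nothing to similarity.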
The authors argue that if two posts are similar with respect to these attributes, they may be related to some common topic that is not explicit in the content; they call these latent topics the posts' "themes". Symmetric non-negative matrix factorization clusters the posts into themes, and the modularity function (detailed in the paper) scores the quality of the resulting clustering. The method also determines the number of clusters itself, which is advantageous.
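The clustering step and the choice of the number of themes can be sketched together. This is a minimal sketch under three assumptions that are ours, not the paper's: (a) symmetric NMF, S ≈ H Hᵀ with H ≥ 0, is solved with a damped multiplicative update (one common solver for this problem); (b) each post is assigned to the column of H with the largest weight; and (c) clustering quality is scored with Newman-style weighted modularity. The paper defines its own modularity function and solver, which may differ, and all function names here are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Plain nested-loop matrix product (fine for the small matrices used here)."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def transpose(a: Sequence[Sequence[float]]) -> Matrix:
    return [list(col) for col in zip(*a)]


def symmetric_nmf(s: Matrix, k: int, iters: int = 300, seed: int = 0) -> Matrix:
    """Approximate S ~= H H^T with a non-negative n x k factor H.

    Damped multiplicative update: H <- H * (0.5 + 0.5 * (S H) / (H H^T H)).
    Starting from positive values, every entry of H stays positive.
    """
    rng = random.Random(seed)
    n = len(s)
    h = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    eps = 1e-12  # guards against division by zero
    for _ in range(iters):
        sh = matmul(s, h)
        hhth = matmul(h, matmul(transpose(h), h))
        h = [[h[i][c] * (0.5 + 0.5 * sh[i][c] / (hhth[i][c] + eps)) for c in range(k)]
             for i in range(n)]
    return h


def labels_from_factor(h: Matrix) -> List[int]:
    """Assign each post to the theme (column of H) with the largest weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in h]


def modularity(s: Matrix, labels: Sequence[int]) -> float:
    """Weighted modularity: (1/2m) * sum over same-label pairs of S_ij - d_i d_j / 2m."""
    degree = [sum(row) for row in s]
    two_m = sum(degree)
    q = 0.0
    for i in range(len(s)):
        for j in range(len(s)):
            if labels[i] == labels[j]:
                q += s[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


def choose_num_themes(s: Matrix, k_max: int, seed: int = 0) -> Tuple[float, int, List[int]]:
    """Factorize for k = 1..k_max and return (modularity, k, labels) of the best run."""
    results = []
    for k in range(1, k_max + 1):
        labels = labels_from_factor(symmetric_nmf(s, k, seed=seed))
        results.append((modularity(s, labels), k, labels))
    # max() keeps the first (smallest k) candidate among equal modularity scores.
    return max(results, key=lambda r: r[0])
```

For a usage example, build a 6 × 6 similarity matrix with two blocks of mutually similar posts and call `choose_num_themes(S, 3)`. One would expect k = 2 to score highest, but with random initialization and a local solver that is not guaranteed, so trying several seeds is a sensible design choice.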
After factorizing the self-similarity matrices, the authors design visual summaries that address the following questions:
- How many themes exist in the blog?
- How does the sequence of posts relate to the themes?
- Is there a relationship between themes that is independent of time?
- Can a blog be observed from all of the above perspectives?
The 3D antenna view resembles an antenna: the themes project out of a central time line in different directions and colors. The content drifting view shows how themes relate to each other independently of time. The content evolving view shows each post in terms of its strength with respect to the dominant themes. A variant of this view shows how each post mixes the various themes in different proportions.
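One plausible way to derive the per-post theme strengths and proportions behind the content evolving views is to normalize the rows of the non-negative factor H. This is a minimal sketch, assuming H has one row per post (in posting order) and one column per theme; it is an illustration, not the authors' stated procedure, and the function names and sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def theme_proportions(h: Sequence[Sequence[float]]) -> List[List[float]]:
    """Normalize each post's row of H so its theme weights sum to 1."""
    out = []
    for row in h:
        total = sum(row)
        out.append([w / total for w in row] if total > 0 else [0.0] * len(row))
    return out


def dominant_theme(h: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """For each post, return (index of its strongest theme, that theme's share)."""
    result = []
    for row in theme_proportions(h):
        best = max(range(len(row)), key=row.__getitem__)  # first index wins ties
        result.append((best, row[best]))
    return result


# Hypothetical factor for three posts and two themes.
h = [[3.0, 1.0], [1.0, 1.0], [0.0, 2.0]]
```

Plotting the proportions over time gives a mixed-theme view, while the dominant-theme pairs give a single strength per post.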
The authors use the TREC Blog Track 2006 data to visualize blogs from various domains.