Leskovec et. al., PAKDD 2006

From ScribbleWiki: Analysis of Social Media

Jump to: navigation, search

Contents

Patterns of Influence in a Recommendation Network

Introduction

The problem addressed by the paper is essentially to enumerate and count the subgraphs in a large network in an efficient manner. These subgraphs represent the patterns of influence. A node in the graph influences another node and the second node influences another set of nodes and so on resulting in what is called as an information cascade. Networks of interest include recommendation networks where one node influences another to make a favorable decision (for example, to buy a product). Another example of an information cascade is the network of rumor-spreading. One may ask a question as to how far would the rumor spread in the chain of people. Or for that matter how deep in the cascade can product recommendations diffuse (on an average). Answers to these questions can be found be analyzing real-world recommendation graphs and enumerating the information cascades. This paper

  • Implements an efficient algorithm for subgraph mining. Here, each subgraph refers to an information cascade which is

also known as a recommendation chain.

  • Applies the subgraph mining algorithm to a real world recommendation network dataset (Leskovec et. al., ACM 2006) in order to extract all the cascades

Subgraph Mining

The dataset available for analysis is a recommendation network data obtained from a retailer chain. Each data point contains the following information:

  • Customer ID of the sender of recommendation U (anonymized)
  • Customer ID of the recipient of recommendation V (anonymized)
  • Time of recommendation T
  • Flag bit whether the recommendation resulted in a purchase F

(Other parameters have been excluded as they are irrelevant)

We can visualize this as a directed graph between nodes U and V where the edge is labeled by {T,F}. First, incoming edges to a node that occur after the node made a purchase are removed. This is because only the recommendations that were received before the purchase could have influenced the purchase. Secondly, nodes which are not successful in inducing purchases by connected nodes are removed.

The subgraph mining algorithm proceeds as follows:

  1. Enumerating Cascades:
    1. For every node N, the cascade in its neighborhood is explored
    2. Starting from node N and moving a fixed number of steps forward or backward, a subgraph is created
  2. Counting cascades:
    1. For every cascade discovered, check if it is similar to a previously discovered cascade. Increment the count if similar
      1. If the cascade is small, similarity is computed using graph isomorphism
      2. If the cascade is large, a hash value for the cascade is computed. The hash value is computed using as input the number of nodes in the cascade, number of edges, sorted in-degree and out-degree sequences of nodes. If the hash-value of a cascade matches with that of another cascade, graphs isomorphism algorithm are applied to verify the similarity. This can save a lot of computation if the hash values turn out to be not the same.

Results

The cascade counting algorithm is applied to an actual recommendation dataset outlined in (Leskovec et. al., ACM 2006). Recommendation networks for the 4 product categories DVDs, Movies, Books and Music are mined in order to extract the most frequent cascades. The cascade sizes for all the 4 recommendation networks show power-law distribution which means that small cascades are most frequent in a network.Figure 1 shown some of the most common cascades which have very simple topologies and are shallow. Since these cascades are the most common, we observe that the propagation of influence in the network is limited in most cases.

Figure 1: (Leskovec et. al., PAKDD 2006)Shows the most frequent cascades across all the product categories of a recommendation network. The most frequent cascades are shallow which means that influence of recommendations in the network is limited.
Figure 1: (Leskovec et. al., PAKDD 2006)Shows the most frequent cascades across all the product categories of a recommendation network. The most frequent cascades are shallow which means that influence of recommendations in the network is limited.
Views
Personal tools
  • Log in / create account