# Glossary of Terms

### From ScribbleWiki: Analysis of Social Media


## A

**Academic Genealogy**

A project that attempts to collect information about all the mathematicians in the world who hold a doctoral degree (degrees in computer science are included). It links students to their advisors, so you can check whether Gauss appears somewhere in your tree. See the Academic Genealogy Project.

**Author Dispersion**

A measure of how spread out the discussion of a particular topic is. High values indicate that many people are talking about the topic, whereas low values indicate that discussion is centered on a small group of people. This measure is more informative than simply counting the unique authors for a topic, because errors in topic classification blur how far the discussion has actually spread. (Glance et al, KDD 2005)

**Average Diameter**

Same as the **characteristic path length**, except that we take the mean of the average shortest path lengths over all nodes instead of the median. (Chakrabarti & Faloutsos, CSUR 2006)

## B

**Board Dispersion**

Similar to **author dispersion**, this measures how many different places are seeing discussion about a particular topic. A board dispersion that grows rapidly over time indicates a viral issue. If such a viral issue is negative, prompt attention is often recommended. (Glance et al, KDD 2005)

**Boardscape**

The world of boards taken as a whole: the collection of all boards, together with the potential aggregate power of all board communities and their members (http://www.boardscape.com). The term was coined by Ron Kass of Boardtracker.

**Burst (of Activity)**

A signal of the appearance of a topic in a document stream, with certain features rising sharply in frequency as the topic emerges. The activity does not typically rise smoothly to a crescendo and then fall away; rather, it exhibits frequent alternations of rapid flurries and longer pauses in close proximity. (Kleinberg, SIGKDD 2002)

**Buzz Tracking**

Following trends in topics of discussion and understanding what new topics are forming. (Glance et al, KDD 2005)

## C

**Characteristic path length**

For each node in the graph, consider the shortest paths from it to every other node in the graph. Take the average length of all these paths. Now, consider the average path lengths for all possible starting nodes, and take their median. (Bu & Towsley, 2002)
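As a rough illustration of the definition above, here is a minimal Python sketch for an unweighted, undirected graph. The adjacency-dictionary representation, the function names, and the toy star graph are assumptions made for this example and do not come from the cited papers; unreachable pairs and isolated nodes are simply ignored. Replacing the median with the mean gives the quantity this glossary lists as the **average diameter**.

```python
from collections import deque
from statistics import mean, median


def bfs_distances(graph, source):
    """Return hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def per_node_average_lengths(graph):
    """Average shortest-path length from each node to the nodes it can reach."""
    averages = []
    for node in graph:
        lengths = [d for other, d in bfs_distances(graph, node).items()
                   if other != node]
        if lengths:  # isolated nodes are skipped (an assumption of this sketch)
            averages.append(mean(lengths))
    return averages


def characteristic_path_length(graph):
    """Median of the per-node average shortest-path lengths."""
    return median(per_node_average_lengths(graph))


def average_diameter(graph):
    """Mean of the per-node average shortest-path lengths."""
    return mean(per_node_average_lengths(graph))


# Hypothetical toy graph: a hub connected to three leaves.
star = {"hub": ["x", "y", "z"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
print(characteristic_path_length(star))  # about 1.667 (the median)
print(average_diameter(star))            # 1.5 (the mean)
```

On the star graph the hub averages 1 hop and each leaf averages 5/3 hops, so the median and the mean differ, which is exactly the distinction between the two glossary entries.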

## D

## E

**Early Alerting**

Informing subscribers when a rare but critical, or even fatal, condition occurs. (Glance et al, KDD 2005)

**Effective Diameter (a.k.a. eccentricity)**

The minimum number of hops in which some fraction (say, 90%) of all connected pairs of nodes can reach each other. (Tauro et al, 2001) It can be calculated from the hop-plot. (Chakrabarti & Faloutsos, CSUR 2006)
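The following Python sketch shows one way to read this definition for an unweighted graph stored as an adjacency dictionary. The function names, the graph representation, and the toy path graph are assumptions for illustration only. The sketch returns a whole number of hops and does not interpolate between hop counts.

```python
from collections import Counter, deque


def bfs_distances(graph, source):
    """Return hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def effective_diameter(graph, fraction=0.9):
    """Smallest h such that at least `fraction` of connected pairs are within h hops."""
    # Count ordered pairs (u, v), u != v, by their shortest-path distance.
    counts = Counter()
    for u in graph:
        for v, d in bfs_distances(graph, u).items():
            if v != u:
                counts[d] += 1
    total = sum(counts.values())
    if total == 0:
        return None  # no connected pairs at all
    covered = 0
    for h in sorted(counts):
        covered += counts[h]
        if covered >= fraction * total:
            return h
    return max(counts)  # only reached if fraction > 1


# Hypothetical toy graph: the path a - b - c - d.
path_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(effective_diameter(path_graph))       # 3
print(effective_diameter(path_graph, 0.8))  # 2
```

In the path graph, 6 of the 12 ordered pairs are one hop apart, 10 are within two hops, and all 12 are within three hops, so a 90% threshold needs three hops while an 80% threshold needs only two.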

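**Erdős number**

Your distance from Paul Erdős in the coauthorship graph: Erdős himself has number 0, his coauthors have number 1, their coauthors have number 2, and so on. See the Erdős Number Project.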
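Because the number is a shortest-path distance in the coauthorship graph, it can be computed with a breadth-first search. The sketch below assumes the graph is given as an adjacency dictionary; the function name and the placeholder authors ("Author A" and so on) are hypothetical and do not describe real collaborations.

```python
from collections import deque


def erdos_number(coauthors, author, root="Erdős"):
    """Coauthorship distance from `root` to `author`, or None if unreachable."""
    if root not in coauthors or author not in coauthors:
        return None
    dist = {root: 0}
    queue = deque([root])
    while queue:
        person = queue.popleft()
        if person == author:
            return dist[person]
        for partner in coauthors[person]:
            if partner not in dist:
                dist[partner] = dist[person] + 1
                queue.append(partner)
    return None  # no chain of coauthors connects the two


# Hypothetical toy coauthorship graph with placeholder names.
toy_coauthors = {
    "Erdős": ["Author A"],
    "Author A": ["Erdős", "Author B"],
    "Author B": ["Author A"],
    "Author C": [],  # never published with anyone in this toy graph
}
print(erdos_number(toy_coauthors, "Author B"))  # 2
print(erdos_number(toy_coauthors, "Author C"))  # None
```

An author with no connecting chain is conventionally said to have an infinite number; the sketch signals that case with `None`.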

## F

**Folksonomy** (a.k.a. collaborative tagging, social classification, social indexing, social tagging)

Sets of categories that are derived based on the tags that are used to characterize some resource. Tags are given by users, not experts. (Halpin et al, WWW 2007)

To see examples, go to del.icio.us, Flickr, Furl, Rojo, Connotea, Technorati, and Amazon.

## G

## H

**Hop-plot**

Starting from a node *u* in the graph, we find the number of nodes <math>N_h(u)</math> in a neighborhood of *h* hops. We repeat this, starting from each node in the graph, and sum the results to find the total neighborhood size <math>N_h</math> for *h* hops: <math>N_h = \sum_u N_h(u)</math>. The hop-plot is just the plot of <math>N_h</math> versus *h*. (Chakrabarti & Faloutsos, CSUR 2006)
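The values behind a hop-plot can be tabulated as in the Python sketch below. It assumes an unweighted graph stored as an adjacency dictionary, and it assumes that a neighborhood of *h* hops means all nodes within at most *h* hops of *u*, including *u* itself; the entry above does not spell out either point. The function names and the toy path graph are likewise illustrative.

```python
from collections import deque


def bfs_distances(graph, source):
    """Return hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def hop_plot(graph):
    """Return [N_0, N_1, ...], where N_h sums over u the nodes within h hops of u."""
    # exact[h] = number of ordered pairs (u, v) at distance exactly h.
    exact = {}
    for u in graph:
        for d in bfs_distances(graph, u).values():
            exact[d] = exact.get(d, 0) + 1
    if not exact:
        return []
    totals = []
    running = 0
    for h in range(max(exact) + 1):
        running += exact.get(h, 0)  # cumulative: within at most h hops
        totals.append(running)
    return totals


# Hypothetical toy graph: the path a - b - c - d.
path_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(hop_plot(path_graph))  # [4, 10, 14, 16]
```

Plotting the returned list against its indices gives the hop-plot. The curve flattens once *h* reaches the longest shortest path in the graph, here three hops.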

## I

## J

## K

## L

## M

## N

## O

## P

## Q

## R

## S

**Sentiment Mining**

Extracting aggregate measures of positive vs. negative opinion. (Glance et al, KDD 2005)

**Splog**

Fake blogs with machine-generated or hijacked content whose sole purpose is to host ads or raise the PageRank of target sites. (Kolari et al, 2006)

## T

## U

## V

## W

**Web Spamming (a.k.a. Spamdexing)**

Any deliberate human action that is meant to trigger an unjustifiably favorable relevance or importance for some web page, considering the page’s true value. (Gyöngyi and Garcia-Molina, 2004)

## X

## Y

## Z

## Other

**-sphere**

A suffix denoting the collection of a particular kind of data on the internet (e.g. blogosphere, splogosphere, twittersphere).