NOTE: The data sets provided here are in the old link data set format. The GDA and cGraph programs on this site now use a new link data format, so these data sets must be converted before they can be used with those programs.
Files:
lab.zip - Co-publication data from the Auton Lab at Carnegie Mellon University.
manual.zip - Links created by a human who manually read a set of public web pages and news stories related to terrorism and subjectively linked entities mentioned in the articles.
citeseer.zip - Co-publication data from citeseer.com (coming soon - pending permission).
imdb.zip - Movie information from www.imdb.com (coming soon - pending permission).
The Names File:
The names file lists each entity in the data set along with any related demographic information. Each line holds the information for a single entity as a comma-separated list. The first row contains the column labels, and the first label must be "name". There are two names files for each data set:
dataset_names.txt - a names file for the filtered data set.
dataset-dems.txt - a names file for the unfiltered data set.
An example names file might look like:
name
aaa
bbb
ccc
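As a rough illustration of the names file layout described above, the following Python sketch reads names-file text into one record per entity. The function name parse_names and the returned list-of-dicts shape are assumptions made for this example and are not part of the GDA or cGraph programs.

```python
import csv
import io


def parse_names(text):
    """Parse names-file text into a list of dicts keyed by column label.

    Hypothetical helper: assumes plain comma-separated lines with a
    header row whose first label is "name".
    """
    reader = csv.reader(io.StringIO(text))
    header = [label.strip() for label in next(reader)]
    if header[0] != "name":
        raise ValueError('the first column label must be "name"')
    entities = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        values = [value.strip() for value in row]
        entities.append(dict(zip(header, values)))
    return entities


# The example names file shown above, with no demographic columns.
example = "name\naaa\nbbb\nccc\n"
entities = parse_names(example)
names = [entity["name"] for entity in entities]
```

To read one of the distributed files, the same function could be called on the contents of the file, for example parse_names(open(path).read()).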
The Links File:
The links file contains the set of links. Each line consists of a single link. The format for the link is:
UNIQUE_LINK_ID,LINK_TYPE,ENTITY1,ENTITY2,...
There are several links files for each data set:
dataset_filtered_links.txt - a links file for the filtered data set.
dataset-links.txt - a links file for the unfiltered data set.
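The link format UNIQUE_LINK_ID,LINK_TYPE,ENTITY1,ENTITY2,... can be split into its parts with a few lines of Python. This is only a sketch: the function name parse_link, the returned dictionary keys, the sample line, and the requirement of at least one entity per link are assumptions made for illustration, not details confirmed by the data sets.

```python
def parse_link(line):
    """Split one links-file line into its ID, type, and entity list.

    Hypothetical helper: assumes fields are separated by plain commas
    and that a link names at least one entity.
    """
    fields = [field.strip() for field in line.strip().split(",")]
    if len(fields) < 3:
        raise ValueError("expected UNIQUE_LINK_ID,LINK_TYPE,ENTITY1,...")
    link_id, link_type, *entities = fields
    return {"id": link_id, "type": link_type, "entities": entities}


# Made-up sample line; the ID and type values are illustrative only.
link = parse_link("17,copub,aaa,bbb,ccc")
```

Because a link can list any number of entities, the entities are kept as a list rather than as a fixed pair.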