Tools for evaluating complex systems, machine learning, and complex tasks


This research material was developed by the Test Group within the RADAR project. It is being disseminated through this site so that other researchers can leverage our efforts in other projects and fields. Some of this content is used to train RADAR components, and other content is used during the RADAR evaluation test.

The content describes a technical conference (ARDRA), the planning of the conference, and small changes that need to be made to the conference plan.

The most important caveat to this collection is that over 90% of the content is fabricated. As such, there are known flaws in the content. For example, characters in the email corpus may have slightly different writing styles due to the multiple authors. Also note that any references to real persons or organizations are strictly fictional representations.

For more detail on content creation, experiment design and protocol, and execution of the experiment, see the RADAR Test Methodology technical report (Steinfeld et al., 2006) in the publications list below.

Requirements for use of this content

We are pretty flexible with respect to the use of this content. The only requirements we ask you to satisfy are:

  1. Do not redistribute this content. Instead, point interested parties to this site. This requirement is mostly to prevent discrepancies in the research community due to version changes.
  2. Likewise, keep track of what version you are using and report the version number when disseminating your work.
  3. Cite the paper listed above and include a link to this site in publications/proposals if this content is used for the work in question.
  4. Send us references and links for any publications that result from the use of this content. (We will add what you send to a list on this page unless you request otherwise.)
  5. Contact us if you make substantial changes to this content and want to distribute your version to others. We will provide a link to your version on this site.

Email Corpus

Readme file describing syntax (v1.0, 132 KB)
  1. Wargaming files (v1.0, 711 messages, 1 MB zip file)
  2. Placeholder for backstory archive (287 messages)

Static Files

These files document aspects of the world and do not require software integration.
  1. ARDRA Conference Schedule in PDF and Excel: the initial schedule, including notes about conference events. The Excel file also has a tab containing the information about world rooms and buildings (rough ranges for capacities, etc).
  2. Budget Worksheet: this Excel file provided to help participants keep track of costs. It was optional and very few participants altered this file.

Database Content

The bulk of the content resides in a Postgres database (some of these releases also require Tomcat). The releases are combinations of three main components:
  1. ArdraConference: the world, including email, represented in database form. Note that this database includes some data and tables specific to RADAR components. Therefore, some of these tables are probably indecipherable to those who did not develop those components.
  2. CorpusGUI: a Java front end for editing the ArdraConference database.
  3. Vendors: a fully functioning vendor e-commerce site, complete with fake vendors and prices.
We are providing several release packages for these components. Instructions and descriptions for each of the individual components are available inside their respective Zip archives. These should be the first place to consult if you are having trouble with one of the packages. Please read the Packages-Readme.txt file first.
  1. CorpusGUI only (10.6 MB)
  2. ArdraConference only (9.4 MB)
  3. A combined ArdraConference and CorpusGUI (20 MB), with a single shared database
  4. Vendor site only (5.7 MB)
  5. A combined ArdraConference, CorpusGUI, and Vendor release (25.7 MB), again with a single shared database
  6. dynaweb.war: a quick install for the ArdraConference site. To install it, select the file in the deploy form on the Tomcat Manager page: "http://localhost:8080/manager/html"
  7. myapp.war: a quick install for the Vendor site. It can likewise be installed from the Tomcat Manager page.
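WAR files can also be deployed without the Manager web form, by sending the file to Tomcat's manager interface over HTTP. The sketch below is a minimal, illustrative Python example, not part of this release: it assumes the `/manager/text/deploy` endpoint of modern Tomcat versions (older releases exposed `/manager/deploy` instead), and the host, context path, and credentials shown are placeholders you would replace with your own.

```python
import urllib.request

def manager_deploy_url(host, port, context_path):
    """Build the Tomcat manager text-interface deploy URL for a context path."""
    return f"http://{host}:{port}/manager/text/deploy?path={context_path}"

def deploy_war(war_file, host="localhost", port=8080, context_path="/dynaweb",
               user="admin", password="secret"):
    """PUT a WAR file to the Tomcat manager (requires a manager-script user)."""
    url = manager_deploy_url(host, port, context_path)
    with open(war_file, "rb") as f:
        request = urllib.request.Request(url, data=f.read(), method="PUT")
    # HTTP Basic auth against the manager realm
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, url, user, password)
    opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(mgr))
    with opener.open(request) as response:
        return response.read().decode()

if __name__ == "__main__":
    # deploy_war("dynaweb.war")  # uncomment against a live Tomcat instance
    print(manager_deploy_url("localhost", 8080, "/dynaweb"))
```

The Manager form remains the simplest route for a one-off install; a script like this is only useful if you redeploy the corpus sites repeatedly.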
Software authors: S. Rachael Bennett, Jialiang Wang, George Haff, Kyle Cunningham, Matt Lahut, Robert McGuire, Isaac Simmons, Anthony Tomasic, & Aaron Steinfeld.


Evaluation Materials

The formal RADAR evaluation score was a complex algorithm that incorporated schedule quality, website updates, and briefings. This scoring function is described at a high level in Freed et al. (2008) and Steinfeld et al. (2006). The software to execute this algorithm is proprietary and was developed by an external evaluator.
  1. Post-test survey: A validated survey to measure subject experiences. See Steinfeld, Quinones, Zimmerman, et al. (2007) for more information, including a full list of survey questions. Newer questions were added in later years and will be released in the near future.
  2. Performance measurements (future): We attempted to identify reusable measurements that (a) have value outside the RADAR evaluation test and (b) are easily understood by a wide range of users. These will be posted here when ready.

Publications which use this material

Many of the publications under the RADAR project use this content for unit tests, or use data from the evaluation tests built on this content. Please review the RADAR publications page for a full list of research under this project.
  1. Faulring, A., Myers, B., Mohnkern, K., Schmerl, B., Steinfeld, A., Zimmerman, J., Smailagic, A., Hansen, J., & Siewiorek, D. (2010). Agent-assisted task management that reduces email overload. International Conference on Intelligent User Interfaces (IUI). PDF (785 KB) copyrighted
  2. Faulring, A., Mohnkern, K., Steinfeld, A., & Myers, B. A. (2009). The design and evaluation of user interfaces for the RADAR learning personal assistant. AI Magazine, 30(4), 74-84. link
  3. (RADAR Overview Paper) Freed, M., Carbonell, J., Gordon, G., Myers, B., Siewiorek, D., Smith, S., Steinfeld, A., & Tomasic, A. (2008). RADAR: A personal assistant that learns to reduce email overload. AAAI Integrated Intelligence. PDF (386 KB) copyrighted
  4. Faulring, A., Mohnkern, K., Steinfeld, A., & Myers, B. (2008). Successful user interfaces for RADAR. In Proc. ACM Conference on Human Factors in Computing Systems (CHI) Workshop on Usable Artificial Intelligence. PDF (149 KB) copyrighted
  5. Steinfeld, A., Bennett, S. R., Cunningham, K., Lahut, M., Quinones, P.-A., Wexler, D., Siewiorek, D., Hayes, J., Cohen, P., Fitzgerald, J., Hansson, O., Pool, M., & Drummond, M. (2007). Evaluation of an Integrated Multi-Task Machine Learning System with Humans in the Loop. In Proc. NIST Performance Metrics for Intelligent Systems Workshop (PerMIS). PDF (1.1 MB) copyrighted.
  6. Steinfeld, A., Quinones, P.-A., Zimmerman, J., Bennett, S. R., & Siewiorek, D. (2007). Survey measures for evaluation of cognitive assistants. In Proc. NIST Performance Metrics for Intelligent Systems Workshop (PerMIS). PDF (457 KB) copyrighted.
  7. Steinfeld, A., Bennett, S. R., Cunningham, K., Lahut, M., Quinones, P.-A., Wexler, D., Siewiorek, D., Cohen, P., Fitzgerald, J., Hansson, O., Hayes, J., Pool, M., & Drummond, M. (2006). The RADAR test methodology: Evaluating a multi-task machine learning system with humans in the loop (Tech Report CMU-CS-06-125, CMU-HCII-06-102). Pittsburgh, PA: Carnegie Mellon University, School of Computer Science.

Related (free) corpora, etc

  1. Enron Email Dataset

Aaron Steinfeld
