Marmite: Re-purposing Web Content through End-User Programming

Overview

There is a tremendous amount of web content available today, but it is not always in a form that supports end-users' needs. For example, it is easy to find a list of hotels in Portland, but not so easy to sort them by distance to the Portland convention center. All of the data and services needed to accomplish this goal already exist, but they are not in a form amenable to this task.

A rapidly growing community of developers is addressing this problem by creating "mashups" that combine existing web content and services in new ways. However, creating a mashup takes a high level of programming expertise.

To lower this barrier, we are developing Marmite, a tool that will let everyday end-users create mashups by making it easy to extract content from web pages, process it in a dataflow manner, integrate it with other data sources, and direct it to a variety of useful sinks, such as databases, map services, and compilable source code that can be further customized.
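As a rough illustration of the extract / process / sink idea, the hotel example above can be sketched as a small dataflow in Python. This is not Marmite's implementation: the hotel rows, the coordinates, and the function names (haversine_km, add_distance, sort_by, run_pipeline) are all hypothetical, and the extraction step is replaced by a hard-coded list.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Stand-in for content extracted from a web page (made-up rows).
HOTELS = [
    {"name": "Hotel A", "lat": 45.60, "lon": -122.70},
    {"name": "Hotel B", "lat": 45.53, "lon": -122.66},
    {"name": "Hotel C", "lat": 45.50, "lon": -122.60},
]

# Illustrative coordinates for the convention center (an assumption).
CENTER = (45.528, -122.663)

def add_distance(rows, center):
    """Operator: add a 'km' column holding the distance to `center`."""
    for row in rows:
        yield {**row, "km": haversine_km(row["lat"], row["lon"], *center)}

def sort_by(rows, key):
    """Operator: sort rows by one column."""
    return sorted(rows, key=lambda r: r[key])

def run_pipeline(rows, center):
    """Chain the operators: source -> add distance -> sort."""
    return sort_by(add_distance(rows, center), "km")

if __name__ == "__main__":
    # Sink: print the result; a real sink might be a map or a database.
    for row in run_pipeline(HOTELS, CENTER):
        print(f'{row["name"]}: {row["km"]:.1f} km')
```

Each operator takes rows in and hands rows out, so steps can be added or reordered without touching the others.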

This proposal focuses on three issues: (1) making it easy to select what content to crawl, (2) developing a hybrid dataflow / spreadsheet UI that shows what content has been extracted and how that content is transformed, and (3) developing techniques for handling exceptions in the dataflow. Success in this research will result in a tool that will let average web users create mashups, potentially stimulating the creation of many new kinds of services. We expect Marmite to be applicable across many web-based scenarios and to be of interest to startups and existing developers of mashups. This project will enlist the annual participation of 25 undergraduate and graduate students through courses taught by the PI, and has the potential for technology transfer through corporate partners at Carnegie Mellon University's Human-Computer Interaction Institute and CyLab.
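One way to picture issue (3) is an operator that sets failing rows aside instead of aborting the whole flow. The sketch below is only an assumption about what such handling could look like, not the technique Marmite uses; apply_operator, parse_price, and the sample listings are hypothetical.

```python
def apply_operator(rows, fn):
    """Apply fn to each row; collect failures separately instead of stopping."""
    ok, failed = [], []
    for row in rows:
        try:
            ok.append(fn(row))
        except Exception as exc:  # keep the row and the reason for later review
            failed.append((row, str(exc)))
    return ok, failed

def parse_price(row):
    """Hypothetical operator: turn a price string such as '$1,200' into a number."""
    cleaned = row["price"].replace("$", "").replace(",", "")
    return {**row, "price": float(cleaned)}

# Made-up input: the second row has a price that cannot be parsed.
rows = [
    {"title": "1BR", "price": "$1,200"},
    {"title": "Studio", "price": "call"},
]

ok, failed = apply_operator(rows, parse_price)
```

Here ok holds the rows that parsed cleanly and failed holds each problem row with its error message, so the rest of the flow can continue while the user inspects the failures.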

Results

Marmite's programming model centers around a hybrid dataflow / table visualization. In our user studies, we found that people with spreadsheet experience could understand this model and were capable of developing fairly sophisticated mashups with little training. For example, half of our participants could re-create a simple form of the well-known housingmaps mashup in about 15 minutes.
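The hybrid idea can be sketched as a dataflow that keeps the table produced by every step, so each intermediate result can be shown next to the operator that produced it. This is a minimal illustration under our own assumptions; the Flow class, the render function, and the sample listings are hypothetical and are not taken from Marmite's code.

```python
class Flow:
    """A linear dataflow that records the table produced by each step."""

    def __init__(self, rows):
        self.snapshots = [("source", list(rows))]

    def then(self, label, operator):
        """Run `operator` on the latest table and store its output."""
        _, current = self.snapshots[-1]
        self.snapshots.append((label, list(operator(current))))
        return self

def render(rows):
    """Format a list of dict rows as a plain-text table."""
    if not rows:
        return "(empty)"
    cols = list(rows[0])
    widths = [max(len(c), *(len(str(r[c])) for r in rows)) for c in cols]
    lines = ["  ".join(c.ljust(w) for c, w in zip(cols, widths))]
    for r in rows:
        lines.append("  ".join(str(r[c]).ljust(w) for c, w in zip(cols, widths)))
    return "\n".join(lines)

# Made-up housing listings standing in for extracted web content.
listings = [
    {"title": "Loft", "price": 1800},
    {"title": "Studio", "price": 950},
    {"title": "1BR", "price": 1300},
]

flow = (
    Flow(listings)
    .then("filter price <= 1500", lambda rows: [r for r in rows if r["price"] <= 1500])
    .then("sort by price", lambda rows: sorted(rows, key=lambda r: r["price"]))
)

if __name__ == "__main__":
    # Show every step together with the table it produced.
    for label, table in flow.snapshots:
        print(f"[{label}]")
        print(render(table))
        print()
```

Keeping a snapshot per step trades memory for visibility: the user can see exactly which rows each operator removed or changed.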

Significance

Our work on Marmite adds to the body of knowledge on end-user programming by examining how a hybrid dataflow / table visualization can help structure the task of programming. It also points to a different direction for end-user programming: applying these techniques to manipulate and process large and often messy sets of data.

Current Status

We are no longer working on Marmite. We have moved on to a new project named Haggis (yum!), which explores a different model for extracting, processing, and visualizing data from the web and other sources, in particular real-time data from sensor nets.

Screenshots


Screenshot: a simple dataflow and an example spreadsheet view

Screenshot: a map visualization, re-creating the housingmaps.com website

Funding

This project is funded by NSF Award 0646526 and by Microsoft SensorMap.