A significant fraction of the pages on the web are generated from structured databases. A longstanding goal of the semantic web initiative is to get webmasters to make this structured data directly available on the web. The path towards this objective has been rocky at best. While there have been some notable wins (such as RSS and FOAF), many of the other initiatives have seen little industry adoption. Lessons from these earlier attempts guided the development of schema.org, which appears to have altered the trajectory. Three years after its launch, over 5 million Internet domains are using schema.org markup.
In this talk, we will recount the history behind the early efforts and try to understand why some of them succeeded while others failed. We will then give an update on Schema.org: its goals, its accomplishments, and where it is headed. We will also discuss some of the interesting research problems enabled by the widespread availability of structured data.
Ramanathan V. Guha is a Google Fellow currently working on web search, ads and AI. He was one of the primary architects of the Cyc system. Later, while at Netscape, he created the first versions of RDF and RSS. He cofounded Epinions, a consumer review website that pioneered many of the widely used techniques in social ranking. Guha joined Google in May 2005, where he started Custom Search, the Search-based Keyword Tool, and SMS Channels. He is the founder of Schema.org, a collaboration among the major search engines that provides a structured markup vocabulary currently used by over 20% of the pages on the web. Guha holds a B.Tech. in Mechanical Engineering from the Indian Institute of Technology Madras, an M.S. in Mechanical Engineering from the University of California, Berkeley, and a Ph.D. in Computer Science from Stanford University.