The Scone Knowledge-Base Project

Scott E. Fahlman

Language Technologies Institute & Computer Science Department
Carnegie Mellon University

Scone is a high-performance, open-source knowledge-base (KB) system intended for use as a component in many different software applications. Like other KB systems – for example, Cyc and the various Description Logic systems – Scone provides support for representing symbolic knowledge about the world. This may be general "common sense" knowledge or knowledge about a specific application domain.

Our plan is to release Scone (the software, a relatively small "core" knowledge base, and a programmer-level manual) as open-source software as soon as we have tested the system with "friendly" users in various research groups at Carnegie Mellon. This release will be followed by periodic updates as we continue to develop the Scone engine and associated knowledge bases.

We are also working on a tutorial book that should make it much easier for beginners to make use of the Scone software in projects of their own. We hope that this will lead to an active worldwide community of Scone users who will extend the system in various ways and who will develop open-source knowledge bases for many domains.

The Scone Engine

Scone supports simple inference over the elements and statements in the knowledge base: inheritance of properties from more general descriptions, following chains of transitive relations, detection of type mismatches, and so on. In addition, Scone provides support for search within the knowledge base. For example, we can ask Scone to return all individuals or types represented in the KB that exhibit some set of properties, whether these properties are explicitly stated or inherited from a superior class in the type hierarchy.
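Two of these basic operations are inheritance along a chain of is-a links and search for individuals whose properties are either stated directly or inherited. The minimal Python sketch below illustrates both under simplifying assumptions. It is not Scone's Common Lisp implementation or its API: `ToyKB`, its methods, and the example facts are hypothetical, and properties are plain strings rather than Scone's structured statements.

```python
from collections import deque


class ToyKB:
    """Hypothetical toy KB: is-a links plus directly stated properties."""

    def __init__(self):
        self.parents = {}      # node -> list of immediate superiors (is-a links)
        self.properties = {}   # node -> set of properties stated directly on it
        self.individuals = set()

    def add_type(self, name, *parents):
        self.parents[name] = list(parents)
        self.properties.setdefault(name, set())

    def add_individual(self, name, *types):
        self.add_type(name, *types)
        self.individuals.add(name)

    def state(self, node, prop):
        self.properties.setdefault(node, set()).add(prop)

    def superiors(self, node):
        """node plus everything reachable by following is-a links transitively."""
        seen = {node}
        queue = deque([node])
        while queue:
            for parent in self.parents.get(queue.popleft(), []):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        return seen

    def has_property(self, node, prop):
        """True if prop is stated on node or inherited from any superior."""
        return any(prop in self.properties.get(s, set()) for s in self.superiors(node))

    def find(self, *props, individuals_only=True):
        """Sorted names of nodes that have every listed property."""
        candidates = self.individuals if individuals_only else self.parents.keys()
        return sorted(n for n in candidates if all(self.has_property(n, p) for p in props))


kb = ToyKB()
kb.add_type("animal")
kb.add_type("mammal", "animal")
kb.add_type("elephant", "mammal")
kb.add_type("bird", "animal")
kb.state("animal", "alive")
kb.state("mammal", "warm-blooded")
kb.state("elephant", "gray")
kb.add_individual("Clyde", "elephant")
kb.add_individual("Tweety", "bird")

print(kb.find("alive"))          # ['Clyde', 'Tweety']
print(kb.find("alive", "gray"))  # ['Clyde']
```

Neither "alive" nor "gray" is stated on Clyde himself; both are found by walking up the hierarchy, which is the behavior the paragraph above describes.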

Scone's type hierarchy allows multiple inheritance and exceptions. In addition, Scone supports multiple contexts in the knowledge base. The context mechanism allows us to efficiently represent and reason about different states of the knowledge base, including hypothetical or counterfactual states, various opinions, and groups of statements that are true only at some specific time or place.
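Exceptions and contexts can be illustrated with a similarly small, hypothetical Python sketch. It assumes one simple policy that is not necessarily Scone's: the value stated on the nearest superior (counted in is-a links) wins, so a statement about a more specific type overrides one about a more general type. Each `Context` sees its parent's statements unless it overrides them, which is one way to layer a hypothetical state on top of the real world. The names `Context` and `inherited_value`, and the facts about Clyde, are invented for illustration.

```python
from collections import deque


class Context:
    """A context sees every statement of its parent context unless it overrides it."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.values = {}  # (node, attribute) -> value stated in this context

    def state(self, node, attribute, value):
        self.values[(node, attribute)] = value

    def lookup(self, node, attribute):
        """Value stated for (node, attribute) here or in an ancestor context, else None."""
        ctx = self
        while ctx is not None:
            if (node, attribute) in ctx.values:
                return ctx.values[(node, attribute)]
            ctx = ctx.parent
        return None


def inherited_value(isa, ctx, node, attribute):
    """Breadth-first search up the is-a graph; the nearest stated value wins,
    so a value on a more specific type acts as an exception to a general one."""
    seen = {node}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        value = ctx.lookup(current, attribute)
        if value is not None:
            return value
        for parent in isa.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None


# Multiple inheritance: Clyde is both a royal elephant and a circus animal.
isa = {
    "Clyde": ["royal elephant", "circus animal"],
    "Dumbo": ["elephant"],
    "royal elephant": ["elephant"],
    "elephant": ["mammal"],
    "mammal": ["animal"],
    "circus animal": ["animal"],
}

real_world = Context("real world")
real_world.state("elephant", "color", "gray")
real_world.state("royal elephant", "color", "white")  # exception to the general rule
real_world.state("circus animal", "home", "circus")

retired = Context("Clyde retires", parent=real_world)  # hypothetical context
retired.state("Clyde", "home", "sanctuary")

print(inherited_value(isa, real_world, "Dumbo", "color"))  # gray
print(inherited_value(isa, real_world, "Clyde", "color"))  # white
print(inherited_value(isa, real_world, "Clyde", "home"))   # circus
print(inherited_value(isa, retired, "Clyde", "home"))      # sanctuary
```

Nearest-wins is only a heuristic: if two unrelated superiors at the same distance state conflicting values, this sketch returns whichever one it reaches first, whereas a real system should detect and report the conflict.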

The Scone "engine", a large Common Lisp program, implements Scone's basic procedures for representation, search, and inference. Procedures supporting more complex kinds of inference – conversion of units, for example, or procedures for checking the plausibility of new knowledge – can be added to the system. These procedures can be triggered by KB queries or by changes to Scone's stored knowledge.

A major emphasis of our research on Scone has been to find search and inference algorithms that are efficient and that remain usable even as the knowledge base grows to millions of entities and statements. Scone differs from other knowledge-base systems in the way it implements search and inference: it uses marker-passing algorithms originally designed for a hypothetical massively parallel machine (the NETL machine). These marker-passing algorithms cannot perform every kind of search and inference that can be handled by a general theorem-prover. However, the Scone algorithms are very fast, and they can handle most kinds of search and inference that are needed for common-sense reasoning. Scone's marker-passing algorithms will be described more fully in a forthcoming paper.
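The general idea of marker passing can be simulated serially in a few lines. This hypothetical Python sketch is not Scone's algorithm, which the text says will be described in a forthcoming paper; it only shows the classic NETL-style pattern of flooding a marker bit outward from one node along is-a links and then intersecting the marker sets. On idealized parallel hardware with one processing element per node, each wave in `propagate` could be a single step, so the cost would track the depth of the hierarchy rather than the size of the KB.

```python
from collections import defaultdict


class MarkerNetwork:
    """Toy network of is-a links with per-node marker bits (hypothetical sketch)."""

    def __init__(self):
        self.up = defaultdict(set)       # node -> immediate superiors
        self.down = defaultdict(set)     # node -> immediate inferiors
        self.markers = defaultdict(set)  # node -> marker bits currently set

    def link(self, child, parent):
        self.up[child].add(parent)
        self.down[parent].add(child)

    def propagate(self, marker, start, direction="down"):
        """Spread `marker` from `start` along is-a links, one wave at a time.
        Returns the number of waves that marked at least one new node."""
        links = self.up if direction == "up" else self.down
        self.markers[start].add(marker)
        frontier = {start}
        waves = 0
        while frontier:
            next_frontier = set()
            for node in frontier:  # on parallel hardware these would fire together
                for neighbor in links[node]:
                    if marker not in self.markers[neighbor]:
                        self.markers[neighbor].add(marker)
                        next_frontier.add(neighbor)
            if next_frontier:
                waves += 1
            frontier = next_frontier
        return waves

    def marked_with_all(self, *wanted):
        required = set(wanted)
        return sorted(node for node, bits in self.markers.items() if required <= bits)

    def clear(self):
        self.markers.clear()


net = MarkerNetwork()
net.link("royal elephant", "elephant")
net.link("Clyde", "royal elephant")
net.link("Dumbo", "elephant")
net.link("Clyde", "circus performer")
net.link("Bozo", "circus performer")

# "Which entities are both elephants and circus performers?"
a_waves = net.propagate("A", "elephant")
b_waves = net.propagate("B", "circus performer")
print(net.marked_with_all("A", "B"))  # ['Clyde']

# Upward propagation marks everything Clyde is a kind of.
net.clear()
net.propagate("U", "Clyde", "up")
print(net.marked_with_all("U"))  # ['Clyde', 'circus performer', 'elephant', 'royal elephant']
```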

At present, the knowledge bases we have developed for Scone are relatively small: a few thousand statements and entities. However, we have successfully run benchmarks on a synthetic knowledge base with several million items on a $3000 workstation, with most simple queries being processed in a few milliseconds; most other KB systems bog down when loaded with a few thousand statements. If more processing power is needed, the Scone algorithms are well suited to parallel implementation on a network or grid of workstations, or on a data-parallel machine.

Adding Knowledge to Scone

In addition to the engine, the Scone system comes with a number of knowledge-base files, each of which is a collection of descriptions and statements about the entities in some subject area. The "core" KB includes a body of general knowledge that is useful in most other domains: knowledge about physical objects, materials, units of measure, time and space, people, and so on.

The greatest problem for users of current KB systems has been the difficulty of adding new knowledge to the system and making that knowledge fully effective. So a second major focus of our research is to make it easy for users with no special training to add new knowledge to the Scone KB. Scone eases the burden of knowledge entry through a relatively clean design and by separating system-efficiency concerns from knowledge-entry concerns.

Our general plan for creation of new Scone knowledge bases is as follows:

      At present, complex knowledge must be entered into Scone as a collection of knowledge-entry statements – specialized Common Lisp expressions. For example, to create a new elephant named Clyde, we would enter the following form:

(new-indv {Clyde} {elephant})

When this form is entered, it is checked for consistency with any information already in the KB.

      A body of fundamental knowledge, such as Scone's representation of time, space, objects, and materials, has been created in this form by members of the Scone project. The process is ongoing.

      When we work with members of another research group that wants to use Scone, we teach them to create their own knowledge bases in Scone format. Many of these KBs are of general value and are added to the Scone library.

      One of our goals in releasing Scone as open-source software is to build a community that will create and share high-quality knowledge bases in any number of areas.

      We are also looking at techniques such as those developed by the Open Mind Project and the creators of the Peekaboom game, which entice large numbers of untrained Internet users to enter new knowledge by turning the process into a game.

      Knowledge can also be obtained by mining existing structured or semi-structured knowledge sources and converting their information content into Scone format. For example, as a demonstration, one student in the Scone group has extracted information about all the countries of the world – area, population, cities, and so on – from the HTML files of the online "CIA World Factbook" and from other sources on the Web. Technically, this conversion is a straightforward process in most cases, though it may require some amount of hand-editing and correction.

      Ultimately we want Scone to accept new knowledge in the form of simple English statements (or statements in the human language of your choice). We already can process many simple declarative English sentences into Scone format, and our coverage of English is increasing steadily. However, to handle the full range of English statements – the sort of text we might find in newspapers and textbooks – we must use the knowledge already in Scone to help us disambiguate the new text we are trying to process. Several of the students in the Scone Research Group are working on various aspects of this challenging problem.
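As an illustration of the kind of consistency check mentioned above for `(new-indv {Clyde} {elephant})`, the following Python sketch refuses a membership statement that would place an individual under two types declared disjoint. It is a hypothetical model, not Scone's implementation: `EntryKB`, `assert_member`, and `InconsistencyError` are invented names, and disjointness is the only kind of inconsistency it detects.

```python
class InconsistencyError(Exception):
    """Raised when a new statement contradicts knowledge already in the KB."""


class EntryKB:
    def __init__(self):
        self.isa = {}          # node -> set of immediate superior types
        self.disjoint = set()  # frozenset({a, b}): nothing can be both an a and a b

    def new_type(self, name, parent=None):
        self.isa[name] = {parent} if parent else set()

    def declare_disjoint(self, a, b):
        self.disjoint.add(frozenset((a, b)))

    def superiors(self, node):
        """node plus every type reachable from it by is-a links."""
        seen, stack = {node}, [node]
        while stack:
            for parent in self.isa.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def assert_member(self, name, type_name):
        """Record that `name` is a `type_name`, unless that contradicts the KB."""
        if type_name not in self.isa:
            raise InconsistencyError(f"unknown type {type_name!r}")
        existing = self.superiors(name)
        proposed = self.superiors(type_name)
        for a in existing:
            for b in proposed:
                if frozenset((a, b)) in self.disjoint:
                    raise InconsistencyError(f"{name} cannot be both {a} and {b}")
        self.isa.setdefault(name, set()).add(type_name)


kb = EntryKB()
kb.new_type("thing")
kb.new_type("animal", "thing")
kb.new_type("vehicle", "thing")
kb.new_type("elephant", "animal")
kb.new_type("truck", "vehicle")
kb.declare_disjoint("animal", "vehicle")

kb.assert_member("Clyde", "elephant")  # accepted
try:
    kb.assert_member("Clyde", "truck")
    error_message = None
except InconsistencyError as err:
    error_message = str(err)
print(error_message)  # Clyde cannot be both animal and vehicle
```

The rejected statement leaves the KB unchanged, so Clyde remains only an elephant.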

Current and Potential Applications of Scone

In the long run, we believe that Scone could become a standard component for people writing knowledge-based software. A knowledge base could be used in as many different ways as databases are used today.

Of course, this depends on the efficiency and reliability of the system, and most of all on its ease of use. Our goal is to make Scone so easy to use that any smart college undergrad who is developing an adventure game will be able to read the Scone tutorial book, download the open-source software, and begin using Scone as a tool to hold the system's knowledge: "An ogre typically carries a club, lives in a cave, and likes to eat hobbits. Igor the Ogre has met Frodo and will recognize him if they meet again. But a character in disguise probably won't be recognized."

Scone can also be used for more serious purposes. Here are a few example applications:

      Online catalogs: It is straightforward in Scone to represent hierarchies of products, their characteristics, their intended application, which components work together, information on prices, vendors, and availability, and so on.

      Help-desk support: Just as products can be described and searched in Scone, so too can families of problems, their symptoms, and their causes.

      Autonomic computing: Companies that develop or manage complex hardware/software installations face a serious problem in configuring these systems correctly, recognizing vulnerabilities and attacks, and diagnosing and repairing problems. The first step in managing this complexity is to create a symbolic description of the installation: its components, tasks, personnel and permissions, and the external environment. This is a job for a KB.

      Federated databases: Suppose two companies merge. Company A has a database of employees, but it does not cover temporary or part-time employees. Company B has a database, also labeled "employees", which does include its part-time and temporary employees but omits salespeople who are paid a commission rather than a salary. If we can represent the different types and subtypes of employees in a knowledge base, then we can begin to combine these two ontologies and to resolve the differences between them.

One possible solution is to send all database queries first to Scone, which will pick off and answer any odd or exceptional queries; Scone can then send the straightforward queries (perhaps in modified form) on to the appropriate DB.

      Computational biology: The literature in this field is huge and is growing at an alarming rate. Representing and organizing all this diverse knowledge, so that connections can be noticed and so that researchers can find the information that they need, is another job well-suited for a knowledge base, perhaps backed up by multiple databases for low-level data.
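The query-routing idea in the federated-database example can be sketched as follows. This is a hypothetical Python illustration: the employee categories, the `COVERAGE` table, and `route` are assumptions made up for the example (including an "intern" category that neither database holds), and a real system would also have to translate each query into the target database's own schema.

```python
# Hypothetical merged taxonomy of employee categories.
SUBTYPES = {
    "employee": ["salaried", "commissioned", "temporary", "part-time", "intern"],
}

# Which categories each company's "employees" table actually contains.
COVERAGE = {
    "DB-A": {"salaried", "commissioned"},
    "DB-B": {"salaried", "temporary", "part-time"},
}


def leaf_types(type_name):
    """The most specific categories under type_name (or type_name itself)."""
    children = SUBTYPES.get(type_name, [])
    if not children:
        return [type_name]
    leaves = []
    for child in children:
        leaves.extend(leaf_types(child))
    return leaves


def route(type_name):
    """Map each leaf category of a query to a database that covers it,
    falling back to the knowledge base when no database does."""
    plan = {}
    for leaf in leaf_types(type_name):
        sources = sorted(db for db, covered in COVERAGE.items() if leaf in covered)
        plan[leaf] = sources[0] if sources else "knowledge base"
    return plan


print(route("temporary"))  # {'temporary': 'DB-B'}
print(route("employee"))
```

A query about all employees is split across both databases, while the "intern" case, which neither database covers, is the kind of exceptional query the KB would answer itself.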

In all of these applications, it is important to keep in mind that we do not have to choose between a knowledge-based approach using Scone and a statistical approach, or one using conventional database technology. In many problem domains, as of today, none of these approaches provides a complete solution, but fortunately they all play well together.

For example, a little bit of symbolic knowledge – perhaps just a type hierarchy and some properties – can add a lot of power to a search engine or a classifier by augmenting queries and by filtering what the search engine returns. As Scone's knowledge base grows and evolves, it can play an ever-greater role in this partnership. But the key point here is that we do not have to wait for this. This research project is ambitious but it is not an all-or-nothing proposition.

During the spring 2006 term, Scone was tested by three research projects at Carnegie Mellon's School of Computer Science. These projects are Radar, Javelin (Question Answering), and "Read the Web". In these applications, Scone serves both as a repository for background knowledge and as the store where newly learned knowledge can be saved.

Scone has already been used to improve message classification within the Radar system by augmenting the "bag of words" features with "implied" features. If a message mentions "Scott Fahlman", Scone adds additional features for "faculty", "CMU", "AI", "research", "Scone Project", and so on, based on the background knowledge in the KB. If these new virtual features are irrelevant, the classifier will learn to ignore them, but often they are valuable. If a user asks whether there are any upcoming "AI" talks and we have a message saying that "Scott Fahlman" is speaking, we can make the connection.
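The feature-augmentation step described for Radar might look roughly like the following. The Python sketch is hypothetical: `RELATED` stands in for background knowledge held in the KB, entity spotting is naive exact-substring matching, and the `word:`, `entity:`, and `implied:` feature prefixes are an invented convention, not Radar's or Scone's actual encoding.

```python
# Hypothetical background knowledge: each entity points to related concepts.
RELATED = {
    "Scott Fahlman": ["faculty", "Scone Project"],
    "Scone Project": ["AI", "research", "CMU"],
    "faculty": ["CMU"],
}


def implied_features(entity):
    """All concepts reachable from `entity` through RELATED links."""
    seen, stack = set(), [entity]
    while stack:
        for concept in RELATED.get(stack.pop(), []):
            if concept not in seen:
                seen.add(concept)
                stack.append(concept)
    return seen


def features(message):
    """Bag-of-words features plus 'implied' features from background knowledge."""
    words = {w.strip(".,!?").lower() for w in message.split()}
    bag = {f"word:{w}" for w in words if w}
    for entity in RELATED:  # naive exact-substring entity spotting
        if entity in message:
            bag.add(f"entity:{entity}")
            bag |= {f"implied:{c}" for c in implied_features(entity)}
    return bag


message = features("Scott Fahlman is speaking on Friday.")
print(sorted(f for f in message if f.startswith("implied:")))
# ['implied:AI', 'implied:CMU', 'implied:Scone Project', 'implied:faculty', 'implied:research']
print("implied:AI" in message)  # True: a query about "AI" talks can now match this message
```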

Software & Publications

Open-Source Scone Software (Coming Soon)

The Scone User's Guide (HTML, Word, PDF)

SconeEdit Browser/Editor for Scone

Other Scone-Related Publications

Members of the Scone Research Group

Faculty:
Scott E. Fahlman

Current Grad Students:
Allen Benson (Pitt), Ben Lambert, Wei Chen

Former Grad Students:
Daniel Chung Yong Lim, Daniel Olsher, E. Cinar Sahin, Alicia Tribble Sagae

Former Visitors:
Maria Jose Santofimia Romero, David Manzano-Macho

Recent Undergraduate Students:
Matthew Gormley, Jiquan Ngiam, Apaorn Suveepattananont

Acknowledgments

Development of Scone from 2003 through 2008 was supported in part by the Defense Advanced Research Projects Agency (DARPA) under contract numbers NBCHD030010 and FA8750-07-D-0185. Additional support for Scone development has been provided by generous research grants from Cisco Systems Inc. and from Google Inc.