The Scone Knowledge-Base Project
Scone is a
high-performance, open-source knowledge-base (KB) system intended for use as
a component in many different software applications. Like other KB systems – for example, Cyc and the various Description
Logic systems – Scone provides support
for representing symbolic knowledge about the world. This may be general "common
sense" knowledge or knowledge about a specific application domain.
Our plan is to release Scone (the software, a relatively small "core" knowledge base, and a
programmer-level manual) as open-source software as soon as we have tested
the system with "friendly" users in various research groups at Carnegie
Mellon. This release will be followed
by periodic updates as we continue to develop the Scone
engine and associated knowledge bases.
We are also working on a tutorial book that
should make it much easier for beginners to make use of the Scone
software in projects of their own. We
hope that this will lead to an active worldwide community of Scone users who will extend the system in various ways
and who will develop open-source knowledge bases for many domains.
The Scone Engine
Scone supports simple
inference over the elements and statements in the knowledge base: inheritance
of properties from more general descriptions, following chains of transitive
relations, detection of type mismatches, and so on. In addition, Scone
provides support for search within the knowledge base. For example, we can ask Scone to return all
individuals or types represented in the KB that exhibit some set of
properties, whether these properties are explicitly stated or inherited from
a superior class in the type hierarchy.
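As a rough illustration of this kind of query, the following Python sketch finds every element whose stated or inherited properties include a requested set. It is a toy model, not Scone's Lisp API: the `parents` and `stated` tables, the property names, and the functions `ancestors`, `properties`, and `find_all` are all assumptions made for the example.

```python
# Toy knowledge base; an illustration only, not Scone's actual data structures.
# parents: element -> list of more general types (multiple parents are allowed).
parents = {
    "animal": [],
    "mammal": ["animal"],
    "bird": ["animal"],
    "elephant": ["mammal"],
    "Clyde": ["elephant"],
    "Tweety": ["bird"],
}

# Properties stated directly on each element (hypothetical property names).
stated = {
    "animal": {"alive"},
    "mammal": {"warm-blooded"},
    "elephant": {"has-trunk"},
    "bird": {"has-feathers"},
}


def ancestors(element):
    """Return the element plus everything above it in the type hierarchy."""
    seen, stack = set(), [element]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def properties(element):
    """Collect properties stated on the element or inherited from its superiors."""
    result = set()
    for node in ancestors(element):
        result |= stated.get(node, set())
    return result


def find_all(required):
    """Return every element whose properties include all of `required`."""
    return sorted(e for e in parents if required <= properties(e))


print(find_all({"warm-blooded", "has-trunk"}))  # ['Clyde', 'elephant']
```

Clyde is returned even though neither property is stated on Clyde directly; both are inherited from types above him.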
Scone's type
hierarchy allows multiple inheritance and exceptions. In addition, Scone
supports multiple contexts in the knowledge base. The context mechanism allows us to
efficiently represent and reason about different states of the knowledge base,
including hypothetical or counter-factual states, various opinions, and
groups of statements that are true only in some specific time or place.
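One way to picture the context mechanism is as layers of statements, where each context inherits from a parent and records only its differences. The Python sketch below is a simplified model under that assumption; the `Context` class and its methods are hypothetical and are not taken from the Scone engine.

```python
class Context:
    """A set of statements layered on top of an optional parent context."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.added = set()      # statements asserted in this context
        self.cancelled = set()  # inherited statements that do not hold here

    def assert_fact(self, fact):
        self.added.add(fact)
        self.cancelled.discard(fact)

    def cancel(self, fact):
        self.cancelled.add(fact)
        self.added.discard(fact)

    def holds(self, fact):
        """A fact holds if the nearest context that mentions it asserts it."""
        ctx = self
        while ctx is not None:
            if fact in ctx.cancelled:
                return False
            if fact in ctx.added:
                return True
            ctx = ctx.parent
        return False


gray = ("Clyde", "color", "gray")
pink = ("Clyde", "color", "pink")

general = Context("general")
general.assert_fact(gray)

# A hypothetical state of the world: it differs from "general" in two statements.
what_if = Context("Clyde-painted-pink", parent=general)
what_if.cancel(gray)
what_if.assert_fact(pink)

print(general.holds(gray), what_if.holds(gray), what_if.holds(pink))
```

Because the hypothetical context stores only its two differences, creating it does not require copying the rest of the knowledge base.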
The Scone
"engine", a large Common Lisp program, implements Scone's
basic procedures for representation, search, and inference. Procedures supporting more complex kinds of
inference – conversion of units, for example, or procedures for checking the
plausibility of new knowledge – can be added to the system. These procedures can be triggered by KB
queries or by changes to Scone's stored
knowledge.
A major emphasis of our research on Scone has been finding search and inference
algorithms that are efficient and that remain usable even as the knowledge
base grows to millions of entities and statements. Scone
differs from other knowledge-base systems in the way it implements search and
inference. Scone
uses marker-passing algorithms originally designed for a hypothetical
massively parallel machine (the NETL machine). These marker-passing algorithms cannot
perform every kind of search and inference that can be handled by a general
theorem-prover. However, the Scone algorithms are very fast, and they can handle
most kinds of search and inference that are needed for common-sense
reasoning. Scone's
marker-passing algorithms will be described more fully in a forthcoming
paper.
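Until that paper is available, the general idea of marker passing can be sketched in a few lines of Python. This is a sequential toy version written for illustration, not the Scone implementation: the `is_a` links, the marker names `M1` to `M3`, and the functions `sweep` and `marked_with` are assumptions. Each sweep places a marker on a start node and passes it along links; a query is then answered by checking which nodes hold a given combination of markers.

```python
from collections import defaultdict, deque

# Hypothetical is-a links, written as (more specific, more general).
is_a = [
    ("mammal", "animal"),
    ("elephant", "mammal"),
    ("dog", "mammal"),
    ("dog", "pet"),
    ("goldfish", "pet"),
    ("Clyde", "elephant"),
    ("Fido", "dog"),
]

up = defaultdict(list)    # node -> its more general nodes
down = defaultdict(list)  # node -> its more specific nodes
for child, parent in is_a:
    up[child].append(parent)
    down[parent].append(child)

markers = defaultdict(set)  # node -> names of the markers currently on it


def sweep(start, marker, links):
    """Place `marker` on `start` and pass it along `links` until nothing new is marked."""
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if marker in markers[node]:
            continue
        markers[node].add(marker)
        queue.extend(links.get(node, []))


def marked_with(*wanted):
    """Return all nodes that currently hold every marker in `wanted`."""
    return sorted(n for n, ms in markers.items() if set(wanted) <= ms)


# "Which elements are both mammals and pets?" Mark downward from each type.
sweep("mammal", "M1", down)
sweep("pet", "M2", down)
print(marked_with("M1", "M2"))  # ['Fido', 'dog']

# "Is Clyde an animal?" Mark upward from Clyde and inspect the target node.
sweep("Clyde", "M3", up)
print("M3" in markers["animal"])  # True
```

The intersection query never enumerates property lists per element; it only compares marker sets, which is what makes this style of search amenable to parallel hardware.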
At present, the knowledge bases we have developed
for Scone are relatively small: a few
thousand statements and entities.
However, we have successfully run benchmarks on a synthetic knowledge
base with several million items on a $3000 workstation, with most simple
queries being processed in a few milliseconds; most other KB systems bog down
when loaded with a few thousand statements.
If more processing power is needed, the Scone
algorithms are well suited to parallel implementation on a network or grid of
workstations, or on a data-parallel machine.
Adding Knowledge to Scone
In addition to the engine, the Scone
system comes with a number of knowledge-base files, each of which is a collection
of descriptions and statements about the entities in some subject area. The "core" KB includes a body of
general knowledge that is useful in most other domains: knowledge about
physical objects, materials, units of measure, time and space, people, and so
on.
The greatest problem for users of current KB systems has been the difficulty of
adding new knowledge to the system and making that knowledge fully
effective. So a second major focus of
our research is to make it easy for users with no special training to add new
knowledge to the Scone KB. Scone eases the burden of knowledge entry through a relatively
clean design and by separating system-efficiency concerns from
knowledge-entry concerns.
Our general plan for creating new Scone
knowledge bases is as follows:
· At present, complex knowledge must be entered
into Scone as a collection of
knowledge-entry statements – specialized Common Lisp expressions. For example, to create a new elephant named
Clyde, we would enter the following form:
(new-indv {Clyde} {elephant})
When this form is entered, it is
checked for consistency with any information already in the KB.
· A body of fundamental knowledge, such as Scone's representation of time, space, objects, and
materials, has been created in this form by members of the Scone
project. The process is ongoing.
· When we work with members of another research
group that wants to use Scone, we teach them
to create their own knowledge bases in Scone
format. Many of these KBs are of
general value and are added to the Scone
library.
· One of our goals in releasing Scone as open-source software is to build a community
that will create and share high-quality knowledge bases in any number of
areas.
· We are also looking at techniques such as
those developed by the Open Mind Project and the creators of the Peekaboom game, which entice large numbers of untrained
Internet users to enter new knowledge by turning the process into a game.
· Knowledge can also be obtained by mining
existing structured or semi-structured knowledge sources and converting their
information content into Scone format. For example, as a demonstration, one
student in the Scone group has extracted
information about all the countries of the world – area, population, cities,
and so on – from the HTML files of the online "CIA World Factbook"
and from other sources on the Web.
Technically, this conversion is a straightforward process in most
cases, though it may require some amount of hand-editing and correction.
· Ultimately we want Scone
to accept new knowledge in the form of simple English statements (or
statements in the human language of your choice). We can already process many simple
declarative English sentences into Scone
format, and our coverage of English is increasing steadily. However, to handle the full range of
English statements – the sort of text we might find in newspapers and
textbooks – we must use the knowledge already in Scone
to help us disambiguate the new text we are trying to process. Several of the students in the Scone
Research Group are working on various aspects of this challenging
problem.
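The first step in the plan above, entering a form such as (new-indv {Clyde} {elephant}) and checking it against what the KB already holds, can be pictured with the following Python sketch. It is a loose analogy only: the `new_indv` function here is a Python stand-in named after the Lisp form, and the type table, the `disjoint` set, and the `InconsistentEntry` exception are assumptions invented for the example, not Scone's actual checks.

```python
class InconsistentEntry(Exception):
    """Raised when a new statement conflicts with what is already known."""


# Hypothetical single-parent type table: type -> more general type.
types = {"thing": None, "animal": "thing", "elephant": "animal", "rock": "thing"}

# Hypothetical constraint: nothing can be both an animal and a rock.
disjoint = {frozenset({"animal", "rock"})}

individuals = {}  # individual name -> set of types asserted for it


def supertypes(type_name):
    """Return the type and every type above it."""
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = types[type_name]
    return chain


def new_indv(name, type_name):
    """Add an individual of the given type, rejecting inconsistent entries."""
    if type_name not in types:
        raise InconsistentEntry(f"unknown type: {type_name}")
    new_chain = supertypes(type_name)
    for old_type in individuals.get(name, set()):
        for a in supertypes(old_type):
            for b in new_chain:
                if frozenset({a, b}) in disjoint:
                    raise InconsistentEntry(f"{name} cannot be both {a} and {b}")
    individuals.setdefault(name, set()).add(type_name)


new_indv("Clyde", "elephant")    # accepted
try:
    new_indv("Clyde", "rock")    # rejected: an elephant is an animal, not a rock
except InconsistentEntry as err:
    print("rejected:", err)
```

The rejected entry leaves the stored knowledge unchanged, so a failed check never leaves the toy KB in a half-updated state.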
Current and Potential Applications of Scone
In the long run, we believe that Scone could become a standard component for people
writing knowledge-based software. A
knowledge base could be used in as many different ways as databases are used
today.
Of course, this depends on the efficiency and
reliability of the system, and most of all on its ease of use. Our goal is to make Scone
so easy to use that any smart college undergraduate who is
developing an adventure game will be able to read the Scone
tutorial book, download the open-source software, and begin using Scone as a tool to hold the system's knowledge: "An ogre typically carries a club,
lives in a cave, and likes to eat hobbits.
Igor the Ogre has met Frodo and will recognize him if they meet
again. But a character in disguise
probably won't be recognized."
Of course, Scone
can be used for more serious purposes as well. Here are a few example applications:
· Online catalogs: It is straightforward in Scone
to represent hierarchies of products, their characteristics, their intended application, which components work
together, information on prices, vendors, and availability, and so on.
· Help-desk support: Just as products can be
described and searched in Scone, so too can
families of problems, their symptoms, and their causes.
· Autonomic computing: Companies that develop
or manage complex hardware/software installations face a serious problem in
configuring these systems correctly, recognizing vulnerabilities and attacks,
and diagnosing and repairing problems.
The first step in managing this complexity is to create a symbolic
description of the installation: its components, tasks, personnel and
permissions, and the external environment.
This is a job for a KB.
· Federated databases: Suppose two companies
merge. Company A has a database of employees, but it does not cover
temporary or part-time employees.
Company B has a database, also labeled employees, which does contain their part-time and temporary
employees, but it does not include salespeople who get commission rather than
a salary. If we can represent the
different types and subtypes of employees in a knowledge base, then we can
begin to combine these two ontologies and to resolve the differences between them.
One possible solution is to send
all database queries first to Scone, which will pick off and answer any odd or exceptional queries;
Scone can then send the straightforward queries (perhaps in modified form) on
to the appropriate DB.
· Computational biology: The literature in this
field is huge and is growing at an alarming rate. Representing and organizing all this
diverse knowledge, so that connections can be noticed and so that researchers
can find the information that they need, is another job well suited to a
knowledge base, perhaps backed up by multiple databases for low-level data.
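To make the federated-database example above more concrete, here is a small Python sketch of how a type hierarchy could decide which database should receive a query. The subtype names, the database labels `company_a_db` and `company_b_db`, and the `route` function are assumptions chosen to mirror the two-company scenario; nothing here reflects an actual Scone interface.

```python
# Hypothetical subtypes of "employee" in the merged ontology.
subtypes = {
    "employee": ["salaried", "commissioned-salesperson", "part-time", "temporary"],
}

# Which employee subtypes each legacy database actually contains.
coverage = {
    "company_a_db": {"salaried", "commissioned-salesperson"},  # no part-time or temporary
    "company_b_db": {"salaried", "part-time", "temporary"},    # no commissioned salespeople
}


def leaf_types(type_name):
    """Expand a type into the most specific subtypes beneath it."""
    children = subtypes.get(type_name, [])
    if not children:
        return {type_name}
    leaves = set()
    for child in children:
        leaves |= leaf_types(child)
    return leaves


def route(type_name):
    """Return, per database, the subtypes that database can answer for."""
    wanted = leaf_types(type_name)
    plan = {}
    for db, covered in coverage.items():
        hits = wanted & covered
        if hits:
            plan[db] = sorted(hits)
    return plan


print(route("temporary"))
print(route("employee"))
```

A query about temporary staff goes only to the second database, while a query about all employees is split, with each database asked only about the subtypes it is known to hold.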
In all
of these applications, it is important to keep in mind that we do not have to
choose between a knowledge-based approach using Scone and a statistical
approach, or one using conventional database technology. In many problem domains, as of today, none
of these approaches provides a complete solution, but fortunately they all
play well together.
For
example, a little bit of symbolic knowledge – perhaps just a type hierarchy
and some properties – can add a lot of power to a search engine or a
classifier by augmenting queries and by filtering what the search engine
returns. As Scone's
knowledge base grows and evolves, it can play an ever-greater role in this
partnership. But the key point here is
that we do not have to wait for this.
This research project is ambitious but it is not an all-or-nothing
proposition.
During the spring 2006 term, Scone was tested
by three research projects at Carnegie Mellon's School of Computer
Science.
These projects are Radar, Javelin (Question Answering), and "Read
the Web". In these applications, Scone serves both as a repository for background knowledge
and as the store where newly learned knowledge can be saved.
Scone has
already been used to improve message classification within the Radar system
by augmenting the "bag of words" features with "implied"
features. If a message mentions
"Scott Fahlman", Scone adds additional features for
"faculty", "CMU", "AI", "research",
"Scone Project", and so on, based on the background knowledge in
the KB. If these new virtual features
are irrelevant, the classifier will learn to ignore them, but often they are
valuable. If a user asks whether there
are any upcoming "AI" talks and we have a message saying that
"Scott Fahlman" is speaking, we can make the connection.
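The feature augmentation described here can be sketched as follows. This Python example is a deliberately naive illustration, not the code used in Radar: the `background` table stands in for a KB lookup, and the whitespace tokenizer and the `implied_features` function are assumptions made for the example.

```python
# Stand-in for background knowledge in the KB (entries taken from the example above).
background = {
    "scott fahlman": ["faculty", "CMU", "AI", "research", "Scone Project"],
}


def implied_features(message):
    """Return bag-of-words features plus features implied by known entities."""
    text = message.lower()
    features = set(text.split())  # naive whitespace tokenization, no punctuation handling
    for entity, implied in background.items():
        if entity in text:
            features.update(implied)
    return features


message = "Scott Fahlman is speaking on Friday"
print("AI" in implied_features(message))  # True
```

A classifier or search filter looking for "AI" talks would now match this message even though the word never appears in it.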
Software & Publications
Open-Source Scone Software (Coming Soon)
The Scone User's Guide (HTML, Word, PDF)
SconeEdit Browser/Editor for Scone
Other Scone-Related Publications
Members of the Scone Research Group
Faculty:
Scott E. Fahlman
Current Grad Students:
Allen Benson (Pitt), Ben Lambert, Wei Chen
Former Grad Students:
Daniel Chung Yong Lim, Daniel Olsher, E. Cinar Sahin, Alicia Tribble Sagae
Former Visitors:
Maria Jose Santofimia Romero, David Manzano-Macho
Recent Undergraduate Students:
Matthew Gormley, Jiquan Ngiam, Apaorn Suveepattananont
Acknowledgments
Development of Scone from 2003 through 2008
was supported in part by the Defense Advanced Research Projects Agency
(DARPA) under contract numbers NBCHD030010 and FA8750-07-D-0185. Additional
support for Scone development has been provided by generous research grants
from Cisco Systems Inc. and from Google Inc.
Note: A Romanian translation of this web page can be found here, courtesy of Azoft.