The Scone Knowledge-Base Project
Scone is a
high-performance, open-source knowledge-base (KB) system intended for use as
a component in many different software applications. Like other KB systems – for example, Cyc and the various Description
Logic systems – Scone provides support for
representing symbolic knowledge about the world. This may be general "common
sense" knowledge or knowledge about a specific application domain.
Our plan is to release Scone (the software, a relatively small "core"
knowledge base, and a programmer-level manual) as open-source software as
soon as we have tested the system with "friendly" users in various research
groups at Carnegie Mellon. This release will be
followed by periodic updates as we continue to develop the Scone
engine and associated knowledge bases.
We are also working on a tutorial book that
should make it much easier for beginners to make use of the Scone
software in projects of their own. We
hope that this will lead to an active worldwide community of Scone
users who will extend the system in various ways and who will develop and
share open-source knowledge bases for many domains.
The Scone Engine
Scone supports simple inference
over the elements and statements in the knowledge base: inheritance of
properties from more general descriptions, following chains of transitive
relations, detection of type mismatches, and so on. In addition, Scone
provides support for search within the knowledge base. For example, we can ask Scone to return all
individuals or types represented in the KB that exhibit some set of
properties, whether these properties are explicitly stated or inherited from a
superior class in the type hierarchy.
Scone's type hierarchy
allows multiple inheritance and exceptions.
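As a rough illustration of inheritance with multiple parents and exceptions, here is a minimal Python sketch. It is a toy model, not Scone's Lisp implementation or API: the names `Node` and `lookup`, the example properties, and the "nearest statement wins" rule are all assumptions made for this example.

```python
# Toy model only: NOT Scone's API or its internal representation.
from collections import deque


class Node:
    """A type or individual in a tiny is-a hierarchy (hypothetical helper)."""

    def __init__(self, name, parents=(), **props):
        self.name = name
        self.parents = list(parents)  # multiple inheritance: any number of parents
        self.props = dict(props)      # properties stated directly on this node


def lookup(node, prop):
    """Return the value of prop stated nearest to node, searching upward.

    Assumption: a value stated on a nearer node overrides (is an exception
    to) a value stated on a more general node.
    """
    queue, seen = deque([node]), set()
    while queue:
        current = queue.popleft()
        if current.name in seen:
            continue                  # reached again through another parent
        seen.add(current.name)
        if prop in current.props:
            return current.props[prop]
        queue.extend(current.parents)
    return None                       # nothing stated anywhere above node


animal = Node("animal", alive=True)
elephant = Node("elephant", [animal], color="gray", legs=4)
circus_performer = Node("circus performer", [animal], employer="circus")
# Clyde has two parents and one exception to the usual elephant color.
clyde = Node("Clyde", [elephant, circus_performer], color="white")
```

Here `lookup(clyde, "legs")` is answered by the elephant description, `lookup(clyde, "employer")` by the second parent, and `lookup(clyde, "color")` by the exception stated on Clyde himself.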
In addition, Scone supports multiple contexts
in the knowledge base. The context
mechanism allows us to efficiently represent and reason about different
states of the knowledge base, including hypothetical or counter-factual
states, various opinions, and groups of statements that are true only in some
specific time or place.
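One way to picture the context idea is as a chain of contexts, each storing only its differences from its parent, so that a hypothetical state does not require copying the whole KB. The Python sketch below is a simplified illustration under that assumption. The `Context` class and its methods are invented for this example and do not describe how Scone actually implements contexts.

```python
# Toy model only: NOT Scone's context mechanism.
class Context:
    """A context that sees its parent's statements plus local changes."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.added = set()    # statements asserted in this context
        self.removed = set()  # inherited statements cancelled in this context

    def assert_fact(self, fact):
        self.added.add(fact)
        self.removed.discard(fact)

    def retract(self, fact):
        self.removed.add(fact)
        self.added.discard(fact)

    def holds(self, fact):
        """Walk from this context toward the root; the nearest verdict wins."""
        ctx = self
        while ctx is not None:
            if fact in ctx.added:
                return True
            if fact in ctx.removed:
                return False
            ctx = ctx.parent
        return False


general = Context("general")
general.assert_fact(("Clyde", "is a", "elephant"))
general.assert_fact(("Clyde", "lives in", "zoo"))

# A hypothetical state: only the two differences are stored here.
hypothetical = Context("what if Clyde escaped", parent=general)
hypothetical.retract(("Clyde", "lives in", "zoo"))
hypothetical.assert_fact(("Clyde", "lives in", "jungle"))
```

The general context is unchanged by the hypothetical one, while the hypothetical context still sees that Clyde is an elephant.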
The Scone
"engine", a large Common Lisp program, implements Scone's
basic procedures for representation, search, and inference. Procedures supporting more complex kinds of
inference – conversion of units, for example, or procedures for checking the
plausibility of new knowledge – can be added to the system. These procedures can be triggered by KB
queries or by changes to Scone's stored knowledge.
A major emphasis of our research on Scone
has been the desire to find search and inference algorithms that are
efficient, and that remain usable even as the knowledge base grows to
millions of entities and statements. Scone
differs from other knowledge-base systems in the way it implements search and
inference. Scone
uses marker-passing algorithms originally designed for a hypothetical
massively parallel machine (the NETL machine). These marker-passing algorithms cannot perform
every kind of search and inference that can be handled by a general
theorem-prover. However, the Scone
algorithms are very fast, and they can handle most kinds of search and
inference that are needed for common-sense reasoning. Scone's
marker-passing algorithms are described briefly in this
paper from KSEM’06 and will be described more fully in future
publications.
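To give a flavor of marker passing, the Python sketch below simulates it serially on a tiny is-a network: one marker is passed down from each of two types, and the nodes holding both markers form the intersection. This is an illustrative toy, not the Scone engine. The `Network` class, the marker names `M1` and `M2`, and the example nodes are assumptions, and a real marker-passing design would propagate markers across many links at once rather than one node at a time.

```python
# Toy serial simulation only: NOT the Scone engine.
from collections import defaultdict, deque


class Network:
    """A tiny is-a network whose nodes can hold named markers."""

    def __init__(self):
        self.children = defaultdict(list)  # is-a links stored parent -> children
        self.markers = defaultdict(set)    # node -> set of markers it holds

    def add_is_a(self, child, parent):
        self.children[parent].append(child)

    def mark_down(self, start, marker):
        """Place marker on start and pass it down every is-a link below it."""
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if marker in self.markers[node]:
                continue                   # already marked, do not revisit
            self.markers[node].add(marker)
            queue.extend(self.children[node])

    def holding(self, *wanted):
        """Return the sorted names of nodes that hold every wanted marker."""
        need = set(wanted)
        return sorted(n for n, held in self.markers.items() if need <= held)


net = Network()
for child, parent in [
    ("mammal", "animal"), ("elephant", "mammal"), ("dog", "mammal"),
    ("Clyde", "elephant"), ("Dumbo", "elephant"), ("Fido", "dog"),
    ("Dumbo", "circus performer"), ("Fido", "circus performer"),
]:
    net.add_is_a(child, parent)

net.mark_down("elephant", "M1")          # everything that is an elephant
net.mark_down("circus performer", "M2")  # everything that is a circus performer
circus_elephants = net.holding("M1", "M2")
```

The cost of each query in this sketch depends on the part of the network the markers reach, not on a general proof search, which is the trade-off the paragraph above describes.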
At present, the knowledge bases we have developed
for Scone are relatively small: tens of thousands of
statements and entities. However, we
have successfully run benchmarks on a synthetic knowledge base with more than
a million items on a $3000 workstation, with most simple queries being
processed in a few milliseconds; most other KB systems bog down when loaded
with a few thousand statements. If
more processing power is needed, the Scone algorithms
are well suited to parallel implementation on a network or grid of
workstations, or on a data-parallel machine.
Adding Knowledge to Scone
In addition to the engine, the Scone
system comes with a number of knowledge-base files, each of which is a
collection of descriptions and statements about the entities in some subject
area. The "core" KB includes
a body of general knowledge that is useful in most other domains: knowledge
about physical objects, materials, units of measure, time and space, people,
and so on.
The
greatest problem for users of current KB systems has been the difficulty of
adding new knowledge to the system and making that knowledge fully
effective. So a second major focus of
our research is to make it easy for users with no special training to add new
knowledge to the Scone KB. Scone eases the burden of knowledge entry by
keeping its design relatively clean and by separating system-efficiency concerns from
knowledge-entry concerns.
Our
general plan for creation of new Scone
knowledge bases is as follows:
· At present, complex knowledge must be entered
into Scone as a collection of knowledge-entry
statements – specialized Common Lisp expressions. For example, to create a new elephant named
Clyde, we would enter the following form:
(new-indv {Clyde} {elephant})
When this form is entered, it is
checked for consistency with any information already in the KB.
· A body of fundamental knowledge, such as Scone's
representation of time, space, objects, and materials, has been created in
this form by members of the Scone project. The process is ongoing.
· When we work with members of another research
group that wants to use Scone, we teach them to create
their own knowledge bases in Scone format. Many of these KBs
are of general value and are added to the Scone
library.
· One of our goals in releasing Scone
as open-source software is to build a community that will create and share high-quality
knowledge bases in any number of areas.
· We are also looking at techniques such as
those developed by the Open Mind Project and the creators of the Peekaboom game, which entice large numbers of untrained
Internet users to enter new knowledge by turning the process into a
game. However, quality control of the
resulting knowledge bases is an issue.
· Knowledge can also be obtained by mining
existing structured or semi-structured knowledge sources and converting their
information content into Scone format. For example, as a demonstration, one
student in the Scone group has extracted information
about all the countries of the world – area, population, cities, and so on –
from the HTML files of the online "CIA World Factbook" and from
other sources on the Web. Technically,
this conversion is a straightforward process in most cases, though it may
require some amount of hand-editing and correction.
· Ultimately we want Scone
to accept new knowledge in the form of simple English statements (or
statements in the human language of your choice). We already can process many simple
declarative English sentences into Scone format, and
our coverage of English is increasing steadily. However, to handle the full range of
English statements – the sort of text we might find in newspapers and
textbooks – we must use the knowledge already in Scone
to help us disambiguate the new text we are trying to process. Several of the students in the Scone
Research Group are working on various aspects of this challenging problem.
Current and Potential Applications of Scone
In the long run, we believe that Scone
could become a standard component for people writing knowledge-based
software. A knowledge base could be
used in as many different ways as databases are used today.
Of course, this depends on the efficiency and
reliability of the system, and most of all on its ease of use. Our goal is to make Scone
so easy that any smart college undergrad who is
developing an adventure game will be able to read the Scone
tutorial book, download the open-source software, and begin using Scone
as a tool to hold the system's knowledge: "An
ogre typically carries a club, lives in a cave, and likes to eat
hobbits. Igor the Ogre has met Frodo
and will recognize him if they meet again.
But a character in disguise probably won't be recognized."
Of course, Scone can be
used for more serious purposes as well.
Here are a few example applications:
· Online catalogs: It is straightforward in Scone to
represent hierarchies of products, their characteristics, their
intended application, which components work together, information on prices,
vendors, and availability, and so on.
· Help-desk support: Just as products can be
described and searched in Scone, so too can families
of problems, their symptoms, and their causes.
· Knowledge-assisted search engines: Statistical
"bag of words" search engines such as Google are an essential tool
of modern life, but their search is fundamentally based on the presence or
absence of specific words (or sometimes multi-word phrases) in a given
document. Even a little bit of
knowledge can improve the performance of a search engine, for example by
allowing us to retrieve "cat" and "dog" articles when the
user asks for "pets". A
system that begins to understand and represent the content of a document
could do much more. This is especially
important when the task is to match a short query (e.g. "Pets playing
with toys") against a short label such as a picture caption (e.g. "Dog
catching a Frisbee"). Literal
word-matching doesn't work very well in this case, but we can do much better
if we match word meanings against
one another.
· Autonomic computing: Companies that develop
or manage complex hardware/software installations face a serious problem in
configuring these systems correctly, recognizing vulnerabilities and attacks,
and diagnosing and repairing problems.
The first step in managing this complexity is to create a symbolic
description of the installation: its components, tasks, personnel and
permissions, and the external environment.
This is a job for a KB.
· Federated databases: Suppose two companies
merge. Company A has a database of employees, but it does not cover
temporary or part-time employees.
Company B has a database, also labeled employees, which does contain its part-time and temporary
employees, but it does not include salespeople who get commission rather than
a salary. If we can represent the
different types and subtypes of employees in a knowledge base, then we can
begin to combine these two ontologies and to resolve the differences between
them. One possible solution is to send
all database queries first to Scone, which will pick off and answer any odd or exceptional queries;
Scone can then send the straightforward queries (perhaps in modified form) on
to the appropriate DB.
· Federated ontologies: One way to merge multiple, independently developed ontologies
is to merge all the information into a single unified KB. Scone is well suited
for this: it can handle large amounts of information, and its representation
is more expressive than most other ontologies in current use, so Scone
can represent information from all of them.
However, it is sometimes impractical or legally forbidden to collect
all this information in one system. As
in the case of databases, we can use Scone as a
front-end system that can reference other ontology servers as needed,
translating and combining the results.
The advantage of using a KB system like Scone
as a front-end is that it can accumulate knowledge about what kinds of
knowledge live in each server, which server to believe in the event of a
conflict, differences in terminology, and so on.
· Computational biology and medicine: The
literature in this field is huge and is growing at an alarming rate. Representing and organizing all this
diverse knowledge, so that connections can be noticed and so that researchers
can find the information that they need, is another job well suited to a
knowledge base, perhaps backed up by multiple databases or specialized
ontologies for low-level data.
· Extracting knowledge from ill-formed text: We have begun to look at the problem of
handling free-text notes in patient records, which are often hastily
scribbled by busy medical practitioners or are imperfectly transcribed from
speech. These texts are very informal
and telegraphic in nature: full of jargon, non-standard abbreviations, and
contractions (some made up on the spot).
Well-formed grammatical sentences are rare. We believe that the only way to understand
this type of text is to develop a natural language processing (NLP) system in which a KB full of
background knowledge is a full partner, consulted early and often during the
processing.
In all
of these applications, it is important to keep in mind that we do not have to
choose between a knowledge-based approach using Scone and a statistical
approach, or one using conventional database technology. In many problem domains, as of today, none
of these approaches provides a complete solution, but fortunately they all
play well together.
For
example, as noted above, a little bit of symbolic knowledge – perhaps just a
type hierarchy and some properties – can add a lot of power to a search
engine or a classifier by augmenting queries and by filtering what the search
engine returns. As Scone's knowledge base grows and evolves, it can
play an ever-greater role in this partnership. But the key point here is that we do not
have to wait for this. This research
project is ambitious but it is not an all-or-nothing proposition.
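The query-augmentation idea mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hand-written `SUBTYPES` dictionary stands in for a real type hierarchy, the `expand` and `search` functions are hypothetical, and matching is done on whole lowercase words with no stemming.

```python
# Toy illustration only: a hand-written hierarchy stands in for a real KB.
SUBTYPES = {
    "pet": ["cat", "dog"],
    "cat": ["kitten"],
    "dog": ["puppy"],
}


def expand(term):
    """Return term plus every subtype below it in the toy hierarchy."""
    terms, stack = [], [term]
    while stack:
        current = stack.pop()
        if current not in terms:
            terms.append(current)
            stack.extend(SUBTYPES.get(current, []))
    return terms


def search(query_term, documents):
    """Return documents containing the query term or any of its subtypes."""
    wanted = set(expand(query_term))
    return [doc for doc in documents if wanted & set(doc.lower().split())]


captions = [
    "Dog catching a Frisbee",
    "Kitten asleep on a sofa",
    "Sunset over the harbor",
]
hits = search("pet", captions)
```

A literal word match for "pet" finds none of these captions, while the expanded query retrieves the first two and leaves the third out.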
Since
2006, pre-release versions of Scone
have been tested by a number of research projects in Carnegie Mellon's School of Computer Science.
Among these projects are Radar, Javelin (Question Answering), and
"Read the Web". In these
applications, Scone serves both as a repository for background
knowledge and as the store where newly learned knowledge can be saved.
Scone has
been used to improve message classification within the Radar system by
augmenting the "bag of words" features with "implied"
features. If a message mentions
"Scott Fahlman", Scone adds features for
"faculty", "CMU", "AI", "research",
"Scone Project", and so on, based on the background knowledge in
the KB. If these new virtual features
are irrelevant, the classifier will learn to ignore them, but often they are
valuable. If a user asks whether there
are any upcoming "AI" talks and we have a message saying that
"Scott Fahlman" is speaking, we can make the connection.
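The "implied features" idea can be illustrated with a short Python sketch. It is a simplified stand-in, not the code used in Radar or Scone: the `BACKGROUND` dictionary and the `add_implied_features` function are assumptions, and the implied features are simply the ones named in the paragraph above.

```python
# Toy illustration only: NOT the Radar or Scone implementation.
BACKGROUND = {
    # entity mentioned in a message -> features implied by background knowledge
    "scott fahlman": ["faculty", "CMU", "AI", "research", "Scone Project"],
}


def add_implied_features(message, background=BACKGROUND):
    """Return bag-of-words features plus features implied by known entities."""
    text = message.lower()
    features = set(text.split())       # ordinary "bag of words" features
    for entity, implied in background.items():
        if entity in text:             # the message mentions a known entity
            features.update(implied)   # add the virtual features
    return features


talk = add_implied_features("Scott Fahlman is speaking on Friday")
lunch = add_implied_features("Lunch is at noon")
```

The first message gains the feature "AI" even though the word never appears in it, so a downstream classifier or query for "AI" talks can make the connection. The second message is left with its literal words only.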
Software & Publications
Open-Source Scone Software (Coming Soon)
The Scone User's Guide (HTML, Word)
Scott Fahlman's "Knowledge Nuggets" Blog
SconeEdit Browser/Editor for Scone
Other Scone-Related Publications
Members of the Scone Research Group
Faculty:
Scott E. Fahlman
Graduate Students:
Wei Chen, Ben Lambert, Daniel Chung Yong Lim, E. Cinar Sahin, Alicia Tribble
Undergraduate Students:
Jiquan Ngiam
Acknowledgments
Development of Scone has been
supported in part by the Defense Advanced Research Projects Agency (DARPA)
under contract numbers NBCHD030010 and FA8750-07-D-0185. Additional support
for Scone development has been provided by generous
research grants from Cisco Systems Inc. and from Google Inc.
Any opinions, findings and conclusions or recommendations
expressed in this material are those of the author and do not necessarily
reflect the views of DARPA or our other sponsors.