From whitten@netcom.com Thu Feb 9 16:25:50 EST 1995
Article: 27197 of comp.ai
Newsgroups: comp.ai,comp.ai.nat-lang,comp.ai.philosophy
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!udel!gatech!howland.reston.ans.net!ix.netcom.com!netcom.com!whitten
From: whitten@netcom.com (David Whitten)
Subject: CYC FAQ
Message-ID:
Summary: CYC Frequently Asked Questions
Keywords: CYC FAQ
Organization: AOT incorporated
X-Newsreader: TIN [version 1.2 PL1]
Date: Wed, 8 Feb 1995 00:17:12 GMT
Lines: 740
Xref: glinda.oz.cs.cmu.edu comp.ai:27197 comp.ai.nat-lang:2803 comp.ai.philosophy:25292

The Unofficial, Unauthorized CYC Frequently Asked Questions Information Sheet.

Written by David Whitten, with various input from other net citizens

If you think of questions that are appropriate for this FAQ, or would like to improve an answer, please send email to me at whitten@netcom.com

*** Copyright:

Copyright (c) 1994 by David Whitten. All rights reserved. Portions copyright (c) by MCC and Cycorp.

This FAQ may be freely redistributed in its entirety without modification provided that this copyright notice is not removed. It may not be sold for profit or incorporated in commercial documents (e.g., published for sale on CD-ROM, floppy disks, books, magazines, or other print form) without the prior written permission of the copyright holder. Permission is expressly granted for this document to be made available for file transfer from installations offering unrestricted anonymous file transfer on the Internet.

This article is provided AS IS without any express or implied warranty.

*** Topics Covered:

[1] Introduction
  [1-1] What is CYC ?
  [1-2] How do I find this FAQ ?
[2] Who is doing CYC ?
  [2-1] Who is sponsoring it ?
  [2-2] Who is Douglas Lenat ?
  [2-3] Who else was previously or is currently working on CYC ?
  [2-4] How do I contact the authors myself ?
[3] When did they start CYC and when will it be completed ?
[4] Why are they creating CYC ?
[5] Where can I get the source code ?
[6] What do they use to make it work ?
  [6-1] What programming language is it written in ?
  [6-2] What machines does it run on ?
[7] How does it work ?
  [7-1] How do they store common sense in a computer ?
  [7-2] How do they input common sense ?
  [7-3] What theoretical foundation is behind CYC ?
  [7-4] What is the difference between CYC and an Expert System ?
  [7-5] What are they doing in Natural Language Processing ?
[8] Can CYC reason about information not stored in CYC format databases?
[9] What are the details of the functional interface to CYCL ?
[A] Acknowledgements
[B] Bibliography

Search for [#] to get to question number # quickly.

*** Recent changes:

;;; 1.0: 15-DEC-94 djw Initial release

----------------------------------------------------------------
Subject: [1] Introduction

Certain questions and topics come up frequently in the various artificial intelligence discussion groups about the CYC program. This file/article is an attempt to gather these questions and their answers together as a convenient reference for AI researchers, students, hobbyists, and practitioners. I post it whenever I notice a newbie asking about CYC on one of the newsgroups I read.

I hope this will cut down on network traffic, and increase the enjoyment of these newsgroups for the regular readers by eliminating the necessity to read and respond to the same questions over and over again. It may even answer some questions that readers may not have thought about yet, and hopefully will stimulate new discussions by increasing the total amount of information available.

Currently this FAQ covers the obvious questions and answers, but I plan to add new questions and answers as they become common.

----------------------------------------------------------------
Subject: [1-1] What is CYC ?
CYC is the name of a very large, multi-contextual knowledge base and inference engine, the development of which started at the Microelectronics and Computer Technology Corporation (MCC) in Austin, Texas during the early 1980s. Over the past eleven years the members of the CYC team have added to the knowledge base a huge amount of fundamental human knowledge: facts, rules of thumb, and heuristics for reasoning about the objects and events of modern everyday life. CYC is an attempt to do symbolic AI on a massive scale. It is not based on numerical methods such as statistical probabilities, nor is it based on neural networks or fuzzy logic. All of the knowledge in CYC is represented declaratively in the form of logical assertions. CYC presently contains approximately 400,000 significant assertions, which include simple statements of fact, rules about what conclusions to draw if certain statements of fact are satisfied (true), and rules about how to reason with certain types of facts and rules. New conclusions are derived by the inference engine using deductive reasoning. Its avowed purpose is to break the software brittleness bottleneck once and for all by constructing a foundation of basic "common sense" knowledge -- a sort of semantic substratum of terms, rules, and relations -- that will enable a variety of knowledge-intensive products and services. CYC is intended to provide a "deep" layer of definitions/understanding that can be used by other programs (such as domain-specific expert systems) to make them more flexible. To date, CYC has made possible ground-breaking pilot applications in the areas of heterogeneous database browsing and integration, captioned image retrieval, and natural language processing. ---------------------------------------------------------------- Subject: [1-2] How do I find this FAQ ? This FAQ is posted semi-regularly on the comp.ai and comp.ai.nat-lang newsgroups, as well as being available from its author. 
(whitten@netcom.com) It is not currently available from archive sites, WWW sites, or ftp sites.

----------------------------------------------------------------
Subject: [2] Who is doing CYC ?

Much of the CYC work has been done at the Microelectronics and Computer Technology Corporation in Austin, Texas. As of January 1, 1995, a new independent company named Cycorp has been created to further the work done on the CYC project. Cycorp continues to be based in Austin, Texas.

----------------------------------------------------------------
Subject: [2-1] Who is sponsoring it ?

The development of CYC has been supported by several organizations, including Apple, Bellcore, DEC, DoD, Interval, Kodak, and Microsoft.

----------------------------------------------------------------
Subject: [2-2] Who is Douglas Lenat ?

Doug Lenat is one of the world's leading computer scientists and is head of the CYC Project at MCC and President of Cycorp. He has been a Professor of Computer Science at Carnegie-Mellon University and Stanford University. He is a prolific author, whose publications include the books

   Knowledge Based Systems in Artificial Intelligence (1982, McGraw-Hill)
   Building Expert Systems (1983, Addison-Wesley)
   Knowledge Representation (1988, Addison-Wesley)
   Building Large Knowledge Based Systems (1989, Addison-Wesley)

His 1976 Stanford thesis earned him the biennial IJCAI Computers and Thought Award in 1977. He was named one of America's brightest scientists under the age of 40 in the December 1984 Science Digest. In 1986, he was elected as Councilor of AAAI.

----------------------------------------------------------------
Subject: [2-3] Who else was previously or is currently working on CYC ?

Here is a complete list of everyone who has worked with CYC for more than one continuous year since 1988. Many other persons have made sporadic or episodic contributions to CYC.
Present employees of Cycorp -- full-time, part-time, occasional, or currently on leave -- are marked with *. Paul Blair (1991-1993) is a graduate student in Philosophy who worked on CYC as a knowledge enterer. His contributions to the project included writing a clear introduction to CYCL, CYC's representation language. Paul is currently continuing his graduate studies in New York City. Judy Bowman (1986-1994) was the secretary for the CYC project during most of its existence at MCC. Rupert Brauch (1992-1994) holds an undergraduate degree in Philosophy and contributed to CYC as a knowledge enterer. He is now pursuing a graduate degree in Computer Science at Stanford. *Kathy Burns (1993-present) is a member of the Cycorp Technical Board, and directs Cycorp's natural language processing development effort. She holds an undergraduate degree in Linguistics from the University of Texas at Austin, and has done graduate study in Linguistics at McGill University. Kathy has written most of the semantic rules that allow translation from English to CYCL (CYC's formal representation language). She played an important role in developing the new CYC database browsing and retrieval pilot application. She has composed CYC Mosaic interface and help pages, and has done a significant amount of knowledge entry. Mark Derthick (1988-1994) holds a PhD in Computer Science from Carnegie-Mellon University. He developed and maintained a variety of interface, browsing, and knowledge entry tools, including the Heuristic Level to Epistemological Level translator that proved to be a crucial addition to the earlier (pre-1991) frame-based version of CYC. With Karen Pittman, he played a major role in developing the Cyccess pilot application, which demonstrated the value of using CYC to integrate the data contained in disparate structured information sources such as database tables and spreadsheets. Mark is currently working on a data representation and visualization project at CMU. 
*Lila Ghemri (1992-present) holds a PhD in Computational Linguistics from the University of Bristol, England. She works on the CYC natural language processing effort, has done a significant amount of knowledge entry, and has played an important part in expanding CYC's English language lexicon. She is responsible for augmenting and maintaining the syntax templates that enable translation from English to CYCL (CYC's representation language), and for using corpora to test CYC's NL system. *Keith Goolsbey (1990-present) is a member of Cycorp's Technical Board. He holds a Bachelor's degree in Electrical Engineering and a Master's degree in Computer Science, both from the University of Texas at Austin. Keith has played a dominant role in porting CYC from Common Lisp to C, including work on a Common Lisp to C translator that allows comparable implementations of CYC to exist in both languages simultaneously. He has done a significant amount of knowledge entry, especially concerning CYC's treatment of quantities and scalar intervals, and was a key participant in building an early version of CYC's captioned image retrieval pilot application. He has worked to make possible aesthetically pleasing and practical Mosaic interfaces to CYC, including the construction of a new browsing/editing interface. Ramanathan V. Guha (1987-1994) holds an MS in Mechanical Engineering from UC-Berkeley, and a PhD in Computer Science from Stanford. From 1990 to 1994 he was the technical director of the CYC project. He has written several important papers and technical reports, including (as co-author) the book Building Large Knowledge Based Systems (1989, Addison-Wesley). Many features of CYC as it exists today are due directly to R. V. Guha's vision and hard work. 
His most notable contributions include converting CYC from a frame-based system to one in which the fundamental data objects are logical assertions, and the implementation of logical contexts (microtheories) as a method for structuring and effectively reasoning with the knowledge in the CYC knowledge base. More recently, he was the principal developer of an innovative CYC database browsing and retrieval pilot application. Guha left the CYC project at the end of 1994 to pursue other interests. *Bill Jarrold (1990-present) holds a BS in Cognitive Science from MIT. He has done a great deal of knowledge entry work for CYC, most notably in the domains of weather and naive spatial relations. He is involved in the design and implementation of test suites for the CYC knowledge base, the inference engine, and the underlying source code. Bill is currently pursuing a PhD degree in Counseling Psychology at the University of Texas at Austin. Kate Joly (1992-1994) holds a Bachelor's degree in Linguistics from UC-Santa Cruz. She made important contributions to CYC's natural language processing effort, including writing most of the syntactic parsing templates for translating from English to CYCL. Liz Lempert (1992-1994) holds a Bachelor's degree in Symbolic Systems from Stanford. She has done a great deal of knowledge entry for CYC in a wide variety of domains. She is now pursuing graduate studies in Boston. *Bill MacCartney (1994-present) holds a Bachelor's degree in Philosophy from Princeton. In addition to doing knowledge entry, he has helped with the process of porting CYC from Common Lisp to C, and has worked on CYC interface development in C++ and Mosaic. Alan McKendree (1991-1994) holds a BS in Mathematics from the University of Texas at Austin. He worked on CYC as a knowledge enterer and a system support specialist. 
*Kathy Mitchell (1991-present) holds a Bachelor's degree in Computer Science from Texas A & M and an MS in Computer Science from the University of Texas at Austin. She has done much knowledge entry for CYC in just about every domain. She played an important part in testing and debugging the CYC captioned image retrieval pilot application, and more recently has worked on the CYC natural language processing effort. *Kenneth Murray (1987-present) is pursuing a PhD in Computer Science at the University of Texas at Austin, with a focus on developing and applying large foundational knowledge bases. He has contributed to CYC as a knowledge enterer in a wide variety of domains. In addition to knowledge entry, he is responsible for helping to debug and maintain the CYC source code. Deborah Nichols (1992-1994) did knowledge entry for CYC in several domains. She is now pursuing a PhD in Philosophy at the University of Texas at Austin. *Karen Pittman (1987-present) is a member of Cycorp's Technical Board, and plays a prominent role in directing general knowledge entry work on CYC, training new knowledge enterers, scoping out new knowledge domains, and planning application-specific knowledge entry tasks. She is responsible for a great deal of the knowledge presently in the CYC knowledge base. She holds an MS in Botany from the University of Texas at Austin, and is currently pursuing an MS in Computer Science (also at UT). With Mark Derthick, she played a major role in developing the Cyccess pilot application, which demonstrated the value of using CYC to integrate the data contained in disparate structured information sources such as database tables and spreadsheets. *Dexter Pratt (1989-present) holds a BS in Chemistry from Yale. Now an independent software developer and consultant, he was a pioneer in the development of the Lisp machine workstation in the early 1980s, when he worked for LMI. 
He has contributed to CYC in many ways, including writing and maintaining interfaces, system software development and maintenance, helping to port CYC from Common Lisp to C, and knowledge entry in a variety of domains. With Nick Siegel, he played a major role in developing CYC's captioned image retrieval pilot application. He now works for Cycorp on an occasional basis. Wanda Pratt (1990-1993) holds an MS in Computer Science from the University of Texas at Austin. She contributed to CYC as a knowledge enterer and a system software developer and maintainer. Her knowledge entry work covered many different domains. She is now pursuing a PhD in Computer Science at Stanford. Wei-Min Shen (1989-1991) holds a PhD in Computer Science from Carnegie-Mellon University. He did knowledge entry for CYC and explored his interest in machine learning. He is now pursuing a career in university teaching and research. *Mary Shepherd (1984-present) is a Sociologist by training. She has contributed to CYC as a knowledge enterer, and for much of the past eleven years has dealt with the onerous administrative and personnel tasks necessary to keep the CYC team functioning. She is presently the administrative manager of Cycorp. *Nick Siegel (1988-present) is a member of Cycorp's Technical Board. He does knowledge entry on CYC, helps to plan and direct knowledge entry, trains new knowledge enterers, and has been actively involved in testing the system and adding knowledge-entry capability to the Mosaic user interface. He holds a BA in History of Religions from Creighton University and an MA in Cultural Anthropology from the University of Texas at Austin. With Dexter Pratt, he played a major role in developing CYC's captioned image retrieval pilot application. *Srinija Srinivasan (1993-present) holds a Bachelor's degree in Symbolic Systems from Stanford. She has done a great deal of knowledge entry for CYC in a wide variety of domains, most notably in the area of human emotional states. 
Jamie Stephens (1992-1994) worked on CYC as a knowledge enterer and system support specialist. He is now pursuing graduate studies in Mathematics at the University of Texas at Austin.

Dan Torosian (1990-1993) worked on CYC as a knowledge enterer. A professional jazz musician (clarinet, saxophone), he is now pursuing his musical career full-time in Austin, Texas.

Ginger Webb (1990-1991) holds a Master's degree in French Linguistics from the University of Texas at Austin. She worked on CYC as a knowledge enterer, contributing to several different domains.

Alan Kay, Michael Lesk, John McCarthy, Marvin Minsky, Tom Murphy, Bob Simpson, Marvin Weinberger, and Steve Chenoweth have all provided useful comments and met with the CYC team at various times.

----------------------------------------------------------------
Subject: [2-4] How do I contact the authors myself ?

The CYC group maintains a low profile on the Internet. As it would be easy to be deluged by email, they have chosen not to publicize their addresses.

If you wish to discuss the ideas behind CYC, or the philosophy of the CYC project, you will probably get a faster response by posting to the comp.ai, comp.ai.philosophy, or comp.ai.nat-lang newsgroups. There are many talented people who read these groups, and it is possible that someone not affiliated with the CYC project will be able to answer your questions.

----------------------------------------------------------------
Subject: [3] When did they start CYC and when will it be completed ?

The CYC project began as a dream to create a computerized encyclopedia. When Alan Kay, one of computing's legendary figures, was at Atari's research center, he asked Doug Lenat for something original to add to this project. After Atari hit financial difficulties, Doug Lenat relocated his idea to MCC.
Initially, CYC was based on a re-implementation of RLL (the frame-based language underlying Eurisko) which was similar to the simultaneously but independently developed KRL.

The original ten-year funding period for the CYC project at MCC was supposed to end in 1994, but was extended for one year to the close of 1995.

From the perspective of some people on the CYC team, asking when CYC will be completed is sort of like asking of a person, "When will s/he be finished?" A more pertinent question is, when will it be useful? The practical answer is, when it can solve the problems and perform the tasks its builders would like for it to perform. The CYC team believes that CYC is now ready to be used in some interesting and useful applications.

----------------------------------------------------------------
Subject: [4] Why are they creating CYC ?

The CYC team doesn't believe there is any shortcut toward being intelligent or creating an artificial intelligence based agent. Addressing the need for a large body of knowledge complete with content and context may only be done by manually organizing and collating information. This knowledge includes heuristic, rule-of-thumb problem solving strategies, as well as facts that can only be known to a machine if it is told.

Much of the useful common sense knowledge needed for life is prescientific and has therefore not been analyzed in detail. Thus a large part of the work of the CYC project is to formalize common relationships and fill in the gaps between the highly systematized knowledge used by specialists in the modern world.

----------------------------------------------------------------
Subject: [5] Where can I get the source code ?

It is not free, nor is it freely available.
If you or your company are willing to become a corporate sponsor of the CYC development effort, you will be able to have the same access to the internal details and internal documentation available to the other sponsors (excluding, of course, information that is proprietary to particular sponsors). This is not an option for everyone, and Cycorp reserves the right to determine if they will accept sponsorship by your company. As the current sponsors have invested a considerable sum of money in developing CYC, please do not pursue this option unless you or your company are willing to make a similar contribution. Serious inquiries regarding collaboration or sponsorship may be sent to: Doug Lenat Cycorp, Inc. 3500 West Balcones Center Drive Austin, Texas 78759 While the intent is to make CYC widely available (so that it will become the standard representation and reasoning system), Cycorp is committed to protecting the intellectual property rights of those who have invested in CYC's development. ---------------------------------------------------------------- Subject: [6] What do they use to make it work ? The CYC system itself (the knowledge base, inference engine, interface modules, etc.) is a fairly large, complex piece of software. However, it now runs on what is basically stock, off-the-shelf hardware. CYC can be run in a networked mode (information provided on one machine is available to all the other machines) or on a stand-alone workstation serving several users at once. ---------------------------------------------------------------- Subject: [6-1] What programming language is it written in ? There are both Common Lisp and C versions of CYC. Most development is currently done in Common Lisp running on Symbolics Lisp machines. Lisp source code is translated into C, using a Common Lisp to C translator developed by the CYC team, to produce source code that can be compiled by a variety of standard ANSI C compilers. 
The CYC team expects that by mid-1996, most development will be done entirely in C.

----------------------------------------------------------------
Subject: [6-2] What machines does it run on ?

The C version of CYC is intended to run on any system that provides an ANSI C compiler, virtual memory with at least 150 MB of swap space, and at least a 32-bit flat virtual address space. As of January 1995, C versions of CYC have been compiled and tested on the following OS/hardware combinations:

   UNIX OS:
      Sun Sparc
      DEC Alpha

   Apple System 7 OS:
      Macintosh Powerbook
      Macintosh Quadra
      Power Macintosh

The CYC team expects to have a C version of CYC running under Microsoft Windows NT on a 486-based IBM PC Compatible shortly.

The Common Lisp version runs on Symbolics Lisp machines, and under Lucid on Sparc 10s (with memory requirements similar to those for the C version).

----------------------------------------------------------------
Subject: [7] How does it work ?

In the old days (before 1991), CYC's representation language (CYCL) was primarily a frame-based language, the CYC KB was thought of as a set of unit/slot/entry triples, and inferencing was done pretty much by inheritance. This led to a set of increasingly baroque add-ons and work-arounds, such as encoding higher-arity predicates as entries which were tuples, having variant forms of predicates (in which the only difference was the order of the arguments), and placing more and more stress on frame-oriented editing interfaces to navigate around in the knowledge base.

The CYC team now thinks of the CYC KB as a "sea" of assertions, with each assertion being no more "about" its first argument than its last one. For example, if one says that Fred is Sally's father, this is now regarded as being just as much a statement "about" Sally as Fred. Inference has broadened out into general logical deduction, with AI's well-known named inference engines (such as inheritance, automatic classification, etc.)
just special cases that might or might not get treated specially in any particular implementation of the CYC system; but in any event the persons entering knowledge do not need to cater to that, or even know about it. So one way to visualize the CYC KB is as a circle filled with assertions; a circular "assertion sea". Above this sea (or outside it, from a two-dimensional perspective) sit all the "constants". Attached to each constant is a bundle of thin wires or strings. The other ends are attached to all the assertions, in the sea, that mention that constant anywhere. Moreover, each of the assertions in the sea can itself be treated as a constant, if you want, and have its own wires reaching to other assertions which mention IT. Inference rules in CYC can now be thought of as ways of saying that if you have certain assertions in the sea (a set of them, that match a certain pattern) then you are justified in adding a particular new assertion. Each time an assertion is added, wires are automatically strung to all the constants that are mentioned anywhere inside the assertion, and "ripples" of its adding may cause yet other inferences to occur, yet other new assertions to get dumped into the sea, etc. Sometimes one of the new assertions is the answer someone was waiting for, for some problem; sometimes one of the inference procedures reaches a contradiction and has to cope with that. CYCL, the CYC representation language, is essentially a form of First Order Predicate Calculus (FOPC) with equality, augmentations for default reasoning, skolemization, and some second-order features (e.g., quantification over predicates is allowed in some circumstances). It uses a form of circumscription, includes the unique names assumption, and can make use of the closed world assumption where appropriate. CYC currently does not store most of the information you would find in a dictionary, encyclopedia, or an almanac. 
For example, CYC may not know that Birendra Bir Vikram Shah Dev is the current king of Nepal, or that Kathmandu is its capital city. It does know what the characteristics of a capital city are, and it knows the significance of being a head of state. ---------------------------------------------------------------- Subject: [7-1] How do they store common sense in a computer ? See sections [1-1] and [7], above. Each assertion in CYC (a statement of fact or a "rule-of-thumb") is located in (or associated with) a specific microtheory or context. Each microtheory captures one "fairly adequate" solution to some knowledge representation area (knowledge domain). These solutions may address general areas like representing and reasoning about space, common devices, time, substances, agents, and causality or specific areas like weather, manufacturing a particular thing, and walking. Different areas may have several different microtheories, since the way an area is perceived or modeled may be different. Different points of view, different assumptions, different levels of granularity, and even what distinctions are important or not important may be significant enough to require creating a separate microtheory. A microtheory may be considered to be a smaller and more modular knowledge-base within CYC, which is specialized on a particular topic. The important thing to realize is that neither the CYC team, nor CYC itself claims to have a unified theory of time, space, and the universe. Nor does it embody some great master Laws of Thought. What they do have is a suite of specialized microtheories whose union covers the most common cases. ---------------------------------------------------------------- Subject: [7-2] How do they input common sense ? The CYC team's basic knowledge editing tool is called the NUE (New Unit Editor). 
The NUE is basically a full screen assertion editor that allows the user to view assertions in the knowledge base and perform a variety of operations, including adding assertions, removing assertions, creating new constants, killing constants, renaming constants, setting inference performance parameters (e.g., forward or backward propagation for rules), asking for conclusions to be derived (if possible), and viewing the inference chains that resulted in particular conclusions. Over the past few months (late 1994 - early 1995) the CYC team has written an entirely new set of interface tools for knowledge browsing and knowledge editing, based on Mosaic and HTML. These tools duplicate or exceed the functionality of the NUE, with the added benefit of complete standardization and portability across platforms. As these tools are for inhouse development, there is no public World Wide Web site available. There currently is some discussion of providing a subset of the CYC database on an example Web site. Should this happen, the address will be publicized. This may be available before the end of 1995. There is also a variety of test suites that are run periodically to test the integrity of the knowledge base and the functioning of the inference engine. The CYC team expects to give more emphasis to regular, automatic testing of the system now that product development is on the horizon. ---------------------------------------------------------------- Subject: [7-3] What theoretical foundation is behind CYC ? CYC is not a theoretical effort, although there has been a lot of theory used in its construction. The primary focus of the CYC project is to actually start consolidating a cohesive knowledge bank. Any theoretical issues which have been addressed have been directly motivated by the requirements of solving specific problems. 
Like First Order Predicate Calculus, CYCL allows using ForAll (universal quantification), ThereExists (existential quantification), and LogImplication (material implication), as well as the other common ways of combining variables and logical expressions such as LogAnd (conjunction), LogOr (disjunction), and LogNot (negation).

The CYC team believes that a hand-encoded effort using symbolic logic may express a significant fraction of the fundamental human knowledge typically shared by most people. This bootstrap process is greatly enhanced by the redundant nature of knowledge. Most knowledge uses and re-uses the same basic ideas and relationships in many different ways.

The day-to-day entering of knowledge is not based on ethereal definitions of elaborate Causality and Time-Space-Intelligence collections. Most data is as plebeian as 'living organisms have to eat to stay alive' or 'broom handles tend to be made of wood'.

It is hoped that as the Natural Language effort continues (see [7-5], below), more knowledge may be entered by persons typing in assertions in English, and eventually by having CYC 'read' source materials for itself, bothering its human attendants only when disambiguation is required.

----------------------------------------------------------------
Subject: [7-4] What is the difference between CYC and an Expert System ?

The knowledge in CYC is more densely interrelated, CYC has more information about the common attributes of the world, and CYC has a broader focus than any individual expert system. A typical expert system uses highly detailed knowledge about a single, tightly-focused domain. CYC encodes general knowledge about many different domains, viewed from a variety of perspectives. Based on the bodies of information (microtheories) it uses in inferencing, CYC may draw differing conclusions.

CYC may be thought of as a tool for building expert systems and other programs that use a rule-based knowledge representation.
It supports and uses both forward and backward chaining. CYC has an integrated argumentation-based Truth Maintenance System that provides logical reasoning and supports non-monotonicity. It can dynamically create terms, and has several conflict resolution strategies.

----------------------------------------------------------------
Subject: [7-5] What are they doing in Natural Language Processing ?

The CYC NL system is unique in having access to a very large, declaratively represented common sense knowledge base. CYC helps the natural language system handle word/phrase disambiguation, and also provides a target internal representation language (CYCL) that can be used to do interesting things, such as inference.

A substantial portion of the CYC natural language processing system (the lexicon and many semantic rules) is actually represented in the CYC knowledge base; CYC "knows about" words just like it "knows about" cars or trees.

Syntactic parsing is carried out by a template-matching procedure. Semantic rules are applied to the output of the syntax module. It is in the application of the semantic rules that the knowledge in the knowledge base is proving especially advantageous.

Most of the CYC pilot applications now under development have some NL component in their interfaces. The captioned image retrieval application, for example, accepts queries in English, and allows captioners to describe new images to the system using English sentences.

The CYC NL team is currently expanding the lexicon, extending the parser, and adding new semantic capabilities to the system.

----------------------------------------------------------------
Subject: [8] Can CYC reason about information not stored in CYC-format databases ?

Yes. There is an application named Cyccess which is used to interface CYC to structured information sources (SIS) such as databases or spreadsheets.
Cyccess uses CYC to understand the contents of structured sources, to retrieve information, and to pose queries that depend on a combination of CYC knowledge and the data in the SIS. After the information in an SIS is appropriately linked to assertions in CYC, all the CYC inferencing, guessing, and consistency checking capabilities are available.

An interesting implication of this is that CYC may use specific facts or time-sensitive information without duplicating it within the CYC knowledge base.

----------------------------------------------------------------
Subject: [9] What are the details of the functional interface to CYCL ?

The Functional Interface of CYC is a mechanism that external programs can use to query CYC for conclusions or general information. It currently consists of six operations.

ASK
    given a logical formula, possibly including free variables,
    given a knowledge base subset
    return a binding list of variables that will make the formula true

ASSERT
    given a logical formula,
    given a knowledge base subset
    return a modified knowledge base with the formula as an axiom

UNASSERT
    given a logical formula,
    given a knowledge base subset
    return a modified knowledge base with the formula not an axiom
    (the formula may still be a theorem, i.e. follow from other axioms)

JUSTIFY
    given a logical formula,
    given a knowledge base subset
    return a minimal subset of the knowledge base sufficient to
    justify derivation of the formula as a theorem

CREATE
    given a name string,
    given a collection (set) term
    create and return a constant that is a member of the collection

KILL
    given a term
    return a modified knowledge base in which the term no longer
    exists and any assertion of which it was a component is no
    longer true

----------------------------------------------------------------
Subject: [A] Acknowledgements

This FAQ has been based on magazine articles and books published by the CYC team, notably Doug Lenat and R.V. Guha, and personal communication with Doug Lenat and Nick Siegel.
Any mistakes in it are my sole responsibility, although this is not a warranty. I would appreciate a note about any inaccuracies or misrepresentations herein.

This FAQ could not have been created without the generosity of the CYC team in sharing information with the computing community about the methods and philosophy they have been using.

----------------------------------------------------------------
Subject: [B] Bibliography of Expert Systems books, introductions, documentation, periodicals, and conference proceedings.

Lenat, D.B. and Guha, R.V.
    Building Large Knowledge Based Systems,
    Addison Wesley, Reading, Mass., 1990

Guha, R.V., Lenat, D.B., Pittman, K., Pratt, D., and Shepherd, M.
    CYC: a midterm report,
    Communications of the ACM, July 1990, Vol. 33, No. 8

Guha, R.V. and Lenat, D.B.
    CYC: a midterm report,
    A.I. Magazine, Fall 1990

Lenat, D.B. and Guha, R.V.
    Enabling Agents to Work Together,
    Communications of the ACM, July 1994, Vol. 37, No. 7

Davidson, Clive
    Common Sense and the Computer,
    New Scientist, April 2, 1994

----------------------------------------------------------------
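Illustration for [9]: the six operations of the functional interface can be pictured with a small Python sketch. This is a toy under stated assumptions, not CYC's actual interface. The class ToyKB and all of its method names are hypothetical. Formulas are plain tuples of strings, there is no inference engine (so ASK only matches stored axioms and JUSTIFY can only return the axiom itself), the "knowledge base subset" argument is omitted, and the knowledge base is modified in place rather than returned.

```python
# Toy illustration only. ToyKB and every method name here are hypothetical
# and are NOT the real CYC functional interface. No inference is performed.

class ToyKB:
    def __init__(self):
        self.axioms = set()     # asserted formulas, e.g. ("isa", "Fido", "Dog")
        self.constants = set()  # names of constants created so far

    def create(self, name, collection):
        """CREATE: make a constant and record it as a member of the collection."""
        self.constants.add(name)
        self.axioms.add(("isa", name, collection))
        return name

    def assert_(self, formula):
        """ASSERT: the formula becomes an axiom."""
        self.axioms.add(tuple(formula))

    def unassert(self, formula):
        """UNASSERT: the formula stops being an axiom."""
        self.axioms.discard(tuple(formula))

    def ask(self, pattern):
        """ASK: return one binding dict per stored axiom matching the pattern.
        Terms that start with '?' are treated as free variables."""
        results = []
        for axiom in sorted(self.axioms):
            if len(axiom) != len(pattern):
                continue
            bindings = {}
            for p, a in zip(pattern, axiom):
                if p.startswith("?"):
                    # A variable must bind to the same term everywhere it occurs.
                    if bindings.setdefault(p, a) != a:
                        break
                elif p != a:
                    break
            else:
                results.append(bindings)
        return results

    def justify(self, formula):
        """JUSTIFY: without inference, an axiom is justified only by itself."""
        f = tuple(formula)
        return {f} if f in self.axioms else set()

    def kill(self, term):
        """KILL: remove the term and every axiom that mentions it."""
        self.constants.discard(term)
        self.axioms = {a for a in self.axioms if term not in a}


kb = ToyKB()
kb.create("Fido", "Dog")
kb.assert_(("likes", "Fido", "Bones"))
print(kb.ask(("isa", "?x", "Dog")))        # [{'?x': 'Fido'}]
kb.kill("Fido")
print(kb.ask(("likes", "?who", "Bones")))  # []
```

In the real system UNASSERT and JUSTIFY are more interesting than shown here, because a formula that is no longer an axiom may still follow from other axioms; the sketch has no rules, so that case never arises.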