An Interface to Access THEO Knowledge Bases
Tom Mitchell
September, 2009

OVERVIEW

This file describes functions to access the contents of a Theo knowledge base (KB), either over the web or on your local workstation. If you are doing many queries, it is probably best to place the knowledge base on your local workstation for efficiency. At present, these functions only allow reading the data, not altering it. They are implemented in Matlab, and (soon) Java.

Note on terminology: Theo is a frame-based representation system. We use the term "entity" to refer to a frame, or object being described (e.g., "Pittsburgh"), the term "slot" to refer to a property of the entity (e.g., "mayorOfCity"), and the term "value" to refer to the property's value (e.g., "Luke Ravenstahl"). You can think of the knowledge base as a set of relation triples of the form slot(entity,value), such as mayorOfCity(pittsburgh,luke_ravenstahl).

The key functions are:

  initializeTheo()
      loads Theo into Matlab
  useKB(kbDirectory, <depth>, <caseSensitiveFilenames>)
      declares the web or local location of your KB
  isEntityKB(entity)
      returns 1 if entity exists in the KB, else 0
  getValueKB(slot, entity)
      retrieves the value of slot of entity, plus its probability and source when available
  getEntitySlotsKB(entity)
      returns the list of slots associated with entity
  getEntitiesWithSlotValue(slot, <val>)
      returns all KB entities that have a value for slot (or, if val is given, the value val)
  printEntity(entity)
      prints the description of entity to the terminal screen
  printHierarchy(entity)
      prints the hierarchy of specializations of entity to the terminal screen

INSTALLATION

To install and run the Matlab version of Theo:

  1. Download and untar it into a directory we'll call $THEO_DIR$ (e.g., '/usr/sue/Theo/').
  2. Edit the third non-comment line of $THEO_DIR$/code/coreTheo/initializeTheo.m to declare your top-level Theo directory (i.e., $THEO_DIR$).
  3. Start Matlab.
  4. Run addpath '$THEO_DIR$/code/coreTheo/' so Matlab can find the function initializeTheo.m.
  5. Type the following at the Matlab prompt:
       >> initializeTheo

EXAMPLE RUN

To run Theo, start Matlab, make sure the directory containing initializeTheo.m is on your Matlab path, then type initializeTheo. Here is a simple sequence of commands that should work out of the box:

  >> initializeTheo
  >> useKB('http://rtw.ml.cmu.edu/sslnlp09/',0,1)   % declares the KB you'll be inspecting
  >> printEntity('pirates')
  >> printEntity('pirates',3)
  >> getValueKB('plays_against','pirates')
  >> [val prob src] = getValueKB('plays_against','pirates')
  >> isEntityKB('mets')
  >> getEntitySlotsKB('mets')
  >> getEntitiesWithSlotValue('plays_sport_team')              % entities with any known 'plays_sport_team' value
  >> getEntitiesWithSlotValue('plays_sport_team','baseball')   % entities for which 'plays_sport_team' = 'baseball'
  >> printHierarchy('sports_team');   % warning: there are a few hundred sports_teams

FUNCTION DOCUMENTATION

Below are more detailed descriptions of the above functions. In Matlab, you can get documentation on any of them by typing "help <functionName>" at the Matlab prompt.

success = useKB(kbDirectory, <depth>, <caseSensitiveFilenames>)

  This function tells Theo to use the knowledge base (KB) stored in the local directory or at the web URL given by "kbDirectory". The function also has two optional arguments, but in most cases you need not specify them; they default to the correct values. The function returns 1 if successful, else 0.

  The optional arguments are provided for backward compatibility with old KBs. The optional argument "depth" is an integer specifying that entities are stored in subdirectories nested "depth" levels deep. If "depth" is not provided, it defaults to THEO.kbSubdirDepth. The optional argument "caseSensitiveFilenames" determines whether filenames in the KB use the same letter case as entity names (caseSensitiveFilenames=1), or are all lower case regardless of entity case (caseSensitiveFilenames=0). Its default value is THEO.caseSensitiveFilenames.
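  To make the caseSensitiveFilenames option concrete, here is a minimal Python sketch. It is not Theo code: the helper name entity_filename is hypothetical, and it deliberately ignores the subdirectory nesting controlled by "depth" and any file extension, since neither scheme is described here. It also shows one consequence of caseSensitiveFilenames=0: entity names that differ only in letter case map to the same filename.

```python
def entity_filename(entity: str, case_sensitive_filenames: int) -> str:
    """Hypothetical helper: the base filename an entity would be stored under.

    case_sensitive_filenames=1 keeps the entity's letter case;
    case_sensitive_filenames=0 lower-cases it. Subdirectory nesting ("depth")
    and file extensions are intentionally left out of this sketch.
    """
    return entity if case_sensitive_filenames else entity.lower()


if __name__ == "__main__":
    print(entity_filename("Pittsburgh", 1))  # Pittsburgh
    print(entity_filename("Pittsburgh", 0))  # pittsburgh
    # With caseSensitiveFilenames=0, names differing only in case collide.
    print(entity_filename("Mets", 0) == entity_filename("mets", 0))  # True
```
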
  Note that the first example below does not provide values for these optional arguments, whereas the second does.

  Example: useKB('/Users/tommitchell/Documents/MATLAB/Theo/RTW_KB_2009_03_19_ORS/')
  Example: useKB('http://rtw.ml.cmu.edu/sslnlp09/', 0, 1)

[value, probability, source] = getValueKB(slot, entity)

  Returns the value of slot of entity, using the knowledge base specified by the most recent useKB command. Also returns its probability (if one is associated with the value) and the source justification for the value. If the slot contains a list of values, then the returned value is a list, and probability and source are lists with the same number of items, one for each value. If probability or source is unavailable, the function returns -1 for it. If there is no value for slot of entity, or if entity does not exist, the function returns the string 'NO_THEO_VALUE'.

  Example: [val, prob, src] = getValueKB('team_members','mets')
  Example: getValueKB('probability',{'mets','team_members'})

rslt = isEntityKB(entity)

  Returns 1 if entity is defined in the KB, else 0.

rslt = getEntitySlotsKB(entity)

  Returns the list of slots associated with entity, as a Matlab cell array. Also includes the string token 'val' if the entity happens to be a slot instance with a value.

  Example: getEntitySlotsKB('giants')

rslt = getEntitiesWithSlotValue(slot, <val>)

  Returns the list of all KB entities that contain "val" as a value for "slot". If only the first input argument is provided, it returns all KB entities that contain any value for "slot". Warning: this requires time linear in the number of entities in the domain of slot.

  Examples:
    get all entities for which slot 'friendly' has value 'yes':
      getEntitiesWithSlotValue('friendly','yes');
    get all entities for which slot 'friendly' has any known value:
      getEntitiesWithSlotValue('friendly');

printEntity(entity, <printDepth>, <slotsToPrint>)

  Prints the entity on the user's terminal screen.
  "printDepth" is an optional argument that defaults to 1. It determines how deeply to show the slots, subslots, subsubslots, etc. "slotsToPrint" is an optional argument that defaults to 'all'. If it is a list of strings, these are taken to be the names of the slots to be printed. Its default value of 'all' causes it to print every slot except 'specializations'. For convenience, the function pre is defined as a synonym for printEntity.

  Examples:
    pre('mets')   : prints the entity mets, including all known slot values
    pre('mets', 2, {'plays_against', 'team_plays_in_city'})   : prints to depth 2, showing only these two slots

printHierarchy(rootentity, <slotsToPrint>)

  Prints the hierarchy of specializations under rootentity, as well as any known values for the slots listed in the optional argument "slotsToPrint". For convenience, the function prh is defined as a synonym for printHierarchy.

  Examples:
    prh('person', {'mother', 'father'})   : prints the tree of specializations under 'person', including any cached values of the slots 'mother' or 'father'
    prh('slot')   : prints the tree of specializations of 'slot'
    prh('relationships', 'all')   : prints the specializations of 'relationships', including ALL known slot values of every entity printed

NOTE ON REFERRING TO INSTANCES OF SLOTS, SUBSLOTS, SUBSUBSLOTS, ETC.

Theo uses a highly uniform representation in which slots can have subslots, which can have subsubslots, nested to arbitrary depth. Furthermore, each instance of a slot or subslot is itself considered to be an entity.
For example, consider the entity describing the 'mets':

  mets:
    generalizations = {sports_team}
    team_members = {carlos_delgado}
      source = {{{OLv1-Iter:5-From:plays_for 2009/03/19-09:41:52 rtw-full, fromInverse} } }
      probability = {0.9}

We can, of course, get the value of the 'team_members' slot of 'mets' as follows:

  >> getValueKB('team_members','mets')

Similarly, we can get the value of the 'probability' subslot of the slot instance {'mets','team_members'} as follows:

  >> getValueKB('probability',{'mets','team_members'})

As the above example illustrates, you can refer to any slot or subslot instance as an entity. To refer to that entity, give the path list from the top-level entity to that (sub)slot. For example, the following three are all legitimate entity references in Theo: 'mets', {'mets','team_members'}, {'mets','team_members','probability'}. Any function that takes an entity as input will also accept this kind of reference to a slot instance. Hence, the following are legitimate calls:

  >> printEntity({'mets','team_members'})
  >> getEntitySlotsKB({'mets','team_members'})
  >> getValueKB('source',{'mets','team_members'})
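The path-based references described above can be mimicked with a small in-memory model. The Python sketch below is purely illustrative and is not Theo's implementation or API: the names KB, lookup, get_value, and get_entity_slots are hypothetical, Matlab cell-array paths such as {'mets','team_members'} become Python tuples, and it assumes (as a simplification) that each slot instance is a dictionary holding its value under the key 'val' alongside any subslots, mirroring the 'val' token that getEntitySlotsKB reports for slot instances. The 'source' value of the 'mets' example is flattened to a single string.

```python
from typing import Any, Dict, Optional, Tuple, Union

# Hypothetical toy model, not Theo's API. A reference is either a top-level
# entity name ('mets') or a path tuple such as ('mets', 'team_members').
Ref = Union[str, Tuple[str, ...]]

NO_THEO_VALUE = "NO_THEO_VALUE"  # sentinel string described for getValueKB

# Each slot instance is a dict holding its value under 'val' plus any subslots.
KB: Dict[str, Dict[str, Any]] = {
    "mets": {
        "generalizations": {"val": ["sports_team"]},
        "team_members": {
            "val": ["carlos_delgado"],
            "source": {"val": ["OLv1-Iter:5-From:plays_for 2009/03/19-09:41:52 rtw-full, fromInverse"]},
            "probability": {"val": [0.9]},
        },
    },
}


def as_path(ref: Ref) -> Tuple[str, ...]:
    """Normalize a reference (name, tuple, or list) to a path tuple."""
    return (ref,) if isinstance(ref, str) else tuple(ref)


def lookup(ref: Ref) -> Optional[Dict[str, Any]]:
    """Walk from the top-level entity down through each (sub)slot on the path."""
    path = as_path(ref)
    node: Any = KB.get(path[0])
    for slot in path[1:]:
        if not isinstance(node, dict):
            return None
        node = node.get(slot)
    return node if isinstance(node, dict) else None


def get_entity_slots(ref: Ref) -> list:
    """Rough analogue of getEntitySlotsKB: slot names of an entity or slot instance."""
    node = lookup(ref)
    return [] if node is None else list(node.keys())


def get_value(slot: str, ref: Ref) -> Any:
    """Rough analogue of getValueKB (value only): the 'val' of slot on ref."""
    node = lookup(as_path(ref) + (slot,))
    if node is None or "val" not in node:
        return NO_THEO_VALUE
    return node["val"]


if __name__ == "__main__":
    print(get_value("team_members", "mets"))                   # ['carlos_delgado']
    print(get_value("probability", ("mets", "team_members")))  # [0.9]
    print(get_entity_slots(("mets", "team_members")))          # ['val', 'source', 'probability']
    print(get_value("team_members", "yankees"))                # NO_THEO_VALUE
```

In this model a slot instance is just another node reached by a longer path, which is the uniformity the note above describes: the same lookup serves 'mets', ('mets','team_members'), and ('mets','team_members','probability').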