An Interface to Access THEO Knowledge Bases

Tom Mitchell
September, 2009


OVERVIEW

This file describes functions to access the contents of a Theo Knowledge Base over the web, or on
your local workstation.  If you are doing many queries, it is probably best to place the knowledge
base on your local workstation for efficiency.  At present, these functions provide read-only
access: they retrieve data but cannot alter it.  They are implemented in Matlab, and (soon) Java.

Note on terminology: Theo is a frame-based representation system.  We use the term "entity" to refer
to a frame, or object being described (e.g., "Pittsburgh"), the term "slot" to refer to a property
of the entity (e.g., "mayorOfCity"), and the term "value" to refer to the property's value (e.g.,
"Luke Ravenstahl").  You can think of the knowledge base as a set of relation triples of the form
slot(entity,value), such as mayorOfCity(pittsburgh,luke_ravenstahl).

The key functions are:
 initializeTheo()  loads Theo into Matlab.
 useKB(kbDirectory, <depth>, <caseSensitiveFilenames>) declares the web or local location of your KB
 isEntityKB(entity) returns 1 if entity exists in the KB
 getValueKB(slot,entity) retrieves the value of slot of entity, and probability and source when available 
 getEntitySlotsKB(entity) returns the list of slots associated with entity
 getEntitiesWithSlotValue(slot, <val>) returns all KB entities whose slot has value val
 printEntity(entity) prints the description of entity to the terminal screen
 printHierarchy(entity) prints the hierarchy of specializations of entity to the terminal screen
 


INSTALLATION

To install and run the Matlab version of Theo, 
1. download and untar it into a directory we'll call $THEO_DIR$ (e.g., '/usr/sue/Theo/')
2. edit the third non-comment line of $THEO_DIR$/code/coreTheo/initializeTheo.m, to declare 
   your top level Theo directory (i.e., $THEO_DIR$)
3. start Matlab
4. run addpath '$THEO_DIR$/code/coreTheo/' so Matlab can find the function initializeTheo.m
5. type to the Matlab prompt:
  >> initializeTheo
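
Steps 4 and 5 can also be run as a short script.  The following is a minimal sketch, assuming Theo
was untarred into the example directory '/usr/sue/Theo/' from step 1 and that step 2 has already
been done by hand.  The variable names theoDir and coreDir are illustrative and not part of Theo.

```matlab
theoDir = '/usr/sue/Theo/';                       % example location from step 1; use your own
coreDir = fullfile(theoDir, 'code', 'coreTheo');  % directory that holds initializeTheo.m

if exist(coreDir, 'dir') == 7
    addpath(coreDir);      % step 4: put initializeTheo on the Matlab path
    initializeTheo;        % step 5: load Theo
else
    fprintf('Theo code directory not found: %s\n', coreDir);
end
```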


EXAMPLE RUN

To run Theo, start Matlab, be sure the directory with 'initializeTheo' is on your Matlab path, then
type initializeTheo. Here is a simple sequence of commands that should work for you out-of-the-box.
 
 >> initializeTheo
 >> useKB('http://rtw.ml.cmu.edu/sslnlp09/',0,1)        % declares the KB you'll be inspecting
 >> printEntity('pirates')
 >> printEntity('pirates',3)
 >> getValueKB('plays_against','pirates')
 >> [val prob src] = getValueKB('plays_against','pirates')
 >> isEntityKB('mets')
 >> getEntitySlotsKB('mets')
 >> getEntitiesWithSlotValue('plays_sport_team')          % collect entities with a known 'plays_sport_team' value
 >> getEntitiesWithSlotValue('plays_sport_team','baseball')       % entities for which 'plays_sport_team'='baseball' 
 >> printHierarchy('sports_team');                      % warning, there are a few hundred sports_teams          


FUNCTION DOCUMENTATION

Below are more detailed descriptions of the above functions.  In Matlab, you can get documentation on any
of these by typing "help <function>" to the Matlab prompt.

success = useKB(kbDirectory, <depth>, <caseSensitiveFilenames>)

  This function tells THEO to use the knowledge base (KB) stored in the local directory or at the web
  URL given by "kbDirectory".  The function also has two optional arguments, but in most cases you
  need not specify them and they will default to the correct values.  The function returns 1 if
  successful, else 0.

  The optional arguments are provided for backward compatibility with old KBs.  The optional
  argument "depth" is an integer, and specifies that entities are stored in subdirectories nested
  "depth" levels deep.  If "depth" is not provided, it defaults to THEO.kbSubdirDepth.  The optional
  argument "caseSensitiveFilenames" determines whether filenames in the KB use the same letter case
  as entity names (if caseSensitiveFilenames=1), or whether filenames are all lower case regardless
  of entity case (if caseSensitiveFilenames=0).  The default value for this optional argument is
  THEO.caseSensitiveFilenames.  Note the first example below does not provide values for these
  optional arguments, whereas the second does.

  Example: useKB('/Users/tommitchell/Documents/MATLAB/Theo/RTW_KB_2009_03_19_ORS/')
  Example: useKB('http://rtw.ml.cmu.edu/sslnlp09/', 0, 1)
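
  Because useKB returns 1 or 0, a script can check the result before issuing any queries.  The
  following is a minimal sketch, assuming initializeTheo has already been run.  The names kbLocation
  and ok are illustrative, and the exist() guard (which assumes useKB is defined in a file on the
  Matlab path) is there only so the snippet runs harmlessly when Theo is not loaded.

```matlab
kbLocation = 'http://rtw.ml.cmu.edu/sslnlp09/';   % web KB from the example above

if exist('useKB', 'file') == 2                    % assumption: useKB lives in a file on the path
    ok = useKB(kbLocation, 0, 1);                 % same arguments as the documented example
    if ok == 1
        fprintf('Using KB at %s\n', kbLocation);
    else
        fprintf('Could not open KB at %s\n', kbLocation);
    end
else
    ok = 0;
    fprintf('Theo is not on the Matlab path; run addpath and initializeTheo first.\n');
end
```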


[value, probability, source] = getValueKB(slot, entity) 

  Returns the value of slot of entity, using the knowledge base specified by the most recent useKB
  command.  Also returns its probability (if one is associated with the value), and the source
  justification for the value.  If the slot contains a list of values, then the returned value will
  be a list, and the probability and source will be lists containing the same number of items as
  values -- one for each value.  If the probability or source is unavailable, then -1 is returned
  in its place.  If there is no value for slot of entity, or if entity does not exist, then the
  function returns the string 'NO_THEO_VALUE'.

  Example: [val, prob, src] = getValueKB('team_members','mets')
  Example: getValueKB('probability',{'mets','team_members'})
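
  Callers will usually want to test for the 'NO_THEO_VALUE' string and the -1 marker before using a
  result.  The following is a minimal sketch of such checks.  The helper names isNoValue and
  isUnavailable are hypothetical (not part of Theo), the stand-in values replace a real getValueKB
  call, and the -1 check only covers the scalar case, since the exact form of the marker inside a
  returned list is not specified here.

```matlab
% Hypothetical helpers, not part of Theo.
% True when getValueKB reported that no value exists.
isNoValue = @(v) ischar(v) && strcmp(v, 'NO_THEO_VALUE');
% True when a probability or source is the documented "unavailable" marker -1.
isUnavailable = @(x) isnumeric(x) && isscalar(x) && x == -1;

% Stand-in results; with Theo loaded these would come from
%   [val, prob, src] = getValueKB('team_members','mets');
val  = 'NO_THEO_VALUE';
prob = -1;

if isNoValue(val)
    disp('no value stored for this slot');
elseif isUnavailable(prob)
    disp('value found, but no probability recorded');
else
    disp('value and probability both available');
end
```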


rslt = isEntityKB(entity)	
  Returns 1 if entity is defined in the KB, else 0.


rslt = getEntitySlotsKB(entity)
  Returns the list of slots associated with entity, as a Matlab cell array. Also includes
  the string token 'val' if the entity happens to be a slot instance with a value.  
  
  Example:
      getEntitySlotsKB('giants')
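
  Since the returned cell array may contain the token 'val', code that iterates over slots may want
  to separate it out first.  The following is a minimal sketch using a stand-in list; the contents
  and order of what Theo actually returns are an assumption, and the variable names are illustrative.

```matlab
% Stand-in for: slots = getEntitySlotsKB({'mets','team_members'})
slots = {'source', 'probability', 'val'};

hasVal   = any(strcmp(slots, 'val'));     % true if this entity is a slot instance with a value
subslots = slots(~strcmp(slots, 'val'));  % the remaining names are ordinary slots
```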
 

rslt = getEntitiesWithSlotValue(slot, <val>)

  Returns the list of all KB entities that contain "val" as a value for "slot".  If only the first
  input argument is provided, then it returns all KB entities that contain any value for "slot".  

  Warning: this requires time linear in the number of entities in the domain of slot.

  Examples:
   get all entities for which slot 'friendly' has value 'yes'
   getEntitiesWithSlotValue('friendly','yes');   

   get all entities for which slot 'friendly' has any known value
   getEntitiesWithSlotValue('friendly');   
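
  Given the linear-time cost, it can help to call the function once and reuse the result for
  repeated membership tests.  The following is a minimal sketch, assuming the result is a cell array
  of entity-name strings.  The stand-in list is illustrative rather than actual KB output, and
  isBaseball is a hypothetical helper name.

```matlab
% Stand-in for a single call made once and kept:
%   teams = getEntitiesWithSlotValue('plays_sport_team','baseball');
teams = {'mets', 'pirates'};

% Later lookups test the cached list instead of querying the KB again.
isBaseball = @(name) any(strcmp(teams, name));

if isBaseball('mets')
    disp('mets is in the cached result');
end
```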


printEntity(entity, <printDepth>, <slotsToPrint>)

  Prints the entity on the user's terminal screen.  "printDepth" is an optional argument which
  defaults to 1.  It determines how deeply to show the slots, subslots, subsubslots, etc.
  "slotsToPrint" is an optional argument that defaults to 'all'.  If it is a list of strings, these
  are taken to be the names of the slots to be printed.  Its default value of 'all' causes it to
  print every slot except 'specializations'.
  For convenience, the function pre is defined as a synonym for printEntity.
 
  Examples:
   pre('mets') : prints the entity mets, including all known slot values
   pre('mets', 2, {'plays_against', 'team_plays_in_city'}) : prints to depth 2, only two slots


printHierarchy(rootentity, <slotsToPrint>)

 Prints the hierarchy of specializations under rootentity, as well as any known values for slots
 included in the optional argument "slotsToPrint". 
 For convenience, the function prh is defined as a synonym for printHierarchy.

 Examples:
  prh('person', {'mother','father'}) : prints tree of specializations under "person",
                                       including any cached slot values of "mother" or "father"
  prh('slot') : prints the tree of specializations of 'slot'
  prh('relationships', 'all') : prints specializations of "relationships",
                                including ALL known slot values of every entity printed.



NOTE ON REFERRING TO INSTANCES OF SLOTS, SUBSLOTS, SUBSUBSLOTS, ETC.

Theo uses a highly uniform representation in which slots can have subslots, which can have
subsubslots, nested to arbitrary depth.  Furthermore, each instance of a slot or subslot is itself
considered to be an entity. For example, consider the entity describing the 'mets':

mets:
  generalizations = {sports_team} 
  team_members = {carlos_delgado} 
    source = {{{OLv1-Iter:5-From:plays_for 2009/03/19-09:41:52 rtw-full, fromInverse} } } 
    probability = {0.9} 

We can, of course, get the value of the 'team_members' slot of 'mets' as follows:

  >> getValueKB('team_members','mets')

Similarly, we can get the value of the "probability" subslot of the slot instance
{'mets','team_members'} as follows:

    >> getValueKB('probability',{'mets','team_members'})

As the above example illustrates, you can refer to any slot or subslot instance as an entity.  The
way to refer to that entity is to give the path list from the top level entity to that (sub)slot.
For example, the following three are all legitimate entity references in Theo: 'mets',
{'mets','team_members'}, {'mets','team_members','probability'}.  Any function that takes an entity
as input will also accept this kind of reference to a slot instance.  Hence, the following are
legitimate calls:

   >> printEntity({'mets','team_members'})
   >> getEntitySlotsKB({'mets','team_members'})
   >> getValueKB('source',{'mets','team_members'}) 
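
Read the other way, a path reference can be split into the (slot, entity) pair that getValueKB
expects: the last element is the slot, and everything before it is the entity.  The following is a
minimal sketch of that split, based on the examples above.  The variable names are illustrative, and
unwrapping a one-element path to a plain string is a precaution, since this note does not say
whether Theo accepts {'mets'} in place of 'mets'.

```matlab
ref = {'mets', 'team_members', 'probability'};   % path reference to a subslot instance

slot   = ref{end};         % last element names the slot
parent = ref(1:end-1);     % the rest is the entity that owns that slot
if numel(parent) == 1
    parent = parent{1};    % top-level entities are written as plain strings
end

% With Theo loaded and a KB selected, the value could then be fetched with:
%   value = getValueKB(slot, parent);
```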
  
