------- Protein Motif Knowledge Base Using Quixote -------

OVERVIEW

The analysis of the genes of living organisms is an essential technology
for deciphering biological phenomena. Today, with continuing advances in
biotechnology, the number of genes that have been identified but not yet
analyzed is increasing rapidly. To enable the automatic analysis of large
numbers of genes and the extraction of biological information from them,
the introduction of knowledge-based analysis is necessary, in addition to
the development of high-speed computers and fast analysis algorithms.
This is partly because the quality of analysis without biological
knowledge is not high enough, and partly because the time required for
the analysis can be reduced remarkably by introducing biological
knowledge.
  Nowadays, genetic information is stored in databases. However, these
databases have been constructed with little consideration for their
application to knowledge engineering, so it is difficult to use them to
create a system that enables high-level knowledge processing. To create
such a system, an effective representation of biological knowledge is
necessary.
  We used Quixote, a DOOD (Deductive Object-Oriented Database) language,
to represent biological knowledge. We thought that the object-oriented
features of Quixote would be useful for this representation.

  We constructed our biological knowledge base using Quixote. The
knowledge base contains knowledge about motifs and related knowledge.
  Motifs in Prosite and related knowledge are represented in a
hierarchical, multiple-inheritance scheme.
Examples of knowledge described in Quixote are shown below.


 Ex.1)
        dna_rna_associated >= {bind_to_Nucleic_acid,
                               synthesis_DNA_RNA, 
                               other_DNA_RNA_associated
                              };;

            synthesis_DNA_RNA >= {ribosomal_protein,
                                  dna_mismatch_repair,
                                  rna_related};;


 Ex.2)
    synthesis_DNA_RNA::{
       ribosomal_protein;;
       dna_mismatch_repair/[function=repair[to="DNA"]];;
       dna_mismatch_repair[name = "DNA mismatch repair proteins 
                       mutL / hexB / PMS1 signature"]/
            [pattern = "L-G-F-R-G-E-A-L",
             ps = psS00058,
             pd = pdoc00057,
             position = n_terminal
            ]
       };;

  Ex.1 describes the class structure, and Ex.2 shows motif entries.
In Ex.1, A >= B means that B is a sub-concept of A.
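
  As an illustration only, the class structure of Ex.1 can be modeled
outside Quixote. The following Python sketch is not part of this
distribution. It assumes that ">=" is read reflexively and transitively,
and the names SUB_CONCEPTS and subsumes are hypothetical.

```python
# Hypothetical Python model of the ">=" declarations in Ex.1.
# Each key is a concept; its value lists the direct sub-concepts.
SUB_CONCEPTS = {
    "dna_rna_associated": [
        "bind_to_Nucleic_acid",
        "synthesis_DNA_RNA",
        "other_DNA_RNA_associated",
    ],
    "synthesis_DNA_RNA": [
        "ribosomal_protein",
        "dna_mismatch_repair",
        "rna_related",
    ],
}


def subsumes(general, specific):
    """Return True if `specific` equals `general` or lies below it.

    Assumption: sub-concept links compose, so a sub-concept of a
    sub-concept is also treated as a sub-concept.
    """
    if general == specific:
        return True
    return any(
        subsumes(child, specific)
        for child in SUB_CONCEPTS.get(general, [])
    )


# dna_mismatch_repair is reached through synthesis_DNA_RNA.
print(subsumes("dna_rna_associated", "dna_mismatch_repair"))
```

  Under this reading, dna_mismatch_repair is a sub-concept of
dna_rna_associated even though Ex.1 does not state it directly.
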
  Ex.2 shows that there are three objects in the module
"synthesis_DNA_RNA". The attributes of the objects follow the
description in Prosite. The attribute "ps" indicates the accession
number associated with the entry in the "prosite.dat" file, and the
attribute "pd" indicates the pointer to the entry in the "prosite.doc"
file.

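  To show how the "pattern" attribute of Ex.2 might be used, the
following Python sketch converts a Prosite-style pattern into a regular
expression and searches a sequence with it. It is not part of this
distribution and does not use Quixote. The names ENTRY, prosite_to_regex
and find_motif are hypothetical, the multi-line name string of Ex.2 is
joined into one line, and the test sequence is invented for
illustration. Only the common pattern elements are handled: single
residues, x, [..], {..}, repetition counts, and the < and > anchors.

```python
import re

# Hypothetical Python copy of the third object in Ex.2.
ENTRY = {
    "module": "synthesis_DNA_RNA",
    "object": "dna_mismatch_repair",
    "name": "DNA mismatch repair proteins mutL / hexB / PMS1 signature",
    "pattern": "L-G-F-R-G-E-A-L",
    "ps": "psS00058",
    "pd": "pdoc00057",
    "position": "n_terminal",
}


def prosite_to_regex(pattern):
    """Translate a Prosite-style pattern into a Python regex string."""
    regex = []
    for element in pattern.split("-"):
        prefix = suffix = repeat = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        if element.endswith(")"):        # repetition: x(3) or x(2,4)
            element, _, count = element[:-1].partition("(")
            repeat = "{" + count + "}"
        if element == "x":               # any residue
            core = "."
        elif element.startswith("{"):    # any residue except those listed
            core = "[^" + element[1:-1] + "]"
        else:                            # single residue or [ABC] set
            core = element
        regex.append(prefix + core + repeat + suffix)
    return "".join(regex)


def find_motif(entry, sequence):
    """Return the 0-based start of the first match, or None."""
    match = re.search(prosite_to_regex(entry["pattern"]), sequence)
    return match.start() if match else None


# Invented toy sequence, used only to exercise the functions.
print(find_motif(ENTRY, "MKKLGFRGEALAS"))
```

  The "position" attribute is not used in this sketch; a fuller version
could restrict the search to the indicated region of the sequence.
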
  
  For the syntax of Quixote, please refer to the Quixote documentation.


SYSTEM CONFIGURATION

Both a UNIX workstation and a PIM/PSI are required.
Quixote Ver.III is required to use this program.



FILES 

Quixote program:
  prosite.qxt
  qxt_init.kl1

For reference:
  prosuser.txt
  prosite.dat
  prosite.doc
  prosite.lis


REFERENCE

  Bairoch, A. Prosite: A Dictionary of Protein Sites and Patterns.
  User Manual, Release 9.0, June 1992.
