RDFDB

RDFDB: An RDF Database

R.V.Guha

RDFDB is a database for storing directed labelled graphs, a la RDF. All of the source code is available under the Mozilla Public License.

I welcome feedback. Please send it to me.

Goals

The world needs a very scalable and very fast triple store. The goal of this project is to build a database that can:

Support a graph-oriented API via a textual query language, a la SQL.
Load and reload an RDF file from a URL into the database (this requires the database to speak HTTP, have a parser, ...).
Scale to millions of nodes and triples.
Support RDF Schemas.
Support some basic forms of inferencing.
Provide both C and Perl access to the database.
Support multiple concurrent users.

The Perl philosophy applies: simple things should be simple and complex things should be possible.

The data model

A database is a directed labelled graph. The graph can be constructed either by reading in one or more files (in XML-RDF or some other format such as RSS) or by making a sequence of insertions and deletions.
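As a rough illustration of this data model (not RDFDB's actual implementation), a graph can be sketched as a set of (arc, source, target) triples that changes through insertions and deletions. The class name, the method names, and the node Arts below are hypothetical and exist only for this example.

```python
class Graph:
    """A directed labelled graph held as a set of (arc, source, target) triples."""

    def __init__(self):
        self.triples = set()

    def insert(self, arc, source, target):
        # Inserting the same triple twice leaves the graph unchanged.
        self.triples.add((arc, source, target))

    def delete(self, arc, source, target):
        # Deleting a triple that is not present is a no-op in this sketch.
        self.triples.discard((arc, source, target))

    def targets(self, arc, source):
        """Return, sorted, every node reached from `source` along an edge labelled `arc`."""
        return sorted(t for (a, s, t) in self.triples if a == arc and s == source)


TOP = "http://dmoz.org/rdf/structure.rdf#Top"

g = Graph()
g.insert("narrow", TOP, "FlyingPizzas")
g.insert("narrow", TOP, "Arts")          # "Arts" is a made-up node for illustration
g.delete("narrow", TOP, "FlyingPizzas")
print(g.targets("narrow", TOP))          # prints ['Arts']
```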

Queries are addressed to a database. Version 1 will only support simple conjunctive queries, written using a datalog query language.

Query Language

The syntax of the query language is as similar to SQL as possible, purely for pedagogical reasons.

We keep having to make a distinction between the string "foo" and the resource whose identifier is "foo". The resource is written as foo, while the string is written as "foo". This leaves digit sequences (such as 23928) ambiguous: they could be read as integers or as URIs. To keep things simple, we interpret them as integers. Object identifiers/URIs can use namespaces.
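The rule above can be sketched as a small classifier. This is only an illustration of the convention as described, not RDFDB's parser; the function name and the returned tags are assumptions, and variables and namespace prefixes are not handled here.

```python
def classify_term(token):
    """Classify one query term as a string, an integer, or a resource."""
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        # Quoted text is a string literal; strip the surrounding quotes.
        return ("string", token[1:-1])
    if token.isascii() and token.isdigit():
        # Digit sequences are ambiguous; the convention reads them as integers.
        return ("integer", int(token))
    # Anything else is taken to be a resource identifier (URI).
    return ("resource", token)


print(classify_term('"foo"'))   # prints ('string', 'foo')
print(classify_term("foo"))     # prints ('resource', 'foo')
print(classify_term("23928"))   # prints ('integer', 23928)
```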

Create database [database_name]
e.g., create database foo
Drop database [database_name]
e.g., drop database foo
Load [file_type] [url] into database
This loads the contents of url (assumed to be of type file_type, typically RDF in XML or RSS) into the database. If the contents of that url have already been loaded into the database, they are first unloaded and then reloaded.
e.g., load RDF_XML file http://dmoz.org/rdf/structure.rdf into foo
Unload file [url] from database [database]
e.g., unload file http://dmoz.org/rdf/structure.rdf from foo
Enter namespace xmlns[:prefix] [namespace]
Start associating the namespace prefix [prefix] with [namespace] in subsequent queries.
e.g., enter namespace xmlns:r http://www.w3.org/TR/RDF
enter namespace xmlns http://directory.mozilla.org/rdf
Leave namespace xmlns[:prefix]
insert into [database_name] (arc₁ source₁ target₁), (arc₂ source₂ target₂)...
e.g., insert into foo (narrow http://dmoz.org/rdf/structure.rdf#Top FlyingPizzas)
delete from [database_name] (arc₁ source₁ target₁), (arc₂ source₂ target₂)...
e.g., delete from foo (narrow http://dmoz.org/rdf/structure.rdf#Top FlyingPizzas)
select [var1], [var2], ... from [database_name] where (arc₁ source₁ target₁), (arc₂ source₂ target₂)...
Any of the source_i or target_i could be the same. Any of them (including the arc) can also be a variable. Variables are syntactically identified as symbols that start with the character '?'.
e.g., select ?x ?y from foo where (title ?x ?y) (createdBy ?x RichSkrenta) (type ?x Topic)
This lists the IDs and titles of all objects of type Topic created by RichSkrenta.
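To make the meaning of a conjunctive select concrete, here is a minimal sketch of how such a query could be evaluated over an in-memory set of triples. It is a naive nested-loop matcher written for illustration and says nothing about how RDFDB actually evaluates queries. The function names and the sample data (t1, t2, the titles, SomeoneElse) are hypothetical.

```python
def is_var(term):
    """Variables are symbols that start with '?'."""
    return term.startswith("?")


def match(pattern, triple, bindings):
    """Extend `bindings` so that `pattern` equals `triple`, or return None."""
    extended = dict(bindings)
    for p, value in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check it against its existing binding.
            if extended.setdefault(p, value) != value:
                return None
        elif p != value:
            return None
    return extended


def select(variables, triples, patterns):
    """Return the sorted, distinct rows of `variables` satisfying every pattern."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return sorted({tuple(s[v] for v in variables) for s in solutions})


# Made-up sample data: each triple is (arc, source, target).
triples = {
    ("title", "t1", "Flying Pizzas"),
    ("createdBy", "t1", "RichSkrenta"),
    ("type", "t1", "Topic"),
    ("title", "t2", "Other"),
    ("createdBy", "t2", "SomeoneElse"),
    ("type", "t2", "Topic"),
}

rows = select(
    ["?x", "?y"],
    triples,
    [("title", "?x", "?y"), ("createdBy", "?x", "RichSkrenta"), ("type", "?x", "Topic")],
)
print(rows)  # prints [('t1', 'Flying Pizzas')]
```

Because a variable that appears in several patterns must keep a single binding, the shared ?x is what joins the three patterns together.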

Interaction

The process of querying the database should in principle be as simple as getting data using HTTP. Database connections have traditionally been rather heavyweight. Database connections with RDFDB are as lightweight as HTTP connections. You simply telnet into a specified port, issue the query, and get back the results.
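The same interaction can be scripted instead of typed into telnet. The sketch below is hedged throughout: it assumes that a query is sent as a single newline-terminated line and that the reply has ended once the server closes the connection or goes quiet for the timeout period. Neither assumption is confirmed by this page, and the function name, host, and port are placeholders.

```python
import socket


def run_query(host, port, query, timeout=5.0):
    """Send one query line over a plain TCP connection and return the raw reply text."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # Assumption: one query per line, terminated by a newline.
        conn.sendall(query.encode("utf-8") + b"\n")
        while True:
            try:
                data = conn.recv(4096)
            except socket.timeout:
                # Assumption: silence for `timeout` seconds means the reply is done.
                break
            if not data:
                # The server closed the connection.
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example call (host and port are placeholders, not RDFDB defaults):
# print(run_query("localhost", 7001, "select ?x from foo where (type ?x Topic)"))
```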

Here is a simple sample session.

Getting it

Here is the binary for Linux. Here is a tarball of the source. The current version uses Berkeley DB from Sleepycat Software, so if you want to build it yourself, you will need to get and build Berkeley DB.

This is the Perl code for a simple WWW front end for browsing a database.

Send bugs to guha@guha.com.