Perl and Databases

an introduction to DBI

Chris Winters
chris@cwinters.com


Table of Contents


NAME

Perl and Databases -- an introduction to the DBI


SYNOPSIS

This presentation is a brief introduction to the Database Interface (DBI), perl's method for accessing databases. In this presentation you will learn:

This is not a syntax guide; see the excellent documentation that comes with the DBI for that. This is also probably old news for most experienced DBAs, although learning how to use perl to access databases may be new.


INTRODUCTION

The DBI has been in development for about five years. Earlier incarnations were known by their database-specific names: oraperl, sybperl, and so forth. These are primarily Perl-4 modules: to use oraperl, you would have to statically link the Oracle libraries and oraperl routines with the perl binary.

The primary developer behind the DBI is Tim Bunce. I won't give his e-mail address out here since he's constantly deluged with requests of varying legitimacy. Installing the DBI will give you contact information for not only him but also the rest of the DBI world via the dbi-users and dbi-dev mailing lists. (Contact information for these lists and other DBI resources are given in the LINKS section at the end of this document.)


What is it?

The DBI is an abstraction layer allowing you to access all types of datastores using a consistent and object-oriented interface. The DBI is designed to be as thin a layer as possible to ensure database operations aren't slowed by the accessor method. The DBI documentation has benchmarks for Oracle calls; the speed of accessing your database may vary.

The DBI does directly not do much by itself. It requires a Database Driver (DBD) to access a particular database. These drivers are generally labeled: DBD::<driver_name>, such as DBD::Sybase, DBD::Oracle, DBD::Informix, and so on. These drivers are generally compiled with the database client libraries where necessary and accessed as a sharable object from perl. (And that's the extent of my knowledge there.)

Folks from the Windows world may find a comforting parallel with ODBC: the DBI is analagous to the ODBC Administrator (or driver manager); each DBD corresponds to an ODBC driver. Confusing the issue is the fact that there exists a DBD::ODBC driver which allows you to use ODBC drivers in place of the native database client drivers.

For a more thorough description of the DBI architecture (with graphics!), check out the introductory article by Tim Bunce and Alligator Descartes in Perl Journal #5 (link found at the end of this document).


What does it look like?

A quick example with lots of concepts:

Example 1: Quick Overall Example

   1: my $dbh = DBI->connect( 'DBI:Sybase:mydb', 'cwinters', 'mypass',
   2:                    { RaiseError => 1 } )
   3:           || die "Cannot connect to database: $DBI::errstr";
   4: my $sql = qq(
   5:   SELECT fname, lname, login_date
   6:     FROM users
   7:    WHERE login_date > '1999-06-25'
   8: );
   9: my ( $sth );
  10: eval {
  11:   $sth = $dbh->prepare( $sql );
  12:   $sth->execute;
  13: };
  14: die "Cannot prepare/execute $sql\nError: $@" if ( $@ );
  15: 
  16: my ( $fname, $lname, $login );
  17: $sth->bind_columns( undef, \$fname, \$lname, \$login );
  18: print "Database Login Information\n",
  19:       "==========================\n";
  20: while ( $sth->fetch ) {
  21:   print "Last login by $lname, $fname: $login\n";
  22: }
  23: $sth->finish;
  24: $dbh->disconnect;

Let's go through this really quick:

Now that we've had a brief overview, let's go through everything we need to know to make a database application.


INSTALLING

We'll keep this brief. The DBI, like most perl modules, is fairly simple to install:

 tar -zxvf DBI-1.06.tar.gz
 cd DBI-1.06/
 perl Makefile.PL
 make
 make test
 make install

You can also use the CPAN shell, or if you're using the ActiveState port of Win32 perl, the PPM installation utility.

Installing the individual DBD modules are more difficult. Generally, you need to:

  1. Read the README! Like all perl modules, the authors spend a good deal of time writing the module and documentation. They've already gone through a number of the gotchas themselves. Save yourself some time and read the available documentation.
  2. Know where your libraries are. Did you install your relational database? If not, track down your local DBA and find out where the client libraries are located. Some modules attempt to make a best guess, some modules require that you set environment variables or set a parameter when building the module.
  3. Check the archives. The dbi-users mailing list has one or two sites where the mailing list is archived. Unless you're using a truly unusual combination of database and operating system, someone else has likely had your problem.

Note that the module installers I'm aware of, CPAN and PPM, have both worked well for me when installing new modules, even on Win32 systems.

If you've got everything lined up, building the module is much the same set of steps as building the DBI. Some modules require that you have a database defined for testing purposes so that it can manipulate tables and records without stomping on other data. Most packages accept 'test' as the database name by default, although you can normally also define your own.


CONNECTING AND THE DATABASE HANDLE

Connecting to the database is the first step you'll do. All actions going to the database are done through a database handle.

Create a database handle like this:

 my $dbh = DBI->connect( 'DBI:mysql:test',
                         'username', 'password',
                         { RaiseError => 1 } );
where:

  DBI:mysql:test - Data Source Name (DSN)
  DBI            - DBI (necessary first part of the DSN)
  mysql          - Name of DBD
  test           - Name of database we are connecting to.
  username       - User we are connecting to this database as
  password       - Password for the user

You should pass a username and password even for systems, such as DBD::XBase or DBD::CSV, that do not use them at all. Just pass an empty string for both fields. The most recent version of DBI (1.09) tripped a few people up since the behavior changed slightly. (See the Changes file that comes with DBI 1.09.)

The final argument to the connect call is a hashref of options. An example is:

 { RaiseError => 1, AutoCommit => 0 }

This argument is used as a shorthand for setting parameters of the database handle. You can if you prefer call:

 $dbh->{RaiseError} = 1;
 $dbh->{AutoCommit} = 1;

As the DBI documentation states, connect() returns undef on failure. See ERROR HANDLING for more information on how to deal with this.

The DBI does not really care what statements you send through this database handle. You can think of the DBI as a pass-through module: it defines an interface for sending requests to and receiving data from a database, but it does not really care what the statements or data are.

So if you're having problems with your SQL, the DBI is not the place to look. As long as the data is getting to the database and you're getting messages back from it, the DBI is doing its job.

Now that you have a database handle, what can you do with it? You can use it directly, as we discuss below. But, like all objects in Perl, the $dbh you've now defined is very versatile. It's really just a reference. my @userlist = get_all_users( $dbh, $table_name ); ... sub get_all_users { my $dbh = shift; my $table_name = shift; ... }

You can also embed it in other objects:

Example 2: Database handle passing

   1: my $user = MyApp::User->new( dbh => $dbh, 
   2:                         username => $username );
   3: my $full_name = $user->full_name();
   4: 
   5: package MyApp::User;
   6: 
   7: sub new {
   8:  my $pkg = shift;
   9:  my $class = ref( $pkg ) || $pkg;
  10:  my %params = @_;
  11:  my $self = {};
  12:  $self->{dbh} = $params{dbh};
  13:  $self->{username} = $params{username};
  14:  return bless( $self, $class );
  15: }
  16: 
  17: sub full_name {
  18:  my $self = shift;
  19:  if ( ! $self->{full_name} ) {
  20:     my $sql = qq/
  21:       SELECT FirstName, LastName
  22:         FROM Users
  23:        WHERE Username = $self->{usename}
  24:     /;
  25:     eval {
  26:       $sth = $self->{dbh}->prepare( $sql );       
  27:       $sth->execute();
  28:     };
  29:     die "Error: $@ with SQL\n$sql" if ( $@ );
  30:     my $row = $sth->fetchrow_arrayref;
  31:     $self->{full_name} = join ' ', @{ $row };
  32:  }
  33:  return $self->{full_name};
  34: }

Note that you do not need to dereference your handle when you pass it to a subroutine or use it within an object. A relatively common sight on the dbi-users mailing list is this:

Example 3: Unnecessary handle dereferencing

   1: my $fetched_rows = fetch_rows( $dbh, 'SELECT * FROM TABLE' );
   2: 
   3: sub fetch_rows {
   4:  my $sql = shift;
   5:  my $dbh = shift;
   6:  my ( $sth );
   7:  eval {
   8:    $sth = $$dbh->prepare( $sql );
   9:    $sth->execute();
  10:  };

The deferencing on line 8 is entirely unnecessary, and it's wrong. Running it will get you an error like:

  Not a SCALAR reference at myprog.pl line 6.


Multiple database handles

Nothing in the DBI stops you from opening as many database handles as you like -- you may be limited by your database's licensing restrictions, but that's a separate issue. Multiple handles are typically used when moving data from one database to another.

Since the DBI has so many drivers available, it's fairly trivial to write a program that will move data from one database to another, particularly when one database is a flatfile (XBase, CSV) and the other is a relational system. In this case, you'll probably open up one handle to the 'from' database and one handle to the 'to' database.

To move data within a database, from one table to another, you can typically use one database handle but two statement handles. Some databases cannot support this: DBD::Sybase silently opens up another database handle for you when you try to use multiple statement handles concurrently from a single database handle.


NON-SELECT STATEMENTS

Non-select data manipulation and definition statements are fairly simple to execute within the DBI. All are executed with the DBI do method. As the documentation states, do returns the number of rows affected by the statement, -1 if not known, and undef on an error.

The ``number of rows affected by the statement'' is not normally a controversial number: an DELETE statement will return the number of rows deleted; an UPDATE statement will return the number of rows matched by the statement and updated. However, if you rely on this figure for consistency checking you may want to ensure that it returns what you think it should return. A small brouhaha recently erupted on the dbi-users mailing list when it was determined that MySQL returns not the number of records matched by an UPDATE statement but rather the number of records actually updated after the optimizer removes the records from consideration that already match the update values.

Example 4: Creating a MySQL table

   1: print "Creating table...";
   2: my $sql = qq/
   3:   CREATE TABLE my_widgets (
   4:      my_id int not null auto_increment,
   5:      widget_ref varchar(20) null,
   6:      widget_name varchar(100) null,
   7:      widget_factory varchar(15),
   8:      primary key ( my_id )
   9:   )
  10: /;
  11: my ( $rv );
  12: eval {
  13:   $rv = $dbh->do( $sql );
  14: };
  15: if ( ! $rv or $@ ) {
  16:   my $err_msg = $@ || $dbh->errstr;
  17:   print "error.\nMessage: $err_msg\nSQL: $sql";
  18: }
  19: else {
  20:   print "success.\n";
  21: }

Even though we use a MySQLism at line 4 (auto_increment), the DBI doesn't care. It just passes the statement along to the driver. (More about auto incrementing fields later.)


Quoting

One of the problems every user first has when writing even simple database applications is quoting. A basic SQL statement looks like this:

 INSERT INTO my_widgets
 ( widget_ref, widget_name, widget_factory )
 VALUES
 ( '09Fx67812', 'Thumbscrew thingy', '7691823-FC' )

(Some databases allow you to use double quotes instead of single quotes to demarcate string values.)

But what happens when one of your string values already has a quote in it?

 INSERT INTO my_widgets
 ( widget_ref, widget_name, widget_factory )
 VALUES
 ( '09Fx67812', 'Toilet for Army's hospital', '7691823-FC' ) 

Uh oh. You're likely to get an error like:

 DBD::mysql::db do failed: parse error near 's hospital', '7691823-FC' ) 
 ' at line 5 at test_stmt.pl line 37.

or:

 DBD::Sybase::db do failed: Server message number=102 severity=15  
 state=1 line=6 text=Incorrect syntax near 's'.
 Server message number=105 severity=15 state=1 line=6 text=Unclosed  
 quote before the character string ' ) 
 '.

(The syntax of these error messages depends entirely on the database; error messages are discussed further in ERROR HANDLING below.)

So, how do we get around this? The DBI has several ways, all preferable to the homegrown solution. A typical homegrown solution looks like this:

 $value = q(Toilet for Army's hospital); 
 ( $quoted_value = $value ) =~ s/'/''/g;
 $quoted_value =~ s/^(.*)$/'$1'/;
 $sql = qq/
   INSERT INTO my_widgets 
   ( widget_name ) VALUES ( $quoted_value )
 /;

This is, of course, horribly messy. Most people advance this solution to create a subroutine that does all the quoting for them. But it's still database-dependent: MiniSQL uses a slash (\) as an escape character while many other databases (MySQL and Sybase included) use a single quote ('). When you change your database, you have to change your quoting subroutine.

The DBI has a much better way. There are two primary methods to quote values. First, you can quote the values on an individual basis:

 $quoted_value = $dbh->quote( $value );

Easy and simple. The string $quoted value is now ready to be inserted into the database your database handle is connected to:

 $value = q(Toilet for Army's hospital);
 $quoted_value = $dbh->quote( $value );
 $sql = qq/
   INSERT INTO my_widgets 
   ( widget_name ) VALUES ( $quoted_value )
 /;
 $rv = $dbh->do( $sql );

As a corrollary, you can skip the temporary variable and do something like:

 $value = q(Toilet for Army's hospital);
 $sql = qq/
   INSERT INTO my_widgets 
   ( widget_name ) VALUES ( @{[ $dbh->quote( $value ) ]} )
 /;
 $rv = $dbh->do( $sql );

You can also use a separate method known as binding values. Binding values is very elegant:

 my $value = q(Toilet for Army's hospital);
 $sql = qq/
   INSERT INTO my_widgets
   ( widget_name ) VALUES ( ? )
 /; 
 $rv = $dbh->do( $sql, undef, $value );

That question mark is called a placeholder and puts up a flag for the DBI when the statement is actually executed. You can use a placeholder for any number of values but not complex expressions. When the statement is executed, the DBD reads the values from an ordered list you pass it and plugs the values into the statement.

When the driver plugs these values in, it also quotes them free of charge. So many DBI applications will never have a call to the quote() method of the database handle, preferring instead to use bound values.

Another benefit of placeholders is performance. On more robust database systems, preparing a statement once and executing it many times with values afterward is several times faster than preparing and executing (or using the do() method) each individual statement. For instance, assuming a colon-separated input file:

Example 5: Preparing once, executing many

   1: my $sql = qq/
   2:   INESRT INTO SysUsers
   3:   ( username, first_name, last_name, email, phone )
   4:   VALUES 
   5:   ( ?, ?, ?, ?, ? )
   6: /;
   7: my ( $sth );
   8: eval {
   9:   $sth = $dbh->prepare( $sql );
  10: };
  11: die "Cannot prepare statement! Error: $@" if ( $@ );
  12: open( USERS, $user_data ) || die "Cannot open user data! Error: $!";
  13: while( <USERS> ) {
  14:   chomp;
  15:   my ( $user, $fn, $ln, $email, $phone ) = split /:/;
  16:   eval {
  17:     $sth->execute( $user, $fn, $ln, $email, $phone );
  18:   };
  19:   die "Cannot execute with $user|$fn|$ln|$email|$phone ",
  20:       "with error $@"                          if ( $@ );
  21: }

The DBI documentation has more information about placeholders and using them to bind values. We'll also deal with this further below with SELECT statements.


Auto-increment fields

Auto-increment fields are present in many databases. Such a field ensures that each row gets a unique number by keeping an internal counter and incrementing it with every new row inserted. Other databases (such as Oracle and MiniSQL) accomplish this sort of thing through a sequence object.

This is very useful to keep your rows unique and saves some administrative headache, but once the value is inserted, you frequently need to know the value that was just used. For instance, when doing a multi-table insert joined by a common value.

The DBI does not currently define an interface for retrieving the value of the auto-increment field, so what follows is database-dependent.

MySQL uses the 'AUTO_INCREMENT' tag to identify a field in a table that will be incremented with each INSERT. (There can be only one per table.) The DBD::mysql module makes retrieving the value for the auto-increment field very simple:

Example 6: Retrieving auto-increment value from MySQL

   1: my $sql = qq/
   2:   INSERT INTO my_widgets
   3:   ( widget_ref, widget_name, widget_factory )
   4:   VALUES ( ?, ?, ? )
   5:   /;
   6: my ( $sth ); 
   7: eval {
   8:   $sth = $dbh->prepare( $sql );
   9:   $sth->execute( '568172-XB', 'Specialized hammer', '912P918-G1' );
  10: };
  11: die "Cannot prepare/execute!\nSQL: $sql\nError: $@" if ( $@ );
  12: my $inserted_id = $sth->{insertid}; 

Sybase and Microsoft SQL Server both use Transact-SQL as their flavor of SQL. Defining a field as auto-incrementing is typically done like this:

  CREATE TABLE my_widgets (
    my_id numeric(14,0) identity not null,
  ...

You can set seed and step values as desired; see your favorite Transact-SQL book for more. Retrieving the value for a Transact-SQL identity column is done like this:

Example 7: Retrieving auto-increment value from Sybase/MSSQL

  1 - 11: same as Example 6
  12: $sth = $dbh->prepare( 'SELECT @@IDENTITY' );
  13: $sth->execute;
  14: my $row = $sth->fetchrow_arrayref;
  15: my $inserted_id = $row->[0];

(We'll get to how the SELECT statement and the statement handle work shortly.)

When writing an application, it's typically a good idea to write a generic fetch_insert_id() method that you can use. Then that method can be aliased to a database-specific method for actually retrieving the value as needed.

As for the sequence object, DBI and DBD::Oracle author Tim Bunce recently posted the following on the dbi-users mailing list. It's presented here without modification or testing and for your information only. YMMV.

Example 8: Retrieving sequence value from Oracle

   1: $sth = $dbh->prepare(q{
   2:           INSERT INTO foo (Id) VALUES (seqn.NEXTVAL) RETURNING Id INTO :PId;
   3: });
   4: $sth->bind_param_inout(":PId", \$PId, 100);
   5: $sth->execute;


SELECTING

Selecting (or retrieving) data from a database is one of the most common operations done with the DBI. This will get you on your way.


The statement handle: how to fetch data

SELECT statements are different from the statements discussed above. Rather than returning a value indicating the success of the statement, you can receive any number of datasets, or rows. Retrieving this information requires something more complex than a scalar return value.

This something is called a statement handle. Create a statement handle like this:

 my $sth = $dbh->prepare( $sql );

To finish the process so you can fetch data from the database, call the execute() method on it:

 $sth->execute;

You can now call any of the methods that actually fetch the data from the database. In turn, these are:

Of these methods, fetchrow_arrayref is the fastest. And the DBI documentation warns that fetchrow_hashref shouldn't be used in situations where performance is important. =head2 Why prepare and execute?

Except for the special methods mentioned below, two separate methods, prepare and execute are used to create a statement handle for fetching data from a database. Why? One big reason is placeholders.

As mentioned above, placeholders allow you to easily quote data. They also allow you


Binding results

Combined with the bind columns method, you'll likely often see the following idiom in DBI applications:

Example 9: Binding values and columns

   1: my $sql = qq/
   2:   SELECT field1, field2, field3
   3:     FROM table
   4:    WHERE field2 = ?
   5: /;
   6: my ( $sth );
   7: eval {
   8:   $sth = $dbh->prepare( $sql );
   9:   $sth->execute( $value2 );
  10: };
  11: die "Cannot select data with $sql ",
  12:     "and values $value2\nError: $@"   if ( $@ );
  13: my ( $field1, $field2, $field3 );
  14: $sth->bind_columns( undef, \$field1, \$field2, \$field3 );
  15: while ( $sth->fetch ) {
  16:   print "Field1: $field1\nField2: $field2\nField3: $field3";
  17: }
  18: $sth->finish;


Multiple statement handles

It often happens, particularly when you're moving or copying data within a table or from one table to another, that you've got more than one statement handle open at a time. The DBI cautions that multiple statement handles from one database handle are not supported across all drivers. One driver, DBD::Sybase, silently fakes this behavior by opening a new database handle when you try to open a new statement handle on a database handle that already has an active statement handle.

Here's an example using multiple statement handles. This is taken from part of a recent project of mine. Two tables ($SYSTEM_TABLE and $COUNTY_TABLE) exist with transit system data and the counties those transit systems serve, respectively. The two tables have a common field, system_id, that we've already ensured correspond to one another. We're taking the data from these two tables from one database, cleaning the data (removing blank entires) and creating records in an entirely separate database. Note that we don't see here where the records are created -- we just see subroutine names and values being passed in.

Example 10: Multiple statement handles

   1: print "Selecting systems and counties...";
   2: my $sqls = qq/
   3:   SELECT system_id, agency_name, phone, city, state, 
   4:          miles, fleet, expenses, trips
   5:     FROM $SYSTEM_TABLE
   6:    ORDER BY system_id
   7: /;
   8: my $sqlc = qq/
   9:   SELECT *
  10:     FROM $COUNTY_TABLE
  11:    ORDER BY system_id
  12: /;
  13: my ( $sts, $stc );
  14: eval {
  15:   $sts = $dbh->prepare( $sqls );
  16:   $sts->execute;
  17:   $stc = $dbh->prepare( $sqlc );
  18:   $stc->execute;
  19: };
  20: die "Having problem selecting system and county data: $@" if ( $@ );
  21: print "done\n";
  22:
  23: print "Analyzing and moving data...";
  24: while ( my $sys_row = $sts->fetchrow_hashref ) {
  25:   my $sys_tb_id = save_system( %{ $sys_row } );
  26:   my $area_data = $stc->fetchrow_arrayref;
  27:   my @area_list = @{ $area_data };
  28:   my ( $system_id_cty, $state ) = splice( @area_list, 0, 2 );
  29:   if ( $sys_row->{ system_id } != $system_id_cty ) {
  30:     die "County system_id and System list system_id don't match! ",
  31:         "($system_id/$system_id_cty)";
  32:   }
  33:   $state =~ s/\s*$//;
  34:   my $area_total = 1;
  35: ITEM:
  36:   foreach my $area ( @area_list ) {     
  37:     my $type = ( $area_total <= 20 ) ? 'county' : 'city';
  38:     next ITEM if ( ! $area ); # skip blanks
  39:     $area =~ s/\s*$//;  # remove trailing blanks
  40:     save_area( system_id => $sys_tb_id, state => $state, 
  41:                name => $area, type => $type );
  42:     $area_total++;
  43:   }
  44: }
  45: print "done\n";

Again, be sure to check and see if your database supports multiple statement handles per database handle.


TRANSACTIONS

Most modern relational database systems are based around the idea of a transaction. A transaction is basically a set of related SQL statements that are commited to the database at one time. If any of the statements fail, the transaction is sent a rollback signal, which removes the changes any SQL statement in the transaction made.

The classic example for transactions is a bank: a customer wishes to transfer funds from one account to another. This requires two discrete statements: an addition to one account and a subtraction from the other.

What happens if the bank's power goes out after the first statement but before the second? In a database without transactions, you have one happy customer! But in a database with transactions, the database upon starting up will automatically rollback all uncommitted transactions, putting everything back the way it was before the power went out.

In Perl, the above would look something like this (in a very abstracted sense):

 if ( add_funds( $dbh, $account, $amount ) and
     remove_funds( $dbh, $account, $amount ) {
     $dbh->commit;
  }
  else {
     $dbh->rollback;
  }

The {AutoCommit} attribute of the database handle controls how your database will handle commits and rollbacks. If {AutoCommit} is set to 1 (which is the default due to ODBC expectations), each submission is implicitly wrapped in a begin/end transaction.

You should be aware that that this can cause problems for data definition statements such as CREATE TABLE or ALTER TABLE, since they may not be allowed within a transaction at all.


ERROR HANDLING

Error handling with the DBI is like error handling with most other things in Perl: always check your return codes. The DBI can make it easier with one of the attributes you can set when you first create your database handle:

 my $dbh = DBI->connect( ..., { RaiseError => 1 } );

The RaiseError attribute throws a die every time there is an error with your statement and sets $@ to the error returned by the database. You can then wrap your DBI calls in an eval{} and trap your errors:

Example 11: Obvious error

   1: my $sql = qq/
   2:   SELECT FROM TableName
   3:   WHERE Field = 'value'
   4: /;
   5: my ( $sth );
   6: eval {
   7:   $sth = $dbh->prepare( $sql );
   8:   $sth->execute;
   9: };
  10: if ( $@ ) {
  11:   print "Error in the database: $@"
  12:   exit;
  13: }

After the eval{} on Line 10, $@ probably has an error like: ``Parse error near 'FROM', no fieldnames for SELECT.''

Alternatively, you can set the PrintError attribute which, you guessed it, prints your error (using the warn call) instead of throwing a die with it.

The latest error string from the database can be found in: $dbh->errstr, and if you have a problem connecting to the database, you can retireve the error from $DBI::errstr. You can also retrieve errors from each statement handle: $sth->errstr.


OTHER DBI-RELATED MODULES

A number of modules interoperate with the DBI.


Apache::DBI

One problem with web-based DBI applications goes hand-in-hand with difficulties due to plain vanilla CGI: startup cost. As you're likely aware, CGI scripting does not scale because every request must start a process, run the process then complete it. Starting the perl interpreter thousands of times or more gets very inefficient.

Similarly, opening and closing database connections thousands of times is inefficient. Lightweight databases like MySQL don't bear much of a cost, but industrial strength databases like Oracle bear a significant penalty in terms of time and resources when creating database connections. Philip Greenspun, web developer and author (among other things), spoke of this phenomenon in a section of his book Philip and Alex's Guide to Web Publishing when he noted as one of the reasons he doesn't like CGI scripts: ``My RDBMS Does Not Like to Be Opened and Closed 500,000 Times a Day'' (see LINKS for more about the book).

A great solution to the CGI problem is mod_perl, which embeds a perl interpreter in the Apache web server process. This allows you not only full access to the Apache API, but also to compile scripts once per child process and have them immediately accessible for subsequent requests.

Other web application development environments, such as Active Server Pages (ASP) or Cold Fusion, allow the server to cache database connections. Using the Apache::DBI module, you can do the same with mod_perl.

There are several components to Apache::DBI, but the base functionality allows you to cache database connections: one per connection profile per child process. The great thing is that it comes at virtually no cost: Apache::DBI overrides both the connect() and disconnect() methods of DBI, so you don't even have to change existing scripts.

What happens is this: Apache::DBI sees the connect() call from DBI and looks at the connection profile (which is composed of the data source name, username, password and hash reference of attributes). If a call to this connection profile has not been made in this child process, Apache::DBI makes the connection and stores it. Subsequent calls with the same connection profile are then handed out the stored connection. Easy!

All you need to do is ensure you bring in Apache::DBI early -- it should be one of the first calls to PerlModule you make.

Apache::DBI also comes with Apache::AuthenDBI and Apache::AuthzDBI which allow you to authenticate and authorize web users from a DBI database. Just set some variables and you're on your way:

Example 12: Apache::DBI authentication and authorization

   1: <Location /yapc>
   2:  AuthName "YAPC Authentication Demo"
   3:  AuthType Basic
   4:  PerlAuthenHandler                Apache::AuthenDBI
   5:  PerlSetVar Auth_DBI_data_source  'dbi:Sybase:server=yapc;database=web'
   6:  PerlSetVar Auth_DBI_username     'webuser'
   7:  PerlSetVar Auth_DBI_password     'webpass'
   8:  PerlSetVar Auth_DBI_pwd_table    'SysUser'
   9:  PerlSetVar Auth_DBI_uid_field    'email'
  10:  PerlSetVar Auth_DBI_pwd_field    'password'
  11:  PerlSetVar Auth_DBI_encrypted    'off'
  12:  require valid-user
  13:  AllowOverride None
  14: </Location>

One further example deals with logging. This is a slight adaptation of a program that logs information from all the virtual hosts running on our webservers. The code was swiped from two writings by Lincoln Stein -- one from a Web Techniques article, another from items in the book he and Doug MacEachern wrote on mod_perl.

First, you need to setup some simple configuration:

Example 13a: Apache configuration settings for logging web requests to a database

Put this in the httpd.conf file, or wherever you define the VirtualHost settings:

   1: PerlSetVar INTES_VHOST    'www.cpahelp.com'
   2: PerlLogHandler            Apache::INTES_LogDBI

Easy! Line 1 just sets a variable that travels along with this request -- this seemed easier to me than retrieving the information from the Apache request object. (Translation: I am lazy.) Line 2 tells Apache that we want the subroutine handler in the module Apache::INTES_LogDBI to deal with any logging requests.

That module looks like this:

Example 13b: Module for logging web requests to a database

   1: package Apache::INTES_LogDBI;
   2: use Apache::Constants ':common';
   3:  use strict;
   4: use vars qw/ 
   5:   $dbh $sth
   6: /;
   7: use DBI;
   8: use POSIX 'strftime';
   9: 
  10: my $DSN       = 'DBI:mysql:Logs';
  11: my $db_user   = 'db_user';
  12: my $db_passwd = 'db_passwd';
  13: my $log_table = 'Weblogs';
  14: $dbh = DBI->connect( $DSN, $db_user, $db_passwd ) 
  15:               || die "Error connecting: $!";
  16: my $sql = qq/
  17:   INSERT INTO $log_table
  18:   ( when, host, method, url, auth,
  19:     browser, referer, status, bytes, vhost )
  20:   VALUES 
  21:   ( ?,?,?,?,?,
  22:     ?,?,?,?,? )
  23: /;
  24: $sth = $dbh->prepare( $sql ) || die $dbh->errstr;
  25: 
  26: sub handler {
  27:  my $r = shift;
  28:  my $url     = $r->uri;
  29:  return DECLINED if ( $url =~ /(gif|jpg|jpeg|tiff|png)$/i );
  30:  my $vhost   = $r->dir_config( 'INTES_VHOST' );
  31:  my $date    = strftime( '%Y-%m-%d %H:%M:%S', localtime );
  32:  my $host    = $r->get_remote_host;
  33:  my $method  = $r->method;
  34:  my $user    = $r->connection->user;
  35:  my $referer = $r->header_in( 'Referer' );
  36:  my $browser = $r->header_in( 'User-agent' );
  37:  my $status  = $r->status;
  38:  my $bytes   = $r->bytes_sent;
  39:  $sth->execute( $date, $host, $method, $url, $user,
  40:           $browser, $referer, $status, $bytes, $vhost );
  41:  return OK;
  42: }
  43: 1;

This is a very simple application of a mod_perl module, but it gives a good idea of how you can use the DBI and mod_perl together.


Tie::DBI

The Tie::DBI module, by Lincoln Stein, allows you to tie a hash to a DBI database table: changes made to the hash get made automatically to the table. Very nifty. It has quite a bit of overhead, however.


DBIx::Recordset

Similar to Tie::DBI, except it can do joins and a number of other very useful actions in an object-oriented fashion.


Win32::ODBC

Win32::ODBC is a module for Win32 systems only that allows more access to native ODBC functions than DBD::ODBC. However, it's generally a one-way street -- applications written for the DBI aren't directly compatible with Win32::ODBC and vice versa. There are compatibility wrappers written to move betwen the two, but I am not familiar with how effective they are.


Others

The Apache::Session module allows you to create persistent data among web sessions. One method to store the data is the DBI.


LINKS

Cribbed from various sources, including the DBI documentation and an intro list Monty Dickerson <Monty.Dickerson@econophone.com> put together.


DBI materials

A number of hints have been dropped by Tim Bunce about a DBI book being published by O'Reilly. The grapevine says early Fall 1999 for a publishing date. Time will tell..

FAQ

 http://www.hermetica.com/technologia/perl/DBI/doc/faq.html

DBI Frequently Asked Questions; by Alligator Descartes <descarte@arcana.co.uk> This document serves to answer the most frequently asked questions on both the DBI Mailing Lists and personally to members of the DBI development team.

Article from The Perl journal about the DBI

 http://www.arcana.co.uk/technologia/perl/DBI/doc/tpj5/
DBI Mailing Lists

Information (subscribing and unsubscribing):

 http://www.isc.org/dbi-lists.html 

Archives:

 http://www.bitmechanic.com/mail-archives/dbi-users/ (US)
 http://outside.organic.com/mail-archives/dbi-users/ (US)
 http://www.xray.mpe.mpg.de/mailing-lists/dbi/ (Europe)
DBI Examples

 http://eskimo.tamu.edu/~jbaker/dbi-examples.html

The purpose of this document is to provide some examples for using the Perl DBI modules to access databases. The motivation for this is to relieve some of the FAQ traffic on the DBI mailing list. Over the course of this document, we will build a robust DBI program to access an Oracle database. We will start with DBI basics, then introduce concepts to improve performance and reliability.

Hotwired tutorial

 http://www.hotwired.com/webmonkey/backend/tutorials/tutorial1.html

Web Database Crash Course; by Richard Dice <rdice@pobox.com> This five-part crash course will take you, step by step, from general information about your database options to the nitty-gritty of setting up an online database.

Perl-Win32 Database FAQ

 http://www.geocities.com/SiliconValley/Way/6278/perl-win32-database.html

This is a result of seeing too many questions about the same subject in mailing list Perl-Win32-Database.


SQL and RDBMS

Structured Query Language and Relational Database Management Systems (RDBMS) references:

Philip and Alex's Guide to Web Publishing

 http://www.photo.net/wtr/thebook/

(see the CGI warnings at /wtr/thebook/server-programming.html)

Anyone developing web applications should read this book. It's even online -- you'll also get to see some fantastic photos along the way.

SQL Tutorial

 http://w3.one.net/~jhoffman/sqltut.htm
Advantages of a DBMS

 http://www.compapp.dcu.ie/databases/f001.html
Ask the SQL Pro

 http://www.inquiry.com/techtips/thesqlpro/
SQL field types

 ftp://jerry.ece.umassd.edu/isowg3/dbl/SQL_Registry
mSQL PC Tutorials

 http://blnet.com/msqlpc/tutorials.htm

This page contains documentation for newbies on relational and client/server databases in general.


Database vendors

These are the vendors mentioned in this document:

MySQL

 http://www.mysql.com/ or 
 http://www.tcx.se/
Sybase

 http://www.sybase.com/
Oracle

 http://www.oracle.com/
MiniSQL

 http://www.hughes.com.au/
xBase file specifications

 http://www.e-bachmann.dk/docs/xbase.htm


Miscellanous

Writing Apache Modules with Perl and C

by Lincoln Stein and Doug MacEachern. The mod_perl book. Learn it, love it, live it if you're working with Apache and perl. See:

 http://www.oreilly.com/catalog/wrapmod/


AUTHOR

This document was created by Chris Winters. All mistakes and omissions are his own; all worthwhile knowledge was gleaned from the community at-large. Special thanks go to Tim Bunce for the DBI module and unfailing patience; Michael Peppler for the DBD::Sybase module and his uncanny ability to monitor every Sybase mailing list/newsgroup in the known universe; Jochen Wiedmann for the DBD::mysql module; Edmund Mergl for Apache::DBI; everyone else who has contributed to the DBI; and the DC Perlmongers group for hearing a very, very early version of this presentation.

Please e-mail comments, suggestions, corrections to <chris@cwinters.com> The most current version of this document (maybe even with slides!) is at:

 http://www.cwinters.com/tech/

A possibly updated version of this will be on the yapc website.


REVISIONS