Perl and Databases

an introduction to DBI

NAME
SYNOPSIS
INTRODUCTION
- What is it?
- What does it look like?
INSTALLING
CONNECTING AND THE DATABASE HANDLE
- Multiple database handles
NON-SELECT STATEMENTS
- Quoting
- Auto-increment fields
SELECTING
TRANSACTIONS
ERROR HANDLING
OTHER DBI-RELATED MODULES
LINKS
AUTHOR
REVISIONS

NAME

Perl and Databases -- an introduction to the DBI

SYNOPSIS

This presentation is a brief introduction to the Database Interface (DBI), perl's method for accessing databases. In this presentation you will learn:

how the DBI works,
its relationships to the individual drivers,
everyday usage of the DBI
where to strive for database-independence
one way to use the DBI in a modular fashion
how to use mod_perl and Apache::DBI to cache database handles for web applications

This is not a syntax guide; see the excellent documentation that comes with the DBI for that. This is also probably old news for most experienced DBAs, although learning how to use perl to access databases may be new.

INTRODUCTION

The DBI has been in development for about five years. Earlier incarnations were known by their database-specific names: oraperl, sybperl, and so forth. These are primarily Perl-4 modules: to use oraperl, you would have to statically link the Oracle libraries and oraperl routines with the perl binary.

The primary developer behind the DBI is Tim Bunce. I won't give his e-mail address out here since he's constantly deluged with requests of varying legitimacy. Installing the DBI will give you contact information for not only him but also the rest of the DBI world via the dbi-users and dbi-dev mailing lists. (Contact information for these lists and other DBI resources are given in the LINKS section at the end of this document.)

What is it?

The DBI is an abstraction layer allowing you to access all types of datastores using a consistent and object-oriented interface. The DBI is designed to be as thin a layer as possible to ensure database operations aren't slowed by the accessor method. The DBI documentation has benchmarks for Oracle calls; the speed of accessing your database may vary.

The DBI does directly not do much by itself. It requires a Database Driver (DBD) to access a particular database. These drivers are generally labeled: DBD::<driver_name>, such as DBD::Sybase, DBD::Oracle, DBD::Informix, and so on. These drivers are generally compiled with the database client libraries where necessary and accessed as a sharable object from perl. (And that's the extent of my knowledge there.)

Folks from the Windows world may find a comforting parallel with ODBC: the DBI is analagous to the ODBC Administrator (or driver manager); each DBD corresponds to an ODBC driver. Confusing the issue is the fact that there exists a DBD::ODBC driver which allows you to use ODBC drivers in place of the native database client drivers.

For a more thorough description of the DBI architecture (with graphics!), check out the introductory article by Tim Bunce and Alligator Descartes in Perl Journal #5 (link found at the end of this document).

What does it look like?

A quick example with lots of concepts:

Example 1: Quick Overall Example

   1: my $dbh = DBI->connect( 'DBI:Sybase:mydb', 'cwinters', 'mypass',
   2:                    { RaiseError => 1 } )
   3:           || die "Cannot connect to database: $DBI::errstr";
   4: my $sql = qq(
   5:   SELECT fname, lname, login_date
   6:     FROM users
   7:    WHERE login_date > '1999-06-25'
   8: );
   9: my ( $sth );
  10: eval {
  11:   $sth = $dbh->prepare( $sql );
  12:   $sth->execute;
  13: };
  14: die "Cannot prepare/execute $sql\nError: $@" if ( $@ );
  15: 
  16: my ( $fname, $lname, $login );
  17: $sth->bind_columns( undef, \$fname, \$lname, \$login );
  18: print "Database Login Information\n",
  19:       "==========================\n";
  20: while ( $sth->fetch ) {
  21:   print "Last login by $lname, $fname: $login\n";
  22: }
  23: $sth->finish;
  24: $dbh->disconnect;

Let's go through this really quick:

Lines 1-3: Connect to a database. This is the typical connection method for DBI. We'll go over the specifics shortly, but every connect call needs a data source name (DSN); you may also specify a username and password for the second and third arguments. The fourth argument is a hashref of options for the database handle itself. The connect call returns undef if it fails so we die with an error if it's undefined.
Lines 4-8: Define a generic SQL call. SQL (Structured query language) is a near-universal language to access modern relational databases. This tutorial will not teach you SQL: see the LINKS section in this document for more information on it.
Note also that I have my own method for formatting SQL. I tend to do this so the string comes out looking decent if it's placed in an error message. Since SQL is not whitespace-defined, you can format it however you wish.
Lines 9-14: Create a statement handle. A statement handle is what you use to actually get information from the database. A database handle can have multiple statement handles open at a time, but some drivers handle it different than others.
To retrieve information from the database (via a SELECT SQL statement or a stored procedure or otherwise), you need to first prepare the SQL statement, assigning the result to a statement handle. You then need to execute the statement handle. The reasons for this separating will become apparent soon.
Note that we also prepare and execute within an eval{} block. This method of eval{} only traps errors, and since we created the database handle with the RaiseError attribute set to a positive value, any error in either the prepare or execute statements will result in a die being thrown.
Lines 15-17: Bind the output. The DBI allows you to bind the output of a SELECT statement (or a stored procedure) to perl scalars. This is a very efficient way to retrieve data and also makes programs more readable.
Lines 18-19: Print out our header.
Lines 20-22: Fetch the data.
Lines 23-24: Close the handles. Calling finish() on your statement handle is not mandatory but ensures that you don't step on any of the handle attributes if you reuse it with a subsequent statement. However, if you don't call disconnect() on your database handle before it goes out of scope and you're using the strict pragma (you're using strict, right?) you'll get a warning: ``Database handle destroyed without explicit disconnect.''

Now that we've had a brief overview, let's go through everything we need to know to make a database application.

INSTALLING

We'll keep this brief. The DBI, like most perl modules, is fairly simple to install:

 tar -zxvf DBI-1.06.tar.gz
 cd DBI-1.06/
 perl Makefile.PL
 make
 make test
 make install

You can also use the CPAN shell, or if you're using the ActiveState port of Win32 perl, the PPM installation utility.

Installing the individual DBD modules are more difficult. Generally, you need to:

Read the README! Like all perl modules, the authors spend a good deal of time writing the module and documentation. They've already gone through a number of the gotchas themselves. Save yourself some time and read the available documentation.
Know where your libraries are. Did you install your relational database? If not, track down your local DBA and find out where the client libraries are located. Some modules attempt to make a best guess, some modules require that you set environment variables or set a parameter when building the module.
Check the archives. The dbi-users mailing list has one or two sites where the mailing list is archived. Unless you're using a truly unusual combination of database and operating system, someone else has likely had your problem.

Note that the module installers I'm aware of, CPAN and PPM, have both worked well for me when installing new modules, even on Win32 systems.

If you've got everything lined up, building the module is much the same set of steps as building the DBI. Some modules require that you have a database defined for testing purposes so that it can manipulate tables and records without stomping on other data. Most packages accept 'test' as the database name by default, although you can normally also define your own.

CONNECTING AND THE DATABASE HANDLE

Connecting to the database is the first step you'll do. All actions going to the database are done through a database handle.

Create a database handle like this:

 my $dbh = DBI->connect( 'DBI:mysql:test',
                         'username', 'password',
                         { RaiseError => 1 } );
where:

  DBI:mysql:test - Data Source Name (DSN)
  DBI            - DBI (necessary first part of the DSN)
  mysql          - Name of DBD
  test           - Name of database we are connecting to.
  username       - User we are connecting to this database as
  password       - Password for the user

You should pass a username and password even for systems, such as DBD::XBase or DBD::CSV, that do not use them at all. Just pass an empty string for both fields. The most recent version of DBI (1.09) tripped a few people up since the behavior changed slightly. (See the Changes file that comes with DBI 1.09.)

The final argument to the connect call is a hashref of options. An example is:

 { RaiseError => 1, AutoCommit => 0 }

This argument is used as a shorthand for setting parameters of the database handle. You can if you prefer call:

 $dbh->{RaiseError} = 1;
 $dbh->{AutoCommit} = 1;

As the DBI documentation states, connect() returns undef on failure. See ERROR HANDLING for more information on how to deal with this.

The DBI does not really care what statements you send through this database handle. You can think of the DBI as a pass-through module: it defines an interface for sending requests to and receiving data from a database, but it does not really care what the statements or data are.

So if you're having problems with your SQL, the DBI is not the place to look. As long as the data is getting to the database and you're getting messages back from it, the DBI is doing its job.

Now that you have a database handle, what can you do with it? You can use it directly, as we discuss below. But, like all objects in Perl, the $dbh you've now defined is very versatile. It's really just a reference. my @userlist = get_all_users( $dbh, $table_name ); ... sub get_all_users { my $dbh = shift; my $table_name = shift; ... }

You can also embed it in other objects:

Example 2: Database handle passing

   1: my $user = MyApp::User->new( dbh => $dbh, 
   2:                         username => $username );
   3: my $full_name = $user->full_name();
   4: 
   5: package MyApp::User;
   6: 
   7: sub new {
   8:  my $pkg = shift;
   9:  my $class = ref( $pkg ) || $pkg;
  10:  my %params = @_;
  11:  my $self = {};
  12:  $self->{dbh} = $params{dbh};
  13:  $self->{username} = $params{username};
  14:  return bless( $self, $class );
  15: }
  16: 
  17: sub full_name {
  18:  my $self = shift;
  19:  if ( ! $self->{full_name} ) {
  20:     my $sql = qq/
  21:       SELECT FirstName, LastName
  22:         FROM Users
  23:        WHERE Username = $self->{usename}
  24:     /;
  25:     eval {
  26:       $sth = $self->{dbh}->prepare( $sql );       
  27:       $sth->execute();
  28:     };
  29:     die "Error: $@ with SQL\n$sql" if ( $@ );
  30:     my $row = $sth->fetchrow_arrayref;
  31:     $self->{full_name} = join ' ', @{ $row };
  32:  }
  33:  return $self->{full_name};
  34: }

Note that you do not need to dereference your handle when you pass it to a subroutine or use it within an object. A relatively common sight on the dbi-users mailing list is this:

Example 3: Unnecessary handle dereferencing

   1: my $fetched_rows = fetch_rows( $dbh, 'SELECT * FROM TABLE' );
   2: 
   3: sub fetch_rows {
   4:  my $sql = shift;
   5:  my $dbh = shift;
   6:  my ( $sth );
   7:  eval {
   8:    $sth = $$dbh->prepare( $sql );
   9:    $sth->execute();
  10:  };

The deferencing on line 8 is entirely unnecessary, and it's wrong. Running it will get you an error like:

  Not a SCALAR reference at myprog.pl line 6.

Multiple database handles

Nothing in the DBI stops you from opening as many database handles as you like -- you may be limited by your database's licensing restrictions, but that's a separate issue. Multiple handles are typically used when moving data from one database to another.

Since the DBI has so many drivers available, it's fairly trivial to write a program that will move data from one database to another, particularly when one database is a flatfile (XBase, CSV) and the other is a relational system. In this case, you'll probably open up one handle to the 'from' database and one handle to the 'to' database.

To move data within a database, from one table to another, you can typically use one database handle but two statement handles. Some databases cannot support this: DBD::Sybase silently opens up another database handle for you when you try to use multiple statement handles concurrently from a single database handle.

NON-SELECT STATEMENTS

Non-select data manipulation and definition statements are fairly simple to execute within the DBI. All are executed with the DBI do method. As the documentation states, do returns the number of rows affected by the statement, -1 if not known, and undef on an error.

The ``number of rows affected by the statement'' is not normally a controversial number: an DELETE statement will return the number of rows deleted; an UPDATE statement will return the number of rows matched by the statement and updated. However, if you rely on this figure for consistency checking you may want to ensure that it returns what you think it should return. A small brouhaha recently erupted on the dbi-users mailing list when it was determined that MySQL returns not the number of records matched by an UPDATE statement but rather the number of records actually updated after the optimizer removes the records from consideration that already match the update values.

Example 4: Creating a MySQL table

   1: print "Creating table...";
   2: my $sql = qq/
   3:   CREATE TABLE my_widgets (
   4:      my_id int not null auto_increment,
   5:      widget_ref varchar(20) null,
   6:      widget_name varchar(100) null,
   7:      widget_factory varchar(15),
   8:      primary key ( my_id )
   9:   )
  10: /;
  11: my ( $rv );
  12: eval {
  13:   $rv = $dbh->do( $sql );
  14: };
  15: if ( ! $rv or $@ ) {
  16:   my $err_msg = $@ || $dbh->errstr;
  17:   print "error.\nMessage: $err_msg\nSQL: $sql";
  18: }
  19: else {
  20:   print "success.\n";
  21: }

Even though we use a MySQLism at line 4 (auto_increment), the DBI doesn't care. It just passes the statement along to the driver. (More about auto incrementing fields later.)

Quoting

One of the problems every user first has when writing even simple database applications is quoting. A basic SQL statement looks like this:

 INSERT INTO my_widgets
 ( widget_ref, widget_name, widget_factory )
 VALUES
 ( '09Fx67812', 'Thumbscrew thingy', '7691823-FC' )

(Some databases allow you to use double quotes instead of single quotes to demarcate string values.)

But what happens when one of your string values already has a quote in it?

 INSERT INTO my_widgets
 ( widget_ref, widget_name, widget_factory )
 VALUES
 ( '09Fx67812', 'Toilet for Army's hospital', '7691823-FC' )

Uh oh. You're likely to get an error like:

 DBD::mysql::db do failed: parse error near 's hospital', '7691823-FC' ) 
 ' at line 5 at test_stmt.pl line 37.

or:

 DBD::Sybase::db do failed: Server message number=102 severity=15  
 state=1 line=6 text=Incorrect syntax near 's'.
 Server message number=105 severity=15 state=1 line=6 text=Unclosed  
 quote before the character string ' ) 
 '.

(The syntax of these error messages depends entirely on the database; error messages are discussed further in ERROR HANDLING below.)

So, how do we get around this? The DBI has several ways, all preferable to the homegrown solution. A typical homegrown solution looks like this:

 $value = q(Toilet for Army's hospital); 
 ( $quoted_value = $value ) =~ s/'/''/g;
 $quoted_value =~ s/^(.*)$/'$1'/;
 $sql = qq/
   INSERT INTO my_widgets 
   ( widget_name ) VALUES ( $quoted_value )
 /;

This is, of course, horribly messy. Most people advance this solution to create a subroutine that does all the quoting for them. But it's still database-dependent: MiniSQL uses a slash (\) as an escape character while many other databases (MySQL and Sybase included) use a single quote ('). When you change your database, you have to change your quoting subroutine.

The DBI has a much better way. There are two primary methods to quote values. First, you can quote the values on an individual basis:

 $quoted_value = $dbh->quote( $value );

Easy and simple. The string $quoted value is now ready to be inserted into the database your database handle is connected to:

 $value = q(Toilet for Army's hospital);
 $quoted_value = $dbh->quote( $value );
 $sql = qq/
   INSERT INTO my_widgets 
   ( widget_name ) VALUES ( $quoted_value )
 /;
 $rv = $dbh->do( $sql );

As a corrollary, you can skip the temporary variable and do something like:

 $value = q(Toilet for Army's hospital);
 $sql = qq/
   INSERT INTO my_widgets 
   ( widget_name ) VALUES ( @{[ $dbh->quote( $value ) ]} )
 /;
 $rv = $dbh->do( $sql );

You can also use a separate method known as binding values. Binding values is very elegant:

 my $value = q(Toilet for Army's hospital);
 $sql = qq/
   INSERT INTO my_widgets
   ( widget_name ) VALUES ( ? )
 /; 
 $rv = $dbh->do( $sql, undef, $value );

That question mark is called a placeholder and puts up a flag for the DBI when the statement is actually executed. You can use a placeholder for any number of values but not complex expressions. When the statement is executed, the DBD reads the values from an ordered list you pass it and plugs the values into the statement.

When the driver plugs these values in, it also quotes them free of charge. So many DBI applications will never have a call to the quote() method of the database handle, preferring instead to use bound values.

Another benefit of placeholders is performance. On more robust database systems, preparing a statement once and executing it many times with values afterward is several times faster than preparing and executing (or using the do() method) each individual statement. For instance, assuming a colon-separated input file:

Example 5: Preparing once, executing many

   1: my $sql = qq/
   2:   INESRT INTO SysUsers
   3:   ( username, first_name, last_name, email, phone )
   4:   VALUES 
   5:   ( ?, ?, ?, ?, ? )
   6: /;
   7: my ( $sth );
   8: eval {
   9:   $sth = $dbh->prepare( $sql );
  10: };
  11: die "Cannot prepare statement! Error: $@" if ( $@ );
  12: open( USERS, $user_data ) || die "Cannot open user data! Error: $!";
  13: while( <USERS> ) {
  14:   chomp;
  15:   my ( $user, $fn, $ln, $email, $phone ) = split /:/;
  16:   eval {
  17:     $sth->execute( $user, $fn, $ln, $email, $phone );
  18:   };
  19:   die "Cannot execute with $user|$fn|$ln|$email|$phone ",
  20:       "with error $@"                          if ( $@ );
  21: }

Lines 1-6: Create the SQL statement.
Lines 7-11: Declare the statement handle, prepare it and check for errors. If the prepare statement throws an error, we will catch the error in the eval{} block. Note that we do not need to immediately run the execute method.
Lines 12-15: Open up the file with the user information, start looping through the file line-by-line, lop off the linefeed and separate the line into individual fields.
Lines 16-21: Run the execute() method on the prepared statement handle with the data read in from the file. Capture any errors in the eval{} block and die with relevant information if necessary.

The DBI documentation has more information about placeholders and using them to bind values. We'll also deal with this further below with SELECT statements.

Auto-increment fields

Auto-increment fields are present in many databases. Such a field ensures that each row gets a unique number by keeping an internal counter and incrementing it with every new row inserted. Other databases (such as Oracle and MiniSQL) accomplish this sort of thing through a sequence object.

This is very useful to keep your rows unique and saves some administrative headache, but once the value is inserted, you frequently need to know the value that was just used. For instance, when doing a multi-table insert joined by a common value.

The DBI does not currently define an interface for retrieving the value of the auto-increment field, so what follows is database-dependent.

MySQL uses the 'AUTO_INCREMENT' tag to identify a field in a table that will be incremented with each INSERT. (There can be only one per table.) The DBD::mysql module makes retrieving the value for the auto-increment field very simple:

Example 6: Retrieving auto-increment value from MySQL

   1: my $sql = qq/
   2:   INSERT INTO my_widgets
   3:   ( widget_ref, widget_name, widget_factory )
   4:   VALUES ( ?, ?, ? )
   5:   /;
   6: my ( $sth ); 
   7: eval {
   8:   $sth = $dbh->prepare( $sql );
   9:   $sth->execute( '568172-XB', 'Specialized hammer', '912P918-G1' );
  10: };
  11: die "Cannot prepare/execute!\nSQL: $sql\nError: $@" if ( $@ );
  12: my $inserted_id = $sth->{insertid};

Sybase and Microsoft SQL Server both use Transact-SQL as their flavor of SQL. Defining a field as auto-incrementing is typically done like this:

  CREATE TABLE my_widgets (
    my_id numeric(14,0) identity not null,
  ...

You can set seed and step values as desired; see your favorite Transact-SQL book for more. Retrieving the value for a Transact-SQL identity column is done like this:

Example 7: Retrieving auto-increment value from Sybase/MSSQL

  1 - 11: same as Example 6
  12: $sth = $dbh->prepare( 'SELECT @@IDENTITY' );
  13: $sth->execute;
  14: my $row = $sth->fetchrow_arrayref;
  15: my $inserted_id = $row->[0];

(We'll get to how the SELECT statement and the statement handle work shortly.)

When writing an application, it's typically a good idea to write a generic fetch_insert_id() method that you can use. Then that method can be aliased to a database-specific method for actually retrieving the value as needed.

As for the sequence object, DBI and DBD::Oracle author Tim Bunce recently posted the following on the dbi-users mailing list. It's presented here without modification or testing and for your information only. YMMV.

Example 8: Retrieving sequence value from Oracle

   1: $sth = $dbh->prepare(q{
   2:           INSERT INTO foo (Id) VALUES (seqn.NEXTVAL) RETURNING Id INTO :PId;
   3: });
   4: $sth->bind_param_inout(":PId", \$PId, 100);
   5: $sth->execute;

SELECTING

Selecting (or retrieving) data from a database is one of the most common operations done with the DBI. This will get you on your way.

The statement handle: how to fetch data

SELECT statements are different from the statements discussed above. Rather than returning a value indicating the success of the statement, you can receive any number of datasets, or rows. Retrieving this information requires something more complex than a scalar return value.

This something is called a statement handle. Create a statement handle like this:

 my $sth = $dbh->prepare( $sql );

To finish the process so you can fetch data from the database, call the execute() method on it:

 $sth->execute;

You can now call any of the methods that actually fetch the data from the database. In turn, these are:

```
 @row_ary  = $sth->fetchrow_array;
```
Fetch the values from the next row into a list; an empty list is returned if there are no further rows to fetch. The fields will be in the same order as specified in the SELECT statement. (You're not using * to select all the fields, are you?) You can also use the metadata methods to retrieve the field information.
```
 $ary_ref  = $sth->fetchrow_arrayref;
```
```
 (Alias: $sth->fetch)
```
Same basic primciple as above, except a list reference is returned instead of a list. Returns undef when no more rows can be fetched. Note that you cannot store the reference in a separate data structure for use later because the same reference is used. So the following will not work:
```
 while ( my $row = $sth->fetchrow_arrayref ) {
   push @ordered_rows, $row;
 } 
```
but this will:
```
 while ( my $row = $sth->fetchrow_arrayref ) {
    my @items = @{ $row };
    push @ordered_rows, \@items;
 }
```
Instead of doing either of these, however, you'd probably use the next method...
```
 $ary_ref  = $sth->fetchall_arrayref;
```
Fetch all the rows returned by the SELECT statement into an array reference of array references.
```
 $hash_ref = $sth->fetchrow_hashref;     
```
Fetch the next row into a hash reference. The keys of the reference are the same values returned by the {NAME} call to the statement handle.
This method is not entirely portable due to the fact that databases may return the field names in different ways than you expect. First, it may return them in all upper-case or all lower-case. Second, you need to ensure that like named fields in your SELECT statement are returned properly. For instance, the statement:
```
 SELECT my_widgets.name, my_factories.name
   FROM my_widgets, my_factories
  WHERE my_factories.name LIKE 'Fresno%'
    AND my_widgets.factory_id = my_factories.factory_id
```
would be problematic due to the fields named 'name'. Since the fetchrow_hashref() method does not use table names along with the field name (most databases do not return it), you'd find a hashref with only one key name. (This problem could also be solved by better database design, but sometimes you don't have a choice.)
Most databases will also allow you to modify your statement like this:
```
 SELECT my_widgets.name as widget_name, 
        my_factories.name as factory_name
   FROM my_widgets, my_factories
  WHERE my_factories.name LIKE 'Fresno%'
    AND my_widgets.factory_id = my_factories.factory_id
```
which will return different names for the fields from the database, eliminating the problem.

Of these methods, fetchrow_arrayref is the fastest. And the DBI documentation warns that fetchrow_hashref shouldn't be used in situations where performance is important. =head2 Why prepare and execute?

Except for the special methods mentioned below, two separate methods, prepare and execute are used to create a statement handle for fetching data from a database. Why? One big reason is placeholders.

As mentioned above, placeholders allow you to easily quote data. They also allow you

Binding results

Combined with the bind columns method, you'll likely often see the following idiom in DBI applications:

Example 9: Binding values and columns

   1: my $sql = qq/
   2:   SELECT field1, field2, field3
   3:     FROM table
   4:    WHERE field2 = ?
   5: /;
   6: my ( $sth );
   7: eval {
   8:   $sth = $dbh->prepare( $sql );
   9:   $sth->execute( $value2 );
  10: };
  11: die "Cannot select data with $sql ",
  12:     "and values $value2\nError: $@"   if ( $@ );
  13: my ( $field1, $field2, $field3 );
  14: $sth->bind_columns( undef, \$field1, \$field2, \$field3 );
  15: while ( $sth->fetch ) {
  16:   print "Field1: $field1\nField2: $field2\nField3: $field3";
  17: }
  18: $sth->finish;

Lines 1-12: Been there, done that, got the postcard.
Line 13: Declare your variables before binding them. (You are using use strict, aren't you?)
Line 14: Bind the column variables to the fields you've selected. Each time you fetch a row, the value for each column will be bound to its variable. According to the DBI documentation, ``The binding is performed at a very low level using perl aliasing so there is no extra copying taking place.'' I'm not exactly sure what this means, but it sounds good.
A syntax item: you can put your values in a list and use a reference to the list:
```
 $sth->bind_columns( undef, \( $field1, $field2, $field3 ) );
```
I would do this more, but CPerl mode in XEmacs doesn't like it too much...
Line 15: Control for loop that fetches rows. Since we've bound the columns, we don't need to assign a variable to the fetch call.
Line 16-17: Show a simple use of the variable that's bound and close the loop.
Line 18: Call the finish() method of the statement handle to ensure the database frees up any resources it can once the statement is done. As the documentation states, this is generally not necessary, particularly when you've selected all the data from the handle. If you stop fetching data while there is data left to fetch, it's probably a good idea to use it. Just in case.

Multiple statement handles

It often happens, particularly when you're moving or copying data within a table or from one table to another, that you've got more than one statement handle open at a time. The DBI cautions that multiple statement handles from one database handle are not supported across all drivers. One driver, DBD::Sybase, silently fakes this behavior by opening a new database handle when you try to open a new statement handle on a database handle that already has an active statement handle.

Here's an example using multiple statement handles. This is taken from part of a recent project of mine. Two tables ($SYSTEM_TABLE and $COUNTY_TABLE) exist with transit system data and the counties those transit systems serve, respectively. The two tables have a common field, system_id, that we've already ensured correspond to one another. We're taking the data from these two tables from one database, cleaning the data (removing blank entires) and creating records in an entirely separate database. Note that we don't see here where the records are created -- we just see subroutine names and values being passed in.

Example 10: Multiple statement handles

   1: print "Selecting systems and counties...";
   2: my $sqls = qq/
   3:   SELECT system_id, agency_name, phone, city, state, 
   4:          miles, fleet, expenses, trips
   5:     FROM $SYSTEM_TABLE
   6:    ORDER BY system_id
   7: /;
   8: my $sqlc = qq/
   9:   SELECT *
  10:     FROM $COUNTY_TABLE
  11:    ORDER BY system_id
  12: /;
  13: my ( $sts, $stc );
  14: eval {
  15:   $sts = $dbh->prepare( $sqls );
  16:   $sts->execute;
  17:   $stc = $dbh->prepare( $sqlc );
  18:   $stc->execute;
  19: };
  20: die "Having problem selecting system and county data: $@" if ( $@ );
  21: print "done\n";
  22:
  23: print "Analyzing and moving data...";
  24: while ( my $sys_row = $sts->fetchrow_hashref ) {
  25:   my $sys_tb_id = save_system( %{ $sys_row } );
  26:   my $area_data = $stc->fetchrow_arrayref;
  27:   my @area_list = @{ $area_data };
  28:   my ( $system_id_cty, $state ) = splice( @area_list, 0, 2 );
  29:   if ( $sys_row->{ system_id } != $system_id_cty ) {
  30:     die "County system_id and System list system_id don't match! ",
  31:         "($system_id/$system_id_cty)";
  32:   }
  33:   $state =~ s/\s*$//;
  34:   my $area_total = 1;
  35: ITEM:
  36:   foreach my $area ( @area_list ) {     
  37:     my $type = ( $area_total <= 20 ) ? 'county' : 'city';
  38:     next ITEM if ( ! $area ); # skip blanks
  39:     $area =~ s/\s*$//;  # remove trailing blanks
  40:     save_area( system_id => $sys_tb_id, state => $state, 
  41:                name => $area, type => $type );
  42:     $area_total++;
  43:   }
  44: }
  45: print "done\n";

Lines 1-12: Let the user know where we are in the process and define our SQL statements to retrieve the data from the two tables. The second statement ($sqlc) uses a * to retrieve all the fields because there are 42 separate fields in the table: the system_id, state, county1 - county20 and city1 - city20. It's generally a bad idea to use * to specify your fields, but there are always exceptions.
Lines 13-21: Define our statement handles and prepare/execute both of them within an eval{} block. If we have any problem doing this our exception is caught and we bail with a descriptive error. Finally, let the user know we've completed that task successfully.
Lines 23-24: Let the user know where we are and define the loop for fetching the system data. We use the fetchrow_hashref method here since we're simply passing the fieldname and value pairs off to the save_system() routine (not specified here). (Before I wrote this original script I knew that I would get the right information back from a fetchrow_hashref call.)
Line 25: Send the information just fetched out to a routine to save a record to a separate database. That routine returns the ID ($system_id) just inserted for use later.
Lines 26-27: Fetch the corresponding row from the table with system area information. Since we're relying on the fields being in a certain order, we use the fetchrow_arrayref method. (We also could have used the fetch alias, but I prefer as a matter of style to restrict this alias call as a loop controller rather than using it as an individual fetch.) Immediately assign the values of the arrayref to a separate list @area_list.
Line 28: Lop off the first two values of the @area_list and assign them to $system_id_cty and $state. The remainder of @area list is counties and cities.
Lines 29-32: If the system_id we just got from the area table isn't the same as the one we fetched from the system table, something is seriously wrong. Bail immediately.
Lines 33-34: General cleanup
Lines 35-43: Loop through every area (county or city), skip the blank ones, determine the type (items 1-20 are counties; the remainder are cities) and save the information to a separate database.

Again, be sure to check and see if your database supports multiple statement handles per database handle.

TRANSACTIONS

Most modern relational database systems are based around the idea of a transaction. A transaction is basically a set of related SQL statements that are commited to the database at one time. If any of the statements fail, the transaction is sent a rollback signal, which removes the changes any SQL statement in the transaction made.

The classic example for transactions is a bank: a customer wishes to transfer funds from one account to another. This requires two discrete statements: an addition to one account and a subtraction from the other.

What happens if the bank's power goes out after the first statement but before the second? In a database without transactions, you have one happy customer! But in a database with transactions, the database upon starting up will automatically rollback all uncommitted transactions, putting everything back the way it was before the power went out.

In Perl, the above would look something like this (in a very abstracted sense):

 if ( add_funds( $dbh, $account, $amount ) and
     remove_funds( $dbh, $account, $amount ) {
     $dbh->commit;
  }
  else {
     $dbh->rollback;
  }

The {AutoCommit} attribute of the database handle controls how your database will handle commits and rollbacks. If {AutoCommit} is set to 1 (which is the default due to ODBC expectations), each submission is implicitly wrapped in a begin/end transaction.

You should be aware that that this can cause problems for data definition statements such as CREATE TABLE or ALTER TABLE, since they may not be allowed within a transaction at all.

ERROR HANDLING

Error handling with the DBI is like error handling with most other things in Perl: always check your return codes. The DBI can make it easier with one of the attributes you can set when you first create your database handle:

 my $dbh = DBI->connect( ..., { RaiseError => 1 } );

The RaiseError attribute throws a die every time there is an error with your statement and sets $@ to the error returned by the database. You can then wrap your DBI calls in an eval{} and trap your errors:

Example 11: Obvious error

   1: my $sql = qq/
   2:   SELECT FROM TableName
   3:   WHERE Field = 'value'
   4: /;
   5: my ( $sth );
   6: eval {
   7:   $sth = $dbh->prepare( $sql );
   8:   $sth->execute;
   9: };
  10: if ( $@ ) {
  11:   print "Error in the database: $@"
  12:   exit;
  13: }

After the eval{} on Line 10, $@ probably has an error like: ``Parse error near 'FROM', no fieldnames for SELECT.''

Alternatively, you can set the PrintError attribute which, you guessed it, prints your error (using the warn call) instead of throwing a die with it.

The latest error string from the database can be found in: $dbh->errstr, and if you have a problem connecting to the database, you can retireve the error from $DBI::errstr. You can also retrieve errors from each statement handle: $sth->errstr.

OTHER DBI-RELATED MODULES

A number of modules interoperate with the DBI.

Apache::DBI

One problem with web-based DBI applications goes hand-in-hand with difficulties due to plain vanilla CGI: startup cost. As you're likely aware, CGI scripting does not scale because every request must start a process, run the process then complete it. Starting the perl interpreter thousands of times or more gets very inefficient.

Similarly, opening and closing database connections thousands of times is inefficient. Lightweight databases like MySQL don't bear much of a cost, but industrial strength databases like Oracle bear a significant penalty in terms of time and resources when creating database connections. Philip Greenspun, web developer and author (among other things), spoke of this phenomenon in a section of his book Philip and Alex's Guide to Web Publishing when he noted as one of the reasons he doesn't like CGI scripts: ``My RDBMS Does Not Like to Be Opened and Closed 500,000 Times a Day'' (see LINKS for more about the book).

A great solution to the CGI problem is mod_perl, which embeds a perl interpreter in the Apache web server process. This allows you not only full access to the Apache API, but also to compile scripts once per child process and have them immediately accessible for subsequent requests.

Other web application development environments, such as Active Server Pages (ASP) or Cold Fusion, allow the server to cache database connections. Using the Apache::DBI module, you can do the same with mod_perl.

There are several components to Apache::DBI, but the base functionality allows you to cache database connections: one per connection profile per child process. The great thing is that it comes at virtually no cost: Apache::DBI overrides both the connect() and disconnect() methods of DBI, so you don't even have to change existing scripts.

What happens is this: Apache::DBI sees the connect() call from DBI and looks at the connection profile (which is composed of the data source name, username, password and hash reference of attributes). If a call to this connection profile has not been made in this child process, Apache::DBI makes the connection and stores it. Subsequent calls with the same connection profile are then handed out the stored connection. Easy!

All you need to do is ensure you bring in Apache::DBI early -- it should be one of the first calls to PerlModule you make.

Apache::DBI also comes with Apache::AuthenDBI and Apache::AuthzDBI which allow you to authenticate and authorize web users from a DBI database. Just set some variables and you're on your way:

Example 12: Apache::DBI authentication and authorization

   1: <Location /yapc>
   2:  AuthName "YAPC Authentication Demo"
   3:  AuthType Basic
   4:  PerlAuthenHandler                Apache::AuthenDBI
   5:  PerlSetVar Auth_DBI_data_source  'dbi:Sybase:server=yapc;database=web'
   6:  PerlSetVar Auth_DBI_username     'webuser'
   7:  PerlSetVar Auth_DBI_password     'webpass'
   8:  PerlSetVar Auth_DBI_pwd_table    'SysUser'
   9:  PerlSetVar Auth_DBI_uid_field    'email'
  10:  PerlSetVar Auth_DBI_pwd_field    'password'
  11:  PerlSetVar Auth_DBI_encrypted    'off'
  12:  require valid-user
  13:  AllowOverride None
  14: </Location>

Lines 1 and 14 demarcate where we'd like this authentication and authorization to take place: whenever a user calls a URL that begins with /yapc.
Lines 2-3 define the title of the authorization that the user will see and the type of authentication we're using.
Line 4 tells mod_perl that we'll be using the Apache::AuthzDBI handler to deal with this authentication.
Lines 5-11 define the parameters necessary to connect to the database as well as the table and fieldnames that contain the authentication information. We happen to be using a user's e-mail as his/her username in this application. We also currently store the passwords in unencrypted format. This is generally a no-no for production systems. If set to on, Apache::AuthenDBI will use the standard perl crypt function to test the password given.
Lines 12-13 are directives for Apache necessary for authentication.

One further example deals with logging. This is a slight adaptation of a program that logs information from all the virtual hosts running on our webservers. The code was swiped from two writings by Lincoln Stein -- one from a Web Techniques article, another from items in the book he and Doug MacEachern wrote on mod_perl.

First, you need to setup some simple configuration:

Example 13a: Apache configuration settings for logging web requests to a database

Put this in the httpd.conf file, or wherever you define the VirtualHost settings:

   1: PerlSetVar INTES_VHOST    'www.cpahelp.com'
   2: PerlLogHandler            Apache::INTES_LogDBI

Easy! Line 1 just sets a variable that travels along with this request -- this seemed easier to me than retrieving the information from the Apache request object. (Translation: I am lazy.) Line 2 tells Apache that we want the subroutine handler in the module Apache::INTES_LogDBI to deal with any logging requests.

That module looks like this:

Example 13b: Module for logging web requests to a database

   1: package Apache::INTES_LogDBI;
   2: use Apache::Constants ':common';
   3:  use strict;
   4: use vars qw/ 
   5:   $dbh $sth
   6: /;
   7: use DBI;
   8: use POSIX 'strftime';
   9: 
  10: my $DSN       = 'DBI:mysql:Logs';
  11: my $db_user   = 'db_user';
  12: my $db_passwd = 'db_passwd';
  13: my $log_table = 'Weblogs';
  14: $dbh = DBI->connect( $DSN, $db_user, $db_passwd ) 
  15:               || die "Error connecting: $!";
  16: my $sql = qq/
  17:   INSERT INTO $log_table
  18:   ( when, host, method, url, auth,
  19:     browser, referer, status, bytes, vhost )
  20:   VALUES 
  21:   ( ?,?,?,?,?,
  22:     ?,?,?,?,? )
  23: /;
  24: $sth = $dbh->prepare( $sql ) || die $dbh->errstr;
  25: 
  26: sub handler {
  27:  my $r = shift;
  28:  my $url     = $r->uri;
  29:  return DECLINED if ( $url =~ /(gif|jpg|jpeg|tiff|png)$/i );
  30:  my $vhost   = $r->dir_config( 'INTES_VHOST' );
  31:  my $date    = strftime( '%Y-%m-%d %H:%M:%S', localtime );
  32:  my $host    = $r->get_remote_host;
  33:  my $method  = $r->method;
  34:  my $user    = $r->connection->user;
  35:  my $referer = $r->header_in( 'Referer' );
  36:  my $browser = $r->header_in( 'User-agent' );
  37:  my $status  = $r->status;
  38:  my $bytes   = $r->bytes_sent;
  39:  $sth->execute( $date, $host, $method, $url, $user,
  40:           $browser, $referer, $status, $bytes, $vhost );
  41:  return OK;
  42: }
  43: 1;

Lines 1-8: Declare our package name and use a few modules. We use the POSIX date formatting string routine strftime here.
Lines 10-15: Define our data source name and connect to the database. Since we're using the Apache::DBI module (not seen here), our database connection is cached one per child.
Lines 16-23: Define the SQL statement that will be executed every time we get a web request we'd like to log. Note that we're using placeholders for all the values we're inserting.
Line 24: Prepare the statement handle. Since this statement handle is a package variable, we can access it within the subroutine handler which gets called with each web request.
Lines 26-28: Start the handler routine, get the request object and fetch the URL from it.
Line 29: If the URL is a graphic we don't want to deal with it. This is strictly a policy decision -- if a particular virtual host wants to log the requests to graphics, we can not perform this check. We can (and do) other types of checks here as well -- perhaps one website is driven by templates and server-side includes (SSI). We probably don't want to log the requests for the pages referenced by SSI calls. A good regex will normally take care of anything we want to screen out here.
The constant DECLINED is imported with the Apache::Constants module and lets the server know that we haven't logged this request and don't want to.
Lines 30-38: Retrieve the variables we want from the Apache request object.
Lines 39-40: Execute our prepared statement handle with the values from the request object.

This is a very simple application of a mod_perl module, but it gives a good idea of how you can use the DBI and mod_perl together.

Tie::DBI

The Tie::DBI module, by Lincoln Stein, allows you to tie a hash to a DBI database table: changes made to the hash get made automatically to the table. Very nifty. It has quite a bit of overhead, however.

DBIx::Recordset

Similar to Tie::DBI, except it can do joins and a number of other very useful actions in an object-oriented fashion.

Win32::ODBC

Win32::ODBC is a module for Win32 systems only that allows more access to native ODBC functions than DBD::ODBC. However, it's generally a one-way street -- applications written for the DBI aren't directly compatible with Win32::ODBC and vice versa. There are compatibility wrappers written to move betwen the two, but I am not familiar with how effective they are.

Others

The Apache::Session module allows you to create persistent data among web sessions. One method to store the data is the DBI.

LINKS

Cribbed from various sources, including the DBI documentation and an intro list Monty Dickerson <Monty.Dickerson@econophone.com> put together.

DBI materials

A number of hints have been dropped by Tim Bunce about a DBI book being published by O'Reilly. The grapevine says early Fall 1999 for a publishing date. Time will tell..

FAQ

 http://www.hermetica.com/technologia/perl/DBI/doc/faq.html

DBI Frequently Asked Questions; by Alligator Descartes <descarte@arcana.co.uk> This document serves to answer the most frequently asked questions on both the DBI Mailing Lists and personally to members of the DBI development team.

Article from The Perl journal about the DBI

 http://www.arcana.co.uk/technologia/perl/DBI/doc/tpj5/

DBI Mailing Lists

Information (subscribing and unsubscribing):

 http://www.isc.org/dbi-lists.html

Archives:

 http://www.bitmechanic.com/mail-archives/dbi-users/ (US)
 http://outside.organic.com/mail-archives/dbi-users/ (US)
 http://www.xray.mpe.mpg.de/mailing-lists/dbi/ (Europe)

DBI Examples

 http://eskimo.tamu.edu/~jbaker/dbi-examples.html

The purpose of this document is to provide some examples for using the Perl DBI modules to access databases. The motivation for this is to relieve some of the FAQ traffic on the DBI mailing list. Over the course of this document, we will build a robust DBI program to access an Oracle database. We will start with DBI basics, then introduce concepts to improve performance and reliability.

Hotwired tutorial

 http://www.hotwired.com/webmonkey/backend/tutorials/tutorial1.html

Web Database Crash Course; by Richard Dice <rdice@pobox.com> This five-part crash course will take you, step by step, from general information about your database options to the nitty-gritty of setting up an online database.

Perl-Win32 Database FAQ

 http://www.geocities.com/SiliconValley/Way/6278/perl-win32-database.html

This is a result of seeing too many questions about the same subject in mailing list Perl-Win32-Database.

SQL and RDBMS

Structured Query Language and Relational Database Management Systems (RDBMS) references:

Philip and Alex's Guide to Web Publishing

 http://www.photo.net/wtr/thebook/

(see the CGI warnings at /wtr/thebook/server-programming.html)

Anyone developing web applications should read this book. It's even online -- you'll also get to see some fantastic photos along the way.

SQL Tutorial

 http://w3.one.net/~jhoffman/sqltut.htm

Advantages of a DBMS

 http://www.compapp.dcu.ie/databases/f001.html

Ask the SQL Pro

 http://www.inquiry.com/techtips/thesqlpro/

SQL field types

 ftp://jerry.ece.umassd.edu/isowg3/dbl/SQL_Registry

mSQL PC Tutorials

 http://blnet.com/msqlpc/tutorials.htm

This page contains documentation for newbies on relational and client/server databases in general.

Database vendors

These are the vendors mentioned in this document:

MySQL

 http://www.mysql.com/ or 
 http://www.tcx.se/

Sybase

 http://www.sybase.com/

Oracle

 http://www.oracle.com/

MiniSQL

 http://www.hughes.com.au/

xBase file specifications

 http://www.e-bachmann.dk/docs/xbase.htm

Miscellanous

Writing Apache Modules with Perl and C

by Lincoln Stein and Doug MacEachern. The mod_perl book. Learn it, love it, live it if you're working with Apache and perl. See:

 http://www.oreilly.com/catalog/wrapmod/

AUTHOR

This document was created by Chris Winters. All mistakes and omissions are his own; all worthwhile knowledge was gleaned from the community at-large. Special thanks go to Tim Bunce for the DBI module and unfailing patience; Michael Peppler for the DBD::Sybase module and his uncanny ability to monitor every Sybase mailing list/newsgroup in the known universe; Jochen Wiedmann for the DBD::mysql module; Edmund Mergl for Apache::DBI; everyone else who has contributed to the DBI; and the DC Perlmongers group for hearing a very, very early version of this presentation.

Please e-mail comments, suggestions, corrections to <chris@cwinters.com> The most current version of this document (maybe even with slides!) is at:

 http://www.cwinters.com/tech/

A possibly updated version of this will be on the yapc website.

REVISIONS

0.5 (1999 June)
Major overhaul for yapc. Lots more examples, same banter.
0.1 (1998 October)
First version. Stepped through all major DBI methods and illustrated some day-to-day usage. Included error handling, binding parameters in and out, some links and witty banter.

Perl and Databases

Table of Contents