Handling Databases with Perl DBI

Handling Databases with Perl DBI.

Perl DBI (CPAN doc)

Perl DBI is used to interact with databases, DBI stands for DataBase Independent Interface.

The DBI architecture is split into two main groups of software, the DBI itself, and the drivers.

             |<- Scope of DBI ->|
                  .-.   .--------------.   .--------------.
  .-------.       | |---| MySQL Driver |---| MySQL Engine |
  | Perl  |       | |   `--------------'   `--------------'
  | script|  |A|  |D|   .--------------.   .--------------.
  | using |--|P|--|B|---|Postgre Driver|---|Postgre Engine|
  | DBI   |  |I|  |I|   `--------------'   `--------------'
  | API   |       | |...
  |methods|       | |... Other drivers
  `-------'       | |...
                  `-'

Specific drivers are implemented for each different type of database and actually perform the operations on the databases

Handles

The DBI defines three main types of objects, known as handles, to interact with databases:

  • Handles for drivers represent loaded database drivers (one for PostgreSQL, an other for Oracle if needed, …). They are used by the DBI to create handles for database connections.
    Driver handles are normally not explicitly referenced, they are only for internal use within the DBI.
  • Handles for database connections ($dbh) encapsulate a single connection to a particular database. They are used to create handles for statements.
  • Handles for statements ($sth) encapsulate individual SQL statements to be executed within the database. Multiple statements can be created and executed within one script, and the data can be processed as it returns.

Database Driver and Parameter

First you need to tell the DBI which database you want to use (PostgreSQL, MySQL, etc.), where to find the database server (host, port, etc.) and the parameters for the connection (database, user, password, etc).
This done with a string called “data source name”. It starts with the characters dbi:, then the name of the driver, followed by another colon, what follows is passed to the driver’s own connect() method to be interpreted as it sees fit. Example for PostgreSQL:

DBI->connect( "dbi:Pg:dbname=database,username,password");

The instantiation of driver handles happens “under the hood” of DBI, when DBI->connect() is called. DBI->connect() returns a connection handle.

The data source string and its parameter syntax is different for every database.
For PostgreSQL all parameters, including the userid and password parameter of the connect command, have a hard-coded default which can be overridden by setting appropriate environment variables:

  Parameter  Environment Variable  Default
  ---------  --------------------  --------------
  dbname     PGDATABASE            current userid
  host       PGHOST                localhost
  port       PGPORT                5432
  options    PGOPTIONS             ""
  tty        PGTTY                 ""
  username   PGUSER                current userid
  password   PGPASSWORD            ""

You can get a list of all the available drivers installed on your machine by using the DBI->available_drivers() method.
Then you can invoke the DBI->data_sources() method against one or more of the drivers returned by DBI->available_drivers() to enumerate which data sources are known to the driver.

my @drivers = DBI->available_drivers();
foreach $driver ( @drivers ) {
    print "driver: $driver \n";
    my @sources = DBI->data_sources( $driver );
    print "\t sources: @sources\n\n";
}

Output (on my machine):

driver: ExampleP
	 sources: dbi:ExampleP:dir=.

driver: Pg
	 sources: dbi:Pg:dbname=dump_test
	          dbi:Pg:dbname=template0
		  dbi:Pg:dbname=template1
		  dbi:Pg:dbname=testbase
		  dbi:Pg:dbname=workshop

Database Handle: Connection and Disconnection

You connect to a database with the DBI->connect() method:

$DB_user    = 'me';
$DB_name    = 'workshop';
$TABLE_name = 'stud';
$DB_pwd     = '';

$dbh = DBI->connect("dbi:Pg:dbname=$DB_name","$DB_user","$DB_pwd");

$dbh->disconnect();

DBI->connect() instantiates the driver and returns a connection handle.

disconnect() closes the connection again.


Error Handling

The DBI performs basic automatic error reporting when the PrintError attribute is enabled (default). To disable this feature, set the value to 0 either via the connection handle, or via the attribute hash of the connect() method:

$dbh->{PrintError} = 0; # disable
$dbh->{PrintError} = 1; # enable

Or:

$dbh = DBI->connect("dbi:Pg:dbname=$DB_name","$DB_user","$DB_pwd", {
            PrintError => 0
        } )
       or die "Cannot connect: $DBI::errstr\n";

Upon failure connect() returns the value undef, which can be used to perform error checking on the call and the error message contained within the variable $DBI::errstr will be printed.

$DBI::errstr() is the string containing a description of the error, as provided by the underlying database. This string corresponds to the error number stored in $DBI::err().
$DBI::state is the string in the format of the standard SQLSTATE five-character error string.

Alternatively the functions $h->errstr(), $h->err() and $h->state() can be used, they return the values of the variables listed above.


Queries

Before the statement can be executed, it needs to be prepared for execution:

$sth  = $dbh->prepare("SELECT * FROM stud");

The prepare() function returns a statement handle (commonly called $sth).
Once a statement is prepared, you can execute it:

my $rv = $sth->execute;

The return value from execute() should be nonzero. If it is 0, you need to deal with errors in an appropriate way:

if (!$rv) {
  handleError($dbh->errstr);
  ...
}
else {
  ...
}

Why First Prepare?

The reason is Performance. With prepare you can use placeholders instead of literal values. With placeholders statements only needs to be prepared once. The bind values for each row can be given to the execute method each time it is called. By avoiding the need to re-prepare the statement for each row, the application typically runs many times faster. Example:

my $sth = $dbh->prepare(q{
  INSERT INTO sales (product_code, price) VALUES (?, ?)
}) or die $dbh->errstr;
while (<>) {
    chomp;
    my ($product_code, $price) = split /,/;
    $sth->execute($product_code, $price)
     or die $dbh->errstr;
}

(See execute and bind_param for more details.)
The do() method is fuse of prepare() and execute(). It can only be used for non non-SELECT statement, where you do not need the statement handle to access the results of the query:

$rows_affected = $dbh->do(
                 "UPDATE your_table SET foo = foo + 1");

do() returns the number of affected rows.
When you are done with the query, you should note that to Perl, so that associated information can be released:

$sth->finish;

Fetching the Data

To fetch the results of a SELECT command a row at a time you can use $sth->fetchrow_array(). It returns a new row at each call or the undefined value when no more data is left (this can be used as loop condition):

while ( ($id,$name) = $sth->fetchrow_array() )
{
    	print "$id\t\t $name \n";
}

Output:

1		 fred
3		 tom
2		 lisa
5		 BoB

Alternatively you can use the $sth->rows() function for a loop condition. It returns the number of rows affected by the last row affecting command, or -1 if the number of rows is not known or not available (the rows function is not supported by all database drivers). Example:

$rows = $sth->rows();

for ($i = 0; $i < $rows; $i++)
{
    ($id,$name) = $sth->fetchrow_array();
	print "$id\t\t $name \n";
}

$sth->fetchrow_arrayref() fetches the next row of data and returns a reference to an array holding the field values. Null fields are returned as undef values in the array. This is the fastest way to fetch data.
If there are no more rows or if an error occurs, then fetchrow_arrayref returns an undef.
Note that the same array reference is returned for each fetch.
Example:

while ( $ref = $sth->fetchrow_arrayref() )
{
    print "@{$ref}\n";
}

Output:

1 fred 2 bio m
3 tom 1 bio m
2 lisa 2 bio f
5 BoB  bio f

$sth->fetchrow_hashref() fetches the next row of data and returns it as a reference to a hash containing field name and field value pairs. Null fields are returned as undef values in the hash. Example:

while ( $ref = $sth->fetchrow_hashref() )
{
    print "$$ref{'id'} \t $$ref{'name'}\n";
}

Output:

1		 fred
3		 tom
2		 lisa
5		 BoB

$sth->fetchall_arrayref() can be used to fetch all the data. It returns a reference to an array that contains one reference per row. Example:

$ref = $sth->fetchall_arrayref;
foreach $row ( @{$ref} ) {
    print "@$row\n";
}

Output:

1 fred 2 bio m
3 tom 1 bio m
2 lisa 2 bio f
5 BoB  bio f

$sth->fetchall_hashref($key_field) can be used to fetch all the data. It returns a reference to a hash that contains, at most, one entry per row.
If there are no rows to return, fetchall_hashref returns a reference to an empty hash.
The $key_field parameter provides the name of the field that holds the value to be used for the key for the returned hash.
Example:

$ref = $sth->fetchall_hashref('id');
print "Name for id 1 is $ref->{1}->{name}\n";
foreach $key ( keys(%{$ref}) ) {
    print "$ref->{$key}->{'name'}\n";
}

Output:

Name for id 1 is fred
tom
lisa
fred
BoB

Complete Example

#!/usr/bin/perl

use DBI;

$DB_name    = 'workshop';
$DB_user    = 'me';
$DB_pwd     = '';

$dbh = DBI->connect("dbi:Pg:dbname=$DB_name","$DB_user","$DB_pwd");

print "\nConnection error: $DBI::errstr\n\n";

$sth  = $dbh->prepare("SELECT * FROM stud");
$sth->execute();

while ( ($id,$name) = $sth->fetchrow_array() )
{
    	print "$id\t\t $name \n";
}

$sth->finish();

$dbh->disconnect();

Output:

Connection error: 

1		 fred
3		 tom
2		 lisa
5		 BoB

Debugging with trace

DBI trace information can be enabled for all handles using the trace DBI class method. To enable trace information for a specific handle, use the $h->trace method.

  DBI->trace($trace_level)
  DBI->trace($trace_level, $trace_filename)

Initially trace output is written to STDERR. If $trace_filename is specified and can be opened in append mode then all trace output (including that from other handles) is redirected to that file.

Trace levels are as follows:

  0 - Trace disabled.
  1 - Trace DBI method calls returning with results or errors.
  2 - Trace method entry with parameters and returning with results.
  3 - As above, adding some high-level information from the driver
      and some internal information from the DBI.
  4 - As above, adding more detailed information from the driver.
  5 and above - As above but with more and more obscure information.

Trace level 1 is best for a simple overview of what is happening. Trace level 2 is a good choice for general purpose tracing. Levels 3 and above (up to 9) are best reserved for investigating a specific problem, when you need to see “inside” the driver and DBI.

Example (Add the following line at the top of the “Complete Example”):

DBI->trace( 1 );

Output:

DBI 1.28-nothread dispatch trace level set to 1
Note: perl is running without the recommended perl -w option
-> DBI->connect(dbi:Pg:dbname=workshop, felix, ****)
-> DBI->install_driver(Pg) for linux perl=5.008 pid=25415
   install_driver: DBD::Pg version 1.13
   loaded from /usr/lib/perl5/site_perl/5.8.0/
               i586-linux-thread-multi/DBD/Pg.pm
<- install_driver= DBI::dr=HASH(0x8232c7c)
pg_db_login
<- connect('dbname=workshop' 'felix' ...)=
DBI::db=HASH(0x8296b44) at DBI.pm line 490
dbd_db_STORE
<- STORE('PrintError' 1)= 1 at DBI.pm line 532
dbd_db_STORE
<- STORE('AutoCommit' 1)= 1 at DBI.pm line 532
<- connect= DBI::db=HASH(0x8296b44)

Connection error: 

dbd_st_prepare: statement = >SELECT * FROM stud<
dbd_st_preparse: statement = >SELECT * FROM stud<
<- prepare('SELECT * FROM stud')= DBI::st=HASH(0x8296c04)
at DBI_pg.comlpete.pl line 15
dbd_st_execute
<- execute= 9 at DBI_pg.comlpete.pl line 16
dbd_st_fetch
<- fetchrow_array= ( '1' 'fred' '2' 'bio' 'm' )
[5 items] row1 at DBI_pg.comlpete.pl line 18
1		 fred
dbd_st_fetch
3		 tom
dbd_st_fetch

...
...
...

(See also the DEBUGGING section.)


DBI Function Overview (CPAN doc)

use DBI;

@driver_names = DBI->available_drivers;
@data_sources = DBI->data_sources($driver_name, \%attr);

$dbh = DBI->connect($data_source, $username, $auth, \%attr);

$rv  = $dbh->do($statement);
$rv  = $dbh->do($statement, \%attr);
$rv  = $dbh->do($statement, \%attr, @bind_values);

$ary_ref  = $dbh->selectall_arrayref($statement);
$hash_ref = $dbh->selectall_hashref($statement, $key_field);

$ary_ref = $dbh->selectcol_arrayref($statement);
$ary_ref = $dbh->selectcol_arrayref($statement, \%attr);

@row_ary  = $dbh->selectrow_array($statement);
$ary_ref  = $dbh->selectrow_arrayref($statement);
$hash_ref = $dbh->selectrow_hashref($statement);

$sth = $dbh->prepare($statement);
$sth = $dbh->prepare_cached($statement);

$rc = $sth->bind_param($p_num, $bind_value);
$rc = $sth->bind_param($p_num, $bind_value, $bind_type);
$rc = $sth->bind_param($p_num, $bind_value, \%attr);

$rv = $sth->execute;
$rv = $sth->execute(@bind_values);

$rc = $sth->bind_param_array($p_num, $bind_values, \%attr);
$rv = $sth->execute_array(\%attr);
$rv = $sth->execute_array(\%attr, @bind_values);

$rc = $sth->bind_col($col_num, \$col_variable);
$rc = $sth->bind_columns(@list_of_refs_to_vars_to_bind);

@row_ary  = $sth->fetchrow_array;
$ary_ref  = $sth->fetchrow_arrayref;
$hash_ref = $sth->fetchrow_hashref;

$ary_ref  = $sth->fetchall_arrayref;
$ary_ref  = $sth->fetchall_arrayref( $slice, $max_rows );

$hash_ref = $sth->fetchall_hashref( $key_field );

$rv  = $sth->rows;

$rc  = $dbh->begin_work;
$rc  = $dbh->commit;
$rc  = $dbh->rollback;

$quoted_string = $dbh->quote($string);

$rc  = $h->err;
$str = $h->errstr;
$rv  = $h->state;

$rc  = $dbh->disconnect;