11.2. DBM Modules

A DBM-like file is a file that contains pairs of strings (key, data), with support for fetching or storing the data given a key, known as keyed access. DBM-like files were developed on early Unix systems, with functionality roughly equivalent to that of access methods popular on mainframes and minicomputers of the time, such as ISAM, the Indexed-Sequential Access Method. Today, many libraries, available for many platforms, let programs written in many different languages create, update, and read DBM-like files.

Keyed access, while not as powerful as the data access functionality of relational databases, may often suffice for a program's needs. If DBM-like files are sufficient, you may end up with a program that is smaller and faster than one using an RDBMS.

The classic dbm library, whose first version introduced DBM-like files many years ago, has limited functionality but tends to be available on many Unix platforms. The GNU version, gdbm, is richer and very widespread. The BSD version, dbhash, offers superior functionality. Python supplies modules that interface with each of these libraries if the relevant underlying library is installed on your system. Python also offers a minimal DBM module, dumbdbm (usable anywhere, as it does not rely on other installed libraries), and generic DBM modules, which are able to automatically identify, select, and wrap the appropriate DBM library to deal with an existing or new DBM file. Depending on your platform, your Python distribution, and what dbm-like libraries you have installed on your computer, the default Python build may install some subset of these modules. In general, as a minimum, you can rely on having module dbm on all Unix-like platforms, module dbhash on Windows, and dumbdbm on any platform.

11.2.1. The anydbm Module

The anydbm module is a generic interface to any other DBM module. anydbm supplies a single factory function.

open
open(filename,flag='r',mode=0666)

Opens or creates the DBM file named by filename (a string that can be any path to a file, not just a name) and returns a mapping object corresponding to the DBM file. When the DBM file already exists, open uses module whichdb to determine which DBM library can handle the file. When open creates a new DBM file, open chooses the first available DBM module in order of preference: dbhash, gdbm, dbm, or dumbdbm.

flag is a one-character string that tells open how to open the file and whether to create it, as shown in Table 11-1. mode is an integer that open uses as the file's permission bits if open creates the file, as covered in "Creating a File Object with open" on page 216. Not all DBM modules use flags and mode, but for portability's sake you should always supply appropriate values for these arguments when you call anydbm.open.

Table 11-1. flag values for anydbm.open
Flag  Read-only?  If file exists            If file does not exist
'r'   Yes         open opens the file.      open raises error.
'w'   No          open opens the file.      open raises error.
'c'   No          open opens the file.      open creates the file.
'n'   No          open truncates the file.  open creates the file.
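
The flag behavior in Table 11-1 can be checked with a short script. This is a minimal sketch, not a definitive recipe: the file name flagdemo and the temporary directory are hypothetical, and the import fallback assumes that on Python 3 the same interface is available under the module name dbm.

```python
import os
import tempfile

try:
    import anydbm                  # Python 2 name, as used in this section
except ImportError:
    import dbm as anydbm           # assumption: Python 3, where the module is named dbm

# hypothetical file in a fresh scratch directory, so it does not exist yet
path = os.path.join(tempfile.mkdtemp(), 'flagdemo')

# 'r' requires the file to exist already, so this open fails
try:
    anydbm.open(path, 'r')
    missing_file_raised = False
except anydbm.error:
    missing_file_raised = True

# 'c' creates the file when it is missing and opens it read-write
db = anydbm.open(path, 'c')
db['color'] = 'blue'
db.close()

# 'c' on an existing file just opens it: the stored item is still there
db = anydbm.open(path, 'c')
kept = len(db.keys())
db.close()

# 'n' always starts from an empty file, discarding existing items
db = anydbm.open(path, 'n')
after_truncate = len(db.keys())
db.close()
```

Because 'n' silently discards existing data, 'c' is usually the safer choice when a program only wants to make sure the file exists.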

anydbm.open returns a mapping object m with a subset of the functionality of dictionaries (covered in "Dictionary Operations" on page 59). m only accepts strings as keys and values, and the only mapping methods m supplies are m.has_key and m.keys. You can bind, rebind, access, and unbind items in m with the same indexing syntax m[key] that you would use if m were a dictionary. If flag is 'r', m is read-only, so that you can only access m's items, not bind, rebind, or unbind them. One extra method that m supplies is m.close, with the same semantics as the close method of a file object. Just like for file objects, you should ensure m.close() is called when you're done using m. The try/finally statement (covered in "try/finally" on page 123) is the best way to ensure finalization (in Python 2.5, the with statement, covered in "The with statement" on page 125, is even better than try/finally).
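
The following sketch exercises the operations just described (bind, rebind, unbind, access, and keys) inside try/finally, so that the file is closed even if an operation fails. The file name fruit and the stored items are hypothetical. The import fallback assumes Python 3's dbm module, whose objects return values as byte strings, which is why the comparison in the last line uses a bytes literal.

```python
import os
import tempfile

try:
    import anydbm                  # Python 2 name, as used in this section
except ImportError:
    import dbm as anydbm           # assumption: Python 3, where the module is named dbm

path = os.path.join(tempfile.mkdtemp(), 'fruit')   # hypothetical file name

m = anydbm.open(path, 'c')
try:
    m['apple'] = 'red'             # bind
    m['apple'] = 'green'           # rebind
    m['pear'] = 'yellow'
    del m['pear']                  # unbind
    value = m['apple']             # access
    names = m.keys()               # one of the few mapping methods m supplies
finally:
    m.close()                      # runs whether or not the body raised

# On Python 2, b'green' is an ordinary str, so this test works on both versions
is_green = (value == b'green')
```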

11.2.2. The dumbdbm Module

The dumbdbm module supplies minimal DBM functionality and mediocre performance. dumbdbm's advantage is that you can use it anywhere, since dumbdbm does not rely on any library. You don't normally import dumbdbm; rather, import anydbm, and let anydbm supply your program with the best DBM module available, defaulting to dumbdbm if nothing better is available on the current Python installation. The only case in which you import dumbdbm directly is the rare one in which you need to create a DBM-like file that you can later read from any Python installation. Module dumbdbm supplies an open function and an exception class error polymorphic to anydbm's.
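
For the rare case in which you do import dumbdbm directly, a minimal sketch might look as follows. The file name portable and the stored item are hypothetical, and the import fallback assumes that Python 3 supplies the same module as dbm.dumb.

```python
import os
import tempfile

try:
    import dumbdbm                 # Python 2 name, as used in this section
except ImportError:
    import dbm.dumb as dumbdbm     # assumption: Python 3 equivalent

path = os.path.join(tempfile.mkdtemp(), 'portable')   # hypothetical file name

# write one item, making sure the file gets closed
db = dumbdbm.open(path, 'c')
try:
    db['greeting'] = 'hello'
finally:
    db.close()

# reopen read-only and fetch the item back
db = dumbdbm.open(path, 'r')
try:
    greeting = db['greeting']
finally:
    db.close()
```

Since dumbdbm.open returns an object with the same interface as the one anydbm.open returns, the rest of a program does not need to know which module created the file.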

11.2.3. The dbm, gdbm, and dbhash Modules

The dbm module exists only on Unix platforms, where it can wrap any of the dbm, ndbm, and gdbm libraries, since each supplies a dbm-compatibility interface. You hardly ever import dbm directly; rather, import anydbm, and let anydbm supply your program with the best DBM module available, including dbm if appropriate. Module dbm supplies an open function and an exception class error polymorphic to anydbm's.

The gdbm module wraps the GNU DBM library, gdbm. The gdbm.open function accepts other values for the flag argument and returns a mapping object m with a few extra methods. You may import gdbm directly to access nonportable functionality. I do not cover gdbm specifics in this book, since I focus on cross-platform Python.

The dbhash module wraps the BSDDB library in a DBM-compatible way. The dbhash.open function accepts other values for the flag argument and returns a mapping object m with a few extra methods. You may import dbhash directly to access nonportable functionality. For full access to the BSD DB functionality, however, you should instead import bsddb, as covered in "Berkeley DB Interfacing" on page 288.

11.2.4. The whichdb Module

The whichdb module attempts to guess which of the several DBM modules is appropriate to use. whichdb supplies a single function.

whichdb
whichdb(filename)

Opens the file specified by filename to discover which DBM-like package created the file. whichdb returns None if the file does not exist or cannot be opened and read. whichdb returns '' if the file exists and can be opened and read, but it cannot be determined which DBM-like package created the file (typically, this means that the file is not a DBM file). whichdb returns a string that names a module, such as 'dbm', 'dumbdbm', or 'dbhash', if it finds out which module can read the DBM-like file.
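
The three kinds of result can be seen in a short sketch. All file names here are hypothetical, and the import fallback assumes Python 3, where whichdb is a function of the dbm module and the module names it returns differ (for example, 'dbm.dumb' rather than 'dumbdbm').

```python
import os
import tempfile

try:
    from whichdb import whichdb    # Python 2 name, as used in this section
    import dumbdbm
except ImportError:
    from dbm import whichdb        # assumption: Python 3 location
    import dbm.dumb as dumbdbm

workdir = tempfile.mkdtemp()

# 1. The file does not exist: whichdb returns None
missing = whichdb(os.path.join(workdir, 'no_such_file'))

# 2. The file exists but is not a DBM file: whichdb returns ''
text_path = os.path.join(workdir, 'notes.txt')
f = open(text_path, 'w')
f.write('just some text, not a DBM file\n')
f.close()
not_dbm = whichdb(text_path)

# 3. The file was created by a known module: whichdb returns that module's name
dumb_path = os.path.join(workdir, 'dumbfile')
db = dumbdbm.open(dumb_path, 'c')
db['key'] = 'value'
db.close()
made_by = whichdb(dumb_path)
```

Note that None and '' are both false in a Boolean context, so test with "is None" when you need to tell a missing file apart from an unrecognized one.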

11.2.5. Examples of DBM-Like File Use

Keyed access is suitable when your program needs to record persistently the equivalent of a Python dictionary, with strings as both keys and values. For example, suppose you need to analyze several text files, whose names are given as your program's arguments, and record where each word appears in those files. In this case, the keys are words and, therefore, intrinsically strings. The data you need to record for each word is a list of (filename, line-number) pairs. However, you can encode the data as a string in several ways: for example, by exploiting the fact that the path separator string os.pathsep (covered in "Path-String Attributes of the os Module" on page 241) does not normally appear in filenames. (Note that more solid, general, and reliable approaches to the general issue of encoding data as strings are covered in "Serialization" on page 278.) With this simplification, the program that records word positions in files might be as follows:

import fileinput, os, anydbm

wordPos = {}
sep = os.pathsep
# map each word to a list of 'filename<sep>lineno' strings
for line in fileinput.input():
    pos = '%s%s%s' % (fileinput.filename(), sep, fileinput.filelineno())
    for word in line.split():
        wordPos.setdefault(word, []).append(pos)
# store each word's positions as one string, joined by a doubled separator
dbmOut = anydbm.open('indexfile', 'n')
sep2 = sep * 2
for word in wordPos:
    dbmOut[word] = sep2.join(wordPos[word])
dbmOut.close()
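
The encoding scheme itself can be tried in isolation, without any DBM file. In this sketch, the helper names encode_positions and decode_positions and the sample file names are hypothetical. The scheme relies on the same assumption as the program above, namely that os.pathsep does not appear in the filenames.

```python
import os

sep = os.pathsep      # separates a filename from its line number
sep2 = sep * 2        # separates one position from the next

def encode_positions(positions):
    """Join (filename, lineno) pairs into the single string stored per word."""
    return sep2.join(['%s%s%s' % (fname, sep, lineno)
                      for fname, lineno in positions])

def decode_positions(data):
    """Split a stored string back into a list of (filename, lineno) pairs."""
    result = []
    for place in data.split(sep2):
        fname, lineno = place.split(sep)
        result.append((fname, int(lineno)))
    return result

positions = [('intro.txt', 3), ('notes.txt', 10)]   # hypothetical sample data
encoded = encode_positions(positions)
decoded = decode_positions(encoded)
```

Decoding reverses encoding only as long as the assumption about os.pathsep holds, which is why the more general serialization approaches mentioned earlier are preferable in production code.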

We can read back the data stored to the DBM-like file indexfile in several ways. The following example accepts words as command-line arguments and prints the lines where the requested words appear:

import sys, os, anydbm, linecache

dbmIn = anydbm.open('indexfile')
sep = os.pathsep
sep2 = sep * 2
for word in sys.argv[1:]:
    if not dbmIn.has_key(word):
        sys.stderr.write('Word %r not found in index file\n' % word)
        continue
    # each place is 'filename<sep>lineno'; places are joined by sep2
    places = dbmIn[word].split(sep2)
    for place in places:
        fname, lineno = place.split(sep)
        print "Word %r occurs in line %s of file %s:" % (word, lineno, fname)
        # trailing comma: the line from linecache already ends with a newline
        print linecache.getline(fname, int(lineno)),
dbmIn.close()