11.2. DBM Modules

A DBM-like file is a file that contains pairs of strings (key, data), with support for fetching or storing the data given a key, known as keyed access. DBM-like files were developed on early Unix systems, with functionality roughly equivalent to that of access methods popular on mainframes and minicomputers of the time, such as ISAM, the Indexed-Sequential Access Method. Today, many libraries, available for many platforms, let programs written in many different languages create, update, and read DBM-like files.

Keyed access, while not as powerful as the data access functionality of relational databases, may often suffice for a program's needs. If DBM-like files are sufficient, you may end up with a program that is smaller and faster than one using an RDBMS.

The classic dbm library, whose first version introduced DBM-like files many years ago, has limited functionality but tends to be available on many Unix platforms. The GNU version, gdbm, is richer and very widespread. The BSD version, dbhash, offers superior functionality. Python supplies modules that interface with each of these libraries if the relevant underlying library is installed on your system. Python also offers a minimal DBM module, dumbdbm (usable anywhere, as it does not rely on other installed libraries), and generic DBM modules, which are able to automatically identify, select, and wrap the appropriate DBM library to deal with an existing or new DBM file.

Depending on your platform, your Python distribution, and what dbm-like libraries you have installed on your computer, the default Python build may install some subset of these modules. In general, as a minimum, you can rely on having module dbm on all Unix-like platforms, module dbhash on Windows, and dumbdbm on any platform.

11.2.1. The anydbm Module

The anydbm module is a generic interface to any other DBM module. anydbm supplies a single factory function.
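That factory function is open, which takes a filename and an optional flag: 'r' (the default) opens an existing file read-only, 'w' opens an existing file read-write, 'c' creates the file if needed, and 'n' always creates a new, empty file. The following is a minimal sketch of typical use, not a definitive recipe. The scratch directory and the file name demo are hypothetical, and the fallback import is an assumption that lets the sketch also run on Python 3, where anydbm was renamed dbm and keys and values are byte strings.

```python
import os
import tempfile

try:
    import anydbm                # Python 2 name, as used in this chapter
except ImportError:
    import dbm as anydbm         # assumption: Python 3, where anydbm became dbm

# Hypothetical scratch location; any writable path would do.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'demo')

# 'c' opens the file read-write, creating it if it does not exist yet.
db = anydbm.open(path, 'c')
db[b'spam'] = b'1'               # keys and values must both be strings
db[b'eggs'] = b'2'
db.close()

# The default flag, 'r', opens an existing file read-only.
db = anydbm.open(path)
spam_value = db[b'spam']
keys = sorted(db.keys())
db.close()

# Opening a file that does not exist, without 'c' or 'n', raises anydbm.error.
try:
    anydbm.open(os.path.join(workdir, 'absent'), 'r')
    missing_raised = False
except anydbm.error:
    missing_raised = True
```

Which underlying module actually handles the file depends on what is installed, which is the point of going through anydbm rather than naming a specific module.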
11.2.2. The dumbdbm Module

The dumbdbm module supplies minimal DBM functionality and mediocre performance. dumbdbm's advantage is that you can use it anywhere, since dumbdbm does not rely on any library. You don't normally import dumbdbm; rather, import anydbm, and let anydbm supply your program with the best DBM module available, defaulting to dumbdbm if nothing better is available on the current Python installation. The only case in which you import dumbdbm directly is the rare one in which you need to create a DBM-like file that you can later read from any Python installation. Module dumbdbm supplies an open function and an exception class error polymorphic to anydbm's.

11.2.3. The dbm, gdbm, and dbhash Modules

The dbm module exists only on Unix platforms, where it can wrap any of the dbm, ndbm, and gdbm libraries, since each supplies a dbm-compatibility interface. You hardly ever import dbm directly; rather, import anydbm, and let anydbm supply your program with the best DBM module available, including dbm if appropriate. Module dbm supplies an open function and an exception class error polymorphic to anydbm's.

The gdbm module wraps the GNU DBM library, gdbm. The gdbm.open function accepts other values for the flag argument and returns a mapping object m with a few extra methods. You may import gdbm directly to access nonportable functionality. I do not cover gdbm specifics in this book, since I focus on cross-platform Python.

The dbhash module wraps the BSDDB library in a DBM-compatible way. The dbhash.open function accepts other values for the flag argument and returns a mapping object m with a few extra methods. You may import dbhash directly to access nonportable functionality. For full access to the BSD DB functionality, however, you should instead import bsddb, as covered in "Berkeley DB Interfacing" on page 288.

11.2.4. The whichdb Module

The whichdb module attempts to guess which of the several DBM modules is appropriate to use. whichdb supplies a single function.
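That function is also named whichdb. It takes a filename and returns the name of the module that should be able to open the file, an empty string if the file exists but its format cannot be guessed, or None if the file cannot be read at all. The sketch below illustrates the three outcomes under stated assumptions: the scratch directory and file names are hypothetical, dumbdbm is used to create the file only because it is available everywhere, and the fallback imports are an assumption for Python 3, where whichdb lives in the dbm package and dumbdbm is dbm.dumb.

```python
import os
import tempfile

try:
    from whichdb import whichdb      # Python 2 module, as described in the text
    import dumbdbm
except ImportError:
    from dbm import whichdb          # assumption: Python 3 moved it into dbm
    import dbm.dumb as dumbdbm       # assumption: Python 3 name for dumbdbm

# Hypothetical scratch location for the demonstration files.
workdir = tempfile.mkdtemp()

# Create a small DBM-like file with the always-available dumbdbm module.
path = os.path.join(workdir, 'portable')
db = dumbdbm.open(path, 'c')
db[b'key'] = b'value'
db.close()

# An ordinary text file that is not in any DBM format.
plain = os.path.join(workdir, 'plain.txt')
f = open(plain, 'w')
f.write('just some plain text, not a DBM file\n')
f.close()

kind = whichdb(path)                 # 'dumbdbm' on Python 2, 'dbm.dumb' on Python 3
unknown = whichdb(plain)             # '': file exists, format not recognized
missing = whichdb(os.path.join(workdir, 'no-such-file'))   # None: cannot be read
```

You rarely need to call whichdb yourself, since anydbm.open uses it internally when opening an existing file.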
11.2.5. Examples of DBM-Like File Use

Keyed access is suitable when your program needs to record persistently the equivalent of a Python dictionary, with strings as both keys and values. For example, suppose you need to analyze several text files, whose names are given as your program's arguments, and record where each word appears in those files. In this case, the keys are words and, therefore, intrinsically strings. The data you need to record for each word is a list of (filename, line-number) pairs. However, you can encode the data as a string in several ways: for example, by exploiting the fact that the path separator string os.pathsep (covered in "Path-String Attributes of the os Module" on page 241) does not normally appear in filenames. (Note that more solid, general, and reliable approaches to the general issue of encoding data as strings are covered in "Serialization" on page 278.) With this simplification, the program that records word positions in files might be as follows:

    import fileinput, os, anydbm

    # Map each word to a list of 'filename<sep>line-number' strings.
    wordPos = {}
    sep = os.pathsep
    for line in fileinput.input():
        pos = '%s%s%s' % (fileinput.filename(), sep, fileinput.filelineno())
        for word in line.split():
            wordPos.setdefault(word, []).append(pos)

    # The 'n' flag always creates a new, empty DBM-like file.
    dbmOut = anydbm.open('indexfile', 'n')
    sep2 = sep * 2
    for word in wordPos:
        # Join the positions with a doubled separator so they can be split apart later.
        dbmOut[word] = sep2.join(wordPos[word])
    dbmOut.close()

We can read back the data stored to the DBM-like file indexfile in several ways. The following example accepts words as command-line arguments and prints the lines where the requested words appear:

    import sys, os, anydbm, linecache

    dbmIn = anydbm.open('indexfile')
    sep = os.pathsep
    sep2 = sep * 2
    for word in sys.argv[1:]:
        if not dbmIn.has_key(word):
            sys.stderr.write('Word %r not found in index file\n' % word)
            continue
        # Undo the encoding: first split positions, then filename from line number.
        places = dbmIn[word].split(sep2)
        for place in places:
            fname, lineno = place.split(sep)
            print "Word %r occurs in line %s of file %s:" % (word, lineno, fname)
            print linecache.getline(fname, int(lineno)),
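The string encoding shared by the two programs above can be isolated as a pair of helper functions, which makes the round trip easier to see. This is only a sketch: the names encode_places and decode_places and the sample data are hypothetical and do not appear in the programs themselves. Like the programs, the scheme assumes that os.pathsep never occurs inside a filename and that every word has at least one recorded position.

```python
import os

sep = os.pathsep      # separates a filename from its line number
sep2 = sep * 2        # doubled separator between successive positions

def encode_places(places):
    """Pack a list of (filename, lineno) pairs into a single string."""
    return sep2.join(['%s%s%s' % (fname, sep, lineno)
                      for fname, lineno in places])

def decode_places(data):
    """Unpack a string produced by encode_places into (filename, lineno) pairs."""
    places = []
    for place in data.split(sep2):
        fname, lineno = place.split(sep)
        places.append((fname, int(lineno)))
    return places

# Hypothetical sample data: one word seen in two files.
places = [('a.txt', 3), ('b.txt', 17)]
encoded = encode_places(places)
decoded = decode_places(encoded)
```

A filename containing the separator would break decode_places, which is one reason the general-purpose serialization approaches mentioned above are more robust.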