I l@ve RuBoard

2.5 The os Module

As mentioned, os contains all the usual operating-system calls you may have used in your C programs and shell scripts. Its calls deal with directories, processes, shell variables, and the like. Technically, this module provides POSIX tools -- a portable standard for operating-system calls -- along with platform-independent directory processing tools as nested module os.path. Operationally, os serves as a largely portable interface to your computer's system calls: scripts written with os and os.path can usually be run on any platform unchanged.

In fact, if you read the os module's source code, you'll notice that it really just imports whatever platform-specific system module you have on your computer (e.g., nt, mac, posix). See the file os.py in the Python source library directory -- it simply runs a from* statement to copy all names out of a platform-specific module. By always importing os instead of platform-specific modules, though, your scripts are mostly immune to platform implementation differences.

2.5.1 The Big os Lists

Let's take a quick look at the basic interfaces in os. If you inspect this module's attributes interactively, you get a huge list of names that will vary per Python release, will likely vary per platform, and isn't incredibly useful until you've learned what each name means:

>>> import os
>>> dir(os)
['F_OK', 'O_APPEND', 'O_BINARY', 'O_CREAT', 'O_EXCL', 'O_RDONLY', 'O_RDWR', 
'O_TEXT', 'O_TRUNC', 'O_WRONLY', 'P_DETACH', 'P_NOWAIT', 'P_NOWAITO',
'P_OVERLAY', 'P_WAIT', 'R_OK', 'UserDict', 'W_OK', 'X_OK', '_Environ',
'__builtins__', '__doc__', '__file__', '__name__', '_execvpe', '_exit',
'_notfound', 'access', 'altsep', 'chdir', 'chmod', 'close', 'curdir', 
'defpath', 'dup', 'dup2', 'environ', 'error', 'execl', 'execle', 'execlp',
'execlpe', 'execv', 'execve', 'execvp', 'execvpe', 'fdopen', 'fstat', 'getcwd',
'getpid', 'i', 'linesep', 'listdir', 'lseek', 'lstat', 'makedirs', 'mkdir',
'name', 'open', 'pardir', 'path', 'pathsep', 'pipe', 'popen', 'putenv', 'read',
'remove', 'removedirs', 'rename', 'renames', 'rmdir', 'sep', 'spawnv', 
'spawnve', 'stat', 'strerror', 'string', 'sys', 'system', 'times', 'umask', 
'unlink', 'utime', 'write']

Besides all of these, the nested os.path module exports even more tools, most of which are related to processing file and directory names portably:

>>> dir(os.path)
['__builtins__', '__doc__', '__file__', '__name__', 'abspath', 'basename', 
'commonprefix', 'dirname', 'exists', 'expanduser', 'expandvars', 'getatime',
'getmtime', 'getsize', 'isabs', 'isdir', 'isfile', 'islink', 'ismount', 'join',
'normcase', 'normpath', 'os', 'split', 'splitdrive', 'splitext', 'splitunc',
'stat', 'string', 'varchars', 'walk']

2.5.2 Administrative Tools

Just in case those massive listings aren't quite enough to go on, let's experiment with some of the simpler os tools interactively. Like sys, the os module comes with a collection of informational and administrative tools:

>>> os.getpid(  )
-510737
>>> os.getcwd(  )
'C:\\PP2ndEd\\examples\\PP2E\\System'

>>> os.chdir(r'c:\temp')
>>> os.getcwd(  )
'c:\\temp'

As shown here, the os.getpid function gives the calling process's process ID (a unique system-defined identifier for a running program), and os.getcwd returns the current working directory. The current working directory is where files opened by your script are assumed to live, unless their names include explicit directory paths. That's why I told you earlier to run the following command in the directory where more.py lives:

C:\...\PP2E\System>python more.py more.py

The input filename argument here is given without an explicit directory path (though you could add one to page files in another directory). If you need to run in a different working directory, call the os.chdir function to change to a new directory; your code will run relative to the new directory for the rest of the program (or until the next os.chdir call). This chapter has more to say about the notion of a current working directory, and its relation to module imports, when it explores script execution context later.

2.5.3 Portability Constants

The os module also exports a set of names designed to make cross-platform programming simpler. The set includes platform-specific settings for path and directory separator characters, parent and current directory indicators, and the characters used to terminate lines on the underlying computer:^[4]

^[4] os.linesep comes back as \015\012 here -- the octal escape code equivalent of \r\n, reflecting the carriage-return + line-feed line terminator convention on Windows. See the discussion of end-of-line translations in Section 2.11 later in this chapter.

>>> os.pathsep, os.sep, os.pardir, os.curdir, os.linesep
(';', '\\', '..', '.', '\015\012')

Name os.sep whatever character is used to separate directory components on the platform Python is running on; it is automatically preset to "\" on Windows, "/" for POSIX machines, and ":" on the Mac. Similarly, os.pathsep provides the character that separates directories on directory lists -- ":" for POSIX and ";" for DOS and Windows. By using such attributes when composing and decomposing system-related strings in our scripts, they become fully portable. For instance, a call of the form string.split(dirpath,os.sep) will correctly split platform-specific directory names into components, even though dirpath may look like "dir\dir" on Windows, "dir/dir" on Linux, and "dir:dir" on Macintosh.

2.5.4 Basic os.path Tools

The nested module os.path provides a large set of directory-related tools of its own. For example, it includes portable functions for tasks such as checking a file's type (isdir, isfile, and others), testing file existence (exists), and fetching the size of a file by name (getsize):

>>> os.path.isdir(r'C:\temp'),        os.path.isfile(r'C:\temp')
(1, 0)
>>> os.path.isdir(r'C:\config.sys'),  os.path.isfile(r'C:\config.sys')
(0, 1)
>>> os.path.isdir('nonesuch'),        os.path.isfile('nonesuch')
(0, 0)

>>> os.path.exists(r'c:\temp\data.txt')
0
>>> os.path.getsize(r'C:\autoexec.bat')
260

The os.path.isdir and os.path.isfile calls tell us whether a filename is a directory or a simple file; both return (false) if the named file does not exist. We also get calls for splitting and joining directory path strings, which automatically use the directory name conventions on the platform on which Python is running:

>>> os.path.split(r'C:\temp\data.txt')
('C:\\temp', 'data.txt')
>>> os.path.join(r'C:\temp', 'output.txt')
'C:\\temp\\output.txt'

>>> name = r'C:\temp\data.txt'                            # Windows paths
>>> os.path.basename(name), os.path.dirname(name)
('data.txt', 'C:\\temp')

>>> name = '/home/lutz/temp/data.txt'                     # Unix-style paths
>>> os.path.basename(name), os.path.dirname(name)
('data.txt', '/home/lutz/temp')

>>> os.path.splitext(r'C:\PP2ndEd\examples\PP2E\PyDemos.pyw')
('C:\\PP2ndEd\\examples\\PP2E\\PyDemos', '.pyw')

Call os.path.split separates a filename from its directory path, and os.path.join puts them back together -- all in entirely portable fashion, using the path conventions of the machine on which they are called. The basename and dirname calls here simply return the second and first items returned by a split as a convenience, and splitext strips the file extension (after the last "."). This module also has an abspath call that portably returns the absolute full directory pathname of a file; it accounts for adding the current directory, ".." parents, and more:

>>> os.getcwd(  )
'C:\\PP2ndEd\\cdrom\\WindowsExt'
>>> os.path.abspath('temp')                    # expand to full path name
'C:\\PP2ndEd\\cdrom\\WindowsExt\\temp'
>>> os.path.abspath(r'..\examples')            # relative paths expanded
'C:\\PP2ndEd\\examples'
>>> os.path.abspath(r'C:\PP2ndEd\chapters')    # absolute paths unchanged
'C:\\PP2ndEd\\chapters'
>>> os.path.abspath(r'C:\temp\spam.txt')       # ditto for file names
'C:\\temp\\spam.txt'
>>> os.path.abspath('')                        # empty string means the cwd
'C:\\PP2ndEd\\cdrom\\WindowsExt'

Because filenames are relative to the current working directory when they aren't fully specified paths, the os.path.abspath function helps if you want to show users what directory is truly being used to store a file. On Windows, for example, when GUI-based programs are launched by clicking on file explorer icons and desktop shortcuts, the execution directory of the program is the clicked file's home directory, but that is not always obvious to the person doing the clicking; printing a file's abspath can help.

2.5.5 Running Shell Commands from Scripts

The os module is also the place where we run shell commands from within Python scripts. This concept is intertwined with others we won't cover until later in this chapter, but since this a key concept employed throughout this part of the book, let's take a quick first look at the basics here. Two os functions allow scripts to run any command line that you can type in a console window:

os.system: Run a shell command from a Python script
os.popen: Run a shell command and connect to its input or output streams

2.5.5.1 What's a shell command?

To understand the scope of these calls, we need to first define a few terms. In this text the term shell means the system that reads and runs command-line strings on your computer, and shell command means a command-line string that you would normally enter at your computer's shell prompt.

For example, on Windows, you can start an MS-DOS console window and type DOS commands there -- things like dir to get a directory listing, type to view a file, names of programs you wish to start, and so on. DOS is the system shell, and commands like dir and type are shell commands. On Linux, you can start a new shell session by opening an xterm window and typing shell commands there too -- ls to list directories, cat to view files, and so on. There are a variety of shells available on Unix (e.g., csh, ksh), but they all read and run command lines. Here are two shell commands typed and run in an MS-DOS console box on Windows:

C:\temp>dir /B                      ...type a shell command-line
about-pp.html                      ...its output shows up here
python1.5.tar.gz                   ...DOS is the shell on Windows
about-pp2e.html
about-ppr2e.html
newdir

C:\temp>type helloshell.py 
# a Python program
print 'The Meaning of Life'

2.5.5.2 Running shell commands

None of this is directly related to Python, of course (despite the fact that Python command-line scripts are sometimes confusingly called "shell tools"). But because the os module's system and popen calls let Python scripts run any sort of command that the underlying system shell understands, our scripts can make use of every command-line tool available on the computer, whether it's coded in Python or not. For example, here is some Python code that runs the two DOS shell commands typed at the shell prompt shown previously:

C:\temp>python
>>> import os
>>> os.system('dir /B')
about-pp.html
python1.5.tar.gz
about-pp2e.html
about-ppr2e.html
newdir
0

>>> os.system('type helloshell.py')
# a Python program
print 'The Meaning of Life'
0

The "0"s at the end here are just the return values of the system call itself. The system call can be used to run any command line that we could type at the shell's prompt (here, C:\temp>). The command's output normally shows up in the Python session's or program's standard output stream.

2.5.5.3 Communicating with shell commands

But what if we want to grab a command's output within a script? The os.system call simply runs a shell command line, but os.popen also connects to the standard input or output streams of the command -- we get back a file-like object connected to the command's output by default (if we pass a "w" mode flag to popen, we connect to the command's input stream instead). By using this object to read the output of a command spawned with popen, we can intercept the text that would normally appear in the console window where a command line is typed:

>>> open('helloshell.py').read(  )
"# a Python program\012print 'The Meaning of Life'\012"

>>> text = os.popen('type helloshell.py').read(  )
>>> text
"# a Python program\012print 'The Meaning of Life'\012"

>>> listing = os.popen('dir /B').readlines(  )
>>> listing
['about-pp.html\012', 'python1.5.tar.gz\012', 'helloshell.py\012', 
'about-pp2e.html\012', 'about-ppr2e.html\012', 'newdir\012']

Here, we first fetch a file's content the usual way (using Python files), then as the output of a shell type command. Reading the output of a dir command lets us get a listing of files in a directory which we can then process in a loop (we'll meet other ways to obtain such a list later in this chapter). So far, we've run basic DOS commands; because these calls can run any command line that we can type at a shell prompt, they can also be used to launch other Python scripts:

>>> os.system('python helloshell.py')       # run a Python program
The Meaning of Life
0
>>> output = os.popen('python helloshell.py').read(  )
>>> output
'The Meaning of Life\012'

In all of these examples, the command-line strings sent to system and popen are hardcoded, but there's no reason Python programs could not construct such strings at runtime using normal string operations (+, %, etc.). Given that commands can be dynamically built and run this way, system and popen turn Python scripts into flexible and portable tools for launching and orchestrating other programs. For example, a Python test "driver" script can be used to run programs coded in any language (e.g., C++, Java, Python) and analyze their outputs. We'll explore such a script in Section 4.4 in Chapter 4.

2.5.5.4 Shell command limitations

You should keep in mind two limitations of system and popen. First, although these two functions themselves are fairly portable, their use is really only as portable as the commands that they run. The preceding examples that run DOS dir and type shell commands, for instance, work only on Windows, and would have to be changed to run ls and cat commands on Unix-like platforms. As I wrote this, the popen call on Windows worked for command-line programs only; it failed when called from a program running on Windows with any sort of user interface (e.g., under the IDLE Python development GUI). This has been improved in the Python 2.0 release -- popen now works much better on Windows -- but this fix naturally works only on machines with the latest version of Python installed.

Second, it is important to remember that running Python files as programs this way is very different, and generally much slower, than importing program files and calling functions they define. When os.system and os.popen are called, they must start a brand-new independent program running on your operating system (on Unix-like platforms, they run the command in a newly forked process). When importing a program file as a module, the Python interpreter simply loads and runs the file's code in the same process, to generate a module object. No other program is spawned along the way.^[5]

^[5] The Python execfile built-in function also runs a program file's code, but within the same process that called it. It's similar to an import in that regard, but works more as if the file's text had been pasted into the calling program at the place where the execfile call appears (unless explicit global or local namespace dictionaries are passed). Unlike imports, execfile unconditionally reads and executes a file's code (it may be run more than once per process), and no module object is generated by the file's execution.

There are good reasons to build systems as separate programs too, and we'll later explore things like command-line arguments and streams that allow programs to pass information back and forth. But for most purposes, imported modules are a faster and more direct way to compose systems.

If you plan to use these calls in earnest, you should also know that the os.system call normally blocks (that is, pauses) its caller until the spawned command line exits. On Linux and Unix-like platforms, the spawned command can generally be made to run independently and in parallel with the caller, by adding an & shell background operator at the end of the command line:

os.system("python program.py arg arg &")

On Windows, spawning with a DOS start command will usually launch the command in parallel too:

os.system("start program.py arg arg")

The os.popen call generally does not block its caller -- by definition, the caller must be able to read or write the file object returned -- but callers may still occasionally become blocked under both Windows and Linux if the pipe object is closed (e.g., when garbage is collected) before the spawned program exits, or the pipe is read exhaustively (e.g., with its read( ) method). As we will see in the next chapter, the Unix os.fork/exec and Windows os.spawnv calls can also be used to run parallel programs without blocking.

Because the os system and popen calls also fall under the category of program launchers, stream redirectors, and cross-process communication devices, they will show up again in later parts of this and the following chapters, so we'll defer further details for the time being.

2.5.6 Other os Module Exports

Since most other os module tools are even more difficult to appreciate outside the context of larger application topics, we'll postpone a deeper look until later sections. But to let you sample the flavor of this module, here is a quick preview for reference. Among the os module's other weapons are these:

os.environ: Fetch and set shell environment variables
os.fork: Spawn a new child process on Unix
os.pipe: Communicate between programs
os.execlp: Start new programs
os.spawnv: Start new programs on Windows
os.open: Open a low-level descriptor-based file
os.mkdir: Create a new directory
os.mkfifo: Create a new named pipe
os.stat: Fetch low-level file information
os.remove: Delete a file by its pathname
os.path.walk: Apply a function to files in an entire directory tree

And so on. One caution up front: the os module provides a set of file open, read, and write calls, but these all deal with low-level file access and are entirely distinct from Python's built-in stdio file objects that we create with the built-in open function. You should normally use the built-in open function (not the os module) for all but very special file-processing needs.

Throughout this chapter, we will apply sys and os tools such as these to implement common system-level tasks, but this book doesn't have space to provide an exhaustive list of the contents of the modules we meet along the way. If you have not already done so, you should become acquainted with the contents of modules like os and sys by consulting the Python library manual. For now, let's move on to explore additional system tools, in the context of broader system programming concepts.

I l@ve RuBoard