
5.3 Fixing DOS Filenames

The heart of the prior script was findFiles, a function that knows how to portably collect matching file and directory names in an entire tree, given a list of filename patterns. It doesn't do much more than the built-in find.find call, but it can be augmented for our own purposes. Because this logic was bundled up in a function, though, it automatically becomes a reusable tool.
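For readers without the old find module, a findFiles-like function can be sketched with the standard os.walk generator and the fnmatch module. This is a minimal Python 3 sketch, not the book's findFiles; the name find_files and its startdir parameter are hypothetical. Like mapping find.find over the patterns, it returns one result list per pattern and walks the tree once per pattern.

```python
import fnmatch
import os


def find_files(patterns, startdir=os.curdir):
    """Hypothetical findFiles stand-in: return one list of matching paths
    per pattern, checking every file and directory name in the tree."""
    if isinstance(patterns, str):        # treat '*.py' as ['*.py'], not per character
        patterns = [patterns]
    results = []
    for pattern in patterns:             # one full tree walk per pattern
        matches = []
        for dirpath, dirnames, filenames in os.walk(startdir):
            # a subdirectory's name is recorded here, before os.walk descends into it
            for name in dirnames + filenames:
                if fnmatch.fnmatch(name, pattern):
                    matches.append(os.path.join(dirpath, name))
        results.append(matches)
    return results
```

A call such as find_files('*') returns a single list holding every name in the tree, which is how the next script uses findFiles.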

For example, the next script imports and applies findFiles to collect all filenames in a directory tree, using the filename pattern * (which matches everything). I use this script to fix a legacy problem in the book's examples tree. The names of some files created under MS-DOS were made all uppercase; for example, spam.py became SPAM.PY somewhere along the way. Because case is significant both in Python and on some platforms, an import statement like "import spam" will sometimes fail for uppercase filenames.

To repair the damage everywhere in the thousand-file examples tree, I wrote and ran Example 5-6. It works like this: For every filename in the tree, it checks to see if the name is all uppercase, and asks the console user whether the file should be renamed with the os.rename call. To make this easy, it also comes up with a reasonable default for most new names -- the old one in all-lowercase form.
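The script's core decision can be separated from the prompting and written as a pure function. The sketch below is in Python 3 and is only an illustration; proposed_name is a hypothetical helper that mirrors the script's test. A name qualifies if it contains no lowercase letters, and the suggested default is simply the name in lowercase.

```python
def proposed_name(filename):
    """Return the lowercase default for an all-uppercase name, or None.

    Like the script's allUpper test, a name qualifies when it contains no
    lowercase letters at all, so digits and punctuation don't disqualify it."""
    if any(ch.islower() for ch in filename):
        return None
    return filename.lower()


for name in ['SPAM.PY', 'README.1ST', 'spam.py', 'LaunchBrowser.py']:
    print(name, '->', proposed_name(name))
```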

Example 5-6. PP2E\PyTools\fixnames_all.py
#########################################################
# Use: "python ..\..\PyTools\fixnames_all.py".
# find all files with all upper-case names at and below
# the current directory ('.'); for each, ask the user for
# a new name to rename the file to; used to catch old 
# uppercase file names created on MS-DOS (case matters on
# some platforms, when importing Python module files);
# caveats: this may fail on case-sensitive machines if 
# directory names are converted before their contents--the
# original dir name in the paths returned by find may no 
# longer exist; the allUpper heuristic also fails for 
# odd filenames that are all non-alphabetic (ex: '.');
#########################################################

import os, string
listonly = 0

def allUpper(name):
    for char in name:
        if char in string.lowercase:    # any lowercase letter disqualifies
            return 0                    # else all upper, digit, or special 
    return 1 

def convertOne(fname):
    fpath, oldfname = os.path.split(fname)
    if allUpper(oldfname):
        prompt = 'Convert dir=%s file=%s? (y|Y)' % (fpath, oldfname)
        if raw_input(prompt) in ['Y', 'y']:
            default  = string.lower(oldfname)
            newfname = raw_input('Type new file name (enter=%s): ' % default)
            newfname = newfname or default
            newfpath = os.path.join(fpath, newfname)
            os.rename(fname, newfpath)
            print 'Renamed: ', fname
            print 'to:      ', str(newfpath)
            raw_input('Press enter to continue')
            return 1
    return 0

if __name__ == '__main__':
    patts = "*"                              # inspect all file names
    from fixeoln_all import findFiles        # reuse finder function
    matches = findFiles(patts)

    ccount = vcount = 0
    for matchlist in matches:                # list of lists, one per pattern
        for fname in matchlist:              # fnames are full directory paths
            print vcount+1, '=>', fname      # includes names of directories 
            if not listonly:  
                ccount = ccount + convertOne(fname)
            vcount = vcount + 1
    print 'Converted %d files, visited %d' % (ccount, vcount)
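The header comment notes that allUpper also flags names with no letters at all, such as '.'. In Python 3, str.isupper avoids this because it also requires at least one cased character. The sketch below compares a Python 3 port of allUpper with that stricter test. Both function names are hypothetical, and the port uses string.ascii_lowercase because string.lowercase no longer exists in Python 3.

```python
import string


def all_upper(name):
    """Port of allUpper: true unless some lowercase ASCII letter appears."""
    for char in name:
        if char in string.ascii_lowercase:   # string.lowercase was removed in Python 3
            return False
    return True


def all_upper_strict(name):
    """str.isupper also demands at least one cased character."""
    return name.isupper()


for name in ['SPAM.PY', 'README.1ST', 'spam.py', '.', '123']:
    print(repr(name), all_upper(name), all_upper_strict(name))
```

The two tests agree on ordinary filenames. They differ only for purely non-alphabetic names, which the stricter version leaves alone.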

As before, the findFiles function returns a list of simple filename lists, representing the expansion of all patterns passed in (here, just one result list, for the wildcard pattern *).[5] For each file and directory name in the result, this script's convertOne function prompts for name changes; a combination of os.path.split and os.path.join calls portably tacks the new filename onto the old directory name. Here is a renaming session in progress on Windows:

[5] Interestingly, using string '*' for the patterns list works the same as using list ['*'] here, only because a single-character string is a sequence that contains itself; compare the results of map(find.find, '*') with map(find.find, ['*']) interactively to verify.

C:\temp\examples>python %X%\PyTools\fixnames_all.py 
Using Python find
1 => .\.cshrc
2 => .\LaunchBrowser.out.txt
3 => .\LaunchBrowser.py
...
 ...more deleted...
...
218 => .\Ai
219 => .\Ai\ExpertSystem
220 => .\Ai\ExpertSystem\TODO
Convert dir=.\Ai\ExpertSystem file=TODO? (y|Y)n 
221 => .\Ai\ExpertSystem\__init__.py
222 => .\Ai\ExpertSystem\holmes
223 => .\Ai\ExpertSystem\holmes\README.1ST
Convert dir=.\Ai\ExpertSystem\holmes file=README.1ST? (y|Y)y 
Type new file name (enter=readme.1st):
Renamed:  .\Ai\ExpertSystem\holmes\README.1st
to:       .\Ai\ExpertSystem\holmes\readme.1st
Press enter to continue
224 => .\Ai\ExpertSystem\holmes\README.2ND
Convert dir=.\Ai\ExpertSystem\holmes file=README.2ND? (y|Y)y 
Type new file name (enter=readme.2nd): readme-more 
Renamed:  .\Ai\ExpertSystem\holmes\README.2nd
to:       .\Ai\ExpertSystem\holmes\readme-more
Press enter to continue
...
 ...more deleted...
...
1471 => .\todos.py
1472 => .\tounix.py
1473 => .\xferall.linux.csh
Converted 2 files, visited 1473

This script could simply convert every all-uppercase name to an all-lowercase equivalent automatically, but that's potentially dangerous (some names might legitimately require mixed case). Instead, it asks for input during the traversal, and shows the results of each renaming operation along the way.
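A middle ground between prompting for every name and converting blindly is a dry run. It only reports what it would rename and leaves the decision to a person reviewing the list. The function below is a hypothetical Python 3 sketch, not part of the book's tools. It uses os.walk and str.isupper, so, unlike allUpper, it skips names that contain no letters.

```python
import os


def plan_renames(startdir):
    """Dry run: return (oldpath, newpath) pairs for every all-uppercase
    file or directory name under startdir, renaming nothing."""
    plan = []
    for dirpath, dirnames, filenames in os.walk(startdir):
        for name in dirnames + filenames:
            if name.isupper():               # requires at least one letter
                plan.append((os.path.join(dirpath, name),
                             os.path.join(dirpath, name.lower())))
    return plan
```

Notice that when a directory and its contents both qualify, the planned paths for the contents still use the directory's old name. Applying such a plan from top to bottom would run into the pitfall described in footnote [6].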

5.3.1 Rewriting with os.path.walk

Notice, though, that the pattern-matching power of the find.find call goes completely unused in this script. Because it must always visit every file in the tree, the os.path.walk interface we studied in Chapter 2 would work just as well, and it avoids any initial pause while a filename list is being collected (that pause is negligible here, but may be significant for larger trees). Example 5-7 is an equivalent version of this script that does its tree traversal with the callback-based walk model.

Example 5-7. PP2E\PyTools\fixnames_all2.py
###############################################################
# Use: "python ..\..\PyTools\fixnames_all2.py".
# same, but use the os.path.walk interface, not find.find;
# to make this work like the simple find version, puts off
# visiting directories until just before visiting their
# contents (find.find lists dir names before their contents);
# renaming dirs here can fail on case-sensitive platforms 
# too--walk keeps extending paths containing old dir names;
###############################################################

import os
listonly = 0
from fixnames_all import convertOne

def visitname(fname):
    global ccount, vcount
    print vcount+1, '=>', fname
    if not listonly:
        ccount = ccount + convertOne(fname)
    vcount = vcount + 1

def visitor(myData, directoryName, filesInDirectory):  # called for each dir 
    visitname(directoryName)                           # do dir we're in now,
    for fname in filesInDirectory:                     # and non-dir files here
        fpath = os.path.join(directoryName, fname)     # fnames have no dirpath
        if not os.path.isdir(fpath):
            visitname(fpath)
     
ccount = vcount = 0
os.path.walk('.', visitor, None)
print 'Converted %d files, visited %d' % (ccount, vcount)
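os.path.walk was removed in Python 3. Its standard replacement is os.walk, available since Python 2.3, which is a generator that yields (dirpath, dirnames, filenames) triples instead of calling a visitor function. The Python 3 sketch below reproduces Example 5-7's visiting order: each directory is visited just before its non-directory files. The name walk_names and its visit parameter are hypothetical.

```python
import os


def walk_names(startdir, visit):
    """Call visit(path) for each directory (just before its contents) and
    for each non-directory file below it, mirroring Example 5-7's visitor.
    Returns the number of names visited."""
    count = 0
    for dirpath, dirnames, filenames in os.walk(startdir):
        visit(dirpath)                           # the directory we're in now
        count += 1
        for name in filenames:                   # os.walk has already split out subdirs
            visit(os.path.join(dirpath, name))
            count += 1
    return count
```

For example, walk_names('.', print) lists the tree, and passing a renaming function instead of print gives the interactive fixer's structure. Because os.walk separates directories from files itself, the explicit os.path.isdir test in visitor is no longer needed.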

This version does the same job, but it visits one extra name (the top-level directory '.' itself), and it may visit directories in a different order (os.listdir returns names in arbitrary order). Both versions run in under a dozen seconds for the example directory tree on my computer.[6] We'll revisit this script, as well as the fixeoln line-end fixer, in the context of a general tree-walker class hierarchy later in this chapter.
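If a repeatable visiting order matters, os.walk (the Python 3 replacement for os.path.walk) lets a top-down walker sort its dirnames list in place. That controls the order in which subdirectories are entered, and sorting filenames fixes the order within each directory. This is a small Python 3 sketch, and the function name sorted_walk_names is hypothetical.

```python
import os


def sorted_walk_names(startdir):
    """Return visited paths (dirs before their files) in a repeatable,
    alphabetical order instead of the file system's listing order."""
    visited = []
    for dirpath, dirnames, filenames in os.walk(startdir):
        dirnames.sort()              # in-place sort: os.walk descends in this order
        visited.append(dirpath)
        for name in sorted(filenames):
            visited.append(os.path.join(dirpath, name))
    return visited
```

Modifying dirnames in place only affects traversal when os.walk runs top-down (its default). Rebinding the name to a new sorted list would have no effect.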

[6] Very subtle thing: both versions of this script might fail on platforms where case matters, if they rename directories along the way. If a directory is renamed before the contents of that directory have been visited (e.g., a directory SPAM renamed to spam), then later references to the directory's contents using the old name (e.g., SPAM/filename) will no longer be valid on case-sensitive platforms. This can happen in the find.find version, because directories can and do show up in the result list before their contents. It's also possible with the os.path.walk version, because the prior directory path (with original directory names) keeps being extended at each level of the tree. I only use this script on Windows (DOS), so I haven't been bitten by this in practice. Workarounds (ordering find result lists, walking trees in a bottom-up fashion, making two distinct passes for files and directories, queuing up directory names on a list to be renamed later, or simply not renaming directories at all) are all complex enough to be delegated to the realm of reader experiments. As a rule of thumb, changing a tree's names or structure while it is being walked is a risky venture.
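Here is one such reader experiment, the bottom-up walk, sketched in Python 3 with os.walk(topdown=False). With that flag, os.walk reports a directory's contents before the directory itself. So when a directory is finally renamed, nothing below it still needs to be reached through the old name. The function lower_tree is hypothetical and converts names without prompting, which ignores the mixed-case concern raised earlier. Try it only on a scratch copy of a tree.

```python
import os


def lower_tree(startdir):
    """Rename all-uppercase names below startdir to lowercase, bottom-up.

    With topdown=False, a directory's triple comes after those of everything
    beneath it, so every dirpath still uses valid (not yet renamed) names."""
    count = 0
    for dirpath, dirnames, filenames in os.walk(startdir, topdown=False):
        present = set(dirnames + filenames)      # names currently in dirpath
        for name in filenames + dirnames:
            newname = name.lower()
            # skip mixed-case names, and never overwrite a different entry
            if name.isupper() and newname not in present:
                os.rename(os.path.join(dirpath, name),
                          os.path.join(dirpath, newname))
                present.discard(name)
                present.add(newname)
                count += 1
    return count
```

The collision check compares against the directory listing os.walk already made, so a SPAM.PY sitting next to an existing spam.py is left alone rather than silently replacing it.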
