I l@ve RuBoard Previous Section Next Section

18.5 Parser Generators

If you have any background in parsing theory, you may know that neither regular expressions nor string splitting is powerful enough to handle more complex language grammars (roughly, they don't have the "memory" required by true grammars). For more sophisticated language analysis tasks, we sometimes need a full-blown parser. Since Python is built for integrating C tools, we can write integrations to traditional parser generator systems such as yacc and bison. Better yet, we could use an integration that already exists.

There are also Python-specific parsing systems accessible from Python's web site. Among them, the kwParsing system, developed by Aaron Watters, is a parser generator written in Python, and the SPARK toolkit, developed by John Aycock, is a lightweight system that employs the Earley algorithm to work around technical problems with LALR parser generation (if you don't know what that means, you probably don't need to care). Since these are all complex tools, though, we'll skip their details in this text. Consult http://www.python.org for information on parser generator tools available for use in Python programs.

Lesson 2: Don't Reinvent the Wheel

Speaking of parser generators: to use some of these tools in Python programs, you'll need an extension module that integrates them. The first step in such scenarios should always be to see if the extension already exists in the public domain. Especially for common tools like these, chances are that someone else has already written an integration that you can use off-the-shelf instead of writing one from scratch.

Of course, not everyone can donate all their extension modules to the public domain, but there's a growing library of available components that you can pick up for free and a community of experts to query. Visit http://www.python.org for links to Python software resources. With some half a million Python users out there as I write this book, there is much that can be found in the prior-art department.

Of special interest to this chapter, also see YAPPS -- Yet Another Python Parser System. YAPPS is a parser generator written in Python. It uses supplied rules to generate human-readable Python code that implements a recursive descent parser. The parsers generated by YAPPS look much like (and are inspired by) the hand-coded expression parsers shown in the next section. YAPPS creates LL(1) parsers, which are not as powerful as LALR parsers, but sufficient for many language tasks. For more on YAPPS, see http://theory.stanford.edu/~amitp/Yapps.

    I l@ve RuBoard Previous Section Next Section