I l@ve RuBoard Previous Section Next Section

17.4 Binary Search Trees

Binary trees are a data structure that impose an order on inserted nodes: items less than a node are stored in its left subtree, and items greater than a node are inserted in the right. At the bottom, the subtrees are empty. Because of this structure, binary trees naturally support quick, recursive traversals -- at least ideally, every time you follow a link to a subtree, you divide the search space in half.[3]

[3] If you're looking for a more graphical image of binary trees, skip ahead to the PyTreeexamples at the end of this chapter, or simply run it on your own machine.

Binary trees are named for the implied branch-like structure of their subtree links. Typically, their nodes are implemented as a triple of values: (LeftSubtree, NodeValue, RightSubtree). Beyond that, there is fairly wide latitude in the tree implementation. Here we'll use a class-based approach:

  • BinaryTree is a header object, which initializes and manages the actual tree.

  • EmptyNode is the empty object, shared at all empty subtrees (at the bottom).

  • BinaryNode objects are nonempty tree nodes with a value and two subtrees.

Rather than coding distinct search functions, binary trees are constructed with "smart" objects (class instances) that know how to handle insert/lookup and printing requests and pass them to subtree objects. In fact, this is another example of the OOP composition relationship in action: tree nodes embed other tree nodes and pass search requests off to the embedded subtrees. A single empty class instance is shared by all empty subtrees in a binary tree, and inserts replace an EmptyNode with a BinaryNode at the bottom (see Example 17-13).

Example 17-13. PP2E\Dstruct\Classics\btree.py
class BinaryTree:
    def __init__(self):       self.tree = EmptyNode(  )
    def __repr__(self):       return `self.tree`
    def lookup(self, value):  return self.tree.lookup(value)
    def insert(self, value):  self.tree = self.tree.insert(value)

class EmptyNode:
    def __repr__(self):
        return '*'
    def lookup(self, value):                      # fail at the bottom
        return 0
    def insert(self, value):
        return BinaryNode(self, value, self)      # add new node at bottom

class BinaryNode:
    def __init__(self, left, value, right):
        self.data, self.left, self.right  =  value, left, right
    def lookup(self, value):
        if self.data == value:
            return 1
        elif self.data > value:
            return self.left.lookup(value)               # look in left
        else:
            return self.right.lookup(value)              # look in right
    def insert(self, value):
        if self.data > value:
            self.left = self.left.insert(value)          # grow in left
        elif self.data < value:
            self.right = self.right.insert(value)        # grow in right
        return self
    def __repr__(self):
        return '( %s, %s, %s )' % (`self.left`, `self.data`, `self.right`) 

As usual, BinaryTree can contain objects of any type that support the expected interface protocol -- here, > and < comparisons. This includes class instances with the __cmp__ method. Let's experiment with this module's interfaces. The following code stuffs five integers into a new tree, and then searches for values 0...9:

C:\...\PP2E\Dstruct\Classics>python
>>> from btree import BinaryTree 
>>> x = BinaryTree(  )
>>> for i in [3,1,9,2,7]:  x.insert(i)
... 
>>> for i in range(10):  print (i, x.lookup(i)),
... 
(0, 0) (1, 1) (2, 1) (3, 1) (4, 0) (5, 0) (6, 0) (7, 1) (8, 0) (9, 1)

To watch this tree grow, add a print statement after each insert. Tree nodes print themselves as triples, and empty nodes print as *. The result reflects tree nesting:

>>> y = BinaryTree(  )
>>> y
*
>>> for i in [3,1,9,2,7]:
...     y.insert(i); print y
... 
( *, 3, * )
( ( *, 1, * ), 3, * )
( ( *, 1, * ), 3, ( *, 9, * ) )
( ( *, 1, ( *, 2, * ) ), 3, ( *, 9, * ) )
( ( *, 1, ( *, 2, * ) ), 3, ( ( *, 7, * ), 9, * ) )

At the end of this chapter, we'll see another way to visualize trees in a GUI (which means you're invited to flip ahead now). Node values in this tree object can be any comparable Python object; for instances, here is a tree of strings:

>>> z = BinaryTree(  )
>>> for c in 'badce':  z.insert(c)
... 
>>> z
( ( *, 'a', * ), 'b', ( ( *, 'c', * ), 'd', ( *, 'e', * ) ) )
>>> z = BinaryTree(  )
>>> for c in 'abcde':  z.insert(c)
... 
>>> z
( *, 'a', ( *, 'b', ( *, 'c', ( *, 'd', ( *, 'e', * ) ) ) ) )

Notice the last result here: if items inserted into a binary tree are already ordered, then you wind up with a linear structure, and you lose the search-space partitioning magic of binary trees (the tree grows in right branches only). This is a worst-case scenario, and binary trees generally do a good job of dividing up values in practice. But if you are interested in pursuing this topic further, see a data structures text for tree-balancing techniques that automatically keep the tree as dense as possible.

Also note that to keep the code simple, these trees store a value only, and lookups return a 1 or (true or false). In practice, you sometimes may want to store both a key and an associated value (or even more) at each tree node. Example 17-14 shows what such a tree object looks like, for any prospective lumberjacks in the audience.

Example 17-14. PP2E\Dstruct\Classics\btree-keys.py
class KeyedBinaryTree:
    def __init__(self):          self.tree = EmptyNode(  )
    def __repr__(self):          return `self.tree`
    def lookup(self, key):       return self.tree.lookup(key)
    def insert(self, key, val):  self.tree = self.tree.insert(key, val)

class EmptyNode:
    def __repr__(self):
        return '*'
    def lookup(self, key):                               # fail at the bottom
        return None
    def insert(self, key, val):
        return BinaryNode(self, key, val, self)          # add node at bottom

class BinaryNode:
    def __init__(self, left, key, val, right):
        self.key,  self.val   = key, val
        self.left, self.right = left, right
    def lookup(self, key):
        if self.key == key:
            return self.val
        elif self.key > key:
            return self.left.lookup(key)                 # look in left
        else:
            return self.right.lookup(key)                # look in right
    def insert(self, key, val):
        if self.key == key:
            self.val = val
        elif self.key > key:
            self.left = self.left.insert(key, val)       # grow in left
        elif self.key < key:
            self.right = self.right.insert(key, val)     # grow in right
        return self
    def __repr__(self):
        return ('( %s, %s=%s, %s )' % 
                 (`self.left`, `self.key`, `self.val`, `self.right`)) 

if __name__ == '__main__':
    t = KeyedBinaryTree(  )
    for (key, val) in [('bbb', 1), ('aaa', 2), ('ccc', 3)]:
        t.insert(key, val)
    print t
    print t.lookup('aaa'), t.lookup('ccc')
    t.insert('ddd', 4)                       
    t.insert('aaa', 5)                       # changes key's value
    print t

And here is this script's self-test code at work; nodes simply have more content this time around:

C:\...\PP2E\Dstruct\Classics>python btree-keys.py
( ( *, 'aaa'=2, * ), 'bbb'=1, ( *, 'ccc'=3, * ) )
2 3
( ( *, 'aaa'=5, * ), 'bbb'=1, ( *, 'ccc'=3, ( *, 'ddd'=4, * ) ) )
    I l@ve RuBoard Previous Section Next Section