Binary search tree

From Free net encyclopedia

In computer science, a binary search tree (BST) is a binary tree which has the following properties:

  • Each node has a value.
  • The left subtree of a node contains only values less than or equal to the node's value.
  • The right subtree of a node contains only values greater than or equal to the node's value.

The major advantage of binary search trees is that the related sorting and search algorithms, such as in-order traversal, can be very efficient.

Binary search trees are a fundamental data structure used to construct more abstract data structures such as sets, multisets, and associative arrays.

We may or may not choose to allow duplicate values in a BST. If we do, it represents a multiset, and the inequalities for the left and right subtrees above are non-strict (each includes "or equal to"). If we do not, the inequalities can be taken as strict, and insertion must be modified to fail if the value being inserted is already present; in this case the BST represents a set of unique values, like the mathematical set. Yet other definitions use a non-strict inequality on only one side, which allows duplicate values but limits how well a tree with many duplicate values can be balanced.
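
The ordering properties above can be checked mechanically. The following is a minimal sketch, not a definitive implementation: the Node class, the is_bst function and its allow_duplicates flag are names assumed here for illustration. It passes a permitted range down the tree, so that every key is compared against all of its ancestors and not just its parent.

```python
class Node:
    """Hypothetical node type for this sketch: a key plus two child links."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def is_bst(node, low=None, high=None, allow_duplicates=True):
    """Return True if every key in the subtree lies between low and high.

    With allow_duplicates=True the bounds are inclusive (multiset reading);
    with allow_duplicates=False they are strict (set reading).
    """
    if node is None:
        return True
    if low is not None:
        if node.key < low or (not allow_duplicates and node.key == low):
            return False
    if high is not None:
        if node.key > high or (not allow_duplicates and node.key == high):
            return False
    # Keys on the left are bounded above by this key, keys on the right below.
    return (is_bst(node.left, low, node.key, allow_duplicates) and
            is_bst(node.right, node.key, high, allow_duplicates))
```

A tree such as Node(5, Node(5), Node(5)) is accepted under the non-strict reading and rejected under the strict one.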

Operations

Searching

Searching a binary search tree for a specific value is a recursive process that we can perform due to the ordering it imposes. We begin by examining the root. If the value equals the root, the value exists in the tree. If it is less than the root, then it must be in the left subtree, so we recursively search the left subtree in the same manner. Similarly, if it is greater than the root, then it must be in the right subtree, so we recursively search the right subtree in the same manner. If we reach an external node, then the item is not where it would be if it were present, so it does not lie in the tree at all. A comparison may be made with binary search, which operates in nearly the same way but uses random access on an array instead of following links.

Here is the search algorithm in the Python programming language:

def search_binary_tree(node, key):
    if node is None:
        return None  # not found
    if key < node.key:
        return search_binary_tree(node.left, key)
    elif key > node.key:
        return search_binary_tree(node.right, key)
    else:
        return node.value

This operation requires O(log n) time in the average case, but O(n) time in the worst case, when the unbalanced tree resembles a linked list.
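
Because the search follows a single path from the root, it can also be written as a loop. The sketch below assumes a hypothetical Node class with key, value, left and right fields, mirroring the attributes used by the recursive version. One practical reason to prefer the loop in Python is that a degenerate, list-like tree of a few thousand nodes can exceed the interpreter's default recursion limit when searched recursively.

```python
class Node:
    """Hypothetical node type for this sketch."""
    def __init__(self, key, value, left=None, right=None):
        self.key = key
        self.value = value
        self.left = left
        self.right = right

def search_iterative(node, key):
    """Return the value stored under key, or None if key is absent."""
    while node is not None:
        if key < node.key:
            node = node.left       # key can only be in the left subtree
        elif key > node.key:
            node = node.right      # key can only be in the right subtree
        else:
            return node.value      # found
    return None                    # fell off the tree: not found
```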

Insertion

Insertion begins as a search would begin: if the root is not equal to the value, we search the left or right subtree as before. Eventually we reach an external node and add the value at that position. In other words, we examine the root and recursively insert the new node into the left subtree if the new value is less than or equal to the root, or into the right subtree if the new value is greater than the root.

Here's how a typical binary search tree insertion might be performed in C:

void InsertNode(struct node **node_ptr, struct node *newNode) {
    struct node *node = *node_ptr;
    if (node == NULL)
        *node_ptr = newNode;
    else if (newNode->value <= node->value)
        InsertNode(&node->left, newNode);
    else
        InsertNode(&node->right, newNode);
}

The above "destructive" procedural variant modifies the tree in place. It uses only constant space, but the previous version of the tree is lost. Alternatively, as in the following Python example, we can reconstruct all ancestors of the inserted node; any reference to the original tree root remains valid, making the tree a persistent data structure:

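from collections import namedtuple

# Assumed node layout; the field order matches the constructor calls below.
TreeNode = namedtuple("TreeNode", ["left", "key", "value", "right"])

def binary_tree_insert(node, key, value):
    if node is None:
        return TreeNode(None, key, value, None)

    if key == node.key:
        # Key already present: replace its value, keeping both subtrees.
        return TreeNode(node.left, key, value, node.right)
    if key < node.key:
        return TreeNode(binary_tree_insert(node.left, key, value), node.key, node.value, node.right)
    else:
        return TreeNode(node.left, node.key, node.value, binary_tree_insert(node.right, key, value))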

The part that is rebuilt uses Θ(log n) space in the average case and Ω(n) in the worst case (see big-O notation).

In either version, this operation requires time proportional to the height of the tree in the worst case, which is O(log n) time in the average case over all trees, but Ω(n) time in the worst case.

Another way to explain insertion is that, in order to insert a new node in the tree, its value is first compared with the value of the root. If its value is less than the root's, it is then compared with the value of the root's left child. If its value is greater, it is compared with the root's right child. This process continues until the new node is compared with a node that has no child on the side the new node must go; the new node is then added as that node's left or right child, depending on its value.
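
The step-by-step description above translates directly into a loop. The following sketch is one possible rendering, with a hypothetical Node class and function name; like the C version, it modifies the tree in place and sends values equal to a node's value to the left.

```python
class Node:
    """Hypothetical node type for this sketch."""
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert_iterative(root, value):
    """Insert value in place and return the (possibly new) root."""
    new_node = Node(value)
    if root is None:
        return new_node            # empty tree: the new node is the root
    current = root
    while True:
        if value <= current.value:
            if current.left is None:
                current.left = new_node    # free slot found on the left
                return root
            current = current.left
        else:
            if current.right is None:
                current.right = new_node   # free slot found on the right
                return root
            current = current.right
```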

Deletion

There are several cases to be considered:

  • Deleting a leaf: Deleting a node with no children is easy, as we can simply remove it from the tree.
  • Deleting a node with one child: Delete it and replace it with its child.
  • Deleting a node with two children: Suppose the node to be deleted is called N. We replace the value of N with that of either its in-order successor (the left-most node of the right subtree) or its in-order predecessor (the right-most node of the left subtree).

Once we find either the in-order successor or predecessor, we copy its value into N and then delete the successor or predecessor node itself. Since either of these nodes must have fewer than two children (otherwise it cannot be the in-order successor or predecessor), it can be deleted using the previous two cases. In a good implementation, it is generally recommended to avoid consistently using one of these nodes, because this can unbalance the tree.

Here is C++ sample code for a destructive version of deletion (we assume the node to be deleted has already been located using search):

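void DeleteNode(struct node*& node) {
    if (node->left == NULL) {
        // No left child: splice in the right child (which may be NULL).
        // Save the pointer first so the freed node is never read.
        struct node* old = node;
        node = node->right;
        delete old;
    } else if (node->right == NULL) {
        // Only a left child: splice it in.
        struct node* old = node;
        node = node->left;
        delete old;
    } else {
        // Node has two children - get max of left subtree.
        // A pointer to the link is used because a reference cannot be
        // reseated; assigning through it would overwrite node->left.
        struct node** temp = &node->left;
        while ((*temp)->right != NULL) {
            temp = &(*temp)->right;
        }
        node->value = (*temp)->value;
        DeleteNode(*temp);
    }
}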

Although this operation does not always traverse the tree down to a leaf, this is always a possibility; thus in the worst case, it requires time proportional to the height of the tree. It does not require more even when the node has two children, since it still follows a single path and visits no node twice.
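
For comparison, the three deletion cases can be sketched in Python. Unlike the C++ sample, this version also performs the search for the node, and in the two-children case it walks part of the left subtree twice (once to find the predecessor, once to remove it), which still costs time proportional to the height. The Node class and the function names are assumptions made for this illustration.

```python
class Node:
    """Hypothetical node type for this sketch."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def delete(node, value):
    """Remove one occurrence of value; return the new root of the subtree."""
    if node is None:
        return None                          # value not present
    if value < node.value:
        node.left = delete(node.left, value)
    elif value > node.value:
        node.right = delete(node.right, value)
    else:
        # Leaf or one child: the (possibly missing) child takes its place.
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
        # Two children: copy the in-order predecessor's value into this
        # node, then remove that value from the left subtree.
        pred = node.left
        while pred.right is not None:
            pred = pred.right
        node.value = pred.value
        node.left = delete(node.left, pred.value)
    return node

def in_order(node):
    """Return the values of the subtree as a sorted list."""
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)
```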

Traversal

Once the binary search tree has been created, its elements can be retrieved in order by recursively traversing the left subtree, visiting the root, then recursively traversing the right subtree. The tree may also be traversed in pre-order or post-order.

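def traverse_binary_tree(treenode, visit):
    # Each node is a (left, value, right) tuple; visit is a caller-supplied
    # function that is applied to every value in ascending order.
    if treenode is None:
        return
    left, nodevalue, right = treenode
    traverse_binary_tree(left, visit)
    visit(nodevalue)
    traverse_binary_tree(right, visit)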

Traversal requires Ω(n) time, since it must visit every node. This algorithm is also O(n), and so asymptotically optimal.
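
The three traversal orders differ only in where the root is visited relative to its subtrees. The following sketch writes them as Python generators over (left, value, right) tuples, the node layout used in the example above; the function names are chosen here for illustration.

```python
def in_order(tree):
    """Yield values left subtree first, then the root, then the right subtree."""
    if tree is not None:
        left, value, right = tree
        yield from in_order(left)
        yield value
        yield from in_order(right)

def pre_order(tree):
    """Yield the root before either subtree."""
    if tree is not None:
        left, value, right = tree
        yield value
        yield from pre_order(left)
        yield from pre_order(right)

def post_order(tree):
    """Yield the root after both subtrees."""
    if tree is not None:
        left, value, right = tree
        yield from post_order(left)
        yield from post_order(right)
        yield value
```

Only the in-order variant returns the values of a binary search tree in sorted order.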

Sort

A binary search tree can be used to implement a simple but inefficient sort algorithm. Similar to insertion sort, we insert all the values we wish to sort into a new ordered data structure, in this case a binary search tree, then traverse it in order, building our result:

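def binary_tree_insert(treenode, value):
    # Tuple-based insert assumed by this example: a node is (left, value, right).
    if treenode is None:
        return (None, value, None)
    left, nodevalue, right = treenode
    if value < nodevalue:
        return (binary_tree_insert(left, value), nodevalue, right)
    return (left, nodevalue, binary_tree_insert(right, value))

def build_binary_tree(values):
    tree = None
    for v in values:
        tree = binary_tree_insert(tree, v)
    return tree

def traverse_binary_tree(treenode):
    if treenode is None:
        return []
    left, value, right = treenode
    return traverse_binary_tree(left) + [value] + traverse_binary_tree(right)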

The worst-case time of build_binary_tree is Ω(n²): if you feed it a sorted list of values, it chains them into a linked list with no left subtrees. For example, build_binary_tree([1, 2, 3, 4, 5]) yields the tree (None, 1, (None, 2, (None, 3, (None, 4, (None, 5, None))))).
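
The effect of input order can be made concrete by measuring the resulting tree. In the sketch below, which is self-contained and uses hypothetical helper names, the depth of a node equals the number of comparisons that were needed to insert it, so the sum of all depths is the total comparison count of the build. For sorted input of n values that sum is 0 + 1 + ... + (n - 1) = n(n - 1)/2.

```python
def insert(tree, value):
    """Insert into a tree of (left, value, right) tuples; return the new tree."""
    if tree is None:
        return (None, value, None)
    left, node_value, right = tree
    if value < node_value:
        return (insert(left, value), node_value, right)
    return (left, node_value, insert(right, value))

def build(values):
    tree = None
    for v in values:
        tree = insert(tree, v)
    return tree

def height(tree):
    """Number of nodes on the longest root-to-leaf path."""
    if tree is None:
        return 0
    left, _, right = tree
    return 1 + max(height(left), height(right))

def total_depth(tree, depth=0):
    """Sum of node depths, i.e. comparisons spent inserting every node."""
    if tree is None:
        return 0
    left, _, right = tree
    return depth + total_depth(left, depth + 1) + total_depth(right, depth + 1)

sorted_tree = build([1, 2, 3, 4, 5, 6, 7])      # degenerates into a chain
mixed_tree = build([4, 2, 6, 1, 3, 5, 7])       # happens to be balanced
print(height(sorted_tree), total_depth(sorted_tree))   # 7 21
print(height(mixed_tree), total_depth(mixed_tree))     # 3 10
```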

There are a variety of schemes for overcoming this flaw with simple binary trees; the most common is the self-balancing binary search tree. If this same procedure is done using such a tree, the overall worst-case time is O(n log n), which is asymptotically optimal for a comparison sort. In practice, the poor cache performance and added overhead in time and space for a tree-based sort (particularly for node allocation) make it inferior to other asymptotically optimal sorts such as quicksort and heapsort for static list sorting. On the other hand, it is one of the most efficient methods of incremental sorting, adding items to a list over time while keeping the list sorted at all times.

Types of binary search trees

There are many types of binary search trees. AVL trees and red-black trees are both forms of self-balancing binary search trees. A splay tree is a binary search tree that automatically moves frequently accessed elements nearer to the root. In a treap ("tree heap"), each node also holds a priority and the parent node has higher priority than its children.
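
As an illustration of the treap property, the sketch below inserts a key as in an ordinary binary search tree and then restores the priority ordering on the way back up. It relies on tree rotations, which the text above does not describe, and on random priorities when none is given; both are assumptions of this example, as are all class and function names.

```python
import random

class TreapNode:
    """Hypothetical treap node: a key, a priority and two child links."""
    def __init__(self, key, priority):
        self.key = key
        self.priority = priority
        self.left = None
        self.right = None

def rotate_right(node):
    """Lift node.left above node; the in-order key sequence is unchanged."""
    pivot = node.left
    node.left = pivot.right
    pivot.right = node
    return pivot

def rotate_left(node):
    """Lift node.right above node; the in-order key sequence is unchanged."""
    pivot = node.right
    node.right = pivot.left
    pivot.left = node
    return pivot

def treap_insert(node, key, priority=None):
    """Insert key and return the new root of the subtree."""
    if priority is None:
        priority = random.random()   # assumed default: a random priority
    if node is None:
        return TreapNode(key, priority)
    if key < node.key:
        node.left = treap_insert(node.left, key, priority)
        if node.left.priority > node.priority:
            node = rotate_right(node)    # child outranks parent: lift it
    else:
        node.right = treap_insert(node.right, key, priority)
        if node.right.priority > node.priority:
            node = rotate_left(node)
    return node
```

Passing explicit priorities, as in treap_insert(root, 2, priority=0.5), makes the resulting shape deterministic, which is convenient for experimenting.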

Optimal binary search trees

If we don't plan on modifying a search tree, and we know exactly how often each item will be accessed, we can construct an optimal binary search tree, which is a search tree where the average cost of looking up an item (the expected search cost) is minimized.

Assume that we know the elements and that for each element, we know the proportion of future lookups which will be looking for that element. We can then use a dynamic programming solution, detailed in section 15.5 of Introduction to Algorithms, to construct the tree with the least possible expected search cost.
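
The shape of such a dynamic program can be sketched as follows. This is a simplified variant, not the textbook's formulation: it assumes only successful lookups are counted (no weights for searches that miss), that freq lists the access weights of the keys in sorted key order, and that a key at depth d, with the root at depth 1, costs d comparisons. The function name is hypothetical.

```python
def optimal_bst_cost(freq):
    """Minimum total weighted search cost over all BSTs on the given keys.

    freq[i] is the access weight of the i-th smallest key.
    """
    n = len(freq)
    # prefix[i] = sum of freq[:i], so any range weight is a subtraction.
    prefix = [0] * (n + 1)
    for i, f in enumerate(freq):
        prefix[i + 1] = prefix[i] + f
    # cost[i][j] = optimal cost for keys i .. j-1; empty ranges cost 0.
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            weight = prefix[j] - prefix[i]
            # Try every key r in the range as the root of this subtree.
            best = min(cost[i][r] + cost[r + 1][j] for r in range(i, j))
            # Hanging the subtrees under a root pushes every key in the
            # range one level deeper, which adds the range's total weight.
            cost[i][j] = best + weight
    return cost[0][n]

print(optimal_bst_cost([34, 8, 50]))   # 142
```

In the example, the cheapest tree puts the heaviest key (weight 50) at the root, the key of weight 34 below it and the key of weight 8 at depth 3, for a cost of 50·1 + 34·2 + 8·3 = 142. As written the sketch takes O(n³) time and O(n²) space.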

Even if we only have estimates of the search costs, such a system can considerably speed up lookups on average. For example, if you have a BST of English words used in a spell checker, you might balance the tree based on word frequency in text corpora, placing words like "the" near the root and words like "agerasia" near the leaves. Such a tree might be compared with Huffman trees, which similarly seek to place frequently used items near the root in order to produce a dense information encoding; however, Huffman trees only store data elements in leaves.
