A red-black tree is a binary search tree in which all nodes are colored either red or black. Additional conditions impose restrictions on the tree to ensure reasonable balance.
Red-black trees satisfy two conditions, a bookkeeping condition and a balance condition. The two conditions have a consequence that bounds the height of a tree in terms of the number of nodes that it contains.
All paths through the tree contain the same number of black nodes.
This amounts to requiring that the tree be perfectly balanced, but red nodes are ignored. This condition just specifies how we do bookkeeping to check for balance.
By itself, the bookkeeping condition has no teeth. You can satisfy it in any binary search tree by making all of the nodes red.
No red nodes have red parents.
This condition adds teeth to the bookeeping condition. It ensures that the number of nodes, red or black, along two paths through the tree can differ by at most a factor of approximately 2.
It can be shown that the height of a red-black tree with n nodes is O(log(n)). In fact, the big-oh constant is essentially 2. This implies that searches in a red-black tree have O(log(n)) worst-case run time.
Section overview text.
Section overview text.
Inserts in a red-black tree add a new node with color red to avoid violating the bookkeeping condition. The code for rebalancing after adding the node is just a simple loop with the following invariants:
The loop code attempts to either eliminate the double red or to move it higher up in the tree. Simple cases deal with situations where there is no double red or it can be fixed immediately without using rotations. Other cases use rotations or move the double red closer to the root.
If the new red is the root then it has no parent so there is no violation of the balance condition. Nothing more needs to be done.
If the parent of the new red node is not red then there is no violation of the balance condition. Nothing more needs to be done.
If the parent of the new red is the root and it is red then it can be made black. This changes the number of blacks on all paths through the tree by the same amount so it does not result in a violation of the bookkeeping condition. It also eliminates the violation of the balance condition. Nothing more needs to be done.
The other insert case are described by one of the two following diagrams or their mirror images. The new red in the left diagram, the straight chain case, is node 1. The new red in the right diagram, the crooked chain case, is node 2. We are assuming that easy case conditions did not hold so that both the new red and its parent are red. In either diagram the uncle is node 5.
When the uncle is red both diagrams can be handled with the same code. Otherwise, different code is needed for the two diagrams.
If the uncle is red then conditions for doing a promote at the grandparent are satisfied: the grandparent is black and both of its children are red. The promote eliminates the balance condition violation between the current new red and its parent but it turns the grandparent red. This may result in a new violation. This can be handled by continuing the insert rebalancing loop with the grandparent as the new red.
If the uncle is not red, including the possibility that it is null, then a promote cannot be done. However, if the new red, its parent, and its grandparent form a straight chain then a rotation from the parent separates the two reds. This fixes the balance condition violation so the insert rebalancing loop can be terminated.
If the new red, its parent, and its grandparent form a crooked chain then a rotation from the parent does not separate the two reds because the new red becomes the child of the grandparent, which turns red.
However, the chain can be straightened out first by doing a rotation from the new red. If the loop is continued after this rotation the the next iteration will find a straight chain to the grandparent and it will fix the balance condition violation.
Section overview text.
After a node is removed the only possible violation of the red-black tree conditions is a violation of the bookkeeping condition. More precisely, a path through the link to the removed node may have one fewer black nodes than other paths.
There are two quick fixes that deal with cases that do not require a loop to restore the bookkeeping condition. Otherwise a rebalancing loop is required.
There are two delete cases that can be handled without a loop:
Then there is nothing to be done. Removal of the node did not violate either of the red-black tree conditions.
This happens when the removed node has exactly one child. This child must be red. When the removed node was removed the red child took its place. There is a violation of the bookkeeping condition since the removed node must be black. The violation can be fixed by changing the color of the red child to black.
If neither of the quick fixes applies then paths through the removal link are short by one black node compared to the pre-delete situation. We refer to this kind of link as a pale link. The delete rebalancing loop maintains the following invariants:
One of the loop cases, when the pale link is at the root, is trivial to handle. Otherwise the cases can be classified by colors occurring in the following diagram or its mirror image:
The cases are described using the following terminology.
Links are named according to their end nodes. For example, the far niece link is the link from the sibling node to the far niece node.
The order of the cases in the menu to the left is important. Cases are named by a single condition but other conditions may be required for the case. These additional conditions are implied by failure to satisfy conditions for earlier cases.
If the pale link is the root link then all paths through the tree are short by one black node compared to the pre-delete situation. But this means that all paths through the tree have the same number of black nodes so the red-black tree conditions have been reestablished. Nothing more needs to be done. A check for the negation of this condition can be used as a loop continuation condition.
Conditions assumed from failure of earlier case conditions: the pale link is not the root link so the pale link has a sibling.
If the sibling is red then we can do a rotation at the sibling link. This does not fix the pale link but it makes its parent red so one of the other cases will apply on the next iteration of the loop.
Conditions assumed from failure of earlier case conditions: the pale link is not the root link so the pale link has a sibling. Then the niece links exist though their end nodes may be null.
If the far niece is red then we do three things:
Conditions assumed from failure of earlier case conditions: the pale link is not the root link so the pale link has a sibling. Then the niece links exist though their end nodes may be null. Also the far niece is not red.
If the near niece is red then the near niece link should be rotated. This does not fix the pale link but it makes the far niece red so the previous case will apply on the next iteration of the loop.
Conditions assumed from failure of earlier case conditions: neither niece is red.
If the parent is red then the colors across the sibling link should be swapped. This does not change the number of black node on paths through the sibling link but it adds a black node to paths through the pale link. That fixes the pale link so no more needs to be done.
If none of the preceeding case conditions are true then the parent, sibling, and niece nodes (if they exist) are all black. Then the sibling should be made red. This does not fix the pale link. Instead it creates a situation where all paths through the parent link are short by one black node and all paths that are short by a black node go through the parent link. Thus we should set the pale link to the parent of the original pale link and continue the loop.
Section overview text.
Although rebalancing code for red-black trees seems complex, it does not add much time to handling inserts and deletes. It can be shown that the amortized rebalancing time is constant. This means that the worst case average rebalancing time for a sequence of inserts and deletes is constant.
In return for this you get a worst case O(log n) bound on the tree height as opposed to O(n) for an unbalanced tree. The balancing also improves the average tree height by about 25%. For trees with ~100 nodes or more this reduces the search time significantly. Since a search is involved in both inserts and deletes, the rebalancing actually reduces the overall average insert and delete time for moderate to large size trees.
Section overview text.
Section overview text.
Section overview text.
Section overview text.