The most complicated (but still not very complicated) operation we perform on splay trees is deletion, which assumes the item to delete is present. To perform deletion, we first find the node to delete and splay it. We then remove the root, leaving us with a left and a right subtree, and finally join these two subtrees. Joining the two subtrees is actually nontrivial, so we will look at how it works now.
The join operation merges two trees t1 and t2, assuming that every item of t1 is <= every item of t2. Joining consists of two steps: first, we splay the largest item of t1. This leaves t1 with no right child, meaning that we can now simply attach t2 as the right child of t1. The fact that we use this join operation right after deleting the root gives us an advantage: we know that the element we just deleted is guaranteed to be >= every element of t1, so performing a splay search for the deleted item in t1 will splay the largest element of t1 to the root.
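To make this concrete, here is a minimal sketch in Python. The names (splay_max, join, delete_root) are my own, and splay_max is simplified: it brings the maximum to the root with plain left rotations, whereas a real splay tree would use zig-zig and zig-zag steps. The end state at the root is the same, which is all join needs.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def splay_max(root):
    # Simplified stand-in for splaying the maximum: rotate the
    # root left until the rightmost (maximum) node becomes the
    # root. Afterward the root has no right child.
    while root.right is not None:
        r = root.right
        root.right = r.left
        r.left = root
        root = r
    return root

def join(t1, t2):
    # Assumes every key in t1 <= every key in t2.
    if t1 is None:
        return t2
    t1 = splay_max(t1)   # t1's root now has no right child...
    t1.right = t2        # ...so t2 can simply be attached there
    return t1

def delete_root(root):
    # Called after the doomed node has been splayed to the root:
    # discard the root and join the two remaining subtrees.
    return join(root.left, root.right)

def inorder(n):
    # In-order traversal, handy for checking the result.
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)
```

For example, deleting the root of the tree 5 -> (3 -> (2, 4), 8 -> (7, -)) joins {2, 3, 4} with {7, 8}, and an in-order traversal of the result gives the remaining keys in sorted order.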
Monday, December 2, 2013
Sunday, December 1, 2013
Amortized Complexity
Amortized complexity considers a sequence of operations, rather than just a single operation. It was introduced by Robert Tarjan (co-inventor of splay trees, along with Daniel Sleator), most likely to support splay trees. One of the primary reasons that amortized analysis is needed is that the worst case is too pessimistic. For example, the worst case complexity of searching in a splay tree is O(n), but this case is very rare. The second major reason is that it is too difficult to define an average case. For example, what is the average case when a search could take anywhere from O(1) to O(n) time?
Okay, so now that we understand the reasons amortized analysis is needed, I'll present a quick, informal example of amortized analysis. I won't be using splay trees, as the proof is fairly involved; instead, I will analyze the cost of pushing an element onto a stack represented as an array.
I'm pretty sure that at some point, all of us have done this. You internally store the stack as an array, keeping track of the current index into the array. When you push, you store the value into the array at the current index and increment the stored index. At any time, we will use n to represent the current size of the internal array. Now the complicated part comes when you try to push onto a stack that is full. Since we are using fixed-size arrays, you need to allocate a new array with size 2n, and copy over the old n elements. This has a cost of O(n).
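Here is a sketch of that push operation in Python (the class and field names are my own; a fixed-capacity list stands in for the fixed-size array):

```python
class ArrayStack:
    def __init__(self):
        self.data = [None]   # fixed-size backing array, initially n = 1
        self.size = 0        # current index into the array

    def push(self, value):
        if self.size == len(self.data):
            # Array is full: allocate a new array of size 2n and
            # copy over the old n elements -- the O(n) case.
            new_data = [None] * (2 * len(self.data))
            new_data[:self.size] = self.data
            self.data = new_data
        # Store the value at the current index and increment
        # the stored index -- the O(1) case.
        self.data[self.size] = value
        self.size += 1
```

After pushing five values, the backing array has been doubled three times (to sizes 2, 4, and 8), while the other two pushes were constant time.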
Now, if we consider a sequence of m push operations, we will see that there is no definite average case, as sometimes a push will be O(1), and other times O(n). This means that we will have to use amortized analysis.
We are going to assume that we initially have n=1. Since we double n each time we resize, the total number of resizes for m operations will be about lg(m), and the ith resize copies 2^(i-1) elements. Combining all of this together gives us a total resize cost of 1 + 2 + 4 + ... + 2^(lg(m)-1) < 2m.
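As a quick sanity check (this helper is my own, not from the original analysis), we can simulate the sequence of pushes and add up the copies performed by every resize:

```python
def total_resize_cost(m):
    # Simulate m pushes onto an array stack starting at n = 1,
    # counting only the element copies performed during resizes.
    n, size, cost = 1, 0, 0
    for _ in range(m):
        if size == n:
            cost += n   # a resize copies all n current elements
            n *= 2
        size += 1
    return cost
```

For m = 8 the resizes copy 1 + 2 + 4 = 7 elements, and in general the total stays below 2m, matching the O(m) bound.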
So, that gives us the cost of resizing, but we also have the m constant time assignment operations to store the pushed values into our array. Adding together the resize cost and this assignment cost gives us a final total cost which is O(m).
Now, for the amortized part. The above calculations showed us that the cost of m operations is O(m), which allows us to conclude that the amortized cost of a single operation is O(1), or constant. Hopefully, it is easy to see how we get this result: we just divide the O(m) total for m operations by m to get O(1) per operation.
Use Cases
Splay trees are best used in non-real-time applications, mostly because of the possibility of an operation being O(n). They are good in low-memory situations, as they have no overhead when compared to a plain binary search tree. They have a simple implementation, making them a good choice when all that is desired is an improvement over a plain binary search tree. Finally, they are good to use when support for duplicates is required.
