Monday, December 2, 2013

Deletion and Joining

The most complicated (but still not very complicated) operation we perform on splay trees is deletion, which assumes the item to delete is present. To perform deletion, we first find the node to delete and splay it. We then remove the root, leaving us with a left and a right subtree, and finally join these two subtrees. Joining the two subtrees is actually nontrivial, so we will look at how it works now.

The join operation merges two trees t1 and t2, assuming that every item in t1 is <= every item in t2. Joining is done in two steps: first, we splay the largest item of t1, which leaves t1's root with no right child; then we simply attach t2 as the right child of t1's root. The fact that we use this join operation after we have deleted the root gives us an advantage: we know that the element we just deleted is >= every element of t1, so performing a splay search in t1 for the deleted key splays t1's largest element to the root.
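
To make this concrete, here is a minimal Python sketch of the join and of the final step of deletion. The Node class and the helper names (splay_max, join, delete_root) are my own, and splay_max uses repeated single rotations rather than the full zig-zig/zig-zag splay, just to show the structure of the operation.

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def rotate_left(x):
        # make x's right child the new root of this subtree
        y = x.right
        x.right, y.left = y.left, x
        return y

    def splay_max(t):
        # Bring the largest key (the rightmost node) to the root.
        # A real splay would use zig/zig-zig steps; repeated single
        # rotations are enough to illustrate the join.
        while t.right is not None:
            t = rotate_left(t)
        return t

    def join(t1, t2):
        # assumes every key in t1 <= every key in t2
        if t1 is None:
            return t2
        t1 = splay_max(t1)   # t1's root now has no right child
        t1.right = t2
        return t1

    def delete_root(t):
        # once the node to delete has been splayed to the root,
        # deletion is just a join of its two subtrees
        return join(t.left, t.right)

With this in hand, deletion is: splay the node to delete so that it becomes the root, then call delete_root on the result.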

Sunday, December 1, 2013

Amortized Complexity

Amortized complexity considers a sequence of operations, rather than just a single operation. It was introduced by Robert Tarjan (co-inventor of splay trees, along with Daniel Sleator), most likely to support splay trees. One of the primary reasons that amortized analysis is needed is that the worst case is too pessimistic. For example, the worst-case complexity of searching in a splay tree is O(n), but this case is very rare. The second major reason is that it is too difficult to define an average case. For example, what is the average case when a search could take anywhere from O(n) to O(1) time?

Okay, so now that we understand the reasons amortized analysis is needed, I'll present a quick, informal example of amortized analysis. I won't be using splay trees, as the proof is fairly involved; instead I will analyze the cost of pushing an element onto a stack represented as an array.

I'm pretty sure that at some point, all of us have done this. You internally store the stack as an array, keeping track of the current index into the array. When you push, you store the value into the array at the current index and increment the stored index. At any time, we will use n to represent the current size of the internal array. Now the complicated part comes when you try to push onto a stack that is full. Since we are using fixed-size arrays, you need to allocate a new array with size 2n, and copy over the old n elements. This has a cost of O(n).
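
Here is a small Python sketch of the array-backed stack described above (the class and field names are my own invention):

    class ArrayStack:
        def __init__(self):
            self.data = [None]          # internal array, n = 1 to start
            self.top = 0                # index of the next free slot

        def push(self, value):
            if self.top == len(self.data):              # the array is full
                bigger = [None] * (2 * len(self.data))  # allocate an array of size 2n
                for i, v in enumerate(self.data):       # copy the old n elements: O(n)
                    bigger[i] = v
                self.data = bigger
            self.data[self.top] = value                 # the O(1) assignment
            self.top += 1

(Python's built-in list already grows geometrically like this behind the scenes; the point of writing it out is just to make the copying cost explicit.)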

Now, if we consider a sequence of m push operations, we will see that there is no well-defined average case, as sometimes a push will be O(1) and other times O(n). This means that we will have to use amortized analysis.

We are going to assume that we initially have n=1. Since we double n each time we resize, the total number of resizes over m push operations is about lg(m), and the ith resize has a cost of at most 2^i. Combining all of this together gives us the following result:

    2 + 4 + ... + 2^ceil(lg m) = 2^(ceil(lg m) + 1) - 2 < 4m, which is O(m)

So, that gives us the cost of resizing, but we also have the m constant-time assignment operations that store the pushed values into our array. Adding together the O(m) resize cost and these m assignments gives us a final total cost of O(m).


Now, for the amortized part. The above calculations showed us that the cost of m operations is O(m), which allows us to conclude that the amortized cost of a single operation is O(1), or constant. Hopefully it is easy to see how we get this result: we just divide the O(m) cost of m operations by m to get O(1) per operation.
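
As a quick sanity check (not a formal proof), here is a short Python loop that counts how many element copies the doubling strategy performs over m pushes; the total stays within a small constant factor of m:

    def total_copy_cost(m):
        n, size, copies = 1, 0, 0        # capacity, elements stored, copies so far
        for _ in range(m):
            if size == n:                # full: copy the n existing elements
                copies += n
                n *= 2
            size += 1                    # the O(1) part of the push itself
        return copies

    m = 1000000
    cost = total_copy_cost(m)
    print(cost / m)                      # about 1.05, always below 2: O(1) per push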

Use Cases

Splay trees are best used in non-real-time applications, mostly because of the possibility of a single operation taking O(n) time. They are good in low-memory situations, as they have no overhead compared to a plain binary search tree. They have a simple implementation, making them a good choice when all that is desired is an improvement over plain binary search trees. Finally, they are a good choice when support for duplicates is required.

Saturday, November 30, 2013

Insertion and Search

Insertion probably has the simplest implementation. We start by doing a standard binary tree insert, then splay the newly inserted node.

Searching is slightly less simple, as we modify the tree regardless of whether the item is present in the tree or not. We start with a standard binary search, after which we perform one of two possible operations. In the case that we found the node we were looking for, we splay it and return it. Otherwise, we splay the last node that was visited while searching. This is useful later during deletion.
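
Here is a rough Python sketch of search and insert. To keep it short, it uses a deliberately simplified splay that moves the target up one single rotation per level; the real zig-zig/zig-zag cases are sketched under Splaying below. The names are my own, and the sketch assumes distinct keys.

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def rotate_right(x):
        y = x.left
        x.left, y.right = y.right, x
        return y

    def rotate_left(x):
        y = x.right
        x.right, y.left = y.left, x
        return y

    def splay_simple(root, key):
        # Simplified splay: one single rotation per level, not the real
        # zig-zig/zig-zag discipline. If the key is absent, the last
        # node visited ends up at the root, as described above.
        if root is None or root.key == key:
            return root
        if key < root.key:
            if root.left is None:
                return root
            root.left = splay_simple(root.left, key)
            return rotate_right(root)
        else:
            if root.right is None:
                return root
            root.right = splay_simple(root.right, key)
            return rotate_left(root)

    def search(root, key):
        # searching always modifies the tree: we splay either the node
        # we found or the last node visited on the way down
        root = splay_simple(root, key)
        found = root is not None and root.key == key
        return root, found

    def insert(root, key):
        # standard binary search tree insert, then splay the new node
        if root is None:
            return Node(key)
        cur = root
        while True:
            if key < cur.key:
                if cur.left is None:
                    cur.left = Node(key)
                    break
                cur = cur.left
            else:
                if cur.right is None:
                    cur.right = Node(key)
                    break
                cur = cur.right
        return splay_simple(root, key)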

Splaying

Splaying is the operation that distinguishes splay trees from a standard binary search tree:

Splaying is a (typically recursive) operation that takes a node x and, through a series of rotations, moves it to the root of the tree.

Splaying performs three types of rotations: Zig rotations, Zig-Zig rotations, and Zig-Zag rotations. A Zig rotation is a single rotation that occurs when x's parent is the root; here we describe the case where x is the left child of its parent. A Zig-Zig rotation is a double rotation that occurs when node x is the left child of its parent, which is in turn the left child of x's grandparent. Lastly, a Zig-Zag rotation is a double rotation that occurs when node x is the right child of its parent, which is the left child of x's grandparent. Each case also has a mirror-image version with left and right swapped.
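
Here is a Python sketch of a recursive splay that handles the zig, zig-zig, and zig-zag cases (and their mirror images). This is the standard recursive formulation rather than anything specific to our implementation, and the names are my own.

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def rotate_right(x):          # zig: x's left child moves up
        y = x.left
        x.left, y.right = y.right, x
        return y

    def rotate_left(x):           # zig (mirror image): x's right child moves up
        y = x.right
        x.right, y.left = y.left, x
        return y

    def splay(root, key):
        # Moves the node with the given key to the root; if the key is
        # absent, the last node visited on the search path ends up there.
        if root is None or root.key == key:
            return root
        if key < root.key:
            if root.left is None:
                return root
            if key < root.left.key:                        # zig-zig (left-left)
                root.left.left = splay(root.left.left, key)
                root = rotate_right(root)
            elif key > root.left.key:                      # zig-zag (left-right)
                root.left.right = splay(root.left.right, key)
                if root.left.right is not None:
                    root.left = rotate_left(root.left)
            # zig: one final rotation brings the target to the root
            return root if root.left is None else rotate_right(root)
        else:
            if root.right is None:
                return root
            if key > root.right.key:                       # zig-zig (right-right)
                root.right.right = splay(root.right.right, key)
                root = rotate_left(root)
            elif key < root.right.key:                     # zig-zag (right-left)
                root.right.left = splay(root.right.left, key)
                if root.right.left is not None:
                    root.right = rotate_right(root.right)
            return root if root.right is None else rotate_left(root)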

Disadvantages

Here are some of the disadvantages of splay trees.

A big one is that there is a possibility of a tree having linear
height, which is something that balanced trees such as AVL or
red-black trees always prevent. Another disadvantage is that we
normally think of a search operation as nondestructive, that is,
it doesn't modify the tree, but in the case of splay trees, it
does. The most obvious situation where this can become an issue
is in a multithreaded environment. Finally, splay trees have the
disadvantage that individual operations can be expensive. For
example, it is possible for a search to take O(n) time the
first time it runs, then O(1) the next time.

Advantages

Here are some of the advantages of splay trees.

First of all, splay trees are much simpler to implement than
other trees such as AVL or red-black trees. They are basically
plain old binary search trees with one extra operation,
splaying. In the average case, they have performance comparable
to balanced trees, and they also have a small memory footprint,
as they use no more memory than a plain binary search tree. For
comparison, every node of an AVL tree has an additional height
field, and every node of a red-black tree has an additional
color field. Finally, splay trees have no issue with identical
keys, whereas AVL trees, for example, do not support identical
keys.

Wednesday, November 27, 2013

Main Source

We should probably use the original paper for most of our information, since there's a lot of wrong information on other sites. Here's the link:
http://www.cs.princeton.edu/courses/archive/fall07/cos521/handouts/self-adjusting.pdf