How Java sorts Objects
2014-04-28 18:00
How Java sorts Objects
It started with an experiment on sorting…
The reason why I had this question in my head was because I was wondering how sorting would look like for Scissors, Paper, Stone objects, since they have a cyclic ordering. And in that little experiment I used Java’s Collections.sort
method, which is really useful if the class you wish to sort implements Comparable
, and that just requires you to define a single method compareTo
. So I went ahead to investigate how Java implements this method. My initial guess was that for small collections, insertion sort would be used, because it’s actually more efficient. But for larger collections, mergesort/quicksort would be used. Eventually I found out I was wrong, here’s why.
Grep that method
GrepCode is a wonderful resource. It lets you search the code bases of many open source projects including Java and Eclipse, and there’s so much to learn from robust code that is being used by so many people.
A simple search for Collections.sort
took me here.
public static <T extends Comparable<? super T>> void [More ...] sort(List<T> list) {
Object[] a = list.toArray();
Arrays.sort(a);
ListIterator<T> i = list.listIterator();
for (int j=0; j<a.length; j++) {
i.next();
i.set((T)a[j]);
}
}
So actually Collections.sort
converts the collection of objects into an Array
internally, and uses Arrays.sort
. Sounds like a simple way to reduce duplication of code!
Digging deeper into Arrays.sort
So I looked into Array.sort
and see that Arrays.sort
uses either legacyMergeSort
or something called ComparableTimSort
.
public static void [More ...] sort(Object[] a) {
if (LegacyMergeSort.userRequested)
legacyMergeSort(a);
else
ComparableTimSort.sort(a);
}
Well I know mergesort, and it seems like you need to set some system parameter to use it, so my guess is that it isn’t used most of the time. How about the other one, ComparableTimSort
?
Tim’s sort
I’ve never heard of such a sort before, so I continued investigating and reached the method definition, which said this was just like TimSort.
The comments for the TimSort
class actually describes the history of TimSort, which was first used by Python!
// A stable, adaptive, iterative mergesort that requires far fewer than n lg(n)
// comparisons when running on partially sorted arrays, while offering
// performance comparable to a traditional mergesort when run on random arrays.
// Like all proper mergesorts, this sort is stable and runs O(n log n) time (worst case).
I haven’t completely figured out how it works, but it has O(n)
best case performance, and performs very well on real world data. Here’s a description of the algorithm by its author Tim Peters, and a cool visualization of the process. Oh by the way that previous website, sortvis has visualizations of other sorting algorithms as well, check it out!
So…
I don’t really understand Tim’s sort as for now, and I think this won’t be the last time I’ll see it, but at least know I know how Java sorts stuff :)