The title is in reference to Why is it faster to process a sorted array than an unsorted array?
Is this a branch prediction effect, too? Beware: here the processing for the sorted array is slower!!
Consider the following code:
private static final int LIST_LENGTH = 1000 * 1000;
private static final long SLOW_ITERATION_MILLIS = 1000L * 10L;
@Test
public void testBinarySearch() {
Random r = new Random(0);
List<Double> list = new ArrayList<>(LIST_LENGTH);
for (int i = 0; i < LIST_LENGTH; i++) {
list.add(r.nextDouble());
}
//Collections.sort(list);
// remove possible artifacts due to the sorting call
// and rebuild the list from scratch:
list = new ArrayList<>(list);
int nIterations = 0;
long startTime = System.currentTimeMillis();
do {
int index = r.nextInt(LIST_LENGTH);
assertEquals(index, list.indexOf(list.get(index)));
nIterations++;
} while (System.currentTimeMillis() < startTime + SLOW_ITERATION_MILLIS);
long duration = System.currentTimeMillis() - startTime;
double slowFindsPerSec = (double) nIterations / duration * 1000;
System.out.println(slowFindsPerSec);
...
}
This prints out a value of around 720 on my machine.
Now if I activate the collections sort call, that value drops down to 142. Why?!?
The results are conclusive, they don't change if I increase the number of iterations/time.
Java version is 1.8.0_71 (Oracle VM, 64 bit), running under Windows 10, JUnit test in Eclipse Mars.
UPDATE
Seems to be related to contiguous memory access (Double objects accessed in sequential order vs in random order). The effect starts vanish for me for array lengths of around 10k and less.
Thanks to assylias for providing the results:
/**
* Benchmark Mode Cnt Score Error Units
* SO35018999.shuffled avgt 10 8.895 ± 1.534 ms/op
* SO35018999.sorted avgt 10 8.093 ± 3.093 ms/op
* SO35018999.sorted_contiguous avgt 10 1.665 ± 0.397 ms/op
* SO35018999.unsorted avgt 10 2.700 ± 0.302 ms/op
*/
System.currentTimeMillis
and assertEquals
. There are no warmup iterations, there are no iterations in general, you rely on a constant amount of time and check how much was done in that time. Sorry, but this test is effectively useless.
It looks like caching / prefetching effect.
The clue is that you compare Doubles (objects), not doubles (primitives). When you allocate objects in one thread, they are typically allocated sequentially in memory. So when indexOf
scans a list, it goes through sequential memory addresses. This is good for CPU cache prefetching heuristics.
But after you sort the list, you still have to do the same number of memory lookups in average, but this time memory access will be in random order.
UPDATE
Here is the benchmark to prove that the order of allocated objects matters.
Benchmark (generator) (length) (postprocess) Mode Cnt Score Error Units
ListIndexOf.indexOf random 1000000 none avgt 10 1,243 ± 0,031 ms/op
ListIndexOf.indexOf random 1000000 sort avgt 10 6,496 ± 0,456 ms/op
ListIndexOf.indexOf random 1000000 shuffle avgt 10 6,485 ± 0,412 ms/op
ListIndexOf.indexOf sequential 1000000 none avgt 10 1,249 ± 0,053 ms/op
ListIndexOf.indexOf sequential 1000000 sort avgt 10 1,247 ± 0,037 ms/op
ListIndexOf.indexOf sequential 1000000 shuffle avgt 10 6,579 ± 0,448 ms/op
I think we are seeing the effect of memory cache misses:
When you create the unsorted list
for (int i = 0; i < LIST_LENGTH; i++) {
list.add(r.nextDouble());
}
all the double are most likely allocated in a contiguous memory area. Iterating through this will produce few cache misses.
On the other hand in the sorted list the references point to memory in a chaotic manner.
Now if you create a sorted list with contiguous memory:
Collection.sort(list);
List<Double> list2 = new ArrayList<>();
for (int i = 0; i < LIST_LENGTH; i++) {
list2.add(new Double(list.get(i).doubleValue()));
}
this sorted list has the same performance than the original one (my timing).
As a simple example that confirms the answer by wero and the answer by apangin (+1!): The following does a simple comparison of both options:
Creating random numbers, and sorting them optionally
Creating sequential numbers, and shuffling them optionally
It is also not implemented as a JMH benchmark, but similar to the original code, with only slight modifications to observe the effect:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
public class SortedListTest
{
private static final long SLOW_ITERATION_MILLIS = 1000L * 3L;
public static void main(String[] args)
{
int size = 100000;
testBinarySearchOriginal(size, true);
testBinarySearchOriginal(size, false);
testBinarySearchShuffled(size, true);
testBinarySearchShuffled(size, false);
}
public static void testBinarySearchOriginal(int size, boolean sort)
{
Random r = new Random(0);
List<Double> list = new ArrayList<>(size);
for (int i = 0; i < size; i++)
{
list.add(r.nextDouble());
}
if (sort)
{
Collections.sort(list);
}
list = new ArrayList<>(list);
int count = 0;
int nIterations = 0;
long startTime = System.currentTimeMillis();
do
{
int index = r.nextInt(size);
if (index == list.indexOf(list.get(index)))
{
count++;
}
nIterations++;
}
while (System.currentTimeMillis() < startTime + SLOW_ITERATION_MILLIS);
long duration = System.currentTimeMillis() - startTime;
double slowFindsPerSec = (double) nIterations / duration * 1000;
System.out.printf("Size %8d sort %5s iterations %10.3f count %10d\n",
size, sort, slowFindsPerSec, count);
}
public static void testBinarySearchShuffled(int size, boolean sort)
{
Random r = new Random(0);
List<Double> list = new ArrayList<>(size);
for (int i = 0; i < size; i++)
{
list.add((double) i / size);
}
if (!sort)
{
Collections.shuffle(list);
}
list = new ArrayList<>(list);
int count = 0;
int nIterations = 0;
long startTime = System.currentTimeMillis();
do
{
int index = r.nextInt(size);
if (index == list.indexOf(list.get(index)))
{
count++;
}
nIterations++;
}
while (System.currentTimeMillis() < startTime + SLOW_ITERATION_MILLIS);
long duration = System.currentTimeMillis() - startTime;
double slowFindsPerSec = (double) nIterations / duration * 1000;
System.out.printf("Size %8d sort %5s iterations %10.3f count %10d\n",
size, sort, slowFindsPerSec, count);
}
}
The output on my machine is
Size 100000 sort true iterations 8560,333 count 25681
Size 100000 sort false iterations 19358,667 count 58076
Size 100000 sort true iterations 18554,000 count 55662
Size 100000 sort false iterations 8845,333 count 26536
nicely showing that the timings are exactly the opposites of another: If random numbers are sorted, then the sorted version is slower. If sequential numbers are shuffled, then the shuffled version is slower.
Success story sharing
list.indexOf(list.get(index))
thelist.get(index)
doesn't benefit in any way from prefetching sinceindex
is random. The price oflist.get(index)
is the same regardless of weather the list was sorted or not. Prefetching kicks in only forlist.indexOf()