.toArray(new MyClass[0]) or .toArray(new MyClass[myList.size()])?

Assuming I have an ArrayList

ArrayList<MyClass> myList;

And I want to call toArray, is there a performance reason to use

MyClass[] arr = myList.toArray(new MyClass[myList.size()]);

over

MyClass[] arr = myList.toArray(new MyClass[0]);

?

I prefer the second style, since it's less verbose, and I assumed that the compiler will make sure the empty array doesn't really get created, but I've been wondering if that's true.

Of course, in 99% of the cases it doesn't make a difference one way or the other, but I'd like to keep a consistent style between my normal code and my optimized inner loops...

Looks like the question has now been settled in a new blog post by Aleksey Shipilёv, Arrays of Wisdom of the Ancients!
From the blog post: "Bottom line: toArray(new T[0]) seems faster, safer, and contractually cleaner, and therefore should be the default choice now."

Raedwald

Counterintuitively, the fastest version, on Hotspot 8, is:

MyClass[] arr = myList.toArray(new MyClass[0]);

I ran a microbenchmark using JMH; the results and code are below. They show that the version with an empty array consistently outperforms the version with a presized array. Note that if you can reuse an existing array of the correct size, the result may be different.

Benchmark results (score in microseconds, smaller = better):

Benchmark                      (n)  Mode  Samples    Score   Error  Units
c.a.p.SO29378922.preSize         1  avgt       30    0.025 ± 0.001  us/op
c.a.p.SO29378922.preSize       100  avgt       30    0.155 ± 0.004  us/op
c.a.p.SO29378922.preSize      1000  avgt       30    1.512 ± 0.031  us/op
c.a.p.SO29378922.preSize      5000  avgt       30    6.884 ± 0.130  us/op
c.a.p.SO29378922.preSize     10000  avgt       30   13.147 ± 0.199  us/op
c.a.p.SO29378922.preSize    100000  avgt       30  159.977 ± 5.292  us/op
c.a.p.SO29378922.resize          1  avgt       30    0.019 ± 0.000  us/op
c.a.p.SO29378922.resize        100  avgt       30    0.133 ± 0.003  us/op
c.a.p.SO29378922.resize       1000  avgt       30    1.075 ± 0.022  us/op
c.a.p.SO29378922.resize       5000  avgt       30    5.318 ± 0.121  us/op
c.a.p.SO29378922.resize      10000  avgt       30   10.652 ± 0.227  us/op
c.a.p.SO29378922.resize     100000  avgt       30  139.692 ± 8.957  us/op

For reference, the code:

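import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
// Assumed: the original listing does not show how the us/op unit was selected.
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class SO29378922 {
  // List size; JMH runs every benchmark once per value.
  @Param({"1", "100", "1000", "5000", "10000", "100000"}) int n;
  private final List<Integer> list = new ArrayList<>();
  @Setup public void populateList() {
    for (int i = 0; i < n; i++) list.add(0);
  }
  // The caller allocates an array of exactly the right size.
  @Benchmark public Integer[] preSize() {
    return list.toArray(new Integer[n]);
  }
  // The caller passes an empty array; toArray allocates the result itself.
  @Benchmark public Integer[] resize() {
    return list.toArray(new Integer[0]);
  }
}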

You can find similar results, full analysis, and discussion in the blog post Arrays of Wisdom of the Ancients. To summarize: the JVM and JIT compiler contain several optimizations that enable them to cheaply create and initialize a new, correctly sized array, and those optimizations cannot be used if you create the array yourself.


Very interesting comment. I'm surprised no one has commented on this. I guess it's because it contradicts the other answers here, as far as speed. Also interesting to note, this guy's reputation is almost higher than that of all the other answerers combined.
I digress. I would also like to see benchmarks for MyClass[] arr = myList.stream().toArray(MyClass[]::new); which I guess would be slower. Also, I would like to see benchmarks for the difference with array declaration, as in the difference between: MyClass[] arr = new MyClass[myList.size()]; arr = myList.toArray(arr); and MyClass[] arr = myList.toArray(new MyClass[myList.size()]); or should there not be any difference? I guess these two are an issue outside of what happens inside toArray. But hey! I didn't think I would learn about the other intricate differences.
@PimpTrizkit Just checked: using an extra variable makes no difference, as expected. Using a stream takes between 60% and 100% more time than calling toArray directly (the smaller the size, the larger the relative overhead).
This same conclusion was found here: shipilev.net/blog/2016/arrays-wisdom-ancients
@xenoterracide as discussed in the comments above, streams are slower.
Meredith

As of ArrayList in Java 5, the array will be filled already if it has the right size (or is bigger). Consequently

MyClass[] arr = myList.toArray(new MyClass[myList.size()]);

will create one array object, fill it and return it to "arr". On the other hand

MyClass[] arr = myList.toArray(new MyClass[0]);

will create two arrays. The second one is an array of MyClass with length 0, so an object is created only to be thrown away immediately. As far as the source code suggests, the compiler / JIT cannot optimize this allocation away. Additionally, using the zero-length array results in casts within the toArray() method.

See the source of ArrayList.toArray():

public <T> T[] toArray(T[] a) {
    if (a.length < size)
        // Make a new array of a's runtime type, but my contents:
        return (T[]) Arrays.copyOf(elementData, size, a.getClass());
    System.arraycopy(elementData, 0, a, 0, size);
    if (a.length > size)
        a[size] = null;
    return a;
}

Use the first method so that only one object is created and avoid (implicit but nevertheless expensive) castings.
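The three branches in the source above can be observed from the outside. The following is a minimal sketch, with String standing in for MyClass; the class name ToArrayCases and the helper reusesArgument are made up for illustration and are not part of the answer. It only shows which calls hand back the argument and which allocate a new array; it says nothing about speed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: ToArrayCases is a hypothetical name, not from the answer.
class ToArrayCases {
    // Returns true if toArray handed back the very array that was passed in.
    static boolean reusesArgument(List<String> list, int argLength) {
        String[] arg = new String[argLength];
        return list.toArray(arg) == arg;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));

        // Too small: a fresh String[3] is allocated and returned.
        System.out.println(reusesArgument(list, 0)); // false
        // Exact size: the argument itself is filled and returned.
        System.out.println(reusesArgument(list, 3)); // true

        // Too large: the argument is reused and only the slot right after
        // the last element is set to null; later slots are left untouched.
        String[] big = {"x", "x", "x", "x", "x"};
        System.out.println(Arrays.toString(list.toArray(big))); // [a, b, c, null, x]
    }
}
```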


Two comments, might be of interest to someone: 1) LinkedList.toArray(T[] a) is even slower (uses reflection: Array.newInstance) and more complex; 2) On the other hand, in JDK7 release, I was very surprised to find out, that usually painfully-slow Array.newInstance performs nearly as fast as usual array creation!
@ktaria size is a private member of ArrayList, specifying (surprise!) the size. See the ArrayList source code.
Guessing performance without benchmarks works only in trivial cases. Actually, new MyClass[0] is faster: shipilev.net/blog/2016/arrays-wisdom-ancients
This is no longer a valid answer as of JDK 6+.
Антон Антонов

From JetBrains Intellij Idea inspection:

There are two styles to convert a collection to an array: either using a pre-sized array (like c.toArray(new String[c.size()])) or using an empty array (like c.toArray(new String[0])). In older Java versions using pre-sized array was recommended, as the reflection call which is necessary to create an array of proper size was quite slow. However since late updates of OpenJDK 6 this call was intrinsified, making the performance of the empty array version the same and sometimes even better, compared to the pre-sized version. Also passing pre-sized array is dangerous for a concurrent or synchronized collection as a data race is possible between the size and toArray call which may result in extra nulls at the end of the array, if the collection was concurrently shrunk during the operation. This inspection allows to follow the uniform style: either using an empty array (which is recommended in modern Java) or using a pre-sized array (which might be faster in older Java versions or non-HotSpot based JVMs).
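The "extra nulls" effect mentioned in the inspection text can be reproduced without threads by shrinking the list between the size() call and the toArray call. This is only a sketch: the class and method names are hypothetical, and in real code the removal would come from another thread rather than from the same method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; simulates the size()/toArray race in one thread.
class PreSizedRace {
    // Expects a non-empty list.
    static String[] preSizedAfterShrink(List<String> c) {
        String[] target = new String[c.size()]; // size is observed here...
        c.remove(c.size() - 1);                 // ...then the collection shrinks
        return c.toArray(target);               // target is now one slot too long
    }

    // Expects a non-empty list.
    static String[] emptyArrayAfterShrink(List<String> c) {
        c.remove(c.size() - 1);
        return c.toArray(new String[0]);        // result is sized by the collection itself
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
            preSizedAfterShrink(new ArrayList<>(Arrays.asList("a", "b", "c")))));   // [a, b, null]
        System.out.println(Arrays.toString(
            emptyArrayAfterShrink(new ArrayList<>(Arrays.asList("a", "b", "c"))))); // [a, b]
    }
}
```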


If all of this is copied/quoted text, could we format it accordingly and also provide a link to the source? I actually came here because of the IntelliJ inspection and I'm very interested in the link to look up all of their inspections and the reasoning behind them.
Here you can check the inspection texts: github.com/JetBrains/intellij-community/tree/master/plugins/…
Tom Hawtin - tackline

Modern JVMs optimise reflective array construction in this case, so the performance difference is tiny. Naming the collection twice in such boilerplate code is not a great idea, so I'd avoid the first method. Another advantage of the second is that it works with synchronised and concurrent collections. If you want to make optimisation, reuse the empty array (empty arrays are immutable and can be shared), or use a profiler(!).
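A minimal sketch of the "reuse the empty array" suggestion, with String standing in for MyClass. The class name, the constant name, and the snapshot helper are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical example; String stands in for MyClass.
class SharedEmptyArray {
    // A zero-length array has no elements to modify, so one instance can be shared.
    private static final String[] EMPTY_STRING_ARRAY = new String[0];

    static String[] snapshot(List<String> list) {
        // For a non-empty list, toArray allocates the result; the constant only supplies the type.
        return list.toArray(EMPTY_STRING_ARRAY);
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b"));
        String[] first = snapshot(list);
        String[] second = snapshot(list);
        System.out.println(first.length);    // 2
        System.out.println(first == second); // false: each call returns its own array

        // Caveat: for an empty list the shared constant itself is returned.
        System.out.println(snapshot(new ArrayList<>()) == EMPTY_STRING_ARRAY); // true
    }
}
```

Whether saving the zero-length allocation is measurable in a given program is something only a profiler can tell, as the answer itself points out.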


Upvoting 'reuse the empty array', because it's a compromise between readability and potential performance that's worthy of consideration. Passing an argument declared private static final MyClass[] EMPTY_MY_CLASS_ARRAY = new MyClass[0] doesn't prevent the returned array from being constructed by reflection, but it does prevent an additional array being constructed each time.
Machael is right: if you use a zero-length array, there is no way around (T[])java.lang.reflect.Array.newInstance(a.getClass().getComponentType(), size); which would be superfluous if the size were >= actualSize (JDK7)
If you can give a citation for "modern JVMs optimise reflective array construction in this case", I'll gladly upvote this answer.
I'm learning here. If instead I use MyClass[] arr = myList.stream().toArray(MyClass[]::new); would it help or hurt with synchronized and concurrent collections? And why? Please.
@PimpTrizkit when you invoke .stream().toArray(MyClass[]::new) on a synchronized collection, you lose the synchronization and have to synchronize manually. In case of a concurrent collection, it doesn’t matter, as both toArray approaches are only weakly consistent. In either case, calling toArray(new MyClass[0]) on the collection directly is likely to be faster. (And to consider APIs introduced after your question, i.e. JDK 11+, calling .toArray(MyClass[]::new) directly on the collection just delegates to .toArray(new MyClass[0]) because that is already the best method for the task.)
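The three variants discussed in these comments can be put side by side. This is a sketch under stated assumptions: it needs Java 11 or later for the toArray(IntFunction) overload, String stands in for MyClass, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo; requires Java 11+ for Collection.toArray(IntFunction).
class ToArrayVariants {
    static String[] direct(List<String> syncList) {
        // The synchronized wrapper holds its lock for the whole call.
        return syncList.toArray(new String[0]);
    }

    static String[] viaGenerator(List<String> syncList) {
        // Java 11+: the default implementation calls toArray(generator.apply(0)).
        return syncList.toArray(String[]::new);
    }

    static String[] viaStream(List<String> syncList) {
        // A stream over a synchronized wrapper needs manual locking on the wrapper.
        synchronized (syncList) {
            return syncList.stream().toArray(String[]::new);
        }
    }

    public static void main(String[] args) {
        List<String> list = Collections.synchronizedList(
            new ArrayList<>(Arrays.asList("a", "b", "c")));
        System.out.println(Arrays.toString(direct(list)));       // [a, b, c]
        System.out.println(Arrays.toString(viaGenerator(list))); // [a, b, c]
        System.out.println(Arrays.toString(viaStream(list)));    // [a, b, c]
    }
}
```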
Dave Cheney

toArray checks that the array passed is of the right size (that is, large enough to fit the elements from your list) and if so, uses that. Consequently, if the array provided is smaller than required, a new array will be reflectively created.

In your case, an array of size zero is immutable, so it could safely be hoisted into a static final variable, which might make your code a little cleaner and avoids creating the array on each invocation. A new array will be created inside the method anyway, so it's a readability optimisation.

Arguably the faster version is to pass the array of a correct size, but unless you can prove this code is a performance bottleneck, prefer readability to runtime performance until proven otherwise.


Panagiotis Korros

The first case is more efficient.

That is because in the second case:

MyClass[] arr = myList.toArray(new MyClass[0]);

the runtime actually creates an empty array (with zero size) and then inside the toArray method creates another array to fit the actual data. This creation is done using reflection using the following code (taken from jdk1.5.0_10):

public <T> T[] toArray(T[] a) {
    if (a.length < size)
        a = (T[])java.lang.reflect.Array.
            newInstance(a.getClass().getComponentType(), size);
    System.arraycopy(elementData, 0, a, 0, size);
    if (a.length > size)
        a[size] = null;
    return a;
}

By using the first form, you avoid the creation of a second array and also avoid the reflection code.
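The reflective allocation in the listing above exists because generic code cannot write new T[size]. A minimal sketch of that step in isolation follows; the class ReflectiveArray and the helper newArrayLike are hypothetical names, and the sketch makes no claim about how fast the call is on any particular JVM.

```java
import java.lang.reflect.Array;
import java.util.Arrays;

// Hypothetical helper mirroring the allocation step in the JDK 1.5 listing.
class ReflectiveArray {
    @SuppressWarnings("unchecked")
    static <T> T[] newArrayLike(T[] prototype, int length) {
        // The component type is only known at runtime, so it is read from the prototype.
        return (T[]) Array.newInstance(prototype.getClass().getComponentType(), length);
    }

    public static void main(String[] args) {
        String[] fresh = newArrayLike(new String[0], 3);
        System.out.println(fresh.getClass().getSimpleName()); // String[]
        System.out.println(fresh.length);                     // 3

        // Arrays.copyOf with a class argument allocates a typed array and copies in one step.
        Object[] source = {"a", "b", "c"};
        String[] copy = Arrays.copyOf(source, source.length, String[].class);
        System.out.println(Arrays.toString(copy));            // [a, b, c]
    }
}
```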


toArray() does not use reflection. At least as long as you do not count "casting" to reflection, anyway ;-).
toArray(T[]) does. It needs to create an array of the appropriate type. Modern JVMs optimise that kind of reflection to be about the same speed as the non-reflective version.
I think that it does use reflection. The JDK 1.5.0_10 does for sure and reflection is the only way I know to create an array of a type that you don't know at compile time.
Then one of the source code examples here (the one above or mine) is out-of-date. Sadly, I didn't find a correct sub-version number for mine, though.
Georgi, your code is from JDK 1.6, and if you look at the implementation of the Arrays.copyOf method you will see that it uses reflection.
MiguelMunoz

The second one is marginally more readable, but there's so little improvement that it's not worth it. The first method is faster, with no disadvantages at runtime, so that's what I use. But I write it the second way, because it's faster to type. Then my IDE flags it as a warning and offers to fix it. With a single keystroke, it converts the code from the second type to the first one.


Matthew Murdoch

Using 'toArray' with an array of the correct size will perform better, as the alternative will first create the zero-sized array and then the array of the correct size. However, as you say, the difference is likely to be negligible.

Also, note that the javac compiler does not perform any optimization. These days all optimizations are performed by the JIT/HotSpot compilers at runtime. I am not aware of any optimizations around 'toArray' in any JVMs.

The answer to your question, then, is largely a matter of style but for consistency's sake should form part of any coding standards you adhere to (whether documented or otherwise).


OTOH, if the standard is to use a zero-length array, then cases that deviate imply that performance is a concern.