The Problem With Creating Generic Arrays


Introduction

I bet that most Java programmers fall into the trap of trying to create generic arrays at least once in their Java software development career, wondering why the compiler is complaining. Shouldn’t generics—introduced in Java 5—allow the use of type parameters everywhere? Why is List<T> genericList = new ArrayList<>() legal, but T[] genericArray = new T[10] is not? This blog post answers these questions and shows several fixes to solve this problem.

Arrays Versus Generics With Respect to Variance

The first principle one needs to know in order to understand the problem mentioned above is the fundamental difference between arrays and generics when it comes to variance. Variance, esp. the adjectives invariant and covariant (and also contravariant if one has a closer look at this topic), “refer[s] to how subtyping between more complex types relates to subtyping between their components”, according to Wikipedia. By “more complex types”, we speak of arrays and generic collections here. So an array, e.g., String[], is a complex type, whereas String is its component type. The same applies to generic collections, e.g., List<String>, where List is the complex type, and String is the type of its components.

The questions are:
If Super is a superclass of Sub (i.e., Sub is a subclass of Super), then …

  1. is Super[] also a superclass of Sub[]?
    (is Sub[] also a subclass of Super[]?)
  2. is List<Super> also a superclass of List<Sub>?
    (is List<Sub> also a subclass of List<Super>?)

In order to answer the question for arrays (question 1), please have a look at the following example program. Instead of the abstract types Super and Sub, I will use the concrete classes Number as the supertype and BigDecimal and BigInteger as the subtype for this and all upcoming examples in this blog post. Note that BigDecimal and BigInteger are “siblings”, so neither is a subtype of the other. They’re both “equally ranked” subtypes of Number.

import java.util.*;
import java.math.*;

public final class Variances {

    private void arrayVariance() {
        BigDecimal[] bigDecimalArray = new BigDecimal[10];
        Number[] numberArray = bigDecimalArray;
        bigDecimalArray = (BigDecimal[]) numberArray;
        BigInteger[] bigIntegerArray = (BigInteger[]) numberArray;  /* Fails at runtime. */
    }
  1. The first command BigDecimal[] bigDecimalArray = new BigDecimal[10] creates an array of type BigDecimal and of size 10. This is the ordinary way arrays are created.
  2. The second command Number[] numberArray = bigDecimalArray assigns this (more specific) BigDecimal array to a (more general) Number array. The compiler won’t complain, and this line of code will be executed fine.
  3. The third command bigDecimalArray = (BigDecimal[]) numberArray tries to “extract” the subtype BigDecimal array out of the supertype Number array. This is fine both during compile time and runtime. It uses the proper cast (BigDecimal[]).

    The main message of points 2 and 3 is this: Arrays are covariant, i.e., an array of a supertype (Super[]) is also a supertype of an array of a subtype (Sub[]). Or regarding the code example above: Number[] is a supertype of BigDecimal[], because Number is a supertype of BigDecimal.

  4. The last command BigInteger[] bigIntegerArray = (BigInteger[]) numberArray tries to extract the BigDecimal array out of the supertype Number array and cast it to a BigInteger array. Even though the compiler doesn’t complain, this line of code will fail during runtime. In this case, the compiler doesn’t provide compile time type safety.

Arrays are also called reified (probably the best synonym: “concretized”). They always have a “concrete”, known type during runtime.

Contrast this to generics (question 2), shown in the following example program:

    private void genericCollectionVariance() {
        List<Number> numberList = new ArrayList<>();
        List<BigDecimal> bigDecimalList = new ArrayList<>();
        numberList = bigDecimalList;  /* Fails at compile time. */
    }
  1. The first command List<Number> numberList = new ArrayList<>() creates a generic [Array]List of type Number. Note that I’m not talking about the hierarchy of List and ArrayList. Using the more general interface type List instead of ArrayList is considered more flexible style, so please don’t feel distracted. This is the ordinary way generic lists are created.
  2. The second command List<BigDecimal> bigDecimalList = new ArrayList<>() does the same for the more specific type BigDecimal.
  3. The last command numberList = bigDecimalList was intended to assign the more specific List<BigDecimal> to the more general List<Number>, analogous to step 2 in the array example above. However, the compiler complains with an error message: “Type mismatch: cannot convert from List<BigDecimal> to List<Number>”.

    The main message of point 3 is this: Generics are invariant, i.e., a generic collection of a supertype (List<Super>) is neither a supertype nor a subtype of a collection of a subtype (<Sub>). Or regarding the code example above: [Array]List<Number> is neither a supertype nor a subtype of [Array]List<BigDecimal>. They simply don’t have any hierarchy connection between each other.

Generics are also called non-reified (“non-concretized”). They don’t have a “concrete”, known type during runtime, for the reason given in the following section.

Erasure

The second principle one needs to know in order to understand why one cannot create generic arrays in a straightforward way is the concept of erasure. When generics were introduced in Java 5, it was of utmost importance to keep the bytecode backward compatible to versions prior to Java 5. Since the Java Virtual Machine (JVM; the “executor”) doesn’t and cannot know anything about generics, the Java designers came up with the idea of erasure. Erasure means that the generic type is simply “erased” by the compiler. Let’s see how this works.

The first example shows the source code of a user of a simple generic data structure, a SingleItemBag<T>, which simply stores one item of type T:

import java.math.*;

public final class Erasures {

    private void useSingleItemBag() {  /* before erasure */
        SingleItemBag<BigDecimal> bigDecimalBag = new SingleItemBag<>();
        bigDecimalBag.setItem(BigDecimal.ONE);
        BigDecimal myBigDecimal = bigDecimalBag.getItem();
        System.out.println(myBigDecimal);
    }

The user creates an instance of type BigDecimal, then sets an item, then gets the item again, and finally outputs its value. This is probably the most typical example of using generics as a “user”. The user doesn’t need to manually check types, doesn’t need to cast, and can rely on the compiler’s type safety. These are the great advantages of generics.

Things are getting interesting when you look at the code the compiler translates the previous version to:

    private void useSingleItemBag() {  /* after erasure */
        SingleItemBag bigDecimalBag = new SingleItemBag();
        bigDecimalBag.setItem(BigDecimal.ONE);
        BigDecimal myBigDecimal = (BigDecimal) bigDecimalBag.getItem();
        System.out.println(myBigDecimal);
    }

All generics (in other words, all writings between < and >) are gone; they were “erased”. However, the compiler now introduced an explicit type cast: BigDecimal myBigDecimal = (BigDecimal) bigDecimalBag.getItem(); Broadly speaking, the compiler changed the code to the way you would have written it in Java 1.4.

The second example shows the source code of the implementation of the generic data structure:

public final class SingleItemBag<T> {  /* before erasure */

    private T item;

    public SingleItemBag() {}

    public T getItem() {
        return item;
    }

    public void setItem(T item) {
        this.item = item;
    }
}

Again, the compiler removes everything generic:

public final class SingleItemBag {  /* after erasure */

    private Object item;

    public SingleItemBag() {}

    public Object getItem() {
        return item;
    }

    public void setItem(Object item) {
        this.item = item;
    }
}

Every type parameter T (which actually means: “I don’t know the specific type yet.”) gets replaced by Object. Since the compiler had control over the whole conversion process, it can guarantee that all type casts (which were also inserted by it) succeed. At the same time, the compiler becomes more strict. If there was the slightest chance that the use of generics results in a type mismatch, the compiler would have issued an error or a warning. By using generics, runtime type safety gets shifted towards compile time type safety.

The term “non-reified”—introduced in the previous section—explains the situation after erasure: All type information has been erased, so the JVM simply doesn’t know the types during runtime (they’re all Objects). Data types are no longer “concrete”.

Understanding the concept of erasure also answers the question why generics are invariant. If it were allowed that, e.g., a List<Number> “swallowed” a List<BigDecimal>, then both compile time type safety (because the compiler cannot see behind the “swallowing” in the source code) and runtime type safety (because of erasure, everything is Object) would be gone. In other words, type safety would be severly flawed, if not inexistent.

The Consequences of Erasure for Arrays

The original problem or question was why arrays cannot be created generically in a straightforward manner. Now that you know the main concepts of the unterlying theory, let’s try to create a generic array. The following code examples show a generic MultiItemBag<T> of type T, which is a data structure where one can add items of type T and get them back using indices. This is definitely no production code, but should suffice to demonstrate the concepts.

public final class MultiItemBag<T> {  /* before erasure */

    private T[] itemArray;

    private int indexToFirstFreeElement
            = 0;

    public MultiItemBag() {
        itemArray = new T[10];  /* Fails at compile time. */
    }

    public T getItem(int index) {

        /* [...] Parameter checks omitted. */

        T elementToReturn = itemArray[index];

        return elementToReturn;
    }

    public void addItem(T item) {

        /* [...] Array bound checks omitted. */

        itemArray[indexToFirstFreeElement++] = item;
    }
}

MultiItemBag is intended to use an array T[] itemArray as the internal data storage. However, the initialization itemArray = new T[10] fails due to a compiler error: “Cannot create a generic array of T”.

To understand why this is, have a look at the code that gets generated by the compiler after erasure:

public final class MultiItemBag<T> {  /* after erasure */

    private Object[] itemArray;

    private int indexToFirstFreeElement
            = 0;

    public MultiItemBag() {
        itemArray = new Object[10];
    }

    public Object getItem(int index) {

        /* [...] Parameter checks omitted. */

        Object elementToReturn = itemArray[index];

        return elementToReturn;
    }

    public void addItem(Object item) {

        /* [...] Array bound checks omitted. */

        itemArray[indexToFirstFreeElement++] = item;
    }
}

Every T got replaced by Object. Especially, the array T[] became an array Object[], and even worse, new T[10] became new Object[10]. When run, the JVM creates an array of the most general type, Object[]. Since one could put any object into Object[], it would undermine all type safety, especially when it came to extracting and casting back the elements (with the “automatic” casts inserted by the compiler). The flexibility and covariance of arrays simply doesn’t go together with the strictness, safety and invariance of generics.

In the following sections, I’ll show you four different ways to overcome this problem.

Fix 1: Use Collections Instead of Arrays

This fix sounds pretty lame, but it’s definitely justified. Since Java’s Collection API has grown so large during the last more than 20 years, think twice whether you really need an array to back your data. The Collection API contains all kinds of (linked) lists, stacks, queues, etc., even in specialized forms, with or without thread safety, with or without modification protection, etc.

import java.util.*;

public final class MultiItemBagFix1<T> {

    private List<T> itemList;

    public MultiItemBagFix1() {
        itemList = new ArrayList<>();
    }

    public T getItem(int index) {

        /* [...] Parameter checks omitted. */

        T elementToReturn = itemList.get(index);

        return elementToReturn;
    }

    public void addItem(T item) {
        itemList.add(item);
    }
}

Replacing the array in MultiItemBag[Fix1] by [Array]List<T> allows the use of generics in its pure and unrestricted form, simplifies the code (note that the index handling using indextoFirstFreeElement disappeared), offers much more functionality, and lets you use and rely on code that has been well-proven and tested for many years by millions of programmers.

Fix 2: Create and Cast the Array Object[] Manually

By creating an array Object[] manually—instead of letting the compiler do it by erasure—and casting it to the generic field type T[] itemArray, you “give a promise” to the compiler that you’re aware of what’s going on with this array and that you “take sole responsibility” if anything goes wrong. The compiler reminds you of this promise by issuing a warning: Type safety: Unchecked cast from Object[] to T[].

If you’re sure about what you’re doing, you can suppress the warning, as seen in the source code below:

public final class MultiItemBagFix2<T> {

    private T[] itemArray;

    private int indexToFirstFreeElement
            = 0;

    @SuppressWarnings("unchecked")
    public MultiItemBagFix2() {
        itemArray = (T[]) new Object[10];
    }

    public T getItem(int index) {

        /* [...] Parameter checks omitted. */

        T elementToReturn = itemArray[index];

        return elementToReturn;
    }

    public void addItem(T item) {

        /* [...] Array bound checks omitted. */

        itemArray[indexToFirstFreeElement++] = item;
    }
}

But what’s different compared to the original version? After erasure, doesn’t it look the same? Well, by manually inserting the type cast (T[]), the “responsibility” is transferred to you as the programmer. The creation of generic arrays is not disallowed because something will go wrong for sure, but rather because the compiler cannot guarantee that everything will be fine. By manually casting the array and suppressing (or ignoring) the warning, you have to live with the risk of ClassCastExceptions. Type safety can only be guaranteed by you as the programmer. If you’ve verified that during runtime the types won’t mess up, you can go with the fix provided above.

Fix 3: Use an Array Object[] and Cast Every Access Manually

This fix is similar to Fix 2 shown above. Whereas the former fix uses an array of type T[], thus eliminating the need for any type casts when accessing the array, the fix shown below kind of does the opposite: The array is kept in its general form, Object[], and every (read) access is accompanied by an explicit type cast (T). (There is no need for a cast when “writing” to the array, because everything can be written to Object.)

public final class MultiItemBagFix3<T> {

    private Object[] itemArray;

    private int indexToFirstFreeElement
            = 0;

    public MultiItemBagFix3() {
        itemArray = new Object[10];
    }

    public T getItem(int index) {

        /* [...] Parameter checks omitted. */

        @SuppressWarnings("unchecked")
        T elementToReturn = (T) itemArray[index];

        return elementToReturn;
    }

    public void addItem(T item) {

        /* [...] Array bound checks omitted. */

        itemArray[indexToFirstFreeElement++] = item;
    }
}

Whether one prefers Fix 2 or Fix 3 is a matter of taste. Both versions require explicit casts, thus shifting the responsibility of type safety to you as the programmer. Both versions require the suppression of warnings, of course not after you’ve verified unconditional type safety. If the class grew bigger and offered more access methods, Fix 2 would probably be better, because it only needs one type cast, whereas Fix 3 needs casts for every (reading) access method.

Fix 4: Use Reflection to Create the Generic Array

The last fix I’m presenting you uses Java’s reflection facility. The static method Array.newInstance(Class<?>, int) in the package java.lang.reflect creates an array of the specified type and the given length.

import java.lang.reflect.*;

public final class MultiItemBagFix4<T> {

    private T[] itemArray;

    private int indexToFirstFreeElement
            = 0;

    @SuppressWarnings("unchecked")
    public MultiItemBagFix4(Class<T> classType) {
        itemArray = (T[]) Array.newInstance(classType, 10);
    }

    public T getItem(int index) {

        /* [...] Parameter checks omitted. */

        T elementToReturn = itemArray[index];

        return elementToReturn;
    }

    public void addItem(T item) {

        /* [...] Array bound checks omitted. */

        itemArray[indexToFirstFreeElement++] = item;
    }
}

What sounds like a perfect solution actually has its drawbacks. First of all, newInstance needs the class type, i.e., the Class object of the desired type, handed in explicitly. It cannot find out the type on its own. Because of erasure, T.class would always result in Object, if it were allowed at all. (It’s not allowed, because it doesn’t make sense.) Always explicitly providing the class type makes using such a feature very cumbersome. See that the constructor of MultiItemBagFix4 requires its user to explicitly state the class type, even though users shouldn’t be confronted with such implementation details. Users of this data structure might possible ask themselves why the heck they must tell the class the type if it’s already specified by T. It’s very unusual, thus suspicious.

The second disadvantage is that one still needs an explicit type cast (as in Fix 2), still needs to take responsibility and double-check type safety (which might not be easy, since the user could also provide false class types as an argument), and still needs to suppress a warning.

Finally, using reflection is often considered bad, unsafe coding style, and usually doesn’t leave a good impression because of security, style, and performance reasons.

Checking the Provided Fixes

The following improvised test cases show that all four fixes actually work. Feel free to play around with the classes.

import java.math.*;

public final class BagFixChecks {

    public static void main(String[] args) {
        BagFixChecks bagFixChecks = new BagFixChecks();
        bagFixChecks.checkBagFix1();
        bagFixChecks.checkBagFix2();
        bagFixChecks.checkBagFix3();
        bagFixChecks.checkBagFix4();
    }

    private void checkBagFix1() {
        MultiItemBagFix1<BigDecimal> multiItemBag = new MultiItemBagFix1<>();
        multiItemBag.addItem(BigDecimal.TEN);
        multiItemBag.addItem(BigDecimal.ONE);
        multiItemBag.addItem(BigDecimal.ZERO);
        BigDecimal item0 = multiItemBag.getItem(0);
        BigDecimal item1 = multiItemBag.getItem(1);
        BigDecimal item2 = multiItemBag.getItem(2);
        System.out.format("MultiItemBagFix1: %2s, %2s, %2s%n", item0, item1, item2);
    }

    private void checkBagFix2() {
        MultiItemBagFix2<BigDecimal> multiItemBag = new MultiItemBagFix2<>();
        multiItemBag.addItem(BigDecimal.TEN);
        multiItemBag.addItem(BigDecimal.ONE);
        multiItemBag.addItem(BigDecimal.ZERO);
        BigDecimal item0 = multiItemBag.getItem(0);
        BigDecimal item1 = multiItemBag.getItem(1);
        BigDecimal item2 = multiItemBag.getItem(2);
        System.out.format("MultiItemBagFix2: %2s, %2s, %2s%n", item0, item1, item2);
    }

    private void checkBagFix3() {
        MultiItemBagFix3<BigDecimal> multiItemBag = new MultiItemBagFix3<>();
        multiItemBag.addItem(BigDecimal.TEN);
        multiItemBag.addItem(BigDecimal.ONE);
        multiItemBag.addItem(BigDecimal.ZERO);
        BigDecimal item0 = multiItemBag.getItem(0);
        BigDecimal item1 = multiItemBag.getItem(1);
        BigDecimal item2 = multiItemBag.getItem(2);
        System.out.format("MultiItemBagFix3: %2s, %2s, %2s%n", item0, item1, item2);
    }

    private void checkBagFix4() {
        MultiItemBagFix4<BigDecimal> multiItemBag = new MultiItemBagFix4<>(BigDecimal.class);
        multiItemBag.addItem(BigDecimal.TEN);
        multiItemBag.addItem(BigDecimal.ONE);
        multiItemBag.addItem(BigDecimal.ZERO);
        BigDecimal item0 = multiItemBag.getItem(0);
        BigDecimal item1 = multiItemBag.getItem(1);
        BigDecimal item2 = multiItemBag.getItem(2);
        System.out.format("MultiItemBagFix4: %2s, %2s, %2s%n", item0, item1, item2);
    }
}

Summary

First and foremost, don’t use generic arrays. Use Java’s Collection API instead, since chances are pretty low that it doesn’t contain at least the basic data structure of what you need (Fix 1). If you really cannot get around creating generic arrays, prefer Fix 2.

This blog post could have been about an academic problem, however, most programmers probably have been confronted with it at least once. This blog post showed the theoretical background of the problem: the fundamental difference of arrays versus generic collections with respect to variance, erasure, runtime type and compile time type safety. It again stresses the important fact that in order to really understand Java, one needs to have a thorough and detailed look at and understanding of its background.

Shortlink to this blog post: link.simplexacode.ch/rk572019.02

Leave a Reply