A New and Consistent Way of Implementing equals and hashCode | Part 1


Introduction

My first two blog posts on this website have been a two-part article about the evolution of Java’s compareTo and Comparator#compare methods. Implementing order for comparable objects is important, but not omnipresent. However, it is vital that all value classes, i.e., classes whose focus is more on data than on behavior, provide both a correct definition and implementation of equality as well as an effective and efficient calculation of their hash code.

Since every class is a subclass of Object, and Object contains both the methods equals(Object) and hashCode(), every class you write can implement equality checks and hash code calculations by simply overriding these two methods. equals and hashCode always go hand in hand, so it rarely makes sense to implement just one of the two.

This blog post does not cover the reasons why one definitely should implement equals and hashCode, nor does it discuss the “contracts” that need to be fulfilled when implementing them. These topics would be beyond the scope of today’s blog post, but in case you’re interested in them, please let me know, and perhaps one day I’ll write about it.

The Old Way up to Java 6

Similar to my first blog post about comparable objects, I want to demonstrate the implementation of the two methods by using an artificial class that contains fields of all kinds of data types. There is, e.g., one field of type short, one field of type double, one field of an arbitrary object type (in this case of type OtherClass), and fields of both a one- and a two-dimensional array. The following class shows these fields as well as a correct implementation of equals and hashCode that was common up to and including Java 6:

import java.util.*;

public final class Java6DemoClass {

    private boolean booleanField;

    private byte byteField;

    private char charField;

    private short shortField;

    private int intField;

    private long longField;

    private float floatField;

    private double doubleField;

    private OtherClass classField;

    private int[] oneDimensionalArrayField;

    private int[][] twoDimensionalArrayField;

    /* [...] Constructors omitted. */

    @Override
    public boolean equals(Object otherObject) {
        if (this == otherObject) {
            return true;
        }

        if (!(otherObject instanceof Java6DemoClass)) {
            return false;
        }

        Java6DemoClass otherDemoClass = (Java6DemoClass) otherObject;

        return (/* booleanField */
                (this.booleanField == otherDemoClass.booleanField)

                /* byteField */
                && (this.byteField == otherDemoClass.byteField)

                /* charField */
                && (this.charField == otherDemoClass.charField)

                /* shortField */
                && (this.shortField == otherDemoClass.shortField)

                /* intField */
                && (this.intField == otherDemoClass.intField)

                /* longField */
                && (this.longField == otherDemoClass.longField)

                /* floatField */
                && (Float.compare(this.floatField, otherDemoClass.floatField) == 0)

                /* floatField alternative */
                && (Float.floatToIntBits(this.floatField)
                    == Float.floatToIntBits(otherDemoClass.floatField))

                /* doubleField */
                && (Double.compare(this.doubleField, otherDemoClass.doubleField) == 0)

                /* doubleField alternative */
                && (Double.doubleToLongBits(this.doubleField)
                    == Double.doubleToLongBits(otherDemoClass.doubleField))

                /* classField */
                && ((this.classField == null)
                    ? (otherDemoClass.classField == null)
                    : this.classField.equals(otherDemoClass.classField))

                /* classField alternative */
                && ((this.classField == otherDemoClass.classField)
                    || ((this.classField != null)
                        && (this.classField.equals(otherDemoClass.classField))))

                /* oneDimensionalArrayField */
                && Arrays.equals(this.oneDimensionalArrayField,
                                 otherDemoClass.oneDimensionalArrayField)

                /* twoDimensionalArrayField */
                && Arrays.deepEquals(this.twoDimensionalArrayField,
                                     otherDemoClass.twoDimensionalArrayField));
    }

    @Override
    public int hashCode() {
        long temp;
        int result = 17;

        /* booleanField */
        result = 37 * result + (booleanField ? 0 : 1);

        /* byteField */
        result = 37 * result + (int) byteField;

        /* charField */
        result = 37 * result + (int) charField;

        /* shortField */
        result = 37 * result + (int) shortField;

        /* intField */
        result = 37 * result + intField;

        /* longField */
        result = 37 * result + (int) (longField ^ (longField >>> 32));

        /* floatField */
        result = 37 * result + Float.floatToIntBits(floatField);

        /* doubleField */
        temp = Double.doubleToLongBits(doubleField);
        result = 37 * result + (int) (temp ^ (temp >>> 32));

        /* classField */
        result = 37 * result + ((classField == null) ? 0 : classField.hashCode());

        /* oneDimensionalArrayField */
        result = 37 * result + Arrays.hashCode(oneDimensionalArrayField);

        /* twoDimensionalArrayField */
        result = 37 * result + Arrays.deepHashCode(twoDimensionalArrayField);

        return result;
    }

    /* [...] Other methods omitted. */
}

public final class OtherClass {

    /* [...] Body omitted. */
}

Both the equals and the hashCode method in the example use all fields. As I already mentioned in my article about comparable objects, this is not a requirement. It is solely up to you which fields should be taken into account for both equality checks and hash code calculations. As a rule of thumb, one usually skips “derived” fields, i.e., fields that are just a calculation of other, “primary” fields, and could easily be reproduced. Also, hashCode usually incorporates the same fields for its calculation as the equals method does for its equality check.
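The rule of thumb about derived fields can be illustrated with a small sketch. The class Rectangle and its fields are hypothetical and not part of the demo class above. The cached area is derived from width and height, so it is left out of both methods:

```java
/* Hypothetical value class: "area" is derived from the two primary fields. */
final class Rectangle {

    private final int width;

    private final int height;

    /* Derived field: can always be reproduced from width and height. */
    private final long area;

    Rectangle(int width, int height) {
        this.width = width;
        this.height = height;
        this.area = (long) width * height;
    }

    long area() {
        return area;
    }

    @Override
    public boolean equals(Object otherObject) {
        if (this == otherObject) {
            return true;
        }

        if (!(otherObject instanceof Rectangle)) {
            return false;
        }

        Rectangle otherRectangle = (Rectangle) otherObject;

        /* Only the primary fields take part in the equality check. */
        return ((this.width == otherRectangle.width)
                && (this.height == otherRectangle.height));
    }

    @Override
    public int hashCode() {
        /* Same fields as in equals, so equal objects get equal hash codes. */
        int result = 17;
        result = 37 * result + width;
        result = 37 * result + height;

        return result;
    }
}
```

Because area follows from the two primary fields, including it would add work without changing the outcome of either method.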

Both methods are rather long, and that’s exactly the problem. If things become tedious, then programmers don’t like them and thus try to avoid them. The length and the many special cases also carry increased risk of programming errors. Thus one goal of this blog post is to present a concise version of these two methods at the end.

Let’s have a look at some implementation details of the equals method above:

  • The equals method starts with three steps: 1. identity check, 2. type check, and 3. type cast. These steps are necessary to fulfill the equals contract. As already mentioned in the introduction, discussing this contract is beyond the scope of this blog post. Instead, we’ll have a look at the fourth main step, which is the actual field comparisons.
  • Comparing fields with the primitive types boolean, byte, char, short, int, and long is straightforward. Simply do the equality check using “==”.
  • I present two options for each of the field types float and double. The first option forwards the equality check to Float’s and Double’s static compare method, respectively. A return value of 0 implies equality.

    Why is that? The reason is that checking floating-point numbers isn’t as trivial as checking integral numbers. The former can also contain special values like positive and negative infinity, Not-a-Number (NaN), as well as positive (+0) and negative zeros (-0). Two “infinities” (either both positive or both negative) are equal using “==” (which is correct). +0 and -0 are equal using “==”, too (but should not be), and two NaNs are not (but should be).

    I once wrote a blog post about using JDK’s source code to clarify implementation specifics, and that’s exactly what I have done (and you could do, too) in order to find out the “magic” behind Float.compare and Double.compare, respectively. After ruling out the ordinary “less than” and “greater than” cases, the comparison examines the bit patterns returned by Float.floatToIntBits and Double.doubleToLongBits. This is actually too low-level for an ordinary Java programmer. Since these two methods collapse all NaN values into one canonical bit pattern, NaNs are regarded as equal. Since the two different zeros +0 and -0 have different bit patterns, they’re regarded as unequal.

    The code example above also presents an alternative for both the float and double data type. There the bit patterns are unveiled and “==”-compared directly. Which alternative is nicer is a matter of style. I personally prefer forwarding the equality check to the proven static compare method of the corresponding wrapper class.

  • Comparing two arbitrary classes requires calling their equals methods recursively. However—and again this is similar to what I wrote in my article about comparable objects—the class field of the own class could be null, the class field of the other class could be null, or both could be null. If exactly one field is null, then they’re regarded as unequal. If both fields are null, they’re regarded as equal. Otherwise, if both fields are not null, equals is called on one of them. It’s the null checks that make the code so bloated.

    An alternative presented in the code example above works slightly differently. It’s preferred when you expect positive identity checks, because if the two field values are found to be identical (by reference), the comparison immediately yields true without calling the field’s equals method, which might be costly.

  • Comparisons of array fields are best done by calling the static utility methods Arrays.equals or Arrays.deepEquals. When two arrays int[] array1 and int[] array2 are being compared using their “own” equals methods, i.e., array1.equals(array2) or array2.equals(array1), the default implementation of equals is used, which is checking for reference identity. This is usually not what the programmer wants. One usually wants to go through all array elements and check each of the elements whether they’re equal or not. Only if all elements are “the same” can the two arrays be regarded as equal.

    You see that I put “the same” in quotes, because its meaning isn’t clear. If an array contains other arrays as elements, it’s considered (at least) a two-dimensional array. What should be checked when iterating through all elements of the enclosing array (i.e., the array in the first dimension)? Usually it wouldn’t be enough if the subarrays (i.e., the arrays in the second dimension) were just being compared using their own implementation of equals. Instead, one expects to have each element of those subarrays be checked, too. The same applies to arrays that have three or even more dimensions.

    Arrays.equals does a “shallow” equality search inside multi-dimensional arrays. Arrays.equals is fine for one-dimensional arrays, but usually doesn’t produce the expected results for multi-dimensional arrays, for the reason just given above. Instead, use Arrays.deepEquals if you want to have the elements of higher-dimension arrays be checked, too. When you have a close look at the code example above, you see that Arrays.equals is used for comparing the oneDimensionalArrayFields, whereas Arrays.deepEquals is used for the twoDimensionalArrayFields.
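The floating-point special cases from the list above can be checked with a short sketch. The class and method names are made up for illustration. Only “==”, Double.compare, and Double.doubleToLongBits are taken from the text:

```java
final class FloatingPointEqualityDemo {

    /* Equality as the primitive operator "==" sees it. */
    static boolean equalByOperator(double first, double second) {
        return first == second;
    }

    /* Equality as used in the equals method above. */
    static boolean equalByCompare(double first, double second) {
        return Double.compare(first, second) == 0;
    }

    /* The alternative: comparing the bit patterns directly. */
    static boolean equalByBits(double first, double second) {
        return Double.doubleToLongBits(first) == Double.doubleToLongBits(second);
    }

    public static void main(String[] args) {
        double[][] pairs = {
            {Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY},
            {0.0, -0.0},
            {Double.NaN, Double.NaN}
        };

        for (double[] pair : pairs) {
            System.out.println(pair[0] + " vs. " + pair[1]
                               + " | ==: " + equalByOperator(pair[0], pair[1])
                               + " | compare: " + equalByCompare(pair[0], pair[1])
                               + " | bits: " + equalByBits(pair[0], pair[1]));
        }
    }
}
```

The two infinities are equal under all three checks. The two zeros are equal only under “==”, and the two NaNs only under the compare-based and the bit-based check.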

Let me also give you some comments on the hashCode method:

  • Remember that the general idea of hash codes is to generate “random-looking” numbers, even though of course they aren’t random at all. All parts of the hashCode method in the code example above thus pursue the target of incorporating each field value as much as possible.
  • Adding hash codes of fields with primitive data types byte, char, short, and int is straightforward.
  • The boolean field adds either 0 or 1, depending on its value true or false, respectively.
  • The long field consists of 64 bits that need to be incorporated into the 32-bit hash code. By shifting the long 32 bit positions to the right, one gets access to the “left half” of the bit pattern. This left half is then combined with the right half of the bit pattern using the bitwise exclusive OR operator “^”.
  • Including the 32-bit float and the 64-bit double fields requires exposing their bit patterns. While float’s bit pattern can simply be added to the hash code, double’s 64-bit pattern needs to be merged as described for the long data type above.
  • Adding the hash codes of arbitrary classes of course involves calling their respective hashCode methods. However, if non-nullity cannot be guaranteed, the call always needs to be surrounded by explicit null checks. This makes the code look a little bit bloated, though not as much as inside the equals method.
  • Remember that dealing with arrays requires special care inside the equals method, as explained in detail above. When it comes to arrays inside the hashCode method, the same ideas apply. Using the utility method Arrays.hashCode is fine for one-dimensional arrays. Multi-dimensional arrays probably need Arrays.deepHashCode in order to get the results one expects. Note that the methods chosen a) need to be consistent with the methods chosen for equals and b) can harm performance significantly, especially for large or multi-dimensional arrays. So better think twice whether your arrays really need to be part of your definition of equality, and thus also be part of your hash code computation.
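The remarks on arrays and on the long field can be tried out with the following sketch. The class name and the helper foldLong are assumptions made for this example. The formula inside foldLong is the one used for longField above:

```java
import java.util.Arrays;

final class ArrayAndLongHashDemo {

    /* Folds the 64 bits of a long into 32 bits, as done for longField above. */
    static int foldLong(long value) {
        return (int) (value ^ (value >>> 32));
    }

    public static void main(String[] args) {
        int[][] first = {{1, 2}, {3, 4}};
        int[][] second = {{1, 2}, {3, 4}};

        /* Default equals: reference identity only. */
        System.out.println(first.equals(second));                 /* false */

        /* Shallow: the subarrays themselves are compared by reference. */
        System.out.println(Arrays.equals(first, second));         /* false */

        /* Deep: the elements of the subarrays are compared. */
        System.out.println(Arrays.deepEquals(first, second));     /* true */

        /* The deep hash code is consistent with the deep equality check. */
        System.out.println(Arrays.deepHashCode(first)
                           == Arrays.deepHashCode(second));       /* true */

        /* Only the "left half" is set, yet it still reaches the result. */
        System.out.println(foldLong(1L << 32));                   /* 1 */
    }
}
```

A plain cast to int would turn 1L << 32 into 0. The fold keeps the upper 32 bits in play.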

New Options in Java 7 | Not Recommended

Java 7 introduced the utility class Objects (note the ‘s’!). Its methods for equality checks and hash code computations look promising, even though they have some significant drawbacks. Let’s have a closer look at the demo class, this time adapted to Java 7:

import java.util.*;

public final class Java7DemoClass {

    private boolean booleanField;

    private byte byteField;

    private char charField;

    private short shortField;

    private int intField;

    private long longField;

    private float floatField;

    private double doubleField;

    private OtherClass classField;

    private int[] oneDimensionalArrayField;

    private int[][] twoDimensionalArrayField;

    /* [...] Constructors omitted. */

    @Override
    public boolean equals(Object otherObject) {
        if (this == otherObject) {
            return true;
        }

        if (!(otherObject instanceof Java7DemoClass)) {
            return false;
        }

        Java7DemoClass otherDemoClass = (Java7DemoClass) otherObject;

        return (/* booleanField */
                Objects.equals(Boolean.valueOf(this.booleanField),
                               Boolean.valueOf(otherDemoClass.booleanField))

                /* byteField */
                && Objects.equals(Byte.valueOf(this.byteField),
                                  Byte.valueOf(otherDemoClass.byteField))

                /* charField */
                && Objects.equals(Character.valueOf(this.charField),
                                  Character.valueOf(otherDemoClass.charField))

                /* shortField */
                && Objects.equals(Short.valueOf(this.shortField),
                                  Short.valueOf(otherDemoClass.shortField))

                /* intField */
                && Objects.equals(Integer.valueOf(this.intField),
                                  Integer.valueOf(otherDemoClass.intField))

                /* longField */
                && Objects.equals(Long.valueOf(this.longField),
                                  Long.valueOf(otherDemoClass.longField))

                /* floatField */
                && Objects.equals(Float.valueOf(this.floatField),
                                  Float.valueOf(otherDemoClass.floatField))

                /* doubleField */
                && Objects.equals(Double.valueOf(this.doubleField),
                                  Double.valueOf(otherDemoClass.doubleField))

                /* classField */
                && Objects.equals(this.classField, otherDemoClass.classField)

                /* oneDimensionalArrayField */
                && Arrays.equals(this.oneDimensionalArrayField,
                                 otherDemoClass.oneDimensionalArrayField)

                /* twoDimensionalArrayField */
                && Arrays.deepEquals(this.twoDimensionalArrayField,
                                     otherDemoClass.twoDimensionalArrayField));
    }

    @Override
    public int hashCode() {
        return Objects.hash(Boolean.valueOf(booleanField),
                            Byte.valueOf(byteField),
                            Character.valueOf(charField),
                            Short.valueOf(shortField),
                            Integer.valueOf(intField),
                            Long.valueOf(longField),
                            Float.valueOf(floatField),
                            Double.valueOf(doubleField),
                            classField);  /* Array fields not possible! */
    }

    /* [...] Other methods omitted. */
}

A great thing seems to be the static method Objects.equals that theoretically deals with all non-array data types and relieves the programmer from the tedious task of writing different code for each different data type. When you look at the equals method above, its usage is also very consistent and definitely “looks nice”.

A main advantage of Objects.equals is that it properly deals with null references. There is no risk of any NullPointerException, nor is there any need for complicated boilerplate code that goes through all options by checking which object—if any—might be null.
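As a minimal sketch of this null handling, the following class compares Objects.equals with the hand-written null checks from the Java 6 demo class. The class name and the helper equalsByHand are hypothetical:

```java
import java.util.Objects;

final class NullSafeEqualsDemo {

    /* Hand-written null handling, as used for classField in the Java 6 version. */
    static boolean equalsByHand(Object first, Object second) {
        return (first == null) ? (second == null) : first.equals(second);
    }

    public static void main(String[] args) {
        String text = "text";
        String sameText = new String("text");

        System.out.println(Objects.equals(null, null));        /* true */
        System.out.println(Objects.equals(null, text));        /* false */
        System.out.println(Objects.equals(text, null));        /* false */
        System.out.println(Objects.equals(text, sameText));    /* true */

        /* The hand-written version agrees in all four cases. */
        System.out.println(equalsByHand(null, null)
                           && !equalsByHand(null, text)
                           && !equalsByHand(text, null)
                           && equalsByHand(text, sameText));   /* true */
    }
}
```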

When you look closer, however, you also see that I have explicitly wrapped all primitive types. The reason is that the mentioned Objects.equals method only accepts Objects (or its subtypes, of course). Primitive types need to be wrapped or boxed, either implicitly by autoboxing or explicitly by hand. I’ve already mentioned a few times in earlier blog posts that I let my compiler emit a warning whenever autoboxing or unboxing is performed, because excessive boxing is a typical source of severe performance problems. I prefer manual boxing or wrapping, not because it’s faster, but because then I’m at least aware of it.

Both the equals and the hashCode methods need to run fast. Their main use is inside collections, where they may get called thousands or millions of times. I personally claim that using wrapper classes is unacceptable in such a case, even though I haven’t measured the performance differences yet. A detailed investigation of this topic will be done in the blog post Part 2.

Another promising thing is the static method Objects.hash(Object...) that takes a vararg and thus allows an arbitrary number of parameters to be added. But we are confronted with the same problem as mentioned above: The hash method only accepts Objects, so again all primitives need to be wrapped, which might be a severe performance problem.

Objects.hash simply calls hashCode on each of its parameters, which is fine for all data types except for arrays. The default implementation of an array’s hash code does not take its elements into account. One needs to call Arrays.[deep]HashCode in order to do so. For this reason, Objects.hash doesn’t support arrays the way one expects it.
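The following sketch shows the array problem in isolation. The class and method names are made up for this example. One possible workaround, assumed here and not taken from the demo class, is to pass the result of Arrays.hashCode instead of the array itself:

```java
import java.util.Arrays;
import java.util.Objects;

final class ObjectsHashArrayDemo {

    /* Objects.hash calls values.hashCode(), which ignores the elements. */
    static int hashIgnoringElements(String name, int[] values) {
        return Objects.hash(name, values);
    }

    /* The elements are hashed by Arrays.hashCode before Objects.hash sees them. */
    static int hashIncludingElements(String name, int[] values) {
        return Objects.hash(name, Integer.valueOf(Arrays.hashCode(values)));
    }

    public static void main(String[] args) {
        int[] first = {1, 2, 3};
        int[] second = {1, 2, 3};

        /* Usually prints false, although both arrays hold the same elements. */
        System.out.println(hashIgnoringElements("demo", first)
                           == hashIgnoringElements("demo", second));

        /* Prints true, because only the elements matter. */
        System.out.println(hashIncludingElements("demo", first)
                           == hashIncludingElements("demo", second));
    }
}
```

Note that the workaround still boxes the intermediate int, so it shares the performance concern raised for Objects.hash above.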

Since vararg is just a fancy notation for an array that is generated behind-the-scenes, one could create such a “parameter array” explicitly and then hand it over to Arrays.deepHashCode, as shown here:

    @Override
    public int hashCode() {
        return Arrays.deepHashCode(new Object[] {
                                       Boolean.valueOf(booleanField),
                                       Byte.valueOf(byteField),
                                       Character.valueOf(charField),
                                       Short.valueOf(shortField),
                                       Integer.valueOf(intField),
                                       Long.valueOf(longField),
                                       Float.valueOf(floatField),
                                       Double.valueOf(doubleField),
                                       classField,
                                       oneDimensionalArrayField,
                                       twoDimensionalArrayField
                                   });
    }

The advantage is that it properly deals with both one- and multi-dimensional arrays. The disadvantage is that we still couldn’t get rid of the need of boxing or wrapping. Further investigation (which is still to come in Part 2 of this topic) is necessary to find out more about the actual performance penalty.
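Reduced to three hypothetical parameters, the “parameter array” idea can be sketched as follows. The class name and the method hashOf are assumptions made for illustration:

```java
import java.util.Arrays;

final class ParameterArrayHashDemo {

    /* Builds the parameter array by hand and lets deepHashCode descend into it. */
    static int hashOf(int intValue, int[] oneDimensional, int[][] twoDimensional) {
        return Arrays.deepHashCode(new Object[] {
                                       Integer.valueOf(intValue),
                                       oneDimensional,
                                       twoDimensional
                                   });
    }

    public static void main(String[] args) {
        int firstHash = hashOf(7, new int[] {1, 2}, new int[][] {{3}, {4}});
        int secondHash = hashOf(7, new int[] {1, 2}, new int[][] {{3}, {4}});

        /* Different array objects, same elements, same hash code. */
        System.out.println(firstHash == secondHash);    /* true */

        /* null elements are tolerated and contribute 0. */
        System.out.println(hashOf(7, null, null) == hashOf(7, null, null));    /* true */
    }
}
```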

To sum it up, according to my gut feeling, the Java 7 methods don’t provide a real alternative to the hand-written implementations of equals and hashCode, in particular because of their need for boxing or wrapping. However, I might be proven wrong, which is why I’m also curious about the actual penalty the wrapper classes do or don’t introduce. Stay tuned for Part 2.

New Methods in Java 8 | Combining the Best of Each Version

Java 8 equipped all wrapper classes with static hashCode methods that compute the same hash codes as the corresponding instance methods, but do not require boxing or wrapping. There aren’t any new methods that could be of help for an equals implementation.
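To see how these static methods relate to the manual formulas of the Java 6 version, consider this small sketch. The class name and the helper matchesManualFormulas are hypothetical:

```java
final class StaticHashCodeDemo {

    /* Returns true if the static methods agree with the manual Java 6 formulas. */
    static boolean matchesManualFormulas(long longValue, float floatValue,
                                         double doubleValue) {
        long doubleBits = Double.doubleToLongBits(doubleValue);

        return ((Long.hashCode(longValue)
                 == (int) (longValue ^ (longValue >>> 32)))
                && (Float.hashCode(floatValue)
                    == Float.floatToIntBits(floatValue))
                && (Double.hashCode(doubleValue)
                    == (int) (doubleBits ^ (doubleBits >>> 32))));
    }

    public static void main(String[] args) {
        System.out.println(matchesManualFormulas(123456789012345L, 1.5f, 3.14));
                                                         /* true */
        System.out.println(Boolean.hashCode(true));      /* 1231 */
        System.out.println(Boolean.hashCode(false));     /* 1237 */
        System.out.println(Character.hashCode('A'));     /* 65 */
        System.out.println(Byte.hashCode((byte) -1));    /* -1 */
    }
}
```

Boolean.hashCode is the only one that differs from the manual version, which used 0 and 1. That has no effect on correctness, as long as a class sticks to one of the two variants.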

The code example presented below provides a “best of” the options explained in this blog post. No matter if a code line stems from an old or a new Java version, what matters most is a) its ease of use, b) correct results, and c) no performance bottlenecks. Please see for yourself:

import java.util.*;

public final class Java8DemoClass {

    private boolean booleanField;

    private byte byteField;

    private char charField;

    private short shortField;

    private int intField;

    private long longField;

    private float floatField;

    private double doubleField;

    private OtherClass classField;

    private int[] oneDimensionalArrayField;

    private int[][] twoDimensionalArrayField;

    /* [...] Constructors omitted. */

    @Override
    public boolean equals(Object otherObject) {
        if (this == otherObject) {
            return true;
        }

        if (!(otherObject instanceof Java8DemoClass)) {
            return false;
        }

        Java8DemoClass otherDemoClass = (Java8DemoClass) otherObject;

        return (/* booleanField */
                (this.booleanField == otherDemoClass.booleanField)

                /* byteField */
                && (this.byteField == otherDemoClass.byteField)

                /* charField */
                && (this.charField == otherDemoClass.charField)

                /* shortField */
                && (this.shortField == otherDemoClass.shortField)

                /* intField */
                && (this.intField == otherDemoClass.intField)

                /* longField */
                && (this.longField == otherDemoClass.longField)

                /* floatField */
                && (Float.compare(this.floatField, otherDemoClass.floatField) == 0)

                /* doubleField */
                && (Double.compare(this.doubleField, otherDemoClass.doubleField) == 0)

                /* classField */
                && Objects.equals(this.classField, otherDemoClass.classField)

                /* oneDimensionalArrayField */
                && Arrays.equals(this.oneDimensionalArrayField,
                                 otherDemoClass.oneDimensionalArrayField)

                /* twoDimensionalArrayField */
                && Arrays.deepEquals(this.twoDimensionalArrayField,
                                     otherDemoClass.twoDimensionalArrayField));
    }

    @Override
    public int hashCode() {
        int result = 17;
        result = 37 * result + Boolean.hashCode(booleanField);
        result = 37 * result + Byte.hashCode(byteField);
        result = 37 * result + Character.hashCode(charField);
        result = 37 * result + Short.hashCode(shortField);
        result = 37 * result + Integer.hashCode(intField);
        result = 37 * result + Long.hashCode(longField);
        result = 37 * result + Float.hashCode(floatField);
        result = 37 * result + Double.hashCode(doubleField);
        result = 37 * result + Objects.hashCode(classField);
        result = 37 * result + Arrays.hashCode(oneDimensionalArrayField);
        result = 37 * result + Arrays.deepHashCode(twoDimensionalArrayField);

        return result;
    }

    /* [...] Other methods omitted. */
}

This demo class can serve as a template for your own implementations of equals and hashCode. It looks pretty concise and consistent and definitely should be self-explanatory.

Just one word on the hashCode computation of classField. It uses the static utility method Objects.hashCode(Object) (not Objects.hash(Object...)!), which hasn’t been explained yet. It simply calls the hashCode method on the provided object, but first makes sure that the object isn’t null. In case it is null, it simply returns 0. This method can thus be seen as the counterpart to Objects.equals(Object, Object), whose main purpose is the elimination of surrounding null checks.
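The difference between the two similarly named methods is easy to overlook, so here is a short sketch. The class name and the helper hashByHand are made up for this example:

```java
import java.util.Objects;

final class ObjectsHashCodeDemo {

    /* Hand-written equivalent of Objects.hashCode(Object). */
    static int hashByHand(Object object) {
        return (object == null) ? 0 : object.hashCode();
    }

    public static void main(String[] args) {
        Object missing = null;
        String text = "abc";

        System.out.println(Objects.hashCode(missing));                     /* 0 */
        System.out.println(Objects.hashCode(text) == text.hashCode());     /* true */
        System.out.println(Objects.hashCode(text) == hashByHand(text));    /* true */

        /* Objects.hash treats its single argument as a one-element array. */
        System.out.println(Objects.hash(missing));                         /* 31 */
        System.out.println(Objects.hash(text) == 31 + text.hashCode());    /* true */
    }
}
```

Both methods are valid building blocks, but they return different values for the same object, so they shouldn’t be mixed up.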

A Word About Google AutoValue

I often hear and read about Google AutoValue, which uses annotations that instruct an automatic source generator to write the corresponding equals, hashCode, and toString methods. I personally am not a fan of such tools, even though I have to admit that I haven’t tested it yet.

Defining and implementing equality logic and hash codes is one of the most fundamental things a Java programmer needs to deal with. It is, in my opinion, too important to be handed over to an automatic tool. Yes, writing these methods might be mundane. Nevertheless, their implementation is important and requires one’s full attention. The functionality of one’s classes in combination with container classes (collections) is mainly the result of their definition of equality. This definition is not trivial. People who claim that “in Java it’s always the same” probably haven’t understood the idea and importance of those methods yet.

If such source generators are the only reason why one includes such a library, think twice. Each additional library has an effect on the application’s license (Google AutoValue’s Apache License might be one of the most liberal, but one still needs to mention and deploy it together with the application). Each additional library needs to be taken care of in the build setup (especially since Java isn’t known for its simple build processes). Each additional library must be updated and version-tracked from time to time. And most important, each additional library must be studied and adds considerably to the complexity of the (already complex) Java application.

The time one spends seriously coming to grips with (or worse, “playing around” with) a tool like Google AutoValue is better invested in learning how to properly write these methods yourself. In this blog post, I tried my very best to give the simplest, yet fully working demonstration of both methods. It allows you to adapt the code to your specific needs. Did you know that, e.g., Google AutoValue doesn’t allow you to adjust the order of the comparisons or hash code computations? Manually, you can move the “cheap” computations, or the computations on fields that you think will differ more often than others, to the top. A fully automated tool like Google AutoValue can’t.
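What such a manual ordering might look like is sketched below. The class Measurement and its fields are hypothetical. The int comparison is placed first so that the short-circuiting “&&” skips the array comparison whenever the ids already differ:

```java
import java.util.Arrays;

/* Hypothetical value class with one cheap and one expensive field. */
final class Measurement {

    private final int id;

    private final double[] samples;

    Measurement(int id, double[] samples) {
        this.id = id;
        this.samples = samples;
    }

    @Override
    public boolean equals(Object otherObject) {
        if (this == otherObject) {
            return true;
        }

        if (!(otherObject instanceof Measurement)) {
            return false;
        }

        Measurement otherMeasurement = (Measurement) otherObject;

        return (/* id: cheap and likely to differ, so it goes first */
                (this.id == otherMeasurement.id)

                /* samples: only reached if the ids are equal */
                && Arrays.equals(this.samples, otherMeasurement.samples));
    }

    @Override
    public int hashCode() {
        int result = 17;
        result = 37 * result + Integer.hashCode(id);
        result = 37 * result + Arrays.hashCode(samples);

        return result;
    }
}
```

The order has no influence on the result of equals. It only affects how much work is done before an inequality is detected.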

Summary and Outlook

In this blog post, Part 1, I presented different options for implementing correct and efficient equals and hashCode methods, depending on the options previous Java versions have introduced. Currently, I personally use and also encourage you to implement these methods based on the template I’ve provided at the end. It is concise, consistent, efficient, and self-explanatory. Who knows what future Java versions will bring in order to simplify the implementation of these methods even more?

There is an open question about the actual performance of the methods that might look elegant in code, but involve varargs and parameter wrapping. Part 2 of this topic will thus present a more realistic usage example and have a detailed look at the different options and their runtimes.

Shortlink to this blog post: link.simplexacode.ch/zxxx2019.01
