Introduction
My first two blog posts on this website have been a two-part article about the evolution of Java’s compareTo
and Comparator#compare
methods. Implementing order for comparable objects is important, but not omnipresent. However, it is vital that all value classes, i.e., classes whose focus is more on data than on behavior, provide both a correct definition and implementation of equality as well as an effective and efficient calculation of their hash code.
Since every class is a subclass of Object
, and Object
contains both the methods equals(Object)
and hashCode()
, every class you write can implement equality checks and hash code calculations by simply overriding these two methods. equals
and hashCode
always go hand in hand, so it rarely makes sense to implement just one of these two.
This blog post does not cover the reasons why one definitely should implement equals
and hashCode
, nor does it discuss the “contracts” that need to be fulfilled when implementing them. These topics would be beyond the scope of today’s blog post, but in case you’re interested in them, please let me know, and perhaps one day I’ll write about them.
The Old Way up to Java 6
Similar to my first blog post about comparable objects, I want to demonstrate the implementation of the two methods by using an artificial class that contains fields of all primitive data types, an object type, and arrays. There is, e.g., one field of type short
, one field of type double
, one field of an arbitrary object type (in this case of type OtherClass
), and fields of both a one- and a two-dimensional array. The following class shows these fields as well as a correct implementation of equals
and hashCode
that was common up to and including Java 6:
import java.util.*;

public final class Java6DemoClass {

    private boolean booleanField;
    private byte byteField;
    private char charField;
    private short shortField;
    private int intField;
    private long longField;
    private float floatField;
    private double doubleField;
    private OtherClass classField;
    private int[] oneDimensionalArrayField;
    private int[][] twoDimensionalArrayField;

    /* [...] Constructors omitted. */

    @Override
    public boolean equals(Object otherObject) {
        if (this == otherObject) {
            return true;
        }
        if (!(otherObject instanceof Java6DemoClass)) {
            return false;
        }
        Java6DemoClass otherDemoClass = (Java6DemoClass) otherObject;
        return (/* booleanField */
                (this.booleanField == otherDemoClass.booleanField)
                /* byteField */
                && (this.byteField == otherDemoClass.byteField)
                /* charField */
                && (this.charField == otherDemoClass.charField)
                /* shortField */
                && (this.shortField == otherDemoClass.shortField)
                /* intField */
                && (this.intField == otherDemoClass.intField)
                /* longField */
                && (this.longField == otherDemoClass.longField)
                /* floatField */
                && (Float.compare(this.floatField, otherDemoClass.floatField) == 0)
                /* floatField alternative */
                && (Float.floatToIntBits(this.floatField)
                        == Float.floatToIntBits(otherDemoClass.floatField))
                /* doubleField */
                && (Double.compare(this.doubleField, otherDemoClass.doubleField) == 0)
                /* doubleField alternative */
                && (Double.doubleToLongBits(this.doubleField)
                        == Double.doubleToLongBits(otherDemoClass.doubleField))
                /* classField */
                && ((this.classField == null)
                        ? (otherDemoClass.classField == null)
                        : this.classField.equals(otherDemoClass.classField))
                /* classField alternative */
                && ((this.classField == otherDemoClass.classField)
                        || ((this.classField != null)
                                && (this.classField.equals(otherDemoClass.classField))))
                /* oneDimensionalArrayField */
                && Arrays.equals(this.oneDimensionalArrayField,
                        otherDemoClass.oneDimensionalArrayField)
                /* twoDimensionalArrayField */
                && Arrays.deepEquals(this.twoDimensionalArrayField,
                        otherDemoClass.twoDimensionalArrayField));
    }

    @Override
    public int hashCode() {
        long temp;
        int result = 17;
        /* booleanField */
        result = 37 * result + (booleanField ? 0 : 1);
        /* byteField */
        result = 37 * result + (int) byteField;
        /* charField */
        result = 37 * result + (int) charField;
        /* shortField */
        result = 37 * result + (int) shortField;
        /* intField */
        result = 37 * result + intField;
        /* longField */
        result = 37 * result + (int) (longField ^ (longField >>> 32));
        /* floatField */
        result = 37 * result + Float.floatToIntBits(floatField);
        /* doubleField */
        temp = Double.doubleToLongBits(doubleField);
        result = 37 * result + (int) (temp ^ (temp >>> 32));
        /* classField */
        result = 37 * result + ((classField == null) ? 0 : classField.hashCode());
        /* oneDimensionalArrayField */
        result = 37 * result + Arrays.hashCode(oneDimensionalArrayField);
        /* twoDimensionalArrayField */
        result = 37 * result + Arrays.deepHashCode(twoDimensionalArrayField);
        return result;
    }

    /* [...] Other methods omitted. */
}

public final class OtherClass {
    /* [...] Body omitted. */
}
Both the equals
and the hashCode
method in the example use all fields. As I already mentioned in my article about comparable objects, this is not a requirement. It is solely up to you which fields should be taken into account for both equality checks and hash code calculations. As a rule of thumb, one usually skips “derived” fields, i.e., fields that are just a calculation of other, “primary” fields, and could easily be reproduced. Also, hashCode
usually incorporates the same fields for its calculation as the equals
method does for its equality check.
Both methods are rather long, and that’s exactly the problem. If things become tedious, then programmers don’t like them and thus try to avoid them. The length and the many special cases also carry increased risk of programming errors. Thus one goal of this blog post is to present a concise version of these two methods at the end.
Let’s have a look at some implementation details of the equals
method above:
- The equals method starts with three steps: 1. identity check, 2. type check, and 3. type cast. These steps are necessary to fulfill the equals contract. As already mentioned in the introduction, discussing this contract is beyond the scope of this blog post. Instead, we’ll have a look at the fourth main step, which is the actual field comparisons.

- Comparing fields with the primitive types boolean, byte, char, short, int, and long is straightforward. Simply do the equality check using “==”.

- I present two options for each of the field types float and double. The first option forwards the equality check to Float’s and Double’s static compare method, respectively. A return value of 0 implies equality.

  Why is that? The reason is that checking floating-point numbers isn’t as trivial as checking integral numbers. The former can also contain special values like positive and negative infinity, Not-a-Number (NaN), as well as positive (+0) and negative zeros (-0). Two “infinities” (either both positive or both negative) would still be equal using “==” (which is correct), +0 and -0 would be, too (but should not), and NaNs would not (but should).

  I once wrote a blog post about using JDK’s source code to clarify implementation specifics, and that’s exactly what I have done (and you could do, too) in order to find out the “magic” behind Float.compare and Double.compare, respectively. If neither value is smaller than the other, the comparison is done by examining the bit patterns returned by Float.floatToIntBits and Double.doubleToLongBits. This is actually too low-level for an ordinary Java programmer. Since these methods map every NaN to one canonical bit pattern, all NaNs are regarded as equal. Since the two different zeros +0 and -0 have different bit patterns, they’re regarded as unequal.

  The code example above also presents an alternative for both the float and double data type. There the bit patterns are unveiled and “==”-compared directly. Which alternative is nicer is a matter of style. I personally prefer forwarding the equality check to the proven static compare method of the corresponding wrapper class.

- Comparing fields of arbitrary class types requires calling their equals methods. However (and again this is similar to what I wrote in my article about comparable objects), the class field of the own object could be null, the class field of the other object could be null, or both could be null. If exactly one field is null, then they’re regarded as unequal. If both fields are null, they’re regarded as equal. Otherwise, if both fields are not null, equals is called on one of them. It’s the null checks that make the code so bloated.

  An alternative presented in the code example above works slightly differently. It’s preferred when you expect positive identity checks, because if the two field values are found to be identical (by reference), the comparison immediately yields true, without calling equals and looking at all the other data fields, which might be costly.

- Comparisons of array fields are best done by calling the static utility methods Arrays.equals or Arrays.deepEquals. When two arrays int[] array1 and int[] array2 are being compared using their “own” equals methods, i.e., array1.equals(array2) or array2.equals(array1), the default implementation of equals is used, which checks for reference identity. This is usually not what the programmer wants. One usually wants to go through all array elements and check each pair of elements for equality. Only if all elements are “the same” can the two arrays be regarded as equal.

  You see that I put “the same” in quotes, because its meaning isn’t clear. If an array contains other arrays as elements, it’s considered (at least) a two-dimensional array. What should be checked when iterating through all elements of the enclosing array (i.e., the array in the first dimension)? Usually it wouldn’t be enough if the subarrays (i.e., the arrays in the second dimension) were just being compared using their own implementation of equals. Instead, one expects to have each element of those subarrays be checked, too. The same applies to arrays that have three or even more dimensions.

  Arrays.equals does a “shallow” equality check inside multi-dimensional arrays. Arrays.equals is fine for one-dimensional arrays, but usually doesn’t produce the expected results for multi-dimensional arrays, for the reason just given above. Instead, use Arrays.deepEquals if you want to have the elements of higher-dimension arrays be checked, too. When you have a close look at the code example above, you see that Arrays.equals is used for comparing the oneDimensionalArrayFields, whereas Arrays.deepEquals is used for the twoDimensionalArrayFields.
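The special cases discussed in the list above are easy to verify in a few lines. The following is only a minimal sketch: the class name FieldComparisonDemo and its helper methods are made up for this illustration, while the JDK calls are the ones used in the demo class.

```java
import java.util.Arrays;

/* Hypothetical demo class; its name and helper methods are illustrative only. */
final class FieldComparisonDemo {

    /* Equality via the static compare method: NaN equals NaN, +0 differs from -0. */
    static boolean floatsEqual(float a, float b) {
        return Float.compare(a, b) == 0;
    }

    /* The "alternative" variant: compare the canonicalized bit patterns directly. */
    static boolean floatsEqualByBits(float a, float b) {
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }

    /* Null-safe comparison of two object references, written out by hand. */
    static boolean nullSafeEquals(Object a, Object b) {
        return (a == null) ? (b == null) : a.equals(b);
    }

    public static void main(String[] args) {
        float nan = Float.NaN;
        System.out.println(nan == nan);                       // false
        System.out.println(floatsEqual(nan, nan));            // true
        System.out.println(0.0f == -0.0f);                    // true
        System.out.println(floatsEqual(0.0f, -0.0f));         // false

        int[][] first = {{1, 2}, {3}};
        int[][] second = {{1, 2}, {3}};
        System.out.println(first.equals(second));             // false (reference identity)
        System.out.println(Arrays.equals(first, second));     // false (shallow check)
        System.out.println(Arrays.deepEquals(first, second)); // true  (element by element)
    }
}
```

Both float variants agree on the special values, which is why choosing between them is mostly a matter of style.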
Let me also give you some comments on the hashCode
method:
- Remember that the general idea of hash codes is to generate “random-looking” numbers, even though of course they aren’t random at all. All parts of the hashCode method in the code example above thus pursue the target of incorporating each field value as much as possible.

- Adding hash codes of fields with primitive data types byte, char, short, and int is straightforward.

- The boolean field adds either 0 or 1, depending on whether its value is true or false, respectively.

- The long field consists of 64 bits that need to be incorporated into the 32-bit hash code. By shifting the long 32 bit positions to the right, one gets access to the “left half” of the bit pattern. This left half is then combined with the right half of the bit pattern using the bitwise exclusive OR operator “^”.

- Including the 32-bit float and the 64-bit double fields requires exposing their bit patterns. While float’s bit pattern can simply be added to the hash code, double’s 64-bit pattern needs to be merged as described for the long data type above.

- Adding the hash codes of arbitrary classes of course involves calling their respective hashCode methods. However, if non-nullity cannot be guaranteed, the call always needs to be surrounded by explicit null checks. This makes the code look a little bit bloated, if not as much as inside the equals method.

- Remember that dealing with arrays requires special care inside the equals method, as explained in detail above. When it comes to arrays inside the hashCode method, the same ideas apply. Using the utility method Arrays.hashCode is fine for one-dimensional arrays. Multi-dimensional arrays probably need Arrays.deepHashCode in order to get the results one expects. Note that the methods chosen a) need to be consistent with the methods chosen for equals and b) can harm performance significantly, especially for large or multi-dimensional arrays. So better think twice whether your arrays really need to be part of your definition of equality, and thus also be part of your hash code computation.
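The folding of 64-bit values and the content-based array hash codes described in the list above can be sketched as follows. The class HashFoldingDemo and its method names are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;

/* Hypothetical demo class; names are illustrative only. */
final class HashFoldingDemo {

    /* Folds 64 bits into 32 by XOR-ing the upper half onto the lower half. */
    static int foldLong(long value) {
        return (int) (value ^ (value >>> 32));
    }

    /* The same folding, applied to the bit pattern of a double. */
    static int foldDouble(double value) {
        return foldLong(Double.doubleToLongBits(value));
    }

    public static void main(String[] args) {
        long value = 0x0000_0001_0000_0002L;
        System.out.println(foldLong(value)); // 3 (upper half 1 XOR lower half 2)

        int[][] first = {{1, 2}, {3}};
        int[][] second = {{1, 2}, {3}};
        // Content-based hash codes agree for arrays with equal contents.
        System.out.println(Arrays.hashCode(first[0]) == Arrays.hashCode(second[0]));   // true
        System.out.println(Arrays.deepHashCode(first) == Arrays.deepHashCode(second)); // true
    }
}
```

Without the XOR, casting a long directly to int would silently drop the upper 32 bits, so values that differ only in their upper half would always collide.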
New Options in Java 7 | Not Recommended
Java 7 introduced the utility class Objects
(note the ‘s’!). Its methods for equality checks and hash code computations look promising, even though they have some significant drawbacks. Let’s have a closer look at the demo class, this time adapted to Java 7:
import java.util.*;

public final class Java7DemoClass {

    private boolean booleanField;
    private byte byteField;
    private char charField;
    private short shortField;
    private int intField;
    private long longField;
    private float floatField;
    private double doubleField;
    private OtherClass classField;
    private int[] oneDimensionalArrayField;
    private int[][] twoDimensionalArrayField;

    /* [...] Constructors omitted. */

    @Override
    public boolean equals(Object otherObject) {
        if (this == otherObject) {
            return true;
        }
        if (!(otherObject instanceof Java7DemoClass)) {
            return false;
        }
        Java7DemoClass otherDemoClass = (Java7DemoClass) otherObject;
        return (/* booleanField */
                Objects.equals(Boolean.valueOf(this.booleanField),
                        Boolean.valueOf(otherDemoClass.booleanField))
                /* byteField */
                && Objects.equals(Byte.valueOf(this.byteField),
                        Byte.valueOf(otherDemoClass.byteField))
                /* charField */
                && Objects.equals(Character.valueOf(this.charField),
                        Character.valueOf(otherDemoClass.charField))
                /* shortField */
                && Objects.equals(Short.valueOf(this.shortField),
                        Short.valueOf(otherDemoClass.shortField))
                /* intField */
                && Objects.equals(Integer.valueOf(this.intField),
                        Integer.valueOf(otherDemoClass.intField))
                /* longField */
                && Objects.equals(Long.valueOf(this.longField),
                        Long.valueOf(otherDemoClass.longField))
                /* floatField */
                && Objects.equals(Float.valueOf(this.floatField),
                        Float.valueOf(otherDemoClass.floatField))
                /* doubleField */
                && Objects.equals(Double.valueOf(this.doubleField),
                        Double.valueOf(otherDemoClass.doubleField))
                /* classField */
                && Objects.equals(this.classField, otherDemoClass.classField)
                /* oneDimensionalArrayField */
                && Arrays.equals(this.oneDimensionalArrayField,
                        otherDemoClass.oneDimensionalArrayField)
                /* twoDimensionalArrayField */
                && Arrays.deepEquals(this.twoDimensionalArrayField,
                        otherDemoClass.twoDimensionalArrayField));
    }

    @Override
    public int hashCode() {
        return Objects.hash(Boolean.valueOf(booleanField),
                Byte.valueOf(byteField),
                Character.valueOf(charField),
                Short.valueOf(shortField),
                Integer.valueOf(intField),
                Long.valueOf(longField),
                Float.valueOf(floatField),
                Double.valueOf(doubleField),
                classField); /* Array fields not possible! */
    }

    /* [...] Other methods omitted. */
}
A great thing seems to be the static method Objects.equals
that theoretically deals with all non-array data types and relieves the programmer from the tedious task of writing different code for each different data type. When you look at the code above, its usage is also very consistent and definitely “looks nice”.
A main advantage of Objects.equals
is that it properly deals with null
references. There is no risk of any NullPointerException
, nor is there any need for complicated boilerplate code that goes through all options by checking which object—if any—might be null
.
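A minimal sketch of this null handling follows. The class ObjectsEqualsDemo and its helper sameField are made up for this illustration; Objects.equals itself is the real JDK method.

```java
import java.util.Objects;

/* Hypothetical demo class; its name and helper method are illustrative only. */
final class ObjectsEqualsDemo {

    /* Null-safe field comparison that replaces the hand-written null checks. */
    static boolean sameField(Object mine, Object theirs) {
        return Objects.equals(mine, theirs);
    }

    public static void main(String[] args) {
        String present = "value";
        String absent = null;

        System.out.println(sameField(absent, absent));   // true  (both null)
        System.out.println(sameField(present, absent));  // false (exactly one null)
        System.out.println(sameField(absent, present));  // false (exactly one null)
        System.out.println(sameField(present, "value")); // true  (delegates to equals)

        int[] first = {1, 2};
        int[] second = {1, 2};
        // Arrays inherit Object's equals, so they are compared by reference only.
        System.out.println(sameField(first, second));    // false
    }
}
```

The last line shows why the array fields in the demo class still go through Arrays.equals and Arrays.deepEquals.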
When you look closer, however, you also see that I have explicitly wrapped all primitive types. The reason is that the mentioned Objects.equals
method only accepts Object
s (or its subtypes, of course). Primitive types need to be wrapped or boxed, either implicitly by autoboxing or explicitly by hand. I’ve mentioned a few times in earlier blog posts that I let my compiler emit a warning whenever autoboxing or unboxing is performed, because excessive boxing is a typical source of severe performance problems. I prefer boxing manually, not because it’s faster, but because then I’m at least aware of it.
Both the equals
and the hashCode
methods need to run fast. Their main use is inside collections, where they may get called thousands or millions of times. I personally consider the use of wrapper classes unacceptable in such a case, even though I haven’t measured the performance differences yet. A detailed investigation of this topic will follow in Part 2.
Another promising thing is the static method Objects.hash(Object...)
that takes a vararg and thus allows an arbitrary number of parameters to be added. But we are confronted with the same problem as mentioned above: The hash
method only accepts Object
s, so again all primitives need to be wrapped, which might be a severe performance problem.
Objects.hash
simply calls hashCode
on each of its parameters, which is fine for all data types except for arrays. The default implementation of an array’s hash code does not take its elements into account. One needs to call Arrays.[deep]HashCode
in order to do so. For this reason, Objects.hash
doesn’t support arrays the way one would expect.
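The difference can be made visible with a small sketch. The class ObjectsHashArrayDemo and its two helper methods are hypothetical names chosen for this illustration.

```java
import java.util.Arrays;
import java.util.Objects;

/* Hypothetical demo class; names are illustrative only. */
final class ObjectsHashArrayDemo {

    /* Hashes an array the way Objects.hash sees it: via the array's own hashCode. */
    static int identityBasedHash(int[] array) {
        return Objects.hash((Object) array);
    }

    /* Hashes the array by its elements instead. */
    static int contentBasedHash(int[] array) {
        return Arrays.hashCode(array);
    }

    public static void main(String[] args) {
        int[] first = {1, 2, 3};
        int[] second = {1, 2, 3};

        System.out.println(contentBasedHash(first) == contentBasedHash(second)); // true

        // Each array object has its own identity hash code, so these two values
        // normally differ although the arrays have equal contents.
        System.out.println(identityBasedHash(first));
        System.out.println(identityBasedHash(second));
    }
}
```

Two objects that are equal according to equals must report the same hash code, so an identity-based value for an array field breaks hashCode as soon as equals compares that field by content.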
Since vararg is just a fancy notation for an array that is generated behind-the-scenes, one could create such a “parameter array” explicitly and then hand it over to Arrays.deepHashCode
, as shown here:
@Override
public int hashCode() {
    return Arrays.deepHashCode(new Object[] {
            Boolean.valueOf(booleanField),
            Byte.valueOf(byteField),
            Character.valueOf(charField),
            Short.valueOf(shortField),
            Integer.valueOf(intField),
            Long.valueOf(longField),
            Float.valueOf(floatField),
            Double.valueOf(doubleField),
            classField,
            oneDimensionalArrayField,
            twoDimensionalArrayField
    });
}
The advantage is that it properly deals with both one- and multi-dimensional arrays. The disadvantage is that we still haven’t gotten rid of the need for boxing or wrapping. Further investigation (still to come in Part 2 of this topic) is necessary to find out more about the actual performance penalty.
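To show that this approach yields content-based hash codes, here is a reduced sketch with only two fields. The class DeepHashDemo and its fields are invented for this illustration and are not part of the demo class above.

```java
import java.util.Arrays;

/* Hypothetical value class with one primitive and one two-dimensional array field. */
final class DeepHashDemo {

    private final int id;
    private final int[][] grid; // may be null

    DeepHashDemo(int id, int[][] grid) {
        this.id = id;
        this.grid = grid;
    }

    @Override
    public boolean equals(Object otherObject) {
        if (this == otherObject) {
            return true;
        }
        if (!(otherObject instanceof DeepHashDemo)) {
            return false;
        }
        DeepHashDemo other = (DeepHashDemo) otherObject;
        return (this.id == other.id) && Arrays.deepEquals(this.grid, other.grid);
    }

    @Override
    public int hashCode() {
        // The explicit "parameter array": the int still has to be boxed,
        // but the nested array is hashed element by element.
        return Arrays.deepHashCode(new Object[] {Integer.valueOf(id), grid});
    }

    public static void main(String[] args) {
        DeepHashDemo first = new DeepHashDemo(7, new int[][] {{1, 2}, {3}});
        DeepHashDemo second = new DeepHashDemo(7, new int[][] {{1, 2}, {3}});
        System.out.println(first.equals(second));                  // true
        System.out.println(first.hashCode() == second.hashCode()); // true
    }
}
```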
To sum it up, according to my gut feeling, the Java 7 methods don’t provide a convincing alternative to the hand-written implementations of equals
and hashCode
, in particular because of their need for boxing or wrapping. However, I might be proven wrong, which is why I’m curious about the actual penalty the wrapper classes do or don’t introduce. Stay tuned for Part 2.
New Methods in Java 8 | Combining the Best of Each Version
Java 8 equipped all wrapper classes with static hashCode
methods that perform state-of-the-art calculations and do not require boxing or wrapping. There aren’t any new methods that could be of help for an equals
implementation.
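As a quick sanity check, the static methods can be compared with the hand-written Java 6 expressions. This is only a sketch; the class StaticHashCodeDemo and its helper methods are hypothetical names.

```java
/* Hypothetical demo class; names are illustrative only. */
final class StaticHashCodeDemo {

    /* Hand-written Java 6 style folding of a long into an int. */
    static int oldLongHash(long value) {
        return (int) (value ^ (value >>> 32));
    }

    /* Hand-written Java 6 style hash of a double via its bit pattern. */
    static int oldDoubleHash(double value) {
        long bits = Double.doubleToLongBits(value);
        return (int) (bits ^ (bits >>> 32));
    }

    public static void main(String[] args) {
        long longValue = 123_456_789_012L;
        double doubleValue = 3.25;
        float floatValue = 1.5f;

        // No wrapper objects are created by any of these calls.
        System.out.println(Long.hashCode(longValue) == oldLongHash(longValue));             // true
        System.out.println(Double.hashCode(doubleValue) == oldDoubleHash(doubleValue));     // true
        System.out.println(Float.hashCode(floatValue) == Float.floatToIntBits(floatValue)); // true
        System.out.println(Integer.hashCode(42));    // 42
        System.out.println(Boolean.hashCode(true));  // 1231
        System.out.println(Boolean.hashCode(false)); // 1237
    }
}
```

Only the boolean case yields different numbers than the hand-written 0 and 1, which doesn’t matter as long as equal values produce equal hash codes.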
The code example presented below provides a “best of” the options explained in this blog post. No matter if a code line stems from an old or a new Java version, what matters most is a) its ease of use, b) correct results, and c) no performance bottlenecks. Please see for yourself:
import java.util.*;

public final class Java8DemoClass {

    private boolean booleanField;
    private byte byteField;
    private char charField;
    private short shortField;
    private int intField;
    private long longField;
    private float floatField;
    private double doubleField;
    private OtherClass classField;
    private int[] oneDimensionalArrayField;
    private int[][] twoDimensionalArrayField;

    /* [...] Constructors omitted. */

    @Override
    public boolean equals(Object otherObject) {
        if (this == otherObject) {
            return true;
        }
        if (!(otherObject instanceof Java8DemoClass)) {
            return false;
        }
        Java8DemoClass otherDemoClass = (Java8DemoClass) otherObject;
        return (/* booleanField */
                (this.booleanField == otherDemoClass.booleanField)
                /* byteField */
                && (this.byteField == otherDemoClass.byteField)
                /* charField */
                && (this.charField == otherDemoClass.charField)
                /* shortField */
                && (this.shortField == otherDemoClass.shortField)
                /* intField */
                && (this.intField == otherDemoClass.intField)
                /* longField */
                && (this.longField == otherDemoClass.longField)
                /* floatField */
                && (Float.compare(this.floatField, otherDemoClass.floatField) == 0)
                /* doubleField */
                && (Double.compare(this.doubleField, otherDemoClass.doubleField) == 0)
                /* classField */
                && Objects.equals(this.classField, otherDemoClass.classField)
                /* oneDimensionalArrayField */
                && Arrays.equals(this.oneDimensionalArrayField,
                        otherDemoClass.oneDimensionalArrayField)
                /* twoDimensionalArrayField */
                && Arrays.deepEquals(this.twoDimensionalArrayField,
                        otherDemoClass.twoDimensionalArrayField));
    }

    @Override
    public int hashCode() {
        int result = 17;
        result = 37 * result + Boolean.hashCode(booleanField);
        result = 37 * result + Byte.hashCode(byteField);
        result = 37 * result + Character.hashCode(charField);
        result = 37 * result + Short.hashCode(shortField);
        result = 37 * result + Integer.hashCode(intField);
        result = 37 * result + Long.hashCode(longField);
        result = 37 * result + Float.hashCode(floatField);
        result = 37 * result + Double.hashCode(doubleField);
        result = 37 * result + Objects.hashCode(classField);
        result = 37 * result + Arrays.hashCode(oneDimensionalArrayField);
        result = 37 * result + Arrays.deepHashCode(twoDimensionalArrayField);
        return result;
    }

    /* [...] Other methods omitted. */
}
This demo class can serve as a template for your own implementations of equals
and hashCode
. It looks pretty concise and consistent and definitely should be self-explanatory.
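To illustrate how the template might be adapted, here is a sketch of a small value class used inside a HashSet. The class Measurement and all of its fields are invented for this example; only the structure follows the template.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/* Hypothetical value class; all names are made up for this illustration. */
final class Measurement {

    private final long timestamp;
    private final double value;
    private final String unit;      // may be null
    private final int[] rawSamples; // may be null

    Measurement(long timestamp, double value, String unit, int[] rawSamples) {
        this.timestamp = timestamp;
        this.value = value;
        this.unit = unit;
        // Defensive copy, so that later changes to the caller's array
        // cannot alter this object's equality or hash code.
        this.rawSamples = (rawSamples == null) ? null : rawSamples.clone();
    }

    @Override
    public boolean equals(Object otherObject) {
        if (this == otherObject) {
            return true;
        }
        if (!(otherObject instanceof Measurement)) {
            return false;
        }
        Measurement other = (Measurement) otherObject;
        return (this.timestamp == other.timestamp)
                && (Double.compare(this.value, other.value) == 0)
                && Objects.equals(this.unit, other.unit)
                && Arrays.equals(this.rawSamples, other.rawSamples);
    }

    @Override
    public int hashCode() {
        int result = 17;
        result = 37 * result + Long.hashCode(timestamp);
        result = 37 * result + Double.hashCode(value);
        result = 37 * result + Objects.hashCode(unit);
        result = 37 * result + Arrays.hashCode(rawSamples);
        return result;
    }

    public static void main(String[] args) {
        Set<Measurement> seen = new HashSet<>();
        seen.add(new Measurement(1L, Double.NaN, null, new int[] {1, 2}));

        // An equal (but not identical) object is found, even with NaN and null fields.
        System.out.println(seen.contains(new Measurement(1L, Double.NaN, null, new int[] {1, 2}))); // true
        System.out.println(seen.contains(new Measurement(1L, 0.0, null, new int[] {1, 2})));        // false
    }
}
```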
Just one word on the hashCode
computation of classField
. It uses the static utility method Objects.hashCode(Object)
(not Objects.hash(Object...)
!), which hasn’t been explained yet. It simply calls the hashCode
method on the provided object, but first makes sure that the object isn’t null
. In case it is null
, it simply returns 0
. This method can thus be seen as the counterpart of Objects.equals(Object, Object)
whose main purpose is the elimination of surrounding null
checks.
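Because the two method names are so easy to confuse, a short sketch may help. The class NullSafeHashDemo and its helper fieldHash are hypothetical names used only here.

```java
import java.util.Objects;

/* Hypothetical demo class; names are illustrative only. */
final class NullSafeHashDemo {

    /* Null-safe hash code of a single field, as used in the template. */
    static int fieldHash(Object field) {
        return Objects.hashCode(field);
    }

    public static void main(String[] args) {
        System.out.println(fieldHash(null));               // 0
        System.out.println(fieldHash("a"));                // 97, same as "a".hashCode()

        // Objects.hash treats its arguments as a sequence and mixes them,
        // so even a single argument yields a different value.
        System.out.println(Objects.hash("a"));             // 128, i.e. 31 * 1 + 97
        System.out.println(Objects.hash((Object) null));   // 31,  i.e. 31 * 1 + 0
    }
}
```

Both variants are valid hash functions on their own; they just must not be mixed up when two implementations are expected to produce identical values.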
A Word About Google AutoValue
I often hear and read about Google AutoValue which uses annotations that instruct an automatic source generator to write the corresponding equals
, hashCode
, and toString
methods. I personally am not a fan of such tools, even though I have to admit that I haven’t tested it yet.
Defining and implementing equality logic and hash codes is one of the most fundamental things a Java programmer needs to deal with. It is, in my opinion, too important to be handed over to an automatic tool. Yes, writing these methods might be mundane. Nevertheless, their implementation is important and requires one’s full attention. How one’s classes behave in combination with container classes (collections) is mainly the result of their definition of equality. This definition is not trivial. People who claim that “in Java it’s always the same” probably haven’t understood the idea and importance of those methods yet.
If such source generators are the only reason why one includes such a library, think twice. Each additional library has an effect on the application’s license (Google AutoValue’s Apache License might be one of the most liberal, but one still needs to mention and deploy it together with the application). Each additional library needs to be taken care of in the build setup (especially since Java isn’t known for its simple build processes). Each additional library must be updated and version-tracked from time to time. And most important, each additional library must be studied and adds considerably to the complexity of the (already complex) Java application.
The time one spends seriously coming to grips with (or worse, “playing around” with) a tool like Google AutoValue is better invested in learning how to properly write these methods yourself. In this blog post, I tried my very best to give the simplest, yet fully working demonstration of both methods. It allows you to adapt the code to your specific needs. Did you know that, e.g., Google AutoValue doesn’t allow you to adjust the order of the comparisons or hash code computations? Manually, you can move the “cheap” computations, or the computations on fields that you think will differ more often than others, to the top. A fully automated tool like Google AutoValue can’t.
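The effect of ordering the comparisons by hand can be sketched as follows. The class Snapshot, its fields, and the counter arrayComparisons are hypothetical; the counter exists only to make the short-circuiting visible and would not be part of a real class.

```java
import java.util.Arrays;

/* Hypothetical value class with one cheap and one expensive field. */
final class Snapshot {

    /* Counts the expensive array comparisons; for demonstration purposes only. */
    static int arrayComparisons = 0;

    private final int revision;
    private final long[] samples;

    Snapshot(int revision, long[] samples) {
        this.revision = revision;
        this.samples = samples;
    }

    @Override
    public boolean equals(Object otherObject) {
        if (this == otherObject) {
            return true;
        }
        if (!(otherObject instanceof Snapshot)) {
            return false;
        }
        Snapshot other = (Snapshot) otherObject;
        // Cheap check first: "&&" short-circuits, so the array is only
        // walked when the revisions already match.
        return (this.revision == other.revision) && samplesEqual(other);
    }

    private boolean samplesEqual(Snapshot other) {
        arrayComparisons++;
        return Arrays.equals(this.samples, other.samples);
    }

    @Override
    public int hashCode() {
        int result = 17;
        result = 37 * result + Integer.hashCode(revision);
        result = 37 * result + Arrays.hashCode(samples);
        return result;
    }

    public static void main(String[] args) {
        Snapshot first = new Snapshot(1, new long[1_000_000]);
        Snapshot second = new Snapshot(2, new long[1_000_000]);
        System.out.println(first.equals(second)); // false
        System.out.println(arrayComparisons);     // 0 (the arrays were never compared)
    }
}
```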
Summary and Outlook
In this blog post, Part 1, I presented different options for implementing correct and efficient equals
and hashCode
methods, depending on the options previous Java versions have introduced. Currently, I personally use and also encourage you to implement these methods based on the template I’ve provided at the end. It is concise, consistent, efficient, and self-explanatory. Who knows what future Java versions will bring in order to simplify the implementation of these methods even more?
There is an open question about the actual performance of the methods that might look elegant in code, but involve varargs and parameter wrapping. Part 2 of this topic will thus present a more realistic usage example and have a detailed look at the different options and their runtimes.
Shortlink to this blog post: link.simplexacode.ch/zxxx2019.01