Introduction
Java provides two primitive types for floating-point arithmetic: float and double. The latter is usually preferred since doubles are much more precise than floats (15–17 significant decimal digits compared to 6–9). I am not aware of any performance gains from using floats, because common CPU architectures are designed for full-scale doubles (or even more). The only advantage I see in using floats over doubles is saving memory, e.g., when filling very large arrays with floating-point numbers. All examples below stick to doubles.
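To make the difference in precision and memory footprint tangible, here is a minimal sketch. The class and method names are made up for illustration; only the constants Float.BYTES and Double.BYTES come from the standard library.

```java
final class FloatVersusDouble {

    /** Returns 1/3 computed in float precision, widened to double for printing. */
    static double oneThirdAsFloat() {
        return (double) (1.0f / 3.0f);
    }

    /** Returns 1/3 computed in double precision. */
    static double oneThirdAsDouble() {
        return 1.0 / 3.0;
    }

    public static void main(String[] args) {
        // The float result deviates from 1/3 after far fewer digits.
        System.out.println("float : " + oneThirdAsFloat());
        System.out.println("double: " + oneThirdAsDouble());
        // A float occupies half the memory of a double.
        System.out.println("Bytes per float : " + Float.BYTES);
        System.out.println("Bytes per double: " + Double.BYTES);
    }
}
```

Running it should show the float value deviating from the double value after roughly eight digits.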
Floating-point numbers have some severe flaws programmers often forget about in their real-life applications. We’re not talking about scientific or mathematical applications here, but about pitfalls every programmer could be confronted with sooner or later. The cause of most of these flaws lies in the fact that floating-point numbers cannot represent most numerical values exactly. Floating-point numbers basically consist of a significand or mantissa (roughly speaking, the digits one can “see”) and an exponent, which “shifts” the value by the specified order of magnitude.
Imagine telling someone how much free space you still have on your hard disk. You would probably say something like: “I still have 453 gigabytes left.” The “giga” in “gigabytes” specifies the order of magnitude or, technically speaking, the exponent to the base 1000. “Giga” means 1 billion (German: 1 Milliarde), so by saying “453 gigabytes”, you actually mean “453 billion bytes”. You would probably never say: “I still have 453,574,447,755 bytes left.” For specifying free space on a hard disk, 3 significant digits are probably enough. There usually is no reason to care about minor details when talking about a completely different order of magnitude. To put it crudely, who cares whether there are 453,…,…,755 bytes left instead of 453,…,…,485 bytes?
The data type double allows specifying amounts in the order of magnitude roughly between 10^{−308} and 10^{308}, thus the name “floating-point” (the decimal point can, broadly speaking, move about 308 places to the left or to the right). This is more than enough to specify, e.g., both the size of the universe (roughly 10^{26} m) as well as the size of an electron (roughly 10^{−19} m) while keeping the precision at 15 to 17 significant decimal digits at all times.
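The range and the "constant relative precision" can be checked with a few library constants. The following is only a sketch: the class name and the helper gapAt are hypothetical, while Double.MAX_VALUE, Double.MIN_NORMAL, and Math.ulp are part of the standard library.

```java
final class DoubleRange {

    /** Distance between 'value' and the next larger representable double. */
    static double gapAt(double value) {
        return Math.ulp(value);
    }

    public static void main(String[] args) {
        System.out.println("Largest finite double    : " + Double.MAX_VALUE);
        System.out.println("Smallest normal double   : " + Double.MIN_NORMAL);
        // The absolute gap between neighboring doubles grows with the magnitude,
        // but the gap relative to the value stays roughly the same.
        System.out.println("Gap near 1e-19 (electron): " + gapAt(1e-19));
        System.out.println("Gap near 1e26  (universe): " + gapAt(1e26));
    }
}
```

The absolute gap near 10^{26} is huge (billions), yet relative to the value itself it is as small as the gap near 10^{−19}. That is the "floating" part of floating-point.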
The Drawbacks of Floating-Point Numbers
Having seen the great advantages of floating-point numbers, one can hardly imagine their drawbacks. The following class contains three example methods (described in the sections below) that demonstrate fundamental flaws. One doesn’t even need to dive into scientific or mathematical applications in order to be confronted with these problems.
public final class FloatingPointExamples {

    public static void main(String[] args) {
        FloatingPointExamples examples = new FloatingPointExamples();
        examples.multiplyTwoNumbers();
        examples.subtractTwoNumbers();
        examples.addThreeNumbersInDifferentOrder();
    }
Multiplying Two Numbers
The following method multiplies two doubles a = 100.0 and b = 35.05. The result of their multiplication should, as everyone knows, be 3505.0. This value is stored in the variable expectedResult. The boolean resultsAreEqual contains the result of the equality check “expected result = a × b?”. At the end, the result of this check, the calculated result, and the expected result are printed out.
    private void multiplyTwoNumbers() {
        double a = 100.0;
        double b = 35.05;
        double calculatedResult = a * b;
        double expectedResult = 3505.0;
        boolean resultsAreEqual = (calculatedResult == expectedResult);
        System.out.println("Multiplying Two Numbers");
        System.out.println("");
        System.out.println();
        System.out.println("Equal? " + resultsAreEqual);
        System.out.println("Calculated: " + calculatedResult);
        System.out.println("Expected: " + expectedResult);
        System.out.println();
        System.out.println();
    }
Surprisingly, the result does not seem to equal the expected value 3505.0:
Multiplying Two Numbers

Equal? false
Calculated: 3504.9999999999995
Expected: 3505.0
One can clearly see that the multiplication results in 3504.9999999999995, which is not equal to 3505.0.
Subtracting Two Numbers
The next method subtracts two simple double values a = 7.66 and b = 1.64 and compares the result to the expected value of 6.02:
    private void subtractTwoNumbers() {
        double a = 7.66;
        double b = 1.64;
        double calculatedResult = a - b;
        double expectedResult = 6.02;
        boolean resultsAreEqual = (calculatedResult == expectedResult);
        System.out.println("Subtracting Two Numbers");
        System.out.println("");
        System.out.println();
        System.out.println("Equal? " + resultsAreEqual);
        System.out.println("Calculated: " + calculatedResult);
        System.out.println("Expected: " + expectedResult);
        System.out.println();
        System.out.println();
    }
Again, there is a problem. Even in this simple calculation, 7.66 − 1.64 does not equal 6.02:
Subtracting Two Numbers

Equal? false
Calculated: 6.0200000000000005
Expected: 6.02
Adding Three Numbers in Different Order
To cause even more confusion, have a look at the following method:
    private void addThreeNumbersInDifferentOrder() {
        double a = 444.000_000;
        double b = 0.444_000;
        double c = 0.000_444;
        double calculatedResultAbc = a + b + c;
        double calculatedResultCba = c + b + a;
        double expectedResult = 444.444_444;
        boolean resultsAbcAndExpectedAreEqual = (calculatedResultAbc == expectedResult);
        boolean resultsCbaAndExpectedAreEqual = (calculatedResultCba == expectedResult);
        boolean resultsAbcAndCbaAreEqual = (calculatedResultAbc == calculatedResultCba);
        System.out.println("Adding Three Numbers in Different Order");
        System.out.println("");
        System.out.println();
        System.out.println("a+b+c = Expected? " + resultsAbcAndExpectedAreEqual);
        System.out.println("c+b+a = Expected? " + resultsCbaAndExpectedAreEqual);
        System.out.println("a+b+c = c+b+a? " + resultsAbcAndCbaAreEqual);
        System.out.println("Calculated a+b+c: " + calculatedResultAbc);
        System.out.println("Calculated c+b+a: " + calculatedResultCba);
        System.out.println("Expected: " + expectedResult);
        System.out.println();
        System.out.println();
    }
}
This time, three doubles are summed up in order to create the number 444.444444. In the first case, the summation is 444.0 + 0.444 + 0.000444; in the second case, it is 0.000444 + 0.444 + 444.0.
Surprisingly, the two results don’t match:
Adding Three Numbers in Different Order

a+b+c = Expected? false
c+b+a = Expected? true
a+b+c = c+b+a? false
Calculated a+b+c: 444.44444400000003
Calculated c+b+a: 444.444444
Expected: 444.444444
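The order dependence is not specific to these three numbers. The sketch below reproduces the effect with the well-known values 0.1, 0.2, and 0.3, which are my own choice and not taken from the example above. Every intermediate sum is rounded to the nearest double, so the grouping decides where the rounding happens.

```java
final class AdditionOrder {

    /** Adds the first two operands first, then the third. */
    static double leftToRight(double a, double b, double c) {
        return (a + b) + c;
    }

    /** Adds the last two operands first, then the first. */
    static double rightToLeft(double a, double b, double c) {
        return a + (b + c);
    }

    public static void main(String[] args) {
        double first = leftToRight(0.1, 0.2, 0.3);
        double second = rightToLeft(0.1, 0.2, 0.3);
        System.out.println("(0.1 + 0.2) + 0.3 = " + first);
        System.out.println("0.1 + (0.2 + 0.3) = " + second);
        System.out.println("Equal? " + (first == second));
    }
}
```

In other words, addition on doubles is commutative but not associative.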
The Problem Behind Floating-Point Arithmetic
In order to give a detailed explanation of the problems that arose above, one needs to delve into the details of floating-point arithmetic, which is far beyond the scope of this blog post (search for “IEEE 754” if you want to learn more). To keep things short and simple, let me try to give you this explanation:
We humans use the decimal system for numbers. Each digit can contain the values 0 to 9, i.e., each digit can represent 10 different values. A digit immediately to the left of another digit counts 10 times more. Thus, 1234 is 1 × 1000 + 2 × 100 + 3 × 10 + 4 × 1 = 1234 = 1.234 × 10^{3}.
Computers use the binary system. Each digit can only contain the values 0 and 1, i.e., each digit can only represent 2 different values. A digit immediately to the left of another digit counts 2 times more. Thus, 1234 in decimal form is 10011010010 in binary form, because 1 × 1024 + 0 × 512 + 0 × 256 + 1 × 128 + 1 × 64 + 0 × 32 + 1 × 16 + 0 × 8 + 0 × 4 + 1 × 2 + 0 × 1 = 1234.
Not only is the significand based on 2, but so is the power that is responsible for the order of magnitude. What looks neat using a decimal power can become very complicated for a binary power. E.g., 1234 = 1.234 × 10^{3}, but what are the significand and the exponent if the base needs to be 2? In this case, it’s 1.205078125 × 2^{10} (you knew that, didn’t you?). You see that the significand needs more digits. Even worse, some values would need an infinite number of digits to be represented exactly; the decimal fraction 0.1, for instance, has no finite binary expansion, so its significand has to be cut off. Thus, errors can occur as soon as this “truncated” number with the binary power is converted back to a pure decimal number.
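The decomposition can be reproduced in a few lines. This is a sketch under assumptions: the class and helper names are hypothetical, and the helpers are only meant for normal, finite, positive values. It borrows the BigDecimal(double) constructor (the class is introduced in the next section) purely as a diagnostic tool, because that constructor reveals the exact binary value a double holds.

```java
import java.math.BigDecimal;

final class BinaryRepresentation {

    /** Binary exponent e such that value = significand x 2^e with 1 <= significand < 2. */
    static int exponentOf(double value) {
        return Math.getExponent(value);
    }

    /** Significand in the range [1, 2), obtained by scaling the value by 2^(-exponent). */
    static double significandOf(double value) {
        return Math.scalb(value, -Math.getExponent(value));
    }

    /** Exact decimal expansion of the binary value that is actually stored. */
    static String exactValueOf(double value) {
        return new BigDecimal(value).toString();
    }

    public static void main(String[] args) {
        // 1234 = 1.205078125 x 2^10
        System.out.println(significandOf(1234.0) + " x 2^" + exponentOf(1234.0));
        // 0.1 cannot be stored exactly; this prints the value that is stored instead.
        System.out.println("0.1 is stored as " + exactValueOf(0.1));
    }
}
```

The second line of output shows a long tail of digits after 0.1, which is the truncation error described above.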
To summarize, float and double are inappropriate data types if exact values are needed. There is already a need for exactness if you want to compare decimal numbers for equality, e.g., when dealing with monetary values. All three examples presented above could stem from real-world applications. Imagine the numbers used in the examples were currencies, and now try to think of a double-entry bookkeeping system and its dire consequences if summing up the entries one way wouldn’t equal summing up the entries the other way …
The Classes BigInteger and BigDecimal
Java provides two classes in the java.math package that address several limitations and problems of the primitive integral and floating-point types. These classes are BigInteger and BigDecimal. In essence, both classes explicitly keep track of their own (unlimited number of) digits. BigInteger allows the representation of integral values only, whereas BigDecimal also allows the representation of fractional values. I see the following advantages in BigInteger:
- It can deal with a theoretically unlimited number of digits (only limited by memory), i.e., BigInteger can deal with theoretically infinitely large integral values.
- It shows its strengths in the area of number theory, esp. by providing several methods for bitwise operations, modular arithmetic, and prime numbers.
- It allows the initialization and output of numbers using any arbitrary radix (like binary, hexadecimal, or any “own” radix value, e.g., 5).
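The three points above can be sketched as follows. The class name and the helper convertRadix are hypothetical; the BigInteger calls themselves (the radix constructor, toString(int), pow, modPow, and isProbablePrime) belong to the standard API.

```java
import java.math.BigInteger;

final class BigIntegerSketch {

    /** Parses 'digits' in the given radix and renders the value in another radix. */
    static String convertRadix(String digits, int fromRadix, int toRadix) {
        return new BigInteger(digits, fromRadix).toString(toRadix);
    }

    public static void main(String[] args) {
        // 2^100 does not fit into a long, whose maximum is 2^63 - 1.
        System.out.println("2^100 = " + BigInteger.valueOf(2).pow(100));

        // Hexadecimal input, output in the "own" radix 5.
        System.out.println("ff (hex) in radix 5 = " + convertRadix("ff", 16, 5));

        // Modular exponentiation: 4^13 mod 497.
        System.out.println("4^13 mod 497 = "
                + BigInteger.valueOf(4).modPow(BigInteger.valueOf(13), BigInteger.valueOf(497)));

        // Probabilistic primality test; the argument controls the certainty.
        BigInteger candidate = BigInteger.valueOf(2).pow(61).subtract(BigInteger.ONE);
        System.out.println("Is 2^61 - 1 (probably) prime? " + candidate.isProbablePrime(50));
    }
}
```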
BigDecimal has a slightly different focus:
- It can deal with a theoretically unlimited number of digits (only limited by memory) both for the integral and the fractional part of a number, i.e., BigDecimal can deal with theoretically infinitely large and infinitely precise decimal values.
- All arithmetic operations that can be done exactly will be done exactly, i.e., there is no danger of introducing any errors or inaccuracies one would not expect (see examples).
- For arithmetic operations that cannot be done exactly, one can adjust the accuracy (defined by the scale or precision, see below) of the represented number and/or select one of several rounding modes.
The goal of this blog post is to address the inaccuracy problems of float and double. E.g., when doing arithmetic on currencies, we expect the results not to drift away from the exact monetary value. For such purposes, BigDecimal is the right choice. If one wants to overcome the limitations of int and long and is not interested in different radices or number theory methods, one can choose between BigInteger and BigDecimal. Surprisingly, as I found out, there is no performance loss if you use BigDecimal even for pure integral purposes. Thus, if you don’t need any of the special methods BigInteger offers, use BigDecimal.
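To illustrate the "drifting away" with monetary amounts, here is a minimal sketch that adds ten cents ten times, once with double and once with BigDecimal. The class and method names are made up for this illustration.

```java
import java.math.BigDecimal;

final class CurrencySum {

    /** Adds 'amount' to a running total 'times' times using double arithmetic. */
    static double sumAsDouble(double amount, int times) {
        double total = 0.0;
        for (int i = 0; i < times; i++) {
            total += amount;
        }
        return total;
    }

    /** Same summation, but with exact decimal arithmetic. */
    static BigDecimal sumAsBigDecimal(String amount, int times) {
        BigDecimal total = BigDecimal.ZERO;
        BigDecimal step = new BigDecimal(amount);
        for (int i = 0; i < times; i++) {
            total = total.add(step);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("double    : " + sumAsDouble(0.10, 10));
        System.out.println("BigDecimal: " + sumAsBigDecimal("0.10", 10));
    }
}
```

The double total ends up slightly below 1.0, whereas the BigDecimal total is exactly one.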
While preparing for this blog post, I expected the usage of BigDecimal (and BigInteger) to be much more complicated than it finally proved to be. I will only present the BigDecimal class, for the reasons given above. Let’s dive right into the rewritten examples from the previous sections, this time using BigDecimal instead of double:
import java.math.*;

public final class BigDecimalSameExamples {

    public static void main(String[] args) {
        BigDecimalSameExamples examples = new BigDecimalSameExamples();
        examples.multiplyTwoNumbers();
        examples.subtractTwoNumbers();
        examples.addThreeNumbersInDifferentOrder();
    }

    private void multiplyTwoNumbers() {
        BigDecimal a = new BigDecimal("100.0");
        BigDecimal b = new BigDecimal("35.05");
        BigDecimal calculatedResult = a.multiply(b);
        BigDecimal expectedResult = new BigDecimal("3505.0");
        boolean resultsAreEqual = (calculatedResult.compareTo(expectedResult) == 0);
        System.out.println("Multiplying Two Numbers");
        System.out.println("");
        System.out.println();
        System.out.println("Equal? " + resultsAreEqual);
        System.out.println("Calculated: " + calculatedResult);
        System.out.println("Expected: " + expectedResult);
        System.out.println();
        System.out.println();
    }

    private void subtractTwoNumbers() {
        BigDecimal a = new BigDecimal("7.66");
        BigDecimal b = new BigDecimal("1.64");
        BigDecimal calculatedResult = a.subtract(b);
        BigDecimal expectedResult = new BigDecimal("6.02");
        boolean resultsAreEqual = (calculatedResult.compareTo(expectedResult) == 0);
        System.out.println("Subtracting Two Numbers");
        System.out.println("");
        System.out.println();
        System.out.println("Equal? " + resultsAreEqual);
        System.out.println("Calculated: " + calculatedResult);
        System.out.println("Expected: " + expectedResult);
        System.out.println();
        System.out.println();
    }

    private void addThreeNumbersInDifferentOrder() {
        BigDecimal a = new BigDecimal("444.000000");
        BigDecimal b = new BigDecimal("0.444000");
        BigDecimal c = new BigDecimal("0.000444");
        BigDecimal calculatedResultAbc = a.add(b).add(c);
        BigDecimal calculatedResultCba = c.add(b).add(a);
        BigDecimal expectedResult = new BigDecimal("444.444444");
        boolean resultsAbcAndExpectedAreEqual
                = (calculatedResultAbc.compareTo(expectedResult) == 0);
        boolean resultsCbaAndExpectedAreEqual
                = (calculatedResultCba.compareTo(expectedResult) == 0);
        boolean resultsAbcAndCbaAreEqual
                = (calculatedResultAbc.compareTo(calculatedResultCba) == 0);
        System.out.println("Adding Three Numbers in Different Order");
        System.out.println("");
        System.out.println();
        System.out.println("a+b+c = Expected? " + resultsAbcAndExpectedAreEqual);
        System.out.println("c+b+a = Expected? " + resultsCbaAndExpectedAreEqual);
        System.out.println("a+b+c = c+b+a? " + resultsAbcAndCbaAreEqual);
        System.out.println("Calculated a+b+c: " + calculatedResultAbc);
        System.out.println("Calculated c+b+a: " + calculatedResultCba);
        System.out.println("Expected: " + expectedResult);
        System.out.println();
        System.out.println();
    }
}
The program output looks promising:
Multiplying Two Numbers

Equal? true
Calculated: 3505.000
Expected: 3505.0

Subtracting Two Numbers

Equal? true
Calculated: 6.02
Expected: 6.02

Adding Three Numbers in Different Order

a+b+c = Expected? true
c+b+a = Expected? true
a+b+c = c+b+a? true
Calculated a+b+c: 444.444444
Calculated c+b+a: 444.444444
Expected: 444.444444
Let’s go through the highlighted sections of the source code step by step:

import java.math.*;
As already mentioned, the class BigDecimal is part of the package java.math.
BigDecimal a = new BigDecimal("100.0");
BigDecimal b = new BigDecimal("35.05");
BigDecimal provides several constructors, including BigDecimal(double) and BigDecimal(String). It is very important to use the String version of the constructor, because the double version carries over the inaccuracy of the double value, which (remember?) might not store the exact value one intended to store. This would introduce errors before the use of BigDecimal has even started.
BigDecimal calculatedResult = a.multiply(b);
BigDecimal calculatedResult = a.subtract(b);
Since Java doesn’t allow operator overloading, performing operations is a little bit more verbose than simply using +, -, *, /, and the like. Common operations on BigDecimal are add(BigDecimal), subtract(BigDecimal), multiply(BigDecimal), divide(BigDecimal), remainder(BigDecimal), pow(int), abs(), plus(), negate(), min(BigDecimal), and max(BigDecimal).
BigDecimals are immutable (so are BigIntegers), i.e., once their value is set, the object representing the value can no longer change. Performing an operation on a BigDecimal thus always results in a new object being created (this is the same behavior as, e.g., in the String class). Never forget to store the reference to that newly created, returned object; otherwise the result will be silently lost.
boolean resultsAreEqual = (calculatedResult.compareTo(expectedResult) == 0);
BigDecimal is one of the rare exceptions in the Java API where equals is not consistent with compareTo. In other words, two BigDecimals might not be equal to each other according to equals even though their compareTo methods return 0. This is the case if the value they represent is equal from a mathematical (or “human”) point of view, but the two numbers have a different scale, i.e., a different number of decimal places where one of them contains trailing zeros.
The first example (“Multiplying Two Numbers”) shows two “different” outputs for its results:
Multiplying Two Numbers

Equal? true
Calculated: 3505.000
Expected: 3505.0
Since 3505.000 and 3505.0 are equal from a mathematical point of view, and there really is no inaccuracy or rounding issue involved here, compareTo returns 0, meaning equality, whereas equals returns false because the number of decimal places is different. Thus, always use compareTo for checking (in)equalities of BigDecimals.
BigDecimal calculatedResultAbc = a.add(b).add(c);
BigDecimal calculatedResultCba = c.add(b).add(a);
These lines of code demonstrate nicely how operations on BigDecimals can be chained. Of course, the order of additions no longer has an influence on the final result.
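Two of the pitfalls from this walkthrough, the double constructor and the forgotten return value, can be sketched in isolation. The class name and the deliberately faulty helper addAndForget are hypothetical and only serve the illustration.

```java
import java.math.BigDecimal;

final class BigDecimalPitfalls {

    /** Adds b to a but (wrongly) discards the returned object; 'a' stays unchanged. */
    static BigDecimal addAndForget(BigDecimal a, BigDecimal b) {
        a.add(b); // the sum is lost because BigDecimal is immutable
        return a;
    }

    public static void main(String[] args) {
        // Pitfall 1: the double constructor inherits the inexact binary value.
        BigDecimal fromString = new BigDecimal("35.05");
        BigDecimal fromDouble = new BigDecimal(35.05);
        System.out.println("From String: " + fromString);
        System.out.println("From double: " + fromDouble);
        System.out.println("Same value?  " + (fromString.compareTo(fromDouble) == 0));

        // Pitfall 2: the result of an operation must be stored.
        BigDecimal a = new BigDecimal("1.5");
        BigDecimal b = new BigDecimal("2.5");
        System.out.println("After forgotten add: " + addAndForget(a, b));
        System.out.println("After stored add   : " + a.add(b));
    }
}
```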
Scale, MathContext, and Precision
Of course, BigDecimals allow adjustments of their accuracy. Each BigDecimal keeps track of its number of decimal places (the number of digits to the right of the decimal point), called scale. 3505.000 has scale 3, whereas 3505.0 has scale 1. When adding two numbers, the scale of the result is the maximum of the scales of both numbers; when multiplying two numbers, it is the sum of the scales of both numbers, etc.
The scale is adjusted implicitly by BigDecimal, which is fine as long as all operations can be performed exactly. The scale can also be set explicitly, in which case rounding may occur. This rounding behavior, however, can again be specified. The divide method, e.g., provides an overloaded version that takes the scale as an int argument and performs the necessary rounding if the division cannot be performed exactly.
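A minimal sketch of exact versus rounded division follows. The helper names are hypothetical; the overload divide(BigDecimal, int, RoundingMode) and the ArithmeticException thrown by the one-argument divide for non-terminating quotients are standard API behavior.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class DivisionSketch {

    /** Divides and rounds the quotient to 'scale' decimal places (half-up). */
    static BigDecimal divideRounded(String dividend, String divisor, int scale) {
        return new BigDecimal(dividend)
                .divide(new BigDecimal(divisor), scale, RoundingMode.HALF_UP);
    }

    /** Returns true if the exact division throws, e.g., for a non-terminating quotient. */
    static boolean exactDivisionFails(String dividend, String divisor) {
        try {
            new BigDecimal(dividend).divide(new BigDecimal(divisor));
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("1 / 4 exact       : " + new BigDecimal("1").divide(new BigDecimal("4")));
        System.out.println("1 / 3 exact fails?: " + exactDivisionFails("1", "3"));
        System.out.println("2 / 3 to 4 places : " + divideRounded("2", "3", 4));
    }
}
```

The exception is a deliberate design choice: rather than silently truncating, BigDecimal forces the caller to say how many digits are wanted and how to round.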
All of the BigDecimal methods presented above, except min and max, have at least one overloaded version that takes a MathContext object, e.g., add(BigDecimal, MathContext). This MathContext object is a value object that contains both a so-called precision and a rounding mode enum. However, precision is not the same as scale. Precision is the total number of significant digits, both to the left and to the right of the decimal point, whereas scale only states the number of digits to the right of the decimal point.
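The difference between precision and scale becomes obvious when the same MathContext is applied to quotients of different magnitude. The following is a sketch with hypothetical class and method names; MathContext(int, RoundingMode), precision(), and scale() are standard API.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

final class PrecisionVersusScale {

    /** Divides with a limit on the total number of significant digits. */
    static BigDecimal divideWithPrecision(String dividend, String divisor, int precision) {
        MathContext context = new MathContext(precision, RoundingMode.HALF_UP);
        return new BigDecimal(dividend).divide(new BigDecimal(divisor), context);
    }

    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("3505.000");
        System.out.println("Precision of 3505.000: " + value.precision()); // all significant digits
        System.out.println("Scale of 3505.000    : " + value.scale());     // digits after the point

        // Same precision (5 significant digits), different resulting scales.
        System.out.println("1000 / 3, 5 digits   : " + divideWithPrecision("1000", "3", 5));
        System.out.println("1 / 3, 5 digits      : " + divideWithPrecision("1", "3", 5));
    }
}
```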
The method stripTrailingZeros() adjusts the scale to its possible minimum. By stripping the trailing zeros, we could have “normalized” the two results 3505.000 and 3505.0 above, making them look identical and also making them equal according to equals. The method setScale(int, RoundingMode) lets you set the scale to any specific value explicitly, performing any necessary rounding using the given rounding mode.
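Both methods can be tried out with the values from the multiplication example. This is a sketch; the class name and the helper toTwoPlaces are hypothetical, and the value 2.345 is my own choice to show how the rounding mode matters.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class ScaleSketch {

    /** Rounds to two decimal places using the given rounding mode. */
    static BigDecimal toTwoPlaces(String value, RoundingMode mode) {
        return new BigDecimal(value).setScale(2, mode);
    }

    public static void main(String[] args) {
        BigDecimal calculated = new BigDecimal("3505.000");
        BigDecimal expected = new BigDecimal("3505.0");

        // Different scales: equals says false, compareTo says "same value".
        System.out.println("equals           : " + calculated.equals(expected));
        System.out.println("compareTo == 0   : " + (calculated.compareTo(expected) == 0));

        // After normalization, both objects have the same value and the same scale.
        System.out.println("equals (stripped): "
                + calculated.stripTrailingZeros().equals(expected.stripTrailingZeros()));

        // The rounding mode decides what happens to the discarded digit 5.
        System.out.println("2.345 HALF_UP    : " + toTwoPlaces("2.345", RoundingMode.HALF_UP));
        System.out.println("2.345 HALF_EVEN  : " + toTwoPlaces("2.345", RoundingMode.HALF_EVEN));
    }
}
```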
An Example of Using Huge BigDecimals
To close this blog post, let me show you an impressive application of BigDecimal that calculates the factorial n! = 1 × 2 × 3 × … × n. Such numbers become extremely large very quickly. The result of 10! has 7 digits, 20! has 19 digits (and is the largest factorial that can be calculated using the long data type), and 100! already has 158 digits. Here is the code:
import java.math.*;

public final class BigDecimalFactorialExample {

    private static final int N = 10_000;

    public static void main(String[] args) {
        BigDecimalFactorialExample example = new BigDecimalFactorialExample();
        example.factorial();
    }

    private void factorial() {
        /* BigDecimal has predefined constants ZERO, ONE, and TEN. */
        BigDecimal result = BigDecimal.ONE;
        for (int i = 1; i <= N; i++) {
            /* Using the non-string constructor is fine here, because ints are always exact. */
            BigDecimal bigI = new BigDecimal(i);
            result = result.multiply(bigI);
        }
        String resultString = result.toString();
        System.out.println("Result: " + resultString);
        System.out.println("Digits: " + resultString.length());
    }
}
When you run the program with N = 10,000, it calculates the 35,660-digit result in no time. Calculating 1,000,000! takes some minutes until it displays the result consisting of more than 5.5 million digits.
I mentioned earlier that there is no performance loss when using BigDecimal instead of BigInteger. You can try it out yourself by simply swapping the two classes.
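For such a comparison, a BigInteger variant of the same loop could look like the sketch below. The class name and the timing code are my own additions; a single System.nanoTime() measurement is only a rough indication and no substitute for a proper benchmark.

```java
import java.math.BigInteger;

final class BigIntegerFactorialSketch {

    /** Computes n! with BigInteger; mirrors the BigDecimal loop shown earlier. */
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        BigInteger result = factorial(10_000);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Digits: " + result.toString().length());
        System.out.println("Elapsed (ms, rough): " + elapsedMillis);
    }
}
```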
Summary
Whenever you use decimal values that need to remain exact (which is probably the case most of the time when dealing with currencies), use BigDecimal. Using it is simpler than one would expect at first. Whether you stick to its basic functionality or delve into its scientific and mathematical details, BigDecimal will probably fit the whole range of your needs.
Shortlink to this blog post: link.simplexacode.ch/ewvs2019.01