Underflow bug

All of us are familiar with overflow bugs. However, sometimes you write code that counts on overflow. This is a story where overflow was supposed to happen but didn’t, hence the name underflow bug.

Round-robin

In our Java implementation of the round-robin algorithm, we store the number of connections in variable size and then we call index() % size to get the index of the chosen connection. The value of index() follows the sequence 0, 1, 2, 3, … For example with size = 3, we get index() % size equal to 0, 1, 2, 0, 1, 2, 0, …, which is exactly how round-robin works.

How is index() implemented? About a month ago, it looked the following way.

int index() {
  int index = this.index.get();
  if (index < 0) {
    index -= Integer.MAX_VALUE;
  }
  return index;
}

where this.index is AtomicInteger (we can’t just use an int, because the code is run in parallel). It’s value is increased later, after a sendable connection has been found.

The function index() tries to convert negative integers into positive. Before we dive into why it doesn’t work, let’s look at the representation of positive and negative integers.

How are positive and negative integers represented in memory?

Negative integers are usually explained using two’s complement but I remember this explanation confused me a lot in high school. I prefer a different way of looking at negative integers.

Let’s work with 8-bit integers for simplicity. With unsigned integers, we can get values ranging from 0 to 255 (28 − 1). For signed integers, the range is instead from −128 (−27) to 127 (27 − 1). Numbers 0 to 127 have the same signed and unsigned representation.

How much is 120 + 47 in both unsigned and signed representations? Unsigned is easy, 120 + 47 = 167. But signed is easy too! We just need a number that is negative and congruent with 167 modulo 256 (it has the same remainder after division by 256). We can therefore take 167 − 256 = −89.

You can check that 167 and −89 are congruent modulo 256 by firing up python:

$ python
>>> -89 % 256
167

Now suppose I ask you how much is 167 − 47. You immediately answer 120. But how much is −89 − 47 in 8-bit integers? −89 − 47 = −136 is outside of the range of 8-bit signed integers, so we need to add 256, resulting in −136 + 256 = 120.

You can also check this with python:

$ python
>>> (-89 - 47) % 256
120

I find this easier to comprehend and calculate than doing bit operations by hand.

What does the above index() code do when this.index.get() = -1?

Now that we understand negative integers better, what does happen above when this.index.get() = -1? Since Integer.MAX_VALUE = 2^31-1, the function returns −1 − (231 − 1) = −231 which is congruent with 231. However, 231 is just outside the range of positive 32-bit integers, so instead we get 231 − 231 = −231, or Integer.MIN_VALUE if you like.

When size = 3, we get Integer.MIN_VALUE % size = -2 in Java. The number is then used to access an element of an array, which causes the program to crash.

As an exercise, you can prove that −1 is the only number for which index() returns a negative number. This is easier done using congruencies than reasoning about bit operations.

Off-by-one error

The bug is that this.index.get() = -1 should return Integer.MAX_VALUE instead of Integer.MIN_VALUE. Interestingly, the difference between these two numbers is exactly 1! Try printing Integer.MAX_VALUE + 1 and Integer.MIN_VALUE - 1 yourself.

The initial code was actually counting on overflow but that doesn’t happen in this case, since -1 - Integer.MAX_VALUE still fits into the range of 32-bit signed integers.

We can fix the code by changing only two bytes, MAX to MIN (in ASCII it’s actually only a 4-bit change).

int index() {
  int index = this.index.get();
  if (index < 0) {
    index -= Integer.MIN_VALUE;
  }
  return index;
}

Or if you are not scared of bit operations, there is a one-byte fix.

    index &= Integer.MAX_VALUE;

This works, because Integer.MAX_VALUE has the lowest 31 bits set and doing &= Integer.MAX_VALUE effectively resets the highest bit that carries the sign.

Now we don’t need the if clause anymore.

int index() {
  return this.index.get() & Integer.MAX_VALUE;
}

Fun facts

The bug is triggered after about 4 billion (232 − 1) requests are sent using the round-robin algorithm, which is equivalent to sending 136.2 requests per second for one year. The bug went unnoticed for 710 days and the code was probably executed 100 trillion (1014) times before it crashed. This is understandable, since the bug needs very special circumstances to happen.