All of us are familiar with overflow bugs. However, sometimes you write code that counts on overflow. This is a story where overflow was supposed to happen but didn’t, hence the name *underflow bug*.

## Round-robin

In our Java implementation of the round-robin algorithm, we store the number of connections in variable `size`

and then we call `index() % size`

to get the index of the chosen connection. The value of `index()`

follows the sequence 0, 1, 2, 3, … For example with `size = 3`

, we get `index() % size`

equal to 0, 1, 2, 0, 1, 2, 0, …, which is exactly how round-robin works.

How is `index()`

implemented? About a month ago, it looked the following way.

int index() { int index = this.index.get(); if (index < 0) { index -= Integer.MAX_VALUE; } return index; }

where `this.index`

is AtomicInteger (we can’t just use an `int`

, because the code is run in parallel). It’s value is increased later, after a sendable connection has been found.

The function `index()`

tries to convert negative integers into positive. Before we dive into why it doesn’t work, let’s look at the representation of positive and negative integers.

# How are positive and negative integers represented in memory?

Negative integers are usually explained using two’s complement but I remember this explanation confused me a lot in high school. I prefer a different way of looking at negative integers.

Let’s work with 8-bit integers for simplicity. With unsigned integers, we can get values ranging from 0 to 255 (2^{8} − 1). For signed integers, the range is instead from −128 (−2^{7}) to 127 (2^{7} − 1). Numbers 0 to 127 have the same signed and unsigned representation.

How much is 120 + 47 in both unsigned and signed representations? Unsigned is easy, 120 + 47 = 167. But signed is easy too! We just need a number that is negative and congruent with 167 modulo 256 (it has the same remainder after division by 256). We can therefore take 167 − 256 = −89.

You can check that 167 and −89 are congruent modulo 256 by firing up python:

```
$ python
>>> -89 % 256
167
```

Now suppose I ask you how much is 167 − 47. You immediately answer 120. But how much is −89 − 47 in 8-bit integers? −89 − 47 = −136 is outside of the range of 8-bit signed integers, so we need to add 256, resulting in −136 + 256 = 120.

You can also check this with python:

```
$ python
>>> (-89 - 47) % 256
120
```

I find this easier to comprehend and calculate than doing bit operations by hand.

## What does the above `index()`

code do when `this.index.get() = -1`

?

Now that we understand negative integers better, what does happen above when `this.index.get() = -1`

? Since `Integer.MAX_VALUE = 2^31-1`

, the function returns −1 − (2^{31} − 1) = −2^{31} which is congruent with 2^{31}. However, 2^{31} is just outside the range of positive 32-bit integers, so instead we get 2^{31} − 2^{31} = −2^{31}, or `Integer.MIN_VALUE`

if you like.

When `size = 3`

, we get `Integer.MIN_VALUE % size = -2`

in Java. The number is then used to access an element of an array, which causes the program to crash.

As an exercise, you can prove that −1 is the only number for which `index()`

returns a negative number. This is easier done using congruencies than reasoning about bit operations.

## Off-by-one error

The bug is that `this.index.get() = -1`

should return `Integer.MAX_VALUE`

instead of `Integer.MIN_VALUE`

. Interestingly, the difference between these two numbers is exactly 1! Try printing `Integer.MAX_VALUE + 1`

and `Integer.MIN_VALUE - 1`

yourself.

The initial code was actually counting on overflow but that doesn’t happen in this case, since `-1 - Integer.MAX_VALUE`

still fits into the range of 32-bit signed integers.

We can fix the code by changing only two bytes, MAX to MIN (in ASCII it’s actually only a 4-bit change).

int index() { int index = this.index.get(); if (index < 0) { index -= Integer.MIN_VALUE; } return index; }

Or if you are not scared of bit operations, there is a one-byte fix.

index &= Integer.MAX_VALUE;

This works, because `Integer.MAX_VALUE`

has the lowest 31 bits set and doing `&= Integer.MAX_VALUE`

effectively resets the highest bit that carries the sign.

Now we don’t need the `if`

clause anymore.

int index() { return this.index.get() & Integer.MAX_VALUE; }

## Fun facts

The bug is triggered after about 4 billion (2^{32} − 1) requests are sent using the round-robin algorithm, which is equivalent to sending 136.2 requests per second for one year. The bug went unnoticed for 710 days and the code was probably executed 100 trillion (10^{14}) times before it crashed. This is understandable, since the bug needs very special circumstances to happen.