Oh IPv6, Where Art Thou

Illustrations by Jonas Ekman

Since the dawn of time, man has used 32-bit addressing. When the first Homo Erectus crawled out of the sea 6000 years ago, IPv4 infrastructure was already installed and the savannah was teeming with spam, flames and lewd ASCII art.

Back in 1992, the Chief Architect, a man named Greg Internet, started worrying about IPv4-addresses running out. There were only about 4 billion addresses in the IPv4 address space and somehow most of them had ended up with universities and with Hewlett-Packard. Development was started on a successor to IPv4 – to be called IPv6 for occult reasons – and Greg alerted the world to the upcoming addresspcalypse… addropalypse… address disaster.

In 1996 the IPv6 specification was completed. Not much else happened.

Greg tried to get the world’s attention again and again, year after year. {Un,}fortunately Network Address Translation was a thing so a few addresses could go a long way.

From 2010 onwards the last available IPv4-address block has been given out several times yearly, to widespread apathy. In 2012 Greg Internet was lost in a tragic beard-grooming accident.

Now it is 2015. IPv6 is finally happening. I know, I know, you’ve been hearing this for the last 15 years and your scepticism is warranted, but this time is also different.

Now mobile phone carriers have started switching their networks to IPv6. This means that end-users are being converted in large chunks. IPv6 adoption is highest in Belgium with 36% but the US is not far behind with 21%. Completely impartial and objective projections made by IPv6 evangelists put adoption in the US at 50% at the end of 2016. Whether you believe that or not, the amount of IPv6 traffic hitting the Google front page has doubled yearly for the last couple of years. It was 3.5% at the beginning of 2014, 6% at the beginning of 2015 and is around 8-9% at the time of writing.

One way of measuring IPv6 adoption for a certain network or operator is to look at the connections that originate in that network and hit major dual-stack sites, and to compare the number of IPv6 connections with the number of IPv4 connections. In other words: what percentage of the connections from a particular network hitting Google, Facebook, Akamai and a few other sites are IPv6? For Verizon Wireless networks, for instance, the answer is 72%. This metric is being tracked by the World IPv6 Launch and is what “IPv6 adoption” will refer to in this text.
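
As a rough illustration of the "in other words" reading of that metric, here is a minimal Python sketch. The function name and the connection counts are made up for the example; the real measurement pipeline is not described here.

```python
def ipv6_share(ipv6_connections, ipv4_connections):
    """Return the percentage of all observed connections that used IPv6."""
    total = ipv6_connections + ipv4_connections
    if total == 0:
        raise ValueError("no connections observed")
    return 100.0 * ipv6_connections / total


# Hypothetical counts for one network, as seen by a dual-stack site.
print(ipv6_share(72, 28))
```

Note that this is the IPv6 share of all connections, not the raw IPv6-to-IPv4 ratio; the two only agree when the numbers are small.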

If you want to deploy an IPv6 network you have two problems. One is how to deal with services on the Internet that only support IPv4. These are legion and some sort of IPv6-IPv4 gateway will be needed for clients in your network. Such a gateway is commonly known as NAT64, which does what you would expect: it translates outbound IPv6 to IPv4 and inbound IPv4 to IPv6. Since DNS lookups can return IPv4 addresses and our network pretends that all the world is IPv6, NAT64 needs to be combined with DNS64 – a DNS proxy that translates IPv4 lookup results into IPv6.
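
To make the DNS64 half a bit more concrete, here is a small Python sketch of the address mapping. It assumes the well-known NAT64 prefix 64:ff9b::/96 from RFC 6052; an operator may well use a different prefix, and the function names are invented for this example.

```python
import ipaddress

# Well-known NAT64 prefix from RFC 6052 (assumption: the operator uses it).
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize_aaaa(ipv4_text, prefix=WELL_KNOWN_PREFIX):
    """DNS64 direction: embed an IPv4 address in the low 32 bits of the prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    ipv4 = ipaddress.IPv4Address(ipv4_text)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(ipv4))


def extract_ipv4(ipv6_address):
    """NAT64 direction: recover the IPv4 destination from the low 32 bits."""
    return ipaddress.IPv4Address(int(ipv6_address) & 0xFFFFFFFF)


synthetic = synthesize_aaaa("192.0.2.33")
print(synthetic, "->", extract_ipv4(synthetic))
```

The client only ever sees the synthetic IPv6 address; the gateway at the edge of the network strips the prefix and talks IPv4 to the real service.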

The other problem is how to deal with client software that only works with IPv4. These are a cornucopia, and also other synonyms of “lots” that I looked up on the Internet. IPv4-only applications require an IPv4-IPv6 translator on their host in order to work in an IPv6-only network.

Because networking cannot possibly work without lots of convoluted acronyms, the translator at the edge of the IPv6 network is called a Provider-side Translator, or PLAT. The host-side translator is called a Customer-side Translator – CLAT. If these acronyms seem odd it is because a “translator” is called XLAT (which is odd in turn) so CLAT/PLAT are recursive acronyms because networking. The combination of PLAT/CLAT is called 464XLAT. I’m just going to leave that there.

Spotify is not known for working with IPv6. In fact, Spotify is known for not working with IPv6. Three major applications pop up again and again as examples of “bad apples” in presentations of the struggles to deploy IPv6: WhatsApp, Skype and Spotify.

iTunes, Facebook, YouTube, Pandora and Netflix all support IPv6. Spotify don’t. Except now we do, a bit.

In the beginning of 2015 we started getting bug reports about connectivity problems with the Spotify Android client. After a while it became apparent that these bug reports had one thing in common – the affected users were all customers of T-Mobile, a big mobile phone operator in the United States.

T-Mobile have adopted IPv6 in a major way. All Android phones running an OS more recent than 4.2 default to IPv6 in their network and their IPv6 adoption has gone from 0 to 50% in two years. T-Mobile helped define the 464XLAT RFC (RFC6877) because “It is not acceptable to break Spotify”, so in some small way perhaps we helped.

The connectivity problem affecting our users turned out to be caused by the CLAT, the IPv4-IPv6 translation software on the phone. Sometimes the CLAT would crash, causing IPv4-only applications on the phone to lose connectivity. To the user this looked like Spotify stopped working while most other internet services kept working – clearly a Spotify bug.

Fixing the CLAT was not really in our power; for one thing, there’s no way we could roll a fix out to the users. Fortunately there was a workaround: if the Spotify client worked with IPv6 then the CLAT would not be required at all and we would be unaffected if the CLAT crashed.

Enabling IPv6 for a client application is not technically a big deal:

  • Whenever a socket is created you need to know if the socket should be an IPv4 or IPv6 socket.
  • Whenever a numeric or binary IP address is handled (as opposed to a host name), you need to ensure that both IPv6 and IPv4 address formats are supported.
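
The two points above can be sketched in a few lines of Python. This is only an illustration, not the client's actual code: the helper names are hypothetical, and the standard `ipaddress` module stands in for whatever address type an application uses.

```python
import ipaddress
import socket


def family_for(address_text):
    """Return the socket family that matches a numeric IPv4 or IPv6 address."""
    address = ipaddress.ip_address(address_text)
    return socket.AF_INET6 if address.version == 6 else socket.AF_INET


def packed_length(address_text):
    """Return the size of the binary form: 4 bytes for IPv4, 16 for IPv6."""
    return len(ipaddress.ip_address(address_text).packed)


def open_tcp_socket(address_text):
    """Create a TCP socket of the right family for the given numeric address."""
    return socket.socket(family_for(address_text), socket.SOCK_STREAM)


for text in ("192.0.2.1", "2001:db8::1"):
    print(text, family_for(text).name, packed_length(text))
```

Any code that stores the binary form in a fixed 4-byte field will silently truncate the second address, which is the kind of assumption that has to be hunted down.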

In properly written networking software numeric addresses are rare. Hostnames are normally used, and socket calls that accept a resolved address do so using a wrapper type (struct sockaddr) that can contain either an IPv4 or IPv6 address without the caller having to know which one it is.

Unfortunately our client code had its own IP-address type which didn’t support IPv6, and it also assumed that an IP-address would fit in 4 bytes in numerous places. Extending the IP-address type to support IPv6 and fixing the address size assumptions touched a lot of code but was relatively uncontroversial.

There existed a set of patches to the client that did most of the work already, but those patches had never been merged. The patches had bit-rotted and so a lot of the work consisted of rebasing and updating the old changes.

One thing that was debated extensively was host name resolution strategies. In the old code a host name was resolved to a set of IP addresses and one of those addresses was picked at random. With IPv6 the host name lookup could return a mixed set of IPv4 and IPv6 addresses. That would indicate that the host was reachable both over IPv4 and over IPv6. But picking a result at random would cause the Android client to use the unstable CLAT whenever an IPv4 address was picked – an undesirable outcome.

It turns out that the host name resolver (getaddrinfo()) in modern operating systems implements RFC3484 (Default Address Selection for IPv6), which means that it orders its results according to how likely they are to work. The solution we ended up with was to remove the randomisation and rely on the order established by getaddrinfo().
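
A minimal Python sketch of that strategy follows. Python's `socket.getaddrinfo()` wraps the platform resolver, so the ordering comes from the operating system; the function names and the timeout are assumptions made for this example, not taken from the client.

```python
import socket


def resolve_in_order(host, port):
    """Return (family, sockaddr) pairs in the order getaddrinfo() supplied them."""
    results = socket.getaddrinfo(
        host, port, family=socket.AF_UNSPEC, type=socket.SOCK_STREAM
    )
    # No shuffling and no sorting: the resolver's order is kept as-is.
    return [(family, sockaddr) for family, _type, _proto, _canon, sockaddr in results]


def connect_first_working(host, port, timeout=5.0):
    """Try each resolved address in resolver order and return the first socket that connects."""
    last_error = None
    for family, sockaddr in resolve_in_order(host, port):
        sock = None
        try:
            sock = socket.socket(family, socket.SOCK_STREAM)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as error:
            last_error = error
            if sock is not None:
                sock.close()
    raise last_error or OSError("no addresses returned for host")
```

Asking for `AF_UNSPEC` explicitly requests both address families, and the same loop works whether the first result is IPv6 or IPv4.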

To avoid getting stuck on a bad IP a simple blacklist was also implemented: whenever a connection to the backend fails the failing IP address is excluded from further retries until all alternate addresses have been tried.
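
The blacklist behaviour described above could look roughly like this in Python. `AddressPicker` and its methods are hypothetical names for the sketch; the real client is structured differently.

```python
class AddressPicker:
    """Hand out addresses in resolver order, skipping ones that recently failed."""

    def __init__(self, addresses):
        self._addresses = list(addresses)  # keep the resolver's order
        self._failed = set()

    def mark_failed(self, address):
        """Exclude an address from further retries for now."""
        self._failed.add(address)

    def next_address(self):
        """Return the first address that has not failed yet."""
        if not self._addresses:
            raise LookupError("no addresses to choose from")
        for address in self._addresses:
            if address not in self._failed:
                return address
        # Every alternative has been tried: clear the blacklist and start over.
        self._failed.clear()
        return self._addresses[0]


picker = AddressPicker(["2001:db8::1", "192.0.2.1"])
first = picker.next_address()
picker.mark_failed(first)
print(first, "failed, next is", picker.next_address())
```

Clearing the set once everything has failed means a temporarily broken address gets another chance instead of being banned forever.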

IPv6 support was enabled in the Spotify Android client in version 2.9.0. Today about 4% of all backend connections made from an Android device originate in an IPv6 network.

So what about our other clients? Up until now there has been no compelling reason to enable IPv6 support other than for the good of mankind.

However, Apple is now talking about making IPv6 compatibility a requirement for apps in the App Store. Therefore the iOS client has just enabled IPv6 support, which is gradually rolling out starting in version 4.3.0.

Our partner Facebook is also pushing IPv6 and they are implementing an IPv6-only network in their offices. The desktop client does not work in an IPv6-only network so the Facebook engineers are sad because they can no longer rock. Sad Facebook is lifting the IPv6 priority for the desktop client as well.

If we get all Spotify clients to support IPv6 then we have solved a big part of the problem Spotify causes ISPs and mobile phone operators. They would no longer have to implement CLAT solutions in phones and on customer computers on account of Spotify. A PLAT, an IPv6-IPv4 translator at the border of their network, would still be required. But a PLAT is much less problematic because it doesn’t affect client machines and because it will be required anyway for the foreseeable future since so many web sites are still not reachable via IPv6.

Ideally we should support IPv6 end-to-end though. That means enabling IPv6 in the client-facing backend services. That story is yet to be told…

What did we learn from IPv6ifying the Android client?

While the actual changes touched a lot of code, deployment has been remarkably problem free. We expected a lot more problems with IPv6 networks lacking IPv4-connectivity. So far we’ve discovered no connectivity problems caused by IPv6, knock on wood.

Networking APIs, particularly in the area of host name lookups, are notoriously finicky and display subtly different behaviour on different platforms. We did find a few bugs caused by these incompatibilities:

  • getaddrinfo() on Android does not return IPv6 results when the ‘hints’ parameter is NULL, although POSIX requires it to. A hints parameter with the field ai_family set to PF_UNSPEC is required to get both IPv4 and IPv6 results.
  • boost::asio::ip::address::to_string() on OSX formats the string representation of IPv6 addresses differently than Windows and Linux/Android.
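
One way to sidestep the second kind of bug is to compare addresses by value rather than by their text form. The Python sketch below shows the idea; the spellings used are just examples of equivalent text forms and are not the exact differences we saw between platforms, and the helper names are made up.

```python
import ipaddress


def same_address(left, right):
    """Compare two textual IP addresses by value instead of by spelling."""
    return ipaddress.ip_address(left) == ipaddress.ip_address(right)


def canonical_text(address_text):
    """Return one consistent spelling, suitable for logs or map keys."""
    return str(ipaddress.ip_address(address_text))


print(same_address("2001:DB8:0:0:0:0:0:1", "2001:db8::1"))
print(canonical_text("2001:0db8:0000:0000:0000:0000:0000:0001"))
```

If the string form is used as a key anywhere, for instance in a cache or a blacklist, normalising it in one place keeps the platforms in agreement.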

Since we implemented a testing network for IPv6 and the client changes were tested on all client platforms before being merged, we found these issues before they could be found by our users.

One observation is that a substantial amount of time was spent after the IPv6 changes had been deployed proving that IPv6 was not the cause of sundry network issues. Whenever you change things in the client network layer, be prepared to own all client network glitches until further notice.

Creative Commons License
The pictures by Jonas Ekman are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
