Spotify supports unicode usernames, which we are a bit proud of (not many services allow you to have ☃, the unicode snowman, as a username). However, it has also been a reliable source of pain over the years. This is the story of one time when it bit us pretty badly and how we spent Easter dealing with it.
Good Friday gone bad
Some years ago, late on Good Friday, a user posted on the Spotify support forum that he and a friend could hijack user accounts. Our forum manager challenged the user to take over his account, and within minutes the manager’s account had a new playlist added and a new password.
Pwning an account
A bunch of us dropped whatever we were working on and scurried to try to understand what was going wrong and how to fix it. From the forum post we knew that taking over an account went something like this:
- Find a user account to hijack. For the sake of this example let us hijack the account belonging to user bigbird.
- Create a new Spotify account with username ᴮᴵᴳᴮᴵᴿᴰ (in python this is the string u'\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30').
- Send a request for a password reset for your new account.
- A password reset link is sent to the email you registered for your new account. Use it to change the password.
- Now, instead of logging in to the account with username ᴮᴵᴳᴮᴵᴿᴰ, try logging in to the account with username bigbird, using the new password.
- Success! Mission accomplished.
From the log lines associated with the hijacking of the forum manager’s account it appeared to be a problem with how we derived a canonical username from the username the user chooses at registration, but we were still pretty much in the dark. We had no option except to disable account creation until we could prevent the attack.
What the heck was going on?
Forbidden and equivalent characters in usernames
If you allow your users to pick their usernames too freely they may accidentally shoot themselves (or you) in the foot. For instance, it is probably good to
- not allow white space in usernames,
- treat “BigBird” and “bigbird” as the same username.
The first is an example of forbidding certain characters in usernames and the second is to treat some characters (‘B’ and ‘b’) as equivalent. The latter is often implemented by canonicalizing the username. If we only allow the letters a-z and A-Z then we could canonicalize a username by mapping all characters to lower case:
canonical_username = username.lower() # in python
So ‘BigBird’, ‘Bigbird’ and ‘bigbird’ would all be mapped to ‘bigbird’. We refer to ‘BigBird’ as the verbatim username and the remapped ‘bigbird’ as the canonical username. When an account is created the canonical username needs to be unused, so if one user enters ‘BigBird’ and another enters ‘bigbird’, only one of them will be allowed to create the account.
Lower casing has the key property of being idempotent, i.e., that applying it more than once has no effect: x.lower() == x.lower().lower(). So if a username gets passed from service to service and you want to make sure it is in canonical form you can safely apply .lower() and if it was already in canonical form there is no harm done, and it is easy to stay safe.
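To make the idempotence property concrete, here is a minimal sketch in Python 3 (the snippets in this post are Python 2). The helper name is_idempotent_on and the sample names are made up for illustration, and checking a few samples is of course a spot check, not a proof.

```python
def is_idempotent_on(func, samples):
    """Return True if applying func twice equals applying it once, for every sample."""
    return all(func(func(s)) == func(s) for s in samples)


# Hypothetical sample usernames, restricted to a-z and A-Z as in the text.
samples = ["BigBird", "Bigbird", "bigbird"]

print(is_idempotent_on(str.lower, samples))   # True
print({s.lower() for s in samples})           # {'bigbird'}
```

The same helper can be pointed at any candidate canonicalization function, which is a cheap test to have around before trusting that function in more than one service.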
When Ω is not the same as Ω
If you allow non-ASCII characters this becomes even more important, since lots of different characters look very similar. For example, it is hard to see the difference between Ω and Ω, even though one is a Greek letter (U+03A9) and the other is the unit sign for electrical resistance (U+2126), and in unicode they indeed have different code points. Treating two such similar-looking characters as different in usernames is likely to cause problems and confusion, so we distinguish between verbatim usernames and canonical usernames: while the Omega and Ohm characters are different when used in verbatim usernames, they are mapped to the same character in canonical usernames. Simple lower casing is obviously not enough for this.
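The Omega/Ohm case can be checked with the unicodedata module from the standard library. This is a small Python 3 sketch that uses plain NFKC normalization to show the two characters collapsing into one; it is not the nodeprep procedure described in the next section.

```python
import unicodedata

omega = "\u03a9"  # GREEK CAPITAL LETTER OMEGA
ohm = "\u2126"    # OHM SIGN

# Different code points, so the strings compare unequal.
print(omega == ohm)                                  # False
print(unicodedata.name(omega))                       # GREEK CAPITAL LETTER OMEGA
print(unicodedata.name(ohm))                         # OHM SIGN

# After normalization the ohm sign becomes the Greek letter.
print(unicodedata.normalize("NFKC", ohm) == omega)   # True
```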
XMPP’s nodeprep canonicalization method
Fortunately there was no need to roll our own canonicalization. The problem was already solved in XMPP, and the method was implemented in the python framework twisted which we used for lots of backend services at the time. The code we used was more or less:
from twisted.words.protocols.jabber.xmpp_stringprep import nodeprep

def canonical_username(name):
    return nodeprep.prepare(name)
It sounds like this should work, so again, what the heck was going on?
Tracking down the cause
It was easy to test one of the usernames used in the proof of concept. Let us see what happens when we tried ᴮᴵᴳᴮᴵᴿᴰ.
>>> canonical_username(u'\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30')
u'BIGBIRD'
>>> canonical_username(canonical_username(u'\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30'))
u'bigbird'
Not so good: the function was apparently not idempotent, but at least this explained why the attack worked. When you registered an account, canonical_username was applied once, and an account with canonical username ‘BIGBIRD’ was registered. This was allowed since it did not collide with the existing account with canonical username ‘bigbird’. When a password reset was requested for ‘ᴮᴵᴳᴮᴵᴿᴰ’, canonical_username was again applied once, so the reset email went to the address associated with the newly created ‘BIGBIRD’ account. However, when the link was used, canonical_username was applied once more, yielding ‘bigbird’, so the new password was set for the ‘bigbird’ account instead. We were relying on nodeprep.prepare being idempotent, and it wasn’t.
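The flow can be replayed in a few lines. The following is a self-contained Python 3 sketch under stated assumptions: toy_canonical is a made-up stand-in that happens to be non-idempotent in the same way (it is not nodeprep), and the in-memory accounts dict, the function names, the email addresses and the idea that the reset link carries the canonical name are all hypothetical simplifications, not Spotify's actual services.

```python
import unicodedata


def toy_canonical(name):
    """Toy stand-in for a non-idempotent canonicalization (NOT nodeprep).

    Step 1 lower-cases using a table that only knows ASCII A-Z;
    step 2 applies NFKC, which can introduce new upper-case letters.
    """
    folded = "".join(c.lower() if "A" <= c <= "Z" else c for c in name)
    return unicodedata.normalize("NFKC", folded)


# Hypothetical account store: canonical username -> account record.
accounts = {"bigbird": {"email": "victim@example.com", "password": "original"}}


def register(verbatim, email, password):
    key = toy_canonical(verbatim)        # canonicalized once
    if key in accounts:
        raise ValueError("username taken")
    accounts[key] = {"email": email, "password": password}


def request_reset(verbatim):
    key = toy_canonical(verbatim)        # canonicalized once
    return key, accounts[key]["email"]   # assumed: the link carries the canonical name


def use_reset_link(name_in_link, new_password):
    key = toy_canonical(name_in_link)    # canonicalized a second time
    accounts[key]["password"] = new_password
    return key


attacker_name = "\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30"  # the superscript BIGBIRD
register(attacker_name, "attacker@example.com", "x")
name_in_link, mailed_to = request_reset(attacker_name)
changed = use_reset_link(name_in_link, "hijacked")

print(name_in_link, mailed_to)                   # BIGBIRD attacker@example.com
print(changed, accounts["bigbird"]["password"])  # bigbird hijacked
```

The registration succeeds because ‘BIGBIRD’ is not yet a key, the reset mail goes to the attacker, and the second canonicalization lands the new password on the victim's record.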
Duct taping the security hole
At this point, a few hours into the incident, we reopened registration, but with a restriction on the usernames you could register. You were only allowed to register username X if X == canonical_username(X).
def safe_name(X):
    return X == canonical_username(X)
If the new username was already a fixpoint, it should be safe. Still, we wanted to find out what had gone wrong. Could the method for computing canonical usernames based on nodeprep.prepare() be salvaged? If not, we would be in trouble: canonical usernames are stored in various databases, so changing how they are derived in a non-backwards-compatible way would be quite costly.
Why did nodeprep.prepare() fail to be idempotent?
First we looked at the source code for the twisted module but as it was closely based on http://tools.ietf.org/html/draft-ietf-xmpp-nodeprep-03 we looked at that as well. The draft describes a relatively complicated transformation of unicode strings to get canonical representations. The draft explains that you may need to iterate the transformation until you reach a fixpoint, but for the convenience of implementors the draft includes tables for how to remap unicode code points and the tables let you look up the fixpoints rather than iterating the mapping.
However, at the very beginning of the draft it says
o The character repertoire that is the input and output to stringprep: Unicode 3.2, specified in section 2
Reading on, the draft does specify that you should check that the output you get is admissible, but it never tells you to check that the input is unicode 3.2. The draft does not stress checking the input, nodeprep.prepare did not check the input, and neither did we. It turns out that the code points making up ᴮᴵᴳᴮᴵᴿᴰ are not part of unicode 3.2.
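For what it is worth, in present-day Python the input check is less tedious than it was for us: the standard library's stringprep module exposes the RFC 3454 tables, and table A.1 lists the code points that are unassigned in unicode 3.2. The sketch below is Python 3, and the function name is_unicode_3_2 is made up for illustration; it is not the check we deployed.

```python
import stringprep


def is_unicode_3_2(name):
    """Return True if every code point in name is assigned in unicode 3.2."""
    # stringprep.in_table_a1(c) is True for code points unassigned in unicode 3.2.
    return not any(stringprep.in_table_a1(c) for c in name)


print(is_unicode_3_2("bigbird"))   # True
print(is_unicode_3_2("\u2603"))    # True (the snowman)
print(is_unicode_3_2("\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30"))  # False
```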
So that was what the heck was going on.
The final fix
We reported the problem to the twisted developers, but we couldn’t wait for a patch so we needed a safe fix that we could apply ourselves. Actually checking that a username only contains unicode 3.2 code points is a bit tedious, and the actual problem was that nodeprep.prepare was not idempotent (albeit outside unicode 3.2). So the fix instead addressed the problem that we don’t want usernames where nodeprep.prepare is not idempotent. We wrote a small wrapper function around nodeprep.prepare that basically calls the old prepare function twice and rejects a name if old_prepare(old_prepare(name)) != old_prepare(name).
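The shape of such a wrapper is roughly as follows. This is a Python 3 sketch, not our production code: make_strict_prepare and NotIdempotentError are names invented here, and toy_prepare is a deliberately non-idempotent stand-in so that the example runs without twisted installed.

```python
import unicodedata


class NotIdempotentError(ValueError):
    """Raised when preparing a name twice differs from preparing it once."""


def make_strict_prepare(old_prepare):
    """Wrap old_prepare so that names it does not treat idempotently are rejected."""
    def strict_prepare(name):
        once = old_prepare(name)
        if old_prepare(once) != once:
            raise NotIdempotentError(repr(name))
        return once
    return strict_prepare


def toy_prepare(name):
    # Stand-in for the old prepare function (NOT nodeprep): ASCII-only lower-casing, then NFKC.
    folded = "".join(c.lower() if "A" <= c <= "Z" else c for c in name)
    return unicodedata.normalize("NFKC", folded)


prepare = make_strict_prepare(toy_prepare)
print(prepare("BigBird"))   # bigbird

try:
    prepare("\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30")
except NotIdempotentError:
    print("rejected")       # rejected
```

Rejecting instead of silently iterating to a fixpoint keeps every canonical username that is already stored valid, which matters when those names are keys in several databases.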
What then remained was some cleanup: identifying the handful of compromised accounts, which due to the nature of the bug was actually easy. We just needed to find the accounts with incorrect canonical usernames, and from them we could find the corresponding hijacked accounts.
And that is the end of our story, or so I thought…
The final twist
When writing this blog post I checked back with the twisted community since it involves an issue in their code base which has security implications, and I found out two things. First, the issue is fixed as of twisted version 11.0.0, and second the bug was not actually there from the start. It came into being when upgrading from python 2.4 to python 2.5.
Twisted’s code imports the module unicodedata in the standard python library. This module changed between python 2.4 and python 2.5. The python 2.4 version causes the twisted code to (correctly) throw an exception if the input is outside unicode 3.2, whereas no exception is thrown when using unicodedata from python 2.5, instead causing incorrect behavior in twisted’s implementation of nodeprep.prepare().
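How much the version of the unicode database matters can be seen directly, since unicodedata also ships a frozen 3.2 database as unicodedata.ucd_3_2_0. The Python 3 sketch below only compares the two databases for one of the offending code points; it does not reproduce the exception behavior of twisted under python 2.4.

```python
import unicodedata

ch = "\u1d2e"                    # MODIFIER LETTER CAPITAL B
old = unicodedata.ucd_3_2_0      # frozen unicode 3.2 database

print(old.unidata_version)                  # 3.2.0
print(old.category(ch))                     # Cn (unassigned in unicode 3.2)
print(unicodedata.category(ch))             # Lm (a modifier letter in the current database)

# The old database leaves the character alone, the current one maps it to 'B'.
print(old.normalize("NFKC", ch) == ch)      # True
print(unicodedata.normalize("NFKC", ch))    # B
```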
So changes in the standard python library from one python version to the next introduced a subtle bug in twisted’s nodeprep.prepare() function, which in turn introduced a security issue in Spotify’s account creation.
- This stresses the importance of validating user input. In this case we had to peel back quite a few layers to find out what the requirements on the input actually were.
- This was not the first or last time that fancy characters in usernames caused us pain, and I’m confident that it will keep biting us from time to time. However in a global market limiting the alphabet to ASCII is not an attractive option, so if you do decide to bite the bullet and support international characters, be aware that there are plenty of pitfalls and gotchas. Programming language and library support for unicode isn’t always as mature as one might hope.
- When users expose vulnerabilities, avoid antagonizing them if possible. They can probably provide valuable help on how to reproduce and perhaps even how to fix the issue. In this case the two users who posted to the forum were actually rewarded with some Spotify premium months.
- Normally, upgrading is a good way to get rid of bugs and security holes, but every once in a while an upgrade packs a wallop.
And finally, the account bigbird was not among the attacked accounts. I just picked that as an example name.