Unnatural Keys

Nature doesn’t come with identifiers.

Matt Schellhas
Better Programming

--

DALL-E — “A photograph of meat in the shape of a key on a solid background”

At time of writing, I am working in the music industry. And as part of that work, we want a database of all of the songs in the world so that we can properly identify unknown songs and provide attribution so that folks can get paid appropriately. It is a noble goal with some interesting engineering challenges.

There’s also some… less interesting engineering challenges.

One is a bit self-inflicted. The first instinct for every DB person when faced with the “database of all the songs in the world” problem is to go with a natural key. They think: “there’s a bunch of IDs we have to store anyways that the business cares about. That’s the definition of a natural key! Let’s just use them”. After all, there are a lot of songs in the world — slightly more than 100 million, depending on who you ask and what they consider to be a song. Adding our own surrogate means a few hundred megabytes of overhead, excluding indexes on the other IDs that the business cares about.

There’s even industry standards that should take care of this for us. ISRC is literally the ISO standard (ISO 3901) for “uniquely identifying sound recordings”. And if you’ve worked in software for any length of time, you know that it does not.

For example:

  • Not all sound

--

--