Wednesday, January 23, 2013

Your Database Is Probably Terrible

Databases aren't sexy, but they're the absolute foundation of the tech world, the ground on which all of its edifices are constructed. You probably use a hundred every day. At least. They're like the Spice in Dune: "S/he who controls the database, controls the universe!" Well, don't look now, but that universe is beginning to quake.

In the beginning was the flat file, and lo, it was pretty awful, so help us Codd. Then came SQL and Larry Ellison, who inexplicably became the world's sixth-wealthiest man on the back of the thoroughly mediocre Oracle database. (I once spent several months as an Oracle developer. It was the longest several months of my professional life.)

For a long while the relational-database triumvirate of Oracle, IBM's DB2, and Microsoft's SQL Server ruled unchallenged, with Oracle first among equals. Then came the open-source revolution, and MySQL and PostgreSQL. (As well as SQLite, which runs on mobile devices. But everyone knew 10 years ago that that was just a niche market. Right?)

Then Web 2.0 hit, and it turned out that relational databases did not scale well. Oh, they were fine up to a point. But when the whole world starts hitting your server? Your server falls over. Facebook still runs on MySQL, but they have to jump through many (expensive) brilliant hoops to do so. Same with Twitter. Reddit abandoned traditional relational-database design entirely.

Google tackled this problem before anyone else, as usual, with its BigTable database, which scaled brilliantly…at the cost of several quirks, including forbidding more than one inequality operator per query. (For instance, if you have a database of "boxes," you can search for boxes with "length greater than 5" or "width greater than 3"…but not both at the same time. I'm actually very fond of BigTable, which I've used fairly extensively, but I have on multiple occasions found this infuriating.) This ushered in the age of NoSQL. MongoDB, CouchDB, arguably Redis, etc.: They scaled like crazy! And they were really easy to use!
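If you've never hit that restriction, here's roughly what living with it looks like. This is only a sketch in plain Python, not real BigTable code: the `BOXES` list, `query_one_inequality`, and `long_and_wide` are all made-up names standing in for a store that will evaluate an inequality on just one property. The usual workaround is to let the database handle one inequality and do the other yourself.

```python
# Hypothetical sample data; a real store would hold these as entities/rows.
BOXES = [
    {"name": "a", "length": 6, "width": 2},
    {"name": "b", "length": 7, "width": 4},
    {"name": "c", "length": 3, "width": 5},
]


def query_one_inequality(boxes, field, minimum):
    """Stand-in for a store that allows an inequality on one property only."""
    return [box for box in boxes if box[field] > minimum]


def long_and_wide(boxes, min_length, min_width):
    """Find boxes exceeding both thresholds despite the one-inequality limit."""
    # Let the "database" apply the first inequality...
    candidates = query_one_inequality(boxes, "length", min_length)
    # ...then apply the second one in application code.
    return [box for box in candidates if box["width"] > min_width]


result = long_and_wide(BOXES, 5, 3)
names = [box["name"] for box in result]
```

The catch, of course, is that the second filter runs on your side of the wire, so you pay to fetch every candidate the first filter lets through.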

Unfortunately, they had to sacrifice a few things that relational databases were good at. Like transactional integrity. You could have a database that scaled really well, or you could have ACID (Atomicity, Consistency, Isolation, Durability; trust me, these are all important things). But you couldn't have both. Everyone knew that. Engineering is a question of optimizing compromises, that's all. You always have to compromise. Everyone knows that.
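To make the "A" in ACID concrete, here's a minimal sketch using Python's built-in `sqlite3` module and an in-memory SQLite database. The `accounts` table, the `transfer` helper, and the balances are invented for illustration. The point is that when the second half of a transfer fails, the first half is undone too, so you never end up with money created out of thin air.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically; True if committed, False if rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit
            # above is rolled back along with it.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok_small = transfer(conn, "alice", "bob", 30)   # fits within alice's balance
after_small = balances(conn)
ok_large = transfer(conn, "alice", "bob", 500)  # would overdraw alice
after_large = balances(conn)
```

After the failed transfer the balances are exactly what they were before it started, which is the guarantee the early NoSQL stores typically asked you to give up or rebuild by hand.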

Well. Almost everyone.

It turned out that within Google, there were a whole lot of people who didn't think too much of BigTable and its limitations. So they went and built Spanner, a relational database that doesn't just scale, but scales across the planet. Meanwhile, FoundationDB was attacking the problem from one end, creating a datastore that is both NoSQL and ACID, while Clustrix came at it from the other, making a relational database that can happily and seamlessly scale horizontally as you add more servers. (As of this week they have launched on Amazon Web Services, too.)

Database admins are famously conservative. Which makes sense. You don't want to mess with your data. Again, it's the ground on which everything else is built. And once you've gone and built an entire system upon a database, the last thing you want to do is migrate to another one. But at the same time, DB technology has been advancing by leaps and bounds, especially of late.

So the database(s) you're using at your workplace? They're probably not the best available; in fact, they're probably pretty bad, relatively speaking; and that's probably not going to change anytime soon. It's food for thought the next time you expect some new technology to thoroughly revolutionize the world just because it's better than all its competition. Most of the world doesn't want to be revolutionized. Most of the world likes its databases just fine. You can't convince them to change; you have to drive them to it.
