Rough notes on our rationale for switching databases.
After we migrated to PostgreSQL last week, some commenters asked for more detail about technical rationale. I said I'd write something up, and since at this writing I'm sitting in a hotel room in Florida with nothing to do until tomorrow's (Monday's) MAVEN launch window starts, here it is.
The rest of this is more in the way of rough notes on the ups and downs than it is a coherent essay. It's not intended as propaganda for PostgreSQL. Right now I'm pretty happy we decided to put the effort into using it, but be sure to ask me what I think in six months.
(Note that I use "MySQL" herein as a shorthand for "MySQL and its derivatives/forks." No disrespect is intended for anybody's project or trademark. We've been using MariaDB specifically for a while, which I suspect is a good idea for people planning to stay on the MySQL train.)
Ok, so it's a million little things. MariaDB was starting to scare us. We
had this really bad afternoon after Randy realized that a straightforward
reporting query was yielding results that were super-explicitly excluded by
a condition in the WHERE
clause. ("I hate blue things," we says. "Here
are thousands of blue things!" says the database.) We kept getting into
situations where it seemed obvious that a halfway intelligent query planner
would have behaved much differently.
As I have mentioned before, we were sitting around drinking beer in the mountains one day and said well, what about PostgreSQL?
We're not doing any kind of crazy volume. We're working at a level where most of our data can be relational, or at least sort of relational, and we get to do things like join a bunch of tables all the time. There are features it's nice to have in a database if you're going to treat it like an actual relational database. (I have, for example, never had anything but a bad time with MySQL foreign keys. I am positively giddy about materialized views.)
PostgreSQL has a couple of features which make writing an ORM much nicer.
(RETURNING *
on inserts and updates, for example.) The one place
it seems to fall down is that it doesn't have a simple equivalent of
DESCRIBE
.
There's better GUI tooling for the MySQL family (phpMyAdmin is frustrating in a number of ways but good enough that everyone uses it anyway; the Mac users in the house seem to swear by Sequel Pro), but the stock psql CLI is actually pretty good.
The PostgreSQL query planner seems pretty smart.
Case sensitivity. String equality and LIKE
being case sensitive bit us
all over the place after the deploy, but this is clearly a saner default
approach. If you're migrating from MySQL and your users rely on search
features that look at a bunch of strings, you'll want to spend some extra
time on this, because it's one of those places that syntax doesn't really
change and nothing overtly breaks, but people are going to notice right away
that they aren't finding the stuff they used to.
We really need to look into the full text search stuff. And get a better handle on indexes on expressions.
Character encoding is hell.
I haven't used the regex syntax yet, but I'm sure I will sooner or later, if only in one-offs.
Not having a query cache has highlighted some places we were being pretty sub-optimal (read: stupid and/or the wrong kind of lazy). These have pretty much been solvable by chucking stuff at Memcached. For the one really tricky thing, consistent stock numbers on product listings, Casey wrote a little materialized view and a couple of triggers. It's rad. We will almost certainly solve a few other problems the same way.
Documentation and community. The PostgreSQL manual is a thing of quiet beauty. The mailing lists are, so far, pretty good.
Types. If there's an interesting technical story in a project like this, it's probably that we subjected a bunch of PHP code to a stronger, more deterministic type system and immediately discovered entire classes of bugs in our existing work. Many readers are presently saying "well, duh," but I think it's worth emphasizing that even if you're writing schlocky web code in some hacked-together terminally unhip gets-no-respect dynlang, you might still get a lot of mileage out of a more rigorous data store.
I used to think systems that wanted me to be real specific about types were kind of creepy and fascist in outlook, or at best sort of officious and bureaucratic. I'm not sure that's untrue of a number of them, but I'm starting to realize I need all the help I can get when it comes to consistency and precision, which puts the whole question in a different light. I also didn't used to care about unit tests, and now I clutch at them as a drowning man clutches a rope, because I write code in languages that offer almost no built-in mechanisms for formally specifying behavior.
(Shorter alternative to preceding paragraph: When I was younger, I didn't know anything.)
Using PDO against PostgreSQL gives us PHP native types for database values where before we were getting strings for everything (it may be possible to coax something like this behavior out of the MySQL drivers, but we basically stumbled into it here).
PostgreSQL has a boolean type, and does not think that true
is equivalent to 1
.
PostgreSQL cares about NOT NULL
constraints in a way that MySQL does not.
In dealing with this, we changed a certain amount of schema. Just about every time it
came up, we found ourselves fixing a bug we hadn't known existed (or hadn't had any
luck tracking down in the past).
(A strange bug: With prepared queries, PDO will resolutely insist that you're trying to use a null when you give it an empty string for a parameter. This is infuriating to diagnose.)
PostgreSQL hews substantially closer to the ANSI SQL standard than does
MySQL. You can get ANSI-like syntax out of MySQL by setting
SQL_MODE
, but this doesn't mean ANSI-compatible behavior.
SET SQL_MODE='ANSI'
was helpful because it let us do this project in a
couple of phases. We had to change up a bunch of queries, but the vast
majority of them just needed syntax changes and could continue running
against the existing MariaDB install. In exactly 63 places (I just checked)
we had to conditionalize something on type of database. A bunch of those
are reporting queries, which tend to be longer and more complicated; a few
are in libraries that generate SQL. All around, this doesn't seem too bad.
I guess that's about it for now. I'll amend this post if anything else comes to mind.