Tuesday, January 18, 2011

What Kind of Replication Do You Need?

As you probably know by now if you're a regular reader of Planet PostgreSQL, or if you've read the PostgreSQL 9.0 release notes, PostgreSQL 9.0 offers a much-improved form of built-in replication.  In PostgreSQL 8.4 and prior releases, replication was possible via WAL shipping, but the delay between master and standby could be several minutes, or even longer, depending on configuration parameters.  This delay has been essentially eliminated in PostgreSQL 9.0, which allows the write-ahead log to be streamed from master to standby as it's generated.

But it's still asynchronous replication - as opposed synchronous replication, which has been proposed for inclusion in PostgreSQL 9.1.  If the master crashes just an instant after processing a COMMIT statement, the client will believe that the transaction has committed, but the slave won't know about it.  If the slave is then promoted to become the new master, the transaction is gone forever.

Synchronous replication can prevent this.  It does so by waiting.  After processing the COMMIT statement locally, it waits for an acknowledgment that the standby has also processed the COMMIT.  It does not tell the client that the COMMIT was successful until the COMMIT has been processed both locally and on the standby server.   Thus, if the master is lost and the standby is promoted to master, every transaction which was acknowledged to the client is guaranteed to be reflected on that standby.

But notice that this requires an awful lot of the database client.  Ask yourself this question.  If your application sends a COMMIT command to the database, and it never gets an answer back, what does it do?  Does it assume the COMMIT failed?  Does it assume the COMMIT worked?  Either approach is clearly wrong, even if replication is not in use.  After all, if the database crashed, it could have done so either before or after processing the COMMIT statement; your application has no way of knowing which it is.  To be correct, your application must re-establish a database connection as soon as possible, examine the database, somehow determine whether or not the transaction in question committed prior to the crash, and if not, redo the whole transaction.  Does your application do that?

If it doesn't, synchronous replication isn't going to provide you with any additional protection against data loss/corruption.  My guess is that most applications probably assume that a database which disappears in mid-COMMIT has failed to process the COMMIT.  For example, the application might tell the user something like "Database not responding.  Please try your request again later."  Implicitly, we're telling the user that the request didn't work, and he needs to retry it.  But it may be that the database didn't actually fail until after the COMMIT occurred, but before sending the response back.  If the user isn't careful, he may end up performing the operation twice, which could be pretty unfortunate if, for example, it's a financial transaction.  Of course, it would be just as bad if we told the user that his action was performed, because that's not certain to be true either, and the transaction might be lost altogether.

To be safe, we have to either tell the user that we don't know what happened, and hope he or she exercises great caution; or we have to figure it out ourselves behind the scenes.  For example, suppose the user is buying Super Bowl tickets.  We issue UPDATE statements to assign the seats to the purchaser and an INSERT statement to record a charge to the purchaser's account, and try to commit.   Now a crash happens, and we tell the user "We are experiencing technical difficulties.  We apologize for the inconvenience and will send you an email shortly to let you know the status of your order."  We then make a record - somewhere - of the fact that this order needs to be resolved once the system is back on-line.  After everything's back up, we check whether the seats for this order are listed as assigned (i.e. we test whether or not the COMMIT we never heard back about did actually go through or not).  If so, we email the user and say "We apologize again for the delay.  Your seats are confirmed!"  If not, we email the user and say "We are very sorry, but your order did not go through.  Please try your transaction again."

Notice that in this scheme, the client is required to remember the state of the transaction until it receives a COMMIT acknowledgment, but not a moment longer.  If we lose connectivity to the database, we still have to remember what we were doing and still handle things in some kind of reasonable way.  But the instant we get that COMMIT back, we can announce that the transaction was certain successful and forget about it - it's now the database's job to track the information.  If your application handles things this way, you might be a candidate for sync rep.  If you use asynchronous replication, the tiny-but-not-zero window after the COMMIT when the transaction could still be lost is a problem for you.  Of course, you still have to ask yourself whether you can withstand the performance loss of waiting for two servers to commit every transaction rather than just one.  I haven't seen performance numbers for synchronous replication yet, but the hit will presumably be in excess of 50%, since we have to wait for two disks to write the commit record, one after another, with some network delay in between and a little more at the end.

Now, on the flip side, if your application doesn't have some kind of handling along the lines described above, you probably won't derive any benefit from synchronous replication.   Synchronous replication doesn't, after all, guarantee that the two servers have exactly the same state at all times - there's no physically practical way to guarantee that two drive heads in two different disk drives write the data on disk at precisely the same instant.  It does guarantee that the slave will never fall behind the master, but it does so by slowing down the master to a speed that the slave can keep up with.  Think about whether it might be better to handle this problem using monitoring: let the master run at full speed, and let arrange to have the system send you a text message if the slave starts to fall behind.

None of this is to say that synchronous replication isn't right for some people - it certainly is, especially people recording transactions in which large amounts of money change hands.  Failure to properly record such a transaction could be very serious, and the large amount of money involved justifies making the whole system smart enough to be robust against mid-commit failures and other problem cases.  But it's important to keep in mind is that there is no certain way to prevent data loss.  Even if you have two servers, or three servers, or ten servers, the same calamity can destroy all of them simultaneously - especially if the calamity is a software bug, or, even more likely, a user error, like typing "DELETE FROM sometable" and forgetting the WHERE clause.  In this case, replication - whether synchronous or not - is just a tool for distributing that mistake very reliably across several machines; what you really need is a good back-up strategy.

So, as with any other technology, neither replication in general nor synchronous replication more specifically should be deployed blindly or thought of as a silver bullet.  A complete solution will typically involve some combination of replication, vigilant monitoring, good application design,  appropriate hardware selection, careful analysis and testing of failure scenarios, regular backups, internal controls, and other best practices.


  1. I haven't been following this closely. I always assumed that the main point of synchronous replication is that, by the time an application completes a COMMIT, all the data is also accessible on the slaves. IOW, that slaves don't return stale data, which is important for load balancing.

    But now I realize that this may not be true. If you define synchronous replication as "slave synced WAL to disk", it doesn't necessarily mean that it has processed the WAL yet?

    Also, you say that sync replication will result in a loss of performance. Sure, individual transactions will take longer, but aggregate throughput of the server shouldn't decrease significantly, because the backend has freed all its locks and is just sleeping -- or am I missing something?

  2. There are indeed several possible definitions of synchronous replication - WAL received, WAL fsync'd, WAL applied. Ultimately we should probably support all of them; but you're right that the discussion above focuses (too?) specifically on the "fsync" level. I'm not sure which version(s) Simon's patch supports - it's been discussed on the list a few times, but I'm not sure what the current state of the patch is. Using "WAL apply" mode to keep the slaves up to date with respect to the master is another valid use case for this feature.

    In terms of performance, your comments about throughput are correct to a degree, but not completely, since using more process slots slows down ProcArray manipulations and generally drags on system performance; and of course some applications are latency-sensitive.

  3. In my case I've got a basic tagging application, with the provision that tags really can never be lost (financial transactions). The entire value of the app comes from durability.

    It's fine for a user to tag transactions twice, but once the user sees the green AJAX generated check mark the item needs to be solid. There's a bunch of other components with same requirement. Not ok to lose a green checkmark / transaction, but ok to fail (even frequently if needed). All OK to repeat in this case.

    For me, WAL received as the sync rep parameter is fine.