PostgreSQL Clustering made easy with PostgresHA.

Authored jointly with Scott Mead, Sr. Architect OpenSCG and Architect of PostgresHA.

PostgresHA is Highly Available (HA) PostgreSQL.

High Availability

High availability is not new. Financial trading systems and the telephone network were federally mandated to provide highly availability. People didn’t do HA because it was a fun project, they did it because they were told to.

What is new is that EVERYTHING these days is HA. Your email and blog posts on Google, your shopping cart on Amazon, your pictures on Facebook. People have gone from doing this because it’s critical for their business and doing it because its good for business.

Open Source is also good for business and by virtue of all the components of a Big Data solution being publicly available some very smart people have gone to the trouble of making this stuff accessible to the likes of you and me. They have made it so that we can understand it and use it. You don’t have to be Ma Bell or the NYSE to enjoy the benefits of always available computing.

PostgreSQL

The worlds most advanced Open Source Database. PostgreSQL has synchronous replication. In computer theory terms it is called 2-safe replication, in English it means that you have two databases instances running simultaneously and your Primary (Master) database is synchronized with a Secondary (Slave) database. Unless both databases crash simultaneously, you won’t lose data. If your databases are on separate machines in separate racks in separate buildings you can reduce the possibility of simultaneous failure to near zero. Building a HA solution that survives an Extinction Level Event like a meteor strike is beyond the scope of this paper and would not get read anyway.

With synchronous replication, each write waits until confirmation is received by the transaction log of both the Master and Slave. This increases the transaction response time and the minimum is the roundtrip time between them. All two-phase commit actions require waits (prepare and commit.)

However, there is good news, read only transactions and transaction rollbacks need not wait for replies from standby servers (PostgreSQL, We’re optimized for Rollback!) Only top level transactions require commits, sub-transaction commits do not need to wait for standby servers and long running data loading or index building only need the very last commit. Good for speed, but if you lose it halfway through, you lose it all – you pays your money, you takes your choice.

PostgreSQL makes a uses the same concept of active standby as ZooKeeper – keep a standby hot for fail over (Master / Slave, Primary / Secondary, Active / Standby – you say potato I say potato.)

OpenSCG makes PostgreSQL HA with the Open Source project PostgresHA .

At a high level, PostgresHA monitors both Master and Slave databases. If it spots something that could be a failure it enters a state of heightened awareness and starts yelling about it. Heightened awareness means continuous pinging, more pinging and deeper pinging.

At some point it’s serious enough to warrant failover.

This is where you need to consider what you want (In advance).

Some things (like a head crash) you want to know right away and react as soon as possible, and things like auto failover sound like a good idea.

It’s possible that there is a transient problem that no amount of intelligent probing is going to find out and it will go away without causing too much trouble. In this case you probably don’t want to fail over automatically, you might want to watch it for a little bit and when you can bear it no more, you punch a big red failover button.

Sometimes you might want to crank up the sensitivity of the probes anyway just to see what nominal behavior looks like.

PostgresHA essentially does this for you.  The great part is that OpenSCG has a free distribution of PostgreSQL HA here!

Here’s how:

PostgresHA Architecture

Your Applications (clients) attach to a connection pooler (pgBouncer). The connection pooler simply holds all the client connections and switches them between a smaller number of database connections. Everyone does it, it saves expensive database connection resources. Another nice property is that it can switch these connections to a different database if needed.

PostgresHA Steady State.

PostgresHA Steady State

With Streaming Replication in PostgreSQL, we can have a Master and Slave running synchronously, and through pgBouncer we can route read-only connections to the slave and write transactions to the Master (nice). To protect against a bad hardware failure, you can put the Master and Slave databases in different racks. For Disaster Recovery, you can use log streaming to replicate to a separate data center.

PostgresHA handles failure scenarios the following way:

Master cannot communicate with Slave

PostgresHA, Master Slave Disconnect

Slave becomes isolated:

PostgresHA, Slave Isolated

Master Isolated:

Master Isolated