Introduction:

PostgreSQL has been lauded for its robustness as an open-source object-relational database. With the introduction of each new version, the potency of the PostgreSQL database to store and wield highly proliferating data has also augmented. This also calls for protection of data should any unlikely event befall. PostgreSQL 10, along with many other flagship features, matures the high availability and protection against data loss with the introduction of “quorum commit” for synchronous replication.

In this article, we will discuss about what synchronous replication and quorum commit are and then move on to see how we can use use the keywords “FIRST” and “ANY” under the parameter synchronous_standby_names.

Synchronous vs asynchronous replication:

The primary difference between synchronous replication and asynchronous replication is the way in which data is written to the replica. Most synchronous replication products write data to the primary storage and standby DB simultaneously. Transactions waiting for commit on the primary will be allowed to proceed only after these standby servers confirm receipt of their data. Thus, the primary copy and the replica always remain synchronized. In case of any fail-over event, the replica can be promoted as the master.

In contrast, asynchronous replication products copy the data to the replica after it is already written to the primary storage. Although the replication process may occur in near-real-time, it is more common for replication to occur on a periodic or scheduled basis. In case of a fail-over event, one can expect some data loss.

Quorum Commit for Synchronous Replication:

PostgreSQL 9.6 introduced the feasibility of specifying multiple synchronous standbys with the aide of the parameter synchronous_standby_names. For example, a value of

synchronous_standby_names = '3 (standby_1, standby_2, standby_3, standby_4, standby_5)' 

means that the primary server has to wait for commit confirmations from first 3 available standby nodes among the set of 5 standby nodes before the primary could complete the transaction at its end. The order of the standbys in this list decides the standbys from which confirmation needs to be waited for.

PostgreSQL 10 has ameliorated this feature with “quorum commit.” In a quorum-based synchronous replication, all the standbys appearing in the synchronous_standby_names list will be used as candidates for synchronous standbys. In case should one of them fail, the other standbys will keep performing the role of candidates of synchronous standby. Being able to define quorum sets of synchronous standbys provides more flexibility in some availability scenarios.

To enable this new feature, synchronous_standby_names has been extended with the below set of keywords.

  • FIRST – refers to the PostgreSQL 9.6 version behavior in which priority is assigned by the order of the standbys in the list.
  • ANY – refers to the quorum behavior, meaning that any node in the set can be used to confirm a commit.

We shall go ahead and see the significance of each one.

Synchronous_Standby_Names = ‘FIRST x’

To set the synchronous_standby_names in this mode, we need to execute the alter statement

alter system set synchronous_standby_names='FIRST x (standby_1, standby_2,........... , standby_n) .  

Using “FIRST x” makes transaction commits on the primary to wait until their WAL records are replicated to the specified “x” number of synchronous standbys.

sync_rep=# select pid, application_name,client_addr,state,sync_priority,sync_state 
from pg_stat_replication order by application_name;
  pid  | application_name | client_addr  |   state   | sync_priority | sync_state
-------+------------------+--------------+-----------+---------------+------------
 11698 | standby_1        | 10.150.9.115 | streaming |             0 | async
 11723 | standby_2        | 10.150.8.118 | streaming |             0 | async
 11734 | standby_3        | 10.150.10.26 | streaming |             0 | async
(3 rows)

postgres=# show synchronous_standby_names;
 synchronous_standby_names
---------------------------

(1 row)

sync_rep=# alter system set synchronous_standby_names='FIRST 2(standby_1, standby_2,
standby_3)';
ALTER SYSTEM

sync_rep=# SELECT pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

sync_rep=# select pid, application_name,client_addr,state,sync_priority,sync_state 
from pg_stat_replication order by application_name;
  pid  | application_name | client_addr  |   state   | sync_priority | sync_state
-------+------------------+--------------+-----------+---------------+------------
 11698 | standby_1        | 10.150.9.115 | streaming |             1 | sync
 11723 | standby_2        | 10.150.8.118 | streaming |             2 | sync
 11734 | standby_3        | 10.150.10.26 | streaming |             3 | potential
(3 rows)

In the above case, we have chosen “first 2,” which means that the standby nodes, standby_1 and standby_2, once after receiving the WAL records need to send the acknowledgement to the primary and only then commits happen on the primary. The sync_priority level goes down from the left to right, with the left first node being of high priority 1.

Synchronous_Standby_Names = ‘ANY x’

To set the synchronous_standby_names in this mode, we need to execute the alter statement

alter system set synchronous_standby_names='ANY x (standby_1, standby_2,...........
 , standby_n) .

The usage of the second keyword “ANY” sets the sync_state to quorum and sync_priority to 1 in all the nodes. Thus, any two of the three nodes, whichever receives the WAL records records first, can send the acknowledgement so that the transaction waiting on the primary can continue.

sync_rep=# alter system set synchronous_standby_names='ANY 2 (standby_1, standby_2
, standby_3)';
ALTER SYSTEM

sync_rep=# SELECT pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

sync_rep=# select pid, application_name,client_addr,state,sync_priority,sync_state 
from pg_stat_replication order by application_name;
  pid  | application_name | client_addr  |   state   | sync_priority | sync_state
-------+------------------+--------------+-----------+---------------+------------
 11698 | standby_1        | 10.150.9.115 | streaming |             1 | quorum
 11723 | standby_2        | 10.150.8.118 | streaming |             1 | quorum
 11734 | standby_3        | 10.150.10.26 | streaming |             1 | quorum
(3 rows)

Testing the concept

Test Environment:

OS: CentOS 7 DB: PostgreSQL 10.1

Primary Node:

Primary Node: 10.150.9.110

StandbyNodes:

Standby_1:  10.150.9.115
Standby_2:  10.150.9.118
Standby_3:  10.150.10.26

First off, let us first see how we can use the keyword ‘first’ and what happens at the background. We shall go ahead and set the synchronous_standby_names with a value of ‘first 2’. To achieve this we should execute the statement

alter system set synchronous_standby_names='first 2(standby_1, standby_2, standby_3)' 

and then reload the configuration. This implies that the first two standbys should provide receipt of WALs to the primary and only after which the transactions on the primary would commit. As we can see below, standby_1 and standby_2 have sync_priority of 1 and 2 respectively with sync_state as sync, while standby_3 has a sync_priority of 3 with sync_state as potential. Potential here means that standby_3 is the potential node that would be moved up as ‘sync’ should any of the other nodes (standby_1 or standby_2) should go down.

sync_rep=# show synchronous_standby_names ;
        synchronous_standby_names
------------------------------------------
 first 2(standby_1, standby_2, standby_3)
(1 row)

sync_rep=# select pid, application_name,client_addr,state,sync_priority,sync_state 
from pg_stat_replication order by application_name;
  pid  | application_name | client_addr  |   state   | sync_priority | sync_state
-------+------------------+--------------+-----------+---------------+------------
 12961 | standby_1        | 10.150.9.115 | streaming |             1 | sync
 12973 | standby_2        | 10.150.8.118 | streaming |             2 | sync
 12976 | standby_3        | 10.150.10.26 | streaming |             3 | potential
(3 rows)

Consider the case when insertion of 10 million records in table test1 is progressing and due to some unknown reasons one of the priority nodes goes down. Then, standby_3 will be assigned to the quorum. Let us simulate this situation by bringing down standby_1.

sync_rep=# insert into test1 values (generate_series (1,9000000));

--- standby_1 is brought down 
/usr/pgsql-10/bin/pg_ctl  -D /opt/pg10/stdby1/data stop

--- standby_3, which was the potential node has now become 'sync'
sync_rep=# select pid, application_name,client_addr,state,sync_priority,sync_state from pg_stat_replication order by application_name;
  pid  | application_name | client_addr  |   state   | sync_priority | sync_state
-------+------------------+--------------+-----------+---------------+------------
 12973 | standby_2        | 10.150.8.117 | streaming |             2 | sync
 12976 | standby_3        | 10.150.10.24 | streaming |             3 | sync
(2 rows)

As standby_2 is down, the priority levels have been re-arranged among the nodes, thereby making the “potential” node to be in sync. Receipt of WALs must be acknowledged from standby_3 before transactions on primary can go further. In the above scenario, since receipt of WALs are continued to be sent from standby_2 and standby_3, the insert statement continues on the primary and commits.

Later, when standby_1 comes back up, it tries to synchronise itself with the primary node. We can see the state “catchup” of standby_1.

--- when the standby_1 comes back up, it catches up with the Primary
sync_rep=# select pid, application_name,client_addr,state,sync_priority,sync_state from pg_stat_replication order by application_name;
  pid  | application_name | client_addr  |   state   | sync_priority | sync_state
-------+------------------+--------------+-----------+---------------+------------
 16453 | standby_1        | 10.150.9.114 | catchup   |             1 | potential
 12973 | standby_2        | 10.150.8.117 | streaming |             2 | sync
 12976 | standby_3        | 10.150.10.24 | streaming |             3 | sync
(3 rows)

To enforce the keyword ‘ANY’, go ahead and execute the statement

alter system set synchronous_standby_names='ANY 2(standby_1, standby_2, standby_3)'

and then reload the configuration of primary. Here in this case, acknowledgement of receipt of WALs to the primary can be provided by any two standbys for the transactions on the primary to continue and commit.

In case if the node standby_1 is down, the other two nodes standby_2 and standby_3 having sync_priority 1 acknowledges the WALs.

sync_rep=# select pid, application_name,client_addr,state,sync_priority,sync_state 
from pg_stat_replication order by application_name;
  pid  | application_name | client_addr  |   state   | sync_priority | sync_state
-------+------------------+--------------+-----------+---------------+------------
 13985 | standby_1        | 10.150.9.115 | streaming |             1 | quorum
 15297 | standby_2        | 10.150.8.118 | streaming |             1 | quorum
 11734 | standby_3        | 10.150.10.26 | streaming |             1 | quorum
(3 rows)

-bash-4.2$ /usr/pgsql-10/bin/pg_ctl -D /opt/pg10/stdby1/data  stop
waiting for server to shut down.... done
server stopped

sync_rep=# select pid, application_name,client_addr,state,sync_priority,sync_state 
from pg_stat_replication order by application_name;
  pid  | application_name | client_addr  |   state   | sync_priority | sync_state
-------+------------------+--------------+-----------+---------------+------------
 15297 | standby_2        | 10.150.8.118 | streaming |             1 | quorum
 11734 | standby_3        | 10.150.10.26 | streaming |             1 | quorum
(2 rows)

Conclusion:

There is usually a trade-off between functionality and performance. Although performance takes higher precedence and has been considered as the elite choice in many environments, finance and banking milieus, where it is all about data protection, prefer to go with synchronous replication.