In PostgreSQL, there are two types of replication features: streaming replication (physical replication) that collectively replicates a database cluster, and logical replication that replicates in units of tables and databases.
In this article, we focus on the mechanism and configuration of streaming replication that replicates a database cluster in batch.
Streaming replication, a standard feature of PostgreSQL, allows the updated information on the primary server to be transferred to the standby server in real time, so that the databases of the primary server and standby server can be kept in sync. The streaming replication features below bring great benefits to your system.
You can process read-only SQL on the standby server to distribute read-only workload. However, PostgreSQL does not have a distribution feature that considers queries and server load. By using the load balancing feature of pgpool-II, an open source extension, read-only queries can be efficiently distributed between the PostgreSQL instance, and workload can be balanced.
To understand how streaming replication works, let's take a closer look at what to transfer and how to transfer it.
PostgreSQL saves the updated information of the primary server as a transaction log known as write-ahead log, or WAL, in preparation for crash recovery or rollback. Streaming replication works by transferring, or shipping, the WAL to the standby server in real time, and applying it on the standby server.
WAL shipping between the primary server and the standby server is performed by the WAL sender process on the primary server to the WAL receiver process on the standby server. These processes are started by setting postgresql.conf and pg_hba.conf parameters – we will discuss them later in this article.
This section describes the setups that can be configured with streaming replication, and extends the discussion to synchronous and asynchronous replications.
Streaming replication can be built in a 1:N configuration, where only one primary server is configurable, but multiple standby servers can be set up. The configuration that connects (ships WAL from) a primary server to all standby servers is called a multi-standby setup. You can also build a cascade setup where a standby server connects (ships WAL) to another standby server.
For streaming replication, you can select either synchronous or asynchronous replication for each standby server. This section describes the characteristics of each, and key points for configuration.
The difference between synchronous replication and asynchronous replication is whether or not to wait for the response from the standby server before completing the processing on the primary server. Choose the configuration according to your operation, since this affects the response time of SQL processing and high availability.
Synchronous setup is configured by synchronous_standby_names in postgresql.conf on the primary server. If there are multiple standby servers, you can specify the servers to be synchronized and the order of priority for COMMIT. Standby servers not specified in this parameter will be asynchronous.
Example: 'synchronous_standby_names = FIRST 2 (s1, s2, s3)'
You can also specify the method by which to select synchronous standby servers from a list of servers, such as 'FIRST n (list)' or 'ANY n (list)'. The FIRST keyword specifies that the n first standby servers in list will be synchronous and that transaction commits will wait until their WAL records are replicated to the first n standby servers in list. The ANY keyword specifies a quorum-based synchronous replication and that transaction commits should wait until their WAL records are replicated to at least n standby servers in list.
For example, in an environment using standby servers s1, s2, s3, and s4, setting synchronous_standby_names to 'FIRST 2 (s1, s2, s3') will configure s1 and s2 for synchronous replication, and s3 and s4 for asynchronous replication. The primary server will wait s1 and s2 to complete processing before it commits. If s1 or s2 fails, then s3 changes to synchronous replication.
Make sure that the name set in synchronous_standby_names matches the name set in application_name of primary_conninfo in postgresql.conf on the standby server. This parameter is discussed later in this article.
To set the synchronization level of the standby server, set synchronous_commit in postgresql.conf on the primary server. This section describes the values that can be set and their overview. The circled numbers in the 'Guaranteed range' column of the table match the numbers in the diagram directly below the table.
|Sync level||Set value||Overview||Guaranteed range|
|Full synchronous||remote_apply||Commits wait until after the WAL is replicated to the standby servers and the updated data on the standby server is available to read-only queries. It is fully assured that the data is in sync, so it is well suited for load balancing of read-only workloads that require data freshness.||1 to 9|
|Synchronous||on (default)||Commits wait until after the WAL is replicated to the standby servers. This setting achieves the best balance between performance and reliability.||1 to 6|
|Semi-synchronous||remote_write||Commits wait until after WAL has been shipped to the standby servers.||1 to 5|
|Asynchronous||local||Commits wait until after the WAL write on the primary server is completed.||1 to 2|
|Asynchronous||off||Commits do not wait for the WAL write on the primary server to complete. This setting is not recommended.||1|
We list below the parameters required to use the streaming replication feature, setting up synchronous replication and archiving the WAL.
Set the values below in postgresql.conf and pg_hba.conf on the primary server.
|postgresql.conf||listen_addresses||'*' for all available IP interfaces.|
This value cannot exceed the value set in max_connections.
|max_connections||Maximum number of simultaneous connections to the database server.
Set a value greater than max_wal_senders.
|wal_keep_segments||Minimum number of file segments to keep in the pg_wal directory.
If you do not want to archive, set a slightly increased value.
|wal_sender_timeout||Time to wait before determining that the WAL receiver process is in an abnormal state.|
|synchronous_standby_names||Standby servers for synchronous replication. No setting is required for asynchronous replication.
|synchronous_commit||Synchronization level of the standby servers.
Set 'remote_apply' to optimize data freshness and 'on' to maintain performance and reliability.
|archive_command||Command for archiving the WAL.
Example: 'cp %p /mnt/serv/arch_dir %f'
|pg_hba.conf||Set the 'database' column to 'replication'.
Example: host replication someuser 22.214.171.124 passwd
Set the values below in postgresql.conf on the secondary server.
|wal_receiver_timeout||Time to wait before determining that the WAL sender process is in an abnormal state.|
|primary_conninfo||Connection information to the primary server.
Set the same name for application_name and synchronous_standby_names in postgresql.conf on the primary server.
Example: 'host=126.96.36.199 port=5432 user=someuser password=passwd application_name=s1'
|restore_command||Set the command to get the WAL archive.
Example: 'scp firstname.lastname@example.org:/mnt/serv/arch_dir/%f %p'
In case the primary server fails, a standby server will be promoted. For this reason, we recommend setting common parameters on the primary and standby servers in advance.
FUJITSU Enterprise Postgres is designed for developers, data architects, and DBAs seeking the efficiency of a leading open source based database technology with the enhanced security and performance features plus support required in a serious business environment.
Read our most recent articles regarding all aspects of PostgreSQL and FUJITSU Enterprise Postgres.