
    Knowledge articles - High availability

    Is there a way to determine if the server is running as the primary or standby by obtaining information from a database or a file on that server? (KB4001)

    When you use the streaming replication feature, you can use the pg_is_in_recovery() function to check the server status.

    The function returns t if the server is a standby, or f if the server is a primary.
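    As a quick illustration, the check can be run from psql on either server (the output comments describe what the function returns on each role):

```sql
-- Run on the server whose role you want to determine:
SELECT pg_is_in_recovery();
-- Returns 't' on a standby (the server is in recovery),
-- and 'f' on a primary.
```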

    In FUJITSU Enterprise Postgres, when you use Mirroring Controller to configure the cluster system, you can use the Mirroring Controller command mc_ctl to determine which server is the primary and which is the standby, as follows:

    mc_ctl status -M /mcdir/inst1

    The output will look like this:

    mirroring status
    -----------------
    switchable
     server_id  host_role  host           host_status  db_proc_status  disk_status 
    ------------------------------------------------------------------------------
     inst1p     standby    XXX.XX.XX.156  normal       normal          normal
     inst1s     primary    XXX.XX.XX.161  normal       normal          normal

    For more information on how to use the Mirroring Controller command mc_ctl to check the server role, refer to FUJITSU Enterprise Postgres Cluster Operation Guide - Database Multiplexing > Chapter 3 - Operations in database multiplexing mode > 3.3 - Checking the database multiplexing mode status [Linux version] [Windows version].
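    Because the mc_ctl output is line-oriented, the server role can also be extracted programmatically. A minimal sketch in Python, using the sample output shown above (the server names and masked hosts are from the example, not from a real deployment):

```python
# Minimal sketch: parse `mc_ctl status` output to find a server by role.
# The sample below mirrors the example output shown above.
SAMPLE = """\
mirroring status
-----------------
switchable
 server_id  host_role  host           host_status  db_proc_status  disk_status
------------------------------------------------------------------------------
 inst1p     standby    XXX.XX.XX.156  normal       normal          normal
 inst1s     primary    XXX.XX.XX.161  normal       normal          normal
"""

def find_role(output: str, role: str) -> str:
    """Return the server_id whose host_role column matches `role`."""
    for line in output.splitlines():
        fields = line.split()
        # Data rows have at least six whitespace-separated columns;
        # the header row is skipped because its second field is "host_role".
        if len(fields) >= 6 and fields[1] == role:
            return fields[0]
    raise ValueError(f"no server with role {role!r}")

if __name__ == "__main__":
    print(find_role(SAMPLE, "primary"))   # inst1s
    print(find_role(SAMPLE, "standby"))   # inst1p
```

    In practice you would capture the command's stdout (for example via subprocess) instead of the hardcoded sample.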

    Applicable to

    Product: FUJITSU Enterprise Postgres SE, FUJITSU Enterprise Postgres AE, PostgreSQL

    Architecture: X86, S390x

    Operating System: Linux

    Versions: from 9.5

    When source (master) and destination (standby) database servers are synchronized using the streaming replication feature, what is the impact on database server updates if I promote the destination (standby) server to master? (KB4002)

    If you promote the standby server while the source (master) is still active, the standby server will become an updatable instance (client connections besides those of a replication type can update the database cluster).

    This situation where both database instances of a cluster are updatable is referred to as a split-brain scenario. It means that some clients can make changes to the original master but not to the original standby, while other clients can make changes to the original standby but not to the master, eventuating in two separate database clusters containing different sets of data. At this point, no matter which database cluster you connect to, some data will be missing (as it will exist on the other database instance).

    Recovering from a split-brain scenario is difficult because you need to identify the differences and ensure that missing data on one server is recovered from the other server. A new standby will need to be created from the newly recovered master before a high availability architecture can again be achieved.

    Before promoting a standby server to master, it is vital that the master database (source) is fenced, i.e., that it is made inaccessible to any clients. For more details on this topic, look up STONITH (Shoot The Other Node In The Head).

    For details on how to avoid split brains when using pgpool-II, refer to our PostgreSQL Insider article PostgreSQL High Availability using pgpool-II.

    Applicable to

    Product: FUJITSU Enterprise Postgres SE, FUJITSU Enterprise Postgres AE, FUJITSU Enterprise Postgres for Kubernetes, PostgreSQL

    Architecture: X86, S390x

    Operating System: Linux

    Versions: from 9.5

    Is it possible to resume streaming replication operations on a standby instance that has been restored from a logical dump (performed with pg_dump or pg_dumpall) of the primary database cluster? (KB4003)

    No. The recommended approach is to use a binary (physical) backup, such as one taken with pg_basebackup. This ensures that the necessary information is written to the database data files and WAL files so that WAL replay can begin from the correct position.

    The risk of using a logical dump is that data written since the start of the backup is not included in that backup, and that a restore cannot identify the correct position in the WAL from which to start synchronising.

    A restore of a binary backup by itself may not produce a consistent database (specifically when the database is being used while the backup is being executed), so the WAL files must be used to bring the database into a consistent state during restore.

    Applicable to

    Product: FUJITSU Enterprise Postgres SE, FUJITSU Enterprise Postgres AE, PostgreSQL

    Architecture: X86, S390x

    Operating System: Windows, Linux

    Versions: from 9.5

    When the master database (source) in a high availability cluster that replicates without automated failover is restarted, is there a situation where database updates to the master are not replicated to the standby? (KB4004)

    No. There is no window during which database updates are allowed on the master but are not replicated. When the master restarts, the standby re-establishes its connection and all updates are replicated.

    Applicable to

    Product: FUJITSU Enterprise Postgres SE, FUJITSU Enterprise Postgres AE, FUJITSU Enterprise Postgres for Kubernetes, PostgreSQL

    Architecture: X86, S390x

    Operating System: Windows, Linux

    Versions: from 9.4

    How do I return a fenced database instance from a primary state to a standby state, and synchronize it with the new primary instance? (KB4005)

    Follow the steps below:

    1. Stop the fenced database instance.
    2. Take a backup of the current primary database using pg_basebackup, and use this to replace the fenced database instance data directory.
    3. Configure the fenced database instance by setting standby_mode to "on", and setting the appropriate connection information for the database instance to connect to the primary instance.
    4. Unfence the instance using the appropriate method for how the database has been fenced.
    5. Start the database instance - it will connect to the master and synchronize.

    Note that the settings mentioned above are specified differently depending on the database version - versions up to 11 use the recovery.conf file; versions from 12 onwards set primary_conninfo in the postgresql.conf file and indicate standby mode with the signal file standby.signal (the standby_mode parameter has been removed in these versions).
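    As a sketch, the version-dependent settings look like this (the host, port, and user values are illustrative placeholders, not recommendations):

```
# Versions up to 11 - recovery.conf in the data directory:
standby_mode = 'on'
primary_conninfo = 'host=primary_host port=5432 user=repluser'

# Versions 12 and later - postgresql.conf:
primary_conninfo = 'host=primary_host port=5432 user=repluser'
# ...plus an empty standby.signal file in the data directory
# (standby_mode no longer exists in these versions).
```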

    Applicable to

    Product: FUJITSU Enterprise Postgres SE, FUJITSU Enterprise Postgres AE, PostgreSQL

    Architecture: X86, S390x

    Operating System: Windows, Linux

    Versions: from 9.2


    Streaming replication failed due to an inconsistent timeline ID during operation. (KB4006)

    The safest course of action in this situation is to rebuild the standby database from the primary one.

    Follow the steps below:

    1. Take a backup of the current primary database using pg_basebackup, and use this to replace the data directory of the standby instance giving the error.
    2. Configure the standby instance appropriately, including setting the appropriate connection information for the database instance to connect to the primary instance.

      If the configuration file in the data directory being replaced is still suitable, it can be backed up beforehand and used to replace the one restored from the backup.

    3. Start the database instance - it will connect to the master and synchronize.

    Applicable to

    Product: FUJITSU Enterprise Postgres SE, FUJITSU Enterprise Postgres AE, PostgreSQL

    Architecture: X86, S390x

    Operating System: Windows, Linux

    Versions: from 9.2

    Is pg_basebackup required to build a streaming replication environment? (KB4007)

    pg_basebackup is a utility that provides a less error-prone method of building a streaming replication environment by performing the following activities automatically:

    • Performing a checkpoint to flush dirty pages to disk.
    • Forcing full-page writes to occur.
    • Marking the backup starting point in the transaction logs so that recovery can later synchronize from it.

    While other options are available, such as taking down the database or using pg_start_backup/pg_stop_backup and then manually copying the data directory, pg_basebackup offers the simplest and safest way of doing it.
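    As an illustration, a typical invocation looks like the following (the host, user, and target directory are placeholders; the command requires a running primary and is shown here only as a sketch):

```
# Build a standby's data directory from the running primary.
# -X stream also streams the WAL generated during the backup;
# -R (v12+) writes primary_conninfo and creates standby.signal.
pg_basebackup -h primary_host -U repluser -D /database/inst_standby \
    -X stream -P -R
```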

    Applicable to

    Product: FUJITSU Enterprise Postgres SE, FUJITSU Enterprise Postgres AE, FUJITSU Enterprise Postgres for Kubernetes, PostgreSQL

    Architecture: X86, S390x

    Operating System: Windows, Linux

    Versions: from 9.4

    Which settings should I be concerned about when setting up streaming replication between two servers? (KB4008)

    The configuration settings that you need to be aware of for streaming replication, and their meanings, are:

    Sending servers (including standby servers in cascading replication setups)

    • max_wal_senders (integer): Maximum number of concurrent connections from standby servers or streaming base backup clients - i.e., the maximum number of simultaneously running WAL sender processes (default: 10).
    • max_replication_slots (integer): Maximum number of replication slots that the server can support (default: 10).
    • wal_keep_size (integer): Minimum size of past log file segments kept in the pg_wal directory, in case a standby server needs to fetch them for streaming replication.
    • max_slot_wal_keep_size (integer): Maximum size of WAL files that replication slots are allowed to retain in the pg_wal directory at checkpoint time.
    • wal_sender_timeout (integer): Amount of time of inactivity from replication connections before they are terminated.
    • track_commit_timestamp (boolean): Whether to record commit time of transactions (can only be set in postgresql.conf or on the server command line; default: off).

    Master servers

    • synchronous_standby_names (string): List of standby servers that can support synchronous replication.
    • vacuum_defer_cleanup_age (integer): Number of transactions by which VACUUM and HOT updates will defer cleanup of dead row versions (default: 0, meaning that dead row versions can be removed as soon as possible).

    Standby servers

    • primary_conninfo (string): Connection string to be used for the standby server to connect with a sending server.
    • primary_slot_name (string): Name of existing replication slot to be used when connecting to the sending server via streaming replication to control resource removal on the upstream node.
    • promote_trigger_file (string): Trigger file whose presence ends recovery in the standby.
    • hot_standby (boolean): Whether users can connect and run queries during recovery.
    • max_standby_archive_delay (integer): How long the standby server should wait before canceling standby queries that conflict with about-to-be-applied WAL entries (when Hot Standby is active).
    • max_standby_streaming_delay (integer): How long the standby server should wait before canceling standby queries that conflict with about-to-be-applied WAL entries (when Hot Standby is active).
    • wal_receiver_create_temp_slot (boolean): Whether the WAL receiver process should create a temporary replication slot on the remote instance when no permanent replication slot to use has been configured (default: off).
    • wal_receiver_status_interval (integer): Minimum frequency for the WAL receiver process on the standby to send information about replication progress to the primary or upstream standby.
    • hot_standby_feedback (boolean): Whether a hot standby should send feedback to the primary or upstream standby about queries currently executing on the standby.
    • wal_receiver_timeout (integer): Amount of time of inactivity from replication connections before they are terminated.
    • wal_retrieve_retry_interval (integer): How long the standby server should wait when WAL data is not available from any sources.
    • recovery_min_apply_delay (integer): By default, a standby server restores WAL records from the sending server as soon as possible. It may be useful to have a time-delayed copy of the data, offering opportunities to correct data loss errors.
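    A minimal sketch of how a few of these settings might appear in postgresql.conf (the values, host, slot name, and user are illustrative placeholders, not recommendations):

```
# On the sending (primary) server:
max_wal_senders = 10
max_replication_slots = 10
wal_keep_size = '1GB'

# On the standby server:
primary_conninfo = 'host=primary_host port=5432 user=repluser'
primary_slot_name = 'standby1_slot'
hot_standby = on
```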

    Applicable to

    Product: FUJITSU Enterprise Postgres SE, FUJITSU Enterprise Postgres AE, FUJITSU Enterprise Postgres for Kubernetes, PostgreSQL

    Architecture: X86, S390x

    Operating System: Windows, Linux

    Versions: from 9.4

    Does the time taken to promote a standby server to master using the pg_ctl promote command affect the size of the database? (KB4009)

    Promotion of a standby server does not impact the size of the database. Note, however, that it does impact the amount of WAL data.

    Applicable to

    Product: FUJITSU Enterprise Postgres SE, FUJITSU Enterprise Postgres AE, PostgreSQL

    Architecture: X86, S390x

    Operating System: Windows, Linux

    Versions: from 9.2

    How do I prevent or block updates from a primary instance to a standby instance in a binary streaming replication architecture? (KB4010)

    Standby servers connect to primary servers in order to receive data updates that are streamed from the transaction log. This means that the primary instance needs to be configured to allow connection from the standby instance. To do this, appropriate settings should be configured in the pg_hba.conf file (credentials and host that the standby will connect with).

    The standby instance also needs to be configured with the connection information for the master instance. This is slightly different depending on which version of the database you are using:

    • For FUJITSU Enterprise Postgres/PostgreSQL versions 11 and earlier, specify the connection information in the recovery.conf file using the primary_conninfo parameter.
    • For FUJITSU Enterprise Postgres/PostgreSQL versions 12 and later, specify the connection information in the postgresql.conf file using the primary_conninfo parameter.
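    As a sketch, the two sides of that configuration might look like this (the addresses, user name, and authentication method are placeholders to be adapted to your environment):

```
# pg_hba.conf on the primary - allow the standby's replication connection:
host  replication  repluser  XXX.XX.XX.156/32  md5

# On the standby - primary_conninfo (in recovery.conf up to version 11,
# in postgresql.conf from version 12):
primary_conninfo = 'host=primary_host port=5432 user=repluser'
```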

    Applicable to

    Product: FUJITSU Enterprise Postgres SE, FUJITSU Enterprise Postgres AE, FUJITSU Enterprise Postgres for Kubernetes, PostgreSQL

    Architecture: X86, S390x

    Operating System: Windows, Linux

    Versions: from 9.4

    When VACUUM is executed on a primary instance that is the source of replication to a standby instance, is the information in the primary instance's VACUUM-optimized database forwarded to the standby? Or is VACUUM actually executed on the standby? (KB4011)

    Changes made on the primary instance by the VACUUM command are written to the primary instance's transaction log; these changes are then replicated to the standby instance.

    The VACUUM command is not run on the standby, and does not need to be.

    Applicable to

    Product: FUJITSU Enterprise Postgres SE, FUJITSU Enterprise Postgres AE, FUJITSU Enterprise Postgres for Kubernetes, PostgreSQL

    Architecture: X86, S390x

    Operating System: Windows, Linux

    Versions: from 9.2

    How does VACUUM work? What is its operation flow? (KB4012)

    The operation flow of the VACUUM command is as follows:

    1. The VACUUM command is applied to a table on the primary side.
    2. The VACUUM processing is performed on the primary side.
    3. Database files on the primary are optimized by the VACUUM processing.
    4. The database changes made by the VACUUM optimization are transferred to the secondary side via streaming replication.
    5. Optimized data is accessed on the secondary side.

    If, for some reason, a VACUUM operation is performed on a primary table when replication is not being performed, the optimized database information is stored on the primary, and then when replication is restored, the optimized database information is transferred and populated on the secondary.

    Both the VACUUM command and table update operations (UPDATE, INSERT, DELETE, etc.) result in database changes, which are treated as differential information via WAL, so the flow of synchronization between the primary and the secondary is the same when replication is being performed.

    Applicable to

    Product: FUJITSU Enterprise Postgres SE, FUJITSU Enterprise Postgres AE, FUJITSU Enterprise Postgres for Kubernetes, PostgreSQL

    Architecture: X86, S390x

    Operating System: Windows, Linux

    Versions: from 9.2
