
What to do if streaming replication is not an option?
One of the many benefits of Postgres replication is that you can set up a streaming replica from a cluster regardless of whether the source is a Primary or a Standby. This means we can cascade replication from replicas, or even from a standby used for log shipping.
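As a quick illustration (a minimal sketch using plain PostgreSQL system views and the psycopg2 driver; the hostname and credentials are placeholders, not values from this deployment), connecting to a hot standby shows that the node is in recovery while pg_stat_replication lists any cascaded replicas streaming from it, exactly as it would on a primary:

```python
import psycopg2

# Placeholder connection details for an illustrative hot standby.
conn = psycopg2.connect(host="standby-host", dbname="postgres", user="postgres")
with conn, conn.cursor() as cur:
    cur.execute("SELECT pg_is_in_recovery()")
    print("in recovery (standby):", cur.fetchone()[0])

    # On a cascading standby, pg_stat_replication shows the standby's own
    # downstream replicas, just as it would on a primary.
    cur.execute("SELECT application_name, state, sync_state FROM pg_stat_replication")
    for app, state, sync in cur.fetchall():
        print(f"downstream replica: {app} state={state} sync={sync}")
conn.close()
```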
The ability to set up a hot standby from a warm standby proved particularly useful for a customer who recently advised that security restrictions at their organization now forbid any inter-datacenter application connectivity. This meant that using streaming replication to maintain their DR Fujitsu Enterprise Postgres instances was no longer permitted.
Given this constraint, I suggested we could use either log shipping or storage replication.
Working out the solution
The customer's preference was to use log shipping to the remote DC. They run Fujitsu Enterprise Postgres in an active DC, where the archived WAL and backups are written to an NFS share. This share synchronizes to the remote passive DC's NFS every few minutes. The customer accepted that there could be a window of up to 2 hours before the DR DC was running the service again.
The customer also accepted that file-based log shipping to a warm standby in the DR DC carries a risk of data loss. However, this solution keeps to the customer's security policy of no application connectivity between DCs.
To meet the customer's requirements, we implemented the design changes shown in the diagram below. As you can see, the customer applications use Connection Manager to reach the Primary. Connection Manager runs as Active/Active, and the application connection string contains both hosts to protect against a single point of failure (SPOF), as sketched below. The Active DC Connection Manager only knows about the local Primary and Standby instances. The Active DC Arbitration Server likewise only knows about the 2 local Fujitsu Enterprise Postgres instances and uses a VIP for Arbitration Server host resilience.
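For example (a hedged sketch; the hostnames, port, database, and user are illustrative rather than the customer's actual values), a libpq-based client can list both Connection Manager hosts in a single connection string, so losing one Connection Manager node does not break application connectivity:

```python
import psycopg2

# Both Connection Manager hosts in one libpq connection string; if the
# first is unreachable, libpq tries the next. target_session_attrs ensures
# the session lands on a writable node, i.e. the Primary.
conn = psycopg2.connect(
    "host=cm-host1,cm-host2 port=5432,5432 "
    "dbname=appdb user=appuser target_session_attrs=read-write"
)
print("connected via:", conn.info.host)  # the host actually in use
conn.close()
```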
On the Passive DC side, Connection Manager is also Active/Active and is configured only for its local Fujitsu Enterprise Postgres instances, so it sees just the two standbys (the Warm standby and its Hot standby). The Arbitration Server and Mirroring Controller in the Passive DC are also running, awaiting the promotion of VM1 in DC A.
On failure of the Active DC (DC B), or shutdown of its 2 Fujitsu Enterprise Postgres clusters, promotion of the Warm standby can proceed, with VM1 becoming the Primary and its Hot standby, VM2, already in place; a promotion sketch follows. Upon recovery of DC B, VM4 would be re-established as the Warm standby and VM5 as the Hot standby of VM4.
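Below is a minimal sketch of the promotion step using vanilla PostgreSQL tooling; in practice, a Fujitsu Enterprise Postgres deployment would normally drive promotion through Mirroring Controller, and the data directory path here is an assumption:

```python
import subprocess

DATA_DIR = "/var/lib/pgsql/data"  # assumption: VM1's data directory

# A warm standby does not accept SQL connections, so the SQL-level
# pg_promote() function is unavailable; pg_ctl performs the promotion
# and by default waits until it completes.
subprocess.run(["pg_ctl", "promote", "-D", DATA_DIR], check=True)
```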
With this architecture, the customer requested automation to relax the Sync replication mode whenever the Primary is left without a replica, so that commits are not blocked waiting for a synchronous standby. This can happen after losing the Primary or the Replica during routine activities such as patching.
The automation will do the following (see the sketch after this list):
- Change the replication mode from Sync to Async when the Primary has no replica and is in Sync mode
- Change the replication mode from Async to Sync when the Primary has a replica and is in Async mode
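A minimal sketch of this logic, using vanilla PostgreSQL mechanisms (pg_stat_replication and synchronous_standby_names) rather than any Fujitsu-specific interface, might look like the following; the connection details and standby name are placeholders:

```python
import psycopg2

conn = psycopg2.connect(host="primary-host", dbname="postgres", user="postgres")
conn.autocommit = True  # ALTER SYSTEM cannot run inside a transaction block
cur = conn.cursor()

# Is a streaming replica attached, and is Sync mode currently configured?
cur.execute("SELECT count(*) FROM pg_stat_replication WHERE state = 'streaming'")
replicas = cur.fetchone()[0]
cur.execute("SHOW synchronous_standby_names")
sync_names = cur.fetchone()[0]

if replicas == 0 and sync_names:
    # Sync mode with no replica: drop to Async so commits stop blocking.
    cur.execute("ALTER SYSTEM SET synchronous_standby_names = ''")
elif replicas > 0 and not sync_names:
    # Replica is back while in Async mode: restore Sync replication.
    cur.execute("ALTER SYSTEM SET synchronous_standby_names = 'standby1'")  # placeholder name

cur.execute("SELECT pg_reload_conf()")  # apply the change without a restart
conn.close()
```

Run periodically (for example from a cron job on each node), this keeps the Primary writable during patching while re-enabling Sync mode as soon as the replica reattaches.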
Fujitsu Enterprise Postgres has the edge
This DR solution protects against any SPOF, and is both symmetrical and simple for the customer to support. All this is possible because Fujitsu Enterprise Postgres allows a Warm standby to also have a Hot standby. The ability to promote a Warm standby that is performing WAL file replay gives Fujitsu Enterprise Postgres the edge over many competitor products, offering faster and simpler DC failover transitions. This is particularly true for customers who are not permitted to replicate between datacenters.
| # | Summary |
|---|---------|
| 1 | We use archive_timeout to force a WAL segment switch when there have been changes, so the standby is not left waiting for a full WAL segment (see the monitoring sketch below this table). |
| 2 | The Connection Managers are Active/Active in each DC and point to their 2 local Fujitsu Enterprise Postgres DBs. This saves time on DC failover. |
| 3 | The Arbitration Server is Active/Passive in each DC and points to its 2 local Fujitsu Enterprise Postgres clusters. This saves time on DC failover. |
| 4 | In the event of DC failover or loss of DC B, we promote the remote Warm standby (VM1), and the Connection Manager automatically picks up the new Primary on VM1 and its Sync replica on VM2 as the standby. |
| 5 | Likewise, on DC failover or loss of DC B, the Arbitration Server automatically picks up the new Primary on VM1 and its Sync replica on VM2 as the standby. |
| 6 | pgBackRest is used for WAL archiving and WAL restoration. |
| 7 | All components (the Fujitsu Enterprise Postgres clusters, Arbitration Server, and Connection Manager) have auto-start services. |
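As a companion to item 1 above, here is a hedged sketch for checking archiving progress on the Primary; archive_timeout bounds how stale the last archived segment can be while there is write activity (the connection details are placeholders):

```python
import psycopg2

conn = psycopg2.connect(host="primary-host", dbname="postgres", user="postgres")
with conn, conn.cursor() as cur:
    cur.execute("SHOW archive_timeout")
    print("archive_timeout:", cur.fetchone()[0])
    # pg_stat_archiver reports the last WAL segment handed to the archiver.
    cur.execute(
        "SELECT last_archived_wal, last_archived_time, "
        "now() - last_archived_time FROM pg_stat_archiver"
    )
    wal, ts, age = cur.fetchone()
    print(f"last archived segment {wal} at {ts} (age {age})")
conn.close()
```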