      Mirroring Controller ensures business continuity by seamlessly switching to a standby server in case of abnormalities detected on the primary server.

      A customer keen on using Mirroring Controller enquired about running it without an arbitration server. So the Center of Excellence team went to work and tested 10 different scenarios in a Fujitsu Enterprise Postgres environment running without an arbitration server, and then ran the same 10 scenarios in an environment with an arbitration server.

      The testing highlighted the effectiveness of the arbitration server in preventing split brain scenarios - more on this later.

      But before proceeding, let me take a step back and explain some of the Fujitsu Enterprise Postgres components discussed here.

      What is Mirroring Controller?

      The Mirroring Controller is a module within Fujitsu Enterprise Postgres that continually monitors the system and seamlessly switches online transaction processing to a standby server when an abnormality is detected.

      This feature ensures business continuity and is a must-have for high-availability, mission-critical systems, which by definition require maximum uptime. It accomplishes this through system monitoring and automatic switch/disconnect.

      [Diagram: Mirroring Controller - system monitoring and automatic switch/disconnect]
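
      On each database server, Mirroring Controller is operated through the mc_ctl command. As a quick orientation for the logs that follow, these are the basic operations used throughout the tests below (a sketch only - the management directory /mc is specific to this test environment):

      mc_ctl start -M /mc       # start Mirroring Controller on this server
      mc_ctl status -M /mc/     # show mirroring status and each server's role and health
      mc_ctl stop -M /mc        # stop Mirroring Controller on this server (no failover is triggered - see Scenario 6)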

      What is the arbitration server?

      Fujitsu Enterprise Postgres provides the Server Assistant component, which determines the status of database servers as a third party, and if necessary, isolates affected databases. The server running the Server Assistant - the arbitration server - evaluates the status of the primary server reported by the standby server (or vice versa), and takes the appropriate action, such as fencing the malfunctioning server.

      [Diagram: Mirroring Controller with arbitration server]

      To learn more about the arbitration server, see the General Description > High reliability with database multiplexing section of the product documentation.
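
      The arbitration command and fencing command that appear in the results below (Arbitration_cmd and Fencing_cmd) are user-supplied scripts invoked by Mirroring Controller; in this test environment they record a timestamp in /mc/hb_chg.log, as the log excerpts below show. The following is only a minimal sketch of such a fencing script - the file name, log path, and the fencing action itself are placeholders to be adapted to your environment:

      #!/bin/sh
      # Hypothetical fencing command registered with Mirroring Controller.
      # Record when it was invoked (matches the hb_chg.log entries shown later).
      echo "Fencing Command executed at: $(date)" >> /mc/hb_chg.log
      # Site-specific fencing goes here, e.g. powering off the suspect node or
      # blocking its access to shared resources.
      exit 0   # 0 reports success back to Mirroring Controller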

      How does the arbitration server prevent split brain scenarios?

      The term split brain refers to a scenario in which multiple nodes within a cluster simultaneously assume they are the primary node. This situation can arise when a standby server is promoted to primary while the original primary server recovers and reconnects.

      If each server considers itself active and commits transactions that the other does not, you cannot easily resynchronize the databases and merge the information held on each server.

      The Mirroring Controller arbitration server plays a vital role in addressing this issue. Within a primary/standby configuration, it monitors server health and intervenes when it detects anomalies, effectively preventing data inconsistency between the nodes.
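
      A split-brain condition is easy to recognize from the Mirroring Controller status: both servers report the primary role, as happens in Scenario 9 below. A quick way to compare both nodes is sketched here (the host names and the /mc management directory are those of this test environment, and passwordless SSH between the nodes is assumed):

      for h in fep15-ha-server1 fep15-ha-server2; do
          echo "== $h =="
          ssh "$h" mc_ctl status -M /mc/ | grep -E 'primary|standby'
      done
      # If both hosts report host_role "primary", the cluster has split.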

      Test case scenarios

      Here are the scenarios that we tested, first without an arbitration server, and then with an arbitration server.

      1. Primary is stopped; Linux OS is shut down
         - Without arbitration server: Failed over, as expected. Arbitration_cmd and Fencing_cmd are executed.
         - With arbitration server: Failed over, as expected. Fencing_cmd is executed.

      2. Primary database instance is brought down using pg_ctl stop
         - Without arbitration server: Failed over, as expected. Arbitration_cmd and Fencing_cmd are not executed.
         - With arbitration server: Failed over, as expected. Fencing_cmd is not executed.

      3. Postmaster process is killed on primary
         - Without arbitration server: Failed over, as expected. Arbitration_cmd and Fencing_cmd are not executed.
         - With arbitration server: Failed over, as expected. Fencing_cmd is not executed.

      4. Storage error: Data storage full on primary
         - Without arbitration server: Failed over, as expected. Arbitration_cmd and Fencing_cmd are not executed.
         - With arbitration server: Failed over, as expected. Fencing_cmd is not executed.

      5. Storage error: WAL directory full on primary
         - Without arbitration server: Failed over, as expected. Arbitration_cmd and Fencing_cmd are not executed.
         - With arbitration server: Failed over, as expected. Fencing_cmd is not executed.

      6. Mirroring Controller is stopped on primary using mc_ctl stop -M /mc
         - Without arbitration server: No failover, as expected. Arbitration_cmd and Fencing_cmd are not executed.
         - With arbitration server: No failover, as expected. Fencing_cmd is not executed.

      7. Mirroring Controller is forcibly stopped on primary using mc_ctl stop -M /mc -e
         - Without arbitration server: No failover, as expected. Arbitration_cmd and Fencing_cmd are not executed. Database instance on primary is up and running.
         - With arbitration server: No failover, as expected. Fencing_cmd is not executed. Database instance on primary is up and running, without any crash event logged.

      8. Mirroring Controller processes (mc_agent and mc_keeper) are killed and Mirroring Controller is stopped, but database instance is kept running
         - Without arbitration server: No failover, as expected. Arbitration_cmd and Fencing_cmd are not executed. Database instance on primary is up and running.
         - With arbitration server: No failover, as expected. Fencing_cmd is not executed. Database instance on primary is up and running, without any crash event logged.

      9. Network error: Network (streaming and admin) is disrupted between primary and standby
         - Without arbitration server: Failed over, as expected. Arbitration_cmd and Fencing_cmd are executed, but the server cannot be fenced due to the network failure. Split-brain scenario is observed.
         - With arbitration server: Failed over, as expected. Fencing_cmd is executed, as it goes through the arbitration network. Split-brain scenario is prevented.

      10. Network error: Arbitration (eth1) network is disrupted, but streaming replication and the admin network are configured on another network (eth0)
         - Without arbitration server: Not applicable, as only one network is configurable when there is no arbitration server.
         - With arbitration server: No failover, as expected. Fencing_cmd is not executed.

      Test results with Mirroring Controller but without arbitration server

      We tested the following scenarios with Mirroring Controller but without an arbitration server. Execution logs are provided for reference.

      Scenario 1: Primary is stopped; Linux OS is shut down

      Result - Server status: Failed over, as expected. Arbitration_cmd and Fencing_cmd are executed.
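
      To trigger this scenario, the operating system on the primary was simply shut down; for example (run as root - the exact shutdown method used is an assumption):

      shutdown -h now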

      [fepuser@fep15-ha-server1: ~]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      switched

      server_id  host_role  host           host_status  db_proc_status        disk_status
      -----------------------------------------------------------------------------------
      server1    primary    192.168.100.5  normal       abnormal(wal_sender)  normal
      server2    unknown    192.168.100.6  abnormal     unknown               unknown
      [fepuser@fep15-ha-server1: ~]$
      [fepuser@fep15-ha-server1: ~]$
      [fepuser@fep15-ha-server1: ~]$ cat /mc/hb_chg.log
               Arbitration Command executed at: Thu Feb  1 16:58:58 AEDT 2024

               Fencing Command executed at: Thu Feb  1 16:58:58 AEDT 2024
      [fepuser@fep15-ha-server1: ~]$

      Standby (new primary) Mirroring Controller log

      Feb  1 16:58:55 fep15-ha-server1 MirroringControllerOpen[6225]: WARNING: detected an error on the monitored object "server(server2)": no response:ping timeout (MCA00019)
      Feb  1 16:58:56 fep15-ha-server1 MirroringControllerOpen[6225]: WARNING: detected an error on the monitored object "server(server2)": no response:ping timeout (MCA00019)
      Feb  1 16:58:58 fep15-ha-server1 MirroringControllerOpen[6225]: WARNING: detected an error on the monitored object "server(server2)": no response:ping timeout (MCA00019)
      Feb  1 16:58:58 fep15-ha-server1 MirroringControllerOpen[6225]: LOG: executing arbitration command and inquiring the result of arbitration (MCA00191)
      Feb  1 16:58:58 fep15-ha-server1 MirroringControllerOpen[6225]: LOG: starting degenerate because of result of arbitration command for database server "server2"  result:"0" (MCA00193)
      Feb  1 16:58:58 fep15-ha-server1 MirroringControllerOpen[6225]: LOG: executing fencing command (MCA00192)
      Feb  1 16:58:58 fep15-ha-server1 MirroringControllerOpen[6225]: LOG: fencing command for database server "server2" succeeded: result:"0" (MCA00198)
      Feb  1 16:59:00 fep15-ha-server1 MirroringControllerOpen[6225]: LOG: promotion processing completed (MCA00062)

      Standby (new primary) database log

      2024-02-01 16:58:58.719 AEDT [6256] LOG:  received promote request
      2024-02-01 16:58:58.720 AEDT [6259] FATAL:  terminating walreceiver process due to administrator command
      2024-02-01 16:58:58.720 AEDT [6256] LOG:  invalid record length at 0/18000060: wanted 24, got 0
      2024-02-01 16:58:58.720 AEDT [6256] LOG:  redo done at 0/18000028 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 97.78 s
      2024-02-01 16:58:58.758 AEDT [6256] LOG:  selected new timeline ID: 5
      2024-02-01 16:58:59.587 AEDT [6256] LOG:  archive recovery complete
      2024-02-01 16:59:00.178 AEDT [6254] LOG:  checkpoint starting: force
      2024-02-01 16:59:00.180 AEDT [6252] LOG:  database system is ready to accept connections
      2024-02-01 16:59:00.917 AEDT [6252] LOG:  received SIGHUP, reloading configuration files
      2024-02-01 16:59:03.274 AEDT [6254] LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.194 s, total=3.097 s; sync files=2, longest=0.099 s, average=0.097 s; distance=16384 kB, estimate=16384 kB

      Scenario 2: Primary database instance is brought down using pg_ctl stop

      Result: Failed over, as expected. Arbitration_cmd and Fencing_cmd are not executed.
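
      The instance on the primary was stopped directly with pg_ctl; the data directory below is the one used in this environment, and the fast shutdown mode matches the "received fast shutdown request" entry in the database log:

      pg_ctl -D /database/inst1 stop -m fast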

      [fepuser@fep15-ha-server1: ~]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      switched

      server_id  host_role                  host           host_status  db_proc_status        disk_status
      ---------------------------------------------------------------------------------------------------
      server1    primary                    192.168.100.5  normal       abnormal(wal_sender)  normal
      server2    none(inactivated primary)  192.168.100.6  normal       abnormal(postmaster)  normal
      [fepuser@fep15-ha-server1: ~]$

      Primary database log

      2024-02-01 16:30:13.928 AEDT [356137] LOG:  received fast shutdown request
      2024-02-01 16:30:13.934 AEDT [356137] LOG:  aborting any active transactions
      2024-02-01 16:30:13.935 AEDT [356146] LOG:  user profile status writer shutting down
      2024-02-01 16:30:13.940 AEDT [356137] LOG:  background worker "user profile status writer" (PID 356146) exited with exit code 1
      2024-02-01 16:30:13.940 AEDT [356137] LOG:  background worker "logical replication launcher" (PID 356147) exited with exit code 1
      2024-02-01 16:30:13.941 AEDT [356139] LOG:  shutting down
      2024-02-01 16:30:13.986 AEDT [356139] LOG:  checkpoint starting: shutdown immediate
      2024-02-01 16:30:14.018 AEDT [356139] LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.046 s; sync files=0, longest=0.000 s, average=0.000 s; distance=16383 kB, estimate=16383 kB
      2024-02-01 16:30:14.056 AEDT [356137] LOG:  database system is shut down

      Primary Mirroring Controller log

      Feb  1 16:30:14 fep15-ha-server2 MirroringControllerOpen[356515]: WARNING: detected an error on the monitored object "database process(postmaster)": no response:connection to server at "::1", port 27500 failed: Connection refused#012#011Is the server running on that host and accepting TCP/IP connections?#012 (MCA00019)
      Feb  1 16:30:14 fep15-ha-server2 MirroringControllerOpen[356515]: LOG: starting to fail over from server2 to server1 (MCA00021)
      Feb  1 16:30:14 fep15-ha-server2 MirroringControllerOpen[356515]: LOG: fail over completed.switched from server2 to server1 (MCA00022)

      Standby Mirroring Controller log

      Feb  1 16:30:14 fep15-ha-server1 MirroringControllerOpen[373364]: LOG: promotion processing completed (MCA00062)

      Scenario 3: Postmaster process is killed on primary

      Result: Failed over, as expected. Arbitration_cmd and Fencing_cmd are not executed.
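
      One way to kill the postmaster is to read its PID from postmaster.pid in the data directory and send it SIGKILL; the sketch below illustrates the idea (the exact method used in the test may have differed):

      # The first line of postmaster.pid holds the postmaster PID.
      kill -9 "$(head -1 /database/inst1/postmaster.pid)"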

      [fepuser@fep15-ha-server1: /mc]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      switched

      server_id  host_role                  host           host_status  db_proc_status        disk_status
      ---------------------------------------------------------------------------------------------------
      server1    none(inactivated primary)  192.168.100.5  normal       abnormal(postmaster)  normal
      server2    primary                    192.168.100.6  normal       abnormal(wal_sender)  normal
      [fepuser@fep15-ha-server1: /mc]$

      Primary Mirroring Controller log

      Feb  1 16:13:12 fep15-ha-server1 MirroringControllerOpen[371058]: WARNING: detected an error on the monitored object "database process(postmaster)": no response:connection to server at "::1", port 27500 failed: Connection refused#012#011Is the server running on that host and accepting TCP/IP connections?#012 (MCA00019)
      Feb  1 16:13:12 fep15-ha-server1 MirroringControllerOpen[371058]: LOG: starting to fail over from server1 to server2 (MCA00021)
      Feb  1 16:13:13 fep15-ha-server1 MirroringControllerOpen[371058]: LOG: fail over completed.switched from server1 to server2 (MCA00022)

      Standby Mirroring Controller log

      Feb  1 16:13:13 fep15-ha-server2 MirroringControllerOpen[352280]: LOG: promotion processing completed (MCA00062)

      Scenario 4: Storage error - Data Storage full on primary

      Result: Failed over, as expected. Arbitration_cmd and Fencing_cmd are not executed.
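
      The data storage destination was filled by loading data until the filesystem ran out of space. The "copy pgbench_accounts" statements in the database log below come from a pgbench initialization, roughly equivalent to the following (the database name and scale factor are assumptions - choose a scale large enough to exhaust the volume):

      pgbench -i -s 10000 -p 27500 postgres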

      Test case scenario execution log

      [fepuser@fep15-ha-server2: ~]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      switched

      server_id  host_role                  host           host_status  db_proc_status        disk_status
      --------------------------------------------------------------------------------------------------------------------------
      server1    none(inactivated primary)  192.168.100.5  normal       abnormal(postmaster)  abnormal(data,tran_log,tablespace)
      server2    primary                    192.168.100.6  normal       abnormal(wal_sender)  normal
      [fepuser@fep15-ha-server2: ~]$

      Primary log

      Feb  1 11:29:11 fep15-ha-server1 MirroringControllerOpen[5955]: WARNING: detected an error on the monitored object "data storage destination directory(/database/inst1)": read/write error:exception=class java.io.IOException message=No space left on device (MCA00019)
      Feb  1 11:29:11 fep15-ha-server1 MirroringControllerOpen[5955]: WARNING: detected an error on the monitored object "transaction log storage destination directory(/database/inst1/pg_wal)": read/write error:exception=class java.io.IOException message=No space left on device (MCA00019)
      Feb  1 11:29:11 fep15-ha-server1 MirroringControllerOpen[5955]: WARNING: detected an error on the monitored object "tablespace directory(/pg_tblspc/enc_tblspc)": read/write error:exception=class java.io.IOException message=No space left on device (MCA00019)
      Feb  1 11:29:11 fep15-ha-server1 rsyslogd[888]: imjournal: fopen() failed for path: '/var/lib/rsyslog/imjournal.state.tmp': No space left on device [v8.2102.0-5.el8 try https://www.rsyslog.com/e/2013 ]
      Feb  1 11:29:12 fep15-ha-server1 MirroringControllerOpen[5955]: WARNING: detected an error on the monitored object "tablespace directory(/pg_tblspc/enc_tblspc)": read/write error:exception=class java.io.IOException message=No space left on device (MCA00019)
      Feb  1 11:29:13 fep15-ha-server1 MirroringControllerOpen[5955]: WARNING: detected an error on the monitored object "tablespace directory(/pg_tblspc/enc_tblspc)": read/write error:exception=class java.io.IOException message=No space left on device (MCA00019)
      Feb  1 11:29:13 fep15-ha-server1 MirroringControllerOpen[5955]: LOG: starting to fail over from server1 to server2 (MCA00021)
      Feb  1 11:29:13 fep15-ha-server1 MirroringControllerOpen[5955]: LOG: stopped database instance forcibly (MCA00064)
      Feb  1 11:29:13 fep15-ha-server1 MirroringControllerOpen[5955]: LOG: fail over completed.switched from server1 to server2 (MCA00022)

      Standby (new primary) log

      Feb  1 11:29:13 fep15-ha-server2 MirroringControllerOpen[376691]: LOG: promotion processing completed (MCA00062)

      Primary database log

      2024-02-01 11:27:30.462 AEDT [9876] ERROR:  could not create file "base/5/16522_vm": No space left on device
      2024-02-01 11:27:30.462 AEDT [9876] CONTEXT:  COPY pgbench_accounts, line 1000
      2024-02-01 11:27:30.462 AEDT [9876] STATEMENT:  copy pgbench_accounts from stdin with (freeze on)
      2024-02-01 11:28:10.854 AEDT [5984] LOG:  checkpoint starting: time
      2024-02-01 11:28:13.244 AEDT [5984] LOG:  checkpoint complete: wrote 24 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=2.319 s, sync=0.032 s, total=2.390 s; sync files=24, longest=0.021 s, average=0.002 s; distance=202 kB, estimate=202 kB
      2024-02-01 11:29:09.692 AEDT [10377] WARNING:  could not extend file "base/5/16538", but retrying: No space left on device
      2024-02-01 11:29:09.692 AEDT [10377] CONTEXT:  COPY pgbench_accounts, line 2000
      2024-02-01 11:29:13.121 AEDT [5982] LOG:  received immediate shutdown request
      2024-02-01 11:29:13.136 AEDT [5982] LOG:  database system is shut down

      Scenario 5: Storage error - WAL directory full on primary

      Result: Failed over, as expected. Arbitration_cmd and Fencing_cmd are not executed.
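
      In this scenario the WAL directory sits on a small dedicated filesystem, which was then driven to 100% usage (the df output below shows the full filesystem). One simple way to exhaust such a filesystem directly is sketched here - the path and file name are assumptions:

      dd if=/dev/zero of=/run/pg_wal/fill.tmp bs=1M   # runs until "No space left on device"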

      [fepuser@fep15-ha-server2: /run/pg_wal]$ df -h .
      Filesystem      Size  Used Avail Use% Mounted on
      tmpfs           1.9G  1.9G  1.9M 100% /run
      [fepuser@fep15-ha-server2: /run/pg_wal]$


      [fepuser@fep15-ha-server1: /database/inst1/log]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      switched

      server_id  host_role                  host           host_status  db_proc_status        disk_status
      ---------------------------------------------------------------------------------------------------
      server1    primary                    192.168.100.5  normal       abnormal(wal_sender)  normal
      server2    none(inactivated primary)  192.168.100.6  normal       abnormal(postmaster)  normal
      [fepuser@fep15-ha-server1: /database/inst1/log]$

      Primary database log

      2024-02-01 12:32:49.594 AEDT [401613] PANIC:  could not write to file "pg_wal/xlogtemp.401613": No space left on device
      2024-02-01 12:32:49.594 AEDT [401613] CONTEXT:  writing block 4846 of relation base/5/24714
              COPY pgbench_accounts, line 421000
      2024-02-01 12:32:49.594 AEDT [401613] STATEMENT:  copy pgbench_accounts from stdin with (freeze on)
      2024-02-01 12:32:50.099 AEDT [400627] LOG:  server process (PID 401613) was terminated by signal 6: Aborted
      2024-02-01 12:32:50.099 AEDT [400627] DETAIL:  Failed process was running: copy pgbench_accounts from stdin with (freeze on)
      2024-02-01 12:32:50.099 AEDT [400627] LOG:  terminating any other active server processes
      2024-02-01 12:32:50.102 AEDT [400627] LOG:  all server processes terminated; reinitializing
      2024-02-01 12:32:50.202 AEDT [400627] LOG:  auto-open keystore has been opened
      2024-02-01 12:32:50.203 AEDT [401627] LOG:  database system was interrupted; last known up at 2024-02-01 12:29:44 AEDT
      2024-02-01 12:32:50.206 AEDT [401630] FATAL:  the database system is in recovery mode
      2024-02-01 12:32:50.493 AEDT [401633] FATAL:  the database system is in recovery mode
      2024-02-01 12:32:50.500 AEDT [401634] FATAL:  the database system is in recovery mode
      2024-02-01 12:32:50.507 AEDT [401635] FATAL:  the database system is in recovery mode
      2024-02-01 12:32:50.514 AEDT [401636] FATAL:  the database system is in recovery mode
      2024-02-01 12:32:51.248 AEDT [401627] LOG:  database system was not properly shut down; automatic recovery in progress
      2024-02-01 12:32:51.254 AEDT [401627] LOG:  redo starts at 0/140000A0
      2024-02-01 12:32:51.319 AEDT [401627] LOG:  redo done at 0/15FFEC50 system usage: CPU: user: 0.05 s, system: 0.01 s, elapsed: 0.06 s
      2024-02-01 12:32:51.323 AEDT [401639] FATAL:  the database system is not yet accepting connections
      2024-02-01 12:32:51.323 AEDT [401639] DETAIL:  Consistent recovery state has not been yet reached.
      2024-02-01 12:32:51.331 AEDT [401627] FATAL:  could not write to file "pg_wal/xlogtemp.401627": No space left on device
      2024-02-01 12:32:51.332 AEDT [401640] FATAL:  the database system is not yet accepting connections
      2024-02-01 12:32:51.332 AEDT [401640] DETAIL:  Consistent recovery state has not been yet reached.
      2024-02-01 12:32:51.334 AEDT [400627] LOG:  startup process (PID 401627) exited with exit code 1
      2024-02-01 12:32:51.334 AEDT [400627] LOG:  terminating any other active server processes
      2024-02-01 12:32:51.339 AEDT [401641] FATAL:  the database system is in recovery mode
      2024-02-01 12:32:51.341 AEDT [400627] LOG:  shutting down due to startup process failure
      2024-02-01 12:32:51.348 AEDT [400627] LOG:  database system is shut down

      Primary Mirroring Controller log

      Feb  1 12:32:50 fep15-ha-server2 MirroringControllerOpen[400600]: WARNING: detected an error on the monitored object "database process(postmaster)": no response:connection to server at "::1", port 27500 failed: FATAL:  the database system is in recovery mode#012 (MCA00019)
      Feb  1 12:32:51 fep15-ha-server2 MirroringControllerOpen[400600]: WARNING: detected an error on the monitored object "database process(postmaster)": no response:connection to server at "::1", port 27500 failed: FATAL:  the database system is in recovery mode#012 (MCA00019)
      Feb  1 12:32:51 fep15-ha-server2 MirroringControllerOpen[400600]: LOG: starting to fail over from server2 to server1 (MCA00021)
      Feb  1 12:32:51 fep15-ha-server2 MirroringControllerOpen[400600]: LOG: fail over completed.switched from server2 to server1 (MCA00022)

      Standby Mirroring Controller log

      Feb  1 12:32:51 fep15-ha-server1 MirroringControllerOpen[13212]: LOG: promotion processing completed (MCA00062)

      Scenario 6: Mirroring Controller is stopped on primary using mc_ctl stop -M /mc

      Result: No failover, as expected. Arbitration_cmd and Fencing_cmd are not executed.

      Primary Mirroring Controller log

      Feb  1 18:05:50 fep15-ha-server1 MirroringControllerOpen[11781]: LOG: stopping Mirroring Controller (MCA00041)
      Feb  1 18:05:50 fep15-ha-server1 MirroringControllerOpen[11781]: LOG: Mirroring Controller stopped  target server:"server1" (MCA00042)

      Standby Mirroring Controller log

      No event is recorded on the standby Mirroring Controller log.

      Primary database log

      2024-02-01 18:05:50.163 AEDT [11687] LOG:  received fast shutdown request
      2024-02-01 18:05:50.169 AEDT [11687] LOG:  aborting any active transactions
      2024-02-01 18:05:50.170 AEDT [11696] LOG:  user profile status writer shutting down
      2024-02-01 18:05:50.174 AEDT [11687] LOG:  background worker "user profile status writer" (PID 11696) exited with exit code 1
      2024-02-01 18:05:50.174 AEDT [11687] LOG:  background worker "logical replication launcher" (PID 11697) exited with exit code 1
      2024-02-01 18:05:50.175 AEDT [11689] LOG:  shutting down
      2024-02-01 18:05:50.275 AEDT [11689] LOG:  checkpoint starting: shutdown immediate
      2024-02-01 18:05:50.370 AEDT [11689] LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.002 s, sync=0.015 s, total=0.118 s; sync files=2, longest=0.009 s, average=0.008 s; distance=16384 kB, estimate=16384 kB
      2024-02-01 18:05:50.703 AEDT [11687] LOG:  database system is shut down

      Scenario 7: Mirroring Controller is forcibly stopped on primary using mc_ctl stop -M /mc -e

      Result: No failover, as expected. Arbitration_cmd and Fencing_cmd are not executed. Database instance on primary is up and running.

      [fepuser@fep15-ha-server1: /mc]$ mc_ctl stop -M /mc/ -e
      stopping Mirroring Controller forcibly (MCA00044)
      stopped Mirroring Controller forcibly (MCA00045)
      [fepuser@fep15-ha-server1: /mc]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      unknown

      server_id  host_role  host           host_status  db_proc_status  disk_status
      -----------------------------------------------------------------------------
      server1    unknown    192.168.100.5  normal       unknown         unknown
      server2    standby    192.168.100.6  normal       normal          normal
      [fepuser@fep15-ha-server1: /mc]$
      [fepuser@fep15-ha-server1: /mc]$ pg_ctl -D /database/inst1/ status
      pg_ctl: server is running (PID: 6252)
      /opt/fsepv15server64/bin/postgres "-D" "/database/inst1"
      [fepuser@fep15-ha-server1: /mc]$

      Primary database log

      No related event in the database log

      Primary Mirroring Controller log

      Feb  1 17:11:12 fep15-ha-server1 MirroringControllerOpen[10086]: LOG: stopping Mirroring Controller forcibly (MCA00044)
      Feb  1 17:11:12 fep15-ha-server1 MirroringControllerOpen[10086]: LOG: stopped Mirroring Controller forcibly (MCA00045)

      Standby Mirroring Controller log

      No related event in the Mirroring Controller log.

      Scenario 8: Mirroring Controller processes (mc_agent and mc_keeper) are killed and Mirroring Controller is stopped, but database instance is kept running

      Result: No failover, as expected. Arbitration_cmd and Fencing_cmd are not executed. Database instance on primary is up and running.
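
      The PIDs passed to kill in the log below are those of the mc_keeper and mc_agent processes on the primary, which can be found with ps, for example:

      ps -ef | grep -E 'mc_keeper|mc_agent' | grep -v grep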

      [fepuser@fep15-ha-server1: ~]$ date; kill -9 10992 10979
      Thu Feb  1 17:42:01 AEDT 2024
      [fepuser@fep15-ha-server1: ~]$
      [fepuser@fep15-ha-server1: ~]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      unknown

      server_id  host_role  host           host_status  db_proc_status  disk_status
      -----------------------------------------------------------------------------
      server1    unknown    192.168.100.5  normal       unknown         unknown
      server2    standby    192.168.100.6  normal       normal          normal
      [fepuser@fep15-ha-server1: ~]$

      Mirroring Controller log

      No event is recorded in the Mirroring Controller log of primary or standby servers.

      Scenario 9: Network error - Network is disrupted between primary and standby (interface is brought down using ifconfig eth1 down)

      Note: Mirroring Controller and replication are running on the same network

      Result - Server status: Failed over, as expected. Arbitration_cmd and Fencing_cmd are executed. Split-brain scenario is observed.

      [fsepuser@fep15-rhel8-ha-server1: ~]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      switchable

      server_id  host_role  host         host_status  db_proc_status  disk_status
      ---------------------------------------------------------------------------
      server1    primary    10.1.254.21  normal       normal          normal
      server2    standby    10.1.254.22  normal       normal          normal
      [fsepuser@fep15-rhel8-ha-server1: ~]$

      [fsepuser@fep15-rhel8-ha-server1: ~]$ ps -ef | grep fsepuser
      root      445079    1518  0 12:12 pts/0    00:00:00 su - fsepuser
      fsepuser  445080  445079  0 12:12 pts/0    00:00:00 -bash
      fsepuser  445260       1  0 12:12 pts/0    00:00:00 mc_keeper start -M /mc
      fsepuser  445273  445260  0 12:12 pts/0    00:00:02 mc_agent start -M /mc
      fsepuser  445300       1  0 12:12 ?        00:00:00 /opt/fsepv15server64/bin/postgres -D /database/inst1
      fsepuser  445301  445300  0 12:12 ?        00:00:00 postgres: logger
      fsepuser  445302  445300  0 12:12 ?        00:00:00 postgres: checkpointer
      fsepuser  445303  445300  0 12:12 ?        00:00:00 postgres: background writer
      fsepuser  445305  445300  0 12:12 ?        00:00:00 postgres: walwriter
      fsepuser  445306  445300  0 12:12 ?        00:00:00 postgres: autovacuum launcher
      fsepuser  445307  445300  0 12:12 ?        00:00:00 postgres: archiver last was 000000090000000000000037.00000028.backup
      fsepuser  445308  445300  0 12:12 ?        00:00:00 postgres: logical replication launcher
      fsepuser  445997  445300  0 12:15 ?        00:00:00 postgres: walsender repluser 10.1.0.21(60824) streaming 0/38000328
      fsepuser  446907  445080  0 12:18 pts/0    00:00:00 ps -ef
      fsepuser  446908  445080  0 12:18 pts/0    00:00:00 grep --color=auto fsepuser
      [fsepuser@fep15-rhel8-ha-server1: ~]$

      ------------------------------------------------------

      [fsepuser@fep15-rhel8-ha-server2: ~]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      switchable

      server_id  host_role  host         host_status  db_proc_status  disk_status
      ---------------------------------------------------------------------------
      server1    primary    10.1.254.21  normal       normal          normal
      server2    standby    10.1.254.22  normal       normal          normal
      [fsepuser@fep15-rhel8-ha-server2: ~]$
      [fsepuser@fep15-rhel8-ha-server2: ~]$ psql
      psql (15.0)
      Type "help" for help.

      postgres=# SHOW primary_conninfo ;
                                     primary_conninfo                                
      -------------------------------------------------------------------------------
      user=repluser password=Secret host=10.1.0.20 port=27500 application_name=stby
      (1 row)

      postgres=#


      [fsepuser@fep15-rhel8-ha-server2: ~]$ ps  -ef | grep fsepuser
      root        1556    1532  0 Feb06 pts/0    00:00:00 su - fsepuser
      fsepuser    1557    1556  0 Feb06 pts/0    00:00:00 -bash
      fsepuser  557223       1  0 12:15 pts/0    00:00:00 mc_keeper start -M /mc
      fsepuser  557236  557223  1 12:15 pts/0    00:00:02 mc_agent start -M /mc
      fsepuser  557263       1  0 12:15 ?        00:00:00 /opt/fsepv15server64/bin/postgres -D /database/inst1
      fsepuser  557264  557263  0 12:15 ?        00:00:00 postgres: logger
      fsepuser  557265  557263  0 12:15 ?        00:00:00 postgres: checkpointer
      fsepuser  557266  557263  0 12:15 ?        00:00:00 postgres: background writer
      fsepuser  557267  557263  0 12:15 ?        00:00:00 postgres: startup recovering 000000090000000000000038
      fsepuser  557268  557263  0 12:15 ?        00:00:00 postgres: walreceiver streaming 0/38000328
      fsepuser  558101    1557  0 12:18 pts/0    00:00:00 ps -ef
      fsepuser  558102    1557  0 12:18 pts/0    00:00:00 grep --color=auto fsepuser
      [fsepuser@fep15-rhel8-ha-server2: ~]$

      ===================================================================

      [root@fep15-rhel8-ha-server1 ~]# ifconfig eth1 down
      [root@fep15-rhel8-ha-server1 ~]# date
      Thu Feb  8 12:19:33 AEDT 2024
      [root@fep15-rhel8-ha-server1 ~]#

      [fsepuser@fep15-rhel8-ha-server1: /mc]$ ifconfig
      eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
              inet 10.1.0.20  netmask 255.255.255.0  broadcast 10.1.0.255
              inet6 fe80::222:48ff:fe7e:cac7  prefixlen 64  scopeid 0x20<link>
              ether 00:22:48:7e:ca:c7  txqueuelen 1000  (Ethernet)
              RX packets 465126  bytes 134442811 (128.2 MiB)
              RX errors 0  dropped 0  overruns 0  frame 0
              TX packets 570268  bytes 327462701 (312.2 MiB)
              TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

      lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
              inet 127.0.0.1  netmask 255.0.0.0
              inet6 ::1  prefixlen 128  scopeid 0x10<host>
              loop  txqueuelen 1000  (Local Loopback)
              RX packets 1549836  bytes 161595079 (154.1 MiB)
              RX errors 0  dropped 0  overruns 0  frame 0
              TX packets 1549836  bytes 161595079 (154.1 MiB)
              TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

      [fsepuser@fep15-rhel8-ha-server1: /mc]$

      ===================================================================

      [fsepuser@fep15-rhel8-ha-server1: /mc]$ mc_ctl status -M /mc/
      mc_ctl: The IP address or host name of the server where the command was executed in is not found on the definition file "/mc/network.conf" (MCA00051)
      [fsepuser@fep15-rhel8-ha-server1: /mc]$
      [fsepuser@fep15-rhel8-ha-server1: /mc]$ cat hb_chg.log
      Arbitration Command executed at: Thu Feb  8 12:19:32 AEDT 2024

      Fencing Command executed at: Thu Feb  8 12:19:32 AEDT 2024
      [fsepuser@fep15-rhel8-ha-server1: /mc]$

      ------------------------------------------------


      [fsepuser@fep15-rhel8-ha-server2: /mc]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      switched

      server_id  host_role  host         host_status  db_proc_status        disk_status
      ---------------------------------------------------------------------------------
      server1    unknown    10.1.254.21  abnormal     unknown               unknown
      server2    primary    10.1.254.22  normal       abnormal(wal_sender)  normal
      [fsepuser@fep15-rhel8-ha-server2: /mc]$ cat hb_chg.log
      Arbitration Command executed at: Thu Feb  8 12:19:32 AEDT 2024

      Fencing Command executed at: Thu Feb  8 12:19:32 AEDT 2024
      [fsepuser@fep15-rhel8-ha-server2: /mc]$

      Server1 Mirroring Controller log

      Feb  8 12:19:28 fep15-rhel8-ha-server1 MirroringControllerOpen[445273]: WARNING: detected an error on the monitored object "server(server2)": no response:ping timeout (MCA00019)
      Feb  8 12:19:30 fep15-rhel8-ha-server1 MirroringControllerOpen[445273]: WARNING: detected an error on the monitored object "server(server2)": no response:ping timeout (MCA00019)
      Feb  8 12:19:32 fep15-rhel8-ha-server1 MirroringControllerOpen[445273]: WARNING: detected an error on the monitored object "server(server2)": no response:ping timeout (MCA00019)
      Feb  8 12:19:32 fep15-rhel8-ha-server1 MirroringControllerOpen[445273]: LOG: executing arbitration command and inquiring the result of arbitration (MCA00191)
      Feb  8 12:19:32 fep15-rhel8-ha-server1 MirroringControllerOpen[445273]: LOG: starting degenerate because of result of arbitration command for database server "server2"  result:"0" (MCA00193)
      Feb  8 12:19:32 fep15-rhel8-ha-server1 MirroringControllerOpen[445273]: LOG: starting to detach standby server "server2" automatically (MCA00024)
      Feb  8 12:19:32 fep15-rhel8-ha-server1 MirroringControllerOpen[445273]: LOG: executing fencing command (MCA00192)
      Feb  8 12:19:32 fep15-rhel8-ha-server1 MirroringControllerOpen[445273]: LOG: fencing command for database server "server2" succeeded: result:"0" (MCA00198)
      Feb  8 12:19:32 fep15-rhel8-ha-server1 MirroringControllerOpen[445273]: LOG: detach standby server "server2" completed automatically (MCA00025)

      Feb  8 12:20:47 fep15-rhel8-ha-server1 MirroringControllerOpen[445273]: WARNING: detected an error on the monitored object "database process(postmaster)": no response:connection to server at "::1", port 27500 failed: fe_sendauth: no password supplied#012 (MCA00019)
      Feb  8 12:21:09 fep15-rhel8-ha-server1 MirroringControllerOpen[447660]: ERROR: mc_ctl: The IP address or host name of the server where the command was executed in is not found on the definition file "/mc/network.conf" (MCA00051)

      Server2 Mirroring Controller log

      Feb  8 12:19:28 fep15-rhel8-ha-server2 MirroringControllerOpen[557236]: WARNING: detected an error on the monitored object "server(server1)": no response:ping timeout (MCA00019)
      Feb  8 12:19:30 fep15-rhel8-ha-server2 MirroringControllerOpen[557236]: WARNING: detected an error on the monitored object "server(server1)": no response:ping timeout (MCA00019)
      Feb  8 12:19:32 fep15-rhel8-ha-server2 MirroringControllerOpen[557236]: WARNING: detected an error on the monitored object "server(server1)": no response:ping timeout (MCA00019)
      Feb  8 12:19:32 fep15-rhel8-ha-server2 MirroringControllerOpen[557236]: LOG: executing arbitration command and inquiring the result of arbitration (MCA00191)
      Feb  8 12:19:32 fep15-rhel8-ha-server2 MirroringControllerOpen[557236]: LOG: starting degenerate because of result of arbitration command for database server "server1"  result:"0" (MCA00193)
      Feb  8 12:19:32 fep15-rhel8-ha-server2 MirroringControllerOpen[557236]: LOG: executing fencing command (MCA00192)
      Feb  8 12:19:32 fep15-rhel8-ha-server2 MirroringControllerOpen[557236]: LOG: fencing command for database server "server1" succeeded: result:"0" (MCA00198)
      Feb  8 12:19:32 fep15-rhel8-ha-server2 MirroringControllerOpen[557236]: LOG: promotion processing completed (MCA00062)

      Primary database log

      2024-02-08 12:19:32.252 AEDT [445997] FATAL:  terminating connection due to administrator command
      2024-02-08 12:19:32.252 AEDT [445997] STATEMENT:  START_REPLICATION SLOT "repl_slot1" 0/38000000 TIMELINE 9 COMPRESS 0
      2024-02-08 12:19:32.264 AEDT [445300] LOG:  received SIGHUP, reloading configuration files
      2024-02-08 12:19:32.264 AEDT [445300] LOG:  parameter "synchronous_standby_names" removed from configuration file, reset to default

      Standby database log

      2024-02-08 12:19:32.252 AEDT [557268] FATAL:  could not receive data from WAL stream: FATAL:  terminating connection due to administrator command
      2024-02-08 12:19:32.252 AEDT [557267] LOG:  invalid record length at 0/38000410: wanted 24, got 0
      2024-02-08 12:19:32.279 AEDT [558371] LOG:  started streaming WAL from primary at 0/38000000 on timeline 9
      2024-02-08 12:19:32.495 AEDT [557267] LOG:  received promote request
      2024-02-08 12:19:32.495 AEDT [558371] FATAL:  terminating walreceiver process due to administrator command
      2024-02-08 12:19:32.495 AEDT [557267] LOG:  waiting for WAL to become available at 0/38000428
      2024-02-08 12:19:32.495 AEDT [557267] LOG:  redo done at 0/380003D8 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 247.21 s
      2024-02-08 12:19:32.495 AEDT [557267] LOG:  last completed transaction was at log time 2024-02-08 12:16:30.382893+11
      2024-02-08 12:19:32.502 AEDT [557267] LOG:  selected new timeline ID: 10
      2024-02-08 12:19:32.578 AEDT [557267] LOG:  archive recovery complete
      2024-02-08 12:19:32.594 AEDT [557265] LOG:  checkpoint starting: force
      2024-02-08 12:19:32.597 AEDT [557263] LOG:  database system is ready to accept connections
      2024-02-08 12:19:32.606 AEDT [557263] LOG:  received SIGHUP, reloading configuration files
      2024-02-08 12:19:32.723 AEDT [557265] LOG:  checkpoint complete: wrote 5 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.105 s, sync=0.007 s, total=0.130 s; sync files=4, longest=0.003 s, average=0.002 s; distance=16385 kB, estimate=16385 kB

      ===================================================================

      [root@fep15-rhel8-ha-server1 ~]# ifconfig eth1 up
      [root@fep15-rhel8-ha-server1 ~]# date
      Thu Feb  8 12:34:06 AEDT 2024
      [root@fep15-rhel8-ha-server1 ~]#
      [root@fep15-rhel8-ha-server1 ~]# su - fsepuser
      Last login: Thu Feb  8 12:20:36 AEDT 2024 on pts/0
      [fsepuser@fep15-rhel8-ha-server1: ~]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      not-switchable

      server_id  host_role  host         host_status  db_proc_status        disk_status
      ---------------------------------------------------------------------------------
      server1    primary    10.1.254.21  normal       abnormal(wal_sender)  normal
      server2    primary    10.1.254.22  normal       abnormal(wal_sender)  normal
      [fsepuser@fep15-rhel8-ha-server1: ~]$

      Scenario 10: Network error - Arbitration network is disrupted

      This scenario is not applicable, as only one network is configurable when there is no arbitration server.

      Test results with Mirroring Controller and arbitration server

      Scenario 1: Primary is stopped; Linux OS is shut down

      Result - Server status: Failed over, as expected. Fencing_cmd is executed.

      [fsepuser@fep15-rhel8-ha-server2: /mc]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      switched

      server_id  host_role  host       host_status  db_proc_status        disk_status
      -------------------------------------------------------------------------------
      server1    unknown    10.1.0.20  abnormal     unknown               unknown
      server2    primary    10.1.0.21  normal       abnormal(wal_sender)  normal
      [fsepuser@fep15-rhel8-ha-server2: /mc]$
      [fsepuser@fep15-rhel8-ha-arbiter: ~]$ cat /mc/hb_chg.log

      Fencing Command executed at: Mon Feb  5 01:21:13 UTC 2024
      [fsepuser@fep15-rhel8-ha-arbiter: ~]$

      Standby (new primary) Mirroring Controller log

      Feb  5 01:21:08 fep15-rhel8-ha-server2 MirroringControllerOpen[1293691]: WARNING: detected an error on the monitored object "server(server1)": no response:ping timeout (MCA00019)
      Feb  5 01:21:10 fep15-rhel8-ha-server2 MirroringControllerOpen[1293691]: WARNING: detected an error on the monitored object "server(server1)": no response:ping timeout (MCA00019)
      Feb  5 01:21:12 fep15-rhel8-ha-server2 MirroringControllerOpen[1293691]: WARNING: detected an error on the monitored object "server(server1)": no response:ping timeout (MCA00019)
      Feb  5 01:21:12 fep15-rhel8-ha-server2 MirroringControllerOpen[1293691]: LOG: requesting arbitration server "arbiter" to arbitrate for database server "server1" (MCA00159)
      Feb  5 01:21:13 fep15-rhel8-ha-server2 MirroringControllerOpen[1293691]: LOG: the arbitration server "arbiter" didn't arbitrate status of database server "server1" as sanity (MCA00197)
      Feb  5 01:21:13 fep15-rhel8-ha-server2 MirroringControllerOpen[1293691]: LOG: promotion processing completed (MCA00062)

      Arbiter Mirroring Controller log

      Feb  5 01:21:04 fep15-rhel8-ha-arbiter MirroringControllerArbiter[2468428]: LOG: disconnected from the database server "server1" (MCR00076)
      Feb  5 01:21:05 fep15-rhel8-ha-arbiter MirroringControllerArbiter[2468428]: LOG: stop monitoring of database server "server1" (MCR00098)
      Feb  5 01:21:09 fep15-rhel8-ha-arbiter MirroringControllerArbiter[2468428]: WARNING: detected an error on database server "server1" (MCR00099)
      Feb  5 01:21:11 fep15-rhel8-ha-arbiter MirroringControllerArbiter[2468428]: WARNING: detected an error on database server "server1" (MCR00099)
      Feb  5 01:21:12 fep15-rhel8-ha-arbiter MirroringControllerArbiter[2468428]: LOG: database server "server2" requested to arbitrate for database server "server1" status (MCR00056)
      Feb  5 01:21:12 fep15-rhel8-ha-arbiter MirroringControllerArbiter[2468428]: LOG: arbitrating the status of database server "server1" (MCR00057)
      Feb  5 01:21:13 fep15-rhel8-ha-arbiter MirroringControllerArbiter[2468428]: WARNING: detected an error on database server "server1" (MCR00099)
      Feb  5 01:21:13 fep15-rhel8-ha-arbiter MirroringControllerArbiter[2468428]: LOG: could not receive response from database server server1 (MCR00059)
      Feb  5 01:21:13 fep15-rhel8-ha-arbiter MirroringControllerArbiter[2468428]: LOG: executing fencing command (MCR00062)
      Feb  5 01:21:13 fep15-rhel8-ha-arbiter MirroringControllerArbiter[2468428]: LOG: fencing command for database server "server1" succeeded: result:"0" (MCR00063)

      Standby (new primary) database log

      2024-02-05 12:21:03.896 AEDT [1293726] FATAL:  could not receive data from WAL stream: FATAL:  terminating connection due to administrator command
      2024-02-05 12:21:03.897 AEDT [1293725] LOG:  invalid record length at 0/1A0003D8: wanted 24, got 0
      2024-02-05 12:21:03.977 AEDT [1295256] FATAL:  could not connect to the primary server: connection to server at "10.1.254.21", port 27500 failed: FATAL:  the database system is shutting down
      2024-02-05 12:21:03.977 AEDT [1293725] LOG:  waiting for WAL to become available at 0/1A0003F0
      2024-02-05 12:21:13.078 AEDT [1293725] LOG:  received promote request
      2024-02-05 12:21:13.078 AEDT [1295284] FATAL:  terminating walreceiver process due to administrator command
      2024-02-05 12:21:13.078 AEDT [1293725] LOG:  waiting for WAL to become available at 0/1A0003F0
      2024-02-05 12:21:13.078 AEDT [1293725] LOG:  redo done at 0/1A0003A0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 327.01 s
      2024-02-05 12:21:13.078 AEDT [1293725] LOG:  last completed transaction was at log time 2024-02-05 12:17:26.334037+11
      2024-02-05 12:21:13.084 AEDT [1293725] LOG:  selected new timeline ID: 4
      2024-02-05 12:21:13.181 AEDT [1293725] LOG:  archive recovery complete
      2024-02-05 12:21:13.220 AEDT [1293723] LOG:  checkpoint starting: force
      2024-02-05 12:21:13.224 AEDT [1293721] LOG:  database system is ready to accept connections
      2024-02-05 12:21:13.247 AEDT [1293723] LOG:  checkpoint complete: wrote 2 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.006 s, total=0.028 s; sync files=2, longest=0.004 s, average=0.003 s; distance=0 kB, estimate=14745 kB
      2024-02-05 12:21:13.285 AEDT [1293721] LOG:  received SIGHUP, reloading configuration files

      Scenario 2: Primary database instance is brought down using pg_ctl stop

      Result - Server status: Failed over, as expected.

      [fsepuser@fep15-rhel8-ha-server2: ~]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      switched

      server_id  host_role                  host       host_status  db_proc_status        disk_status
      -----------------------------------------------------------------------------------------------
      server1    primary                    10.1.0.20  normal       abnormal(wal_sender)  normal
      server2    none(inactivated primary)  10.1.0.21  normal       abnormal(postmaster)  normal
      [fsepuser@fep15-rhel8-ha-server2: ~]$

      Primary database log

      2024-02-05 12:44:10.521 AEDT [1293721] LOG:  received fast shutdown request
      2024-02-05 12:44:10.528 AEDT [1293721] LOG:  aborting any active transactions
      2024-02-05 12:44:10.533 AEDT [1293721] LOG:  background worker "logical replication launcher" (PID 1295332) exited with exit code 1
      2024-02-05 12:44:10.534 AEDT [1293723] LOG:  shutting down
      2024-02-05 12:44:10.667 AEDT [1293723] LOG:  checkpoint starting: shutdown immediate
      2024-02-05 12:44:10.718 AEDT [1293723] LOG:  checkpoint complete: wrote 5 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.018 s, total=0.055 s; sync files=5, longest=0.016 s, average=0.004 s; distance=16383 kB, estimate=16384 kB
      2024-02-05 12:44:10.805 AEDT [1301779] FATAL:  the database system is shutting down
      2024-02-05 12:44:10.811 AEDT [1301780] FATAL:  the database system is shutting down
      2024-02-05 12:44:10.816 AEDT [1301781] FATAL:  the database system is shutting down
      2024-02-05 12:44:10.822 AEDT [1301782] FATAL:  the database system is shutting down
      2024-02-05 12:44:10.935 AEDT [1293721] LOG:  database system is shut down

      Standby (New Primary) log

      2024-02-05 12:44:10.931 AEDT [6723] LOG:  replication terminated by primary server
      2024-02-05 12:44:10.931 AEDT [6723] DETAIL:  End of WAL reached on timeline 4 at 0/1D0000A0.
      2024-02-05 12:44:10.931 AEDT [6723] FATAL:  could not send end-of-streaming message to primary: server closed the connection unexpectedly
                      This probably means the server terminated abnormally
                      before or while processing the request.
              no COPY in progress
      2024-02-05 12:44:10.932 AEDT [6722] LOG:  invalid record length at 0/1D0000A0: wanted 24, got 0
      2024-02-05 12:44:10.940 AEDT [7336] FATAL:  could not connect to the primary server: connection to server at "10.1.0.21", port 27500 failed: Connection refused
                      Is the server running on that host and accepting TCP/IP connections?
      2024-02-05 12:44:10.940 AEDT [6722] LOG:  waiting for WAL to become available at 0/1D0000B8
      2024-02-05 12:44:11.651 AEDT [6722] LOG:  received promote request
      2024-02-05 12:44:11.651 AEDT [6722] LOG:  redo done at 0/1D000028 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 138.53 s
      2024-02-05 12:44:11.651 AEDT [6722] LOG:  last completed transaction was at log time 2024-02-05 12:42:56.459462+11
      2024-02-05 12:44:11.657 AEDT [6722] LOG:  selected new timeline ID: 5
      2024-02-05 12:44:11.789 AEDT [6722] LOG:  archive recovery complete
      2024-02-05 12:44:11.825 AEDT [6720] LOG:  checkpoint starting: force
      2024-02-05 12:44:11.830 AEDT [6718] LOG:  database system is ready to accept connections
      2024-02-05 12:44:11.857 AEDT [6718] LOG:  received SIGHUP, reloading configuration files
      2024-02-05 12:44:11.980 AEDT [6720] LOG:  checkpoint complete: wrote 5 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.125 s, sync=0.013 s, total=0.155 s; sync files=4, longest=0.006 s, average=0.004 s; distance=16384 kB, estimate=16384 kB

      Primary Mirroring Controller log

      Feb  5 01:44:10 fep15-rhel8-ha-server2 MirroringControllerOpen[1293691]: WARNING: detected an error on the monitored object "database process(postmaster)": no response:connection to server at "::1", port 27500 failed: FATAL:  the database system is shutting down#012 (MCA00019)
      Feb  5 01:44:11 fep15-rhel8-ha-server2 MirroringControllerOpen[1293691]: WARNING: detected an error on the monitored object "database process(postmaster)": no response:connection to server at "::1", port 27500 failed: Connection refused#012#011Is the server running on that host and accepting TCP/IP connections?#012 (MCA00019)
      Feb  5 01:44:11 fep15-rhel8-ha-server2 MirroringControllerOpen[1293691]: LOG: starting to fail over from server2 to server1 (MCA00021)
      Feb  5 01:44:11 fep15-rhel8-ha-server2 MirroringControllerOpen[1293691]: LOG: fail over completed.switched from server2 to server1 (MCA00022)

      Standby Mirroring Controller log

      Feb  5 01:44:11 fep15-rhel8-ha-server1 MirroringControllerOpen[6688]: LOG: promotion processing completed (MCA00062)

      Arbiter log

      No event is recorded in the arbiter Mirroring Controller log

      Scenario 3: Postmaster process is killed on primary

      Result - Server status: Failed over, as expected.

      [fsepuser@fep15-rhel8-ha-server2: ~]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      switched

      server_id  host_role                  host       host_status  db_proc_status        disk_status
      -----------------------------------------------------------------------------------------------
      server1    none(inactivated primary)  10.1.0.20  normal       abnormal(postmaster)  normal
      server2    primary                    10.1.0.21  normal       abnormal(wal_sender)  normal
      [fsepuser@fep15-rhel8-ha-server2: ~]$

      Primary Mirroring Controller log

      Feb  5 01:58:32 fep15-rhel8-ha-server1 MirroringControllerOpen[6688]: WARNING: detected an error on the monitored object "database process(postmaster)": no response:connection to server at "::1", port 27500 failed: Connection refused#012#011Is the server running on that host and accepting TCP/IP connections?#012 (MCA00019)
      Feb  5 01:58:32 fep15-rhel8-ha-server1 MirroringControllerOpen[6688]: LOG: starting to fail over from server1 to server2 (MCA00021)
      Feb  5 01:58:32 fep15-rhel8-ha-server1 MirroringControllerOpen[6688]: LOG: fail over completed.switched from server1 to server2 (MCA00022)

      Standby Mirroring Controller log

      Feb  5 01:58:32 fep15-rhel8-ha-server2 MirroringControllerOpen[1302592]: LOG: promotion processing completed (MCA00062)

      Arbiter log

      No event is recorded in the arbiter Mirroring Controller log

      Scenario 4: Storage error - Data Storage full on primary

      Result - Server status: Failed over, as expected.

      Test case scenario execution log

      [fsepuser@fep15-rhel8-ha-server1: ~]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      switched

      server_id  host_role                  host       host_status  db_proc_status        disk_status
      -----------------------------------------------------------------------------------------------------------
      server1    primary                    10.1.0.20  normal       abnormal(wal_sender)  normal
      server2    none(inactivated primary)  10.1.0.21  normal       abnormal(postmaster)  abnormal(data,tran_log)
      [fsepuser@fep15-rhel8-ha-server1: ~]$

      Primary Mirroring Controller log

      Feb  5 02:22:54 fep15-rhel8-ha-server2 MirroringControllerOpen[1302592]: WARNING: detected an error on the monitored object "data storage destination directory(/database/inst1)": read/write error:exception=class java.io.IOException message=No space left on device (MCA00019)
      Feb  5 02:22:54 fep15-rhel8-ha-server2 MirroringControllerOpen[1302592]: WARNING: detected an error on the monitored object "transaction log storage destination directory(/database/inst1/pg_wal)": read/write error:exception=class java.io.IOException message=No space left on device (MCA00019)
      Feb  5 02:22:55 fep15-rhel8-ha-server2 MirroringControllerOpen[1302592]: WARNING: detected an error on the monitored object "data storage destination directory(/database/inst1)": read/write error:exception=class java.io.IOException message=No space left on device (MCA00019)
      Feb  5 02:22:55 fep15-rhel8-ha-server2 MirroringControllerOpen[1302592]: WARNING: detected an error on the monitored object "transaction log storage destination directory(/database/inst1/pg_wal)": read/write error:exception=class java.io.IOException message=No space left on device (MCA00019)
      Feb  5 02:22:55 fep15-rhel8-ha-server2 MirroringControllerOpen[1302592]: LOG: starting to fail over from server2 to server1 (MCA00021)
      Feb  5 02:22:55 fep15-rhel8-ha-server2 MirroringControllerOpen[1302592]: LOG: stopped database instance forcibly (MCA00064)
      Feb  5 02:22:55 fep15-rhel8-ha-server2 rsyslogd[1051]: imjournal: fopen() failed for path: '/var/lib/rsyslog/imjournal.state.tmp': No space left on device [v8.2102.0-5.el8 try https://www.rsyslog.com/e/2013 ]
      Feb  5 02:22:55 fep15-rhel8-ha-server2 MirroringControllerOpen[1302592]: LOG: fail over completed.switched from server2 to server1 (MCA00022)

      Standby (new primary) Mirroring Controller log

      Feb  5 02:22:55 fep15-rhel8-ha-server1 MirroringControllerOpen[11773]: LOG: promotion processing completed (MCA00062)

      Arbiter log

      No event is recorded in the arbiter Mirroring Controller log

      Primary database log

      2024-02-05 13:22:55.591 AEDT [1302622] LOG:  received immediate shutdown request
      2024-02-05 13:22:55.637 AEDT [1302622] LOG:  database system is shut down

      Standby (New primary) log

      2024-02-05 13:22:55.628 AEDT [11808] FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
                      This probably means the server terminated abnormally
                      before or while processing the request.
      2024-02-05 13:22:55.629 AEDT [11807] LOG:  invalid record length at 0/21000148: wanted 24, got 0
      2024-02-05 13:22:55.643 AEDT [15753] FATAL:  could not connect to the primary server: connection to server at "10.1.0.21", port 27500 failed: server closed the connection unexpectedly
                      This probably means the server terminated abnormally
                      before or while processing the request.
      2024-02-05 13:22:55.643 AEDT [11807] LOG:  waiting for WAL to become available at 0/21000160
      2024-02-05 13:22:55.700 AEDT [11807] LOG:  received promote request
      2024-02-05 13:22:55.700 AEDT [11807] LOG:  redo done at 0/21000110 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 853.66 s
      2024-02-05 13:22:55.703 AEDT [11807] LOG:  selected new timeline ID: 7
      2024-02-05 13:22:55.795 AEDT [11807] LOG:  archive recovery complete
      2024-02-05 13:22:55.811 AEDT [11805] LOG:  checkpoint starting: force
      2024-02-05 13:22:55.814 AEDT [11803] LOG:  database system is ready to accept connections
      2024-02-05 13:22:55.845 AEDT [11805] LOG:  checkpoint complete: wrote 2 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.009 s, total=0.035 s; sync files=2, longest=0.005 s, average=0.005 s; distance=0 kB, estimate=14745 kB
      2024-02-05 13:22:55.909 AEDT [11803] LOG:  received SIGHUP, reloading configuration files

      Scenario 5: Storage error - WAL directory full on primary

      Result - Server status: Failed over, as expected.

      [fsepuser@fep15-rhel8-ha-server1: ~]$ df -h /mnt/resource/pg_wal/
      Filesystem      Size  Used Avail Use% Mounted on
      /dev/sdb1       7.9G  7.4G     0 100% /mnt/resource
      [fsepuser@fep15-rhel8-ha-server1: ~]$
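      The exact steps used to fill the WAL file system are not shown above. As a rough reproduction sketch - assuming /mnt/resource is a dedicated mount hosting pg_wal, as the df output suggests, and using a throw-away fill file and a pgbench data load as illustrative stand-ins - the condition can be recreated like this:

      # Illustrative only: leave very little free space on the file system that hosts
      # pg_wal (the fill size is a placeholder; dd stops on its own if the device fills up)
      dd if=/dev/zero of=/mnt/resource/fill.tmp bs=1M count=7000
      df -h /mnt/resource/pg_wal/

      # Then drive WAL-heavy writes until pg_wal can no longer grow; the COPY
      # pgbench_accounts statements in the database log below come from a pgbench
      # data load (database name and scale are placeholders)
      pgbench -i -s 100 -p 27500 <dbname>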

      [fsepuser@fep15-rhel8-ha-server1: ~]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      switched

      server_id  host_role                  host       host_status  db_proc_status        disk_status
      -----------------------------------------------------------------------------------------------
      server1    none(inactivated primary)  10.1.0.20  normal       abnormal(postmaster)  normal
      server2    primary                    10.1.0.21  normal       abnormal(wal_sender)  normal
      [fsepuser@fep15-rhel8-ha-server1: ~]$

      Primary database log

      2024-02-05 15:13:30.093 AEDT [21638] PANIC:  could not write to file "pg_wal/xlogtemp.21638": No space left on device
      2024-02-05 15:13:30.093 AEDT [21638] CONTEXT:  writing block 6186 of relation base/5/16409
              COPY pgbench_accounts, line 503000
      2024-02-05 15:13:30.093 AEDT [21638] STATEMENT:  copy pgbench_accounts from stdin with (freeze on)
      2024-02-05 15:13:32.410 AEDT [19756] LOG:  server process (PID 21638) was terminated by signal 6: Aborted
      2024-02-05 15:13:32.410 AEDT [19756] DETAIL:  Failed process was running: copy pgbench_accounts from stdin with (freeze on)
      2024-02-05 15:13:32.410 AEDT [19756] LOG:  terminating any other active server processes
      2024-02-05 15:13:32.414 AEDT [19756] LOG:  all server processes terminated; reinitializing
      2024-02-05 15:13:33.285 AEDT [21674] LOG:  database system was interrupted; last known up at 2024-02-05 15:13:26 AEDT
      2024-02-05 15:13:33.288 AEDT [21678] FATAL:  the database system is in recovery mode
      2024-02-05 15:13:33.295 AEDT [21677] FATAL:  the database system is in recovery mode
      2024-02-05 15:13:33.298 AEDT [21679] FATAL:  the database system is in recovery mode
      2024-02-05 15:13:33.302 AEDT [21680] FATAL:  the database system is in recovery mode
      2024-02-05 15:13:33.308 AEDT [21681] FATAL:  the database system is in recovery mode
      2024-02-05 15:13:33.634 AEDT [21674] LOG:  database system was not properly shut down; automatic recovery in progress
      2024-02-05 15:13:33.640 AEDT [21674] LOG:  redo starts at 0/25000338
      2024-02-05 15:13:33.770 AEDT [21674] LOG:  missing contrecord at 0/27FFEDF0
      2024-02-05 15:13:33.770 AEDT [21674] LOG:  redo done at 0/27FFEDB0 system usage: CPU: user: 0.06 s, system: 0.06 s, elapsed: 0.13 s
      2024-02-05 15:13:33.772 AEDT [21674] FATAL:  could not write to file "pg_wal/xlogtemp.21674": No space left on device
      2024-02-05 15:13:33.778 AEDT [19756] LOG:  startup process (PID 21674) exited with exit code 1
      2024-02-05 15:13:33.778 AEDT [19756] LOG:  terminating any other active server processes
      2024-02-05 15:13:33.779 AEDT [19756] LOG:  shutting down due to startup process failure
      2024-02-05 15:13:33.788 AEDT [19756] LOG:  database system is shut down

      Primary Mirroring Controller log

      Feb  5 04:13:33 fep15-rhel8-ha-server1 MirroringControllerOpen[19726]: WARNING: detected an error on the monitored object "database process(postmaster)": no response:connection to server at "::1", port 27500 failed: FATAL:  the database system is in recovery mode#012 (MCA00019)
      Feb  5 04:13:34 fep15-rhel8-ha-server1 MirroringControllerOpen[19726]: WARNING: detected an error on the monitored object "database process(postmaster)": no response:connection to server at "::1", port 27500 failed: Connection refused#012#011Is the server running on that host and accepting TCP/IP connections?#012 (MCA00019)
      Feb  5 04:13:34 fep15-rhel8-ha-server1 MirroringControllerOpen[19726]: LOG: starting to fail over from server1 to server2 (MCA00021)
      Feb  5 04:13:34 fep15-rhel8-ha-server1 MirroringControllerOpen[19726]: LOG: fail over completed.switched from server1 to server2 (MCA00022)

      Standby Mirroring Controller log

      Feb  5 04:13:34 fep15-rhel8-ha-server2 MirroringControllerOpen[1310904]: LOG: promotion processing completed (MCA00062)

      Arbiter log

      No event is recorded in the arbiter Mirroring Controller log.

      Scenario 6: Mirroring Controller is stopped on primary using mc_ctl stop -M /mc

      Result - Server status: No failover
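      As a minimal sketch of how this scenario is driven (using the same management directory as the transcripts in this post), note from the primary database log further below that a plain mc_ctl stop also triggers a fast shutdown of the database instance:

      # On the primary (server2): stop Mirroring Controller normally; the primary
      # database log below shows the instance receiving a fast shutdown request
      # at the same moment
      mc_ctl stop -M /mc/

      # Cluster status now reports the stopped server as unknown
      mc_ctl status -M /mc/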

      [fsepuser@fep15-rhel8-ha-server2: ~]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      unknown

      server_id  host_role  host       host_status  db_proc_status          disk_status
      ---------------------------------------------------------------------------------
      server1    standby    10.1.0.20  normal       abnormal(wal_receiver)  normal
      server2    unknown    10.1.0.21  normal       unknown                 unknown
      [fsepuser@fep15-rhel8-ha-server2: ~]$

      Primary Mirroring Controller log

      Feb  5 04:52:38 fep15-rhel8-ha-server2 MirroringControllerOpen[1346327]: LOG: stopping Mirroring Controller (MCA00041)
      Feb  5 04:52:38 fep15-rhel8-ha-server2 MirroringControllerOpen[1310904]: LOG: disconnected from arbitration server "arbiter" (MCA00153)
      Feb  5 04:52:38 fep15-rhel8-ha-server2 MirroringControllerOpen[1346327]: LOG: Mirroring Controller stopped  target server:"server2" (MCA00042)

      Standby Mirroring Controller log

      No event is recorded in the standby Mirroring Controller log.

      Arbiter log

      Feb  5 04:52:38 fep15-rhel8-ha-arbiter MirroringControllerArbiter[2468428]: LOG: disconnected from the database server "server2" (MCR00076)
      Feb  5 04:52:38 fep15-rhel8-ha-arbiter MirroringControllerArbiter[2468428]: LOG: stop monitoring of database server "server2" (MCR00098)

      Primary database log

      2024-02-05 15:52:38.299 AEDT [1310934] LOG:  received fast shutdown request
      2024-02-05 15:52:38.303 AEDT [1310934] LOG:  aborting any active transactions
      2024-02-05 15:52:38.307 AEDT [1310934] LOG:  background worker "logical replication launcher" (PID 1335548) exited with exit code 1
      2024-02-05 15:52:38.309 AEDT [1310936] LOG:  shutting down
      2024-02-05 15:52:38.414 AEDT [1310936] LOG:  checkpoint starting: shutdown immediate
      2024-02-05 15:52:38.443 AEDT [1310936] LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.001 s, total=0.034 s; sync files=0, longest=0.000 s, average=0.000 s; distance=16383 kB, estimate=33059 kB
      2024-02-05 15:52:38.706 AEDT [1310934] LOG:  database system is shut down

      Scenario 7: Mirroring Controller is forcibly stopped on primary using mc_ctl stop -M /mc -e

      Result - The database instance on the primary remains up and running, and no Mirroring Controller crash event is logged. Server status: No failover

      [fsepuser@fep15-rhel8-ha-server2: ~]$ mc_ctl stop -M /mc/ -e
      stopping Mirroring Controller forcibly (MCA00044)
      stopped Mirroring Controller forcibly (MCA00045)
      [fsepuser@fep15-rhel8-ha-server2: ~]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      unknown

      server_id  host_role  host       host_status  db_proc_status  disk_status
      -------------------------------------------------------------------------
      server1    standby    10.1.0.20  normal       normal          normal
      server2    unknown    10.1.0.21  normal       unknown         unknown
      [fsepuser@fep15-rhel8-ha-server2: ~]$
      [fsepuser@fep15-rhel8-ha-server2: ~]$ pg_ctl -D /database/inst1/ status
      pg_ctl: server is running (PID: 1346548)
      /opt/fsepv15server64/bin/postgres "-D" "/database/inst1"
      [fsepuser@fep15-rhel8-ha-server2: ~]$

      Primary database log

      No related event in the database log.

      Primary Mirroring Controller log

      Feb  5 05:00:23 fep15-rhel8-ha-server2 MirroringControllerOpen[1346890]: LOG: stopping Mirroring Controller forcibly (MCA00044)
      Feb  5 05:00:23 fep15-rhel8-ha-server2 MirroringControllerOpen[1346890]: LOG: stopped Mirroring Controller forcibly (MCA00045)

      Standby Mirroring Controller log

      No related event in the Mirroring Controller log.

      Arbiter log

      Feb  5 05:00:23 fep15-rhel8-ha-arbiter MirroringControllerArbiter[2468428]: LOG: disconnected from the database server "server2" (MCR00076)

      Scenario 8: Mirroring Controller processes (mc_agent and mc_keeper) are killed, stopping Mirroring Controller while the database instance keeps running

      Result - The database instance on the primary remains up and running, and no Mirroring Controller crash event is logged. Server status: No failover
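      How the processes were terminated is not shown above; the outline below is only an illustrative sketch (the signal used and the placeholder PIDs are assumptions - any signal that kills mc_agent and mc_keeper without touching the postgres processes reproduces the condition):

      # On the primary: locate the Mirroring Controller processes and terminate them,
      # leaving the postgres processes untouched (signal and PIDs are placeholders)
      ps -ef | grep -E 'mc_agent|mc_keeper' | grep -v grep
      kill -9 <mc_agent_pid> <mc_keeper_pid>

      # The database instance keeps running
      pg_ctl -D /database/inst1/ status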

      [fsepuser@fep15-rhel8-ha-server2: ~]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      unknown

      server_id  host_role  host       host_status  db_proc_status  disk_status
      -------------------------------------------------------------------------
      server1    standby    10.1.0.20  normal       normal          normal
      server2    unknown    10.1.0.21  normal       unknown         unknown
      [fsepuser@fep15-rhel8-ha-server2: ~]$

      [fsepuser@fep15-rhel8-ha-server1: ~]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      unknown

      server_id  host_role  host       host_status  db_proc_status  disk_status
      -------------------------------------------------------------------------
      server1    standby    10.1.0.20  normal       normal          normal
      server2    unknown    10.1.0.21  normal       unknown         unknown
      [fsepuser@fep15-rhel8-ha-server1: ~]$

      Mirroring Controller log

      No event is recorded in the Mirroring Controller log of primary or standby servers.

      Arbiter log

      Feb  5 05:37:40 fep15-rhel8-ha-arbiter MirroringControllerArbiter[2468428]: LOG: disconnected from the database server "server2" (MCR00076)

      Scenario 9: Network error: Network is disrupted between primary and standby (IP address is brought down using ifconfig eth1 down)

      Note: Admin and arbitration traffic run on separate networks.

      Result - Server status: Failed over. Fencing_cmd is executed; the split-brain scenario is prevented. (An illustrative fencing script sketch follows the logs below.)

      [root@fep15-rhel8-ha-server1 ~]# ifconfig eth1 down
      [root@fep15-rhel8-ha-server1 ~]#

      [fsepuser@fep15-rhel8-ha-server2: /mc]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      switched

      server_id  host_role  host         host_status  db_proc_status        disk_status
      ---------------------------------------------------------------------------------
      server1    unknown    10.1.254.21  abnormal     unknown               unknown
      server2    primary    10.1.254.22  normal       abnormal(wal_sender)  normal
      [fsepuser@fep15-rhel8-ha-server2: /mc]$
      [fsepuser@fep15-rhel8-ha-server1: ~]$ mc_ctl status -M /mc/ --arbiter
      arbiter_id  host         status  
      -------------------------------------
      arbiter     10.1.254.23  online  
      [fsepuser@fep15-rhel8-ha-server1: ~]$


      [fsepuser@fep15-rhel8-ha-server1: ~]$ mc_ctl status -M /mc/
      mc_ctl: The IP address or host name of the server where the command was executed in is not found on the definition file "/mc/network.conf" (MCA00051)
      [fsepuser@fep15-rhel8-ha-server1: ~]$

      Primary Mirroring Controller log

      Feb  8 11:29:41 fep15-rhel8-ha-server1 MirroringControllerOpen[432697]: WARNING: detected an error on the monitored object "server(server2)": no response:ping timeout (MCA00019)
      Feb  8 11:29:43 fep15-rhel8-ha-server1 MirroringControllerOpen[432697]: WARNING: detected an error on the monitored object "server(server2)": no response:ping timeout (MCA00019)
      Feb  8 11:29:45 fep15-rhel8-ha-server1 MirroringControllerOpen[432697]: WARNING: detected an error on the monitored object "server(server2)": no response:ping timeout (MCA00019)
      Feb  8 11:29:45 fep15-rhel8-ha-server1 MirroringControllerOpen[432697]: LOG: executing arbitration command and inquiring the result of arbitration (MCA00191)
      Feb  8 11:29:45 fep15-rhel8-ha-server1 MirroringControllerOpen[432697]: LOG: starting degenerate because of result of arbitration command for database server "server2"  result:"0" (MCA00193)
      Feb  8 11:29:45 fep15-rhel8-ha-server1 MirroringControllerOpen[432697]: LOG: starting to detach standby server "server2" automatically (MCA00024)
      Feb  8 11:29:45 fep15-rhel8-ha-server1 MirroringControllerOpen[432697]: LOG: executing fencing command (MCA00192)
      Feb  8 11:29:45 fep15-rhel8-ha-server1 MirroringControllerOpen[432697]: LOG: fencing command for database server "server2" succeeded: result:"0" (MCA00198)
      Feb  8 11:29:45 fep15-rhel8-ha-server1 MirroringControllerOpen[432697]: LOG: detach standby server "server2" completed automatically (MCA00025)

      Feb  8 11:30:27 fep15-rhel8-ha-server1 MirroringControllerOpen[433813]: ERROR: mc_ctl: The IP address or host name of the server where the command was executed in is not found on the definition file "/mc/network.conf" (MCA00051)

      Standby Mirroring Controller log

      Feb  8 11:29:42 fep15-rhel8-ha-server2 MirroringControllerOpen[544659]: WARNING: detected an error on the monitored object "server(server1)": no response:ping timeout (MCA00019)
      Feb  8 11:29:44 fep15-rhel8-ha-server2 MirroringControllerOpen[544659]: WARNING: detected an error on the monitored object "server(server1)": no response:ping timeout (MCA00019)
      Feb  8 11:29:46 fep15-rhel8-ha-server2 MirroringControllerOpen[544659]: WARNING: detected an error on the monitored object "server(server1)": no response:ping timeout (MCA00019)
      Feb  8 11:29:46 fep15-rhel8-ha-server2 MirroringControllerOpen[544659]: LOG: executing arbitration command and inquiring the result of arbitration (MCA00191)
      Feb  8 11:29:46 fep15-rhel8-ha-server2 MirroringControllerOpen[544659]: LOG: starting degenerate because of result of arbitration command for database server "server1"  result:"0" (MCA00193)
      Feb  8 11:29:46 fep15-rhel8-ha-server2 MirroringControllerOpen[544659]: LOG: executing fencing command (MCA00192)
      Feb  8 11:29:46 fep15-rhel8-ha-server2 MirroringControllerOpen[544659]: LOG: fencing command for database server "server1" succeeded: result:"0" (MCA00198)
      Feb  8 11:29:46 fep15-rhel8-ha-server2 MirroringControllerOpen[544659]: LOG: promotion processing completed (MCA00062)
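      For context, the fencing command referenced in the logs above (MCA00192/MCA00198) is a user-supplied script registered in the Mirroring Controller arbitration configuration, and what it actually does is site-specific. The outline below is only an illustrative sketch - the argument handling and the power-off call are assumptions, not the script used in these tests; the exact interface and registration steps are described in the Fujitsu Enterprise Postgres manuals. The one property the logs do confirm is that an exit code of 0 is treated as successful fencing (result:"0").

      #!/bin/sh
      # Illustrative fencing script only - not the script used in these tests.
      # Assumption: the server to be fenced is identified by an argument such as its
      # server ID, and exit code 0 tells Mirroring Controller that fencing succeeded.
      TARGET="$1"

      case "$TARGET" in
        server1) BMC_HOST="<server1-bmc>" ;;   # hypothetical management interfaces
        server2) BMC_HOST="<server2-bmc>" ;;
        *)       exit 1 ;;
      esac

      # Example action: hard power-off through the node's management controller.
      # ipmitool is shown purely as an illustration; a cloud API call or shutting
      # down the node's NIC would serve the same purpose.
      ipmitool -I lanplus -H "$BMC_HOST" -U <ipmi-user> -P <ipmi-password> chassis power off || exit 1

      exit 0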

      Scenario 10: Network error: Arbitration network is disrupted; arbitration is configured on a second network, which is brought down

      Note: Mirroring Controller and replication are running on different networks.

      Result - Server status: No failover. Fencing script is not executed; both servers continue to operate normally while the arbitration link is down.

      [root@fep15-rhel8-ha-server2 ~]# date; ifconfig eth1 down
      Mon Feb  5 17:17:14 AEDT 2024
      [root@fep15-rhel8-ha-server2 ~]#

      [fsepuser@fep15-rhel8-ha-server2: ~]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      switchable

      server_id  host_role  host       host_status  db_proc_status  disk_status
      -------------------------------------------------------------------------
      server1    standby    10.1.0.20  normal       normal          normal
      server2    primary    10.1.0.21  normal       normal          normal
      [fsepuser@fep15-rhel8-ha-server2: ~]$

      [fsepuser@fep15-rhel8-ha-server1: ~]$ mc_ctl status -M /mc/
      mirroring status
      -----------------
      switchable

      server_id  host_role  host       host_status  db_proc_status  disk_status
      -------------------------------------------------------------------------
      server1    standby    10.1.0.20  normal       normal          normal
      server2    primary    10.1.0.21  normal       normal          normal
      [fsepuser@fep15-rhel8-ha-server1: ~]$

      [fsepuser@fep15-rhel8-ha-arbiter: ~]$ mc_arb status -M /mc
      server_id  host         status  
      -------------------------------------
      server1    10.1.254.21  online  
      server2    10.1.254.22  offline  
      [fsepuser@fep15-rhel8-ha-arbiter: ~]$

      Primary Mirroring Controller log

      Feb  5 06:17:39 fep15-rhel8-ha-server2 MirroringControllerOpen[1354449]: ERROR: timeout waiting for communication with Mirroring Controller Arbitration process  server:"arbiter" (MCA00154)
      Feb  5 06:17:39 fep15-rhel8-ha-server2 MirroringControllerOpen[1354449]: LOG: trying to connect to arbitration server "arbiter" (MCA00150)

      Standby Mirroring Controller log

      No event recorded in the standby Mirroring Controller log.

      Arbiter log

      Feb  5 06:17:16 fep15-rhel8-ha-arbiter MirroringControllerArbiter[2468428]: WARNING: detected an error on database server "server2" (MCR00099)
      Feb  5 06:17:18 fep15-rhel8-ha-arbiter MirroringControllerArbiter[2468428]: WARNING: detected an error on database server "server2" (MCR00099)
      Feb  5 06:17:20 fep15-rhel8-ha-arbiter MirroringControllerArbiter[2468428]: WARNING: detected an error on database server "server2" (MCR00099)
      Feb  5 06:17:39 fep15-rhel8-ha-arbiter MirroringControllerArbiter[2468428]: LOG: disconnected from the database server "server2" (MCR00076)
      Feb  5 06:17:39 fep15-rhel8-ha-arbiter MirroringControllerArbiter[2468428]: ERROR: unexpected error occurred: class java.lang.RuntimeException: java.net.SocketTimeoutException#012#011at com.fujitsu.fepmc.net.rpc.RPCSocketProcessor.run(Unknown Source)#012#011at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)#012#011at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)#012#011at java.lang.Thread.run(Thread.java:748)#012 (MCR00030)
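      Once the disrupted link is restored, Mirroring Controller keeps retrying the arbitration connection on its own (MCA00150 above). A quick way to confirm recovery, sketched with the same commands used earlier in this post, is:

      # On server2, as root: bring the interface back up
      ifconfig eth1 up

      # From a database server: the arbiter should be reported as online again
      mc_ctl status -M /mc/ --arbiter

      # From the arbitration server: both database servers should show as online
      mc_arb status -M /mc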

      Topics: Fujitsu Enterprise Postgres, High Availability, Business continuity, Mirroring Controller, Split brain, Arbitration server

      Rajesh Kandasamy
      Technical Consultant, Fujitsu Enterprise Postgres Center of Excellence
      Rajesh is a Technical Consultant with the Fujitsu Enterprise Postgres Center of Excellence and holds an OSS-DB Silver certification in PostgreSQL.
      He brings over 12 years of experience in various relational database management systems (RDBMS) such as PostgreSQL, Oracle, and SQL Server, as well as NoSQL technologies like MongoDB. In his role, he provides consulting and support to Fujitsu Enterprise Postgres customers, helping them leverage features effectively.
