<img height="1" width="1" style="display:none;" alt="" src="https://px.ads.linkedin.com/collect/?pid=2826169&amp;fmt=gif">
Start  trial

    Start trial

      The Fujitsu OSS team in collaboration with the PostgreSQL open source community has added support in PostgreSQL 14 to enable decoding of two-phase commits in logical replication. Let's take a look at what it can do.

      photo-ajin-cherian-in-circleBackground

      Two-phase commit, as the name implies, is a mechanism by which a transaction is committed in two phases. This is usually done to achieve consistency in distributed databases. The two phases of the transaction are the PREPARE phase and then the COMMIT/ROLLBACK phase.

      The actual commands used to achieve two-phase commits in PostgreSQL are:

      • PREPARE TRANSACTION
      • COMMIT PREPARED
      • ROLLBACK PREPARED

      PostgreSQL has supported two-phase commits since version 8.0 and logical replication since version 10.0, but two-phase commits were never supported by logical replication.

      The commands PREPARE TRANSACTION, COMMIT PREPARED and ROLLBACK PREPARED were supported within an instance, but when these commands needed to be logically replicated to a standby, they no longer retained their original meaning. The PREPARE TRANSACTION command was treated as a NOP and not decoded at all, the COMMIT PREPARED command was treated as a COMMIT, and the ROLLBACK PREPARED command was treated as an ABORT.

      What is a two-phase commit?

      Two-phase commit is a type of atomic commitment protocol that helps maintain consistency between distributed databases. Plain commits, which provide atomicity within a database, are sometimes not enough to provide consistency for transactions that span across databases.

      img-man-using-computer-with-interconnected-dots-01To illustrate the problem, let’s take an example of transfer of money between two accounts in two different banks. For example:

      • John has an account in bank A which has $300.
      • Mark has an account in bank B which has $100.
      • John wants to transfer $100 to Mark.

      For the transaction to go through, $100 needs to be subtracted from the balance in bank A and $100 needs to be added to the balance in bank B. At the end of the transaction, both accounts should have a balance of $200. If at any time during the transfer either of the transactions fail, then the state of the accounts should return back to how they were before the transfer started.

      img-man-pointing-at-interconnected-dots

      Transactions can fail for multiple reasons. Databases are built such that if any outage occurs before a transaction is committed, then that transaction is rolled back. In our example, if the outage occurred while the amount was being deducted from John’s account, then after the outage John’s account will not reflect the deducted amount since the transaction failed. This is how simple commits maintain consistency within the database.

      But let’s consider the scenario where the transaction which deducted $100 from John’s account succeeded with a single commit, but the transaction which added $100 to Mark’s account in bank B failed and was rolled back. Then at the end of this operation, while John’s account has the money deducted, Mark will not have received that amount. The $100 has vanished. Simple commits have chances to fail when dealing with distributed transactions.
      The two-phase commit evolved as a mechanism to solve this problem.

      Step-by-step of a distributed transaction

      For two-phase commits, one of the databases (or an external arbiter) acts as the coordinator of the distributed transaction.

      Phase 1

      One database starts applying the transaction and then does a prepare. It sends the prepared transaction to the other database in a prepare message. The second database gets the prepare message and also prepares the transaction. A prepare involves making the changes involved in the transaction but not committing it. The changes are written to the disk so that they can outlast an outage. Once both the databases prepare the transaction, and all required information regarding the transaction is stored on disk, the prepare phase is completed.

      Phase2

      Next, the arbiter initiates the commit phase. If the second database fails to prepare the transaction for whatever reason, then the arbiter initiates the rollback phase. So, depending on whether the prepare went through in the second database, the changes are either both committed or both rolled back. Outages that occur at the final commit phase are recoverable, as the required prepared transaction has been written to disk and can be re-applied.

      Two-phase commits are not really relevant for single instance database installations, but they are relevant for large installations where the data is replicated across multiple database instances.

      It is important that PostgreSQL supports two-phase commits through logical replication.

      Feature overview

      Prior to PostgreSQL version 14, logical replication transactions were decoded and replicated only after the transaction was committed. This was to avoid replicating transactions that might eventually be aborted.

      Transactions decoded on COMMIT
      Decoding transactions on commit

      With the new feature that the Fujitsu OSS team in collaboration with the members of the PostgreSQL open source community implemented in PostgreSQL 14, the commands PREPARE TRANSACTION, COMMIT PREPARED and ROLLBACK PREPARED are now supported in logical replication, and transactions are decoded and replicated when the PREPARE TRANSACTION command is decoded. The PREPARE TRANSACTION initiates transaction replay and decoding just like the COMMIT command in the WAL SENDER.

      Transactions decoded on PREPARE
      Diagram showing usage of openCryptoki and OpenSSL
      Decoding transactions on prepare

      We also defined new plugin callbacks that will allow logical decoding plugins to support two-phase commits.

      Callback name Description
      filter_prepare_cb Allows plugins to filter transactions that do not need to be decoded at prepare time based on the GID used in the PREPARE TRANSACTION command.
      begin_prepare_cb The start of a prepared transaction.
      prepare_cb Called when the PREPARE TRANSACTION command is decoded.
      commit_prepared_cb Called when the COMMIT PREPARED command is decoded.
      rollback_prepared_cb Called when the ROLLBACK PREPARED command is decoded.

      You can view further details of the implementation here icon-external-link-02-variation-01.

      Plugins modified

      • test_decoding

      This plugin is a logical decoding output plugin which serves as an example for users to develop their own logical decoding plugin. test_decoding receives WAL through the logical decoding mechanism and decodes it into a text representation of the operations performed.

      It was modified to be able to use the new two-phase callbacks and decode transactions at prepare time.

      APIs modified

      • pg_create_logical_replication_slot()

      This API was modified to take a new option specifying if the slot supports two-phase commits. Replication slots created with two-phase option can be used by output plugins to support two-phase commits.

      • pg_create_logical_replication_slot(slot_name name, plugin name [, temporary boolean, two_phase boolean ] )

      Details can be viewed here icon-external-link-02-variation-01.

      Example

      Let's have a look at how you can check the decoded ouput of transactions in two-phased commits.

      1Create a replication slot

      Use test_decoding as the output plugin, and pass true so that the slot supports decoding of two-phase commits.

      postgres=# SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
          slot_name    |   lsn
      -----------------+-----------
       regression_slot | 0/16B1970
      (1 row)  

      2Create a table:

      postgres=# CREATE TABLE data(id serial primary key, data text);
      CREATE TABLE

      3Check the decoded output of a prepared transaction and a committed transaction:

      postgres=# BEGIN;
      postgres=*# INSERT INTO data(data) VALUES('5');
      postgres=*# PREPARE TRANSACTION 'test_prepared1';

      postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
          lsn    | xid |          data
      -----------+-----+---------------------------------------------------------
       0/1689DC0 | 529 | BEGIN 529
       0/1689DC0 | 529 | table public.data: INSERT: id[integer]:3 data[text]:'5'
       0/1689FC0 | 529 | PREPARE TRANSACTION 'test_prepared1', txid 529
      (3 rows)

      postgres=# COMMIT PREPARED 'test_prepared1';
      postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
          lsn    | xid |       data
      -----------+-----+--------------------------------------------
       0/168A060 | 529 | COMMIT PREPARED 'test_prepared1', txid 529
      (4 rows)

      postgres=# select * from data;
       id | data
      ----+------
        1 |  5
      (1 row)

      4Check the decoded output of a prepared transaction and a rolled back transaction:

      postgres=# BEGIN;
      postgres=*# INSERT INTO data(data) VALUES('6');
      postgres=*# PREPARE TRANSACTION 'test_prepared2';
      postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
          lsn    | xid |      data
      -----------+-----+---------------------------------------------------------
       0/168A180 | 530 | BEGIN 530
       0/168A1E8 | 530 | table public.data: INSERT: id[integer]:4 data[text]:'6'
       0/168A430 | 530 | PREPARE TRANSACTION 'test_prepared2', txid 530
      (3 rows)

      postgres=# ROLLBACK PREPARED 'test_prepared2';
      postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
          lsn    | xid | data
      -----------+-----+----------------------------------------------
       0/168A4B8 | 530 | ROLLBACK PREPARED 'test_prepared2', txid 530
      (1 row)

      postgres=# select * from data;
       id | data
      ----+------
        1 | 5
      (1 row)

      For the future

      With the changes that went into this feature in PostgreSQL 14, we now have the decoder side infrastructure that allows decoding of two-phase commits at prepare time. We have also modified the test_decoding plugin to make use of this infrastructure.

      The next step is to implement the support for two-phase into the biggest logical decoding plugin inside PostgreSQL - the pgoutput plugin. This plugin is what supports the PUBLISHER/SUBSCRIBER model of logical replication in PostgreSQL. It is also the most widely used plugin for logical replication. The Fujitsu OSS team is working with the open source community to get this support added in PostgreSQL 15.

      For two-phase transactions to work in a distributed database, PostgreSQL also needs to build support for the standby to inform the master about a failed PREPARE and initiate a ROLLBACK. This type of standby feedback mechanism is not present in PostgreSQL, and is a candidate for future improvement.

      Subscribe to be notified of future blog posts
      If you would like to be notified of my next blog posts and other PostgreSQL-related articles, fill the form here.
      We also have a series of technical articles for PostgreSQL enthusiasts of all stripes, with tips and how-to's.

       

      Explore PostgreSQL Insider >

      Topics: PostgreSQL, PostgreSQL community, Data governance

      Receive our blog

      Search by topic

      see all >

      Read our latest blogs

      Read our most recent articles regarding all aspects of PostgreSQL and Fujitsu Enterprise Postgres.

      Receive our blog

      Fill the form to receive notifications of future posts

      Search by topic

      see all >