In this post, I will explain the improvements introduced in PostgreSQL 15 to address the following two issues in logical replication communication:
- The walsender sends an empty transaction when none of the DMLs in the transaction are published under the publication's filters. This wastes resources such as CPU, memory, and network bandwidth.
- When processing a large transaction, a walsender that is busy processing unpublished DMLs may be unable to communicate with the walreceiver for a long time. This can cause an unexpected timeout error even though the walsender is working as expected.
Communication in logical replication
Before we go further, I would like to briefly introduce two concepts about communication in logical replication: the types of communication message, and filtering.
Types of communication message
In logical replication, the walsender sends two types of messages to the walreceiver:
- Keep-alive message
This message is used to tell the walreceiver that the walsender is working as expected.
The walreceiver's timeout is set with the wal_receiver_timeout parameter (default: 60 seconds). If the walreceiver does not receive a message of any type from the walsender within wal_receiver_timeout, it exits with a timeout error.
- Logical replication protocol message
There are many types of logical replication protocol messages [1], but in this blog we only focus on two of them:
- DML messages, including INSERT, UPDATE, DELETE, and TRUNCATE
- Messages that define the start and end of a transaction, such as BEGIN and COMMIT messages.
The full list of logical replication protocol messages can be viewed on the PostgreSQL website.
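The timeout rule for keep-alive messages described above can be sketched in a few lines of Python. This is only a toy model under simplifying assumptions, not PostgreSQL code: the class `ReceiverTimeoutTracker` and its methods are names made up for this illustration, and time is passed in as plain numbers of seconds.

```python
from dataclasses import dataclass


@dataclass
class ReceiverTimeoutTracker:
    """Toy model of the walreceiver's timeout check (hypothetical, not PostgreSQL code)."""

    timeout_s: float = 60.0       # same value as the wal_receiver_timeout default
    last_message_at: float = 0.0  # arrival time of the most recent message

    def on_message(self, now: float) -> None:
        # Keep-alive messages and protocol messages both reset the timer.
        self.last_message_at = now

    def timed_out(self, now: float) -> bool:
        # True once a full timeout period has passed without any message.
        return now - self.last_message_at >= self.timeout_s


tracker = ReceiverTimeoutTracker()
tracker.on_message(now=10.0)
print(tracker.timed_out(now=40.0))  # still within 60 s of the last message
print(tracker.timed_out(now=75.0))  # more than 60 s of silence
```

The point of the sketch is that any message resets the timer, which is why a keep-alive is enough to prevent the timeout even when no DML is being sent.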
Filtering
You may not be aware of this, but not all DMLs in a transaction are sent to the walreceiver. This is because a publication can be created with filters on tables, rows, or operation types. DMLs that do not match these filters are filtered out and are not sent.
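To make the three kinds of filter concrete, here is a minimal Python sketch of the decision "is this DML published?". It is a simplified model, not how pgoutput is implemented: the `Publication` class, its fields, and the `publishes` method are hypothetical names, and real row filters have extra rules (for example for UPDATE) that are ignored here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set

Row = Dict[str, object]


@dataclass
class Publication:
    """Toy publication with table, operation, and row filters (hypothetical model)."""

    tables: Set[str]
    operations: Set[str] = field(
        default_factory=lambda: {"insert", "update", "delete", "truncate"}
    )
    row_filters: Dict[str, Callable[[Row], bool]] = field(default_factory=dict)

    def publishes(self, table: str, operation: str, row: Optional[Row] = None) -> bool:
        if table not in self.tables:          # table filter
            return False
        if operation not in self.operations:  # operation-type filter
            return False
        row_filter = self.row_filters.get(table)
        if row_filter is not None and row is not None:
            return row_filter(row)            # row filter
        return True


pub = Publication(
    tables={"tab_publish"},
    operations={"insert"},
    row_filters={"tab_publish": lambda row: row["id"] > 0},
)
print(pub.publishes("tab_publish", "insert", {"id": 1}))      # matches every filter
print(pub.publishes("tab_not_publish", "insert", {"id": 1}))  # table is not published
```

A DML that fails any one of the three checks is dropped during decoding, which is how a transaction can end up with nothing to send.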
Now, I will separately introduce how the two issues mentioned at the beginning of this blog post have been improved/fixed.
Improvement to empty transactions
If all DMLs in a transaction are filtered out during decoding, then we call this transaction an empty transaction. Before PostgreSQL 15, the standard logical decoding plugin pgoutput (the logical replication plugin used by default in PostgreSQL) would send every transaction to the walreceiver. Even for an empty transaction, while DML-related messages would not be sent, messages that define the start and end of the transaction would still be sent. It was a waste of CPU cycles and network bandwidth to build/transmit these empty transactions. We can see this process below.
To address this problem, we made the walsender postpone the BEGIN message until the first DML message is sent. At the end of decoding, if no BEGIN message has been sent, then the COMMIT message is not sent either. Detailed development information can be viewed on GitHub.
For non-empty transactions, the timing of sending messages also changes. Let's look at an example. Suppose we have the following transaction T1:
postgres=*# INSERT INTO tab_not_publish VALUES (1); -- This DML will be filtered out
INSERT 0 1
postgres=*# INSERT INTO tab_publish VALUES (1); -- This DML will be published
INSERT 0 1
The timing of sending the BEGIN message and the COMMIT message can be seen in the figures below.
As we can see above, only the INSERT message for table tab_publish is sent from the walsender to the walreceiver.
As shown above, after the modification, if all DMLs in a transaction are filtered out, then the BEGIN message and the COMMIT message will not be sent. In this way, the walsender can skip sending empty transactions.
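The deferred-BEGIN behavior can be sketched as a small Python function. This is an illustration of the idea only, not the pgoutput code: `decode_transaction` and the `(description, is_published)` tuples are assumptions made for the example, with T1 from above as input.

```python
from typing import Iterable, List, Tuple

Change = Tuple[str, bool]  # (description, is_published); hypothetical representation


def decode_transaction(changes: Iterable[Change]) -> List[str]:
    """Return the messages sent for one transaction, deferring BEGIN."""
    sent: List[str] = []
    begin_sent = False
    for description, is_published in changes:
        if not is_published:
            continue  # filtered out: nothing is sent for this DML
        if not begin_sent:
            sent.append("BEGIN")  # BEGIN goes out just before the first published DML
            begin_sent = True
        sent.append(description)
    if begin_sent:
        sent.append("COMMIT")  # COMMIT is sent only if BEGIN was sent
    return sent


t1 = [
    ("INSERT tab_not_publish", False),
    ("INSERT tab_publish", True),
]
print(decode_transaction(t1))
print(decode_transaction([("INSERT tab_not_publish", False)]))  # an empty transaction
```

For T1 the function yields BEGIN, the INSERT into tab_publish, and COMMIT; for a transaction whose DMLs are all filtered out it yields nothing at all.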
In synchronous logical replication, however, skipping an empty transaction needs extra care. Before PostgreSQL 15, when the walreceiver received the COMMIT message of an empty transaction, it synchronized its local data and sent a feedback message to the walsender to confirm that the data had been synchronized.
The walsender continues only after it receives this feedback message; until then, the client backend is blocked from committing the transaction. Therefore, in PostgreSQL 15, after the walsender skips an empty transaction in synchronous logical replication, it sends a keep-alive message to the walreceiver that asks for a feedback message. In response, the walreceiver synchronizes its data and sends a feedback message to the walsender. This prevents a skipped transaction from delaying the commit response in synchronous logical replication. The figures below show the changes in communication in synchronous logical replication.
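The end-of-transaction decision described above can be summarized in a short Python sketch. It is a simplification of the behavior described in this post, not the walsender implementation; the function name `messages_at_commit` and the message labels are hypothetical.

```python
from typing import List


def messages_at_commit(begin_sent: bool, synchronous: bool) -> List[str]:
    """What the walsender sends when it reaches the end of a transaction (toy model)."""
    if begin_sent:
        # Non-empty transaction: COMMIT is sent and the walreceiver replies with feedback.
        return ["COMMIT"]
    if synchronous:
        # Skipped empty transaction in synchronous replication: ask for feedback
        # so that the waiting client backend is not left blocked.
        return ["KEEPALIVE (reply requested)"]
    # Skipped empty transaction in asynchronous replication: nothing to send.
    return []


print(messages_at_commit(begin_sent=False, synchronous=True))
print(messages_at_commit(begin_sent=False, synchronous=False))
```

The keep-alive is needed only in the synchronous case, because only there is a backend waiting for the walreceiver's confirmation.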
Fix for unexpected timeout error
Before PostgreSQL 15, if a transaction contained many consecutive unpublished DMLs, the walsender could go a long time without communicating with the walreceiver, because it was busy decoding those DMLs. Even though the walsender was working as expected, the walreceiver received no message within the specified timeout and therefore exited with an unexpected timeout error.
In PostgreSQL 15, to avoid this error, the walsender communicates with the walreceiver periodically. Each time it has processed a threshold number of DMLs (regardless of whether those DMLs are published), it tries to send a keep-alive message to the walreceiver if one is needed. Detailed information about this development is available on GitHub.
Suppose we have a transaction T2 that contains many DMLs, none of which will be published. As can be seen below, the communication between the walsender and the walreceiver while T2 is processed has changed.
As can be seen above, we set the threshold to 100 in the PostgreSQL kernel (performance testing confirmed that this threshold resolves the timeout error without degrading performance [2]). In this way, the walreceiver no longer hits this unexpected timeout error while the walsender is working as expected.
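The threshold logic can be illustrated with a small Python counter. This sketch only counts how often the walsender would try to send a keep-alive while decoding; it does not model the "if needed" check, and the names `CHANGES_THRESHOLD` and `count_keepalive_attempts` are assumptions made for this example rather than identifiers from the PostgreSQL source.

```python
CHANGES_THRESHOLD = 100  # the threshold value mentioned in this post


def count_keepalive_attempts(num_changes: int, threshold: int = CHANGES_THRESHOLD) -> int:
    """Count keep-alive attempts made while decoding num_changes DMLs (toy model)."""
    attempts = 0
    changes_since_attempt = 0
    for _ in range(num_changes):
        # Every DML counts, whether or not it is published.
        changes_since_attempt += 1
        if changes_since_attempt >= threshold:
            attempts += 1  # here the walsender would try to send a keep-alive
            changes_since_attempt = 0
    return attempts


print(count_keepalive_attempts(250))
print(count_keepalive_attempts(99))
```

Because the counter includes unpublished DMLs, a transaction like T2 still gives the walsender regular opportunities to contact the walreceiver.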
Finally, I will share the results of performance tests on the improvement to the empty transactions.
As mentioned above, after improving the handling of empty transactions, the walsender no longer sends empty transactions containing only BEGIN and COMMIT messages to the walreceiver. This reduces network traffic and improves performance. In my tests, I found that after the improvement, when decoding an empty transaction in asynchronous logical replication, the walsender transmits 97 fewer bytes; in synchronous logical replication, although the walsender transmits an additional keep-alive message, the total transmission is still reduced by 79 bytes. Next, let's see the performance improvement in network bandwidth consumption through the test results.
Obviously, the proportion of empty transactions in the transactions transmitted by the walsender affects the test results, so we tested for five different proportions of empty transactions. In addition, after the improvement, in synchronous logical replication, if the walsender skips an empty transaction, it will send an additional keep-alive message, so both synchronous logical replication and asynchronous logical replication are tested. As can be seen below, performance is improved both in asynchronous logical replication and synchronous logical replication. (The x-axis represents the proportion of empty transactions in the transmitted transactions. The y-axis represents the total amount of data transferred by the walsender to the walreceiver, in bytes.)
The test results show that the higher the proportion of empty transactions, the greater the improvement. When the proportion of empty transactions was 25%, 50%, 75%, and 100%, network transmission was reduced by about 8%, 22%, 40%, and 84%, respectively. There was also no performance degradation when there were no empty transactions.
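The per-transaction savings quoted earlier (97 bytes in asynchronous and 79 bytes in synchronous logical replication) can be turned into a rough back-of-the-envelope estimate. The Python sketch below uses only those two figures; the function name `estimated_saving_bytes` is made up for the example, and the result is an estimate that ignores everything else on the wire.

```python
ASYNC_SAVING_BYTES = 97  # saved per empty transaction, asynchronous replication
SYNC_SAVING_BYTES = 79   # saved per empty transaction, synchronous replication


def estimated_saving_bytes(empty_transactions: int, synchronous: bool) -> int:
    """Rough estimate of bytes no longer transmitted for skipped empty transactions."""
    per_transaction = SYNC_SAVING_BYTES if synchronous else ASYNC_SAVING_BYTES
    return empty_transactions * per_transaction


# The difference between the two figures is the cost of the extra keep-alive message.
keepalive_cost_bytes = ASYNC_SAVING_BYTES - SYNC_SAVING_BYTES

print(estimated_saving_bytes(1_000_000, synchronous=False))
print(estimated_saving_bytes(1_000_000, synchronous=True))
print(keepalive_cost_bytes)
```

The difference of 18 bytes per skipped transaction is what the additional keep-alive message costs in the synchronous case.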
For the future
In this post, I explained the improvements in performance and functionality of logical replication resulting from skipping empty transactions and fixing unexpected timeout errors. However, there are still some limitations to the mechanism of skipping empty transactions.
For example, empty transactions are not skipped for two-phase commits. This is because if the walsender restarts between preparing the transaction and committing the prepared transaction, we cannot tell at commit time whether the prepared transaction was skipped. Streaming of in-progress transactions (set by the parameter "streaming" of subscription_parameter) has not been improved either, for the same reason. We may soon try to find a better way to handle empty transactions in these two types of transactions.
In addition, to improve the efficiency of applying streamed in-progress transactions on the walreceiver, our team has shared patches for parallel apply of transactions in the walreceiver [3] and is actively discussing them with the community for continuous improvement.