Continuing the tradition, in PostgreSQL 15 we have delivered several improvements, and also included a very useful new command that will simplify coding in several use cases. Let's go through them.
Logical replication improvements
This version introduces several improvements to logical replication.
- Support for prepared transactions to built-in logical replication.
The user needs to specify the two_phase = true option when creating the subscription so that prepared transactions can be replicated.
CREATE PUBLICATION mypub FOR ALL TABLES;
CREATE SUBSCRIPTION mysub CONNECTION 'dbname=postgres'
PUBLICATION mypub WITH (two_phase = true);This ability reduces lag when replicating data, since instead of sending all transaction data at commit time, it will start sending the data at prepare time. It reduces the lag to replicate data, because there could be a long time gap between prepare and commit.
It also provides the base for building conflict-free logical replication. That is because if a conflict is detected during prepare time while applying the transaction on the subscriber, we can roll back the prepare even on the publisher side. Note that this cannot be done by default in PostgreSQL, so we need to make some careful changes in the application. I write about this in detail in my blog post here.
- Publications can now publish all tables in a schema.
Previously, if users wanted to publish all tables in a schema, they needed to mention each table separately. But with this new clause, all tables in the schema can be replicated without having to be individually specified.
This can also be used along with the existing syntax where the user can specify individual tables.CREATE PUBLICATION mypub FOR TABLES IN SCHEMA mysch;
CREATE PUBLICATION mypub FOR TABLE mytab, TABLES IN SCHEMA mysch;Tables added later to the listed schemas will also be replicated.
You can read more about it in our blog post here.
- Publication content can now be filtered using a WHERE clause.
CREATE PUBLICATION mypub FOR TABLE mytab WHERE (c1 > 10);
This can help distribute data among nodes and improve performance by sending data selectively.
Currently, the WHERE clause allows only simple expressions. It cannot contain user-defined functions, operators, types, collations, system column references, or non-immutable built-in functions. But note that these restrictions may be uplifted in the future.
If the publication publishes UPDATE or DELETE operations, then the row filter WHERE clause must contain only columns that are covered by the replica identity. If it publishes only INSERT operations, then the WHERE clause can be used for any column.
More information is available in our blog post here.
- Publications can now publish only specific columns.
This feature is similar to row filters, with the difference that it allows you to restrict publication to a subset of columns.
CREATE PUBLICATION mypub FOR TABLE mytab (c1, c2);The column list can be based on behavioral or performance reasons, and can contain only simple column references. A column list cannot be specified if the publication also publishes FOR TABLES IN SCHEMA (but this restriction may also be lifted in the future).
If the publication publishes UPDATE or DELETE operations, the column list must include the table's replica identity columns. If it publishes only INSERT operations, then the column list may omit replica identity columns.
If you would like to learn more, we have a blog post here.
- Logical replication can now be run as the subscription owner.
Previously, the replication had to be run as the superuser. Only superusers, roles with bypassrls, and table owners can replicate into tables where row-level security policies are in effect.
- Previously, conflicts could be resolved by manually removing the conflicting data or by skipping the transaction via pg_replication_origin_advance, and both methods can be quite tricky for the user.
With PostgreSQL 15, we tried to provide a simpler method by allowing the user to skip by specifying the LSN of the conflicting transaction.
ALTER SUBSCRIPTION mysub SKIP (lsn = 0/14C0378)The LSN of the failed transaction that the user needs to specify will be available in the server log.
You can read more about it in our blog post here.
- Users can set a parameter to automatically disable replication on conflict. This is useful for scenarios where a retry could not possibly succeed without human intervention - for example, in case of a primary key violation or another constraint violation that would not get resolved automatically.
- The new system view pg_stat_subscription_stats shows stats about errors that occurred during the application of logical replication changes or during initial table synchronization.
The MERGE command
The PostgreSQL community has been working on the MERGE feature for a few years, and PostgreSQL 15 introduces this command, which can handle INSERT, UPDATE, and DELETE actions - all in a single transaction.
The example below will insert, update, or delete from the target table, depending on whether the tables match and the specified conditions are met.
USING SourceProducts Source
ON Source.ProductID = Target.ProductID
WHEN NOT MATCHED AND Source.ProductId IS NOT NULL THEN
INSERT VALUES (Source.ProductID, Source.ProductName, Source.Price)
WHEN MATCHED AND Target.ProductName IN ('Table', 'Desk') THEN
UPDATE SET ProductName = Source.ProductName, Price = Source.Price
WHEN MATCHED THEN
DELETE;
Without MERGE, the example above would require a PL procedure or PL SQL function with various if-else conditions, which would also take more time to execute.
You must ensure that the join produces at most one candidate change row for each target row, or the error "MERGE command cannot affect row a second time" will occur. For each candidate change row, the first clause to evaluate as true will be executed.
One common use case is while trying to maintain Slowly Changing Dimensions (SCD) in a data warehouse. In such cases, one needs to:
- Insert new records into the data warehouse,
- Remove records from the warehouse that are not in the source anymore, and
- Update values in the warehouse that have been updated in the source.
Timothy Steward wrote a blog post about this command here.
Base backups
This is a feature that has existed for a long time in PostgreSQL, and is being evolved over releases. This has been improved in PostgreSQL 15 as well. The pg_basebackup command now allows you to specify targets for backups and supports various compression options.
The target backup location is specified using -t target or --target=target, where target can be client (default value, will send the backup to the machine running the command), server (will store the backup on the server), or blackhole (will discard the content, and should be used only for testing and debugging). Note that this option cannot be used with the default WAL streaming option –Xstream.
One of the common complaints about backups was that if they were large, they would take longer. So with this version, we introduced server-side compression and client-side compression. Backups can be compressed on the server side or the client side, with the compression options gzip, LZ4, and Zstandard (note that client-side compression with gzip was possible prior to PostgreSQL 15). These capabilities allow faster and smaller backups.
Sorting performance improvements
I would also like to highlight important performance improvements introduced to sorting, both in memory and on disk.
In-memory sorts
Not only has the performance of in-memory sorts been improved, but they have also been optimized to reduce memory consumption.
- The performance of single-column sorts has been improved by more than 25%. This applies only when the result contains a single column.
For example, this improvement will apply to SELECT col1 from mytab ORDER BY col1;, but not to SELECT col1, col2 from mytab ORDER BY col1;.
- Memory usage has been reduced by using generation memory context.
We were using a memory allocation scheme that rounded requests to the next power of 2. The performance improvement depends on tuple size, but up to ~40% improvement has been observed.
- Function call overhead was reduced by adding specialized sort routines for common datatypes.
The observed improvement is ~5%.
Sorts that exceed work_mem
The performance of sorts that exceed work_mem has also been improved by switching to a batch sorting algorithm that uses more output streams than before. The improvement depends on work_mem – the smaller the value, the greater the improvement, with ~40% improvement observer. One of my PostgreSQL Community colleagues has written a nice blog post here if you want to learn more about these performance improvements.
PostgreSQL 16 and beyond
To conclude, I would like to list the features that I've noticed being discussed in the PostgreSQL community and that might be included in PostgreSQL 16 or a later version. I want to note that there is no assurance that these features will actually be added. This is just a summary based on my observation of discussions in the PostgreSQL Community - neither I nor the community can guarantee that these features will be implemented.
- Various improvements in Logical Replication.
- Allow same table replication by filtering based on origins.
Currently, bidirectional replication of the same table leads to an infinite loop, but by allowing the same table replication by filtering based on origins, we will make this feature loop-free.
- Parallel Apply
- Replication of sequences
- Enable logical replication from standby
- DDL replication
- Time-delayed replication
- Use of indexes on the subscriber when the publisher has specified replica identity full
- Replication of other objects, such as LOBs
- …
- Allow same table replication by filtering based on origins.
- Reduced number of commands that need superuser privilege.
- SQL/JSON enhancements to increase standard compliance.
- Transparent column encryption - automatic, transparent encryption and decryption of particular columns in the client.
- Change the build infrastructure by replacing it with Meson build system (developer-oriented feature).
- Asynchronous I/O - will allow prefetching data and improve system performance.
- Direct I/O - will bypass the OS cache and lead to better performance in some cases.
- Various improvements in hash indexes - allow unique indexes and multi-column indexes.
- Improvements in vacuum technology by using performance data structures, advancing relfrozenxid earlier, and reducing WAL volume.
- Improvements in partitioning technology.
- Improve statistics/monitoring.
- 64bit XIDs - can avoid freezing and reduce the need for autovacuum.
- TDE - can help in meeting security compliance in many organizations.
- Incremental maintenance of materialized views.
For future releases, there are many more discussions going on, but these are the ones that caught my eye.
leverages and extends the strength and reliability of PostgreSQL with additional enterprise features.