True to tradition, this latest release of PostgreSQL comes with lots of improvements and new capabilities. So let me take you through some of them.
The evolution of PostgreSQL - versions and major milestones
Before diving into the exciting new features included in PostgreSQL 17, I would like to take a moment to look back at how PostgreSQL has evolved over the years, and how each release builds a foundation for the next one.
After being inherited from a university project, PostgreSQL gained numerous enterprise features between versions 6.0 and 9.6 that allowed it to be used for larger and more critical applications. Areas of improvement include streaming replication and synchronous replication for failover and high availability, support for the JSON and JSONB data types, index-only scans, and parallelism. All of these capabilities collectively encouraged more users to adopt PostgreSQL, and it was around this time that the community saw PostgreSQL start to become a mainstream database.
Gaining momentum, PostgreSQL continued its evolution from version 10 and has now reached the maturity to be recognized as an enterprise-class database worldwide. It is widely known that major companies such as AWS, Microsoft, and Google use PostgreSQL and also provide PostgreSQL as the database layer for their solutions and services. Declarative partitioning and logical replication were two big features in the version 10 release. SQL stored procedures in version 11 helped facilitate migration from proprietary databases such as Oracle. I believe it is fair to say that one of the key features of version 12 is the introduction of table access methods, which allow pluggable storage engines to be written for specific storage needs in PostgreSQL. Note that partitioning enhancements were implemented across these three releases, with version 11 introducing partitioning by hash key and version 12 bringing performance improvements.
From there, continuous feedback from users and passionate engineers in the community has successfully pushed PostgreSQL to higher levels of usability and performance, achieving better support for large data volumes. Version 13 was released with parallel vacuum, de-duplication in B-tree indexes, and incremental sorting. Read performance was improved in version 14 with its snapshot scalability at very high client counts, along with the ability to stream in-progress transactions to the subscriber, reducing the replication lag of large transactions. Other improvements included reduced bloat from B-tree index updates and parallel foreign scans in postgres_fdw.
In version 15, the MERGE command was added, various improvements to logical replication were applied, and statistics were moved from the statistics collector to shared memory, which forms the basis for adding more statistics in the coming years. Row filtering and column filtering were added to logical replication, and server-side compression was added for base backups.
In version 16, workload distribution between primary and standby was enabled by allowing logical replication from a physical standby. Parallel apply was added to reduce apply lag for large transactions. Support for the SQL/JSON standard was improved, and faster relation extension and connection load balancing in libpq were also developed.
All of the milestone achievements in feature enhancements led to this year’s release of PostgreSQL 17, the highlights of which are:
- Failover slots, which allow logical replication to continue after a failover
- Incremental backups for large backup use cases
- The 1GB memory limit for vacuum cleanup has been removed
- JSON_TABLE feature for better compliance with the SQL/JSON standard
PostgreSQL 17 enhancements and new features
Now let's take a look at some details of the features of version 17.
Improvements to storage
- Incremental backups
- Reduced backup data size - only the changed blocks are copied
- Combine a base backup and incremental backups to reconstruct a full backup using pg_combinebackup
- The new summarize_wal parameter needs to be enabled so that the WAL summarizer background worker collects WAL summaries
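As a minimal sketch of how this fits together: WAL summarization is enabled on the server, then incremental backups are taken with pg_basebackup's new --incremental option, and a full backup is later reconstructed with pg_combinebackup.
ALTER SYSTEM SET summarize_wal = on;   -- start the WAL summarizer
SELECT pg_reload_conf();               -- pick up the change without a restart
SHOW summarize_wal;                    -- should now report "on"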
- Vacuum improvements
- Faster index cleanup and reduced memory usage
This is achieved by changing the mechanism that removes dead tuples during vacuum: a TID store is now used to collect dead tuples from the heap, and these are then used to clean up the indexes
- Reduced WAL sync and write time
The freezing and pruning steps in the vacuum process have been combined to produce a single WAL record
- Reduced WAL volume
Vacuuming of relations with no indexes is optimized by reducing the steps and marking dead line pointers LP_UNUSED directly
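Related to the first item above (and to the removal of the 1GB limit mentioned in the highlights), here is a minimal sketch, assuming a hypothetical table big_table: vacuum's dead-tuple storage can now make use of a maintenance_work_mem setting above 1GB.
SET maintenance_work_mem = '2GB';   -- in PG 17 dead-tuple storage is no longer capped at 1GB
VACUUM (VERBOSE) big_table;         -- big_table is a hypothetical table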
- Faster reads by using streaming APIs
- Improved performance of subsystems on top of SLRU
SLRU cache sizes are now configurable, and the caches are divided into units named banks, so that the impact of buffer-eviction searches and LWLock contention is limited to a specific bank
- Allow table AMs to skip fetching a block from the heap
- Improved execution time of queries that use the IN/ANY clause with a B-tree index by optimizing array matches in the B-tree index
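A minimal sketch (the orders table is hypothetical); in version 17 such an array of values can typically be satisfied within a single B-tree index scan rather than repeated index descents:
CREATE TABLE orders (order_id bigint, customer_id int);
CREATE INDEX ON orders (customer_id);
EXPLAIN SELECT * FROM orders WHERE customer_id IN (11, 27, 42);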
- Improved performance of heavily contended WAL writes (client count of 256 and above)
Improvements to logical replication
- Failover for logical replication
Even if the primary/publisher node goes down, the subscription can continue by getting data from the physical standby without human intervention
CREATE SUBSCRIPTION sub CONNECTION '$connstr'
PUBLICATION pub WITH (failover = 'true');
- Upgrade of logical replication nodes
Logical replication setups will be retained after major version upgrades where the old version is 17 or later
- Faster logical decoding in cases with many subtransactions
- Allow a hash index to be used for lookups during apply when a primary key or replica identity is not available on the subscriber
- Easier replication setup for subscriptions
The new tool pg_createsubscriber can be used to create a logical replication subscriber from any physical standby, avoiding the initial copy of table data. My colleague Hayato Kuroda has published a blog explaining this feature here.
Regarding the future direction of logical replication to make it enterprise-ready, it is my understanding that there are two main elements to work on: a concrete conflict detection and resolution mechanism, and cluster-wide sequences. Other features to work on may also include DDL replication and node management APIs. This is an exciting space, and I hope that in future releases we will move in this direction.
Improvements to SQL
- Partitioning
- Support identity columns in partitioned tables
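A short sketch (table names are hypothetical): the identity column is defined once on the partitioned parent, and values are generated automatically across all partitions:
CREATE TABLE events (
    id bigint GENERATED ALWAYS AS IDENTITY,
    ts timestamptz NOT NULL
) PARTITION BY RANGE (ts);
CREATE TABLE events_2024 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
INSERT INTO events (ts) VALUES ('2024-06-01');  -- id is generated automatically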
- Allow exclusion constraints on partitioned tables
As long as the exclusion constraints compare partition key columns for equality, other columns can use exclusion constraint-specific comparisons
CREATE TABLE idxpart (a int4range, b int4range, c int4range,
EXCLUDE USING GIST (b with =, c with &&)) PARTITION BY RANGE (a);
ERROR: unique constraint on partitioned table must include all partitioning columns
DETAIL: EXCLUDE constraint on table "idxpart" lacks column "a" which is part of the partition key.
CREATE TABLE idxpart (a int4range, b int4range, c int4range,
EXCLUDE USING GIST (a with =, b with =, c with &&)) PARTITION BY RANGE (a, b);
- Use multiple workers to build BRIN indexes
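A quick sketch (the measurements table is hypothetical); the number of workers used for the build is subject to max_parallel_maintenance_workers and the table size:
CREATE TABLE measurements (created_at timestamptz, value numeric);
SET max_parallel_maintenance_workers = 4;   -- allow parallel workers for the index build
CREATE INDEX measurements_brin ON measurements USING brin (created_at);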
- Queries that generate initPlans can use parallel workers to execute initPlan
EXPLAIN (COSTS OFF) SELECT c1 FROM t1 WHERE c1 = (SELECT 1);
QUERY PLAN
------------------------------------------
Gather
Workers Planned: 2
InitPlan 1
-> Result
-> Parallel Seq Scan on t1
Filter: (c1 = (InitPlan 1).col1)
- Eliminated IS NOT NULL query restrictions on NOT NULL columns
CREATE TABLE pred_tab (a int NOT NULL, b int, c int NOT NULL);
EXPLAIN (COSTS OFF) SELECT * FROM pred_tab t WHERE t.a IS NOT NULL;
QUERY PLAN
------------------------
Seq Scan on pred_tab t
- Eliminated scans on NOT NULL columns if IS NULL is specified
EXPLAIN (COSTS OFF) SELECT * FROM pred_tab t WHERE t.a IS NULL;
QUERY PLAN
--------------------------
Result
One-Time Filter: false
- Allow correlated IN subqueries to be transformed into joins
EXPLAIN (costs off) SELECT * from tenk1 A WHERE hundred in
(select hundred from tenk2 B where B.odd = A.odd);
QUERY PLAN
------------------------------------------------------------
Hash Join
Hash Cond: ((a.odd = b.odd) AND (a.hundred = b.hundred))
-> Seq Scan on tenk1 a
-> Hash
-> HashAggregate
Group Key: b.odd, b.hundred
-> Seq Scan on tenk2 b
- Improved CTE plans by considering the statistics and sort order of the CTE's output columns
- Allow pushdown of EXISTS and IN subqueries to the postgres_fdw foreign server
EXPLAIN (VERBOSE, COSTS OFF) SELECT t1.c1 FROM ft1 t1 WHERE EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2.c1)
ORDER BY t1.c1 OFFSET 100 LIMIT 10;
Foreign Scan
Output: t1.c1
Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
Remote SQL: SELECT r1."C 1" FROM "S 1"."T 1" r1 WHERE EXISTS (SELECT NULL FROM "S 1"."T 1" r2
WHERE ((r2."C 1" = r1."C 1"))) ORDER BY r1."C 1" ASC NULLS LAST LIMIT 10::bigint OFFSET 100::bigint
- Allow joins with non-join qualifications to be pushed down to foreign servers and custom scans
- MERGE command feature
- MERGE command now supports RETURNING clause
MERGE INTO products p USING stock s ON p.product_id = s.product_id
WHEN MATCHED AND s.quantity > 0 THEN UPDATE SET in_stock = true, quantity = s.quantity
WHEN NOT MATCHED THEN INSERT (product_id, in_stock, quantity) VALUES (s.product_id, true, s.quantity)
RETURNING merge_action(), p.*;
merge_action | product_id | in_stock | quantity
--------------+------------+----------+----------
UPDATE | 1001 | t | 50
INSERT | 1003 | t | 10
- MERGE command supports WHEN NOT MATCHED BY SOURCE
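For example, reusing the products and stock tables from the previous example (a sketch), target rows that no longer have a matching source row can now be handled explicitly:
MERGE INTO products p USING stock s ON p.product_id = s.product_id
WHEN MATCHED THEN UPDATE SET in_stock = true, quantity = s.quantity
WHEN NOT MATCHED BY SOURCE THEN            -- product no longer present in stock
    UPDATE SET in_stock = false, quantity = 0;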
- MERGE command can modify updatable views
- Introduced an event trigger for the login event, allowing actions to be fired right when a user connects
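A minimal sketch (the function and trigger names are made up for illustration):
CREATE FUNCTION log_login() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE 'user % connected to database %', session_user, current_database();
END;
$$;
CREATE EVENT TRIGGER on_login ON login EXECUTE FUNCTION log_login();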
- Sped up the serial portion of parallel aggregates, allowing the following to scale better in parallel queries: sum(numeric), avg(numeric), var_pop(numeric), variance(numeric), stddev_pop(numeric), stddev_samp(numeric), stddev(numeric), array_agg(anyarray), string_agg(text), string_agg(bytea)
- Introduced a built-in collation provider
Improvements to security/SQL
- Avoid the need to grant superuser privileges by using the pg_maintain role for VACUUM, ANALYZE, CLUSTER, REFRESH MATERIALIZED VIEW, REINDEX, and LOCK TABLE; alternatively, MAINTAIN privilege can be granted to users on a table
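A quick sketch (role and table names are hypothetical):
GRANT pg_maintain TO maintenance_user;           -- maintenance operations on all relations
GRANT MAINTAIN ON TABLE orders TO report_user;   -- maintenance on a single table only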
- Allow TLS connections to be established without an extra network round trip for negotiation
- ALTER SYSTEM improvements
- Allow ALTER SYSTEM to set unrecognized custom server variables
- Add system variable allow_alter_system to disallow ALTER SYSTEM
- Useful in environments where configuration is managed by external tools
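A short sketch (the custom variable name is made up): version 17 accepts custom variables it does not recognize, while setting allow_alter_system = off in postgresql.conf makes any ALTER SYSTEM command fail:
ALTER SYSTEM SET my_extension.trace_level = 'debug';  -- unrecognized custom variable, now accepted
-- with allow_alter_system = off in postgresql.conf, the following would be rejected
ALTER SYSTEM SET work_mem = '64MB';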
Improvements to SQL/JSON
- Introduced the JSON_TABLE() function to convert JSON data into a table representation
CREATE TABLE my_films ( js jsonb );
INSERT INTO my_films VALUES (
'{ "favorites" : [
{ "kind" : "horror", "films" : [
{ "title" : "Psycho",
"director" : "Alfred Hitchcock" } ] }
] }');
SELECT jt.* FROM my_films,
JSON_TABLE (js, '$.favorites[*]'
COLUMNS (id FOR ORDINALITY,
kind text PATH '$.kind',
title text PATH '$.films[*].title',
director text PATH '$.films[*].director')) AS jt;
id | kind | title | director
----+--------+--------+------------------
1 | horror | Psycho | Alfred Hitchcock
- SQL/JSON constructor functions
- JSON() - Converts a given expression specified as text or bytea string (in UTF8 encoding) into a JSON value
JSON('{"a":123, "b":[true,"foo"], "a":"bar"}')
{"a":123, "b":[true,"foo"], "a":"bar"}
- JSON_SCALAR() - Converts a given SQL scalar value into a JSON scalar value
JSON_SCALAR(123.45)
123.45
- JSON_SERIALIZE() - Converts an SQL/JSON expression into a character or binary string
JSON_SERIALIZE('{ "a" : 1 }' RETURNING bytea)
\x7b20226122203a2031207d20
- SQL/JSON query functions
- JSON_EXISTS() - Returns true if the SQL/JSON path_expression applied to the JSON value yields any items
SELECT JSON_EXISTS(jsonb '{"key1": [1,2,3]}', '$.key1[2]');
------------------------
t
- JSON_QUERY() - Returns the result (JSON, array, or string) of applying the SQL/JSON path_expression to the JSON value
SELECT JSON_QUERY(jsonb '{"a": "[1, 2]"}', '$.a');
------------------------
[1, 2]
- JSON_VALUE() - Returns the result (SQL/JSON scalar) of applying the SQL/JSON path_expression to the JSON value
SELECT JSON_VALUE(jsonb '[1,2]', '$[1]');
------------------------
2
Improvements to monitoring
- New view pg_wait_events
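The view documents every wait event and its description; here is a sketch of a query joining it with pg_stat_activity, which could produce a record like the one below:
SELECT a.pid, a.state, a.wait_event_type, a.wait_event, w.description
FROM pg_stat_activity a
JOIN pg_wait_events w
  ON w.type = a.wait_event_type AND w.name = a.wait_event;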
-[ RECORD 1 ]---+---------------------------------------------
pid             | 21090
state           |
wait_event_type | Activity
wait_event      | CheckpointerMain
description     | Waiting in main loop of checkpointer process
- All checkpointer-related stats can now be found in pg_stat_checkpointer
- Index vacuum progress is now reported in pg_stat_progress_vacuum
- Removed the parameter old_snapshot_threshold
Full list
For PostgreSQL enthusiasts and readers interested in the full list of new and enhanced features and other changes, you can view it here.
What's next: PostgreSQL 18 and beyond
Last but not least, I would like to close off this article by sharing some of the active discussions in the community. Please be reminded that none of these features are guaranteed to be included in the next or future releases.
- Transparent column encryption - automatic, transparent encryption and decryption of particular columns in the client
- Asynchronous I/O - index prefetching to improve index scan performance, data prefetching to improve overall system performance, and vectored I/O for bulk writes
- Import/export of statistics, to help run queries after an upgrade without first running ANALYZE
- Enhance table AM APIs to suit different storage engines
- amcheck support for GiST and GIN indexes
- Various improvements in Logical Replication - DDL Replication, replication of sequences, conflict detection and resolution, node management APIs, slot invalidation for unused slots, among others
- Executor improvements - special-case executor expression steps for common combinations (making JIT-generated code simpler), JIT compilation per plan node, and SQL-standard Row Pattern Recognition (RPR)
- Improvements in partitioning technology, especially in pruning when a large number of partitions is present
- Improvements in indexing, especially in nbtree
- Optimizer improvements to make various kinds of queries work better
- Introduce compression at the wire protocol level
- Parallelism - parallelize vacuum on tables, parallel CREATE INDEX for GIN indexes, parallelize correlated subqueries, TID range scans
- 64-bit XIDs, which avoid freezing and reduce the need for autovacuum
- WAL size reduction with smaller WAL record headers
- TOAST improvements such as custom formats and compression dictionaries
- Stats - split index and table statistics into different types of stats, more stats
- CI and build system improvements
Final thoughts
As we look ahead to the future of PostgreSQL, it’s clear that version 17 marks a significant milestone in its evolution, with its enhancements in performance, scalability, and security. The community continues to adapt PostgreSQL to the ever-changing demands of modern data management, and the commitment to innovation and excellence ensures that PostgreSQL will remain a cornerstone for developers and enterprises alike.
Whether you’re a long-time user or new to PostgreSQL, the release of version 17 offers exciting opportunities to leverage its capabilities for your projects.
As we embrace these new features, we also look forward to the continued growth and development of PostgreSQL, driven by a passionate and dedicated community. I very much look forward to continuing my participation and contributions to the PostgreSQL community.