Continuous and active work is in progress in the community. Let me share my insider view on what more to look forward to in future releases of PostgreSQL, for version 14 and beyond.
In terms of PostgreSQL development, what I envisage for the next few years are improvements in high availability, scale-out (horizontal scaling), parallelism, and logical replication. I discussed this in my previous post.
The next few years of PostgreSQL
In the PostgreSQL community, there is no specific roadmap for development. However, when I analyze current trends and popular proposals in the community, I do see strong focus and activity in high availability, scale-out, new storage engines, and performance and scalability improvements.
Three areas that I would like to highlight in this post are: logical replication, parallelism, and scale-out.
Logical replication
We are working to enable decoding of in-progress transactions and prepared transactions in the logical replication framework.
This consists of two separate features:
- Decoding of in-progress transactions
This will reduce the apply lag on the subscriber and will spare many large transactions from performing I/O on the publisher side. The overall result is better performance for logical replication of large transactions.
Using this infrastructure, we have also solved the problem of transactions containing many DDL statements, which used to take a long time to decode and resulted in very high CPU usage. It has been observed that decoding a transaction that truncated a table with 1000 partitions finishes in 1 second, whereas it previously took 4-5 minutes.
- Decoding of prepared transactions
This will give users the option of two-phase distributed commit in a logical replication setup, and it can be the basis for conflict-free logical replication.
Having both of these features will open up many new use cases for logical replication.
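To make the first of these features more concrete, here is a toy Python model (not PostgreSQL code) of the difference between sending a transaction only at commit and streaming it while it is still in progress. The class and function names are invented for illustration, and the model ignores the network, the WAL, and aborted transactions; it only counts how many changes are still waiting on the publisher when the commit arrives.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToySubscriber:
    """Hypothetical subscriber: changes become visible only at commit."""
    pending: List[str] = field(default_factory=list)
    applied: List[str] = field(default_factory=list)

    def receive(self, change: str) -> None:
        # A received change waits here until the transaction commits.
        self.pending.append(change)

    def commit(self) -> None:
        # Make every received change visible at once.
        self.applied.extend(self.pending)
        self.pending.clear()


def replicate(changes: List[str], streaming: bool) -> Tuple[int, List[str]]:
    """Replicate one transaction and return
    (changes still held on the publisher at commit, rows applied)."""
    subscriber = ToySubscriber()
    held_on_publisher: List[str] = []

    for change in changes:
        if streaming:
            # In-progress streaming: ship each change as it is decoded.
            subscriber.receive(change)
        else:
            # Commit-time decoding: hold everything until the commit.
            held_on_publisher.append(change)

    backlog_at_commit = len(held_on_publisher)

    # At commit, anything still held must be shipped before it can apply.
    for change in held_on_publisher:
        subscriber.receive(change)
    subscriber.commit()
    return backlog_at_commit, subscriber.applied
```

In both modes the subscriber ends up with the same rows. What differs is the backlog that has to be sent after the commit, which is the part of the apply lag this simplified model captures.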
Parallelism
Each year we keep improving parallelism in PostgreSQL, and this year is no exception, especially regarding parallel writes, for which we have already done some infrastructure work in PostgreSQL 13.
We have identified cases where performance could be improved by dividing writes and running the operation in parallel. In particular, we are working to improve the loading of large volumes of data into a PostgreSQL database.
It would also be quite interesting to parallelize other DML operations such as UPDATE and DELETE.
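The general idea of dividing a write and running the pieces in parallel can be sketched in a few lines of Python. This is a toy illustration only, not how PostgreSQL implements or plans to implement parallel writes: `load_chunk` is a hypothetical stand-in for whatever inserts a batch of rows, and the worker count and chunking scheme are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_into_chunks(rows: Sequence[int], workers: int) -> List[List[int]]:
    """Divide rows into at most `workers` contiguous chunks."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    chunk_size = max(1, -(-len(rows) // workers))  # ceiling division
    return [list(rows[i:i + chunk_size]) for i in range(0, len(rows), chunk_size)]


def load_chunk(chunk: List[int]) -> int:
    """Hypothetical stand-in for a batch insert; returns the rows 'written'."""
    return len(chunk)


def parallel_load(rows: Sequence[int], workers: int = 4) -> int:
    """Load all rows using several workers and return the total row count."""
    chunks = split_into_chunks(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker handles one chunk; the per-chunk counts are summed.
        return sum(pool.map(load_chunk, chunks))
```

Calling `parallel_load(list(range(10)), workers=3)` splits the ten rows into three chunks and loads them concurrently. A real database also has to coordinate locking, WAL, and visibility between workers, which this sketch leaves out.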
Scale-out
In the community, work is underway for parallel foreign scan, two-phase commit, and bulk-load improvements for foreign tables.
Users of postgres_fdw will benefit from potential performance improvements when accessing data stored in external PostgreSQL servers. This will encourage larger-scale PostgreSQL deployments.
Other areas of enhancement
Transparent data encryption
All I can say at this stage is that many organizations are interested in this feature, and it is being actively worked on in the community. It is a very large feature and might take multiple releases to finalize, so it is difficult to estimate a completion date. The PostgreSQL community is engaged with it, though, and I would love to see this feature in PostgreSQL, as it will extend PostgreSQL's footprint into areas where it is not currently used.
Reliability and scalability
The community is working towards improvements in high availability, which is critical to reliability, especially in the event of failovers and outages. This is one of the key areas where PostgreSQL is looking to improve. With respect to scalability, we are working to improve it for both single-node and multi-node systems, which will increase PostgreSQL's adoption in large-scale applications.
AI and autonomous features
Because we don't have any specific roadmap, it is difficult to say that the community is going in this or that direction, or even if this is a high priority. However, as autonomous features such as self-provisioning, self-tuning and self-scaling become more prominent, especially after cloud providers started providing database services, different companies working on PostgreSQL might plan to work in this area. Consequently, this may lead to proposals in the community and I would welcome such a move.
Work is already being done in PostgreSQL on features like self-tuning. For example, we are continuously improving (auto)vacuum with every release, and the significant improvements in this area will help with self-tuning.
I am not aware of a definitive plan, but a lot of work is being done to improve monitoring and statistics in PostgreSQL, which will help diagnose what is going on inside the database.
Why is it slow? Why has it stopped? What is happening with performance? Tools, utilities, and features that help answer these questions are being developed release by release. I am not referring to any specific tool, but in general, work is being undertaken to improve the visibility of database statistics.
The current model of PostgreSQL
I think one of the aspects that distinguishes PostgreSQL is its license and copyright terms, which allow many organizations to easily adapt or enhance it. This works very well in terms of increasing the reach of PostgreSQL and making it a truly unique OSS database.
If you review database engine rankings and the number of different companies getting involved with PostgreSQL year on year, you can clearly see that the popularity of PostgreSQL is increasing steadily. I definitely see this as a reflection of a very good open-source model.
I’m also impressed with the way the PostgreSQL community controls the changes that go into PostgreSQL. Each change is very, very thoroughly reviewed. The feature code is added to the repository only when a PostgreSQL Committer, one or more reviewers, and the original author of the code are satisfied with it. This rigorous check entails a lot of filtering, and results in only good features and high-quality code making it into each new release.
This model is embraced and further enhanced by some of the companies with proprietary offerings that want to adapt PostgreSQL or build features at a fast pace so they can go to market with a differentiated solution. Again, I believe this is a positive thing for PostgreSQL.
I look forward to continuing this communication on a regular basis.