I attended the PostgreSQL Development Conference 2024, which was held in Vancouver, Canada. In this article, I will mainly introduce the content of the session I gave.
What is the PostgreSQL Development Conference (PGConf.dev)?
PGConf.dev is an international conference where PostgreSQL developers and community managers gather to give talks and hold discussions.
It stands out from general technical events by placing a strong emphasis on fostering meaningful interactions between PostgreSQL developers and community managers. Unlike conferences that are saturated with corporate advertising, PGConf.dev provides a platform where professionals can engage in insightful discussions, share innovative ideas, and collaborate on advancing the PostgreSQL ecosystem.
PGConf.dev is the successor to PostgreSQL Conference (PGCon), which was held until last year, and it continues the tradition of bringing together experts and enthusiasts in the field. This year, it was held at Simon Fraser University (SFU) in Vancouver, Canada. The venue at SFU not only offered an environment conducive to learning and collaboration but also added a touch of academic charm to the conference setting.
Three of us from Fujitsu (Amit Kapila, Zhijie Hou, and I) attended and gave two talks.
My session: New features added to logical replication
My talk was mainly about logical replication.
Logical replication is a mechanism that extracts changes made to data and replicates them to another PostgreSQL server. A well-known replication feature of PostgreSQL is streaming replication, but this feature requires that the physical representation of data be consistent between nodes, so replication cannot be performed on heterogeneous operating systems or between different major versions of PostgreSQL. Logical replication relaxes these restrictions, making it possible to build a more flexible system.
Aspect | Streaming replication | Logical replication |
Instance terminology | Primary / standby | Publisher / subscriber |
Type of content sent | Exact WAL records | Replication messages, information extracted from WAL |
Initial synchronization | pg_basebackup | Automatic |
Replication target | DB cluster | Database |
What downstream can do | Read-only queries | Read and write queries |
Environments | OS and major versions must be the same | Can be different |
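To make the publisher / subscriber terminology concrete, here is a minimal sketch of the two SQL statements that typically define a logical replication pair. The helper functions and the publication, subscription, table, and connection names are hypothetical and were chosen only for illustration; the statements themselves are the standard CREATE PUBLICATION and CREATE SUBSCRIPTION commands.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote an SQL string literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_publication_sql(pub_name: str, tables: list[str]) -> str:
    """Statement to run on the publisher.

    Assumes `tables` holds plain (not schema-qualified) table names.
    """
    table_list = ", ".join(quote_ident(t) for t in tables)
    return f"CREATE PUBLICATION {quote_ident(pub_name)} FOR TABLE {table_list};"


def create_subscription_sql(sub_name: str, conninfo: str, pub_name: str) -> str:
    """Statement to run on the subscriber; conninfo points at the publisher."""
    return (
        f"CREATE SUBSCRIPTION {quote_ident(sub_name)} "
        f"CONNECTION {quote_literal(conninfo)} "
        f"PUBLICATION {quote_ident(pub_name)};"
    )


if __name__ == "__main__":
    # Hypothetical names: pub1, sub1, and the tables "orders" and "items".
    print(create_publication_sql("pub1", ["orders", "items"]))
    print(create_subscription_sql("sub1", "host=publisher dbname=app", "pub1"))
```

By default, creating the subscription is what triggers the automatic initial synchronization listed in the table.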
Starting with PostgreSQL 17, a new server application for creating logical standbys (subscribers) will be added, and pg_upgrade can be run without destroying logical replication configurations. I explained these new features of the upcoming version.
Resolving the issues of setting up new logical replication in large-scale environments
Logical replication is under active development, but it still has some problems.
One of them is that it is difficult to set up new logical replication in a large-scale environment. In logical replication, a COPY statement is first issued for all target tables to perform initial data synchronization. Therefore, initial data synchronization may take a long time depending on the number of tables and the amount of data involved.
In addition, since logical replication needs to keep the WAL generated during synchronization, if the synchronization time is too long, the WAL storage disk may become full and the server process may crash.
Known challenges in creating a subscriber
- Takes a long time
- Initial synchronization runs COPY command per table
- Estimated execution time is proportional to the number of tables
- Requires additional disk resources
- Replication slots are created while copying data
- Generated WAL files are preserved
- They may fill up the disk, which is very problematic
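A rough back-of-the-envelope calculation shows how these two challenges interact. The sketch below assumes a constant COPY throughput and a constant WAL generation rate on the publisher, and it assumes that all WAL generated during the copy is retained until synchronization finishes. Real workloads are less uniform, and the figures are hypothetical, so treat the result as an order-of-magnitude estimate only.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def estimate_initial_sync(total_bytes: float,
                          copy_bytes_per_sec: float,
                          wal_bytes_per_sec: float) -> tuple[float, float]:
    """Return (sync_seconds, retained_wal_bytes) under the constant-rate assumption."""
    if copy_bytes_per_sec <= 0:
        raise ValueError("copy throughput must be positive")
    # Time to copy every table once at a steady rate.
    sync_seconds = total_bytes / copy_bytes_per_sec
    # WAL produced on the publisher while the copy is running.
    retained_wal_bytes = wal_bytes_per_sec * sync_seconds
    return sync_seconds, retained_wal_bytes


if __name__ == "__main__":
    # Hypothetical figures: 1 TiB of table data, 128 MiB/s COPY, 16 MiB/s of WAL.
    seconds, wal = estimate_initial_sync(1024 * GIB, 128 * MIB, 16 * MIB)
    print(f"initial sync: {seconds / 3600:.1f} hours, retained WAL: {wal / GIB:.0f} GiB")
```

With these assumed figures, the copy takes a little over two hours and about 128 GiB of WAL must be kept on the publisher in the meantime.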
Therefore, we focused on read replicas (asynchronous physical standbys) that may exist in the system, and developed pg_createsubscriber, a server application that converts physical standbys into subscribers.
The root cause is that a large amount of data has to be copied from scratch. The time required for initial synchronization can therefore be reduced by first letting streaming replication bring a standby up to date, and then building logical replication on that node, which is already following the changes. The problem of large amounts of retained WAL is also solved, because initial data synchronization with the COPY statement is not performed in the first place.
New server application pg_createsubscriber in PostgreSQL 17
- Converts a physical standby into a logical subscriber
- Confirms the standby is caught up at the correct point, then
- Defines subscriptions on the standby
- Done by introducing a new server application
- Must be executed on the standby server
$ pg_createsubscriber [option...] { -d | --database } dbname
                                  { -D | --pgdata } datadir
                                  { -P | --publisher-server } connstr
- Already committed to HEAD (the PostgreSQL development branch)
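As an illustration of the synopsis above, the following sketch assembles a pg_createsubscriber command line without running it. The helper function, the data directory path, the connection string, and the database name are hypothetical. The `--dry-run` option is enabled by default here on the assumption that a trial run is wanted before actually converting the standby.

```python
import shlex


def build_pg_createsubscriber_cmd(datadir: str,
                                  publisher_connstr: str,
                                  databases: list[str],
                                  dry_run: bool = True) -> list[str]:
    """Build the argument list for pg_createsubscriber (to be run on the standby)."""
    if not databases:
        raise ValueError("at least one database is required")
    cmd = [
        "pg_createsubscriber",
        "--pgdata", datadir,                      # data directory of the standby
        "--publisher-server", publisher_connstr,  # connection string to the primary
    ]
    for db in databases:
        # --database is repeated once per database to convert.
        cmd += ["--database", db]
    if dry_run:
        # Report what would be done without modifying the standby.
        cmd.append("--dry-run")
    return cmd


if __name__ == "__main__":
    # Hypothetical path, host, and database name.
    cmd = build_pg_createsubscriber_cmd(
        "/var/lib/pgsql/17/data", "host=primary port=5432", ["app"]
    )
    print(shlex.join(cmd))
```

Returning an argument list rather than a single string keeps the connection string intact as one argument if the command is later passed to `subprocess.run`.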
Simplifying the upgrade process for logical replication clusters
PostgreSQL 17 also solves another issue with logical replication: until now, it was not (practically) compatible with pg_upgrade.
When building a logical replication environment, objects such as replication slots and replication origins are generated. These are necessary to record the replication status, such as the WAL transmission status and application status, but because they are node-specific information, they were not migrated by upgrades using pg_upgrade. Therefore, after upgrading a node that is part of a logical replication setup, users had to manually reconstruct these internal objects.
For this reason, we have improved pg_upgrade so that it reads these internal objects from the old cluster and rebuilds them in the new one. This makes it possible to resume replication automatically even after upgrading a node that takes part in logical replication.
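Migrating replication slots only works when the old and new clusters are in a suitable state. The sketch below models a few of the prerequisites as I understand them from the pg_upgrade documentation: the new cluster uses `wal_level = logical`, its `max_replication_slots` is at least the number of slots being migrated, and every old slot is usable and fully caught up. The `LogicalSlot` class and the `upgrade_blockers` function are hypothetical illustrations, not part of pg_upgrade, and the list of conditions is not exhaustive.

```python
from dataclasses import dataclass


@dataclass
class LogicalSlot:
    """Simplified, hypothetical view of a logical replication slot on the old cluster."""
    name: str
    caught_up: bool            # all changes have been consumed by the subscriber
    conflicting: bool = False  # slot has been invalidated and is no longer usable


def upgrade_blockers(old_slots: list[LogicalSlot],
                     new_wal_level: str,
                     new_max_replication_slots: int) -> list[str]:
    """Return human-readable reasons why slot migration would be expected to fail."""
    problems = []
    if old_slots and new_wal_level != "logical":
        problems.append("new cluster must set wal_level = logical")
    if new_max_replication_slots < len(old_slots):
        problems.append(
            f"max_replication_slots must be at least {len(old_slots)} on the new cluster"
        )
    for slot in old_slots:
        if slot.conflicting:
            problems.append(f"slot {slot.name} is not usable")
        elif not slot.caught_up:
            problems.append(f"slot {slot.name} has unconsumed changes")
    return problems


if __name__ == "__main__":
    # Hypothetical slot names and settings.
    slots = [LogicalSlot("sub1", caught_up=True), LogicalSlot("sub2", caught_up=False)]
    for reason in upgrade_blockers(slots, new_wal_level="replica",
                                   new_max_replication_slots=1):
        print(reason)
```

In practice, running pg_upgrade with its `--check` option is the authoritative way to find out whether an upgrade can proceed.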
Wrapping up
This was my second time attending an international conference, following last year, where I introduced my work on logical replication.
Because PGConf.dev focuses on interaction between developers, I was able to discuss solutions to logical replication issues with the participating developers, and I was happy to have insightful discussions with my fellow professionals there. I once again realized that PostgreSQL's scalability holds infinite possibilities.
Our team at Fujitsu hopes to continue to actively propose ideas and participate in discussions at conferences, further contributing to the development of PostgreSQL.
Further information
For details about the talks at PGConf.dev 2024, please see below.
- PGConf.dev official website (PGConf.dev 2024 official page)
- Online upgrade of replication clusters without downtime – Kuroda Lecture (PGConf.dev 2024 official page)
- PostgreSQL 17 and beyond – Amit Kapila (PGConf.dev 2024 official page)