
PostgreSQL added a highly anticipated feature that has been in discussion within the community for some time now: support for incremental backups in pg_basebackup.
This long-awaited enhancement marks a significant milestone in PostgreSQL's evolution, providing improved efficiency in data management and backup systems.
What is pg_basebackup?
pg_basebackup is a command-line utility used to take a base backup of a PostgreSQL server. A base backup is a consistent snapshot of the server's data directory, essential for disaster recovery and database cloning.
Until PostgreSQL 16, pg_basebackup only supported full backups. To achieve incremental backups, users had to rely on third-party tools like pgBackrest and Barman. With PostgreSQL 17, incremental backup support is natively integrated into the utility.
What is incremental backup?
Incremental backup captures only changes made to your database since the last backup. This is a major improvement over traditional full backups, which copy the entire database every time.
Incremental backups save storage space and reduce backup times, especially for large databases with frequent updates.
Pre-requsite configuration
In order to be able take incremental backups using pg_basebackup, the summarize_wal parameter must be set to on (default: off).
The summarize_wal parameter was introduced in PostgreSQL 17 and it enables the WAL summarizer process. This process tracks changes within the database, allowing pg_basebackup to identify and copy only the modified parts during incremental backups.
We need to take into consideration that WAL summarization can be enabled either on a primary or on a standby server. Additionally, WAL summarization cannot be enabled when wal_level is set to minimal. This parameter can only be set in the postgresql.conf file or on the server command line.
To enable WAL summarization, we need to execute the commands below:
postgres=# ALTER SYSTEM summarize_wal TO on;
postgres=# SELECT pg_reload_conf();
Let’s understand how this new enhancement is working: we need to keep in mind that before performing incremental backups, we must complete an initial full backup. This step is crucial, as incremental backups rely on the existence of a previous backup.
Performing a full backup
Execute the command below to perform a full backup:
[postgres@pgsqll7-beta bin]$ ./pg_basebackup -D /backups/full_bkp/ -P -v
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/B50000298 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot “pg_basebackup_21429"
1563005/1563005 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/B5000159
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed
[postgres@pgsqll7-beta bin]$
Verify the content of /backups/full_bkp to verify the full backup:
[postgres& pgsqll7-beta bin]$ ll -lrt /backups/full_bkp/
total 244
-rw-------. 1 postgres postgres 227 May 27 15:24 backup_label
drwx------. 4 postgres postgres 77 May 27 15:24 pg_wal
drwx------. 2 postgres postgres 6 May 27 15:24 pg_twophase
drwx------. 2 postgres postgres 6 May 27 15:24 pg_subtrans
drwx------. 2 postgres postgres 6 May 27 15:24 pg_snapshots
drwx------. 2 postgres postgres 6 May 27 15:24 pg_serial
drwx------. 2 postgres postgres 6 May 27 15:24 pg_notify
drwx------. 2 postgres postgres 6 May 27 15:24 pg_dynshmem
drwx------. 2 postgres postgres 6 May 27 15:24 pg_commit_ts
drwx------. 4 postgres postgres 36 May 27 15:24 pg_multixact
drwx------. 6 postgres postgres 46 May 27 15:24 base
drwx------. 2 postgres postgres 18 May 27 15:25 pg_xact
drwx------. 2 postgres postgres 6 May 27 15:25 pg_tblspe
drwx------. 2 postgres postgres 6 May 27 15:25 pg_stat_tmp
drwx------. 2 postgres postgres 6 May 27 15:25 pg_stat
drwx------. 2 postgres postgres 6 May 27 15:25 pg_replslot
-Yw-------. 1 postgres postgres 88 May 27 15:25 postgresql.auto.conf
-rw-------. 1 postgres postgres 3 May 27 15:25 PG_VERSION
drwx------. 4 postgres postgres 68 May 27 15:25 pg_logical
-Yw-------. 1 postgres postgres 2640 May 27 15:25 pg_ident.conf
-Yw-------. 1 postgres postgres S711 May 27 15:25 pg_hba.conf
-Yw-------. 1 postgres postgres 30704 May 27 15:25 postgresql.conf
drwx------. 2 postgres postgres 46 May 27 15:25 log
drwx------. 2 postgres postgres 4096 May 27 15:25 global
-Yw-------. 1 postgres postgres 44 May 27 15:25 current_logfiles
-Yw-------. 1 postgres postgres 183811 May 27 15:25 backup_manifest
[postgres@pgsqll7-beta bin]$
Now, before proceeding for incremental backup, let’s perform a few transactions.
[postgrestpgsqll/-beta bin]$ ./psql
psql (17beta1)
Type "help" for help.
postgres=# CREATE TABLE tl (id numeric);
CREATE TABLE
postgres=# INSERT INTO tl VALUES (generate_series(1,10000));
INSERT 0 10000
postgres=# \q
[postgres@pgsql17-beta bin]$
Performing an incremental backup
After completing the full backup, proceed with the incremental backups. Each incremental backup will only include changes made since the last backup.
To create the incremental backup, we need to pass the backup_mainfest file as an argument, which is created while taking the full backup.
[postgres@pgsql17-beta bin]$ ./pg_basebackup --incremental=/backups/full_bkp/backup_manifest -D /backups/incr_bkp/ -v -P
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/B7000029 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_22145"
5952/1563390 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/B7000159
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
Ppg_basebackup: base backup completed
[postgres@pgsqll7-beta bin]$
Restoring backups
Restoring backups should be straightforward and reliable. To simplify this process, PostgreSQL 17 introduces pg_combinebackup, a CLI tool that reconstructs the full backup from a series of incremental backups.
Create a directory for the restored data and use pg_combinebackup to combine the backups:
[root pgsql17-beta ~]# mkdir /test_restore
[root pgsql17-beta ~]# chown -R postgres:postgres /test_restore/
[root pgsql17-beta ~] #chmod 0700 /test_restore/
[postgres@pgsql17-beta bin]$ ./pg_combinebackup /backups/full_bkp/
/backups/incr_bkp/ -o /test_restore/
[postgres@pgsql17-beta bin]$
We need to make sure that, backups are provided in a sequence: starting with the full backup, followed by each incremental backup in order.
The -o/--output option specifies the output directory for the restored backup. To start the PostgreSQL server from the restored path, follow these steps:
- Change the port number in postgresql.conf of the new restored backup (test_restore)
- Start the PostgreSQL service as follows:
[postgrestpgsqlil/-beta bin] $ ./pg_ctl -bD /test_restore start
waiting for server to start....2024-05-27 15:40:59.709 +08 [22479]
LOG: redirecting log output to logging collector process
2024-05-27 15:40:59.709 +02 (22479) HINT: Future log output will appear in directory "log".
. done
server started
[postgres@pgsqll7-beta bin]$
After the successful start of the PostgreSQL service of the newly restored cluster, let’s verify as below:
[postgres@pgsqll7-beta bin]$ ps -ef | grep postgres
postgres 15542 1 oO 14:46 ? 00:00:00 /usr/pgsql-17/bin/postgres -D
/opt/pgsqll7
postgres 19543 19542 0 14:46 ? 00:00:00 postgres: logger
postgres 19546 19542 0 14:46 ? 00:00:00 postgres: checkpointer
postgres 19547 19542 0 14:46 ? 00:00:00 postgres: background writer
postgres 19552 19542 0 14:46 ? 00:00:00 postgres: walwriter
postgres 19553 19542 0 14:46 ? 00:00:00 postgres: autovacuum launcher
postgres 19554 19542 0 14:46 ? 00:00:00 postgres: logical replication launcher
postgres 22115 19542 0 15:32 ? 00:00:00 postgres: walsummarizer
root 22226 21277 0 15:36 pts/1 00:00:00 su -1 postgres
postgres 22227 22226 0 15:36 pts/1 00:00:00 -bash
postgres 22475 1 0 15:40 ? 00:00:00 /usr/pgsql-17/bin/postgres -D /test_restore
postgres 22480 22479 0 15:40 ? 00:00:00 postgres: logger
postgres 22481 22479 0 15:40 ? 00:00:00 postgres: checkpointer
postgres 22482 22479 0 15:40 ? 00:00:00 postgres: background writer
postgres 22484 22479 0 15:41 ? 00:00:00 postgres: walwriter
postgres 22485 22479 0 15:41 ? 00:00:00 postgres: walsummarizer
postgres 22486 22479 0 15:41 ? 00:00:00 postgres: autovacuum launcher
postgres 22487 22479 0 15:41 ? 00:00:00 postgres: logical replication launcher
postgres 22530 22227 0 15:42 pts/1 00:00:00 ps -ef
postgres 22531 22227 0 15:42 pts/1 00:00:00 grep --color=auto postgres
[postgres@pgsqll7-beta bin]$
Wrapping up
As we have seen, the new background process walsummarizer is now running for both instances, and that's it! You have successfully taken and restored incremental backups in PostgreSQL 17.
This new feature enhances backup efficiency and simplifies data recovery, making PostgreSQL an even more robust choice for your database needs.