
One of the most persistent issues PostgreSQL administrators face is the gradual buildup of table and index bloat. This bloat occurs when frequent updates and deletes leave behind dead tuples that occupy space but are no longer useful. While PostgreSQL’s VACUUM command can clean up these dead tuples so that space is reusable, it does not actually shrink the size of the physical table file. Over time, this leads to wasted storage, slower sequential scans, inefficient index lookups, and degraded overall database performance.
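Before reaching for any tool, it helps to quantify the problem. One rough check relies on the statistics collector (so the figures are estimates, not exact counts) and ranks tables by dead tuples:

```sql
-- Rough dead-tuple check via the statistics collector (estimates only)
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) AS dead_pct,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

A high dead_pct on a large table is a good candidate for the techniques discussed below.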
Traditionally, the options for reclaiming physical space have been VACUUM FULL or CLUSTER. Both can shrink table files and reorganize data, but they come with a major downside: they require an ACCESS EXCLUSIVE lock on the table, blocking all reads and writes until the operation completes. On large and busy production systems, such downtime is often unacceptable.
The pg_repack extension was created to address this challenge. It allows you to reorganize tables and rebuild indexes while keeping your database online. Unlike VACUUM FULL, pg_repack holds an access exclusive lock only for a very short time at the beginning and end of its process. During the bulk of its operation, normal queries, inserts, updates, and deletes can continue to run against the table. This makes pg_repack a safe and efficient tool for production workloads.
Requirements and preconditions
Before running pg_repack, there are a few important requirements that administrators must understand.

- First, the extension only supports PostgreSQL versions 9.5 and above. This is because pg_repack relies on catalog features and index build improvements introduced in 9.5. Earlier versions such as 9.4 and before cannot use it.
- Second, pg_repack requires free disk space equivalent to roughly twice the size of the target table and its indexes. This is because the extension creates a new copy of the table, builds fresh indexes on that copy, and maintains a log of ongoing changes while the repack is in progress. For example, if you want to reorganize a table that consumes 2 GB along with indexes of 500 MB, you should expect to need at least 5 GB of available disk space for the duration of the operation.
- Third, only superusers or the owners of the target tables are permitted to run pg_repack. This restriction exists because the extension makes low-level modifications to system catalogs and needs high privileges. Owners can bypass the superuser requirement by running with the --no-superuser-check option.
- Finally, the target table must have a PRIMARY KEY or at least a UNIQUE index on a NOT NULL column. This requirement exists because pg_repack must be able to uniquely identify rows in order to replay updates and deletes safely. Without a unique identifier, the extension cannot guarantee consistency when applying logged changes to the new copy of the table.
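These preconditions can be verified up front. The sketch below (using the pgbench_branches example table from later sections) checks for a usable unique identity and estimates the scratch space a repack will need:

```sql
-- Does the table have a PRIMARY KEY or UNIQUE index pg_repack can use?
SELECT indexrelid::regclass AS index_name, indisprimary, indisunique
FROM pg_index
WHERE indrelid = 'public.pgbench_branches'::regclass
  AND indisunique;

-- Rough free-space requirement: ~2x the table plus its indexes
SELECT pg_size_pretty(2 * pg_total_relation_size('public.pgbench_branches'))
       AS approx_scratch_space;
```

pg_total_relation_size includes indexes and TOAST data, which is what matters here since pg_repack rebuilds all of them.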
Installation and setup
Installing pg_repack is straightforward. The source code is distributed on the PGXN (PostgreSQL Extension Network). After downloading and extracting it, you can build it with standard PostgreSQL extension tools. On Linux or UNIX systems, the process typically looks like this:
$ make
$ sudo make install
This process uses PostgreSQL’s PGXS build framework to ensure the extension compiles correctly against your installed PostgreSQL version. You may need to install the PostgreSQL development packages (postgresql-devel) and ensure that the directory containing pg_config is available in your $PATH.
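On most distributions you can skip the source build entirely and install a prebuilt package. Package names vary by platform and PostgreSQL major version, so treat the names below as examples:

```shell
# Debian/Ubuntu (PGDG repository); match the package to your server version
sudo apt-get install postgresql-16-repack

# RHEL/Rocky/Alma (PGDG repository)
sudo dnf install pg_repack_16
```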
Once installed, you enable it in the target database by creating the extension:
CREATE EXTENSION pg_repack;
This step creates the necessary support objects under the repack schema. If at any point you want to remove it, you can simply run:
DROP EXTENSION pg_repack;
When upgrading from an older version of pg_repack, you must first drop the old extension before installing the new one.
How pg_repack works internally
The most important part of understanding pg_repack is learning how it reorganizes data without blocking queries. The extension achieves this by combining catalog-level table swaps with a logging mechanism that captures ongoing changes.
Full table repack
When you repack an entire table, the process unfolds in several carefully designed steps.
- Log table creation
pg_repack begins by creating a temporary log table. This auxiliary table will store any row changes (inserts, updates, deletes) that occur during the reorganization.
- Trigger installation
Next, the extension installs a trigger on the original table. This trigger captures every data modification and writes it into the log table. By doing this, pg_repack ensures that no transactions are lost while the table is being copied.
- Copying the original data
The extension then creates a brand-new table and copies all existing rows from the old table into this new structure. Because it writes rows sequentially and without dead tuples, the new table is compact and defragmented.
- Rebuilding indexes
Once the data is copied, pg_repack recreates all of the indexes on the new table. These indexes are built fresh and are therefore smaller and faster than their bloated counterparts.
- Replaying logged changes
At this point, the extension replays the modifications captured in the log table, applying them to the new table so that it reflects the most up-to-date state.
- Catalog swap
The old table and its indexes are then swapped with the new copies by updating the PostgreSQL system catalogs (pg_class, pg_index). From the application’s perspective, the table name and OID remain the same, so no queries are disrupted.
- Dropping the old table
Finally, the old bloated table and its indexes are dropped, leaving only the new compact versions.
During this entire process, pg_repack holds an ACCESS EXCLUSIVE lock only briefly — during trigger creation at the beginning and during the catalog swap at the end. For the majority of the time, it holds only an ACCESS SHARE lock, which allows other sessions to continue reading and writing to the table. This is the key to why pg_repack is safe to run in production.
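You can observe this lock behavior from another session while a repack runs. A simple sketch (assuming pgbench_branches is the target table):

```sql
-- Watch the locks held or awaited on the target table during a repack
SELECT l.pid, l.mode, l.granted, a.application_name
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = 'public.pgbench_branches'::regclass;
```

During the long copy phase you should see only AccessShareLock from the pg_repack session, with the ACCESS EXCLUSIVE lock appearing only briefly at the start and end.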
Index-only repack
Sometimes, the table itself is fine but the indexes have become bloated. In such cases, pg_repack can perform an index-only repack. The sequence is simpler but follows the same principles:

- The extension creates new versions of the indexes using the CREATE INDEX CONCURRENTLY command. This ensures that the new indexes are built without blocking writes.
- Once the new indexes are ready, it swaps them into the system catalogs in place of the old indexes.
- The bloated indexes are then dropped.
This process avoids rewriting the entire table and is therefore faster and less disk-intensive than a full repack.
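For comparison, PostgreSQL 12 and later ship a built-in equivalent for this specific case, which may remove the need for pg_repack when only indexes are bloated:

```sql
-- Core alternative on PostgreSQL 12+: rebuild indexes online
REINDEX INDEX CONCURRENTLY public.pgbench_branches_pkey;
-- or all indexes of a table at once:
REINDEX TABLE CONCURRENTLY public.pgbench_branches;
```

pg_repack remains useful on older versions, and for full-table repacks, tablespace moves, and reclustering, which REINDEX does not cover.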
Usage examples
pg_repack is invoked from the command line, much like pg_dump or vacuumdb. Some common examples include:
- To reorganize all clustered tables in the database testrepack:
pg_repack testrepack
- To reorganize only a specific table:
pg_repack --table public.pgbench_branches testrepack
- To reorganize all indexes of a specific table:
pg_repack --table pgbench_branches --only-indexes testrepack
- To move a specific table into a different tablespace:
pg_repack -d testrepack --table public.pgbench_branches --tablespace test_tblsp
- To move only the indexes of a table to a separate tablespace:
pg_repack -d testrepack --table public.pgbench_branches --only-indexes --tablespace test_tblsp_idx
- To speed up processing on very large tables, you can also specify parallel jobs:
pg_repack -j 4 --table public.pgbench_branches testrepack
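Before running any of these for real, a dry run shows what would be touched without doing any work; the -N/--dry-run flag prints the plan only:

```shell
# Preview the repack plan without modifying anything
pg_repack --dry-run -d testrepack
```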
Common options
| Option | Detailed description |
| --- | --- |
| -a, --all | Runs pg_repack across all databases in the PostgreSQL cluster. Useful for cluster-wide maintenance when you want to reclaim space everywhere. Requires superuser access and can take significant time if databases are large. |
| -t, --table=TABLE | Restricts repacking to a specific table. Used when only one table suffers from bloat or fragmentation, avoiding unnecessary processing of other objects. |
| -c, --schema=SCHEMA | Repacks all tables inside the specified schema. Convenient when a whole schema contains heavily updated or bloated tables. |
| -s, --tablespace=TABLESPACE | Moves repacked tables into the specified tablespace. Useful for relocating data across storage devices (for example, shifting large tables to faster disks or balancing space usage). Only applies to tables unless combined with --moveidx. |
| -S, --moveidx | Ensures that indexes are also moved into the specified tablespace when used with --tablespace. Without this option, only the table moves and indexes remain in their original location. |
| -j, --jobs=NUM | Runs repacking with multiple parallel worker jobs for each table. This can significantly speed up processing of large tables with many indexes, but higher values may also increase I/O and CPU load. |
| -i, --index=INDEX | Targets only the specified index for repacking or movement. Useful when an index is bloated or needs relocation, without touching the associated table or other indexes. |
| -x, --only-indexes | Repacks or moves only the indexes of a table, leaving the table data itself unchanged. Helpful in cases of index fragmentation without table bloat. |
| -Z, --no-analyze | Skips the automatic ANALYZE step that pg_repack normally performs at the end. This prevents immediate statistics updates, which may be desirable if you prefer to manage statistics collection separately. |
| -k, --no-superuser-check | Skips the initial check that enforces running as a superuser. Intended for cases where a non-superuser role with equivalent privileges is used. Incorrect use may cause failures if required permissions are missing. |
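Options combine as you would expect. For example, to relocate a table and its indexes to a new tablespace with parallel jobs while deferring statistics collection (the tablespace name test_tblsp is carried over from the earlier examples):

```shell
# Move table + indexes to test_tblsp, 2 parallel jobs, skip final ANALYZE
pg_repack -d testrepack -t public.pgbench_branches \
          -s test_tblsp -S -j 2 -Z
```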
How pg_repack does a full-table repack
(pg_repack --table public.pgbench_branches testrepack)
As described in section How pg_repack works internally > Full table repack, let's trace those steps through the actual server log. First, find the OID of the table public.pgbench_branches:
testrepack=# SELECT oid, relname FROM pg_class WHERE relname = 'pgbench_branches';
  oid  |     relname
-------+------------------
 16398 | pgbench_branches
(1 row)
0 Pre-flight: advisory lock, sanity checks, and a very short exclusive lock
What happens
- Ensures only one repack per table is running (advisory lock).
- Grabs a brief ACCESS EXCLUSIVE lock to do trigger/index sanity checks safely.
- Figures out index definitions and tablespace choices.
Log entries
select repack.version(), repack.version_sql();
-- serialize per-database+table: (16185446, relid)
SELECT pg_try_advisory_lock(16185446, CAST(-2147483648 + 16398 AS integer));
BEGIN ISOLATION LEVEL READ COMMITTED;
SET LOCAL lock_timeout = 100;
LOCK TABLE public.pgbench_branches IN ACCESS EXCLUSIVE MODE; -- short hold
RESET lock_timeout;
-- check invalid/valid indexes & defs
SELECT pg_get_indexdef(indexrelid) FROM pg_index WHERE indrelid = 16398 AND NOT indisvalid;
SELECT indexrelid, repack.repack_indexdef(indexrelid, indrelid, NULL, FALSE)
FROM pg_index WHERE indrelid = 16398 AND indisvalid;
-- check for conflicting triggers
SELECT repack.conflicted_triggers(16398);
-- prepare per-index type info (btree, etc.)
SELECT repack.create_index_type(16405,16398);
Locks
- ACCESS EXCLUSIVE on public.pgbench_branches (very short, bounded by lock_timeout=100ms retries).
- Purpose: perform DDL-sensitive checks atomically before wiring the change-logging.
1 Log table is created
What happens
- A per-table change-log (WAL-like) table records every INSERT/UPDATE/DELETE while the copy runs.
- Autovacuum is disabled on the log to reduce interference/noise.
Log entries
SELECT repack.disable_autovacuum('repack.log_16398');
Locks
- Regular locks on internal repack.log_16398. No meaningful impact on the original table yet (that was covered by the short exclusive in step 0).
2 Change-capture trigger installed on the original table
What happens
- Row-level AFTER trigger writes each DML into the log table.
- ENABLE ALWAYS ensures the trigger fires regardless of the session_replication_role setting.
Log entries
CREATE TRIGGER repack_trigger
AFTER INSERT OR DELETE OR UPDATE ON public.pgbench_branches
FOR EACH ROW EXECUTE PROCEDURE repack.repack_trigger('bid');
ALTER TABLE public.pgbench_branches ENABLE ALWAYS TRIGGER repack_trigger;
Locks
- The earlier short ACCESS EXCLUSIVE from step 0 ensures a clean window for creating/enabling the trigger. Once installed, normal traffic resumes.
3 Bulk copy to the new heap (the clean table)
What happens
- A brand-new heap is built and filled sequentially (no dead tuples → compact and defragmented).
- Done under an ACCESS SHARE lock on the original table so reads/writes can continue.
- The copy reads only the base table (the ONLY keyword excludes inheritance children and avoids partition surprises).
Log entries
BEGIN ISOLATION LEVEL READ COMMITTED;
SELECT pg_backend_pid();
SELECT pid FROM pg_locks
WHERE locktype = 'relation' AND granted = false AND relation = 16398
AND mode = 'AccessExclusiveLock' AND pid <> pg_backend_pid();
COMMIT;
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT set_config('work_mem', current_setting('maintenance_work_mem'), true);
-- build a list of virtual xacts to reason about in-flight txns (internal hygiene)
SELECT coalesce(array_agg(l.virtualtransaction), '{}')
FROM pg_locks AS l
LEFT JOIN pg_stat_activity AS a ON l.pid = a.pid
LEFT JOIN pg_database AS d ON a.datid = d.oid
WHERE l.locktype = 'virtualxid'
AND l.pid NOT IN (pg_backend_pid(), 470912) -- exclude self/worker
AND (l.virtualxid, l.virtualtransaction) <> ('1/1','-1/0')
AND (a.application_name IS NULL OR a.application_name <> 'pg_repack')
AND a.query !~* E'^\\s*vacuum\\s+'
AND a.query !~ E'^autovacuum: '
AND ((d.datname IS NULL OR d.datname = current_database()) OR l.database = 0);
-- clear any stale rows if a previous run left residue
DELETE FROM repack.log_16398;
SAVEPOINT repack_sp1;
SET LOCAL lock_timeout = 100;
LOCK TABLE public.pgbench_branches IN ACCESS SHARE MODE; -- allows readers & writers
RESET lock_timeout;
-- create clean heap in the same tablespace (param #2 = 'pg_default')
SELECT repack.create_table(16398, 'pg_default');
-- bulk copy base data
INSERT INTO repack.table_16398
SELECT bid, bbalance, filler FROM ONLY public.pgbench_branches;
-- keep autovacuum off on the new heap during build
SELECT repack.disable_autovacuum('repack.table_16398');
COMMIT;
Locks
- ACCESS SHARE on public.pgbench_branches during the copy (longest-held lock).
This does not block normal inserts/updates/deletes; it only conflicts with table-rewriting DDL.
4 Rebuild indexes on the new heap
What happens
- Indexes are rebuilt fresh on repack.table_16398 (smaller, better-packed).
- From the application's point of view this work is invisible, since it happens entirely on the new heap.
Log entries
CREATE UNIQUE INDEX index_16405 ON repack.table_16398 USING btree (bid);
Locks
- Locks only the new table and new index objects (not user-visible yet). No impact on concurrent traffic.
5 Apply logged changes (catch-up)
What happens
- Replays batched change-log rows into the new heap so it matches current state.
- First pass: apply in batches (here, up to 1000 per batch).
- Final pass: after very short exclusive locks, drain everything (limit 0) so both heaps are identical at the moment of swap.
Log entries
SELECT repack.repack_apply(
'SELECT * FROM repack.log_16398 ORDER BY id LIMIT $1', -- fetch
'INSERT INTO repack.table_16398 VALUES ($1.*)', -- INSERT
'DELETE FROM repack.table_16398 WHERE (bid)=($1.bid)', -- DELETE
'UPDATE repack.table_16398 SET (bid,bbalance,filler)=($2.bid,$2.bbalance,$2.filler) WHERE (bid)=($1.bid)', -- UPDATE
'DELETE FROM repack.log_16398 WHERE id IN (', -- ACK
1000 -- batch size
);
-- short stop-the-world windows to guarantee a perfect cutover
SAVEPOINT repack_sp1;
SET LOCAL lock_timeout = 100;
LOCK TABLE public.pgbench_branches IN ACCESS EXCLUSIVE MODE; -- freeze writers briefly
RESET lock_timeout;
SAVEPOINT repack_sp1;
SET LOCAL lock_timeout = 100;
LOCK TABLE repack.table_16398 IN ACCESS EXCLUSIVE MODE; -- protect the target
RESET lock_timeout;
-- 2nd pass: drain everything (limit 0 = no cap)
SELECT repack.repack_apply(
'SELECT * FROM repack.log_16398 ORDER BY id LIMIT $1',
'INSERT INTO repack.table_16398 VALUES ($1.*)',
'DELETE FROM repack.table_16398 WHERE (bid)=($1.bid)',
'UPDATE repack.table_16398 SET (bid,bbalance,filler)=($2.bid,$2.bbalance,$2.filler) WHERE (bid)=($1.bid)',
'DELETE FROM repack.log_16398 WHERE id IN (',
0
);
Locks
- Long phase: no blocking locks (copy is under ACCESS SHARE on the original; catch-up itself doesn’t lock the original).
- Short windows: two ACCESS EXCLUSIVE locks (original and new heap) to quiesce writes and guarantee the log is fully drained before swap. Each is bounded by lock_timeout=100ms retries.
6 Catalog-level swap (name stays, storage changes)
What happens
- Atomically swaps the original table’s storage and indexes with the freshly built ones by updating system catalogs (pg_class, pg_index, dependencies).
From the app's POV, the OID and name of public.pgbench_branches remain the same; only the underlying relfilenode moves to the compact copy.
- This is the second short ACCESS EXCLUSIVE lock on the original.
Log entries
COMMIT;
Locks
- ACCESS EXCLUSIVE on public.pgbench_branches (and briefly on new objects as needed) during the swap. Very short.
7 Cleanup, ANALYZE, release advisory lock
What happens
- Drops the old (now detached) heap and old indexes.
- Runs ANALYZE to refresh stats on the swapped-in table.
- Releases the table-scoped advisory lock.
Log entries
SAVEPOINT repack_sp1;
SET LOCAL lock_timeout = 100;
LOCK TABLE public.pgbench_branches IN ACCESS EXCLUSIVE MODE; -- short
RESET lock_timeout;
SELECT repack.repack_drop(16398, 4); -- drops old heap & index copies (4 objects here)
COMMIT;
BEGIN;
ANALYZE public.pgbench_branches;
COMMIT;
SELECT pg_advisory_unlock(16185446, CAST(-2147483648 + 16398 AS integer));
Locks
- Brief ACCESS EXCLUSIVE to drop the old copies safely and ensure no race with concurrent DDL.
- ANALYZE runs without heavy locking (it uses lightweight locks and short metadata locks).
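After the run completes, it is worth confirming that the space actually came back; the same catalog size functions used for the pre-check work here:

```sql
-- Compare against the sizes recorded before the repack
SELECT pg_size_pretty(pg_relation_size('public.pgbench_branches'))       AS heap_size,
       pg_size_pretty(pg_indexes_size('public.pgbench_branches'))        AS index_size,
       pg_size_pretty(pg_total_relation_size('public.pgbench_branches')) AS total;
```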
How pg_repack does an index-only repack
(pg_repack --table pgbench_branches --only-indexes testrepack)
0 Pre-flight & discovery (very short, prep work)
What happens
- Sets search path, checks extension version.
- Confirms the target is an ordinary table (relkind 'r', not a partitioned table).
- Lists indexes of the table (valid/invalid), gathers the DDL template for rebuild.
- Takes a table-scoped advisory lock so only one repack can operate on this table.
Log entries
select repack.version(), repack.version_sql();
-- sanity: ensure not a partitioned table entry; confirm it's a relation 'r'
SELECT r FROM (VALUES ($1, 'r')) AS given_t(r,kind)
WHERE NOT EXISTS (SELECT FROM repack.tables WHERE relid=to_regclass(given_t.r))
AND NOT EXISTS (
SELECT FROM pg_catalog.pg_class c
WHERE c.oid=to_regclass(given_t.r) AND c.relkind = given_t.kind AND given_t.kind = 'p'
);
-- $1 = 'pgbench_branches'
-- discover indexes on the table
SELECT repack.oid2text(i.oid), idx.indexrelid, idx.indisvalid, idx.indrelid, $1::text, n.nspname
FROM pg_index idx
JOIN pg_class i ON i.oid = idx.indexrelid
JOIN pg_namespace n ON n.oid = i.relnamespace
WHERE idx.indrelid = $1::regclass
ORDER BY indisvalid DESC, i.relname, n.nspname;
-- $1 = 'pgbench_branches'
-- prevent concurrent repack on the same table (per-DB namespace 16185446)
SELECT pg_try_advisory_lock(16185446, CAST(-2147483648 + 16398 AS integer));
Locks
- Catalog reads only.
- Advisory lock (not a heavyweight table lock) to serialize repack operations for relid=16398.
1 Build new index concurrently (online rebuild; allows writes)
What happens
- pg_repack derives the index DDL to use for the rebuild.
- It then issues CREATE INDEX CONCURRENTLY (CIC), which:
- does not block normal reads/writes,
- takes a SHARE UPDATE EXCLUSIVE lock on the table internally (PostgreSQL behavior of CIC),
- performs multiple scans and a final validation phase.
Log entries
SELECT repack.repack_indexdef($1, $2, $3, true);
-- $1 = 16405 (old index OID: pgbench_branches_pkey)
-- $2 = 16398 (table OID: pgbench_branches)
-- $3 = NULL (tablespace override)
-- build the new index concurrently (name assigned for the transient build)
CREATE UNIQUE INDEX CONCURRENTLY index_16405
ON public.pgbench_branches USING btree (bid);
Locks (important)
- CREATE INDEX CONCURRENTLY acquires an internal SHARE UPDATE EXCLUSIVE lock on public.pgbench_branches during phases of the build.
This does not block SELECT/INSERT/UPDATE/DELETE, but it conflicts with operations that need stronger locks (e.g., VACUUM FULL, CLUSTER, REINDEX TABLE without CONCURRENTLY, table rewrite DDL, etc.).
2 Catalog swap of the index (very short, blocking window)
What happens
Once the new index is ready, pg_repack:
- Briefly takes an ACCESS EXCLUSIVE lock on the table to ensure no concurrent DDL
- Calls repack.repack_index_swap(<old_index_oid>), which atomically:
- Flips catalog pointers/flags so the newly built index becomes the live one,
- Handles renames/validity bits as needed,
- Commits immediately.
Log entries
SET LOCAL lock_timeout = 100;
LOCK TABLE pgbench_branches IN ACCESS EXCLUSIVE MODE; -- very short
RESET lock_timeout;
SELECT repack.repack_index_swap($1); -- swap old <-> new
-- $1 = 16405 (old/live index OID; swap installs the concurrently built replacement)
COMMIT;
Locks
- ACCESS EXCLUSIVE on public.pgbench_branches (table) very briefly (bounded by lock_timeout=100ms with retries).
This is the only truly blocking window in index-only repack and is usually milliseconds.
3 Drop the old index concurrently (non-blocking)
What happens
The detached/bloated index is removed with DROP INDEX CONCURRENTLY, which:
- Avoids blocking normal writes
- Requires only a lighter lock on the table, so the table stays online throughout
Log entries
DROP INDEX CONCURRENTLY "public"."index_16405";
Locks
- DROP INDEX CONCURRENTLY avoids taking an exclusive lock on the table. It does not block reads/writes; it will wait on conflicting DDL if necessary.
Common pitfalls and errors
There are a few errors that users frequently encounter when using pg_repack.
One common error looks like this:
ERROR: relation "public.my_table" must have a primary key or not-null unique keys
This occurs when the target table does not have a primary key or unique index. The solution is to define one before running pg_repack.
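If the table genuinely lacks an identity, one can be added first. The table and column names below are placeholders, not objects from the examples above:

```sql
-- Option 1: add a primary key (takes a brief exclusive lock on the table)
ALTER TABLE public.my_table ADD PRIMARY KEY (id);

-- Option 2: build a unique index online, then mark the column NOT NULL
CREATE UNIQUE INDEX CONCURRENTLY my_table_id_uq ON public.my_table (id);
ALTER TABLE public.my_table ALTER COLUMN id SET NOT NULL;
```

Option 2 avoids a table rewrite on large tables, at the cost of a brief lock when setting NOT NULL.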
Another error is a message along the lines of:
ERROR: Another pg_repack command may be running on the table. Please try again later.
This happens if two pg_repack sessions target the same table simultaneously. Because of the risk of deadlock, the extension enforces a restriction that only one session can process a given table at a time.
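To find the competing session (pg_repack sets its application_name, as the log excerpts earlier show):

```sql
-- Identify other pg_repack sessions on this database
SELECT pid, application_name, state, left(query, 60) AS query
FROM pg_stat_activity
WHERE application_name = 'pg_repack';
```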
Restrictions and limitations
Although powerful, pg_repack does have some limitations. It cannot reorganize temporary tables, because these are not persistent in system catalogs. It cannot cluster tables by GiST indexes, since these do not provide a straightforward physical ordering.
And while a repack is in progress, you cannot perform DDL operations such as altering the table structure, dropping indexes, or adding constraints. The only allowed operations are VACUUM and ANALYZE. These restrictions exist to prevent catalog corruption and ensure that the repack completes safely.
Conclusion
The pg_repack extension fills a critical gap in PostgreSQL administration by providing a way to reclaim space and improve performance without taking tables offline. By copying data into fresh structures while capturing and replaying concurrent modifications, it achieves the same end result as VACUUM FULL or CLUSTER, but without long blocking locks.
For database administrators who need to maintain performance in busy production environments, pg_repack is an essential tool. It should be combined with regular monitoring of table bloat, careful scheduling during off-peak hours, and proper disk space planning.