
Automating partition management in Fujitsu Enterprise Postgres
In Fujitsu Enterprise Postgres, partitioning large tables is a common approach to managing large datasets in databases, as it natively supports table partitioning, allowing users to split large tables into smaller, more manageable pieces, called partitions.
This feature is invaluable for optimizing query performance, as it can reduce the number of rows scanned by focusing only on relevant partitions. However, managing and creating new partitions manually can be time-consuming, especially when working with high-frequency data INSERTs.
To address this issue, you can use pg_partman, a PostgreSQL extension designed to simplify and automate this process.
What is pg_partman?
pg_partman (Partition Manager) is an open-source PostgreSQL extension developed to handle the automation of table partitioning based on time intervals or serial keys. By managing the creation, maintenance, and cleanup of partitions, pg_partman reduces administrative overhead and ensures partitioned tables are optimized for performance.
Key features of pg_parman include:
- Automatic creation of new partitions based on specified time intervals (e.g., daily, weekly, monthly) or integer series
- Cleanup and dropping of old partitions based on a retention policy
- Ability to pre-create partitions to minimize runtime overhead
- Support for native PostgreSQL partitioning as well as legacy inheritance-based partitioning
Why use pg_partman?
Using pg_partman offers several advantages:
- Ease of management
pg_partman simplifies the partition management process, automatically creating and dropping partitions as needed.
- Improved performance
Proper partitioning can significantly boost query performance by minimizing data scanned during SELECT operations.
- Reduced downtime
By pre-creating partitions, pg_partman reduces downtime caused by runtime partition creation.
- Retention policies
It allows you to automatically delete older partitions, which helps manage storage costs and keep the dataset lean.
Getting started with pg_partman
Installation
To install pg_partman, you'll need to have superuser privileges. Here are the general steps:
- Install the pg_partman extension package from git:
- Unzip the source code
- In the directory where you unzip the pg_partman, run
make install
- The background worker must be loaded on database start by adding the library to shared_preload_libraries in postgresql.conf. This will be useful for partition maintenance operations to run automatically.
shared_preload_libraries = 'pg_partman_bgw'
- You can set other settings for the background worker (BGW) in postgresql.conf.
As a minimum, you need to set dbname so maintenance can run on the specified database(s). You can add or change these settings anytime and just reload the configuration to apply them.
pg_partman_bgw.interval = 3600
pg_partman_bgw.role = fepuser
pg_partman_bgw.dbname = 'test_partition'Configuration options for the background worker
Parameter | Description |
pg_partman_bgw.interval | Number of seconds between calls to run_maintenance(). Default: 3600 (1 hour).
|
pg_partman_bgw.dbname | (Required) The database(s) that run_maintenance() will run on. If multiple, use a comma-separated list. If not set, BGW will do nothing. |
pg_partman_bgw.role |
The role that run_maintenance() will run as. Default: postgres. |
pg_partman_bgw.analyze |
(Optional) By default, whenever a new child table is created, an analyze is run on the parent table of the partition set to ensure constraint exclusion works. This analyze can be skipped by setting this to false and help increase the speed of moving large amounts of data. |
The run_maintenance() function is part of the pg_partman extension, and is used to perform maintenance tasks on partitioned tables, such as creating new partitions, dropping old partitions, and other necessary upkeep to ensure the partitioning scheme remains efficient and up-to-date.
- Add the extension to your database:
test_partition=# CREATE SCHEMA partman;
CREATE SCHEMA
test_partition=# CREATE EXTENSION pg_partman SCHEMA partman;
CREATE EXTENSION
test_partition=#
Once installed, pg_partman creates a set of functions and tables in the Fujitsu Enterprise Postgres instance that facilitate partition management.
Basic usage
To demonstrate pg_partman functionality, let’s assume you’re working with a table events that records temperature data with a timestamp.
- Create a partitioned table
CREATE SCHEMA data_mart;
CREATE TABLE data_mart.organization ( org_id BIGSERIAL,
org_name TEXT,
CONSTRAINT pk_organization PRIMARY KEY (org_id)
);
CREATE TABLE data_mart.events(
event_id BIGSERIAL,
operation CHAR(1),
value FLOAT(24),
parent_event_id BIGINT,
event_type VARCHAR(25),
org_id BIGSERIAL,
created_at timestamp,
CONSTRAINT pk_data_mart_event PRIMARY KEY (event_id, created_at),
CONSTRAINT ck_valid_operation CHECK (operation = 'C' OR operation = 'D'),
CONSTRAINT fk_orga_membership
FOREIGN KEY(org_id)
REFERENCES data_mart.organization (org_id),
CONSTRAINT fk_parent_event_id
FOREIGN KEY(parent_event_id, created_at)
REFERENCES data_mart.events (event_id,created_at)
) PARTITION BY RANGE (created_at);
CREATE INDEX idx_org_id ON data_mart.events(org_id);
CREATE INDEX idx_event_type ON data_mart.events(event_type); - Configure pg_partman
Configure the table for time-based partitioning. For example, if you want daily partitions:
SELECT partman.create_parent(
p_parent_table := 'data_mart.events',
p_control := 'created_at',
p_interval := '1 day',
p_premake := 10);
create_parent
---------------
t
(1 row)
Here’s a breakdown of the parameters:
- p_parent_table: The table to partition.
- p_control: The column to use for partitioning.
- p_interval: The partition interval. Options include: YEAR,MONTH,DAY,HOUR,MINUTE,SECOND,YEAR TO MONTH,DAY TO HOUR,DAY TO MINUTE,DAY TO SECOND,HOUR TO MINUTE,HOUR TO SECOND,MINUTE TO SECOND.
- p_premake: Additional partitions to stay ahead of the current partition.
Various options for partman.create_parent function
Argument | Data type | Description |
p_parent_table |
text |
Name of the parent table for partitioning. |
p_control |
text |
Column used to control the partitioning process. |
p_type |
text |
Type of partitioning to be used. |
p_interval |
text |
Interval for creating new partitions. |
p_template_table |
text |
Template table that defines the structure and settings for new partitions. |
p_premake |
integer |
Number of future partitions to create in advance. |
p_start_partition |
text |
Starting point for creating partitions. |
p_automatic_maintenance |
text |
Enables or disables automatic maintenance tasks for partitioned tables. |
p_epoch |
text |
Epoch for time-based partitioning. |
p_default_table |
boolean |
Default table where data that doesn’t fit into any of the existing partitions will be stored. |
p_constraint_cols |
text |
Additional columns that should have constraints applied to in each partition. |
p_jobmon |
boolean |
Integration with the pg_jobmon extension. When enabled, it allows pg_partman to log its maintenance jobs to pg_jobmon, providing better tracking and monitoring of these jobs. |
p_date_trunc_interval |
text |
Interval for truncating dates when creating partitions. |
p_control_not_null |
boolean |
Ensures the control column used for partitioning is set to NOT NULL. |
p_time_encoder |
text |
Custom function for encoding time-based partition names. |
p_time_decoder |
text |
Custom function for decoding partition names back into their corresponding time intervals. |
Inserting data
Once the partitioned table is set up, insert records as you would with any table. pg_partman will automatically route each record to the appropriate partition:
Viewing partitions
You can check the created partitions using a standard query on the pg_partman metadata tables:
test_partition=# SELECT * FROM partman.part_config WHERE parent_table = 'data_mart.events';
-[ RECORD 1 ]--------------+----------------------------------
parent_table | data_mart.events
control | created_at
time_encoder |
time_decoder |
partition_interval | 1 day
partition_type | range
premake | 10
automatic_maintenance | on
template_table | partman.template_data_mart_events
retention |
retention_schema |
retention_keep_index | t
retention_keep_table | t
epoch | none
constraint_cols |
optimize_constraint | 30
infinite_time_partitions | f
datetime_string | YYYYMMDD
jobmon | t
sub_partition_set_full | f
undo_in_progress | f
inherit_privileges | f
constraint_valid | t
ignore_default_data | t
date_trunc_interval |
maintenance_order |
retention_keep_publication | f
maintenance_last_run |
test_partition=#
Automating partition management
After creating the partitioned table, you’ll want pg_partman to manage partitions automatically. This includes creating new partitions and dropping old ones based on a retention policy.
- Schedule partition maintenance
Run the following function to manage new partition creation and old partition removal. You can automate this process with a cron job or a PostgreSQL job scheduler like pg_cron.
test_partition=# SELECT partman.run_maintenance();
run_maintenance
-----------------
(1 row)
test_partition=#Adding a cron job
Create partitions every day at midnight
0 0 * * * psql -U fepuser -d test_partition -c "SELECT partman.run_maintenance();"
- Set retention policy
If you want to keep only the last 30 days of data, you can set a retention policy:
test_partition=# UPDATE partman.part_config SET retention = '30 days' WHERE parent_table = 'data_mart.events';
UPDATE 1
test_partition=#
Run run_maintenance() after updating the retention setting to apply it immediately.
Conclusion
pg_partman is a powerful tool for automating and simplifying table partitioning. By using this extension, you can minimize administrative overhead, optimize table performance, and implement effective data retention policies. Whether working with time-based or serial key partitioning, pg_partman offers the flexibility and automation needed to scale partitioned tables for high-volume applications.
Setting up pg_partman might take some initial effort, but its benefits in terms of performance, scalability, and maintainability make it an invaluable addition to any Fujitsu Enterprise Postgres environment managing large datasets.