Users who made the open source move over to PostgreSQL have been requesting the MERGE feature for quite some time, and now we have it. Today, data arrives at high speed into multiple tables, and we need to manipulate it without our code causing a performance impact.
The case for the MERGE command
In most cases, our tables grow large as ETL jobs bring in data from various sources. The MERGE feature offers the ability to code more efficiently: it combines INSERT, UPDATE, and DELETE logic, eliminating the need for multiple statements when working against one of these large tables.
Until now, developers had to write separate statements to check whether a row exists and then take different actions depending on the result.
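As a sketch of that older pattern (the table and column names here are illustrative, matching the ledger example below):

```sql
-- Pre-MERGE upsert pattern: try an UPDATE first, then INSERT if no row matched.
-- Under concurrency this needs a retry loop or explicit locking;
-- MERGE folds both steps into a single statement.
UPDATE bank_loan_ledger
   SET bank_loan_amt = bank_loan_amt + 250.00
 WHERE bank_id = 42;

INSERT INTO bank_loan_ledger (bank_id, bank_loan_amt)
SELECT 42, 250.00
 WHERE NOT EXISTS (SELECT 1 FROM bank_loan_ledger WHERE bank_id = 42);
```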
An example of how the new command streamlines and simplifies the process is given below, with just one statement being used to maintain a bank loan ledger table based on loan records.
Example
postgres=# \d+ bank_loan_ledger
Table "public.bank_loan_ledger"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
---------------+---------+-----------+----------+---------+---------+--------------+--------------
bank_id | integer | | | | | |
bank_loan_amt | numeric | | | | | |
postgres=# \d+ loan_entry
Table "public.loan_entry"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-------------+---------+-----------+----------+---------+---------+--------------+--------------
bank_id | integer | | | | | |
cust_id | integer | | | | | |
entry_amt | numeric | | | | | |
postgres=# MERGE INTO bank_loan_ledger A
postgres-# USING loan_entry B
postgres-# ON B.bank_id = A.bank_id
postgres-# WHEN MATCHED THEN
postgres-# UPDATE SET bank_loan_amt = bank_loan_amt + entry_amt
postgres-# WHEN NOT MATCHED THEN
postgres-# INSERT (bank_id, bank_loan_amt) VALUES (B.bank_id, B.entry_amt);
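To try this out, the two tables from the example can be created and populated with sample data along these lines (values are illustrative):

```sql
CREATE TABLE bank_loan_ledger (bank_id integer, bank_loan_amt numeric);
CREATE TABLE loan_entry (bank_id integer, cust_id integer, entry_amt numeric);

INSERT INTO bank_loan_ledger VALUES (1, 1000);
INSERT INTO loan_entry VALUES
    (1, 100, 500),  -- bank 1 exists in the ledger: MATCHED, balance is updated
    (2, 101, 750);  -- bank 2 is new: NOT MATCHED, a ledger row is inserted
```

After running the MERGE statement above against this data, bank 1's ledger row should hold 1500 and a new row for bank 2 with 750 should appear.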
Powerful syntax, support for complex use cases
We are talking about PostgreSQL, so there are several options available, such as using it from PL/pgSQL to help build a business transaction that needs to process multiple steps.
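As a sketch of that idea (the function name and the follow-up step are assumptions for illustration, not part of the example above), a MERGE can sit inside a PL/pgSQL function alongside other steps of the same transaction:

```sql
-- Hypothetical helper: apply pending loan entries to the ledger,
-- then clear the staging table, all in one transaction.
CREATE OR REPLACE FUNCTION apply_loan_entries() RETURNS bigint
LANGUAGE plpgsql AS $$
DECLARE
    merged_rows bigint;
BEGIN
    MERGE INTO bank_loan_ledger A
    USING loan_entry B
    ON B.bank_id = A.bank_id
    WHEN MATCHED THEN
        UPDATE SET bank_loan_amt = bank_loan_amt + entry_amt
    WHEN NOT MATCHED THEN
        INSERT (bank_id, bank_loan_amt) VALUES (B.bank_id, B.entry_amt);

    GET DIAGNOSTICS merged_rows = ROW_COUNT;  -- rows touched by the MERGE
    DELETE FROM loan_entry;                   -- second step, same transaction
    RETURN merged_rows;
END;
$$;
```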
The syntax of the command allows you to specify more than one WHEN MATCHED and WHEN NOT MATCHED clause, extending your ability to manipulate the data as per your needs.
Each WHEN clause is evaluated in the order specified in the command, and the first clause whose condition holds for a given candidate row is the one that acts on it.
Example
postgres=# \d+ product_stock
Table "public.product_stock"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------------+---------+-----------+----------+---------+---------+--------------+--------------
prod_id | integer | | | | | |
qty_in_stock | integer | | | | | |
postgres=# \d+ stock_adjustment
Table "public.stock_adjustment"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
------------+---------+-----------+----------+---------+---------+--------------+--------------
prod_id | integer | | | | | |
adjust_qty | integer | | | | | |
postgres=# MERGE INTO product_stock A
postgres-# USING stock_adjustment B
postgres-# ON B.prod_id = A.prod_id
postgres-# WHEN MATCHED AND qty_in_stock + adjust_qty > 0 THEN -- (A)
postgres-# UPDATE SET qty_in_stock = qty_in_stock + adjust_qty
postgres-# WHEN MATCHED THEN -- (B)
postgres-# DELETE
postgres-# WHEN NOT MATCHED AND adjust_qty > 0 THEN -- (C)
postgres-# INSERT VALUES (B.prod_id, B.adjust_qty)
postgres-# WHEN NOT MATCHED THEN -- (D)
postgres-# DO NOTHING;
In the example above, for each candidate row, we have:
(A) If there is a matching row, and the adjustment results in a stock amount greater than zero, update the matching row.
(B) If there is a matching row, but the adjustment results in a stock amount of zero or less, delete the matching row.
(C) If there is no matching row, and the stock adjustment amount is greater than zero, insert the row into the target table.
(D) If there is no matching row, and the stock adjustment amount is zero or negative, do nothing.
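Sample data exercising all four branches might look like this (the values are illustrative):

```sql
CREATE TABLE product_stock (prod_id integer, qty_in_stock integer);
CREATE TABLE stock_adjustment (prod_id integer, adjust_qty integer);

INSERT INTO product_stock VALUES (1, 10), (2, 5);
INSERT INTO stock_adjustment VALUES
    (1,  3),   -- matched, 10 + 3 > 0: branch (A), stock updated to 13
    (2, -5),   -- matched, 5 - 5 = 0: branch (B), row deleted
    (3,  7),   -- not matched, positive: branch (C), row (3, 7) inserted
    (4, -2);   -- not matched, non-positive: branch (D), nothing happens
```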
Before we go
The maturity of PostgreSQL continues to impress, and the MERGE command is another highlight that helps it compete with proprietary databases, offering the flexibility to handle large volumes of data under different conditions.