
Before we start…
It is widely known that PostgreSQL is a popular open-source relational database management system, and before I share the details about the support for load balancing in libpq, I would just like to mention that PostgreSQL was voted DBMS of the Year 2023.
A huge contributing factor for this impressive achievement is surely the array of improvements added to PostgreSQL 16.
This version comes with a series of improvements such as query parallelism, bulk data loading, and logical replication, along with numerous features for both developers and administrators. These include expanded SQL/JSON syntax, new monitoring statistics for workloads, and increased flexibility in setting access control rules for management of policies across large fleets. You can view the full list in the PostgreSQL website here.
Introducing load balancing in libpq
Here, I'll discuss one of the most eagerly awaited features of this release: support for load balancing in libpq, which will facilitate PostgreSQL read query scaling.
Now, let's dive in….
What is libpq?
libpq is the official C application programming interface to PostgreSQL. It provides a set of library functions that allow client applications to communicate with a PostgreSQL server. Popular utilities like psql, pgbench, and many more use libpq to communicate with PostgreSQL.
Other application interfaces, such as C++, Perl, Python, TCL, and ECPG, are built on libpq and inherit some of its behaviour and environment configuration.
What is load balancing?
To put it simply, load balancing is the process of distributing the connections between multiple read replicas while handling a large number of read requests.
It is possible to achieve load balancing in two primary ways:
- At the client level by using the JDBC, which is a very popular Java API, to establish the connection and execute the query with the database.
- By using a middleware load balancer like HAproxy or pgpool-II, which distribute the requests across multiple read replicas.
To use libpq to distribute the load across multiple servers, a new variable called load_balance_hosts has been introduced in PostgreSQL 16. So, effective load balancing across multiple PostgreSQL instances becomes possible by this functionality.
This feature ensures that PostgreSQL deployments are more robust and efficient than ever before, whether it is through distributing read operations or ensuring that no single instance is overloaded. There are 2 possible values for load_balance_hosts, as follows:
- disable: No load balancing is performed across the multiple hosts.
- random: Hosts and addresses are tried in random order.
This value is mostly useful when opening multiple connections at the same time, possibly from different machines. This way connections can be load balanced across multiple PostgreSQL servers.
Putting it to the test
Let's test this functionality with 3 nodes of PostgreSQL 16 and with the default postgres database as below. This describes the current support model and soon we will release a new support model with different support options.
Host name | IP address | Postgres port | Database user | Role |
test_node1 | 192.168.10.107 | 27500 | test_user | Primary |
test_node2 | 192.168.10.102 | 27501 | test_user | Standby1 |
test_node3 | 192.168.10.100 | 27502 | test_user | Standby2 |
Test scenario1: load_balance_hosts=disable
Let's try to establish the connections with all three active hosts provided in the connection string.
[postgres@test-nodel ~]$ psql 'host=192.168.10.107, 192.168.10.102, 192.168.10.100
port=27500,27501,27502 dbname=postgres user=postgres'
-c "SELECT inet server addr(), inet server port();"
inet_server addr | inet_server port
------------------+------------------
192.168.10.107 | 27500
(1 row)
The result is quite as expected: PostgreSQL only made a successful connection to the first available destination - test_node1 (192.168.10.107:27500).
Now, let's stop the PostgreSQL service on test_node 1 (host 192.168.10.107) and observe the connection attempt behavior.
[postgres@test-nodel ~]$ pg_ctl -D /opt/pgsqll6 stop
waiting for server to shut down.... done
server stopped
[postgres@test-nodel ~]$ psql 'host=192.168.10.107,192.166.10.102, 192.168.10.100
port=27500,27501,27502 dbname=postgres user=postgres'
-c "SELECT inet_server_addr(), inet_server_port();"
inet_server addr | inet_server port
------------------+------------------
192.168.10.102 | 27501
(1 row)
Again, the result is quite as expected: PostgreSQL only made a successful connection to the next available destination - test_node2 (192.168.10.102:27501).
Finally, let's stop the PostgreSQL service on test_node2 (host 192.168.10.102) and try to connect again.
[postgres@test-node2 ~]$ pg_ctl -D /opt/pgsqll6 stop
waiting for server to shut down.... done
server stopped
[postgres@test-nodel ~]$ psql 'host=192.168.10.107,192.166.10.102, 192.168.10.100
port=27500,27501,27502 dbname=postgres user=postgres'
-c "SELECT inet_server_addr(), inet_server_port();"
inet_server addr | inet_server port
-------------------+-------------------
192.168.10.100 | 27502
(1 row)
We can see that we manage a successful connection with the only remaining host - test_node3 (192.168.10.100:27502).
Test scenario 2: load_balance_hosts=random
As we set load_balance_hosts to random, the connections are randomly assigned to the available destinations, resulting in connection distribution across the multiple available destinations.
[postgres@test-nodel ~]$ psql 'host=192.168.10.107,192.168.10.102,192.168.10.100
port=27500,27501,27502 dbname=postgres user=postgres
load_balance_hosts=random' -c "SELECT inet_server_addr(), inet_server port();"
inet_server_ addr | inet server port
-------------------+------------------
192.168.10.102 | 27501
(1 row)
[postgres@test-nodel ~]$ psql 'host=192.168.10.107,192.168.10.102,192.168.10.100
port=27500,27501,27502 dbname=postgres user=postgres
load_balance_hosts=random' -c "SELECT inet_server_addr(), inet_server port();"
inet_server_ addr | inet server port
-------------------+------------------
192.168.10.107 | 27500
(1 row)
[postgres@test-nodel ~]$ psql 'host=192.168.10.107,192.168.10.102,192.168.10.100
port=27500,27501,27502 dbname=postgres user=postgres
load_balance_hosts=random' -c "SELECT inet_server_addr(), inet_server port();"
inet_server_ addr | inet server port
-------------------+------------------
192.168.10.100 | 27502
(1 row)
If we stop the PostgreSQL service on test_node1 (host 192.168.10.107), then connections will be randomly assigned to the remaining available hosts - test_node2 (192.168.10.102) and test_node3 (192.168.10.100).
Summary
By distributing database connections across multiple servers, developers and database administrators will be able to reduce the load on individual servers particularly replicas and improve system availability and performance.