OS commands and PostgreSQL commands used in life check monitoring and status monitoring for the server and PostgreSQL are shown below.
|Server OS response||ping||-|
|PostgreSQL process status||ps||pg_isready|
|SQL execution||-||Execute simple SQL such as "SELECT 1".|
|I/O busy rate||sar
|Network bandwidth usage||sar||-|
Next, we will look at how to execute these commands. The options used with each command are examples only. There are other useful options as well that can be used together, so consider what kind of options you need according to your monitoring policy. Use the return value of the command when checking with the shell script.
Use the ping command to test is a server is reachable. Use the -c option to specify the number of ping requests to attempts - for example, ping -c 1 will try to reach the server only once. To avoid false negatives caused by temporary network disconnections, make a few retries if the ping fails.
If the server is running, you will see the line 1 packets transmitted, 1 received, indicating that there was a response. The command returns 0 if it executed successfully, or another value if a problem occurred, such as no response was received.
Execute ping from another server such as the admin server.
Execute the operating system's ps command on the PostgreSQL database server to confirm that the specified PostgreSQL process is running. For PostgreSQL, monitoring the master process (postmaster) is recommended - there is no need to monitor Postgres child processes. One or more master processes may be temporarily running , so it is normal to see more than one master process listed.
You can also use PostgreSQL's utility pg_isready for life check monitoring. This is a simple command that checks the connection status of a PostgreSQL server, which allows us to know if PostgreSQL processes are running.
The return values are 0 if the server is accepting connections normally, 1 if the server is rejecting connections, 2 if there is no response to the connection attempt, or 3 if no attempt was made (due to invalid parameters, for example).
pg_isready can also be executed from a PostgreSQL client (separate machine).
Execute a simple SQL statement to check if SQL can be executed normally. For example, execute SELECT 1 with psql –c and check that the result is as expected.
The command returns 0 if successful, or another value otherwise.
psql can also be executed from a PostgreSQL client (separate machine).
Use the operating system's sar command on the PostgreSQL database server to check the CPU usage rate. The -u option shows cumulative usage of all CPUs, and -P ALL shows CPU usage of individual cores.
In the example below, using sar -u 1 60 will get you data 60 times every 1 second. You can check the CPU usage rate by referring to %user (CPU usage running applications at the user level), %nice (CPU usage running applications at the user level with nice priority), and %system (CPU usage running process at the system level).
If %user is increasing, it is possible that a specific application is consuming CPU, so check which application it is by using a command such as ps.
Use the operating system's free command on the PostgreSQL database server to check the memory usage rate. The -m option displays the values in MB, which is useful since it may be harder to read when displayed in bytes, and the -t option displays the total of physical memory and swap memory.
To calculate memory usage, use the values in columns total (total capacity) and available (memory estimated to be available without swapping).
memory usage = (total - available) ÷ total × 100
Use the operating system's iostat command on the PostgreSQL database server to check I/O data for each disk. By default, CPU statistics and disk statistics are displayed. To display more detailed statistical data, specify the -x option.
To check network bandwidth, use the operating system's sar command with the -n DEV option on the PostgreSQL database server.
The transmission line usage status can be checked in the following columns.
PostgreSQL outputs messages to the server log. Analyse the cause of an error from these messages and take appropriate actions.
Each message has a severity level to indicate the seriousness of the problem. We recommend attending to messages with PANIC, FATAL, and ERROR levels with priority, as they may affect the stable operation of your database. Messages with LOG level also need to be monitored for performance monitoring.
The log output destination can be changed to syslog by a setting in postgresql.conf, in which case the message level is converted to the levels listed under the syslog column below.
|PANIC||Information about an error that caused all database sessions to be interrupted||CRIT|
|FATAL||Information about an error that caused the current session to be interrupted||ERR|
|ERROR||Information about an error that caused the current command to be interrupted||WARNING|
|LOG||Information that administrators needs to know, such as checkpoint activities||INFO|
|WARNING||Warning information for users, like COMMIT outside the transaction block||NOTICE|
|NOTICE||Information to help the user, such as notes about truncating long identifiers||NOTICE|
|INFO||Information implicitly requested by the user, such as the output of VACUUM VERBOSE||INFO|
|Continuous, detailed information used by developers||DEBUG|
Below are example messages.
Connection refused (message for connection from outside pg_hba.conf)
Connection violating a rule (no table access privilege)
If an error occurs, logged messages will help you identify and analyze the cause. In addition to error messages, PostgreSQL also outputs error location and hints to the log. Use these to analyze and investigate the error.
|STATEMENT||The actual process that caused the error|
|LOCATION||The position in the code where the error occurred|
|HINT||Cause and workaround of the error that occurred|
|CONTEXT||Context in which the error occurred (functions, etc.)|
In order to monitor messages, configure the settings so that messages are output to the log file. The main postgresql.conf parameters required for message monitoring are below.
It is essential to perform capacity monitoring for database cluster areas (directories specified by the environment variable PGDATA), TABLESPACE areas, and areas for storing archived transaction logs (WAL: Write-Ahead Log). Insufficient capacity of these areas can lead to the issues below:
To avoid these issues, monitor disk usage and free space, and take the following actions before running out of space:
By monitoring the capacity, you can take preventative measures such as adding more disks before running out of space. For example, by coordinating with other tools, you can configure the system to raise an alert when free disk space falls below 20%. Remember to set this threshold to a value that allows sufficient time to handle it.
The following are the files in PostgreSQL that should be monitored.
|Monitored data||Target directory||Commands/functions|
|Directory specified for tablespace (TABLESPACE)||pg_tablespace_size|
|Temporary areas for disk sort, hash processing, etc.||$PGDATA/base/pgsql_tmp||-|
|WAL archive||Directory specified for WAL archive|
Key points during operation
Pay attention to free disk space when executing particular SQL commands, including below.
Now, we will look at examples using the du command and the pg_database_size function.
Use the operating system's du command to check disk usage. Specify the -L option to follow all symbolic links, and either the -k or -m option, to show the size in KB or MB, respectively. The example below displays the disk usage under $PGDATA in MB.
Use PostgreSQL's pg_database_size function to check disk space used by the database. The disk capacity used by the database is acquired from the system catalog pg_database, which stores database information.
When using the du command to check disk capacity used by databases, check the oid of the databases in advance to identify the target database.
In the example output above, we can see that the oid of testdb is 16384. Looking at the result of the du command listed before that, the disk usage of ./$PGDATA/base/16384 is 23, which matches the result 23 MB of pg_database_size below.
Regularly obtaining performance information will allow you to identify differences in a meaningful timeframe. The recommended frequency for this is at least once a day. For details on monitoring intervals, refer to "Overview of database system monitoring - what and how to monitor > Monitoring intervals and thresholds".
PostgreSQL's statistics view collects and accumulates various information related to operating conditions. You can monitor performance here.
|Metrics||Type of view/catalogue|
|Transactions that have been pending for a long time||pg_stat_activity|
|Queries with long lock wait time||pg_locks
|Number of simultaneous connections||pg_stat_activity|
|Throughput (commit/rollback count)||pg_stat_database|
|Replication delay (when replication setup is used)||pg_stat_replication
|Unused space (free space created by inserting or deleting data)||pg_stat_all_tables|
|Index dispersion (whether the order of indexes and the order of tables are the same)||pg_stats|
When statistics views are displayed, the default display format is landscape, which wraps lines when there are a large number of columns. For an easier view, it is possible to change to the vertical display. Enable the extended table format mode with the meta command \x. To revert the display setting, execute \x again.
Here, we take a look at examples of views with pg_stat_statements, pg_stat_activity, and pg_stat_database.
This view is useful to investigate slow queries. Columns of note are userid (OID of the user who executed the SQL), query (text of the SQL executed), calls (number of the the SQL was executed), and total_exec_time (total time spent executing the SQL, in ms).
Use this view to check if there are transactions pending for a long time or to check the number of simultaneous connections. Check xact_start (transaction start time) to see if there are any transactions pending for a long time. No value is output when no transaction is being executed at that time. If any value is shown from a while ago, it means that a transaction has been pending. To check the number of simultaneous connections, check the current number of rows in the view.
Other important data is also provided by the pg_stat_activity view, such as backend_start (time when the process started), query_start (time when the query started), wait_event_type (status of waiting), wait_event (type of event that the backend is waiting), state (current status of backend), query (last executed query), and backend_type (type of current backend). Look at these data comprehensively for best practice.
The values displayed in state and their status are listed below.
|active||Query is in progress.|
|idle||Waiting for a new client command.|
|idle in transaction||In a transaction block, but there are no queries currently executing.|
|idle in transaction (aborted)||Similar to idle in transaction, but there is a statement with an error in the transaction.|
|fastpath function call||The backend is executing a fast-path function.|
|disabled||This value is reported if the track_activities parameter is disabled in this backend.|
This view provides, among other information, xact_commit (number of transactions committed in the database) and xact_rollback (number of transactions rolled back in the database). Throughput can be calculated by periodically acquiring xact_commit and using the formula below:
((xact_commit read this time) - (xact_commit previously read)) ÷ interval between reads
Similarly, the number of rollbacks per unit of time can also be calculated.
Cache hit rate can be calculated by using blks_read (number of blocks read in this database) and blks_hit (number of times disk blocks were found already in the buffer cache, so that a read was not necessary). In memory tuning, the goal is to get this rate close to 100.
In this article, we explained what kind of data in the database system should be monitored, focusing on metrics and types of monitoring. By monitoring from various viewpoints according to the requirements of your database system, anomalies and their signs can be detected earlier.
Knowing the status of your database system when stable, monitoring to maintain that status, and taking optimal actions against abnormal conditions are key to a highly available system.
Fujitsu Enterprise Postgres is designed for developers, data architects, and DBAs seeking the efficiency of a leading open source based database technology with the enhanced security and performance features plus support required in a serious business environment.
Read our most recent articles regarding all aspects of PostgreSQL and Fujitsu Enterprise Postgres.