The purpose of SQL tuning is to improve the SQL performance. You analyze the internal processing of SQL and fix statements that do not meet the expected standard during SQL execution. Since the improvement is targeted to SQL statements that can have their performance improved, this work is done by testing the applications you have developed. Often you will repeat tuning and verification to ensure optimum performance.
The flow of SQL tuning is shown below.
We first need to understand how the SQL is executed on the database server. We will go through the basics in this section, describing the flow from the start of the execution of a SQL statement to the return of results.
When the SQL is executed from the client, the string is parsed to check if correct syntax is used. This is performed by the parser. Next, if there is a rule (to replace SQL) defined in the database, the rewriter rewrites the statements with another SQL according to that rule. The planner (optimizer) then determines how this parsed and rewritten SQL is to be executed by referring to statistics. The statistics include summary information about how much data is stored in which table. This information and the settings in postgresql.conf are used to create an optimal SQL execution plan." The executor executes the execution plan and returns the result.
The key factor which shortens the time it takes to execute SQL and obtain the return result depends on whether a good execution plan was created from the statistics. We will discuss this topic further.
PostgreSQL determines the optimal SQL execution plan (method) by referring to information such as the number of rows of tables/indexes, their size, the degree of duplication and frequency of data in each column, and the distribution status. The information for determining the optimal execution procedure (method) is called statistics, and is collected and recorded by PostgreSQL.
Gathering and refreshing statistics are done automatically during the automatic vacuum process (the autovacuum parameter in postgresql.conf is enabled by default since PostgreSQL 8.3).
Naturally, the content of the statistics will become outdated as the database is operated and data is updated. Stale statistics should not be used to determine the best way to execute SQL because the plan will likely be inadequate and inefficient to access current data. Therefore, whenever a large amount of data is loaded or updated by batch processing, use the ANALYZE command (SQL command) to refresh the statistics, instead of waiting for the automatic vacuum, since a large discrepancy between the statistics and the actual data is expected.
PostgreSQL determines the fastest and most cost-effective method based on the statistics, choosing from multiple patterns of search/join methods and different order combination of join, to minimize the time to execute SQL from the clients. This selected method is called an execution plan.
However, the execution plan created by PostgreSQL may not necessarily be the optimal plan. As mentioned earlier, the statistics may be out of date, or it could be due to unexpected loads/behaviors of applications and lack of various resources. Therefore, SQL tuning is a very important task in building a database. We will explain how to check whether the execution plan is suitable later in this article ("How to investigate the cause").
This section describes two ways to detect a slow SQL statement. Prior settings are required to use these methods so we will guide you through how to apply the settings.
The pg_stat_statements view stores runtime statistics of all SQL statements executed on the server. It stores information such as the OID (userid) of the user who executed the SQL statement, the executed SQL statement (query), the number of times the SQL statement was executed (calls), and the total time spent executing the SQL statement (total_exec_time, or total_time before PostgreSQL 13).
To use the pg_stat_statements view, it is necessary to install the additional module from a public site as follows:
Key point
Fujitsu Enterprise Postgres is shipped the module already pre-installed, so the step above is not necessary.
Also, it is necessary to add pg_stat_statements to the shared_preload_libraries parameter of postgresql.conf and set the following related parameters. After setting, restart the server and the pg_stat_statements view will be available.
Parameter | Description |
shared_preload_libraries | Specify pg_stat_statements |
pg_stat_statements.max | Maximum number of SQL statements that retain information |
pg_stat_statements.track | Specify one of the following:
|
pg_stat_statements.track_utility | Specify one of the following:
|
pg_stat_statements.save | Specify one of the following:
|
Example
When you examine the SQL with the longest average execution time using the pg_stat_statements view, the result will look something like below.
If you continue to record information about SQL statements with pg_stat_statements, performance will be affected. After tuning is completed, set pg_stat_statements.track=none (do not record) and restart the server.
If a SQL statement takes longer than the set time, the SQL statement itself and the time it took to execute it can be output to the server log as a message. By default, information about slow SQL statements is not output to the server log. To output the information, set the log_min_duration_statement parameter in postgresql.conf. SQL statements that took longer than the value set in this parameter will be output in a message with the execution time.
Parameter | Description and values to set |
log_min_duration_statement | Specify one of the following
|
For example, to output SQL that took more than 3 seconds to execute, set log_min_duration_statement= 3s.
Keep in mind that the shorter the time you set, the more messages will be output. A large amount of server log message output may cause the performance of the entire system to deteriorate, so be careful with the value you set.
Example
The following is a sample output to the server log.
In addition, by using the module auto_explain, it is also possible to output the execution plan of the SQL statement that took longer than the set value to the server log. For more information on auto_explain, refer to PostgreSQL documentation > Appendixes > Additional supplied modules > auto_explain.
Now that we have detected slow SQL statements, it's time to investigate their cause. You can display SQL execution plans by adding EXPLAIN command to the beginning of the detected SQL statement. Refer to this execution plan to identify the cause.
The EXPLAIN command shows you how referenced tables are scanned in a SQL statement and how the fetched rows are joined.
If you execute it only with EXPLAIN, the estimates for the cost, the number of rows, and the input size for executing the SQL statement are displayed. They are estimated values, since the SQL statement is not actually executed.
To display measured values in addition to the estimated values, you can add the ANALYZE option to the EXPLAIN command. With the actual value obtained by executing the SQL statement, you can check whether the estimated value is close to the measured value.
Since the SQL statement will be actually executed, when you execute INSERT statement or DELETE statement, please specify as follows so as not to affect the data.
This article has given an overview of SQL tuning and explained how to find slow SQL statements and investigate their causes. This information will help you identify what kind of tuning is required to the system to achieve an even more cost-effective database system.
Fujitsu Enterprise Postgres is designed for developers, data architects, and DBAs seeking the efficiency of a leading open source based database technology with the enhanced security and performance features plus support required in a serious business environment.
Read our most recent articles regarding all aspects of PostgreSQL and FUJITSU Enterprise Postgres.