As a member of this team, I am actively working with talented and passionate community members around the world to advance PostgreSQL.
Our team will publish blog posts focusing on the features and patches that members have worked on. In this article, I will discuss about a new syntax for ECPG - DECLARE STATEMENT - which was committed for PostgreSQL14.
Embedded SQL program in C is a method to connect to databases, which has been specified in the SQL standard since 1989 (SQL 89). Various vendors, such as Oracle, Microsoft, and IBM, provide this functionality. PostgreSQL also provides this capability via the ECPG preprocess. The basic syntax and rules are specified in the SQL standard, but the degree of compliance depends on each vendor. Because vendors have also added their own embedded SQL syntax, the differences may cause a barrier in database product migration.
I proposed a new embedded SQL syntax - DECLARE STATEMENT. This feature is originally provided by Oracle. I believe that this syntax will make it easier to migrate from Oracle and other databases to PostgreSQL, and thus increase PostgreSQL adoption.
Functional overview – What is DECLARE STATEMENT?
DECLARE STATEMENT is an embedded SQL syntax that declares an identifier for a prepared query. This syntax is introduced to improve compatibility with Oracle's embedded SQL programs, but it also benefits original ECPG users. When cursors are used in Oracle’s embedded SQL programs, identifiers for a prepared query must be declared before they are used in the DECLARE CURSOR syntax. In other words, precompilation in the following example will fail.
EXEC SQL DECLARE my_cursor CURSOR FOR my_stmt; // precompilation error
EXEC SQL PREPARE my_stmt FROM "SELECT …";
So, how are cursors used in Oracle? One of the solutions is to use DECLARE STATEMENT. The syntax is used for declaring the identifier on the precompiler. When the precompiler reads the syntax, it understands that my_stmt is the identifier for a prepared query and can successfully precompile all embedded SQLs. Based on this solution, the modified embedded SQL example is as follows:
EXEC SQL DECLARE my_stmt STATEMENT;
EXEC SQL DECLARE my_cursor CURSOR FOR my_stmt;
EXEC SQL PREPARE my_stmt FROM "SELECT …";
DECLARE STATEMENT was not available in PostgreSQL13 or earlier, which means users running embedded SQL programs like in example 2 had to change their applications. From PostgreSQL14 onwards, the syntax can be utilized, so Oracle users can easily migrate to PostgreSQL. Note that ECPG allows implicit declarations of identifiers. Therefore, example 1 can be precompiled in the ECPG and DECLARE STATEMENT does not impact this behavior.
There is also another use for this syntax - DECLARE STATEMENT can be used for associating an identifier with the connection. When an embedded SQL statement with an identifier is executed, the associated connection is chosen instead of the current connection. To associate an identifier with a connection you will need to explicitly specify the connection by adding the AT clause for DECLARE STATEMENT. Let’s take a look into the following sample program:
EXEC SQL END DECLARE SECTION;
EXEC SQL CONNECT TO my_db1 AS my_conn1;
EXEC SQL CONNECT TO my_db2 AS my_conn2;
EXEC SQL SET CONNECTION TO my_conn2; 1
EXEC SQL AT my_conn1 DECLARE my_stmt STATEMENT; 2
EXEC SQL PREPARE my_stmt FROM "SELECT current_database()"; 3
EXEC SQL EXECUTE my_stmt INTO :my_dbname; 3
Now, which connection will be used for PREPARE and EXECUTE in 3?
At first glance, it would seem that my_conn2 would be used, because SET CONNECTION is set to my_conn2 in 1.
However, the identifier my_stmt is associated with my_conn1 in the DECLARE STATEMENT 2.
Therefore, my_conn1 will be chosen when running 3 and these statements will be executed on database my_db1.
What are the benefits of using DECLARE STATEMENT? The first benefit is the ease of migration. Decreasing incompatibilities between products allows users to avoid rewriting their applications for migration and spend 100% of their time on the services they want to provide.
The second benefit is that the source code can be shortened by reducing the number of AT <connection> clauses in your code. Without using DECLARE STATEMENT, connections need to be specified carefully in every PREPARE, EXECUTE, and cursor operations in order to establish multiple connections in a program. It not only makes your source code longer, but may also lead to bugs caused by choosing wrong connections. Using DECLARE STATEMENT to associate a connection with an identifier will prevent such connection-related bugs by fixing the appropriate connection to be used in subsequent statements.
Today, using drivers such as psqlODBC and PostgreSQL JDBC Driver to connect to databases from user applications is mainstream. ECPG is not so actively discussed in the PostgreSQL community, and Commitfest has a small number of ECPG related entries. However, past assets are never lost, and many Fujitsu customers are leveraging embedded SQL programs in C and COBOL. Based on our experience and feedback from the users, our team will continue to improve ECPG for enhanced usability.
Personally, I am very happy that my first assignment at Fujitsu has paid off - you can check my commit to the source code here . In addition to ECPG, I look forward to improving scale-out capabilities of PostgreSQL with sharding.