Lightweight Formatter for PL/SQL & SQL

blank

TL;DR

Bye bye, heavyweight formatter. Long live the lightweight formatter. Are you using Oracle’s SQL Developer or SQLcl? Then install these settings and press Ctrl-F7 to produce code that conforms to the Trivadis PL/SQL & SQL Coding Style. A compromise between conformity and individuality.

Heavyweight Formatter

A typical PL/SQL & SQL formatter replaces whitespace between lexical tokens by default with a single space. A whitespace consists of a series of spaces, tabs and line breaks. As a result, the original whitespace between the tokens is lost. The grammars for SQL*Plus, PL/SQL and SQL are huge. As a consequence, a single space is not the desired result in many cases. Therefore, a formatter comes with a large set of rules and options to override the default (single space between tokens).

A key feature of a heavyweight formatter is that it produces the same result regardless of how the original code was formatted. There is no room for individuality unless it is part of a rule and its configuration. This makes a complete formatter a heavyweight.

Here is a create view example. Once formatted with spaces between the tokens and once with line breaks between the tokens.

create view v as select empno , ename from emp ;

create 
view 
v 
as 
select 
empno
, 
ename 
from 
emp
;

Here are the formatter results for some popular integrated development environments for PL/SQL & SQL. I configured the tools with these settings based on the Trivadis PL/SQL & SQL Coding Style. For SQL Developer I loaded only the .xml file and used the default custom formatting rules (Arbori program).

create view v as
   select empno,
          ename
     from emp;
 
create view v as
   select empno,
          ename
     from emp;

select empno, ename
  from emp;
 
select empno, ename
  from emp;

create view v
as
   select empno, ename from emp;
 
create view v
as
   select empno, ename from emp;

All tools produced the same result for both variants of the statement. However, the result differs between the tools, although the configuration is based on the same code style. Why is that so? Let’s look at the code style to answer this question.

Trivadis Formatting Rules

The Trivadis PL/SQL & SQL Coding Guidelines contain the following rules for code formatting in the Code Style chapter:

  1. Keywords and names are written in lowercase
  2. 3 space indention.
  3. One command per line.
  4. Keywords loop, else, elsif, end if, when on a new line.
  5. Commas in front of separated elements.
  6. Call parameters aligned, operators aligned, values aligned.
  7. SQL keywords are right aligned within a SQL command.
  8. Within a program unit only line comments — are used.
  9. Brackets are used when needed or when helpful to clarify a construct.

When you go through the list, you find out that only rule 5 has been violated. However, this violation was intentional. I use trailing commas whenever I’m allowed to and therefore I changed the default. A privilege of the maintainer. You like leading commas? No problem. You can configure whatever you want in the preferences of SQL Developer.

blank

The point is, all rules are vaguely worded and leave a lot of room for interpretation. Furthermore, rules 1, 8 and 9 are about code style, but not about formatting of code. Code formatting should be exclusively about whitespace between tokens. Extending the scope can be dangerous and break the code, e.g. when using JSON dot notation, which uses case-sensitive identifiers.

These rules are a good starting point for a developer who knows PL/SQL & SQL. However, they leave a lot of freedom when configuring a formatter. And they are for sure not suitable as a specification for a formatter.

Lightweight Formatter

A lightweight formatter preserves whitespace between lexical tokens by default. Based on a set of rules and options, the whitespace between tokens is then fixed. This allows the lightweight formatter to produce a reasonable result with a small set of rules.

Let’s compare a minimalistic heavyweight and lightweight formatter. Both formatter do not implement any rules. The formatter just returns the default whitespace between tokens.

Here’s the formatter input based on an example from the SQL Language Reference:

Unformatted SQL statement
CREATE TABLESPACE auto_seg_ts DATAFILE 'file_2.dbf' SIZE 1M
   EXTENT MANAGEMENT LOCAL
   SEGMENT SPACE MANAGEMENT AUTO;

The minimalistic heavyweight formatter produces this result:

Result by a Minimalistic Heavyweight Formatter
create tablespace auto_seg_ts datafile 'file_2.dbf' size 1 M extent management local segment space management auto ;

And the minimalistic lightweight formatter produces this result:

Result by a Minimalistic Lightweight Formatter
create tablespace auto_seg_ts datafile 'file_2.dbf' size 1M
   extent management local
   segment space management auto;

Both formatter changed the case of the keywords and preserved the case of the identifiers. The heavyweight formatter placed a single space between each lexical token, while the lightweight formatter preserved the whitespace. The result of the lightweight formatter looks good because the input was formatted reasonably.

Advantages of a Lightweight Formatter

I’m pretty sure that there are no formal formatting rules for a create tablespace statement. As a developer, I write and read very seldom such statements. The formatting of this code is not that important to me. Both formatter outputs are acceptable, even if I like the second one better. However, when the create tablespace statement contains several file_specification clauses then some line breaks would indeed help to improve the readability.

For me, it is completely okay to preserve the original format for a lot of statements such as create tablespace, create database, create user, etc.

However, I’d like to format code within the following SQL statements:

  • create function
  • create package
  • create package body
  • create procedure
  • create trigger
  • create type
  • create type body
  • create view
  • delete
  • insert
  • merge
  • select
  • update

An advantage of a lightweight formatter is that you can implement it incrementally. It is similar to a linter with automatic correction capabilities. Ok, this is probably not that interesting from the user’s point of view.

Another advantage of a lightweight formatter is that you can support various code styles. For example, you can accept select empno, ename from emp; on a single line. But you can also accept optional line breaks before the from_clause or between elements in the select_list. All variants are compliant with the mentioned code style.

Disadvantages of a Lightweight Formatter

Simply put a lightweight formatter is a compromise between conformity and individuality. You cannot call the formatter to ensure conformity regardless of the input.

There are a lot of undefined areas where no rules exist. And the developer has in fact the freedom to choose a fitting formatting style in such cases. In my opinion, that’s fine. And I hope that most developers will love it.

State of the Trivadis PL/SQL & SQL Formatter Settings

Originally the Trivadis PL/SQL & SQL Formatter Settings were based on the heavyweight formatter provided by the SQL Developer team. 90% of the Arbori code was their code. I added and changed some Arbori code to amend the formatter result to match my expectations better.

However, I was never really happy with this approach. Why? Because I had to compare the original Arbori code with every new SQL Developer and SQLcl version. Identifying changes was easy. However, understanding the reason for a change was usually a challenge. Some changes conflicted with “my” code base. As a result, maintenance became more and more cumbersome.

SQL Developer 20.4.1 and SQLcl 21.1.0 are the first versions for which the lightweight formatter settings are available. At the same time, we stopped to provide settings based on the heavyweight formatter. If you need a heavyweight formatter to enforce conformity of your code then you have to rely on the code base provided by the SQL Developer team.

The main branch requires the latest versions of SQL Developer and SQLcl. This is currently version 21.2.0 for both products. In my opinion, the lightweight formatter produces reasonable code. It is much easier to identify which rule is responsible for a particular whitespace change due to the rule-based implementation and a unified token-based logging strategy. There are test cases for each rule and a first set of test cases for major grammar elements. At the moment there are more than 470 test cases for about 4000 lines of Arbori code. It’s not perfect, but I really think that the state of the formatting settings is much better than it ever was before.

If you find strange formatter results then please let us know by opening a GitHub issue. Thank you.

Really Lightweight?

In some areas, the formatter behaves like a heavyweight formatter without tolerance for individuality.

One reason is that strangely formatted input code should produce reasonably formatted code. See these JUnit test cases for some examples.

Another reason is that we wanted to apply the calculated indentation for relevant parts of the parse tree. This really helps while writing code. Nobody wants to count spaces. Pressing Ctrl-F7 to format the code from time to time is much easier. The calculation of the indentation is the most elaborate and extensive code in the current Arbori code base. As a result, some individuality is lost.

Don’t worry, there is enough individuality left. The following examples show different formatting results using the same formatter settings (Trivadis defaults, “Line Breaks On subqueries” unchecked).

blank

The reason for the different results is additional line breaks in the formatter input.

select e.empno, e.ename, e.job from emp e where e.deptno in (select d.deptno from dept d where d.loc in ('DALLAS', 'CHICAGO'));

select e.empno, e.ename, e.job
  from emp e
 where e.deptno in (select d.deptno from dept d where d.loc in ('DALLAS', 'CHICAGO'));

select e.empno, e.ename, e.job
  from emp e
 where e.deptno in (select d.deptno
                      from dept d
                     where d.loc in ('DALLAS', 'CHICAGO'));

select e.empno, e.ename, e.job
  from emp e
 where e.deptno in (
          select d.deptno
            from dept d
           where d.loc in ('DALLAS', 'CHICAGO'));

select e.empno,
       e.ename,
       e.job
  from emp e
 where e.deptno
       in
       (
          select d.deptno
            from dept d
           where d.loc
                 in
                 (
                    'DALLAS', 'CHICAGO'
                 )
       );

The “tokenized” result is based on an input where each token is placed in a separate line. It shows where line breaks are lost. For example, the second list entry 'CHICACO' cannot be on a separate line. Short expressions are kept on the same line. Short means less than 50% of the configured max. line width.

For create view, select, insert, update, delete, merge statements and PL/SQL code I consider the formatter a middleweight. For all other statements (e.g. create tablespace) it is really lightweight.

The formatter is also capable of indenting single-line and multi-line comments. This is something that SQL Developer’s default formatter cannot do yet.

I hope you like the mix of conformity and individuality.

2 Comments

  1. blank Michel Lessard says:

    Do you have a extension format code for vscode?

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.