Quoted Identifiers #JoelKallmanDay

2022-10-11

Background and TL;DR

Connor McDonald wrote a blog post named Cleaner DDL than DBMS_METADATA. Back then he asked me if it would be possible to let the formatter remove unnecessary double quotes from quoted identifiers. Yes, of course. Actually, the current version of the PL/SQL & SQL Formatter Settings does exactly that. And no, you cannot do that with dbms_metadata in the Oracle Database versions 19c and 21c. Read on if you are interested in the details.

The Problem with DBMS_METADATA …

When you execute a DDL against the Oracle Database the database stores some metadata in the data dictionary. The DDL statement itself is not stored, at least not completely. This is one reason why it is a good idea to store DDL statements in files and manage these files within a version control system. Nevertheless, the Oracle Database provides an API – dbms_metadata – to reconstruct a DDL based on the information available in the data dictionary.

Let’s create a view based on the famous dept and emp tables:

1) Initial DDL

create or replace view deptsal as
   select d.deptno,
          d.dname,
          nvl(sum(e.sal), 0) as sum_sal,
          nvl(count(e.empno), 0) as num_emps,
          nvl(round(avg(e.sal), 2), 0) as avg_sal
     from dept d
     left join emp e
       on e.deptno = d.deptno
    group by d.deptno, d.dname;

and retrieve the DDL like this:

2) Get DDL

select dbms_metadata.get_ddl('VIEW', 'DEPTSAL', user) from dual;

to produce this DDL:

3) DDL by dbms_metadata

CREATE OR REPLACE FORCE EDITIONABLE VIEW "REDSTACK"."DEPTSAL" ("DEPTNO", "DNAME", "SUM_SAL", "NUM_EMPS", "AVG_SAL") DEFAULT COLLATION "USING_NLS_COMP"  AS 
  select d.deptno,
          d.dname,
          nvl(sum(e.sal), 0) as sum_sal,
          nvl(count(e.empno), 0) as num_emps,
          nvl(round(avg(e.sal), 2), 0) as avg_sal
     from dept d
     left join emp e
       on e.deptno = d.deptno
    group by d.deptno, d.dname

We see that the subquery part of the view has been preserved, except the first line which has a different indentation. Actually, the indentation of the first line is not stored in the data dictionary (see user_views.text). The two spaces are produced by the default pretty option of dbms_metadata. So far, so good.

In many cases, the Oracle Data Dictionary explicitly stores default values. For instance, Y for editionable or USING_NLS_COMP for default_collation. This fact alone makes it impossible to reconstruct the original DDL reliably. The database simply does not know whether an optional clause such as editionable or default collation has been specified or omitted. Moreover, some optional DDL clauses such as or replace or force are simply not represented in the data dictionary.

… Especially with Quoted Identifiers

And last but not least, identifiers such as columns names, table names or view names are stored without double quotes. Therefore, the database knows nothing about the use of double quotes in the original DDL. However, the database knows exactly when double quotes are required. As a result, dbms_metadata could emit only necessary double quotes. This would result in a more readable DDL and would probably also be more similar to the original DDL.

The reality is that code generators such as dbms_metadata often use double quotes for all identifiers. It’s simply easier for them, because this way the generated code works for all kind of strange identifiers.

However, using quoted identifiers is a bad practice. It is, in fact, a very bad practice when they are used unnecessarily.

Shaping the DDL

So what can we do? We can configure dbms_metadata to produce a DDL which is more similar to our original one. In this case we can change the following:

remove the schema of the view (owner)
remove the force keyword
remove the default collation clause
add the missing SQL terminator (;)

This query

4) Get customized DDL

with
   function view_ddl(in_name in varchar2) return clob is
      l_main_handle   integer;
      l_modify_handle integer;
      l_ddl_handle    integer;
      l_ddl           clob;
   begin
      -- initialize dbms_metadata for view based on current schema
      l_main_handle   := sys.dbms_metadata.open('VIEW');
      sys.dbms_metadata.set_filter(l_main_handle, 'SCHEMA', user);
      sys.dbms_metadata.set_filter(l_main_handle, 'NAME', in_name);
      -- remove schema name from input structure
      l_modify_handle := sys.dbms_metadata.add_transform(l_main_handle, 'MODIFY');
      sys.dbms_metadata.set_remap_param(l_modify_handle, 'REMAP_SCHEMA', user, null);
      -- non-default transformations to improve DDL
      l_ddl_handle    := sys.dbms_metadata.add_transform(l_main_handle, 'DDL');
      sys.dbms_metadata.set_transform_param(l_ddl_handle, 'FORCE', false);
      sys.dbms_metadata.set_transform_param(l_ddl_handle, 'COLLATION_CLAUSE', 'NO_NLS');
      sys.dbms_metadata.set_transform_param(l_ddl_handle, 'SQLTERMINATOR', true);
      -- get DDL
      l_ddl           := sys.dbms_metadata.fetch_clob(l_main_handle);
      -- free sys.dbms_metadata resources
      sys.dbms_metadata.close(l_main_handle);
      -- return result
      return l_ddl;
   end view_ddl;
select view_ddl('DEPTSAL')
  from dual
/

produces this result:

5) Customized DDL via dbms_metadata

CREATE OR REPLACE EDITIONABLE VIEW "DEPTSAL" ("DEPTNO", "DNAME", "SUM_SAL", "NUM_EMPS", "AVG_SAL") AS 
  select d.deptno,
          d.dname,
          nvl(sum(e.sal), 0) as sum_sal,
          nvl(count(e.empno), 0) as num_emps,
          nvl(round(avg(e.sal), 2), 0) as avg_sal
     from dept d
     left join emp e
       on e.deptno = d.deptno
   group by d.deptno, d.dname;

This looks better. However, I would like to configure dbms_metadata to omit the default editionable clause. Furthermore, I do not like the column alias list, which is unnecessary in this case. And of course, I’d like to suppress unnecessary double quotes around identifiers. Is that possible with dbms_metadata?

Shaping the DDL from (S)XML

Well, we can try. The dbms_metadata API is very extensive. Besides other things, it can also represent metadata as an XML document. There are two formats.

XML – An extensive XML containing internals such as object number, owner number, creation date, etc.
SXML – A simple and terse XML that contains everything you need to produce a DDL. The SXML format is therefore very well suited for schema comparison.

It’s possible to produce a DDL from both formats. We can also change the XML beforehand.

Let’s look at both variants in the next two subchapters.

Important: I consider the changes to the XML document and configuration of dbms_metadata in the following subchapters as experimental. The purpose is to show what is doable. They are not good examples of how it should be done. Even though the unnecessary list of column aliases annoys me, I would leave them as they are. I also think that overriding the default VERSION is a very bad idea in the long run.

Convert XML to DDL

6) Get customized DDL from XML

with
   function view_ddl(in_name in varchar2) return clob is
      l_xml         xmltype;
      l_main_handle integer;
      l_ddl_handle  integer;
      l_ddl         clob;
   begin
      -- create XML document and remove unwanted nodes
      l_xml         := xmltype(sys.dbms_metadata.get_xml('VIEW', in_name, user));
      l_xml         := l_xml.deletexml('/ROWSET/ROW/VIEW_T/SCHEMA_OBJ/OWNER_NAME');
      l_xml         := l_xml.deletexml('/ROWSET/ROW/VIEW_T/COL_LIST');
      -- initialize dbms_metadata for view based on XML input
      l_main_handle := sys.dbms_metadata.openw('VIEW');
      -- non-default transformations to improve DDL
      l_ddl_handle  := sys.dbms_metadata.add_transform(l_main_handle, 'DDL');
      sys.dbms_metadata.set_transform_param(l_ddl_handle, 'FORCE', false);
      sys.dbms_metadata.set_transform_param(l_ddl_handle, 'COLLATION_CLAUSE', 'NO_NLS');
      sys.dbms_metadata.set_transform_param(l_ddl_handle, 'SQLTERMINATOR', true);
      sys.dbms_metadata.set_transform_param(l_ddl_handle, 'VERSION', 1120000000);
      -- get DDL
      sys.dbms_lob.createtemporary(l_ddl, false, sys.dbms_lob.session);
      sys.dbms_metadata.convert(l_main_handle, l_xml, l_ddl);
      -- free dbms_metadata resources
      sys.dbms_metadata.close(l_main_handle);
      -- return result
      return l_ddl;
   end view_ddl;
select xmlserialize(document xmltype(sys.dbms_metadata.get_xml('VIEW', 'DEPTSAL', user))
          as clob indent size = 4)
  from dual
union all
select view_ddl('DEPTSAL')
  from dual
/

The query produces the following two rows (CLOBs):

7) XML input

<?xml version="1.0"?>
<ROWSET>
    <ROW>
        <VIEW_T>
            <VERS_MAJOR>1</VERS_MAJOR>
            <VERS_MINOR>4 </VERS_MINOR>
            <OBJ_NUM>232322</OBJ_NUM>
            <SCHEMA_OBJ>
                <OBJ_NUM>232322</OBJ_NUM>
                <OWNER_NUM>501</OWNER_NUM>
                <OWNER_NAME>REDSTACK</OWNER_NAME>
                <NAME>DEPTSAL</NAME>
                <NAMESPACE>1</NAMESPACE>
                <TYPE_NUM>4</TYPE_NUM>
                <TYPE_NAME>VIEW</TYPE_NAME>
                <CTIME>2022-10-04 12:11:25</CTIME>
                <MTIME>2022-10-04 12:11:25</MTIME>
                <STIME>2022-10-04 12:11:25</STIME>
                <STATUS>1</STATUS>
                <FLAGS>0</FLAGS>
                <FLAGS2>0</FLAGS2>
                <SPARE1>6</SPARE1>
                <SPARE2>65535</SPARE2>
                <SPARE3>501</SPARE3>
                <OWNER_NAME2>REDSTACK</OWNER_NAME2>
                <SIGNATURE>76DCDE35671FAA6AF576D6A6B4D97D48</SIGNATURE>
                <SPARE7>134233583</SPARE7>
                <SPARE8>0</SPARE8>
                <SPARE9>0</SPARE9>
                <DFLCOLLNAME>USING_NLS_COMP</DFLCOLLNAME>
            </SCHEMA_OBJ>
            <AUDIT_VAL>--------------------------------------</AUDIT_VAL>
            <COLS>5</COLS>
            <INTCOLS>5</INTCOLS>
            <PROPERTY>0</PROPERTY>
            <PROPERTY2>0</PROPERTY2>
            <FLAGS>0</FLAGS>
            <TEXTLENGTH>270</TEXTLENGTH>
            <TEXT>select d.deptno,
          d.dname,
          nvl(sum(e.sal), 0) as sum_sal,
          nvl(count(e.empno), 0) as num_emps,
          nvl(round(avg(e.sal), 2), 0) as avg_sal
     from dept d
     left join emp e
       on e.deptno = d.deptno
   group by d.deptno, d.dname</TEXT>
            <COL_LIST>
                <COL_LIST_ITEM>
                    <OBJ_NUM>232322</OBJ_NUM>
                    <COL_NUM>1</COL_NUM>
                    <INTCOL_NUM>1</INTCOL_NUM>
                    <SEGCOL_NUM>1</SEGCOL_NUM>
                    <PROPERTY>0</PROPERTY>
                    <PROPERTY2>0</PROPERTY2>
                    <NAME>DEPTNO</NAME>
                    <TYPE_NUM>2</TYPE_NUM>
                </COL_LIST_ITEM>
                <COL_LIST_ITEM>
                    <OBJ_NUM>232322</OBJ_NUM>
                    <COL_NUM>2</COL_NUM>
                    <INTCOL_NUM>2</INTCOL_NUM>
                    <SEGCOL_NUM>2</SEGCOL_NUM>
                    <PROPERTY>0</PROPERTY>
                    <PROPERTY2>0</PROPERTY2>
                    <NAME>DNAME</NAME>
                    <TYPE_NUM>1</TYPE_NUM>
                </COL_LIST_ITEM>
                <COL_LIST_ITEM>
                    <OBJ_NUM>232322</OBJ_NUM>
                    <COL_NUM>3</COL_NUM>
                    <INTCOL_NUM>3</INTCOL_NUM>
                    <SEGCOL_NUM>3</SEGCOL_NUM>
                    <PROPERTY>14336</PROPERTY>
                    <PROPERTY2>0</PROPERTY2>
                    <NAME>SUM_SAL</NAME>
                    <TYPE_NUM>2</TYPE_NUM>
                </COL_LIST_ITEM>
                <COL_LIST_ITEM>
                    <OBJ_NUM>232322</OBJ_NUM>
                    <COL_NUM>4</COL_NUM>
                    <INTCOL_NUM>4</INTCOL_NUM>
                    <SEGCOL_NUM>4</SEGCOL_NUM>
                    <PROPERTY>14336</PROPERTY>
                    <PROPERTY2>0</PROPERTY2>
                    <NAME>NUM_EMPS</NAME>
                    <TYPE_NUM>2</TYPE_NUM>
                </COL_LIST_ITEM>
                <COL_LIST_ITEM>
                    <OBJ_NUM>232322</OBJ_NUM>
                    <COL_NUM>5</COL_NUM>
                    <INTCOL_NUM>5</INTCOL_NUM>
                    <SEGCOL_NUM>5</SEGCOL_NUM>
                    <PROPERTY>14336</PROPERTY>
                    <PROPERTY2>0</PROPERTY2>
                    <NAME>AVG_SAL</NAME>
                    <TYPE_NUM>2</TYPE_NUM>
                </COL_LIST_ITEM>
            </COL_LIST>
            <COL_LIST2/>
            <CON1_LIST/>
            <CON2_LIST/>
        </VIEW_T>
    </ROW>
</ROWSET>

8) Customized DDL from XML

    CREATE OR REPLACE VIEW "DEPTSAL" () AS 
  select d.deptno,
          d.dname,
          nvl(sum(e.sal), 0) as sum_sal,
          nvl(count(e.empno), 0) as num_emps,
          nvl(round(avg(e.sal), 2), 0) as avg_sal
     from dept d
     left join emp e
       on e.deptno = d.deptno
    group by d.deptno, d.dname;

We removed the OWNER_NAME node (on line 11) from the XML document. As a result, the schema was removed in the DDL. The result is the same as with the REMAP_SCHEMA transformation. Perfect.

We also removed the COL_LIST node (lines 48-99) from the XML document. However, the result in the DDL regarding the column alias list does not look good. The columns are gone, but the surrounding parentheses survived, which makes the DDL invalid. IMO this is a bug in the $ORACLE_HOME/rdbms/xml/xsl/kuview.xsl script. It’s handled correctly in the SXML script as we will see later. However, we can fix that by calling replace(..., '" () AS', '" AS') . Please note that a complete solution should do some checks to ensure that the COL_LIST is really not required.

When you look at line 12 in the XML document (<NAME>DEPTSAL</NAME>), you see that the view name does not contain double quotes. This is a strong indicator, that there is no way to remove the double quotes by manipulating the input XML document. In fact, the double quotes are hard-coded in all XSLT scripts. No way to override this behavior via dbms_metadata.

Furthermore, you do not find a node named like EDITIONABLEwith a value Y as in all_objects. Why? Because this information is stored in the node FLAGS. 0 means editionable and 1048576 means noneditionable. To be precise 1048576 represents bit number 21. If this bit is set then the view is noneditionable. You find the proof for this statement in the dba_objects view, where the expression for the editionable column looks like this:

9) Editionable column in dba_objects (formatted)

case
   when o.type# in (
      4, 5, 7, 8, 9, 11, 12, 13, 14, 22, 87, 114
   )
   then
      decode(
         bitand(o.flags, 1048576),
            0,       'Y', 
            1048576, 'N', 
                     'Y'
      )
   else
      null
end

The $ORACLE_HOME/rdbms/xml/xsl/kucommon.xslscript (see template Editionable) is evaluating this flag and either emitting a EDITIONABLE or NONEDITIONABLE text. These keywords were introduced in version 12.1. Since dbms_metadata can produce version-specific DDL, we set the version to 11.2 to suppress EDITIONABLE in the resulting DDL.

Convert SXML to DDL

10) Get customized DDL from SXML

with
   function view_ddl(in_name in varchar2) return clob is
      l_sxml        xmltype;
      l_main_handle integer;
      l_ddl_handle  integer;
      l_ddl         clob;
   begin
      -- create SXML document and remove unwanted nodes
      l_sxml        := xmltype(sys.dbms_metadata.get_sxml('VIEW', in_name, user));
      l_sxml        := l_sxml.deletexml('/VIEW/SCHEMA', 'xmlns=http://xmlns.oracle.com/ku');
      l_sxml        := l_sxml.deletexml('/VIEW/COL_LIST', 'xmlns=http://xmlns.oracle.com/ku');
      -- initialize dbms_metadata for view based on SXML input
      l_main_handle := sys.dbms_metadata.openw('VIEW');
      -- non-default transformations to improve DDL
      l_ddl_handle  := sys.dbms_metadata.add_transform(l_main_handle, 'SXMLDDL');
      sys.dbms_metadata.set_transform_param(l_ddl_handle, 'FORCE', false);
      sys.dbms_metadata.set_transform_param(l_ddl_handle, 'COLLATION_CLAUSE', 'NO_NLS');
      sys.dbms_metadata.set_transform_param(l_ddl_handle, 'SQLTERMINATOR', true);
      sys.dbms_metadata.set_transform_param(l_ddl_handle, 'VERSION', 1120000000);
      -- get DDL
      sys.dbms_lob.createtemporary(l_ddl, false, sys.dbms_lob.session);
      sys.dbms_metadata.convert(l_main_handle, l_sxml, l_ddl);
      -- free dbms_metadata resources
      sys.dbms_metadata.close(l_main_handle);
      -- return result
      return l_ddl;
   end view_ddl;
select xmlserialize(document xmltype(sys.dbms_metadata.get_sxml('VIEW', 'DEPTSAL', user))
          as clob indent size = 4)
  from dual
union all
select view_ddl('DEPTSAL')
  from dual
/

11) SXML input

<VIEW xmlns="http://xmlns.oracle.com/ku" version="1.0">
    <SCHEMA>REDSTACK</SCHEMA>
    <NAME>DEPTSAL</NAME>
    <DEFAULT_COLLATION>USING_NLS_COMP</DEFAULT_COLLATION>
    <COL_LIST>
        <COL_LIST_ITEM>
            <NAME>DEPTNO</NAME>
        </COL_LIST_ITEM>
        <COL_LIST_ITEM>
            <NAME>DNAME</NAME>
        </COL_LIST_ITEM>
        <COL_LIST_ITEM>
            <NAME>SUM_SAL</NAME>
        </COL_LIST_ITEM>
        <COL_LIST_ITEM>
            <NAME>NUM_EMPS</NAME>
        </COL_LIST_ITEM>
        <COL_LIST_ITEM>
            <NAME>AVG_SAL</NAME>
        </COL_LIST_ITEM>
    </COL_LIST>
    <SUBQUERY>select d.deptno,
          d.dname,
          nvl(sum(e.sal), 0) as sum_sal,
          nvl(count(e.empno), 0) as num_emps,
          nvl(round(avg(e.sal), 2), 0) as avg_sal
     from dept d
     left join emp e
       on e.deptno = d.deptno
   group by d.deptno, d.dname</SUBQUERY>
</VIEW>

12) Customized DDL from SXML

CREATE OR REPLACE VIEW ""."DEPTSAL" 
  AS 
  select d.deptno,
          d.dname,
          nvl(sum(e.sal), 0) as sum_sal,
          nvl(count(e.empno), 0) as num_emps,
          nvl(round(avg(e.sal), 2), 0) as avg_sal
     from dept d
     left join emp e
       on e.deptno = d.deptno
    group by d.deptno, d.dname;

The SXML document is smaller. It contains just the nodes to produce a DDL. That makes it easier to read.

We removed the SCHEMA node (on line 2) from the SXML document. As a result, the schema was removed in the DDL. But not completely. Two double quotes and one dot survived, which makes the DDL invalid. IMO this is a bug in the $ORACLE_HOME/rdbms/xml/xsl/kusviewd.xsl script. It’s handled correctly in the XML script. We could fix that with a replace(..., 'VIEW ""."', 'VIEW "') call. As long as the search term is not ambiguous, everything should be fine.

We also removed the COL_LIST node (lines 5-21) from the SXML document. In this case, the column alias list is completely removed from the DDL. Including the parentheses. Nice.

Maybe you wonder how editionable is represented in the SXML document. – With a NONEDITIONABLE node if the view is noneditionable.

How Can We Work Around the Limitations?

We’ve seen the limitations of the current dbms_metadata API and the necessity to use string manipulation functions to fix invalid DDL.

There is no way to remove double quotes from quoted identifiers with dbms_metadata. However, as Connor McDonald demonstrated in his blog post we can remove them with some string acrobatics. Why not use a simple replace call? Because there are some rules to follow. A globally applied replace(..., '"', null) call would produce invalid code in many real-life scenarios. We need a more robust solution.

Applying the rules in a code formatter can be such a robust solution.

Rules for Safely Removing Double Quotes from Quoted Identifiers

What are the rules to follow?

1. Is a SQL or PL/SQL Identifier

You have to make sure that the double quotes surround a SQL or PL/SQL identifier. Sounds logical. However, it is not that simple. Here are some examples:

13) PL/SQL comment

create or replace procedure plsql_comment is
begin
   -- "NOT_AN_IDENTIFIER"
   /*
      "NOT_AN_IDENTIFIER"
   */
   null;
end;
/

14) PL/SQL string

create or replace procedure plsql_string is
   l_string1 varchar2(100 char);
   l_string2 varchar2(100 char);
begin
   l_string1 := '"NOT_AN_IDENTIFIER"';
   l_string2 := q'[
                   "NOT_AN_IDENTIFIER"
                ]';
end;
/

15) PL/SQL conditional compilation text

create or replace procedure plsql_conditional_compilation_text is
begin
   $if false $then
      Conditional compilation blocks can contain any text.
      It does not need to be valid PL/SQL.
      "NOT_AN_IDENTIFIER"
      FTLDB and tePLSQL use this construct to store code templates in such blocks.
   $end
   null;
end;
/

16) Java string

create or replace and resolve java source named "JavaString" as
public class JavaString {
  public static String hello() {
     return "NOT_AN_IDENTIFIER";
  }
}
/

You can solve the first three examples easily with a lexer. A lexer groups a stream of characters. Such a group of characters is called a lexer token. A lexer token knows the start and end position in a source text and has a type. The lexer in SQL Developer and SQLcl produces the following types of tokens:

COMMENT (/* ... */)
LINE_COMMENT (-- ...)
QUOTED_STRING ('string' or q'[string]')
DQUOTED_STRING ("string" )
WS (space, tab, new line, carriage return)
DIGITS (0123456789 plus some special cases)
OPERATION (e.g. ()[]^-|!*+.><=,;:%@?/~)
IDENTIFIER (words)
MACRO_SKIP (conditional compilation tokens such as $if, $then, etc.)

We can simply focus on tokens of the type DQUOTED_STRING and ignore tokens that are within conditional compilation tokens $ifand $end.

To find out if a DQUOTED_STRING is part of a Java stored procedure is more difficult. Luckily SQL Developer’s parser cannot deal with Java stored procedures and produces a parse error. As a result, we just have to keep the code “as is” in such cases.

2. Consists of Valid Characters

According to the PL/SQL Language Reference a nonquoted identifier must comply with the following rules:

An ordinary user-defined identifier:

Begins with a letter

Can include letters, digits, and these symbols:

Dollar sign ($)

Number sign (#)

Underscore (_)

What is a valid letter in this context? The SQL Language Reference defines a letter as an “alphabetic character from your database character set”. Here are some examples of valid letters and therefore valid PL/SQL variable names or SQL column names:

Latin letters (AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz)
Umlauts (ÄäËëÏïÖöÜüŸÿ)
German Esszett (ẞß), please note that the Oracle Database does not convert the case of an Esszett, because the uppercase Esszett exists offically only since 2017-03-29
C cedilla (Çç)
Grave accented letters (ÀàÈèÌìÒòÙù)
Acute accented letters (ÁáĆćÉéÍíÓóÚúÝý)
Circumflex accented letters (ÂâÊêÎîÔôÛû)
Tilde accented letters (ÃãÑñÕõ)
Caron accented letters (ǍǎB̌b̌ČčĚěF̌f̌ǦǧȞȟǏǐJ̌ǰǨǩM̌m̌ŇňǑǒP̌p̌Q̌q̌ŘřŠšǓǔV̌v̌W̌w̌X̌x̌Y̌y̌ŽžǮǯ)
Ring accented letters (ÅåŮů)
Greek letters (ΑαΒβΓγΔδΕεΖζΗηΘθΙιΚκΛλΜμΝνΞξΟοΠπΡρΣσΤτΥυΦφΧχΨψΩω)
Common Cyrillic letters (АаБбВвГгДдЕеЁёЖжЗзИиЙйКкЛлМмНнОоПпРрСсТтУуФфХхЦцЧчШшЩщЪъЫыЬьЭэЮюЯя)
Hiragana letters (ぁあぃいぅうぇえぉおかがきぎくぐけげこごさざしじすずせぜそぞただちぢっつづてでとどなにぬねのはばぱひびぴふぶぷへべぺほぼぽまみむめもゃやゅゆょよらりるれろゎわゐゑをんゔゕゖゝゞゟ)

The Oracle Database throws an ORA-00911: invalid character when this rule is violated.

Cause: The identifier name started with an ASCII character other than a letter or a number . After the first character of the identifier name, ASCII characters are allowed including “$”, “#” and “_”. Identifiers enclosed in double quotation marks may contain any character other than a double quotation. Alternate quotation marks (q’#…#’) cannot use spaces, tabs, or carriage returns as delimiters. For all other contexts, consult the SQL Language Reference Manual.

The cause for this error message seems to be outdated, inaccurate and wrong. Firstly, it limits letters to those contained in an ASCII character set. This limitation is not generally valid anymore. Secondly, it claims that an identifier can start with a number, which is simply wrong. Thirdly, ASCII characters and letters are used as synonyms, which is misleading.

However, there are still cases where an identifier is limited to ASCII characters or single byte characters. For example, a database name or a database link name. In the projects I know, the reduction of letters to A-Z for identifiers is not a problem. The use of accented letters in identifiers are typically oversights. Therefore, I recommend limiting the range of letters in identifiers to A-Z .

Checking this rule is quite simple. We just have to make sure that the quoted identifier matches this regular expression: ^"[A-Z][A-Z0-9_$#]*"$. This works with any regular expression engine, unlike ^"[[:alpha:]][[:alpha:]0-9_$#]*"$.

3. Is Not a Reserved Word

According to the PL/SQL Language Reference and the SQL Language Reference a nonquoted identifier must not be a reserved word.

If you are working with 3rd party parsers, the list of reserved words might not match the ones defined for the Oracle Database. In my case, I also want to consider the reserved words defined by db* CODECOP. I’m using the following query to create a JSON array with currently 260 keywords:

17) JSON array of keywords

select json_arrayagg(keyword order by keyword) as keywords
  from (
          -- reserved keywords in Oracle database 21.3.0.0.0 (ATP)
          select keyword
            from v$reserved_words
           where (reserved = 'Y' or res_type = 'Y' or res_attr = 'Y' or res_semi = 'Y')
             and keyword is not null
             and regexp_like(keyword, '^[A-Z][A-Z0-9_$#]*$') -- valid nonquoted identifier
          union
          -- reserved keywords in db* CODECOP's PL/SQL parser 4.2.0
          select keyword
            from json_table(
                    '["AFTER","ALL","ALLOW","ALTER","ANALYTIC","AND","ANYSCHEMA","AS","ASC","ASSOCIATE","AUTHID","AUTOMATIC",
                      "AUTONOMOUS_TRANSACTION","BEFORE","BEGIN","BETWEEN","BULK","BY","BYTE","CANONICAL","CASE",
                      "CASE-SENSITIVE","CHECK","CLUSTER","COMPOUND","CONNECT","CONNECT_BY_ROOT","CONSTANT","CONSTRAINT",
                      "CONSTRUCTOR","CORRUPT_XID","CORRUPT_XID_ALL","CREATE","CROSSEDITION","CURRENT","CUSTOMDATUM",
                      "CYCLE","DB_ROLE_CHANGE","DECLARE","DECREMENT","DEFAULTS","DEFINE","DEFINER","DETERMINISTIC",
                      "DIMENSION","DISALLOW","DISASSOCIATE","DISTINCT","DROP","EACH","EDITIONING","ELSE","ELSIF",
                      "END","EVALNAME","EXCEPTION","EXCEPTION_INIT","EXCEPTIONS","EXCLUSIVE","EXTERNAL","FETCH",
                      "FOLLOWING","FOLLOWS","FOR","FORALL","FROM","GOTO","GRANT","GROUP","HAVING","HIDE","HIER_ANCESTOR",
                      "HIER_LAG","HIER_LEAD","HIER_PARENT","IF","IGNORE","IMMUTABLE","IN","INCREMENT","INDEX","INDICATOR",
                      "INDICES","INITIALLY","INLINE","INSERT","INSTEAD","INTERSECT","INTO","INVISIBLE","IS","ISOLATION",
                      "JAVA","JSON_EXISTS","JSON_TABLE","LATERAL","LIBRARY","LIKE","LIKE2","LIKE4","LIKEC","LOCK","LOGON",
                      "MAXVALUE","MEASURES","MERGE","MINUS","MINVALUE","MULTISET","MUTABLE","NAN","NAV","NCHAR_CS","NOCOPY",
                      "NOCYCLE","NONSCHEMA","NORELY","NOT","NOVALIDATE","NOWAIT","OF","ON","ONLY","OPTION","OR","ORADATA",
                      "ORDER","ORDINALITY","OVER","OVERRIDING","PARALLEL_ENABLE","PARTITION","PASSING","PAST","PIPELINED",
                      "PIVOT","PRAGMA","PRECEDES","PRECEDING","PRESENT","PRIOR","PROCEDURE","REFERENCES","REFERENCING",
                      "REJECT","RELY","REPEAT","RESPECT","RESTRICT_REFERENCES","RESULT_CACHE","RETURNING","REVOKE",
                      "SELECT","SEQUENTIAL","SERIALIZABLE","SERIALLY_REUSABLE","SERVERERROR","SETS","SHARE","SIBLINGS",
                      "SINGLE","SOME","SQL_MACRO","SQLDATA","STANDALONE","START","SUBMULTISET","SUBPARTITION",
                      "SUPPRESSES_WARNING_6009","THE","THEN","TO","TRIGGER","UDF","UNBOUNDED","UNDER","UNION",
                      "UNIQUE","UNLIMITED","UNPIVOT","UNTIL","UPDATE","UPSERT","USING","VALUES","VARRAY","VARYING",
                      "VIEW","WHEN","WHERE","WHILE","WINDOW","WITH","XMLATTRIBUTES","XMLEXISTS","XMLFOREST",
                      "XMLNAMESPACES","XMLQUERY","XMLROOT","XMLSCHEMA","XMLSERIALIZE","XMLTABLE"]',
                    '$[*]' columns (keyword path '$')
                 )
       );

The result can be used to populate a HashSet. This allows you to check very efficiently whether an identifier is a keyword.

Of course, such a global list of keywords is a simplification. In reality, the restrictions are context-specific. However, I consider the use of keywords for identifiers in any context a bad practice. Therefore, I can live with some unnecessarily quoted identifiers.

4. Is in Upper Case

This means that the following condition must be true: quoted_identifier = upper(quoted_identifier).

This does not necessarily mean that the identifier is case-insensitive as the following examples show:

18) JSON dot notation is case-sensitive

set pagesize 100
column key format A3
column value format A5
set null "(-)"
 
-- OK, KEY/VALUE are clearly case-sensitive
select j.pair."KEY", j.pair."VALUE"
  from json_table('[{KEY:1, VALUE:"One"},{KEY:2, VALUE:"Two"}]',
          '$[*]' columns (pair varchar2(100) format json path '$')) j;
 
KEY VALUE
--- -----
1   One  
2   Two  
 
-- OK, KEY/VALUE are case-sensitive, but you have to "know" that
select j.pair.KEY, j.pair.VALUE
  from json_table('[{KEY:1, VALUE:"One"},{KEY:2, VALUE:"Two"}]',
          '$[*]' columns (pair varchar2(100) format json path '$')) j;
 
KEY VALUE
--- -----
1   One  
2   Two  
          
-- Oups, no error, but the result is wrong (all NULLs)
-- This is why you should not let the formatter change the case of your identifiers!
select j.pair.key, j.pair.value
  from json_table('[{KEY:1, VALUE:"One"},{KEY:2, VALUE:"Two"}]',
          '$[*]' columns (pair varchar2(100) format json path '$')) j;
 
KEY VALUE
--- -----
(-) (-)  
(-) (-)

You can check this rule in combination with the previous rule by using a case-insensitive regular expression, which is the default.

5. Is Not Part of a Code Section for Which the Formatter Is Disabled

When you use a formatter there are some code sections where you do not want the formatter to change it. Therefore we want to honor the marker comments that disable and enable the formatter.

Here is an example:

19) Disable formatter – input

create or replace procedure disable_enable_formatter is
   l_dummy sys.dual.dummy%type;
begin
   -- @formatter:off
   select decode(dummy, 'X', 1 
                      , 'Y', 2
                      , 'Z', 3
                           , 0) "DECODE_RESULT" /* @formatter:on */
     into "L_DUMMY"
     from "SYS"."DUAL";
 
   select "DUMMY" -- noformat start
     into "L_DUMMY"
     from "SYS"."DUAL" -- noformat end
    where "DUMMY" is not null;
end;
/

After calling the formatter we expect the following output (when changing the identifier case to lower is enabled):

20) Disable formatter – result

create or replace procedure disable_enable_formatter is
   l_dummy sys.dual.dummy%type;
begin
   -- @formatter:off
   select decode(dummy, 'X', 1 
                      , 'Y', 2
                      , 'Z', 3
                           , 0) "DECODE_RESULT" /* @formatter:on */
     into l_dummy
     from sys.dual;
 
   select dummy -- noformat start
     into "L_DUMMY"
     from "SYS"."DUAL" -- noformat end
    where dummy is not null;
end;
/

To check this we can reuse the approach for quoted identifiers in conditional compilation text.

Removing Double Quotes from Quoted Identifiers with Arbori

As mentioned at the beginning of the post, the current version of the PL/SQL & SQL Formatter Settings can safely remove double quotes from PL/SQL and SQL code.

Here are simplified formatter settings which can be imported into SQL Developer 22.2.1. The formatter with these settings only removes the double quotes from identifiers in a safe way and leaves your code “as is”. You can download these settings from this Gist.

21) Advanced Settings

<options>
    <adjustCaseOnly>false</adjustCaseOnly>
    <singleLineComments>oracle.dbtools.app.Format.InlineComments.CommentsUnchanged</singleLineComments>
    <maxCharLineSize>120000</maxCharLineSize>
    <idCase>oracle.dbtools.app.Format.Case.NoCaseChange</idCase>
    <kwCase>oracle.dbtools.app.Format.Case.lower</kwCase>
    <formatWhenSyntaxError>false</formatWhenSyntaxError>
</options>

Firstly, the option adjustCaseOnly ensures that the Arbori program is fully applied.

Secondly, the option singleLineComments ensures that the whitespace before single-line comments are kept “as is”.

Thirdly, the maxCharLineSize ensures that no line breaks are added. The value of 120000 seems to be ridiculously high. However, I’ve seen single lines of around a hundred thousand characters in the wild.

Fourthly, the option idCase ensures that the case of nonquoted identifiers is not changed. This is important for JSON dot notation.

Fifthly, the option kwCase ensures that the case of keywords is also kept “as is”.

And Finally, the option formatWhenSyntaxError ensures that the formatter does not change code that the formatter does not understand. This is important to keep Java strings intact.

The value of all other options is irrelevant for this Arbori program.

22) Custom Format

-- --------------------------------------------------------------------------------------------------------------------
-- Minimal Arbori program (expected by the formatter, also expected: "order_by_clause___0").
-- --------------------------------------------------------------------------------------------------------------------
include "std.arbori"
dummy: :indentConditions & [node) identifier;
skipWhiteSpaceBeforeNode: runOnce -> { var doNotCallCallbackFunction;}
dontFormatNode: [node) numeric_literal | [node) path ->;
 
-- --------------------------------------------------------------------------------------------------------------------
-- Keep existing whitespace.
-- --------------------------------------------------------------------------------------------------------------------
 
keep_significant_whitespace:
    runOnce
-> {
    var LexerToken = Java.type('oracle.dbtools.parser.LexerToken');
    var tokens = LexerToken.parse(target.input, true);  // include hidden tokens
    var hiddenTokenCount = 0;
    var wsBefore = "";
    var Token = Java.type('oracle.dbtools.parser.Token');
    for (var i in tokens) {
        var type = tokens[i].type;
        if (type == Token.LINE_COMMENT || type == Token.COMMENT || type == Token.WS ||
            type == Token.MACRO_SKIP || type == Token.SQLPLUSLINECONTINUE_SKIP)
        {
            hiddenTokenCount++;
            if (type == Token.WS) {
                wsBefore += tokens[i].content;
            } else {
                wsBefore = "";
            }
        } else {
            if (i-hiddenTokenCount == 0 && hiddenTokenCount == wsBefore.length) {
                struct.putNewline(0, "");
            } else if (wsBefore != " ") {
                struct.putNewline(i-hiddenTokenCount, wsBefore);
            }
            wsBefore = "";
        }
    }
}
 
-- --------------------------------------------------------------------------------------------------------------------
-- Enforce nonquoted identifiers.
-- --------------------------------------------------------------------------------------------------------------------
 
enforce_nonquoted_identifiers:
    runOnce
-> {
    var offOnRanges = [];
 
    var populateOffOnRanges = function(tokens) {
        var off = -1;
        for (var i in tokens) {
            var type = tokens[i].type;
            if (type == Token.LINE_COMMENT || type == Token.COMMENT) {
                if (tokens[i].content.toLowerCase().indexOf("@formatter:off") != -1 
                    || tokens[i].content.toLowerCase().indexOf("noformat start") != -1)
                {
                    off = tokens[i].begin;
                }
                if (off != -1) {
                    if (tokens[i].content.toLowerCase().indexOf("@formatter:on") != -1
                        || tokens[i].content.toLowerCase().indexOf("noformat end") != -1)
                    {
                        offOnRanges.push([off, tokens[i].end]);
                        off = -1;
                    }
                }
            }
        }
    }
 
    var inOffOnRange = function(pos) {
        for (var x in offOnRanges) {
            if (pos >= offOnRanges[x][0] && pos < offOnRanges[x][1]) {
                return true;
            }
        }
        return false;
    }
 
    var HashSet = Java.type('java.util.HashSet');
    var Arrays = Java.type('java.util.Arrays');
    var reservedKeywords = new HashSet(Arrays.asList("ACCESS","ADD","AFTER","ALL","ALLOW","ALTER","ANALYTIC","AND",
        "ANY","ANYSCHEMA","AS","ASC","ASSOCIATE","AUDIT","AUTHID","AUTOMATIC","AUTONOMOUS_TRANSACTION","BEFORE",
        "BEGIN","BETWEEN","BULK","BY","BYTE","CANONICAL","CASE","CASE-SENSITIVE","CHAR","CHECK","CLUSTER","COLUMN",
        "COLUMN_VALUE","COMMENT","COMPOUND","COMPRESS","CONNECT","CONNECT_BY_ROOT","CONSTANT","CONSTRAINT",
        "CONSTRUCTOR","CORRUPT_XID","CORRUPT_XID_ALL","CREATE","CROSSEDITION","CURRENT","CUSTOMDATUM","CYCLE",
        "DATE","DB_ROLE_CHANGE","DECIMAL","DECLARE","DECREMENT","DEFAULT","DEFAULTS","DEFINE","DEFINER","DELETE",
        "DESC","DETERMINISTIC","DIMENSION","DISALLOW","DISASSOCIATE","DISTINCT","DROP","EACH","EDITIONING","ELSE",
        "ELSIF","END","EVALNAME","EXCEPT","EXCEPTION","EXCEPTIONS","EXCEPTION_INIT","EXCLUSIVE","EXISTS","EXTERNAL",
        "FETCH","FILE","FLOAT","FOLLOWING","FOLLOWS","FOR","FORALL","FROM","GOTO","GRANT","GROUP","HAVING","HIDE",
        "HIER_ANCESTOR","HIER_LAG","HIER_LEAD","HIER_PARENT","IDENTIFIED","IF","IGNORE","IMMEDIATE","IMMUTABLE",
        "IN","INCREMENT","INDEX","INDICATOR","INDICES","INITIAL","INITIALLY","INLINE","INSERT","INSTEAD","INTEGER",
        "INTERSECT","INTO","INVISIBLE","IS","ISOLATION","JAVA","JSON_EXISTS","JSON_TABLE","LATERAL","LEVEL","LIBRARY",
        "LIKE","LIKE2","LIKE4","LIKEC","LOCK","LOGON","LONG","MAXEXTENTS","MAXVALUE","MEASURES","MERGE","MINUS",
        "MINVALUE","MLSLABEL","MODE","MODIFY","MULTISET","MUTABLE","NAN","NAV","NCHAR_CS","NESTED_TABLE_ID","NOAUDIT",
        "NOCOMPRESS","NOCOPY","NOCYCLE","NONSCHEMA","NORELY","NOT","NOVALIDATE","NOWAIT","NULL","NUMBER","OF",
        "OFFLINE","ON","ONLINE","ONLY","OPTION","OR","ORADATA","ORDER","ORDINALITY","OVER","OVERRIDING",
        "PARALLEL_ENABLE","PARTITION","PASSING","PAST","PCTFREE","PIPELINED","PIVOT","PRAGMA","PRECEDES",
        "PRECEDING","PRESENT","PRIOR","PROCEDURE","PUBLIC","RAW","REFERENCES","REFERENCING","REJECT","RELY",
        "RENAME","REPEAT","RESOURCE","RESPECT","RESTRICT_REFERENCES","RESULT_CACHE","RETURNING","REVOKE","ROW",
        "ROWID","ROWNUM","ROWS","SELECT","SEQUENTIAL","SERIALIZABLE","SERIALLY_REUSABLE","SERVERERROR","SESSION",
        "SET","SETS","SHARE","SIBLINGS","SINGLE","SIZE","SMALLINT","SOME","SQLDATA","SQL_MACRO","STANDALONE",
        "START","SUBMULTISET","SUBPARTITION","SUCCESSFUL","SUPPRESSES_WARNING_6009","SYNONYM","SYSDATE","TABLE",
        "THE","THEN","TO","TRIGGER","UDF","UID","UNBOUNDED","UNDER","UNION","UNIQUE","UNLIMITED","UNPIVOT","UNTIL",
        "UPDATE","UPSERT","USER","USING","VALIDATE","VALUES","VARCHAR","VARCHAR2","VARRAY","VARYING","VIEW",
        "WHEN","WHENEVER","WHERE","WHILE","WINDOW","WITH","XMLATTRIBUTES","XMLEXISTS","XMLFOREST","XMLNAMESPACES",
        "XMLQUERY","XMLROOT","XMLSCHEMA","XMLSERIALIZE","XMLTABLE"));
 
    var isKeyword = function(token) {
        return reservedKeywords.contains(token.content.replace('"', ""));
    }
 
    var isUnquotingAllowed = function(token) {
        var Pattern = Java.type("java.util.regex.Pattern");
        if (!Pattern.matches('^"[A-Z][A-Z0-9_$#]*"$', token.content)) {
            return false;
        }
        if (isKeyword(token)) {
            return false;
        }
        return true;
    }
 
    var findAndConvertQuotedIdentifiers = function() {
        var tokens = LexerToken.parse(target.input,true);  // include hidden tokens
        populateOffOnRanges(tokens);
        var StringBuilder = Java.type('java.lang.StringBuilder');
        var newInput = new StringBuilder(target.input);
        var delpos = [];
        var conditionalBlock = false;
        for (var i in tokens) {
            var type = tokens[i].type;
            if (type == Token.MACRO_SKIP) {
                var content = tokens[i].content.toLowerCase();
                if (content.indexOf("$if ") == 0) {
                    conditionalBlock = true;
                } else if (content.indexOf("$end") == 0) {
                    conditionalBlock = false;
                }
            }
            if (type == Token.DQUOTED_STRING && isUnquotingAllowed(tokens[i]) 
                && !inOffOnRange(tokens[i].begin) && !conditionalBlock) 
            {
                delpos.push(tokens[i].begin);
                delpos.push(tokens[i].end-1);
            }
        }
        var i = delpos.length - 1;
        while (i >= 0) {
            newInput.deleteCharAt(delpos[i]);
            i--;
        }
        target.input = newInput.toString();
    }
 
    var updateParseTreeAndTokenList = function() {
        var Parsed = Java.type('oracle.dbtools.parser.Parsed');
        var SqlEarley = Java.type('oracle.dbtools.parser.plsql.SqlEarley')
        var defaultTokens = LexerToken.parse(target.input);
        var newTarget = new Parsed(target.input, defaultTokens, SqlEarley.getInstance(), 
            Java.to(["sql_statements"], "java.lang.String[]"));            
        target.src.clear();
        target.src.addAll(newTarget.src);
    }
 
    // main
    findAndConvertQuotedIdentifiers();
    updateParseTreeAndTokenList();
}
 
-- --------------------------------------------------------------------------------------------------------------------
-- Define identifiers (relevant for keyword case and identifier case)
-- --------------------------------------------------------------------------------------------------------------------
 
analytics: [identifier) identifier & [call) analytic_function & [call = [identifier;
ids: [identifier) identifier;
identifiers: ids - analytics ->;

Firstly, the lines 1 to 8 are required by the formatter. They are not interesting in this context.

Secondly, lines 9 to 42 are the heart of a lightweight formatter. This code ensures that all whitespace between all tokens is kept. Therefore, the existing format of the code remains untouched. Read this blog post to learn how SQL Developer’s formatter works.

Thirdly, lines 43 to 173 remove unnecessary double quotes from identifiers. We store the position of double quotes to be removed on lines 147 and 148 in an array named delpos while processing all tokens from start to end. The removal of the double quotes happens on line 153 while processing delpos entries from end to start.

And finally, the lines 174-180 define an Arbori query named identifier. The formatter uses this query to divide lexer tokens of type IDENTIFIER into keywords and identifiers. This is important to ensure that the case of identifiers is left “as is” regardless of the configuration of kwCase.

Doesn’t Connor’s PL/SQL Function Do the Same?

No, when you look closely at the ddl_cleanup.sql script as of 2022-03-02, you will find out that the ddl function has the following limitations:

Quoted identifiers are not ignored in
- Single and multi-line comments
- Conditional compilation text
- Code sections for which the formatter is disabled
Java Strings are treated as quoted identifiers
Reserved keywords are not considered
Nonquoted identifiers are changed to lowercase, which might break code using JSON dot notation

It just shows that things become complicated when you don’t solve them in the right place. In this case dbms_metadata’s XSLT scripts. dbms_metadata knows what’s an identifier. It can safely skip the enquoting process if the identifier is in upper case, matches the regular expression ^[A-Z][A-Z0-9_$#]*$ and the identifier is not a reserved keyword. That’s all. The logic can be implemented in a single XSL template. We API users on the other side must parse the code to somehow identify quoted identifiers and their context before we can decide how to proceed.

Formatting DDL Automatically

You can configure SQL Developer to automatically format DDL with your current formatter settings. For that you have to enable the option Autoformat Dictionary Objects SQL as in the screenshot below:

Here’s the result for the deptsal view using the PL/SQL & SQL Formatter Settings:

The identifiers in upper case were originally quoted identifiers. By default, we configure the formatter to keep the case of identifiers. This ensures that code using JSON dot notation is not affected by a formatting operation.

Processing Many Files

SQL Developer is not suited to format many files. However, you can use the SQLcl script or the standalone formatter to format files in a directory tree. The formatter settings (path to the .xml and .arbori file) can be passed as parameters. I recommend using the standalone formatter. It uses the up-to-date and much faster JavaScript engine from GraalVM. Furthermore, the standalone formatter also works with JDK 17, which no longer contains a JavaScript engine.

You can download the latest tvdformat.jar from here. Run java -jar tvdformat.jar to show all command line options.

Summary

If your code base contains generated code, then it probably also contains unnecessarily quoted identifiers. Especially if dbms_metadata was used to extract DDL statements. Removing these double quotes without breaking some code is not that easy. However, SQL Developer’s highly configurable formatter can do the job, even without actually formatting the code.

I hope that some of the shortcomings of dbms_metadata will be addressed in an upcoming release of the Oracle Database. Supporting nonquoted identifiers as an additional non-default option should be easy and not so risky to implement.

Anyway, instead of just detecting violations of G-2180: Never use quoted identifiers, it is a good idea to be able to correct them automatically.

Please open a GitHub issue if you encounter a bug in the formatter settings. Thank you.