Is Your Application SmartDB?

I recently had a few discussions about the Smart Database Paradigm (SmartDB) with long-standing customers, new customers, partners, competitors and colleagues. Some people think that using APEX and PL/SQL in their database application makes it SmartDB. But it is not that simple. Bryn Llewellyn defined the term “Smart Database Paradigm” (SmartDB) in his talk Guarding your data behind a hard shell PL/SQL API. Based on his definition, a SmartDB application must have the following five properties:

  1. The connect user does not own database objects
  2. The connect user can execute PL/SQL API units only
  3. PL/SQL API units handle transactions
  4. SQL statements are written by human hand
  5. SQL statements exploit the full power of set-based SQL

These five properties are not a set of recommendations. They are the bare minimum. Either your application has these properties or it does not. It’s binary. There is (almost) no room for interpretation. Here’s an excerpt of a longer Twitter thread, making my view, and especially Bryn Llewellyn’s, a bit clearer.

In this blog post I show how to check compliance with the first three SmartDB properties by querying the Oracle data dictionary. The remaining two SmartDB properties have to be evaluated manually through code reviews. The goal is to show that some of these properties are easy to violate (sometimes for good reasons) and that this makes your database-centric application something other than SmartDB (but not necessarily a curate’s egg).

In How to Prove That Your SmartDB App Is Secure I’ve crafted a good, a bad and an ugly demo application. I installed these applications using this script in my Oracle Database 18c instance.

The anonymous PL/SQL block and the SQL queries in this blog post require DBA privileges. The required minimum database version is mentioned in the title of the code block, e.g. (>=9.2), (>=12.1) or (>=12.2).

Now let’s look at the five SmartDB properties.

1. The connect user does not own database objects

The connect user is used by application components outside of the database to interact with the database. It is configured, for example, in the connection pool of the middle-tier application.

The connect user must access only the APIs of the underlying database applications and therefore does not need to own database objects.

Checking compliance with this property is simple.
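The original query is not reproduced here. A minimal sketch listing the users that own no database objects might look like this; the ORACLE_MAINTAINED filter and the overall shape are my assumptions, not the original query from the post:

```sql
-- Sketch (>=12.1): list candidate connect users that do not own any
-- database objects. ORACLE_MAINTAINED excludes Oracle-supplied accounts.
SELECT username
  FROM dba_users
 WHERE oracle_maintained = 'N'
   AND username NOT IN (SELECT owner FROM dba_objects)
 ORDER BY username;
```

On older releases, the ORACLE_MAINTAINED column is not available and a hand-written exclusion list of Oracle-supplied users is needed instead.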

If you are using a connect user that is not listed in the result, then your application is not SmartDB.

The result also contains users that do not have the CREATE SESSION privilege and therefore cannot be used as connect users. The queries for SmartDB properties 2 and 3 will address this issue.

2. The connect user can execute PL/SQL API units only

Database views and tables are guarded behind a hard shell PL/SQL API. Only the following database objects may be part of the API:

  • Packages
  • Types
  • Functions
  • Procedures

So we just have to check whether the connect user has access to objects matching the predicate object_type NOT IN ('PACKAGE', 'TYPE', 'FUNCTION', 'PROCEDURE'), right? Yes, but the result would not be helpful. Why? Because every user with just the CREATE SESSION privilege has access to several thousand tables and views via the PUBLIC role. For example DUAL, ALL_VIEWS or NLS_SESSION_PARAMETERS. Strictly speaking, it is not possible to create an Oracle user that can execute PL/SQL units only. Some might argue that this alone makes SmartDB applications a fantasy. However, I’m not in that camp. I think we just have to focus on our own objects and exclude all Oracle-maintained users, along with some common utility users, from the analysis.
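To get a feeling for the size of the problem, you can count the objects accessible via the PUBLIC role by object type. This is just an illustrative sketch:

```sql
-- Count objects granted to PUBLIC, grouped by object type (>=9.2).
SELECT o.object_type, COUNT(*) AS grant_count
  FROM dba_tab_privs p
  JOIN dba_objects o
    ON o.owner = p.owner
   AND o.object_name = p.table_name
 WHERE p.grantee = 'PUBLIC'
 GROUP BY o.object_type
 ORDER BY grant_count DESC;
```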

Furthermore, the connect user should have only the CONNECT role (no more and no less). This way we know that no access to internal objects is granted via ANY privileges.

For this check we can reuse the query for rule 1 from my previous blog post How to Prove That Your SmartDB App Is Secure.

Query to check SmartDB property 2
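The original query is not shown here. A simplified sketch following the same idea (own no objects, hold only the CONNECT role, have object grants on PL/SQL units only) could look like this; the ORACLE_MAINTAINED filter is my assumption, and the original post’s exclusion list is more complete:

```sql
-- Sketch (>=12.1): connect users that own no objects, hold only the CONNECT
-- role, and have object privileges on PL/SQL API units only.
WITH app_user AS (
   SELECT username
     FROM dba_users
    WHERE oracle_maintained = 'N'
      AND username NOT IN (SELECT owner FROM dba_objects)
)
SELECT u.username
  FROM app_user u
 WHERE NOT EXISTS (
          -- no grant on anything other than packages, types, functions, procedures
          SELECT 1
            FROM dba_tab_privs p
            JOIN dba_objects o
              ON o.owner = p.owner
             AND o.object_name = p.table_name
           WHERE p.grantee = u.username
             AND o.object_type NOT IN ('PACKAGE', 'TYPE', 'FUNCTION', 'PROCEDURE')
       )
   AND NOT EXISTS (
          -- no role beyond CONNECT
          SELECT 1
            FROM dba_role_privs r
           WHERE r.grantee = u.username
             AND r.granted_role <> 'CONNECT'
       )
   AND EXISTS (
          -- must actually be able to connect
          SELECT 1
            FROM dba_role_privs r
           WHERE r.grantee = u.username
             AND r.granted_role = 'CONNECT'
       );
```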


If you are using a connect user that is not listed in the result, then your application is not SmartDB.

In this case APEX_REST_PUBLIC_USER is a false positive. The named subquery app_user excludes the APEX_180100 user, which grants various views and sequences to PUBLIC. Hence APEX 18.1 is not a SmartDB application.

3. PL/SQL API units handle transactions

A SmartDB application holds the complete business logic in the database. A PL/SQL API call handles a transaction completely. The API must not contain units for partial transaction work. Such units may exist, but must not be part of the PL/SQL API exposed to the connect user.

For write operations a COMMIT is called on success and a ROLLBACK is called on failure at the end of the operation.

For read operations the PL/SQL API is responsible for the read consistency.

Distributed transactions are supported via database links only. Other data sources cannot participate in the same database transaction. If this is a mandatory requirement, then SmartDB is the wrong approach. However, Oracle AQ can be a good alternative to propagate data consistently in upstream or downstream transactions.

To check if an application has this SmartDB property, we have to do something like this:

  • Find all PL/SQL API units (as we did for SmartDB property 2).
  • Produce a call tree for the PL/SQL API units. On object level this can be achieved by querying DBA_DEPENDENCIES. For a more accurate result on sub-object level, PL/Scope can be used by querying DBA_IDENTIFIERS.
  • Find INSERT, UPDATE, DELETE, MERGE, COMMIT and ROLLBACK statements in PL/SQL units. Static statements can be found via PL/Scope in the DBA_STATEMENTS view. But executions of dynamic statements are a challenge, since the DML may be stored outside of the PL/SQL unit (e.g. in tables). It’s virtually impossible to get a complete result using static code analysis.
  • Bring these results together and check whether DML statements are followed by a transaction control statement. This is another challenge. Without a parser (and some semantic analysis) it is not possible to find out whether a statement is really executed (e.g. PL/Scope does not provide information about control structures).

For this blog post we use a naïve static code analysis approach. We do the analysis on object level and consider static SQL statements only. Furthermore, we assume that the DML statements (INSERT, UPDATE, DELETE, MERGE) and transaction control statements (COMMIT, ROLLBACK) found in the call hierarchy are all executed, with the transaction control statement coming at the very end.

As long as the transaction control statements are not executed as dynamic SQL, the result should be good enough. In other words: if the query produces no result for an application, then it is for sure not a SmartDB application; if a result is produced, this still does not guarantee that the application really follows the rules and issues a COMMIT or a ROLLBACK at the end of a write transaction.

Compile all application users with PL/Scope
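The original script is not reproduced here. A sketch of such a recompilation could look like the following anonymous block; the schema names are hypothetical and must be replaced with your application owners:

```sql
-- Sketch (>=12.2): recompile PL/SQL units with PL/Scope metadata collection
-- enabled, so that DBA_IDENTIFIERS and DBA_STATEMENTS are populated.
BEGIN
   EXECUTE IMMEDIATE
      q'[ALTER SESSION SET plscope_settings = 'IDENTIFIERS:ALL, STATEMENTS:ALL']';
   FOR r IN (
      SELECT owner, object_name, object_type
        FROM dba_objects
       WHERE owner IN ('THE_GOOD_APP', 'THE_BAD_APP') -- hypothetical schema names
         AND object_type IN ('PROCEDURE', 'FUNCTION', 'PACKAGE BODY', 'TYPE BODY')
   ) LOOP
      -- package and type bodies need a dedicated COMPILE BODY syntax
      IF r.object_type = 'PACKAGE BODY' THEN
         EXECUTE IMMEDIATE 'ALTER PACKAGE "' || r.owner || '"."'
            || r.object_name || '" COMPILE BODY';
      ELSIF r.object_type = 'TYPE BODY' THEN
         EXECUTE IMMEDIATE 'ALTER TYPE "' || r.owner || '"."'
            || r.object_name || '" COMPILE BODY';
      ELSE
         EXECUTE IMMEDIATE 'ALTER ' || r.object_type || ' "' || r.owner || '"."'
            || r.object_name || '" COMPILE';
      END IF;
   END LOOP;
END;
/
```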


Query to check SmartDB property 3
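Again, the original query is not shown here. A naïve sketch following the approach above (object-level call tree via DBA_DEPENDENCIES, statement types via PL/Scope in DBA_STATEMENTS) might look like this; the grantee name is hypothetical:

```sql
-- Sketch (>=12.2): walk the object-level call tree of the PL/SQL API granted
-- to a connect user and list the static DML and transaction control
-- statements found via PL/Scope. The CYCLE clause guards against cycles.
WITH api_calls (owner, name) AS (
   SELECT p.owner, p.table_name
     FROM dba_tab_privs p
    WHERE p.grantee = 'THE_GOOD_USER'     -- hypothetical connect user
   UNION ALL
   SELECT d.referenced_owner, d.referenced_name
     FROM api_calls c
     JOIN dba_dependencies d
       ON d.owner = c.owner
      AND d.name = c.name
    WHERE d.referenced_type IN
          ('PACKAGE', 'PACKAGE BODY', 'PROCEDURE', 'FUNCTION', 'TYPE', 'TYPE BODY')
) CYCLE owner, name SET is_cycle TO 'Y' DEFAULT 'N'
SELECT st.owner, st.object_name, st.type AS statement_type, COUNT(*) AS occurrences
  FROM (SELECT DISTINCT owner, name FROM api_calls) c
  JOIN dba_statements st
    ON st.owner = c.owner
   AND st.object_name = c.name
 WHERE st.type IN ('INSERT', 'UPDATE', 'DELETE', 'MERGE', 'COMMIT', 'ROLLBACK')
 GROUP BY st.owner, st.object_name, st.type
 ORDER BY st.owner, st.object_name, st.type;
```

This sketch stops at listing the statement types per unit; deciding whether every call tree containing DML also ends with a COMMIT or ROLLBACK still requires bringing the rows together as described above.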


If you are using a connect user that is not listed in the result, then your application is not SmartDB.

The query also checks SmartDB properties 1 and 2. However, it produces a result only if the PL/SQL bodies of the application users are compiled with PL/Scope (see script above). Change the main part of the query to SELECT * FROM issues if you want to know why a connect user is not shown in the result.

BTW: the check results for SmartDB properties 2 and 3 are identical because THE_BAD_USER and THE_GOOD_USER do not have access to write operations.

4. SQL statements are written by human hand

If you generate SELECT, INSERT, UPDATE, DELETE or MERGE statements, then your application is not SmartDB.

I see the following reasons to generate code (including SQL statements):

  • Don’t repeat yourself (DRY principle). Striving for DRYness leads to better data models, better designs and better code. In some cases code generators are required to achieve the goal.
  • Reduce the overall complexity by using a domain-specific language (DSL). Enforce rules and conventions in the DSL and the code templates. This leads to a smaller code base and improves productivity.

A generator creates code either at design/build time or at runtime. Both approaches have pros and cons. Code generators producing code at runtime are easier to deploy, but may produce more runtime errors, are harder to debug and come with a performance penalty. Code generators producing code at design/build time are more elaborate and more costly to deploy, but are much easier to debug, have better runtime performance and produce errors at install time rather than at runtime.

Even though generated code should look as if it were written by hand, it should never become part of your code base. Generated code is derived from something else. This “something else” (generator, code templates, generator input) is part of your code base. Do not amend generated code, and keep your code base as small as possible. It’s okay to keep the generated code in the version control system as well, but you should separate it from the “real code base”. It should be absolutely clear that generated code is completely replaced by a subsequent generator run.

Code generators offer a high value when used correctly. Therefore, the general ban on the use of generators is simply ignorant.

5. SQL statements exploit the full power of set-based SQL

If you use row-by-row processing when set-based SQL is feasible and noticeably faster, then your application is not SmartDB.

This is the most important SmartDB property. Set-based SQL is the key to good performance. It means that you are using the database as a processing engine and not as a data store only.

You should make it a habit to minimize the total number of executed SQL statements to get a job done. Fewer loops, more set-based SQL. In the end it is simpler. You tell the database what you want and the optimizer figures out how to do it efficiently.
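As an illustration using the classic EMP demo table, the following two snippets do the same job; the second issues a single SQL statement instead of one per row:

```sql
-- Row-by-row processing: one UPDATE execution per employee (slow at scale).
BEGIN
   FOR r IN (SELECT empno FROM emp WHERE deptno = 10) LOOP
      UPDATE emp
         SET sal = sal * 1.1
       WHERE empno = r.empno;
   END LOOP;
END;
/

-- Set-based processing: one UPDATE execution for the whole set.
UPDATE emp
   SET sal = sal * 1.1
 WHERE deptno = 10;
```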

Conclusion

If you use set-based SQL in your application and craft your SQL statements by hand, then you can check with a single SQL statement whether your application is really SmartDB. Don’t be disappointed if it is not. Aside from some demo applications, I haven’t seen a SmartDB application and I do not expect to see one soon.

The SmartDB idea is based on sound analysis and some good advice (see Toon Koppelaars’ excellent video and slide deck). But the resulting SmartDB definition overshoots the mark. It focuses too much on PL/SQL and ignores the capabilities of database-aware tools. These tools support SQL (or MDX) as the primary interface to the database (especially for queries). Using another path to the database is usually possible, but less efficient from a development cost and time-to-market perspective.

It looks like there is currently no way to refine the SmartDB definition to make the approach broadly usable. The recommended alternative is to come up with your own definition. My next post deals with that topic. However, I’d still like to see a SmartDB 2.0 definition that tolerates views as part of the API, generated code, and transaction control statements issued by the API caller.

