Merging Temporal Intervals with Gaps

In Joining Temporal Intervals I explained how to join multiple temporal tables. The provided solution merges also temporal intervals but – as pointed out in that post – may produce wrong results if the underlying driving table is not gaplessly historized. In this post I’ll explain how to merge temporal intervals with various data constellations correctly.

You find the scripts to create and populate the tables for scenario A and B here.

Scenario A – No overlapping intervals

This scenario handles consistent data. This means no overlapping intervals, no duplicate intervals, no including intervals, no negative intervals (valid_to < valid_from). Here is the content of the example table t1:

I’d like to write a query which produces all intervals for the columns OID and C1 honoring gaps in the historization. For OID 1 I expect that record 1 and 2 are merged, but records 3 and 4 are not merged because the intervals are not “connected”. For OID 2 I expected to get a single merged interval. For OID 4 I expect to get 3 records, since the records with C1=’D’ are not connected.

So, the following query result is expected:

The next query produces exactly this result.

The usability of the WITH clause aka Subquery Factoring Clause improved significantly with 11gR2. Since then it’s not necessary to reference all named queries anymore. The named queries become real transient views and this simplifies debugging a lot. – If you replace the content of line 57 with “SELECT * FROM calc_various ORDER BY oid, valid_from;” the query produces the following result:

You see that the value 1 for HAS_GAP indicates that the record is not “connected” with the previous record. Additionally the value 1 for the column NEW_GROUP indicates that the records must not be merged even if they are connected.

To simplify the calculation of NEW_GROUP for multiple group by columns (used in the named query “merged”) build a concatenated string of all relevant columns to deal with a single column similar to the column C1 in this example.

HAS_GAP and NEW_GROUP are used in the subsequent named query calc_group which produces the following result:

The GROUP_NO is calculated per OID. It’s technically a running total of HAS_GAP + NEW_GROUP. All intervals with the same GROUP_NO are mergeable. E.g. record 2, 3 and 4 have a different GROUP_NO which ensures that every single gap is honored for OID 1.

Scenario B – With Overlapping intervals

Reality is that we have sometimes to deal with inconsistent data. E.g. duplicate data intervals, overlapping data intervals or even negative data intervals (valid_to < valid_from). I’ve created an example table t2 which is in fact a copy of t1 but includes an additional messy OID 3 with the following data:

The following figure visualizes the intervals of OID 3. The green patterns are expected to be used and the red pattern are expected to be ignored. The rationale is that the VID (version ID) is typically based on an Oracle Sequence and therefore it’s assumed that higher VIDs are newer and therefore more adequate. In real cases you may have additional information such a created_at or modified_on timestamps which helps you identifying the record to be used in conflicting situations. Since data is considered inconsistent in this scenario the cleansing strategy might be very different in real cases. However, cleansing is always the first step and in this example the highest VID has the highest priority in case of conflicts.

MergingOverviewScenarioB

In cleansing step 1 we gather all potential starting intervals. We do not need all these intervals, but since we will merge intervals in the last processing step, we do not have to care about some unnecessary intervals right now.

In cleansing step 2 we calculate the relevant VID for every previous result. Additionally we may inexpensively calculate if an interval is a gap.

In cleansing step 3 we extend the previous result by the missing columns from table t2 and calculate the NEW_GROUP column with the same logic as in scenario A.

Now we have cleansed the data and are ready for the final steps “calc_group” and “merge” which are very similar to scenario A. The relevant difference is the highlighted line 70 which filters non-gap records. Here is the complete statement and the query result:

If you change line 70 to “WHERE is_gap = 1” you’ll get all gap records, just a way to query non-existing intervals.

Conclusion

Merging temporal intervals is challenging, especially if the history has gaps and the data is inconsistent as in scenario B. However, the SQL engine is a powerful tool to clean up data and merge the temporal intervals efficiently in a single SQL statement.

1 Comment

  1. […] Merging Temporal Intervals with Gaps → […]

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.