Joining Temporal Intervals Part 2

The solution I’ve provided in Joining Temporal Intervals produces wrong results if one or more temporal tables have gaps in their history or if disconnected intervals have the same content. In this post I’ll address both problems.

Test Data

The example queries are based on the same model as described in Joining Temporal Intervals. For the join of the tables EMPV, DEPTV, JOBV and EMPV (manager) I’ve amended the history to contain some gaps which are highlighted in the following listing.

From a business point of view Scott left the company on 31-JUL-1991 and came back on 01-JAN-1992 with a better salary. It’s important to notice, that Scott is Adams manager and Adams is therefore leaderless from 01-AUG-1991 until 31-DEC-1991. Additionally I fabricated a gap for JOBNO 5 on 21-JAN-1990.

You find the SQL script to create and populate the model here.

Gap-Aware Temporal Join

The following figure visualizes the expected result of the temporal join. The raw data intervals queried perviously are represented in blue and the join result in red. The yellow bars highlight the gaps in the source and result data set.

Here is the query and the join result for EMPNO = 7788. Please note that the column LOC from table DEPV is not queried, which will reduce the number of final result intervals from 7 to 6.

The highlighted inline view g produces a list of all distinct VALID_FROM values which will be used as additional join criterion for all temporal tables. Unlinke in Joining Temporal Intervals all interval endpoints need also be considered to identify gaps.

The calculation of the VALID_TO column is a bit laborious (see highlighted lines 3 to 11). You need to get the lowest value for VALID_TO of all involved intervals, including outer joined intervals like m. Also the subsequent VALID_FROM has to be considered since the inline view g is providing all VALID_FROM values to be probed and they may be completely independent of the involved intervals.

The remaining part of the query is quite simple. I’ve highlighted the rows in the result set which should be merged in a subsequent step (see line 4 and 8).

If you change line 46 to “WHERE e.mgr = 7788” you get the following result:

This result is interesting for several reasons. First, there are two highlighted records which should be merged in a subsequent step. Second, the line 10 represents the time where Scott was not employed by this company. Third, the records in line 9 and 11 must not be merged in a subsequent step. They are identical (beside VALID_FROM and VALID_TO) but the intervals are not connected.

Merging Temporal Intervals

If you look at the result of the previous query you might be tempted to avoid the merging step since there are just a few intervals which need merging. However, in real live scenarios you might easily end up with daily intervals for large tables, since the inline-view g considers all valid_from and valid_to columns of all involved tables. Sooner or later you will think about merging temporal intervals or about other solutions to reduce the result set. – If you’re skeptical, then querying for “e.empno = 7369” might give you an idea what I’m taking about (21 intervals before merge, 6 intervals after merge).

Since I covered this topic in Merging Temporal Intervals with Gaps I’ll provide the query to produce the final result and explain the specialities only.

The named query “joined” produces an additional column RESULT_COLS (see highlighted lines 16-26). It’s simply a concatenation of all columns used in the group by clause of the named query “merged”. RESULT_COLS is used in the named query “calc_various” (see highlighted lines 86-88) to calculate the column NEW_GROUP. NEW_GROUP is set to 1 if the value of RESULT_COLS is different for the current and previous row. NEW_GROUP ensures that Adam’s intervals valid from 01-APR-1991 and valid from 01-JAN-1992 are not merged. See line 7 and 9 in the following listing.

The ORA_HASH function over RESULT_COLS could give me a shorter representation of all result columns. But since I do not detect hash collisions and this would lead to wrong results in some rare data constellations, I decided not to use a hash function.

The named queries “calc_various”, “calc_group” and “merged” are based on the query in scenario A of Merging Temporal Intervals with Gaps. The columns HAS_GAP, NEW_GROUP and GROUP_NO are explained in this post.

Conclusion

Joining and merging temporal intervals is indeed very challenging. Even if I showed in this post that it is doable I recommend to choose a simpler solution whenever feasible. E.g. limiting the query to a certain point in time for all involved temporal tables, since this eliminates the need of merging temporal intervals and even simplifies the gap-aware temporal join.

3 Comments

  1. […] results if “disconnected” intervals have the same content. Issues are addressed in part 2 of this […]

  2. […] time ago Philipp Salvisberg has posted several thoughts about joining and merging temporal intervals. Recently I was looking for some examples of using the new MATCH_RECOGNIZE clause introduced in 12c […]

  3. […] einiger Zeit hat mein Kollege Philipp Salvisberg ein paar interessante Beiträge zum Thema Joinen und Mergen der Zeitintervalle gepostet. Neulich war ich auf der Suche nach Anwendungsbeispielen für die neue MATCH_RECOGNIZE […]

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.