Joining Temporal Intervals

From time to time a customer asks me how to join multiple tables with temporal intervals (e.g. defined by two columns such as valid_from and valid_to per row). The solution is quite simple if you may limit your query to a certain point in time like now, yesterday or similar. Such a time point becomes just an additional filter criterion per temporal table (e.g. t1.some_date between t2.valid_from and t2.valid_to). But in some cases this approach is not feasible, e.g. if you have to provide all relevant time intervals. I’ll explain in this post a solution approach based on the following data model.

TemporalJoinModel

This model is based on the famous EMP and DEPT tables. I added a reference table for job just to get an additional table to join. You find the SQL script to create and populate the model here.

I’d like to query data from the tables EMPV, DEPTV, JOBV and EMPV (manager). Here is the content of the tables reduced to the data relevant for empno 7788 (SCOTT).

The following figure visualizes the expected result of a temporal join using the data queried previously.

TemporalJoinOverview

In this case six result records (intervals) are expected. As you see the result is dependent on the number of different intervals or the distinct VALID_FROM values. The driving object is valid from 19-APR-1987 until 31-DEC-9999. VALID_FROM values outside of the validity are irrelevant (e.g. 01-JAN-1980 and 02-APR-1981).

Based on these information we are able to write the query. The highlighted inline view g produces a list of all distinct VALID_FROM values which will be used as additional join criterion for all temporal tables.

The beauty of this approach is that it works with any granularity and it automatically merges identical intervals. In this example I use a granularity of a day, but this approach works also for granularity of seconds or even fraction of a seconds, e.g. if you are using a TIMESTAMP data type to define the interval boundaries.

It’s important to notice that I’ve used an including semantic for VALID_TO in this example. If you use an excluding semantic (VALID_TO = VALID_FROM of the subsequent interval) you have to amend the calculation of the VALID_TO and the join criteria (BETWEEN is not feasible with excluding semantic). Furthermore this example does not cover gaps in the historization. If you have gaps you need to amend the calculation of the VALID_TO column and ensure that you do not merge gaps. Merging intervals with a simple group by will produce wrong results if “disconnected” intervals have the same content. Issues are addressed in part 2 of this post.

Updated on 2012-12-28, emphasized possibility of wrong results and added link to part 2 of this post.

1 Comment

  1. blank Felix says:

    Nice Blog entry – transparent and systematic approach. I got it after a few moments of (hard) thinking ;-) -Thanks.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.