Joining Temporal Intervals

From time to time a customer asks me how to join multiple tables with temporal intervals (e.g. defined by two columns such as valid_from and valid_to per row). The solution is quite simple if you may limit your query to a certain point in time like now, yesterday or similar. Such a time point becomes just an additional filter criterion per temporal table (e.g. t1.some_date between t2.valid_from and t2.valid_to). But in some cases, this approach is not feasible, e.g. if you have to provide all relevant time intervals. I’ll explain in this post a solution approach based on the following data model.

TemporalJoinModel

This model is based on the famous EMP and DEPT tables. I added a reference table for job just to get an additional table to join. You find the SQL script to create and populate the model here.

I’d like to query data from the tables EMPV, DEPTV, JOBV and EMPV (manager). Here is the content of the tables reduced to the data relevant for empno 7788 (SCOTT).

Data Excerpt
SQL> SELECT * FROM empv WHERE empno = 7788 ORDER BY valid_from;

EMPVID EMPNO ENAME JOBNO  MGR HIREDATE     SAL COMM DEPTNO VALID_FROM  VALID_TO
------ ----- ----- ----- ---- ----------- ---- ---- ------ ----------- -----------
     8  7788 SCOTT     5 7566 19-APR-1987 3000          20 19-APR-1987 31-DEC-1989
    22  7788 Scott     5 7566 19-APR-1987 3000          20 01-JAN-1990 31-MAR-1991
    36  7788 Scott     5 7566 19-APR-1987 3300          20 01-APR-1991 31-DEC-9999

SQL> SELECT * FROM jobv WHERE jobno = 5 ORDER BY valid_from;

JOBVID JOBNO JOB     VALID_FROM  VALID_TO
------ ----- ------- ----------- -----------
     5     5 ANALYST 01-JAN-1980 20-JAN-1990
    10     5 Analyst 21-JAN-1990 31-DEC-9999

SQL> SELECT * FROM deptv WHERE deptno = 20 ORDER BY valid_from;

DEPTVID DEPTNO DNAME    LOC    VALID_FROM  VALID_TO
------- ------ -------- ------ ----------- -----------
      2     20 RESEARCH DALLAS 01-JAN-1980 28-FEB-1990
      6     20 Research DALLAS 01-MAR-1990 31-MAR-1990
     10     20 Research Dallas 01-APR-1990 31-DEC-9999

SQL> SELECT * FROM empv WHERE empno = 7566 ORDER BY valid_from;

EMPVID EMPNO ENAME JOBNO  MGR HIREDATE       SAL COMM DEPTNO VALID_FROM  VALID_TO
------ ----- ----- ----- ---- ----------- ------ ---- ------ ----------- -----------
     4  7566 JONES     4 7839 02-APR-1981   2975          20 02-APR-1981 31-DEC-1989
    18  7566 Jones     4 7839 02-APR-1981   2975          20 01-JAN-1990 31-MAR-1991
    32  7566 Jones     4 7839 02-APR-1981 3272.5          20 01-APR-1991 31-DEC-9999

The following figure visualizes the expected result of a temporal join using the data queried previously.

TemporalJoinOverview

In this case, six result records (intervals) are expected. As you see the result is dependent on the number of different intervals or the distinct VALID_FROM values. The driving object is valid from 19-APR-1987 until 31-DEC-9999. VALID_FROM values outside of the validity are irrelevant (e.g. 01-JAN-1980 and 02-APR-1981).

Based on this information we can write the query. The highlighted inline view g produces a list of all distinct VALID_FROM values which will be used as additional join criterion for all temporal tables.

Temporal Query (including semantic, no gaps)
SELECT e.empno,
       MIN(g.valid_from) AS valid_from,
       LEAD(MIN(g.valid_from) - 1, 1, DATE '9999-12-31') OVER(
          PARTITION BY e.empno ORDER BY MIN(g.valid_from)
       ) AS valid_to,
       e.ename,
       j.job,
       e.mgr,
       m.ename AS mgr_ename,
       e.hiredate,
       e.sal,
       e.comm,
       e.deptno,
       d.dname,
       d.loc
  FROM empv e
 INNER JOIN (SELECT valid_from FROM empv
             UNION
             SELECT valid_from FROM deptv
             UNION
             SELECT valid_from FROM jobv) g
    ON g.valid_from BETWEEN e.valid_from AND e.valid_to
 INNER JOIN deptv d
    ON d.deptno = e.deptno
       AND g.valid_from BETWEEN d.valid_from AND d.valid_to
 INNER JOIN jobv j
    ON j.jobno = e.jobno
       AND g.valid_from BETWEEN j.valid_from AND j.valid_to
  LEFT JOIN empv m
    ON m.empno = e.mgr
       AND g.valid_from BETWEEN m.valid_from AND m.valid_to
 WHERE e.empno = 7788
 GROUP BY e.empno,
          e.ename,
          j.job,
          e.mgr,
          m.ename,
          e.hiredate,
          e.sal,
          e.comm,
          e.deptno,
          d.dname,
          d.loc
 ORDER BY empno, valid_from;
Query Result
EMPNO VALID_FROM  VALID_TO    ENAME JOB      MGR MGR_ENAME HIREDATE     SAL COMM DEPTNO DNAME    LOC
----- ----------- ----------- ----- ------- ---- --------- ----------- ---- ---- ------ -------- ------
 7788 19-APR-1987 31-DEC-1989 SCOTT ANALYST 7566 JONES     19-APR-1987 3000          20 RESEARCH DALLAS
 7788 01-JAN-1990 20-JAN-1990 Scott ANALYST 7566 Jones     19-APR-1987 3000          20 RESEARCH DALLAS
 7788 21-JAN-1990 28-FEB-1990 Scott Analyst 7566 Jones     19-APR-1987 3000          20 RESEARCH DALLAS
 7788 01-MAR-1990 31-MAR-1990 Scott Analyst 7566 Jones     19-APR-1987 3000          20 Research DALLAS
 7788 01-APR-1990 31-MAR-1991 Scott Analyst 7566 Jones     19-APR-1987 3000          20 Research Dallas
 7788 01-APR-1991 31-DEC-9999 Scott Analyst 7566 Jones     19-APR-1987 3300          20 Research Dallas

The beauty of this approach is that it works with any granularity and it automatically merges identical intervals. In this example, I use a granularity of a day, but this approach works also for the granularity of seconds or even a fraction of a second, e.g. if you are using a TIMESTAMP data type to define the interval boundaries.

It’s important to notice that I’ve used an including semantic for VALID_TO in this example. If you use an excluding semantic (VALID_TO = VALID_FROM of the subsequent interval) you have to amend the calculation of the VALID_TO and the join criteria (BETWEEN is not feasible with excluding semantic). Furthermore, this example does not cover gaps in the historization. If you have gaps you need to amend the calculation of the VALID_TO column and ensure that you do not merge gaps. Merging intervals with a simple group by will produce wrong results if “disconnected” intervals have the same content. Issues are addressed in part 2 of this post.

Updated on 2012-12-28, emphasized the possibility of wrong results and added a link to part 2 of this post.

1 Comment

  1. blank Felix says:

    Nice Blog entry – transparent and systematic approach. I got it after a few moments of (hard) thinking ;-) -Thanks.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.