Joining Temporal Intervals Part 2

The solution I’ve provided in Joining Temporal Intervals produces wrong results if one or more temporal tables have gaps in their history or if disconnected intervals have the same content. In this post I’ll address both problems.

Test Data

The example queries are based on the same model as described in Joining Temporal Intervals. For the join of the tables EMPV, DEPTV, JOBV and EMPV (manager) I’ve amended the history to contain some gaps which are highlighted in the following listing.

Test Data Excerpt
SQL> SELECT * FROM empv WHERE empno = 7788 ORDER BY valid_from;

EMPVID EMPNO ENAME JOBNO  MGR HIREDATE       SAL COMM DEPTNO VALID_FROM  VALID_TO
------ ----- ----- ----- ---- ----------- ------ ---- ------ ----------- -----------
     8  7788 SCOTT     5 7566 19-APR-1987 3000.0          20 19-APR-1987 31-DEC-1989
    22  7788 Scott     5 7566 19-APR-1987 3000.0          20 01-JAN-1990 31-MAR-1991
    36  7788 Scott     5 7566 19-APR-1987 3300.0          20 01-APR-1991 31-JUL-1991
    43  7788 Scott     5 7566 01-JAN-1992 3500.0          20 01-JAN-1992 31-DEC-9999

SQL> SELECT * FROM jobv WHERE jobno = 5 ORDER BY valid_from;

    JOBVID JOBNO JOB       VALID_FROM  VALID_TO
---------- ----- --------- ----------- -----------
         5     5 ANALYST   01-JAN-1980 20-JAN-1990
        10     5 Analyst   22-JAN-1990 31-DEC-9999

SQL> SELECT * FROM deptv WHERE deptno = 20 ORDER BY valid_from;

   DEPTVID DEPTNO DNAME          LOC           VALID_FROM  VALID_TO
---------- ------ -------------- ------------- ----------- -----------
         2     20 RESEARCH       DALLAS        01-JAN-1980 28-FEB-1990
         6     20 Research       DALLAS        01-MAR-1990 31-MAR-1990
        10     20 Research       Dallas        01-APR-1990 31-DEC-9999

SQL> SELECT * FROM empv WHERE empno = 7566 ORDER BY valid_from;

EMPVID EMPNO ENAME JOBNO  MGR HIREDATE       SAL COMM DEPTNO VALID_FROM  VALID_TO
------ ----- ----- ----- ---- ----------- ------ ---- ------ ----------- -----------
     4  7566 JONES     4 7839 02-APR-1981 2975.0          20 02-APR-1981 31-DEC-1989
    18  7566 Jones     4 7839 02-APR-1981 2975.0          20 01-JAN-1990 31-MAR-1991
    32  7566 Jones     4 7839 02-APR-1981 3272.5          20 01-APR-1991 31-DEC-9999

SQL> SELECT * FROM empv WHERE mgr = 7788 ORDER BY valid_from;

EMPVID EMPNO ENAME JOBNO  MGR HIREDATE       SAL COMM DEPTNO VALID_FROM  VALID_TO
------ ----- ----- ----- ---- ----------- ------ ---- ------ ----------- -----------
    11  7876 ADAMS     1 7788 23-MAY-1987 1100.0          20 23-MAY-1987 31-DEC-1989
    25  7876 Adams     1 7788 23-MAY-1987 1100.0          20 01-JAN-1990 31-MAR-1991
    39  7876 Adams     1 7788 23-MAY-1987 1210.0          20 01-APR-1991 31-DEC-9999

From a business point of view Scott left the company on 31-JUL-1991 and came back on 01-JAN-1992 with a better salary. It’s important to notice, that Scott is Adams manager and Adams is therefore leaderless from 01-AUG-1991 until 31-DEC-1991. Additionally I fabricated a gap for JOBNO 5 on 21-JAN-1990.

You find the SQL script to create and populate the model here.

Gap-Aware Temporal Join

The following figure visualizes the expected result of the temporal join. The raw data intervals queried perviously are represented in blue and the join result in red. The yellow bars highlight the gaps in the source and result data set.

Here is the query and the join result for EMPNO = 7788. Please note that the column LOC from table DEPV is not queried, which will reduce the number of final result intervals from 7 to 6.

Gap-Aware Temporal Join
SELECT e.empno,
       g.valid_from,
       LEAST(
          e.valid_to, 
          d.valid_to, 
          j.valid_to, 
          NVL(m.valid_to, e.valid_to),
          LEAD(g.valid_from - 1, 1, e.valid_to) OVER(
             PARTITION BY e.empno ORDER BY g.valid_from
          )
       ) AS valid_to,
       e.ename,
       j.job,
       e.mgr,
       m.ename AS mgr_ename,
       e.hiredate,
       e.sal,
       e.comm,
       e.deptno,
       d.dname
  FROM empv e
 INNER JOIN (SELECT valid_from FROM empv
             UNION
             SELECT valid_from FROM deptv
             UNION
             SELECT valid_from FROM jobv
             UNION
             SELECT valid_to + 1 FROM empv 
              WHERE valid_to != DATE '9999-12-31'
             UNION
             SELECT valid_to + 1 FROM deptv 
              WHERE valid_to != DATE '9999-12-31'
             UNION
             SELECT valid_to + 1 FROM jobv 
              WHERE valid_to != DATE '9999-12-31') g
    ON g.valid_from BETWEEN e.valid_from AND e.valid_to
 INNER JOIN deptv d
    ON d.deptno = e.deptno
       AND g.valid_from BETWEEN d.valid_from AND d.valid_to
 INNER JOIN jobv j
    ON j.jobno = e.jobno
       AND g.valid_from BETWEEN j.valid_from AND j.valid_to
  LEFT JOIN empv m
    ON m.empno = e.mgr
       AND g.valid_from BETWEEN m.valid_from AND m.valid_to
WHERE e.empno = 7788
ORDER BY 1, 2;
Query Result for EMPNO = 7788
EMPNO VALID_FROM  VALID_TO    ENAME JOB     MGR MGR_E HIREDATE       SAL COMM DEPTNO DNAME
----- ----------- ----------- ----- ------- ---- ----- ----------- ------ ---- ------ --------
 7788 19-APR-1987 22-MAY-1987 SCOTT ANALYST 7566 JONES 19-APR-1987 3000.0          20 RESEARCH
 7788 23-MAY-1987 31-DEC-1989 SCOTT ANALYST 7566 JONES 19-APR-1987 3000.0          20 RESEARCH
 7788 01-JAN-1990 20-JAN-1990 Scott ANALYST 7566 Jones 19-APR-1987 3000.0          20 RESEARCH
 7788 22-JAN-1990 28-FEB-1990 Scott Analyst 7566 Jones 19-APR-1987 3000.0          20 RESEARCH
 7788 01-MAR-1990 31-MAR-1990 Scott Analyst 7566 Jones 19-APR-1987 3000.0          20 Research
 7788 01-APR-1990 31-MAR-1991 Scott Analyst 7566 Jones 19-APR-1987 3000.0          20 Research
 7788 01-APR-1991 31-JUL-1991 Scott Analyst 7566 Jones 19-APR-1987 3300.0          20 Research
 7788 01-JAN-1992 31-DEC-9999 Scott Analyst 7566 Jones 01-JAN-1992 3500.0          20 Research

The highlighted inline view g produces a list of all distinct VALID_FROM values which will be used as additional join criterion for all temporal tables. Unlike in Joining Temporal Intervals all interval endpoints need also be considered to identify gaps.

The calculation of the VALID_TO column is a bit laborious (see highlighted lines 3 to 11). You need to get the lowest value for VALID_TO of all involved intervals, including outer joined intervals like m. Also, the subsequent VALID_FROM has to be considered since the inline view g is providing all VALID_FROM values to be probed and they may be completely independent of the involved intervals.

The remaining part of the query is quite simple. I’ve highlighted the rows in the result set which should be merged in a subsequent step (see lines 4 and 8).

If you change line 46 to “WHERE e.mgr = 7788” you get the following result:

Query Result for MGR = 7788
EMPNO VALID_FROM  VALID_TO    ENAME JOB      MGR MGR_E HIREDATE       SAL COMM DEPTNO DNAME
----- ----------- ----------- ----- ------- ---- ----- ----------- ------ ---- ------ --------
 7876 23-MAY-1987 31-DEC-1989 ADAMS CLERK   7788 SCOTT 23-MAY-1987 1100.0          20 RESEARCH
 7876 01-JAN-1990 20-JAN-1990 Adams CLERK   7788 Scott 23-MAY-1987 1100.0          20 RESEARCH
 7876 21-JAN-1990 21-JAN-1990 Adams Clerk   7788 Scott 23-MAY-1987 1100.0          20 RESEARCH
 7876 22-JAN-1990 28-FEB-1990 Adams Clerk   7788 Scott 23-MAY-1987 1100.0          20 RESEARCH
 7876 01-MAR-1990 31-MAR-1990 Adams Clerk   7788 Scott 23-MAY-1987 1100.0          20 Research
 7876 01-APR-1990 31-MAR-1991 Adams Clerk   7788 Scott 23-MAY-1987 1100.0          20 Research
 7876 01-APR-1991 31-JUL-1991 Adams Clerk   7788 Scott 23-MAY-1987 1210.0          20 Research
 7876 01-AUG-1991 31-DEC-1991 Adams Clerk   7788       23-MAY-1987 1210.0          20 Research
 7876 01-JAN-1992 31-DEC-9999 Adams Clerk   7788 Scott 23-MAY-1987 1210.0          20 Research

This result is interesting for several reasons. First, there are two highlighted records which should be merged in a subsequent step. Second, line 10 represents the time when Scott was not employed by this company. Third, the records in lines 9 and 11 must not be merged in a subsequent step. They are identical (beside VALID_FROM and VALID_TO) but the intervals are not connected.

Merging Temporal Intervals

If you look at the result of the previous query you might be tempted to avoid the merging step since there are just a few intervals which need merging. However, in real-life scenarios, you might easily end up with daily intervals for large tables since the inline-view g considers all valid_from and valid_to columns of all involved tables. Sooner or later you will think about merging temporal intervals or about other solutions to reduce the result set. – If you’re sceptical, then querying for “e.empno = 7369” might give you an idea of what I’m talking about (21 intervals before merge, 6 intervals after merge).

Since I covered this topic in Merging Temporal Intervals with Gaps I’ll provide the query to produce the final result and explain the specialities only.

Joining and Merging Temporal Intervals
WITH 
   joined AS (
      -- gap-aware temporal join
      -- produces result_cols to calculate new_group in the subsequent query
      SELECT e.empno,
             g.valid_from,
             LEAST(
                e.valid_to, 
                d.valid_to, 
                j.valid_to, 
                NVL(m.valid_to, e.valid_to),
                LEAD(g.valid_from - 1, 1, e.valid_to) OVER(
                   PARTITION BY e.empno ORDER BY g.valid_from
                )
             ) AS valid_to,
             (
                e.ename 
                || ',' || j.job 
                || ',' || e.mgr 
                || ',' || m.ename 
                || ',' || TO_CHAR(e.hiredate,'YYYY-MM-DD') 
                || ',' || e.sal 
                || ',' || e.comm 
                || ',' || e.deptno 
                || ',' || d.dname 
             ) AS result_cols,
             e.ename,
             j.job,
             e.mgr,
             m.ename AS mgr_ename,
             e.hiredate,
             e.sal,
             e.comm,
             e.deptno,
             d.dname             
        FROM empv e
       INNER JOIN (SELECT valid_from FROM empv
                   UNION
                   SELECT valid_from FROM deptv
                   UNION
                   SELECT valid_from FROM jobv
                   UNION
                   SELECT valid_to + 1 FROM empv 
                    WHERE valid_to != DATE '9999-12-31'
                   UNION
                   SELECT valid_to + 1 FROM deptv 
                    WHERE valid_to != DATE '9999-12-31'
                   UNION
                   SELECT valid_to + 1 FROM jobv 
                    WHERE valid_to != DATE '9999-12-31') g
          ON g.valid_from BETWEEN e.valid_from AND e.valid_to
       INNER JOIN deptv d
          ON d.deptno = e.deptno
             AND g.valid_from BETWEEN d.valid_from AND d.valid_to
       INNER JOIN jobv j
          ON j.jobno = e.jobno
             AND g.valid_from BETWEEN j.valid_from AND j.valid_to
        LEFT JOIN empv m
          ON m.empno = e.mgr
             AND g.valid_from BETWEEN m.valid_from AND m.valid_to
   ),
   calc_various AS (
      -- produces columns has_gap, new_group
      SELECT empno,
             valid_from,
             valid_to,
             result_cols,
             ename,
             job,
             mgr,
             mgr_ename,
             hiredate,
             sal,
             comm,
             deptno,
             dname,
             CASE
                WHEN LAG(valid_to, 1, valid_from - 1) OVER(
                        PARTITION BY empno ORDER BY valid_from
                     ) = valid_from - 1 THEN
                   0
                ELSE
                   1
             END AS has_gap,
             CASE 
                WHEN LAG(result_cols, 1, result_cols) OVER (
                        PARTITION BY empno ORDER BY valid_from
                     ) = result_cols THEN
                   0
                ELSE
                   1
             END AS new_group
        FROM joined
   ),
   calc_group AS (
      -- produces column group_no
      SELECT empno,
             valid_from,
             valid_to,
             ename,
             job,
             mgr,
             mgr_ename,
             hiredate,
             sal,
             comm,
             deptno,
             dname,
              SUM(has_gap + new_group) OVER(
                PARTITION BY empno ORDER BY valid_from
             ) AS group_no
        FROM calc_various
   ),
   merged AS (
      -- produces the final merged result
      -- grouping by group_no ensures that gaps are honored
      SELECT empno,
             MIN(valid_from) AS valid_from,
             MAX(valid_to) AS valid_to,
             ename,
             job,
             mgr,
             mgr_ename,
             hiredate,
             sal,
             comm,
             deptno,
             dname
        FROM calc_group
       GROUP BY empno,
                group_no,
                ename,
                job,
                mgr,
                mgr_ename,
                hiredate,
                sal,
                comm,
                deptno,
                dname
       ORDER BY empno,
                valid_from
   )   
-- main
select * FROM merged WHERE empno = 7788;
Query Result for EMPNO = 7788
EMPNO VALID_FROM  VALID_TO    ENAME JOB      MGR MGR_E HIREDATE       SAL COMM DEPTNO DNAME
----- ----------- ----------- ----- ------- ---- ----- ----------- ------ ---- ------ --------
 7788 19-APR-1987 31-DEC-1989 SCOTT ANALYST 7566 JONES 19-APR-1987 3000.0          20 RESEARCH
 7788 01-JAN-1990 20-JAN-1990 Scott ANALYST 7566 Jones 19-APR-1987 3000.0          20 RESEARCH
 7788 22-JAN-1990 28-FEB-1990 Scott Analyst 7566 Jones 19-APR-1987 3000.0          20 RESEARCH
 7788 01-MAR-1990 31-MAR-1991 Scott Analyst 7566 Jones 19-APR-1987 3000.0          20 Research
 7788 01-APR-1991 31-JUL-1991 Scott Analyst 7566 Jones 19-APR-1987 3300.0          20 Research
 7788 01-JAN-1992 31-DEC-9999 Scott Analyst 7566 Jones 01-JAN-1992 3500.0          20 Research

The named query “joined” produces an additional column RESULT_COLS (see highlighted lines 16-26). It’s simply a concatenation of all columns used in the group by clause of the named query “merged”. RESULT_COLS is used in the named query “calc_various” (see highlighted lines 86-88) to calculate the column NEW_GROUP. NEW_GROUP is set to 1 if the value of RESULT_COLS is different for the current and previous rows. NEW_GROUP ensures that Adam’s intervals valid from 01-APR-1991 and valid from 01-JAN-1992 are not merged. See lines 7 and 9 in the following listing.

Query Result for MGR = 7788
EMPNO VALID_FROM  VALID_TO    ENAME JOB      MGR MGR_E HIREDATE       SAL COMM DEPTNO DNAME
----- ----------- ----------- ----- ------- ---- ----- ----------- ------ ---- ------ --------
 7876 23-MAY-1987 31-DEC-1989 ADAMS CLERK   7788 SCOTT 23-MAY-1987 1100.0          20 RESEARCH
 7876 01-JAN-1990 20-JAN-1990 Adams CLERK   7788 Scott 23-MAY-1987 1100.0          20 RESEARCH
 7876 21-JAN-1990 28-FEB-1990 Adams Clerk   7788 Scott 23-MAY-1987 1100.0          20 RESEARCH
 7876 01-MAR-1990 31-MAR-1991 Adams Clerk   7788 Scott 23-MAY-1987 1100.0          20 Research
 7876 01-APR-1991 31-JUL-1991 Adams Clerk   7788 Scott 23-MAY-1987 1210.0          20 Research
 7876 01-AUG-1991 31-DEC-1991 Adams Clerk   7788       23-MAY-1987 1210.0          20 Research
 7876 01-JAN-1992 31-DEC-9999 Adams Clerk   7788 Scott 23-MAY-1987 1210.0          20 Research

The ORA_HASH function over RESULT_COLS could give me a shorter representation of all result columns. But since I do not detect hash collisions and this would lead to wrong results in some rare data constellations, I decided not to use a hash function.

The named queries “calc_various”, “calc_group” and “merged” are based on the query in scenario A of Merging Temporal Intervals with Gaps. The columns HAS_GAP, NEW_GROUP and GROUP_NO are explained in this post.

Conclusion

Joining and merging temporal intervals is indeed very challenging. Even if I showed in this post that it is doable I recommend choosing a simpler solution whenever feasible. E.g. limiting the query to a certain point in time for all involved temporal tables, since this eliminates the need to merge temporal intervals and even simplifies the gap-aware temporal join.

3 Comments

  1. […] results if “disconnected” intervals have the same content. Issues are addressed in part 2 of this […]

  2. […] time ago Philipp Salvisberg has posted several thoughts about joining and merging temporal intervals. Recently I was looking for some examples of using the new MATCH_RECOGNIZE clause introduced in 12c […]

  3. […] einiger Zeit hat mein Kollege Philipp Salvisberg ein paar interessante Beiträge zum Thema Joinen und Mergen der Zeitintervalle gepostet. Neulich war ich auf der Suche nach Anwendungsbeispielen für die neue MATCH_RECOGNIZE […]

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.