From time to time I’m asked to aggregate strings from multiple records into a single column using SQL. Here’s an example, showing a comma separated list of ordered employee names per department based on the famous EMP and DEPT tables.
DEPTNO | DNAME | ENAME_LIST |
---|---|---|
10 | ACCOUNTING | CLARK, KING, MILLER |
20 | RESEARCH | ADAMS, FORD, JONES, SCOTT, SMITH |
30 | SALES | ALLEN, BLAKE, JAMES, MARTIN, TURNER, WARD |
40 | OPERATIONS |
Oracle introduced the aggregate function LISTAGG for that purpose in 11.2. If you may use LISTAGG go for it, but if you have to work with an older version of the Oracle Database Server you might be interested in some other options which I discuss per Oracle Database version.
I cover just some options in this post, if you are interested in more than please visit Tim Hall’s String Aggregation Techniques on oracle-base.com.
More than twenty years ago PL/SQL was introduced as part of the Oracle Database Server version 7.0 allowing to write functions to be used in SQL statements. Back than something like the following was necessary to build comma separated values:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
CREATE OR REPLACE FUNCTION deptno_to_ename_list(in_deptno IN VARCHAR2) RETURN VARCHAR2 IS CURSOR l_cur IS SELECT ename FROM emp WHERE deptno = in_deptno ORDER BY ename; l_ret VARCHAR2(2000); BEGIN FOR l_rec IN l_cur LOOP IF l_cur%ROWCOUNT > 1 THEN l_ret := l_ret || ', '; END IF; l_ret := l_ret || l_rec.ename; END LOOP; RETURN l_ret; END; / SELECT deptno, dname, deptno_to_ename_list(deptno) AS ename_list FROM dept ORDER BY deptno; |
Version 8.0 came with the object option allowing to solve the problem in more generic ways.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
CREATE OR REPLACE TYPE string_tabtype IS TABLE OF VARCHAR2(2000); CREATE OR REPLACE FUNCTION collection_to_comma_list( in_strings IN string_tabtype ) RETURN VARCHAR2 IS l_ret VARCHAR2(2000); BEGIN IF in_strings.COUNT > 0 THEN FOR i IN 1 .. in_strings.COUNT LOOP IF i > 1 THEN l_ret := l_ret || ', '; END IF; l_ret := l_ret || in_strings(i); END LOOP; END IF; RETURN l_ret; END; / SELECT d.deptno, d.dname, collection_to_comma_list( CAST( MULTISET( SELECT ename FROM emp e WHERE e.deptno = d.deptno ORDER BY ename ) AS string_tabtype ) ) AS ename_list FROM dept d ORDER BY d.deptno; |
Another option was to use a REF CURSOR instead of a collection type. The PL/SQL part was executable in Oracle7 too, but the CURSOR expression was not available back than. BTW: SYS_REFCURSOR was introduced in 9.0, so this specific PL/SQL type is really necessary with version 8.0.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
CREATE OR REPLACE PACKAGE mytypes_pkg IS TYPE refcursor_type IS REF CURSOR; END mytypes_pkg; / CREATE OR REPLACE FUNCTION cursor_to_comma_list( in_refcursor IN mytypes_pkg.refcursor_type ) RETURN VARCHAR2 IS l_string VARCHAR2(2000); l_ret VARCHAR2(2000); BEGIN LOOP FETCH in_refcursor INTO l_string; EXIT WHEN in_refcursor%NOTFOUND; IF in_refcursor%ROWCOUNT > 1 THEN l_ret := l_ret || ', '; END IF; l_ret := l_ret || l_string; END LOOP; CLOSE in_refcursor; RETURN l_ret; END; / SELECT d.deptno, d.dname, cursor_to_comma_list( CURSOR( SELECT ename FROM emp e WHERE e.deptno = d.deptno ORDER BY ename ) ) AS ename_list FROM dept d ORDER BY d.deptno; |
Version 9.0 came with basic XML support which allowed to aggregate strings without the need of a helper function.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
SELECT d.deptno, d.dname, RTRIM( SYS_XMLAGG( SYS_XMLGEN( e.ename||', ' ) ).EXTRACT( '/ROWSET/ROW/text()' ).getStringVal(), ', ' ) AS ename_list FROM dept d LEFT JOIN emp e ON e.deptno = d.deptno GROUP BY d.deptno, d.dname ORDER BY d.deptno, d.dname; |
In this solution the sort order of the aggregated strings is not definable. Version 9i introduced also user-defined aggregate functions. To use them you need to implement the ODCIAggregate interface which allows you also to sort the result.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 |
CREATE OR REPLACE TYPE string_tabtype AS TABLE OF VARCHAR2(2000); CREATE OR REPLACE TYPE mylistagg_type AS OBJECT ( strings string_tabtype, STATIC FUNCTION ODCIAggregateInitialize( sctx IN OUT mylistagg_type ) RETURN NUMBER, MEMBER FUNCTION ODCIAggregateIterate( SELF IN OUT mylistagg_type, value IN VARCHAR2 ) RETURN NUMBER, MEMBER FUNCTION ODCIAggregateTerminate( SELF IN mylistagg_type, returnValue OUT VARCHAR2, flags IN NUMBER ) RETURN NUMBER, MEMBER FUNCTION ODCIAggregateMerge( SELF IN OUT mylistagg_type, ctx2 IN mylistagg_type) RETURN NUMBER ); CREATE OR REPLACE TYPE BODY mylistagg_type IS STATIC FUNCTION ODCIAggregateInitialize( sctx IN OUT mylistagg_type ) RETURN NUMBER IS BEGIN sctx := mylistagg_type(string_tabtype()); RETURN ODCIConst.Success; END; MEMBER FUNCTION ODCIAggregateIterate( SELF IN OUT mylistagg_type, value IN VARCHAR2 ) RETURN NUMBER IS BEGIN SELF.strings.EXTEND; SELF.strings(SELF.strings.COUNT) := VALUE; RETURN ODCIConst.Success; END ODCIAggregateIterate; MEMBER FUNCTION ODCIAggregateTerminate( SELF IN mylistagg_type, returnValue OUT VARCHAR2, flags IN NUMBER ) RETURN NUMBER IS l_sorted_strings string_tabtype; l_return_value VARCHAR2(2000); BEGIN SELECT COLUMN_VALUE BULK COLLECT INTO l_sorted_strings FROM TABLE(strings) ORDER BY COLUMN_VALUE; FOR i IN 1 .. l_sorted_strings.COUNT LOOP IF i > 1 THEN l_return_value := l_return_value || ', '; END IF; l_return_value := l_return_value || l_sorted_strings(i); END LOOP; returnValue := l_return_value; RETURN ODCIConst.Success; END ODCIAggregateTerminate; MEMBER FUNCTION ODCIAggregateMerge( SELF IN OUT mylistagg_type, ctx2 IN mylistagg_type ) RETURN NUMBER IS BEGIN FOR i IN 1 .. ctx2.strings.COUNT LOOP SELF.strings.EXTEND; SELF.strings(SELF.strings.COUNT) := ctx2.strings(i); END LOOP; RETURN ODCIConst.Success; END ODCIAggregateMerge; END; / CREATE OR REPLACE FUNCTION mylistagg ( in_string IN VARCHAR2 ) RETURN VARCHAR2 PARALLEL_ENABLE AGGREGATE USING mylistagg_type; / SELECT d.deptno, d.dname, mylistagg(e.ename) AS ename_list FROM dept d LEFT JOIN emp e ON e.deptno = d.deptno GROUP BY d.deptno, d.dname ORDER BY d.deptno, d.dname; |
Version 9i Release 2 came with SQL/XML support and the function XMLAGG which replaces SYS_XMLAGG and allows to sort the elements to be aggregated (see line 7).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
SELECT d.deptno, d.dname, RTRIM( XMLAGG( XMLELEMENT( "e", e.ename || ', ' ) ORDER BY e.ename ).EXTRACT( '/e/text()' ).getStringVal(), ', ' ) AS ename_list FROM dept d LEFT JOIN emp e ON e.deptno = d.deptno GROUP BY d.deptno, d.dname ORDER BY d.deptno, d.dname; |
One may argue that using RTRIM to get rid of the last comma is not the way to interact with XML, especially since SQL/XML supports XSLT. But probably no one can deny that writing the appropriate stylesheet is a bit more complex and time-consuming. Nonetheless, here’s a XSLT example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
SELECT d.deptno, d.dname, XMLTRANSFORM( XMLAGG( XMLELEMENT( "e", e.ename ) ORDER BY e.ename ), '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="text"/> <xsl:template match="/"> <xsl:for-each select="*"> <xsl:if test="position() != 1"> <xsl:value-of select="'', ''"/> </xsl:if> <xsl:value-of select="."/> </xsl:for-each> </xsl:template> </xsl:stylesheet>' ).getStringVal() AS ename_list FROM dept d LEFT JOIN emp e ON e.deptno = d.deptno GROUP BY d.deptno, d.dname ORDER BY d.deptno, d.dname; |
As mentioned at the beginning, in version 11g Release 2 Oracle finally introduced the aggregate function LISTAGG to conveniently aggregate strings.
1 2 3 4 5 6 7 8 9 10 11 |
SELECT d.deptno, d.dname, LISTAGG ( e.ename, ', ' ) WITHIN GROUP ( ORDER BY e.ename ) AS ename_list FROM dept d LEFT JOIN emp e ON e.deptno = d.deptno GROUP BY d.deptno, d.dname ORDER BY d.deptno, d.dname; |
To compare the runtime performance of the different solution approaches I created 1 million rows in the dept table and 5 million rows in the emp table using this script and measured the second serial execution of a CREATE TABLE AS SELECT statement for each of the 8 described approaches against my 11.2.0.3 instance. The following figure summarizes the results.
A lot of things have changed in the Oracle Database area since Oracle7, even in the niche of string aggregation. I recommend to use LISTAGG (8) whenever possible and avoid the use of SYS_XMLAGG (4) or XSLT (7) for string aggregation. The Collection Type (2) approach is a good alternative if you do not mind creating helper objects otherwise use XMLAGG (6).
2 Comments
Thanks, this made my day, not only did it solve the request I had, but also I loved trying all the different methods of achieving the same result.
If you are looking to aggregate data more then 4000 characters then see below link.
ListAgg_CLOB Custom Function
http://sql-plsql-de.blogspot.com.au/2014/01/sql-listagg-mit-clob-ausgabe-kein.html