## Thursday, July 9, 2015

### NTILE vs WIDTH_BUCKET and the Winner is...PIVOT

At a recent meeting of developers, a requirement was discussed where a finite number of columns are to be populated in alphabetical order - think the opposite of telephone book.

As is often the case, there are multiple ways to do this, but at it's heart, this really is a pure pivot. I think we should rule out both NTILE and WIDTH_BUCKET.

NTILE creates equiheight histograms, while WIDTH_BUCKET creates equiwidth histograms. We could probably force the use of either, but we would still need to pivot based on the quartile values.

NTILE, in reality, allows us to specify the number of rows not the number of columns across. For example,

SELECT ename , NTILE(5) OVER ( ORDER BY ename ) AS quartile FROM emp

 ENAME QUARTILE ADAMS 1 ALLEN 1 BLAKE 1 CLARK 2 FORD 2 JAMES 2 JONES 3 KING 3 MARTIN 3 MILLER 4 SCOTT 4 SMITH 4 TURNER 5 WARD 5

Modifying the query actually adds to the number of columns. For example,

SELECT ename , NTILE(3) OVER ( ORDER BY ename ) AS quartile FROM emp

 ENAME QUARTILE ADAMS 1 ALLEN 1 BLAKE 1 CLARK 1 FORD 1 JAMES 2 JONES 2 KING 2 MARTIN 2 MILLER 2 SCOTT 3 SMITH 3 TURNER 3 WARD 3

At first glance, WIDTH_BUCKET would seem to make better sense, because we want to specify a number of columns, but these columns are assigned based on numerical ranges, not strings.

So the most straightforward way to do this, I feel, is a pure PIVOT...something like this.

SELECT col_1 , col_2 , col_3
FROM  ( SELECT ename ,
TRUNC ( ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1 ) / 3 ) AS row_num ,
MOD ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1, 3 )    AS col_num
FROM emp ) PIVOT
( MAX(ename) FOR col_num IN ( 0 as col_1 , 1 as col_2 , 2 as col_3 )
)
ORDER BY row_num

 COL_1 COL_2 COL_3 ADAMS ALLEN BLAKE CLARK FORD JAMES JONES KING MARTIN MILLER SCOTT SMITH TURNER WARD

To display 4 columns instead of 3....

SELECT col_1 , col_2 , col_3 , col_4
FROM  ( SELECT ename ,
TRUNC ( ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1 ) / 4 ) AS row_num ,
MOD ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1, 4 )    AS col_num
FROM emp ) PIVOT
( MAX(ename) FOR col_num IN ( 0 as col_1 , 1 as col_2 , 2 as col_3 , 3 as col_4 )
)
ORDER BY row_num

 COL_1 COL_2 COL_3 COL_4 ADAMS ALLEN BLAKE CLARK FORD JAMES JONES KING MARTIN MILLER SCOTT SMITH TURNER WARD

To completely eliminate the PIVOT, one could use the MODEL clause, but that can get pretty tedious, pretty fast.

SELECT col_1, col_2, col_3
FROM   emp
MODEL
RETURN UPDATED ROWS
DIMENSION BY
( TRUNC ( ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1 ) / 3 ) AS row_num ,
MOD ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1, 3 )    AS col_num
)
MEASURES ( ename AS col_1, ename AS col_2, ename AS col_3 )
RULES UPSERT
 ...not this kind of MODEL
(
col_2[ANY,0] = col_1[CV(),1] ,
col_3[ANY,0] = col_1[CV(),2]
)
ORDER BY row_num

 COL_1 COL_2 COL_3 ADAMS ALLEN BLAKE CLARK FORD JAMES JONES KING MARTIN MILLER SCOTT SMITH TURNER WARD

...our journey continues

Kim Berg Hansen said...

I think you found a very nice and easy way, Monty :-)

The first question I would ask myself when given the task to split into columns alphabetically is, whether the alphabetic sorting should be "down first, then across" or "across first, then down". If you want "down first, then across", then NTILE would be the easy way:

select col_1, col_2, col_3
from (
select ename
, col_num
, row_number() over (partition by col_num order by ename) row_num
from (
select ename
, ntile(3) over (order by ename) col_num
from emp
)
)
pivot (
max(ename)
for col_num in (
1 as col_1
, 2 as col_2
, 3 as col_3
)
)
order by row_num
/

COL_1 COL_2 COL_3
---------- ---------- ----------
ALLEN JONES SMITH
BLAKE KING TURNER
CLARK MARTIN WARD
FORD MILLER

Split the data ordered alphabetically into three buckets of equal height. Then the buckets become your columns.

WIDTH_BUCKET making buckets of equal width could potentially (with some manipulation to make a numeric value) create columns, where col_1 was A-F, col_2 was G-L, and so on. Then col_1 might have 5 rows, col_2 only 3 rows, etc. It would be more tricky, but such a solution would be what WIDTH_BUCKET could be useful for.

If you want "across first, then down", then to use NTILE you would have to make NTILE create rows instead of columns, which would require you to first find out how many buckets you need. It can be done, but it's a kind of "hack":

select col_1, col_2, col_3
from (
select ename
, row_num
, row_number() over (partition by row_num order by ename) col_num
from (
select ename
, ntile(buckets) over (partition by buckets order by ename) row_num
from (
select ename
, ceil(count(*) over () / 3) buckets
from emp
)
)
)
pivot (
max(ename)
for col_num in (
1 as col_1
, 2 as col_2
, 3 as col_3
)
)
order by row_num
/

COL_1 COL_2 COL_3
---------- ---------- ----------
CLARK FORD JAMES
JONES KING MARTIN
MILLER SCOTT SMITH
TURNER WARD

If you are not using a literal constant for number of buckets in NTILE, it needs to be an expression that the database can be sure is constant within each partition. PARTITION BY buckets meet that requirement, and as I know buckets is a constant identical in all rows, I just get one big partition of all rows. It's kind of a hack, but doable.

But I wouldn't use NTILE that way, I'd use your own ROW_NUMBER / PIVOT solution. If I'm on an old version that doesn't support PIVOT, I could do an "oldfashioned" pivoting:

select max(case col_num when 0 then ename end) col_1
, max(case col_num when 1 then ename end) col_2
, max(case col_num when 2 then ename end) col_3
from (
select ename
, trunc ( ( row_number() over ( order by ename ) - 1 ) / 3 ) as row_num
, mod ( row_number() over ( order by ename ) - 1, 3 ) as col_num
from emp
)
group by row_num
order by row_num
/

COL_1 COL_2 COL_3
---------- ---------- ----------
CLARK FORD JAMES
JONES KING MARTIN
MILLER SCOTT SMITH
TURNER WARD

But the only reason for that would be old database version - PIVOT does the same thing ;-)

(Blogger doesn't allow me to put PRE tags in the comment, so sorry about the code looking bad...)

Monty Latiolais said...

Thanks, Kim for taking the time to explain the other options. You're [pre]awesome[/pre].

Martin Rose said...

Is using PIVOT or MODEL faster than NTILE/WIDTH_BUCKET though?