NTILE vs WIDTH_BUCKET and the Winner is...PIVOT

Thursday, July 9, 2015

NTILE vs WIDTH_BUCKET and the Winner is...PIVOT

At a recent meeting of developers, a requirement was discussed where a finite number of columns are to be populated in alphabetical order - think the opposite of telephone book.

As is often the case, there are multiple ways to do this, but at it's heart, this really is a pure pivot. I think we should rule out both NTILE and WIDTH_BUCKET.

NTILE creates equiheight histograms, while WIDTH_BUCKET creates equiwidth histograms. We could probably force the use of either, but we would still need to pivot based on the quartile values.

NTILE, in reality, allows us to specify the number of rows not the number of columns across. For example,

SELECT ename , NTILE(5) OVER ( ORDER BY ename ) AS quartile FROM emp

ENAME	QUARTILE
ADAMS	1
ALLEN	1
BLAKE	1
CLARK	2
FORD	2
JAMES	2
JONES	3
KING	3
MARTIN	3
MILLER	4
SCOTT	4
SMITH	4
TURNER	5
WARD	5

Modifying the query actually adds to the number of columns. For example,

SELECT ename , NTILE(3) OVER ( ORDER BY ename ) AS quartile FROM emp

ENAME	QUARTILE
ADAMS	1
ALLEN	1
BLAKE	1
CLARK	1
FORD	1
JAMES	2
JONES	2
KING	2
MARTIN	2
MILLER	2
SCOTT	3
SMITH	3
TURNER	3
WARD	3

At first glance, WIDTH_BUCKET would seem to make better sense, because we want to specify a number of columns, but these columns are assigned based on numerical ranges, not strings.

So the most straightforward way to do this, I feel, is a pure PIVOT...something like this.

SELECT col_1 , col_2 , col_3

FROM ( SELECT ename ,

TRUNC ( ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1 ) / 3 ) AS row_num ,

MOD ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1, 3 ) AS col_num

FROM emp ) PIVOT

( MAX(ename) FOR col_num IN ( 0 as col_1 , 1 as col_2 , 2 as col_3 )

)

ORDER BY row_num

COL_1	COL_2	COL_3
ADAMS	ALLEN	BLAKE
CLARK	FORD	JAMES
JONES	KING	MARTIN
MILLER	SCOTT	SMITH
TURNER	WARD

To display 4 columns instead of 3....

SELECT col_1 , col_2 , col_3 , col_4

FROM ( SELECT ename ,

TRUNC ( ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1 ) / 4 ) AS row_num ,

MOD ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1, 4 ) AS col_num

FROM emp ) PIVOT

( MAX(ename) FOR col_num IN ( 0 as col_1 , 1 as col_2 , 2 as col_3 , 3 as col_4 )

)

ORDER BY row_num

COL_1	COL_2	COL_3	COL_4
ADAMS	ALLEN	BLAKE	CLARK
FORD	JAMES	JONES	KING
MARTIN	MILLER	SCOTT	SMITH
TURNER	WARD

To completely eliminate the PIVOT, one could use the MODEL clause, but that can get pretty tedious, pretty fast.

SELECT col_1, col_2, col_3

FROM emp

MODEL

RETURN UPDATED ROWS

DIMENSION BY

( TRUNC ( ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1 ) / 3 ) AS row_num ,

MOD ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1, 3 ) AS col_num

)

MEASURES ( ename AS col_1, ename AS col_2, ename AS col_3 )

RULES UPSERT

...not this kind of MODEL

(

col_2[ANY,0] = col_1[CV(),1] ,

col_3[ANY,0] = col_1[CV(),2]

)

ORDER BY row_num

COL_1	COL_2	COL_3
ADAMS	ALLEN	BLAKE
CLARK	FORD	JAMES
JONES	KING	MARTIN
MILLER	SCOTT	SMITH
TURNER	WARD

...our journey continues

4 comments:

Kim Berg Hansen said...: I think you found a very nice and easy way, Monty :-)

The first question I would ask myself when given the task to split into columns alphabetically is, whether the alphabetic sorting should be "down first, then across" or "across first, then down". If you want "down first, then across", then NTILE would be the easy way:

select col_1, col_2, col_3
from (
select ename
, col_num
, row_number() over (partition by col_num order by ename) row_num
from (
select ename
, ntile(3) over (order by ename) col_num
from emp
)
)
pivot (
max(ename)
for col_num in (
1 as col_1
, 2 as col_2
, 3 as col_3
)
)
order by row_num
/

COL_1 COL_2 COL_3
---------- ---------- ----------
ADAMS JAMES SCOTT
ALLEN JONES SMITH
BLAKE KING TURNER
CLARK MARTIN WARD
FORD MILLER

Split the data ordered alphabetically into three buckets of equal height. Then the buckets become your columns.

WIDTH_BUCKET making buckets of equal width could potentially (with some manipulation to make a numeric value) create columns, where col_1 was A-F, col_2 was G-L, and so on. Then col_1 might have 5 rows, col_2 only 3 rows, etc. It would be more tricky, but such a solution would be what WIDTH_BUCKET could be useful for.

If you want "across first, then down", then to use NTILE you would have to make NTILE create rows instead of columns, which would require you to first find out how many buckets you need. It can be done, but it's a kind of "hack":

select col_1, col_2, col_3
from (
select ename
, row_num
, row_number() over (partition by row_num order by ename) col_num
from (
select ename
, ntile(buckets) over (partition by buckets order by ename) row_num
from (
select ename
, ceil(count(*) over () / 3) buckets
from emp
)
)
)
pivot (
max(ename)
for col_num in (
1 as col_1
, 2 as col_2
, 3 as col_3
)
)
order by row_num
/

COL_1 COL_2 COL_3
---------- ---------- ----------
ADAMS ALLEN BLAKE
CLARK FORD JAMES
JONES KING MARTIN
MILLER SCOTT SMITH
TURNER WARD

If you are not using a literal constant for number of buckets in NTILE, it needs to be an expression that the database can be sure is constant within each partition. PARTITION BY buckets meet that requirement, and as I know buckets is a constant identical in all rows, I just get one big partition of all rows. It's kind of a hack, but doable.

But I wouldn't use NTILE that way, I'd use your own ROW_NUMBER / PIVOT solution. If I'm on an old version that doesn't support PIVOT, I could do an "oldfashioned" pivoting:

select max(case col_num when 0 then ename end) col_1
, max(case col_num when 1 then ename end) col_2
, max(case col_num when 2 then ename end) col_3
from (
select ename
, trunc ( ( row_number() over ( order by ename ) - 1 ) / 3 ) as row_num
, mod ( row_number() over ( order by ename ) - 1, 3 ) as col_num
from emp
)
group by row_num
order by row_num
/

COL_1 COL_2 COL_3
---------- ---------- ----------
ADAMS ALLEN BLAKE
CLARK FORD JAMES
JONES KING MARTIN
MILLER SCOTT SMITH
TURNER WARD

But the only reason for that would be old database version - PIVOT does the same thing ;-)

(Blogger doesn't allow me to put PRE tags in the comment, so sorry about the code looking bad...); July 10, 2015 at 12:17 AM
Monty Latiolais said...: Thanks, Kim for taking the time to explain the other options. You're [pre]awesome[/pre].; July 10, 2015 at 7:42 AM
Martin Rose said...: Is using PIVOT or MODEL faster than NTILE/WIDTH_BUCKET though?; July 19, 2019 at 4:47 AM
nive said...: thanks for giving great content..

Autocad Unigraphics NX cad centre in coimbatore 2021
Autocad Unigraphics NX training in coimbatore
Autocad Unigraphics NX acadamy in coimbatore 2021
Autocad Unigraphics NX institutes in coimbatore
Best Autocad Unigraphics NX coaching in coimbatore
Autocad Unigraphics NX courses in coimbatore
Autocad Unigraphics NX classes in coimbatore
Autocad Unigraphics NX classes in coimbatore
Cad centre in coimbatore
Best Autocad Unigraphics NX training institute in coimbatore; July 3, 2021 at 12:57 AM