Thursday, July 9, 2015

NTILE vs WIDTH_BUCKET and the Winner is...PIVOT

At a recent meeting of developers, a requirement was discussed where a finite number of columns are to be populated in alphabetical order - think the opposite of telephone book.

As is often the case, there are multiple ways to do this, but at it's heart, this really is a pure pivot. I think we should rule out both NTILE and WIDTH_BUCKET.

NTILE creates equiheight histograms, while WIDTH_BUCKET creates equiwidth histograms. We could probably force the use of either, but we would still need to pivot based on the quartile values.

NTILE, in reality, allows us to specify the number of rows not the number of columns across. For example,

SELECT ename , NTILE(5) OVER ( ORDER BY ename ) AS quartile FROM emp

ENAMEQUARTILE
ADAMS1
ALLEN1
BLAKE1
CLARK2
FORD2
JAMES2
JONES3
KING3
MARTIN3
MILLER4
SCOTT4
SMITH4
TURNER5
WARD5

Modifying the query actually adds to the number of columns. For example,

SELECT ename , NTILE(3) OVER ( ORDER BY ename ) AS quartile FROM emp

ENAMEQUARTILE
ADAMS1
ALLEN1
BLAKE1
CLARK1
FORD1
JAMES2
JONES2
KING2
MARTIN2
MILLER2
SCOTT3
SMITH3
TURNER3
WARD3


At first glance, WIDTH_BUCKET would seem to make better sense, because we want to specify a number of columns, but these columns are assigned based on numerical ranges, not strings.


So the most straightforward way to do this, I feel, is a pure PIVOT...something like this.

SELECT col_1 , col_2 , col_3 
  FROM  ( SELECT ename ,
                 TRUNC ( ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1 ) / 3 ) AS row_num ,
                     MOD ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1, 3 )    AS col_num 
            FROM emp ) PIVOT
          ( MAX(ename) FOR col_num IN ( 0 as col_1 , 1 as col_2 , 2 as col_3 )
        )
ORDER BY row_num

COL_1COL_2COL_3
ADAMSALLENBLAKE
CLARKFORDJAMES
JONESKINGMARTIN
MILLERSCOTTSMITH
TURNERWARD

To display 4 columns instead of 3....

SELECT col_1 , col_2 , col_3 , col_4
  FROM  ( SELECT ename ,
                 TRUNC ( ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1 ) / 4 ) AS row_num ,
                     MOD ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1, 4 )    AS col_num 
            FROM emp ) PIVOT
          ( MAX(ename) FOR col_num IN ( 0 as col_1 , 1 as col_2 , 2 as col_3 , 3 as col_4 )
        )
ORDER BY row_num

COL_1COL_2COL_3COL_4
ADAMSALLENBLAKECLARK
FORDJAMESJONESKING
MARTINMILLERSCOTTSMITH
TURNERWARD



To completely eliminate the PIVOT, one could use the MODEL clause, but that can get pretty tedious, pretty fast.

SELECT col_1, col_2, col_3
  FROM   emp
 MODEL
  RETURN UPDATED ROWS
  DIMENSION BY
  ( TRUNC ( ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1 ) / 3 ) AS row_num ,
        MOD ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1, 3 )    AS col_num
  )
  MEASURES ( ename AS col_1, ename AS col_2, ename AS col_3 )
  RULES UPSERT
...not this kind of MODEL
  (
    col_2[ANY,0] = col_1[CV(),1] ,
    col_3[ANY,0] = col_1[CV(),2]
  )
ORDER BY row_num 


COL_1COL_2COL_3
ADAMSALLENBLAKE
CLARKFORDJAMES
JONESKINGMARTIN
MILLERSCOTTSMITH
TURNERWARD



...our journey continues

2 comments:

Kim Berg Hansen said...

I think you found a very nice and easy way, Monty :-)

The first question I would ask myself when given the task to split into columns alphabetically is, whether the alphabetic sorting should be "down first, then across" or "across first, then down". If you want "down first, then across", then NTILE would be the easy way:

select col_1, col_2, col_3
from (
select ename
, col_num
, row_number() over (partition by col_num order by ename) row_num
from (
select ename
, ntile(3) over (order by ename) col_num
from emp
)
)
pivot (
max(ename)
for col_num in (
1 as col_1
, 2 as col_2
, 3 as col_3
)
)
order by row_num
/

COL_1 COL_2 COL_3
---------- ---------- ----------
ADAMS JAMES SCOTT
ALLEN JONES SMITH
BLAKE KING TURNER
CLARK MARTIN WARD
FORD MILLER

Split the data ordered alphabetically into three buckets of equal height. Then the buckets become your columns.

WIDTH_BUCKET making buckets of equal width could potentially (with some manipulation to make a numeric value) create columns, where col_1 was A-F, col_2 was G-L, and so on. Then col_1 might have 5 rows, col_2 only 3 rows, etc. It would be more tricky, but such a solution would be what WIDTH_BUCKET could be useful for.

If you want "across first, then down", then to use NTILE you would have to make NTILE create rows instead of columns, which would require you to first find out how many buckets you need. It can be done, but it's a kind of "hack":

select col_1, col_2, col_3
from (
select ename
, row_num
, row_number() over (partition by row_num order by ename) col_num
from (
select ename
, ntile(buckets) over (partition by buckets order by ename) row_num
from (
select ename
, ceil(count(*) over () / 3) buckets
from emp
)
)
)
pivot (
max(ename)
for col_num in (
1 as col_1
, 2 as col_2
, 3 as col_3
)
)
order by row_num
/

COL_1 COL_2 COL_3
---------- ---------- ----------
ADAMS ALLEN BLAKE
CLARK FORD JAMES
JONES KING MARTIN
MILLER SCOTT SMITH
TURNER WARD

If you are not using a literal constant for number of buckets in NTILE, it needs to be an expression that the database can be sure is constant within each partition. PARTITION BY buckets meet that requirement, and as I know buckets is a constant identical in all rows, I just get one big partition of all rows. It's kind of a hack, but doable.

But I wouldn't use NTILE that way, I'd use your own ROW_NUMBER / PIVOT solution. If I'm on an old version that doesn't support PIVOT, I could do an "oldfashioned" pivoting:

select max(case col_num when 0 then ename end) col_1
, max(case col_num when 1 then ename end) col_2
, max(case col_num when 2 then ename end) col_3
from (
select ename
, trunc ( ( row_number() over ( order by ename ) - 1 ) / 3 ) as row_num
, mod ( row_number() over ( order by ename ) - 1, 3 ) as col_num
from emp
)
group by row_num
order by row_num
/

COL_1 COL_2 COL_3
---------- ---------- ----------
ADAMS ALLEN BLAKE
CLARK FORD JAMES
JONES KING MARTIN
MILLER SCOTT SMITH
TURNER WARD

But the only reason for that would be old database version - PIVOT does the same thing ;-)

(Blogger doesn't allow me to put PRE tags in the comment, so sorry about the code looking bad...)

Monty Latiolais said...

Thanks, Kim for taking the time to explain the other options. You're [pre]awesome[/pre].