What's the difference between RANK()
and DENSE_RANK()
functions? How to find out nth salary in the following emptbl
table?
DEPTNO EMPNAME SAL
------------------------------
10 rrr 10000.00
11 nnn 20000.00
11 mmm 5000.00
12 kkk 30000.00
10 fff 40000.00
10 ddd 40000.00
10 bbb 50000.00
10 ccc 50000.00
If in the table data having nulls
, what will happen if I want to find out nth
salary?
RANK()
gives you the ranking within your ordered partition. Ties are assigned the same rank, with the next ranking(s) skipped. So, if you have 3 items at rank 2, the next rank listed would be ranked 5.
DENSE_RANK()
again gives you the ranking within your ordered partition, but the ranks are consecutive. No ranks are skipped if there are ranks with multiple items.
As for nulls, it depends on the ORDER BY
clause. Here is a simple test script you can play with to see what happens:
with q as (
select 10 deptno, 'rrr' empname, 10000.00 sal from dual union all
select 11, 'nnn', 20000.00 from dual union all
select 11, 'mmm', 5000.00 from dual union all
select 12, 'kkk', 30000 from dual union all
select 10, 'fff', 40000 from dual union all
select 10, 'ddd', 40000 from dual union all
select 10, 'bbb', 50000 from dual union all
select 10, 'xxx', null from dual union all
select 10, 'ccc', 50000 from dual)
select empname, deptno, sal
, rank() over (partition by deptno order by sal nulls first) r
, dense_rank() over (partition by deptno order by sal nulls first) dr1
, dense_rank() over (partition by deptno order by sal nulls last) dr2
from q;
EMP DEPTNO SAL R DR1 DR2
--- ---------- ---------- ---------- ---------- ----------
xxx 10 1 1 4
rrr 10 10000 2 2 1
fff 10 40000 3 3 2
ddd 10 40000 3 3 2
ccc 10 50000 5 4 3
bbb 10 50000 5 4 3
mmm 11 5000 1 1 1
nnn 11 20000 2 2 2
kkk 12 30000 1 1 1
9 rows selected.
Here's a link to a good explanation and some examples.
This article here nicely explains it. Essentially, you can look at it as such:
CREATE TABLE t AS
SELECT 'a' v FROM dual UNION ALL
SELECT 'a' FROM dual UNION ALL
SELECT 'a' FROM dual UNION ALL
SELECT 'b' FROM dual UNION ALL
SELECT 'c' FROM dual UNION ALL
SELECT 'c' FROM dual UNION ALL
SELECT 'd' FROM dual UNION ALL
SELECT 'e' FROM dual;
SELECT
v,
ROW_NUMBER() OVER (ORDER BY v) row_number,
RANK() OVER (ORDER BY v) rank,
DENSE_RANK() OVER (ORDER BY v) dense_rank
FROM t
ORDER BY v;
The above will yield:
+---+------------+------+------------+
| V | ROW_NUMBER | RANK | DENSE_RANK |
+---+------------+------+------------+
| a | 1 | 1 | 1 |
| a | 2 | 1 | 1 |
| a | 3 | 1 | 1 |
| b | 4 | 4 | 2 |
| c | 5 | 5 | 3 |
| c | 6 | 5 | 3 |
| d | 7 | 7 | 4 |
| e | 8 | 8 | 5 |
+---+------------+------+------------+
In words
ROW_NUMBER() attributes a unique value to each row
RANK() attributes the same row number to the same value, leaving "holes"
DENSE_RANK() attributes the same row number to the same value, leaving no "holes"
rank() : It is used to rank a record within a group of rows.
dense_rank() : The DENSE_RANK function acts like the RANK function except that it assigns consecutive ranks.
Query -
select
ENAME,SAL,RANK() over (order by SAL) RANK
from
EMP;
Output -
+--------+------+------+
| ENAME | SAL | RANK |
+--------+------+------+
| SMITH | 800 | 1 |
| JAMES | 950 | 2 |
| ADAMS | 1100 | 3 |
| MARTIN | 1250 | 4 |
| WARD | 1250 | 4 |
| TURNER | 1500 | 6 |
+--------+------+------+
Query -
select
ENAME,SAL,dense_rank() over (order by SAL) DEN_RANK
from
EMP;
Output -
+--------+------+-----------+
| ENAME | SAL | DEN_RANK |
+--------+------+-----------+
| SMITH | 800 | 1 |
| JAMES | 950 | 2 |
| ADAMS | 1100 | 3 |
| MARTIN | 1250 | 4 |
| WARD | 1250 | 4 |
| TURNER | 1500 | 5 |
+--------+------+-----------+
SELECT empno,
deptno,
sal,
RANK() OVER (PARTITION BY deptno ORDER BY sal) "rank"
FROM emp;
EMPNO DEPTNO SAL rank
---------- ---------- ---------- ----------
7934 10 1300 1
7782 10 2450 2
7839 10 5000 3
7369 20 800 1
7876 20 1100 2
7566 20 2975 3
7788 20 3000 4
7902 20 3000 4
7900 30 950 1
7654 30 1250 2
7521 30 1250 2
7844 30 1500 4
7499 30 1600 5
7698 30 2850 6
SELECT empno,
deptno,
sal,
DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal) "rank"
FROM emp;
EMPNO DEPTNO SAL rank
---------- ---------- ---------- ----------
7934 10 1300 1
7782 10 2450 2
7839 10 5000 3
7369 20 800 1
7876 20 1100 2
7566 20 2975 3
7788 20 3000 4
7902 20 3000 4
7900 30 950 1
7654 30 1250 2
7521 30 1250 2
7844 30 1500 3
7499 30 1600 4
7698 30 2850 5
select empno
,salary
,row_number() over(order by salary desc) as Serial
,Rank() over(order by salary desc) as rank
,dense_rank() over(order by salary desc) as denseRank
from emp ;
Row_number()
-> Used for generating serial number
Dense_rank()
will give continuous rank but Rank()
will skip rank in case of clash of rank.
The only difference between the RANK() and DENSE_RANK() functions is in cases where there is a “tie”; i.e., in cases where multiple values in a set have the same ranking. In such cases, RANK() will assign non-consecutive “ranks” to the values in the set (resulting in gaps between the integer ranking values when there is a tie), whereas DENSE_RANK() will assign consecutive ranks to the values in the set (so there will be no gaps between the integer ranking values in the case of a tie).
For example, consider the set {25, 25, 50, 75, 75, 100}. For such a set, RANK() will return {1, 1, 3, 4, 4, 6} (note that the values 2 and 5 are skipped), whereas DENSE_RANK() will return {1,1,2,3,3,4}.
Rank() SQL function generates rank of the data within ordered set of values but next rank after previous rank is row_number of that particular row. On the other hand, Dense_Rank() SQL function generates next number instead of generating row_number. Below is the SQL example which will clarify the concept:
Select ROW_NUMBER() over (order by Salary) as RowNum, Salary,
RANK() over (order by Salary) as Rnk,
DENSE_RANK() over (order by Salary) as DenseRnk from (
Select 1000 as Salary union all
Select 1000 as Salary union all
Select 1000 as Salary union all
Select 2000 as Salary union all
Select 3000 as Salary union all
Select 3000 as Salary union all
Select 8000 as Salary union all
Select 9000 as Salary) A
It will generate following output:
----------------------------
RowNum Salary Rnk DenseRnk
----------------------------
1 1000 1 1
2 1000 1 1
3 1000 1 1
4 2000 4 2
5 3000 5 3
6 3000 5 3
7 8000 7 4
8 9000 8 5
Rank(), Dense_rank(), row_number()
These all are window functions that means they act as window over some ordered input set at first. These windows have different functionality attached to it based on the requirement. Heres the above 3 :
row_number()
Starting by row_number()
as this forms the basis of these related window functions. row_number()
as the name suggests gives a unique number to the set of rows over which its been applied. Similar to giving a serial number to each row.
Rank()
A subversion of row_number()
can be said as rank()
. Rank() is used to give same serial number to those ordered set rows which are duplicates but it still keeps the count mantained as similar to a row_number()
for all those after duplicates rank() meaning as from below eg. For data 2 row_number() =rank() meaning both just differs in the form of duplicates.
Data row_number() rank() dense_rank()
1 1 1 1
1 2 1 1
1 3 1 1
2 4 4 2
Finally,
Dense_rank() is an extended version of rank() as the name suggests its dense because as you can see from the above example rank() = dense_rank() for all data 1 but just that for data 2 it differs in the form that it mantains the order of rank() from previous rank() not the actual data
Rank and Dense rank gives the rank in the partitioned dataset.
Rank() : It doesn't give you consecutive integer numbers.
Dense_rank() : It gives you consecutive integer numbers.
https://i.stack.imgur.com/fbv8Z.png
In above picture , the rank of 10008 zip is 2 by dense_rank() function and 24 by rank() function as it considers the row_number.
The only difference between the RANK() and DENSE_RANK() functions is in cases where there is a “tie”; i.e., in cases where multiple values in a set have the same ranking. In such cases, RANK() will assign non-consecutive “ranks” to the values in the set (resulting in gaps between the integer ranking values when there is a tie), whereas DENSE_RANK() will assign consecutive ranks to the values in the set (so there will be no gaps between the integer ranking values in the case of a tie).
For example, consider the set {30, 30, 50, 75, 75, 100}. For such a set, RANK() will return {1, 1, 3, 4, 4, 6} (note that the values 2 and 5 are skipped), whereas DENSE_RANK() will return {1,1,2,3,3,4}.
Success story sharing
values
clause.from dual
for generating this data in Redshift