Retrieving the Top-N rows

The task of retrieving the top or bottom N rows from a database table (by salary, sales amount, credit, etc.) is often referred to as a "top-N query." This task is fairly common in application development. The most straightforward, but inefficient, way of accomplishing such a query is by retrieving all rows from the database table(s), sorting them by specified criteria, scanning from the top, and then selecting the top N rows. This is definitely not an elegant solution.

PROBLEM
Writing a procedure that retrieves and sorts all rows from a database table requires lots of code, and it probably won't execute as quickly as you'd expect

SOLUTION
Use a single SQL statement to perform a top-N query. You can do so either by using:
1 - The ROWNUM pseudocolumn available in several versions of Oracle
2 - By utilizing new analytic functions available in Oracle 8i: RANK() and DENSE_RANK().

In this example, I am going to use table EMP in the queries examples of this article. Here is the table's structure:

Name         Null?     Type
-----------  --------  ------------
EMPNO        NOT NULL  NUMBER(4)
ENAME                  VARCHAR2(10)
JOB                    VARCHAR2(9)
MGR                    NUMBER(4)
HIREDATE               DATE
SAL                    NUMBER(7,2)
COMM                   NUMBER(7,2)
DEPTNO                 NUMBER(2)

Let's start out by updating employee James (Empno = 7900) and setting his salary to NULL:
UPDATE Emp
   SET Sal = NULL
   WHERE Empno = 7900;
COMMIT;

Now we'll look at the data in Emp table with 14 rows:
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal  FROM Emp;

EMPNO  ENAME      JOB              MGR HIREDATE         SAL
------ ---------- --------- ---------- --------- ----------
  7369 SMITH      CLERK           7902 17-DEC-80        800
  7499 ALLEN      SALESMAN        7698 20-FEB-81       1600
  7521 WARD       SALESMAN        7698 22-FEB-81       1250
  7566 JONES      MANAGER         7839 02-APR-81       2975
  7654 MARTIN     SALESMAN        7698 28-SEP-81       1250
  7698 BLAKE      MANAGER         7839 01-MAY-81       2850
  7782 CLARK      MANAGER         7839 09-JUN-81       2450
  7788 SCOTT      ANALYST         7566 19-APR-87       3000
  7839 KING       PRESIDENT            17-NOV-81       5000
  7844 TURNER     SALESMAN        7698 08-SEP-81       1500
  7876 ADAMS      CLERK           7788 23-MAY-87       1100
  7900 JAMES      CLERK           7698 03-DEC-81
  7902 FORD       ANALYST         7566 03-DEC-81       3000
  7934 MILLER     CLERK           7782 23-JAN-82       1300
 

Using the ROWNUM Pseudocolumn
One way to solve this problem is by using the Oracle pseudocolumn ROWNUM. For each row returned by a query, the ROWNUM pseudocolumn returns a number indicating the order in which Oracle selects the row from a table or set of joined rows. The first row selected has a ROWNUM of 1, the second has 2, and so on. You can use the ROWNUM pseudocolumn to limit the number of rows returned by a query to 5:
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal
   FROM Emp
   WHERE ROWNUM < 6;

EMPNO  ENAME      JOB              MGR HIREDATE         SAL
------ ---------- --------- ---------- --------- ----------
  7369 SMITH      CLERK           7902 17-DEC-80        800
  7499 ALLEN      SALESMAN        7698 20-FEB-81       1600
  7521 WARD       SALESMAN        7698 22-FEB-81       1250
  7566 JONES      MANAGER         7839 02-APR-81       2975
  7654 MARTIN     SALESMAN        7698 28-SEP-81       1250
Listing 2. ROWNUM pseudocolumn limits the number of rows returned by the query to five.

If an ORDER BY clause follows ROWNUM in the same query, the rows will be reordered by the ORDER BY clause.
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal
   FROM Emp
   WHERE ROWNUM < 6
   ORDER BY Sal;

Listing 3 shows the output of the above code.
EMPNO ENAME      JOB              MGR HIREDATE         SAL
----- ---------- --------- ---------- --------- ----------
 7369 SMITH      CLERK           7902 17-DEC-80        800
 7521 WARD       SALESMAN        7698 22-FEB-81       1250
 7654 MARTIN     SALESMAN        7698 28-SEP-81       1250
 7499 ALLEN      SALESMAN        7698 20-FEB-81       1600
 7566 JONES      MANAGER         7839 02-APR-81       2975
 

Watch Out! Because the ROWNUM is assigned upon retrieval, it is assigned prior to any sorting! This is opposite to the result you would get in SQL Server using the TOP clause. In order to select employees with the highest five salaries, you have to force sorting and then apply ROWNUM condition. Here is the syntax for a top-N query where N = 5 (this syntax with the subquery works only in Oracle 8i):
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal
   FROM
      (SELECT Empno, Ename, Job, Mgr, Hiredate, Sal
           FROM Emp
           ORDER BY NVL(Sal, 0) DESC)
   WHERE ROWNUM < 6;

Listing 4 shows the output of the above code. I used the NVL() function to sort the expression because sorting just by Emp_Salary would have put all records with NULL salary before those with the highest salaries, and that's not what we wanted to achieve.

EMPNO ENAME      JOB              MGR HIREDATE         SAL
----- ---------- --------- ---------- --------- ----------
 7839 KING       PRESIDENT            17-NOV-81       5000
 7788 SCOTT      ANALYST         7566 19-APR-87       3000
 7902 FORD       ANALYST         7566 03-DEC-81       3000
 7566 JONES      MANAGER         7839 02-APR-81       2975
 7698 BLAKE      MANAGER         7839 01-MAY-81       2850
Listing 4. After forced sorting and applying the ROWNUM condition.
 

Utilizing Oracle 8i's Ranking Functions
Another way to perform a top-N query uses the new Oracle 8i feature called "analytic functions" . The SQL language, while extremely capable in many areas, has never provided strong support for analytic tasks, such as computing rankings, cumulative and moving averages, lead/lag comparisons, and reporting.
Oracle 8i now provides a new wide set of analytic functions that address this need.
For a top-N query you can use two ranking functions: RANK and DENSE_RANK. Both allow you to rank items in a group—for example, finding the top-five employees by salary.

The difference between RANK() and DENSE_RANK() is that RANK() leaves gaps in the ranking sequence when there are ties. In our case, Scott and Ford tie for second place with a $3,000 salary; Jones' $2,975 salary brings him in third place using DENSE_RANK() but only fourth place using RANK():
SELECT Empno, Ename, Job, Mgr, Sal,
                RANK() OVER
                (ORDER BY SAL Desc NULLS LAST) AS Rank,
                DENSE_RANK() OVER
                (ORDER BY SAL Desc NULLS LAST) AS Drank
    FROM Emp
   ORDER BY SAL Desc NULLS LAST;

Listing 5 shows the output of the above code.

EMPNO ENAME      JOB              MGR        SAL       RANK      DRANK
------ ---------- --------- ---------- ---------- ---------- ----------
  7839 KING       PRESIDENT                  5000          1          1
  7788 SCOTT      ANALYST         7566       3000          2          2
  7902 FORD       ANALYST         7566       3000          2          2
  7566 JONES      MANAGER         7839       2975          4          3
  7698 BLAKE      MANAGER         7839       2850          5          4
  7782 CLARK      MANAGER         7839       2450          6          5
  7499 ALLEN      SALESMAN        7698       1600          7          6
  7844 TURNER     SALESMAN        7698       1500          8          7
  7934 MILLER     CLERK           7782       1300          9          8
  7521 WARD       SALESMAN        7698       1250         10          9
  7654 MARTIN     SALESMAN        7698       1250         10          9
  7876 ADAMS      CLERK           7788       1100         12         10
  7369 SMITH      CLERK           7902        800         13         11
  7900 JAMES      CLERK           7698                    14         12
Listing 5. Compare the difference between RANK() and DENSE_RANK() in the last columns of this output.
 

The NULLS FIRST | NULLS LAST clause determines the position of rows with NULL values in the ordered query.

If the sequence is in descending order, then NULLS LAST implies that NULL values are smaller than non-NULL ones and rows with NULLs will appear at the bottom of the list. If the NULLS FIRST | NULLS LAST clause is omitted, then NULL values are considered larger than any other values and their ordering position depends on the ASC | DESC arguments.

If the ordering sequence is ascending (ASC), then rows with NULLs will appear last; if the sequence is descending (DESC), then rows with NULLs will appear first. NULLs are considered equal to other NULLs and, therefore, the order in which rows with NULLs are presented is nondeterministic.
 

Using RANK() to Obtain a Top-N Query
To obtain a top-N query, use RANK() in a subquery and then apply a filter condition outside the subquery:
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal
   FROM
   (SELECT Empno, Ename, Job, Mgr, Hiredate, Sal,
      RANK() OVER
         (ORDER BY SAL Desc NULLS LAST) AS Emp_Rank
      FROM Emp
      ORDER BY SAL Desc NULLS LAST)
   WHERE Emp_Rank < 6;

Listing 6 shows the output of the above code.

EMPNO  ENAME      JOB              MGR HIREDATE         SAL
------ ---------- --------- ---------- --------- ----------
  7839 KING       PRESIDENT            17-NOV-81       5000
  7788 SCOTT      ANALYST         7566 19-APR-87       3000
  7902 FORD       ANALYST         7566 03-DEC-81       3000
  7566 JONES      MANAGER         7839 02-APR-81       2975
  7698 BLAKE      MANAGER         7839 01-MAY-81       2850
Listing 6. Obtaining a top-N query by using RANK() in a subquery and then applying a filter condition outside the subquery.
 

Using the same technique, you can retrieve the bottom-five employees by salary:
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal
   FROM
   (SELECT Empno, Ename, Job, Mgr, Hiredate, Sal,
      RANK() OVER
         (ORDER BY SAL ASC NULLS FIRST) AS Emp_Rank
      FROM Emp
      ORDER BY SAL ASC NULLS FIRST)
   WHERE Emp_Rank < 6;

Listing 7 shows the output of the above code.
EMPNO ENAME      JOB              MGR HIREDATE         SAL
----- ---------- --------- ---------- --------- ----------
 7900 JAMES      CLERK           7698 03-DEC-81
 7369 SMITH      CLERK           7902 17-DEC-80        800
 7876 ADAMS      CLERK           7788 23-MAY-87       1100
 7521 WARD       SALESMAN        7698 22-FEB-81       1250
 7654 MARTIN     SALESMAN        7698 28-SEP-81       1250
Listing 7. Using the same technique shown in Listing 6 to retrieve the bottom-five employees by salary.

Ranking functions can be used to operate within groups, too—that is, the rank value gets reset whenever the group changes. This is achieved with a PARTION BY subclause. Here is the syntax to retrieve the top employee by salary per manager group:
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal
   FROM
   (SELECT Empno, Ename, Job, Mgr, Hiredate, Sal,
      RANK() OVER
         (PARTITION BY MGR ORDER BY MGR, SAL DESC NULLS LAST) AS Emp_Rank
      FROM Emp
      ORDER BY MGR, SAL DESC NULLS LAST)
   WHERE Emp_Rank = 1;

Listing 8 shows the output of the above code.
EMPNO  ENAME      JOB              MGR HIREDATE         SAL
------ ---------- --------- ---------- --------- ----------
  7788 SCOTT      ANALYST         7566 19-APR-87       3000
  7902 FORD       ANALYST         7566 03-DEC-81       3000
  7499 ALLEN      SALESMAN        7698 20-FEB-81       1600
  7934 MILLER     CLERK           7782 23-JAN-82       1300
  7876 ADAMS      CLERK           7788 23-MAY-87       1100
  7566 JONES      MANAGER         7839 02-APR-81       2975
  7369 SMITH      CLERK           7902 17-DEC-80        800
  7839 KING       PRESIDENT            17-NOV-81       5000
Listing 8. Using a ranking function within a group.

As you can see, analytic functions are extremely useful in all types of analysis and computations, and they provide substantial SQL extensions to Oracle 8i.