Rank Transformation doesn't function as SQL Ran... Skip navigation

# Rank Transformation doesn't function as SQL Rank Function

score 30
You have not voted. Active

Hi,

I had a query regarding the Rank transformation. For me, the Rank transformation should behave no different than the rank function in SQL.

Lately, working on a particular logic, I discovered that this isn’t the case.

I’ll illustrate this with the help of SCOTT schema EMP table in Oracle 10g database.

 EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO 7839 KING PRESIDENT 17-Nov-81 5000 10 7782 CLARK MANAGER 7839 9-Jun-81 2450 10 7934 MILLER CLERK 7782 23-Jan-82 1300 10 7902 FORD ANALYST 7566 3-Dec-81 3000 20 7788 SCOTT ANALYST 7566 19-Apr-87 3000 20 7566 JONES MANAGER 7839 2-Apr-81 2975 20 7569 CRAIG MANAGER 7839 2-Apr-81 2975 20 7876 ADAMS CLERK 7788 23-May-87 1100 20 7369 SMITH CLERK 7902 17-Dec-80 800 20 7698 BLAKE MANAGER 7839 1-May-81 2850 30 7499 ALLEN SALESMAN 7698 20-Feb-81 1600 300 30 7844 TURNER SALESMAN 7698 8-Sep-81 1500 0 30 7521 WARD SALESMAN 7698 22-Feb-81 1250 500 30 7654 MARTIN SALESMAN 7698 28-Sep-81 1250 1400 30

Now, in order to find the top 3 highest paid employees from each department, we can use the below SQL query.

select a.rank, a.empno, a.ename, a.deptno, a.sal from

(select empno empno, ename ename, deptno deptno, sal sal,

rank() over (partition by deptno order by sal desc) as rank

from emp) a

where a.rank < 4

order by a.deptno

The result is below:

 RANK EMPNO ENAME DEPTNO SAL 1 7839 KING 10 5000 2 7782 CLARK 10 2450 3 7934 MILLER 10 1300 1 7788 SCOTT 20 3000 1 7902 FORD 20 3000 3 7569 CRAIG 20 2975 3 7566 JONES 20 2975 1 7698 BLAKE 30 2850 2 7499 ALLEN 30 1600 3 7844 TURNER 30 1500

This is as desired, since the rank function will return both the rows for EMPNOs – 7569 and 7566, even though both have same salary and work in the same department.

When I try this same logic using the Rank transformation, I get the result as below.

 RANK EMPNO ENAME DEPTNO SAL 1 7839 KING 10 5000 2 7782 CLARK 10 2450 3 7934 MILLER 10 1300 1 7788 SCOTT 20 3000 1 7902 FORD 20 3000 3 7569 CRAIG 20 2975 1 7698 BLAKE 30 2850 2 7499 ALLEN 30 1600 3 7844 TURNER 30 1500

The row for EMPNO – 7566 isn’t passed out by the Rank transformation.

Below are the rank transformation properties:

Below are the Rank transformation ports:

Further, if I need to get the highest paid employee(s) from each department, I modify the SQL query as below:

select a.rank, a.empno, a.ename, a.deptno, a.sal from

(select empno empno, ename ename, deptno deptno, sal sal,

rank() over (partition by deptno order by sal desc) as rank

from emp) a

where a.rank = 1

order by a.deptno

I get the below results:

 RANK EMPNO ENAME DEPTNO SAL 1 7839 KING 10 5000 1 7788 SCOTT 20 3000 1 7902 FORD 20 3000 1 7698 BLAKE 30 2850

Using Informatica Rank transformation, I get the below result:

 RANK EMPNO ENAME DEPTNO SAL 1 7839 KING 10 5000 1 7788 SCOTT 20 3000 1 7698 BLAKE 30 2850

The row for EMPNO – 7902 gets missed out.

Hence, the SQL Rank function and Informatica Rank transformation function differently.

My guess is, that since the Number of Ranks = 1 as in the example above, so if DEPTNO = 20 already has a row for EMPNO = 7788 inserted into the rank cache, when a new incoming row for EMPNO = 7902 occurs, it cannot be inserted into the rank cache even though it he has the same SAL since the Number of Ranks = 1.

If the Number of Ranks = 3, the EMPNO – 7788 and 7902 get inserted into the rank cache, but now the same issue arises for EMPNO – 7566 as only EMPNO – 7569 gets inserted into the rank cache even though both have the same salaries.

I would like to know if this understanding is correct.

Is this a known limitation of the Rank transformation? If so, then we need to have it functioning as the RANK function in SQL does without skipping any rows.

Thanks & Regards,

Nelrick Rodrigues