Rank Transformation doesn't function as SQL Ran... Skip navigation

Rank Transformation doesn't function as SQL Rank Function

score 30
You have not voted. Active

Hi,

 

I had a query regarding the Rank transformation. For me, the Rank transformation should behave no different than the rank function in SQL.

 

Lately, working on a particular logic, I discovered that this isn’t the case.

 

I’ll illustrate this with the help of SCOTT schema EMP table in Oracle 10g database.

 

EMPNO

ENAME

JOB

MGR

HIREDATE

SAL

COMM

DEPTNO

7839

KING

PRESIDENT


17-Nov-81

5000


10

7782

CLARK

MANAGER

7839

9-Jun-81

2450


10

7934

MILLER

CLERK

7782

23-Jan-82

1300


10

7902

FORD

ANALYST

7566

3-Dec-81

3000


20

7788

SCOTT

ANALYST

7566

19-Apr-87

3000


20

7566

JONES

MANAGER

7839

2-Apr-81

2975


20

7569

CRAIG

MANAGER

7839

2-Apr-81

2975


20

7876

ADAMS

CLERK

7788

23-May-87

1100


20

7369

SMITH

CLERK

7902

17-Dec-80

800


20

7698

BLAKE

MANAGER

7839

1-May-81

2850


30

7499

ALLEN

SALESMAN

7698

20-Feb-81

1600

300

30

7844

TURNER

SALESMAN

7698

8-Sep-81

1500

0

30

7521

WARD

SALESMAN

7698

22-Feb-81

1250

500

30

7654

MARTIN

SALESMAN

7698

28-Sep-81

1250

1400

30

 

Now, in order to find the top 3 highest paid employees from each department, we can use the below SQL query.

 

select a.rank, a.empno, a.ename, a.deptno, a.sal from

(select empno empno, ename ename, deptno deptno, sal sal,

rank() over (partition by deptno order by sal desc) as rank

from emp) a

where a.rank < 4

order by a.deptno

 

The result is below:

 

RANK

EMPNO

ENAME

DEPTNO

SAL

1

7839

KING

10

5000

2

7782

CLARK

10

2450

3

7934

MILLER

10

1300

1

7788

SCOTT

20

3000

1

7902

FORD

20

3000

3

7569

CRAIG

20

2975

3

7566

JONES

20

2975

1

7698

BLAKE

30

2850

2

7499

ALLEN

30

1600

3

7844

TURNER

30

1500

 

 

This is as desired, since the rank function will return both the rows for EMPNOs – 7569 and 7566, even though both have same salary and work in the same department.

 

When I try this same logic using the Rank transformation, I get the result as below.

 

RANK

EMPNO

ENAME

DEPTNO

SAL

1

7839

KING

10

5000

2

7782

CLARK

10

2450

3

7934

MILLER

10

1300

1

7788

SCOTT

20

3000

1

7902

FORD

20

3000

3

7569

CRAIG

20

2975

1

7698

BLAKE

30

2850

2

7499

ALLEN

30

1600

3

7844

TURNER

30

1500

 

The row for EMPNO – 7566 isn’t passed out by the Rank transformation.

Below are the rank transformation properties:

 

 

Below are the Rank transformation ports:

 

Further, if I need to get the highest paid employee(s) from each department, I modify the SQL query as below:

 

select a.rank, a.empno, a.ename, a.deptno, a.sal from

(select empno empno, ename ename, deptno deptno, sal sal,

rank() over (partition by deptno order by sal desc) as rank

from emp) a

where a.rank = 1

order by a.deptno

I get the below results:

 

RANK

EMPNO

ENAME

DEPTNO

SAL

1

7839

KING

10

5000

1

7788

SCOTT

20

3000

1

7902

FORD

20

3000

1

7698

BLAKE

30

2850

 

Using Informatica Rank transformation, I get the below result:

 

RANK

EMPNO

ENAME

DEPTNO

SAL

1

7839

KING

10

5000

1

7788

SCOTT

20

3000

1

7698

BLAKE

30

2850

 

The row for EMPNO – 7902 gets missed out.

 

Hence, the SQL Rank function and Informatica Rank transformation function differently.

 

My guess is, that since the Number of Ranks = 1 as in the example above, so if DEPTNO = 20 already has a row for EMPNO = 7788 inserted into the rank cache, when a new incoming row for EMPNO = 7902 occurs, it cannot be inserted into the rank cache even though it he has the same SAL since the Number of Ranks = 1.

 

If the Number of Ranks = 3, the EMPNO – 7788 and 7902 get inserted into the rank cache, but now the same issue arises for EMPNO – 7566 as only EMPNO – 7569 gets inserted into the rank cache even though both have the same salaries.

 

I would like to know if this understanding is correct.

 

Is this a known limitation of the Rank transformation? If so, then we need to have it functioning as the RANK function in SQL does without skipping any rows.

 

Thanks & Regards,

Nelrick Rodrigues

Comments

Vote history