Please follow the approach mentioned below:
Source --> Source Qualifier --> Sequence generator --> Transaction control --> Aggregator --> Target
1. Read sorted data
2. Generate sequence number for each row using a Sequence Generator transformation.
3. Use a Transaction Control transformation to commit after every 65,000th row, with the following condition: IIF(NEXTVAL % 65000 = 0, TC_COMMIT_AFTER, TC_CONTINUE_TRANSACTION)
4. Find the min and max values using the MIN and MAX functions in an Aggregator transformation.
Important note: set the Transformation Scope property to Transaction and do not group by any port, as one complete transaction is treated as one group here.
5. Write min and max values to target.
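The steps above can be sketched in plain Python (not Informatica; the chunk size is shrunk to 3 so the example is easy to follow):

```python
# Sketch of the mapping logic: number each sorted row (Sequence Generator),
# "commit" after every chunk of N rows (Transaction Control), and emit
# min/max per committed chunk (Aggregator with scope = Transaction).
def min_max_per_chunk(rows, chunk=65000):
    results = []
    current = []
    for nextval, value in enumerate(rows, start=1):  # Sequence Generator NEXTVAL
        current.append(value)
        if nextval % chunk == 0:                     # TC_COMMIT_AFTER condition
            results.append((min(current), max(current)))
            current = []
    if current:                                      # trailing partial chunk
        results.append((min(current), max(current)))
    return results

print(min_max_per_chunk([1, 2, 3, 4, 5, 6, 7, 8], chunk=3))
# [(1, 3), (4, 6), (7, 8)]
```

Note the final `if current:` block: the last chunk is usually smaller than 65,000 rows, and in the mapping the session-end commit covers that case.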
Lekha G M
You can use a sequence generator to provide a row number for each row.
In an Expression transformation you can subtract 1 from the row number, divide the result by 65000, add 1, and then TRUNC the value to zero decimals.
This provides you the chunk the record belongs to and you can perform your aggregations on this value.
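The arithmetic above can be checked with a quick Python sketch (integer division `//` is equivalent to the divide-then-TRUNC here):

```python
# Chunk formula: TRUNC(((ROW_NUMBER - 1) / 65000) + 1, 0)
def chunk_of(row_number, size=65000):
    return (row_number - 1) // size + 1

# Border check: rows 1 and 65000 fall in chunk 1, row 65001 starts chunk 2.
print(chunk_of(1), chunk_of(65000), chunk_of(65001), chunk_of(130000))
# 1 1 2 2
```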
Please let us know if this either solved your issue or that we missed a requirement.
A simple EXP & AGG does the job here.
You need to configure a custom port in the EXP that can hold values like 1, 2, 3, etc. for each set: probably an iterative variable port that resets once it reaches the 65,000th row. (This can be eliminated if the source already has a value that can be used to identify each 65,000-row set.)
And in the AGG this custom port can be used as the group-by port to perform min & max for each group.
Basically your mapping would look like
logic for chunk
TRUNC(((ROW_NUMBER - 1) / 65000) + 1, 0)
Be aware you need to have your borders of the chunks managed correctly.
(screenshot: detail of the creation of CHUNKS)
The above should do the trick.
Don't forget to check the Reset checkbox on the sequence generator (Properties tab)
This will ensure the sequence generator starts at 1 every time the session is started.
Thanks janLeendert,
I will try and let you know.
Does it need to be grouped on the chunk number?
A small optimisation: the Sequence Generator is not really necessary here; you can achieve the same with one single EXP (instead of two). This EXP will use two variable ports of type Integer (or Bigint, that doesn't matter much); the first one counts from 1 thru 65,000, the other one will be incremented by 1 for each new "chunk".
Then you can use two output ports to forward the "chunk number" followed by the "line number" (1-65,000) to the AGG. This will enable you to use the Sorted Input property of the AGG, which in turn will minimise cache usage and maximise processing speed at the same time. Just "Group By" the chunk number and don't care for the line number.
As mentioned above, the EXP will need to have two variable ports.
The first variable port, v_line_number, has this expression:
IIF( v_line_number = 65000, 0, v_line_number) + 1
This will make sure that for the first record v_line_number is set to 1, then to 2, 3, and so on until 65000, followed by 1, 2, 3, and so on.
The second variable port, v_chunk_number, has this expression:
IIF( v_line_number = 1, 1, 0) + v_chunk_number
This will make sure that the "chunk number" will be 1 for the first 65000 records, 2 for records 65001 thru 130000, and so on.
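The two variable ports can be simulated in plain Python (not Informatica) to see how they behave, with the chunk size shrunk to 3; each port is evaluated once per row in port order and reads its own previous value, exactly as variable ports do:

```python
# Simulate the two variable ports of the EXP for n_rows input rows.
def ports(n_rows, chunk=65000):
    v_line_number = 0   # variable ports start at 0 before the first row
    v_chunk_number = 0
    out = []
    for _ in range(n_rows):
        # IIF(v_line_number = 65000, 0, v_line_number) + 1
        v_line_number = (0 if v_line_number == chunk else v_line_number) + 1
        # IIF(v_line_number = 1, 1, 0) + v_chunk_number
        v_chunk_number = (1 if v_line_number == 1 else 0) + v_chunk_number
        out.append((v_chunk_number, v_line_number))
    return out

print(ports(7, chunk=3))
# [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1)]
```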
Thanks for the solution. I tried using a Sequence Generator and a Filter to achieve this.
In the Seq Gen I reset the sequence every 65,000 rows and filter for records 1 and 65,000.
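That variant can be sketched in Python as well (not Informatica; chunk size shrunk to 3): a cycling sequence stands in for the resetting Sequence Generator, and the condition plays the Filter's role.

```python
# Keep only the first and last row of each chunk, using a sequence that
# cycles 1..chunk (Sequence Generator with Cycle) and a filter on it.
def first_and_last(rows, chunk=65000):
    kept = []
    for i, value in enumerate(rows):
        seq = i % chunk + 1            # cycling sequence value, 1..chunk
        if seq == 1 or seq == chunk:   # Filter condition
            kept.append(value)
    return kept

print(first_and_last([10, 20, 30, 40, 50, 60, 70], chunk=3))
# [10, 30, 40, 60, 70]
```

One caveat of this variant: if the row count is not a multiple of 65,000, the last row of the trailing partial chunk never reaches sequence value 65,000 and is filtered out, so that border needs separate handling.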
So what you needed was to get the first and the last record out of each chunk of 65000 records?
Well, that's a completely different thing from what we all understood.
Well, that happens. We should have read your question more thoroughly, then we might have been able to avoid this confusion on our side.
Thanks for the clarification,
this worked perfectly!!