Aggregation using Sorted Input

Version 6

    Purpose

    Use the ‘Sorted Input’ option of an aggregator transformation object.

    Usage

    Improve the performance of mappings that perform aggregation by using the ‘Sorted Input’ option when the source data can be sorted according to the aggregator transformation ports that are checked in the group by column.

     

    Overview

    In a mapping that uses the ‘Sorted Input’ option, the Informatica server assumes that all data entering an aggregator transformation are already sorted into groups that correspond to the aggregator’s ‘Group By’ ports.  As a result, the server does not have to read the entire data source before performing calculations. As soon as the aggregator transformation detects a new unique group, it performs all of the calculations required for the current group, and then passes the current group’s record on to the next transformation. Selecting the ‘Sorted Input’ option often provides dramatic increases in aggregator transformation performance.

     

    aggregation using sorted input.png

     

    Challenge Addressed

     

    In a typical PowerCenter mapping that  performs aggregation (without the ‘Sorted Input’ option), the  Informatica server must read the entire data source before it begins  performing calculations, in order to ensure that it has received every  record that belongs to each unique group. While the server is optimized  to perform aggregate calculations, the time required to read the entire  data source can be a bottleneck in mappings that load very large amounts  of data.

     

    Pros

    • Improves performance of mappings that perform aggregation.
    • Can  be used with relational and file-based sources, as long as data can be  sorted according to the aggregator transformation’s ‘Group By’ port(s).
    • With relational sources, additional overhead is exerted on the database when ORDER BY clauses are used.

    Cons

    • With relational sources, additional overhead is exerted on the database when ORDER BY clauses are used.

    Download

    • PowerCenter 9.6.x Mapping XML (See Attachments section below)
    • PowerCenter 9.1.0 Mapping XML (See Attachments section below)
    • PowerCenter 8.6.1 Mapping XML (See Attachments section below)

     

    Implementation Guidelines

    In order for ‘Sorted Input’ option to work, you must be able to sort the data in your source by the aggregator transformation’s ‘Group By’ columns. The key concepts illustrated in this mapping template can be found in two transformation objects, the source qualifer transformation (sq_ORDER_ITEMS) and the aggregator transformation (agg_CALC_PROFIT_and_MARGIN):

    • sq_ORDER_ITEMS contains a ‘SQL Override’ statement that pulls data from the ORDER_ITEMS table.  The SELECT statement in this SQL override contains an ORDER BY clause that orders the source data on the ITEM_ID column.  In addition, on the ‘Properties’ tab of sq_ORDER_ITEMS, the ‘Number of Sorted Ports’ option is set to ‘1’. Configuring the source qualifier transformation this way prepares the data for ‘Sorted Input’ aggregation.
    • In agg_CALC_PROFIT_and_MARGIN, the ITEM_ID port is selected as the ‘Group By’ port in the transformation. (NOTE: The number of ‘Group By’ ports must correspond to the ‘Number of Sorted Ports’ indicated in the source qualifier transformation object). In addition, on the ‘Properties’ tab of agg_CALC_PROFIT_and_MARGIN, the ‘Sorted Input’ option is selected.

    No other configuration is required in order for ‘Sorted Input’ functionality to work in this mapping. When a PowerCenter session is created for this mapping, it will automatically detect that the ‘Sorted Input’ option has been selected. Refer to the mapping and transformation/metadata descriptions in m_CDW_AGGREGATION_USING_SORTED_INPUT for more details on the functionality provided by this mapping template.

     

    Note:

    • The Mappings were originally built using PowerCenter Designer. If you are using PowerCenter Express (PCX), all the mappings cannot be imported as PCX includes only the Informatica Developer tool and not PowerCenter Designer tool.  For example, Informatica Developer doesn’t support mappings that use “Sequence Generator” transformation.
    • The objects are based on “UTF-8” codepage. If you have changed you codepage, XML may need to be edited.