Aggregation Using Expression Transformation

Version 7

    Purpose

    Aggregating Records using an Expression Transformation

    Usage

    As a potential method of enhancing performance, this mapping uses PowerCenter Expression Transformation objects to aggregate data rather than using an Aggregator Transformation object.

     

    Overview

    One method of calculating aggregations without using an Aggregator Transformation object is to use an Expression Transformation in conjunction with variable logic to perform the calculation. An Expression Transformation does not cache records in memory, and therefore it can sometimes be more efficient than using an Aggregator. This mapping template illustrates a method of using an Expression object with variables to perform aggregations.

     

    aggregation using expression transformation.png

    Challenge Addressed

    When you must aggregate records (sum, count, find the min or max values within a data set, etc.) within your mapping, PowerCenter provides for the use of an Aggregator Transformation object to do the aggregation. Using an Aggregator object, however, can sometimes use up significant server resources in order to actually perform the aggregations and calculations. This is due partly to the fact that Aggregators must first cache records in memory and then perform functions and calculations.

    One method of calculating aggregations without using an Aggregator Transformation object is to use an Expression Transformation in conjunction with variable logic to perform the calculation. An Expression Transformation does not cache records in memory, and therefore it can sometimes be more efficient than using an Aggregator. This mapping template illustrates a method of using an Expression object with variables to perform aggregations.

     

    Pros

     

    • Does not cache records in memory thereby making the Expression object sometimes more efficient than an Aggregator.

    Cons

    • Does not allow for full Aggregator functionality and flexibility.
      • Requires advanced development abilities to implement.

     

    Download

     

    • PowerCenter 9.6.x Mapping XML (See Attachments section below)
    • PowerCenter 9.1.0 Mapping XML (See Attachments section below)
    • PowerCenter 8.6.1 Mapping XML (See Attachments section below)

    Implementation Guidelines

     

    Thiis template demonstrates reading from two source tables (ORDERS, ORDER_ITEMS), and calculates the sum of the total revenue (Price * Quantity) based off of the month and year in the DATE_ENTERED column. Therefore, we are trying to sum the revenue while grouping off of the month and year.

    • Select data from the source making sure to add an "order by" to the SQL Override on the column(s) that you need to group by. THE ORDER BY IS REQUIRED! Without the order by statement, the expression will not be able to properly sum the totals. *Note that the order by can either be done in the SQL Override of the Source Qualifier, or it can be done with the Sorter Transformation object. This mapping accomplishes this by setting the ‘Number of Sorted Ports’ property to 1, moving the DATE_ENTERED port to the top of the list.
    • For the mapping to work, you must be able to filter out every record except for the last record in each grouping (with that record holding the summation of total revenue). Assuming your data is ordered (see step 1), then you can determine the last record in each grouping by evaluating conditions in transformation variables. Condition 1: If the current record starts a new Month/Year grouping but is not the first record read from the source, then the previous record was the last record in the grouping. Condition 2: The current record is the last record read from the source. We will evaluate Condition 1 at a later time, but in order to evaluate Condition 2 (knowing if your current record is the last record from the source), you must know how many total records you are going to read, and you also must keep a record count of each record coming into the mapping. When your record count equals the total number of records from the source, then you know that the current record is the last record. The object exp_Get_ORDER_ITEMS_Record_Count accomplishes this task. First, a mapping variable $$Record_Count is created. Then, the expression will call the unconnected lookup if the current record is the first record in the mapping (we will know that because the $$Record_Count is initialized to ‘0’). We will send a dummy value of ‘0’ to the lookup. The lookup SQL has been modified to the following:

      SELECT COUNT(*) as ORDER_ID

      FROM ORDER_ITEMS, ORDERS

      WHERE

      ORDERS.ORDER_ID=ORDER_ITEMS.ORDER_ID

      The lookup adds the condition of 'where ORDER_ID > DUMMIE_IN'. Since we send in '0' for the dummy value, the full SQL executed internally for the lookup is as follows:

      SELECT COUNT(*) as ORDER_ID

      FROM ORDER_ITEMS, ORDERS

      WHERE ORDERS.ORDER_ID=ORDER_ITEMS.ORDER_ID

      AND ORDER_ID > 0

      ORDER BY ORDER_ID

      The count is returned to the expression and then stored in the v_SRC_TABLE_COUNT port. Next, the v_RECORD_COUNT variable simply increments the $$Record_Count mapping variable for every record that passes through the mapping. We now have the data we need to determine if the current record is the last record in the mapping. Also, we have the data we need to determine if the current record is the first record in the mapping, which is used later. Finally, two flags are set with the o_FIRST_RECORD_FLAG and the o_LAST_RECORD_FLAG. These flags are raised (i.e. set to 1) if the current record is the first record or the last record in the mapping.
    • The object exp_Sum_Revenue is where the aggregation activities take place. This object must be able to:
      • Calculate the total revenue of the current record by multiplying price*quantity
      • Parse out the month and year from the DATE_ENTERED field
      • Remember the running total revenue up to the previous record
      • Remember the month and year of the previous record
      • Determine if the current record starts a new aggregation grouping
      • Add the running total revenue from the previous record with the revenue for the current record as long as the current record does not start a new aggregate grouping
      • If the current record is the last record read from the source (based on the o_LAST_RECORD_FLAG set in the previous transformation), output the month, year, and running revenue of the current record, otherwise output the month, year, and total revenue up to the previous record
        • In order to accomplish part a) above, the v_REVENUE port was created to multiply the price*quantity for the current record. Two variables were created to accomplish part b): v_MONTH and v_YEAR.

        • Part c) is done by creating two variable ports, ensuring that they are ordered correctly in the transformation. The v_PREVIOUS_TOTAL port must be first, and is set to evaluate to the value in the second port, v_RUNNING_TOTAL. It is important that v_RUNNING_TOTAL is at the end of the transformation, and v_PREVIOUS_TOTAL is at the beginning. When a record passes through this transformation, v_RUNNING_TOTAL is the last thing to be set, and it is set to add the revenue of the current record with itself if the record is a part of the current aggregate grouping. If the record is not a part of the current aggregate grouping, then it will simply evaluate to the revenue of the current record. v_RUNNING_TOTAL will remember how it evaluated in the previous record so when the next record comes in, it retains the previous value. Before that value is changed, the v_PREVIOUS_TOTAL variable stores whatever was in the v_RUNNING_TOTAL.

        • Similar to how we remembered the previous running total, we can accomplish d) with the same method. Two variable ports were created for the month and year ports. For example, before the v_MONTH port was evaluated to the current record, the v_PREVIOUS_MONTH port would grab the value in v_MONTH before it could be changed.

        • Now that we have the month and year from the previous record, we can easily determine if the current record starts a new aggregate grouping (part e). The v_NEW_GROUPING_FLAG is set to 1 if the month concatenated with the year of the previous record do not equal the month concatenated with the year of the current record.

        • To accomplish part f), the v_RUNNING_TOTAL will first see if the current record starts a new grouping by checking the v_NEW_GROUPING_FLAG. If it is a new group, then it simply evaluates to the v_REVENUE value which is price*quantity for the current record. If the record falls within the same grouping, it will add the current records price*quantity with the running total from the previous record.

        • Finally, to accomplish part g), the three output ports were created to output the month, year, and revenue. Each port will output the previous records’ month, year, and running total UNLESS the current record is the last record read from the source. The o_LAST_RECORD_FLAG which was evaluated in the previous transformation is used to make this determination.

    • A filter is used to only pass the last record of each group, or the last record read from the source. All of the information needed to make this determination has been calculated in earlier steps. The check should be if a record that starts a new grouping (but is not the first record read from the source), or if the record is the last record read from the source, then pass through. Remember that as long as the current record is not the last record read from the source, the current record sitting in the filter actually holds the month, year, and revenue from the previous record. This is the trick to only inserting the total amounts for each grouping. The statement used is as follows:   
      • ((o_NEW_GROUPING_FLAG) and (NOT o_FIRST_RECORD_FLAG)) or

      • (o_LAST_RECORD_FLAG)

    • Finally, a sequence generator is used to generate the primary key value for the target table.

     

    Note:

    • The Mappings were originally built using PowerCenter Designer. If you are using PowerCenter Express (PCX), all the mappings cannot be imported as PCX includes only the Informatica Developer tool and not PowerCenter Designer tool.  For example, Informatica Developer doesn’t support mappings that use “Sequence Generator” transformation.
    • The objects are based on “UTF-8” codepage. If you have changed you codepage, XML may need to be edited.