Pipeline Partitioning Mapping Template

Version 6

    Purpose

    Pipeline Partitioning

    Usage

    Use the enhanced pipeline partitioning features in PowerCenter to improve the performance of an individual mapping/workflow.

    Overview

    This template demonstrates how the pipeline partitioning functionality in PowerCenter can increase overall session performance. It shows how increasing the number of partitions, changing the partition types, and moving the partition points within the mapping can be an effective tuning technique. The mapping in this template reads student report card information from three flat files of varying sizes. It processes records for Active students only, calculates their GPAs, and loads the data into a partitioned relational target.

     

    pipeline partition 1.png

     

    pipeline partition 2.png

     

     

    Challenge Addressed

    Performance tuning is an integral part of the data integration process. Each individual piece of the application must be optimized before the application as a whole can perform optimally. After the application and databases have been tuned for maximum single-partition performance, you may find that the system is under-utilized. PowerCenter provides enhanced pipeline partitioning capabilities that, when appropriately configured, can increase pipeline parallelism, resulting in greater throughput and a significant performance improvement.

     

    Pros

    • Partitioning lets a session process multiple data sets in parallel, which can greatly improve throughput on a system that is otherwise under-utilized.

    Cons

    • If the source and target databases are not fully optimized, then partitioning a mapping may not increase session performance.

     

    Download

    • PowerCenter 9.6.x Mapping XML (See Attachments section below)
    • PowerCenter 9.1.0 Mapping XML (See Attachments section below)
    • PowerCenter 8.6.1 Mapping XML (See Attachments section below)

     

    Implementation Guidelines

    To accommodate the three source files and read them simultaneously, the number of partitions in the pipeline was increased from 1 to 3. Processing the data sets in parallel achieves true pipeline parallelism.

     

    The implementation guidelines in this template focus on session settings for the mapping illustrated above. The following screenshots illustrate session property sheets from the PowerCenter Workflow Manager utility.

     

    pipeline partition 3.png

     

    Because the source files vary in size, the workloads are unevenly distributed across the partitions. Setting a partition point at the Filter transformation and using round-robin partitioning balances the workloads going into the filter. Round-robin partitioning forces the Informatica Server to distribute the rows evenly across the partitions.

     

    pipeline partition 4.png
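    The rebalancing effect of round-robin partitioning can be illustrated with a minimal Python sketch. This is not PowerCenter code; the function name `round_robin_partition` and the row counts for the three files are assumptions made only for illustration, and the sketch simplifies the real engine by treating all incoming rows as a single stream.

```python
def round_robin_partition(rows, num_partitions):
    """Deal rows out to partitions in turn, like dealing cards."""
    partitions = [[] for _ in range(num_partitions)]
    for index, row in enumerate(rows):
        partitions[index % num_partitions].append(row)
    return partitions


# Hypothetical source files of very different sizes (9, 2, and 1 rows).
file_a = [("file_a", i) for i in range(9)]
file_b = [("file_b", i) for i in range(2)]
file_c = [("file_c", i) for i in range(1)]

# Without repartitioning, the three partitions would carry 9, 2, and 1 rows.
sizes_before = [len(file_a), len(file_b), len(file_c)]

# After round-robin redistribution, each partition carries the same load.
balanced = round_robin_partition(file_a + file_b + file_c, 3)
sizes_after = [len(partition) for partition in balanced]

print(sizes_before, "->", sizes_after)
```

    Round-robin ignores the row contents entirely, which is why it suits the Filter transformation here: the filter evaluates each row independently, so it does not matter which partition a given student's row lands in.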

     

    As the data enters the Sorter and Aggregator transformations, rows belonging to the same aggregate group could end up in different partitions, producing overlapping aggregate groups. To avoid this problem, hash partitioning was used at the Sorter transformation partition point to route the data appropriately. This partition type groups the data by the key designated in the Sorter transformation, keeping all of the data for each student together in the same partition and thus optimizing the sort and aggregation.

     

    pipeline partition 5.png
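    The following Python sketch illustrates why hash partitioning keeps aggregate groups from overlapping. It is an illustration only, not PowerCenter's implementation: the field names `student_id` and `grade_points`, the sample rows, and the use of CRC32 as the hash function are all assumptions.

```python
import zlib


def hash_partition(rows, key, num_partitions):
    """Route each row to a partition chosen by hashing its key value."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # CRC32 is deterministic, so equal keys always map to the same bucket.
        bucket = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[bucket].append(row)
    return partitions


# Hypothetical report card rows: several courses per student.
rows = [
    {"student_id": 101, "grade_points": 4.0},
    {"student_id": 102, "grade_points": 3.0},
    {"student_id": 101, "grade_points": 3.0},
    {"student_id": 103, "grade_points": 2.0},
    {"student_id": 102, "grade_points": 4.0},
    {"student_id": 103, "grade_points": 3.0},
]

parts = hash_partition(rows, "student_id", 3)

# For every student, collect the partitions that hold at least one row.
partitions_per_student = {}
for number, partition in enumerate(parts):
    for row in partition:
        partitions_per_student.setdefault(row["student_id"], set()).add(number)

print(partitions_per_student)
```

    Because every row for a given student lands in exactly one partition, each partition can sort and aggregate its students independently, with no need to merge partial results afterward.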

     

     

    Since the target table itself is partitioned by key range, the partition type at the target instance was set to key range. The OVERALL_GPA field was chosen as the partition key to mimic the physical partitions on the table. The Informatica Server uses this key to align the data with the physical partitions in the target table by routing each row to the appropriate partition.

     

    pipeline partition 6.png
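    Key range routing on OVERALL_GPA can be sketched in Python as follows. The range boundaries, the sample rows, the `STUDENT_ID` field, and the convention that a range includes its start value and excludes its end value are assumptions made for illustration; the actual ranges must match the physical partitions defined on the target table.

```python
def key_range_partition(rows, key, ranges):
    """Route each row to the first range whose bounds contain its key value.

    Each range is (start, end): start is inclusive, end is exclusive,
    and an end of None means the range has no upper bound.
    """
    partitions = [[] for _ in ranges]
    for row in rows:
        value = row[key]
        for index, (start, end) in enumerate(ranges):
            if value >= start and (end is None or value < end):
                partitions[index].append(row)
                break
        else:
            raise ValueError(f"no key range covers {key}={value}")
    return partitions


# Hypothetical key ranges, one per physical partition of the target table.
gpa_ranges = [(0.0, 2.0), (2.0, 3.0), (3.0, None)]

# Hypothetical aggregated rows, one per student.
students = [
    {"STUDENT_ID": 101, "OVERALL_GPA": 1.5},
    {"STUDENT_ID": 102, "OVERALL_GPA": 2.0},
    {"STUDENT_ID": 103, "OVERALL_GPA": 3.7},
    {"STUDENT_ID": 104, "OVERALL_GPA": 2.9},
]

routed = key_range_partition(students, "OVERALL_GPA", gpa_ranges)
routed_ids = [[row["STUDENT_ID"] for row in partition] for partition in routed]

print(routed_ids)
```

    Unlike round-robin or hash partitioning, key range partitioning can produce uneven partitions if the key values are skewed. The benefit is that each session partition writes only to its matching physical partition in the target.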

     

    Note:

    • The mappings were originally built using PowerCenter Designer. If you are using PowerCenter Express (PCX), not all of the mappings can be imported, because PCX includes the Informatica Developer tool rather than the PowerCenter Designer tool. For example, Informatica Developer does not support mappings that use the “Sequence Generator” transformation.
    • The objects are based on the “UTF-8” code page. If you have changed your code page, the XML may need to be edited.