Transformations > Hierarchy Processor transformation > Output data configuration
  

Output data configuration

The Hierarchy Processor transformation allows you to define the data processing strategy in the mapping.
When the output data format is relational, you can configure filter conditions.
When the output data format is hierarchical, you can configure data source join and filter conditions as well as group by and order by fields. You can aggregate on both the input and output data.

Order of operations

When the output data format is hierarchical, the order of operations is as follows:
  1. 1. Join. Specify join conditions in the data sources.
  2. 2. Filter. Optionally, specify filter conditions to include only a subset of the source data.
  3. 3. Group By. Optionally, specify input fields for aggregate expressions.
  4. 4. Order By. Optionally, specify input fields to create sorted output.

Renaming and deleting input groups

When the output data format is hierarchical, you can rename or delete input groups.
When you modify the input group, the following configurations are also updated, if they refer to the input group:
However, you must modify the following configurations manually:
Note: When the output data format is relational, you cannot change the input group name.

About aggregation

When the output data format is hierarchical, you can aggregate both the input and output data.
For information about aggregating input data, see Configure group by fields.
For information about aggregating output data, see Configuring output groups and fields.
Note: When the output data format is relational, you cannot aggregate data.

Data source configuration

In a Hierarchy Processor transformation, a data source must be configured for the output group and for all array and struct fields in the output. A data source identifies the input group or incoming array that populates the primitive child fields for the output group or field.
If you select a field as a data source, you have access to the following objects:
You can configure multiple data sources for an output group or field. If you do this, you must configure a join condition to join the data.
You can configure filters to exclude certain records. You can also specify group by fields for aggregating the data and order by fields for sorting records.
If the output is relational, the data source for the output groups is always the input group or an incoming field. For example, you add an array to the output group. If you add single occurring children, the data source for the output group is the input group. If you add all descendants, the data source for each output group is an incoming field array.
If the output is hierarchical, the data sources for the output group and fields can vary based on the output data structure.

Inheriting data sources from the parent

When you configure the data source for an array or struct field, you can choose to use the incoming data or inherit the parent's data sources to populate the children of the field.
When you use the incoming data, the incoming data is used to populate the children of the array or struct.
When you choose to inherit the parent's data sources, the data that is transformed into the parent output field is used to populate the children of the array or struct. Data transformations, such as joins and filters, that are applied to the parent field are preserved. You can apply filters to the field to further filter the data, but you cannot configure data sources, joins, group by fields, or order by fields.
You might want to choose the Inherit parent's data sources option when you create a new struct in the output.
For example, you are reading data from a relational table of customer records in which the customer ID is unique. The incoming data contains the following records:
CustID,Name,Street,City,State,ZIP
00234,Ravindra Singh,123 6th St. Apt. 5A,Boston,MA,02134
14416,Melissa Clark,11 Winding Way,Watch Hill,RI,02891
You want to write the customer address fields to a struct.
The following image shows the incoming and output fields:
The image shows the Hierarchy Processor tab of the Hierarchy Processor transformation. The Incoming Fields panel contains one input group with the following string fields: CustID, Name, Street, City, State, and ZIP. The Output Fields panel contains one output group with the following fields: CustID (string), Name (string), and Address (struct). The Address struct contains the following string fields: Street, City, State, and ZIP.
In the Output Fields panel, set the data source for the Output group to Input and the data source for the Address struct to Inherit parent's data sources (Output). When you run the mapping, Data Integration creates one record for each occurrence of CustID in the input data and populates the struct with the address data that corresponds to the customer ID in the output:
{
"CustID":"00234",
"Name":"Ravindra Singh",
"Address":{
"Street":"123 6th St. Apt. 5A",
"City":"Boston",
"State":"MA",
"ZIP":"02134"
}
}
{
"CustID":"14416",
"Name":"Melissa Clark",
"Address":{
"Street":"11 Winding Way",
"City":"Watch Hill",
"State":"RI",
"ZIP":"02891"
}
}
If you set the data source for the Address struct to Input, then you must also configure the following filter condition on the struct to get the same output: :fld.{Input.CustID} = :fld.{Output.CustID} AND :fld.{Input.Name} = :fld.{Output.Name}. For more information about configuring filter conditions, see Configure filter conditions.
When the output field is an array that inherits its parent's data, Data Integration creates an array with one element.

Data source configuration in hierarchical output example

When you add a nested array to the output and choose to preserve the incoming field, the records that get created vary based on how you configure the data source for the output group.
For example, you want to extract the description information from a nested array of maintenance records in a JSON file. The description information is in an array of strings. You want the output data to also be in an array of strings.
The following sample data shows an incoming record:
[
{
"vehicle": [
{
"make": "Toyota",
"model": "Corolla",
"insurance": {
"company": "Allstate",
"policy_num": "AS12876"
},
"maintenance": [
{
"date": "01/01/2020",
"description": ["oil filter1", "oil filter2"]
},
{
"date": "01/08/2020",
"description": ["tire rotation1", "tire rotation2"]
}
]
},
{
"make": "Toyota",
"model": "RAV4",
"insurance": {
"company": "Allstate",
"policy_num": "AS2033"
},
"maintenance": [
{
"date": "01/02/2020",
"description": ["air filter replacement1", "air filter replacement2"]
},
{
"date": "01/08/2020",
"description": ["battery replacement1", "battery replacement2"]
}
]
}
]
}
]
In the Hierarchy Processor transformation, you add the description array to the output group and choose Preserve Incoming Field.
The following image shows the incoming and output fields:
The image shows the Hierarchy Processor tab of the Hierarchy Processor transformation. The Incoming Fields panel contains one input group with the following field: vehicle (array). The vehicle array contains the following fields: make (string), model (string) insurance (struct), and maintenance (array). The insurance struct contains the following fields: company (string) and policy_num (string). The maintenance array contains the following fields: date (string) and description (array of strings). The Output Fields panel contains the following field: description (array of strings).
When Data Integration creates the output array, it sets the data source for the description array to Input.vehicle.vehicle.maintenance.maintenance.desc.elem, indicating that the information for the output array comes from the elements in the desc array in the Input group. By default, Data Integration sets the data source for the Output group to Input.
The data source configuration for the Output group determines how Data Integration creates the output records.
To collate all descriptions into one output record, keep the data source for the Output group as Input. This produces the following output:
{"description":["battery replacement2","battery replacement1","air filter replacement2","air filter replacement1","tire rotation2","tire rotation1","oil filter2","oil filter1"]}
To create one output record for each occurrence of vehicle in the incoming data, set the data source to Input.vehicle.vehicle. In this case, the output data contains one record for each vehicle:
{"description":["tire rotation2","tire rotation1","oil filter2","oil filter1"]}
{"description":["battery replacement2","battery replacement1","air filter replacement2","air filter replacement1"]}
To create one output record for each occurrence of maintenance in the incoming data, set the data source to Input.vehicle.vehicle.maintenance.maintenance. In this case, the output contains one record for each maintenance record:
{"description":["oil filter2","oil filter1"]}
{"description":["tire rotation2","tire rotation1"]}
{"description":["air filter replacement2","air filter replacement1"]}
{"description":["battery replacement2","battery replacement1"]}

Data source conflicts

If you use a Hierarchy Processor transformation to convert hierarchical data to hierarchical data and you add multi-level arrays to the output, you must ensure that the data sources for the output group or fields do not conflict. A conflict occurs when you select an incoming array and one of its descendant arrays as data sources for the same output group or field.
If you have a data source conflict, the transformation remains invalid until you resolve the conflict. Additionally, you cannot configure joins, filters, order by fields, or group by fields until you resolve the conflict.

Example

You add the following Input group to the Output group and preserve the incoming field:
The image shows the Hierarchy Processor tab of the Hierarchy Processor transformation. The Incoming Fields panel contains one input group with the following field: Array1 (array). Array1 contains the following fields: Field1 (string), Field2 (string), and Array2 (array). Array2 contains the following fields: Field3 (string) and Field4 (string). The Output Fields panel contains one output group with the same fields as the input group.
If you select both Array1 and Array2 as data sources for the Output group, there is no way to determine which data source provides the data for Field1 and Field2. In this case, Data Integration displays a conflicting data sources error.
To resolve the conflict, remove one of the data sources from the Output group.

Configure data sources

A data source must be configured for the output group and for all array and struct fields in the output. In most cases, Data Integration configures the data source when you add an input group or incoming field to the output. You can change the data source or add additional data sources.
Configure data sources on the Hierarchy Processor tab.
    1. Click the Data Sources icon for the output group, array field, or struct field.
    2. Add the data sources.
    If you configure data sources for an output group, add the data sources. If you configure data sources for an array or struct field, you can add data sources or inherit the parent's data sources.
    3. Validate the configuration.
    4. Click Save.

Configure join conditions

When the output data format is hierarchical, you can define join conditions for the data sources. You must configure a join condition if an output group or field has multiple data sources. Configure a join condition to join the data from the input groups or incoming fields.
Configure the join conditions for the output groups on the Hierarchy Processor tab.
    1. Click the Join Conditions icon for the output group.
    2. Add a join condition.
    3. Select the left data source.
    4. Select the join type:
    Note: If you select an outer join on a large data set, you might need to increase the Spark driver memory in the mapping task. For more information about Spark session properties for elastic mappings, see Tasks.
    5. Select the right data source.
    6. Click Configure Join Condition.
    7. Select fields and built-in functions to create the expression.
    8. Validate the expression.
    9. Click Save.

Configure filter conditions

Optionally, define filter conditions to project a subset of the input data in the transformation. You can filter based on incoming fields or output fields.
The transformation uses a combination of input and output fields to filter the input data to match the data constructed for the parent output element for a particular output row. The matching of input data with the parent element output data is accomplished by using any of the ancestor's primitive child fields.
Configure filter conditions for output groups, array fields, or struct fields on the Hierarchy Processor tab.
    1. Click the Filter Condition icon for the output group or field.
    2. Click Configure Filter Condition.
    3. Select fields and built-in functions to create the expression.
    4. Validate the expression.
    5. Click Save.

Filter configuration example

You might configure a filter condition to read data from primitive fields into an output array or struct field when the data in the array or struct field must correspond to the data in a sibling field in the output group.
For example, you are converting relational data to a JSON file. The incoming data is in a relational table that contains orders information. The orders table contains multiple rows for each order because each order can contain several products.
The incoming data looks like the following data:
OrderNumber,ProductName,ProductType,NumberOfItems,PricePerItem
12345,M&Ms Candies Chocolate Peanut Party Size - 38 Oz,Candy,2,14.49
12345,Stella Parm Shredded Cup - 20 Oz,Dairy,1,10.99
12345,AHA Sparkling Water Blueberry Pomegranate - 8-12 Fl. Oz.,Beverages,1,3.33
23456,Weetabix Biscuit Cereal Whole Grain 2 Count - 14 Oz,Breakfast & Cereal,2,4.99
23456,Producers Milk Lowfat 1% - Half Gallon,Dairy,1,2.79
23456,Egglands Best Eggs Cage Free Large Brown - 12 Count,Eggs,1,4.99
You want to read the product details into an array. The array must contain the product details that are associated with a particular order number.
The following image shows the structure of the incoming and output fields:
The image shows the Hierarchy Processor tab of the Hierarchy Processor transformation. The Incoming Fields panel contains one input group with the following fields: OrderNumber, ProductName, ProductType, NumberOfItems, and PricePerItem. The Output Fields panel contains one output group with the following fields: OrderNumber (string) and ProductDetails (array). The ProductDetals array contains the following fields: ProductName, ProductType, NumberOfItems, and PricePerItem.
In the Output Fields panel, set the data source for the Output group to Input, and configure the group by field as Input.OrderNumber to remove duplicate records from the output. Set the data source for the ProductDetails array to Input.
To ensure that the details in the ProductDetails array correspond to the order number in the output, configure the following filter condition for the array:
:fld.{Input.OrderNumber}= :fld.{Output.OrderNumber}
To further refine the records, use an AND condition in the filter. For example, to exclude records in which the product type is "Candy," configure the following filter condition:
:fld.{Input.OrderNumber}= :fld.{Output.OrderNumber} AND :fld.{Input.ProductType} != 'Candy'
When you do this, the output contains one record for each order, and incoming records with the product type "Candy" are excluded.
The output data contains the following records:
{
"OrderNumber":"12345",
"ProductDetails":[
{
"ProductName":"AHA Sparkling Water Blueberry Pomegranate - 8-12 Fl. Oz.",
"ProductType":"Beverages",
"NumberOfItems":"1",
"PricePerItem":"3.33"
},
{
"ProductName":"Stella Parm Shredded Cup - 20 Oz",
"ProductType":"Dairy",
"NumberOfItems":"1",
"PricePerItem":"10.99"
}
]
}
{
"OrderNumber":"23456",
"ProductDetails":[
{
"ProductName":"Egglands Best Eggs Cage Free Large Brown - 12 Count",
"ProductType":"Eggs",
"NumberOfItems":"1",
"PricePerItem":"4.99"
},
{
"ProductName":"Producers Milk Lowfat 1% - Half Gallon",
"ProductType":"Dairy",
"NumberOfItems":"1",
"PricePerItem":"2.79"
},
{
"ProductName":"Weetabix Biscuit Cereal Whole Grain 2 Count - 14 Oz",
"ProductType":"Breakfast & Cereal",
"NumberOfItems":"2",
"PricePerItem":"4.99"
}
]
}

Configure group by fields

When the output data format is hierarchical, you can optionally define input fields for aggregation to produce one row for each group of input data.
Configure group by fields on the Hierarchy Processor tab.
    1. Click the Group By fields icon for the output group or the array or struct field.
    2. Add the input fields for aggregation.
    3. Validate the configuration.
    4. Click Save.

Configure order by fields

When the output data format is hierarchical, you can optionally define input fields for sorted output.
Note: The following conditions must be true for the sort operation to take effect:
Configure order by fields on the Hierarchy Processor tab.
    1. Click the Order By fields icon for the output group or the array or struct field.
    2. Add the input fields to order by and sort the data in ascending or descending order.
    3. Rearrange the fields to adjust the sort order.
    4. Validate the configuration.
    5. Click Save.