
This is a continuation of the discussion we started in Part 1 of this topic about Cloud Integration Templates, where we saw how to define field link rules using regular expressions. Refer to that post for the other details. Let’s recap the benefits of doing it:

 

  1. We did not have to specify rules for each individual field separately.
  2. If the number of fields of each of these types changes in the future, these rules will still handle them.
  3. We have broken the data structure down into field sets of different logical types, for which we can define transformations specific to each.

 

The template we used is shown in the screenshot below. The rules described in the earlier post link fields from the object “SQ” to the object “EXPRegEx”. Now we need to define transformations for each field set.

 

 

For that, the ability to perform #3 above is very important. Having defined the field links like this, if we then had to define transformations using individual field names, it would diminish the benefits of using regex patterns for links.

 

Thankfully, templates allow a user to define transformations on regex patterns as well! Now we will see how we can do that.

 

For this we need to use what is called a “Macro definition” in expressions. We use a declaration field to identify all the field name patterns, and then define pattern-specific transformations on those.

 

In the screenshot below, the first field in the Expression is the declaration discussed above. It declares four patterns:

 

  • QQ (pattern “Qty$”): Any expression defined on the name “QQ” will be applied to all fields with names ending with Qty.
  • QY (pattern “Qtd$|Ytd$”): Any expression defined on the name “QY” will be applied to all fields with names ending with Qtd or Ytd.
  • DF (pattern “Dt$|Date$”): Any expression defined on the name “DF” will be applied to all fields with names ending with Dt or Date.
  • CC (pattern “Comment\d”): Any expression defined on the name “CC” will be applied to all fields with names having the text “Comment” followed by a digit.
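
Informatica resolves these declarations inside the template itself; purely as a mental model, here is a hypothetical Python sketch (not the actual product behavior) of how each declared name resolves to a set of fields:

```python
import re

# The four pattern declarations from the list above
patterns = {
    "QQ": r"Qty$",
    "QY": r"Qtd$|Ytd$",
    "DF": r"Dt$|Date$",
    "CC": r"Comment\d",
}

# A few of the input field names used in this example
fields = [
    "CreatedDate", "FirstTransactionPostedDt", "TotalOrderedQty",
    "InvoicedAmtQtd", "PaidAmtYtd", "Comment1", "Comment2", "StatusCode",
]

# Each declared name stands for the set of fields its pattern matches
for name, pattern in patterns.items():
    matched = [f for f in fields if re.search(pattern, f)]
    print(name, "->", matched)
# QQ -> ['TotalOrderedQty']
# QY -> ['InvoicedAmtQtd', 'PaidAmtYtd']
# DF -> ['CreatedDate', 'FirstTransactionPostedDt']
# CC -> ['Comment1', 'Comment2']
```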

 

 

 

 

The names used for the patterns above are just for convenience and do not mean anything in particular. Now, a quick word about the actual field names and expressions used in this Expression:

 

Let’s consider the field “OUT2_%DF%”. Since we have four date fields in the input, the actual Custom Integration Task will create four fields with the expressions below:

 

  • OUT2_CreatedDate: TO_DATE(CreatedDate,'yyyymmdd')
  • OUT2_LastModifiedDate: TO_DATE(LastModifiedDate,'yyyymmdd')
  • OUT2_FirstTransactionPostedDt: TO_DATE(FirstTransactionPostedDt,'yyyymmdd')
  • OUT2_MostRecentTransactionDt: TO_DATE(MostRecentTransactionDt,'yyyymmdd')

 

As you can see, the task has “vertically expanded” the expression defined on the field pattern. If more date fields are added to the input file structure in the future, this will handle them automatically.
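
To make the expansion concrete, here is a small hypothetical Python sketch of that behavior (the %DF% token and the TO_DATE expression come from the template above; the code itself is only an illustration, not how the product is implemented):

```python
import re

date_pattern = r"Dt$|Date$"   # the DF declaration
input_fields = [
    "CustomerId", "CreatedDate", "LastModifiedDate",
    "FirstTransactionPostedDt", "MostRecentTransactionDt", "Comment1",
]

# "Vertical expansion": the single field OUT2_%DF% with expression
# TO_DATE(%DF%,'yyyymmdd') becomes one output field per matching input field
for f in input_fields:
    if re.search(date_pattern, f):
        print("OUT2_" + f, "=", "TO_DATE(" + f + ",'yyyymmdd')")
```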

 

We do not have to use prefixes like OUT2_ here, but doing so allows us to configure link rules or expressions downstream if necessary. For example, we can create a link rule that uses “^OUT2” in the pattern to link only these fields to another object.
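
Again as a hypothetical Python illustration of what such a “from” pattern would select:

```python
import re

# Fields leaving the Expression; "^OUT2" selects only the prefixed ones
fields = ["OUT2_CreatedDate", "OUT2_LastModifiedDate", "Comment1", "StatusCode"]
print([f for f in fields if re.search(r"^OUT2", f)])
# ['OUT2_CreatedDate', 'OUT2_LastModifiedDate']
```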

 

Hope this example illustrates how you can identify and work on logical field sets using regex patterns throughout your data flow. Feel free to post your questions on the Informatica Cloud Developer Community as you try these!

As we define data integration patterns using Templates on Informatica Cloud, one of the most powerful features is the ability to specify link rules, the rules that define how fields are linked between objects, in various ways. You can define these using field names, field data types, an externally created file of field pairs (a “Dictionary” file), and so on. One of these options is the use of “patterns”: you can use regex patterns to identify field sets on which you can perform transformations.

 

Let me admit something before I get into the details. For somebody who is writing this to help others, I am usually quite confused by those strange-looking combinations of special characters. Despite having done my share of “awk”, “grep” and find/replace with “vi” using such tokens, it all looks very unfamiliar to me when I look at it again after a few days. On such occasions I always wish I had written some illustrations down somewhere so I could refer to them.

 

So I am hoping this post will help me as much as it might help others.

 

When we look at these within the context of specifying rules for linking fields, we don’t have to go all “^ShipTo|BillTo\s{0,1}\d{2}[#\-]$”, at least not right in the first rule. In fact, I don’t expect most such rules to be very complicated. Let’s start with something simple.

 

Let’s assume we are reading from a sample file or table with the field names listed below:

  • CustomerId
  • CustomerKey
  • RecordType
  • CustName
  • CustDesc
  • CreatedDate
  • LastModifiedDate
  • FirstTransactionPostedDt
  • MostRecentTransactionDt
  • TotalOrderedQty
  • TotalReceivedQty
  • InvoicedAmtQtd
  • PaidAmtQtd
  • InvoiceAmtYtd
  • PaidAmtYtd
  • Comment1
  • Comment2
  • Comment3
  • Comment4
  • StatusCode
  • CurrencyCode
  • ShipToLocation
  • BillToLocation

 

Now, if you are one of those fortunate people who have never had to deal with data definitions created by somebody else, you might think this is a convenient structure created to demonstrate the point, and to some extent it might be. But for those of us who do have to work with such structures, this looks a lot like what we see in real life.

 

The use case here is that we need to load this data into some other table/file. While loading it, we need to transform some of these values differently:

  1. All date fields need to be converted to a “datetime” data type.
  2. All QTD/YTD amounts need to be converted to amounts in the company currency.
  3. We need to use only the first 80 characters of each comment field.
  4. Fields with “Total” in the name need to be linked to fields with similar names but without the “Total” in them.

 

Moreover, such a structure can change over time, with more fields of some type getting added (e.g. a “Comment5” added later). So you want to future-proof your DI flow to handle such cases.

 

To achieve this, let’s refer to a template that defines different ways of transforming this data to implement the requirements listed above.

 

This template depicts a pattern where we read some customer records from a flat file, perform some transformations on the values, and then write to another flat file. The use of files here is just for illustration; this can be done between any source/target types supported on Informatica Cloud. The rules described below link fields from the object “SQ” to the object “EXPRegEx”.

 

 

Let’s look at the rules for linking.

 

Rule #1: Link fields with names ending with “Dt” or “Date” to fields with identical names

 

The screenshot below shows the actual rule definition. For the “from” side of the rule we use the pattern “Dt$|Date$”, which matches any name that ends with either “Dt” or “Date”. This will link the four date fields in the list above. For the “to” side we use “$0”, which means these will be linked to fields with identical names. Note that we have used the “Pattern” option in the link rule.
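
If you want to convince yourself which fields the “from” pattern catches, a quick check in Python (a hypothetical illustration, not Informatica code) looks like this:

```python
import re

fields = [
    "CustomerId", "CreatedDate", "LastModifiedDate",
    "FirstTransactionPostedDt", "MostRecentTransactionDt", "TotalOrderedQty",
]

# "from" pattern: names ending in Dt or Date; "to" side "$0" links each
# matching field to a field with the identical name
for f in fields:
    if re.search(r"Dt$|Date$", f):
        print(f, "->", f)
```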

 

 

 

For the other field sets we will use similar rule definitions.

 

Rule #2: Qtd/Ytd amounts to be linked as-is

 

This will be done in a similar way.

 

 

Rule #3: All comment fields to be linked as-is

 

There are comment fields with a digit suffix in the name in the list above. We define a rule that will apply to all such fields.

 

 

Rule #4: Quantity fields with names starting with “Total” to be linked, without the “Total” prefix

 

 

 

The empty “to” pattern here means a field named “TotalOrderedQty” will be linked to a field named “OrderedQty”, dropping the “Total” at the start.
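
In plain regex terms this is a match with an empty replacement. Assuming a “from” pattern like “^Total” (the screenshot is not reproduced here), a hypothetical Python equivalent would be:

```python
import re

fields = ["TotalOrderedQty", "TotalReceivedQty", "InvoicedAmtQtd"]

# "from" pattern ^Total with an empty "to" pattern: the matched prefix is
# dropped and the remainder of the name becomes the link target
for f in fields:
    if re.search(r"^Total", f):
        print(f, "->", re.sub(r"^Total", "", f))
# TotalOrderedQty -> OrderedQty
# TotalReceivedQty -> ReceivedQty
```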

 

This allows us to link fields based on naming patterns. There are several benefits to doing this:

  1. We did not have to specify rules for each individual field separately.
  2. If the number of fields of each of these types changes in the future, these rules will still handle them.
  3. We have broken the data structure down into field sets of different logical types, for which we can define transformations specific to each.

 

What next?

 

The first time I tried this, at this point I almost felt like the fish in the last scene of Finding Nemo: they have crossed the street in a plastic bag and made it into the ocean, but are wondering how to get out of the bag.

 

To solve that, we are going to define the transformations in a similar way, using regex patterns there as well. Let’s look at that in the second part of this post!
