As we define data integration patterns using Templates on Informatica Cloud, one of the most powerful features is the ability to specify link rules- the rules that define how fields are linked between objects- using various ways. You can define these using field names, field data types, using an externally created file of field pairs (“Dictionary” file) etc. One of these options is the use of “patterns”. You can use regex patterns to identify field sets on which you can perform transformations.

 

Let me admit something before I get into details. For somebody who is writing this to help others, I am usually quite confused by those strange looking combinations of special characters. Despite of having done my share of “awk”,”grep” and find/replace with “vi” using such tokens, it all looks very unfamiliar to me when I look at it after a few days. On such occasions I always wish I had written some illustrations down somewhere so I could refer to those.

 

So I am hoping this post will help me as much it might help others.

 

When we take a look at these within the context of specifying rules for linking fields, we don’t have to go all “^ShipTo|BillTo\s{0,1}\d{2}[#\-]$”, at least right in the first rule. In fact I don’t expect most such rules to be too complicated. Let’s start with something simple.

 

Let’s assume we are reading from a sample file or table with the field names listed below:

  • CustomerId
  • CustomerKey
  • RecordType
  • CustName
  • CustDesc
  • CreatedDate
  • LastModifiedDate
  • FirstTransactionPostedDt
  • MostRecentTransactionDt
  • TotalOrderedQty
  • TotalReceivedQty
  • InvoicedAmtQtd
  • PaidAmtQtd
  • InvoiceAmtYtd
  • PaidAmtYtd
  • Comment1
  • Comment2
  • Comment3
  • Comment4
  • StatusCode
  • CurrencyCode
  • ShipToLocation
  • BillToLocation

 

Now if you are one of those fortunate people who have never had to deal with data definitions created by somebody else you might think this is a convenient structure created to demonstrate the point, and to some extent it might be. But otherwise for those of us who have to work with such structures, this looks similar to those in real life.

 

The use case here is that we need to load this data into some other table/file. While loading that we need to transform some of these values differently:

  1. All date fields to be converted to a “datetime” data type
  2. All QTD/YTD amounts need to be converted to amounts in the company currency
  3. Need to use first 80 characters of each comment field
  4. Fields with “Total” in name need to be linked to similar field names but with no “Total” in it.

 

Moreover such structure can undergo changes over time with more fields of some type getting added (e.g. a “Comment5” gets added later). So you want to future-proof your DI flow to handle such cases.

 

To achieve this let’s refer to a template that defines different ways of transforming this data to implement the requirements listed above.

 

This template depicts a pattern where we read some customer records from a flat file, perform some transformations over the values and then write to another flat file. The use of files here is just for illustration, this can be done between any source/target types supported on Informatica Cloud. The rules we have described below are between the object “SQ” and the object “EXPRegEx” below.

 

 

Let’s look at the rules for linking.

 

Rule #1                 Link fields with names ending with “Dt” or “Date” to fields with identical names

 

The screenshot below shows the actual rule definition. For the “from” side of fields we use the pattern “Dt$|Date$” which indicates any name that ends with either of “Dt” or “Date”. This will link the four date fields in the list above. For the “to” side we use “$0”, which means these will be linked to fields with identical names. Note that we have used the “Pattern” option in the rule link.

 

 

 

For other field sets we will use similar rule definitions

 

Rule #2                 Qtd/Ytd amounts to be linked as-is

 

This will be done in a similar way

 

 

Rule #3                 All comment fields to be linked as-is

 

There are comment fields with a digit suffix in the name in the list above. We define a rule that will apply to all such fields

 

 

Rule #4                 Quantify fields with names starting with Total to be linked, without the Total prefix

 

 

 

The empty “to” pattern here means a field name “TotalOrderedQty” will be linked to a field “OrderedQty”, taking out the “Total” at the start.

 

This allows us to link fields based on the naming pattern. There are several benefits of doing this:

  1. We did not have to specify rules for each individual field separately
  2. If the number of fields of each of these types changes in future these rules will still handle those.
  3. We have broken down the data structure into field sets of different logical types, for which we can define transformations specific to those

 

What next?

 

The first time I tried this, at this point I almost felt like those fish in the last scene of Finding Nemo, where they have crossed the street in a plastic bag, are into the ocean, but are wondering how to get out of the bag.

 

To solve it, we are going to define the transformations in a similar way, where we can use regex patterns. Let’s look at that in the second part of this post!