Authors: user189994 Skip navigation

Cloud Data Integration

4 Posts authored by: user189994

This is a continuation of the discussion we started on Part 1 of this topic about Cloud Integration Templates, where we saw how to define field link rules using regular expressions. Refer to the post for other details. Let’s recap from the benefits of doing it:

 

  1. We did not have to specify rules for each individual field separately
  2. If the number of fields of each of these types changes in future these rules will still handle those.
  3. We have broken down the data structure into field sets of different logical types, for which we can define transformations specific to those

 

The template we used is as in the screenshot below. The rules described in the earlier post are about linking fields from the object “SQ” to the object “EXPRegEx”. Now we need to define transformations for each field set.

 

 

For that, the ability to perform #3 above is very important. Having defined the field links like this, if we then have to define transformations using the individual field names, it diminishes the benefits of using regex patterns for links.

 

Thankfully, templates allow a user to define transformations on regex patterns as well! Now we will see how we can do that.

 

For this we need to use what is called as “Macro definition” in expressions. We use a declaration field to identify all field name patterns, and then define pattern specific transformations on those.

 

In the screenshot below, the first field in the Expression is the declaration discussed above. It declares four patterns:

 

Name

Pattern

Description

QQ

Qty$

Any expression defined on the name “QQ” will be applied to all fields   with names ending with Qty

QY

Qtd$|Ytd$

Any expression defined on the name “QY” will be applied to all fields   with names ending with Qtd or Ytd

DF

Dt$|Date$

Any expression defined on the name “DF” will be applied to all fields   with names ending with Dt or Date

CC

Comment\d

Any expression defined on the name “CC” will be applied to all fields   with names having the text “Comment” followed by a digit

 

 

 

 

The names used for the patterns above are just for convenience and do not mean anything in particular. Now a quick word about those actual field names and expressions used in this Expression:

 

Let’s consider the field “OUT2_%DF%”. Since we have four date fields in the input the actual Custom Integration Task will create four fields with the values as below

 

Field

Expression

OUT2_CreatedDate

TO_DATE(CreatedDate,'yyyymmdd')

OUT2_LastModifiedDate

TO_DATE(LastModifiedDate,'yyyymmdd')

OUT2_FirstTransactionPostedDt

TO_DATE(FirstTransactionPostedDt,'yyyymmdd')

OUT2_MostRecentTransactionDt

TO_DATE (MostRecentTransactionDt,'yyyymmdd')

 

As you can see the task has “vertically expanded” the expression defined on the field pattern. In future if two more date fields are added to the input file structure, this will handle those automatically.

 

We do not have to use those prefixes like OUT2_ here, but doing that allows us to configure link rules or expressions downstream if necessary. For example, we can create a link rule that uses “^OUT2” in the pattern to link only these fields to another object.

 

Hope this example illustrates how you can identify and work on logical field sets using regex patterns through your data flow. Feel free to post your questions on the Informatica Cloud Developer Community as you try these!

As we define data integration patterns using Templates on Informatica Cloud, one of the most powerful features is the ability to specify link rules- the rules that define how fields are linked between objects- using various ways. You can define these using field names, field data types, using an externally created file of field pairs (“Dictionary” file) etc. One of these options is the use of “patterns”. You can use regex patterns to identify field sets on which you can perform transformations.

 

Let me admit something before I get into details. For somebody who is writing this to help others, I am usually quite confused by those strange looking combinations of special characters. Despite of having done my share of “awk”,”grep” and find/replace with “vi” using such tokens, it all looks very unfamiliar to me when I look at it after a few days. On such occasions I always wish I had written some illustrations down somewhere so I could refer to those.

 

So I am hoping this post will help me as much it might help others.

 

When we take a look at these within the context of specifying rules for linking fields, we don’t have to go all “^ShipTo|BillTo\s{0,1}\d{2}[#\-]$”, at least right in the first rule. In fact I don’t expect most such rules to be too complicated. Let’s start with something simple.

 

Let’s assume we are reading from a sample file or table with the field names listed below:

  • CustomerId
  • CustomerKey
  • RecordType
  • CustName
  • CustDesc
  • CreatedDate
  • LastModifiedDate
  • FirstTransactionPostedDt
  • MostRecentTransactionDt
  • TotalOrderedQty
  • TotalReceivedQty
  • InvoicedAmtQtd
  • PaidAmtQtd
  • InvoiceAmtYtd
  • PaidAmtYtd
  • Comment1
  • Comment2
  • Comment3
  • Comment4
  • StatusCode
  • CurrencyCode
  • ShipToLocation
  • BillToLocation

 

Now if you are one of those fortunate people who have never had to deal with data definitions created by somebody else you might think this is a convenient structure created to demonstrate the point, and to some extent it might be. But otherwise for those of us who have to work with such structures, this looks similar to those in real life.

 

The use case here is that we need to load this data into some other table/file. While loading that we need to transform some of these values differently:

  1. All date fields to be converted to a “datetime” data type
  2. All QTD/YTD amounts need to be converted to amounts in the company currency
  3. Need to use first 80 characters of each comment field
  4. Fields with “Total” in name need to be linked to similar field names but with no “Total” in it.

 

Moreover such structure can undergo changes over time with more fields of some type getting added (e.g. a “Comment5” gets added later). So you want to future-proof your DI flow to handle such cases.

 

To achieve this let’s refer to a template that defines different ways of transforming this data to implement the requirements listed above.

 

This template depicts a pattern where we read some customer records from a flat file, perform some transformations over the values and then write to another flat file. The use of files here is just for illustration, this can be done between any source/target types supported on Informatica Cloud. The rules we have described below are between the object “SQ” and the object “EXPRegEx” below.

 

 

Let’s look at the rules for linking.

 

Rule #1                 Link fields with names ending with “Dt” or “Date” to fields with identical names

 

The screenshot below shows the actual rule definition. For the “from” side of fields we use the pattern “Dt$|Date$” which indicates any name that ends with either of “Dt” or “Date”. This will link the four date fields in the list above. For the “to” side we use “$0”, which means these will be linked to fields with identical names. Note that we have used the “Pattern” option in the rule link.

 

 

 

For other field sets we will use similar rule definitions

 

Rule #2                 Qtd/Ytd amounts to be linked as-is

 

This will be done in a similar way

 

 

Rule #3                 All comment fields to be linked as-is

 

There are comment fields with a digit suffix in the name in the list above. We define a rule that will apply to all such fields

 

 

Rule #4                 Quantify fields with names starting with Total to be linked, without the Total prefix

 

 

 

The empty “to” pattern here means a field name “TotalOrderedQty” will be linked to a field “OrderedQty”, taking out the “Total” at the start.

 

This allows us to link fields based on the naming pattern. There are several benefits of doing this:

  1. We did not have to specify rules for each individual field separately
  2. If the number of fields of each of these types changes in future these rules will still handle those.
  3. We have broken down the data structure into field sets of different logical types, for which we can define transformations specific to those

 

What next?

 

The first time I tried this, at this point I almost felt like those fish in the last scene of Finding Nemo, where they have crossed the street in a plastic bag, are into the ocean, but are wondering how to get out of the bag.

 

To solve it, we are going to define the transformations in a similar way, where we can use regex patterns. Let’s look at that in the second part of this post!

In one of the previous posts we looked at Custom Integration Templates that allow you to define your custom data integration flows. Informatica Cloud also has a powerful REST-based API that allows developers to create Custom Integration tasks and run and monitor them from environments outside the Informatica Cloud UI– for example a Java-based or Force.com application. Using Cloud Integration Templates and the API features together allows you to create applications in the environment familiar to your users while behind the scenes relying upon the Informatica integration platform as a service (iPaaS) to move your data.

 

Sample Force.com App

We will illustrate the iPaaS concept here using a sample “White Label” application we have created using Force.com and Apex code. This application runs on Force.com and shows a VisualForce UI to the user. Behind the scenes it creates various Informatica objects to implement the cloud data integration use case. There is no interaction with the Informatica Cloud UI. Informatica Cloud is completely configured and controlled via our REST API.

 

In this example, the use case is to copy data from a Salesforce object, such as Account, Contact or Opportunity, to a different environment. We’ll use a flat file in this example, but it could be any object in any database or application that Informatica Cloud supports. You need to define the field mapping between the source and the target and also configure a filter condition on the data.

 

Using this sample app, you first login to the Informatica Cloud integration service. Once the login is successful you will see a list of Salesforce and Flat File connections in two lists. From these select the right connection of each type. Also specify a date to be used as a filter, to select all records created after that date.

 

 

 

As you select a connection of each type the field mapping control below will show fields from the Salesforce Account object and also from the default target file. You can change the object from Account to Contact or Opportunity.

 

You can drag and drop to map the fields or create expressions, very similar to how you do it on the Informatica Cloud UI.

 

 

Then click “Run” to execute this custom data synchronization task. This custom app invokes the Informatica Cloud REST API to create a Custom Integration task for the template, and runs it. Then it also monitors it to report the status.

 

 

 

APIs used

Login

 

First, we use the Login API to login to Informatica Cloud and provide the user login and password in the request body.

https://app.informaticaondemand.com/ma/api/v2/user/login

 

This API call returns a Session ID and a URL that we need to use for all subsequent calls.

 

GET Connections

 

Next, we invoke the API to get connections. From its response we get all Salesforce connections and all Flat File connections, using the “type” attribute.

https://app.informaticaondemand.com/saas/api/v2/connection

 

Custom Integration Task (“mttask”)

 

Then after the user clicks “Run” after configuring the UI, we invoke a series of API calls. First we create a Custom Integration Task for the template with values for parameters applicable based on the user selection.

https://app.informaticaondemand.com/saas/api/v2/mttask/

 

Then run the task

https://app.informaticaondemand.com/saas/api/v2/job/

 

As it runs we invoke the Activity Log API to check if the job is complete, and when it’s complete we get the run results.

https://app.informaticaondemand.com/saas/api/v2/activity/activityLog

 

Set up

 

  1. An Informatica Cloud account is required. Also a Secure Agent should be configured with that account.
  2. The app uses a Cloud Integration Template that reads from a Salesforce object, writes to a flat file target and performs a set of transforms on the way (see below).
  3. This template should be uploaded to the Informatica Cloud account.
  4. At least one connection each should be created in Informatica Cloud for Salesforce and Flat File (the folder where the files are located).

 

One of the most powerful Informatica Cloud features is Cloud Integration Templates. With Templates, you can develop your own data integration flows to perform functions that you cannot typically do using the standard Informatica Cloud applications.

 

For example, you can join objects/tables (even if there is no source-defined relationship or if they come from different connections), perform aggregations on data, add more targets, include additional data integration logic using PowerCenter mapplets (called “plug-ins”), and so on.

 

In addition you can also use Templates to standardize your data flow patterns. So Cloud Integration Templates offer data integration flow flexibility and at the same time provide a way to standardize your flows.

 

Let’s look at an example.

 

You want to synchronize your Sales data in Salesforce CRM with another system behind your organization’s firewall. Typically you would create Data Synchronization tasks using wizard user interface for each such object. Each task is created independently and the task list will look something like the screenshot below.

 

This works fine as long as your needs are all handled by the features available in the Data Synchronization wizard.

 

However, you may want to add more transformations to each task – for example, write the data to another target table/file, perform a standard lookup, or add some additional control fields to your targets. When you use independent Data Synchronization task you need to remember and configure each transformation identically in each.

 

This is where Cloud Integration Templates can be very useful – to define a “pattern” of data integration and apply it to several individual tasks (called Custom Integration tasks).

 

In this case you can create a generic Template that has a pattern which includes reading from a Salesforce object, performing a bunch of transformations and writing the data to a target CSV file.

 

Informatica Cloud includes a Microsoft Visio add-on tool called the “Cloud Integration Template Designer” that you can download from the Community site.  You can create a Template using this tool, as shown in the screenshot below.

 

Such a Template uses place holder objects and link rules to define a pattern. In this case we read from a source object, pass it through a filter (with a filter condition that could be either fixed in the template or parameterized and provided when we create the Custom Integration task), through a field mapping expression, and write to two targets – the main target for data and a control file where you can write some job related information for your reference.

 

None of these are actual database or application objects. When you create a Custom Integration task from this Template you provide the actual values for each parameter - specific objects for which you want to create tasks – Account, Contact, Lead, Opportunity and so on.

 

This way all of your tasks follow the same pattern. When you need to add a lookup or do some additional data transformations, you just edit the Template and add the new transformation logic. Then, in most cases, the tasks will be automatically updated with the new transformation logic in the Template. You can also have a set of Informatica Cloud REST API commands in a script that you can run each time a Template is modified to automatically generate all tasks again.

 

Cloud Integration Templates also allow you to define many types of transformations that are not available in a standard Data Synchronization or Data Replication task. Using Cloud Integration Templates you can create comprehensive data flows that have historically only been possible by using Informatica PowerCenter.

 

The example above illustrates a very simple use case. You can do much more using Cloud Integration Templates. And that’s not all! You can use the Informatica Cloud REST API to invoke all Templates from your external environment (Java, Apex etc.).

 

We’ll look at that in the next few posts!