Big Data Management : 2018 : December

Introduction

Team-based development refers to the capabilities in Informatica Big Data Management (BDM) that allow multiple developers to access, share, collaborate on, and reuse objects developed by others within the team. BDM has several capabilities that let developers work collectively without stepping on each other's work and without accidental overwrites.

Integration with version control system

The Model Repository Service (MRS) can be integrated with any supported version control system, such as Git, SVN, or Perforce. The Model Repository completely abstracts the complex version control operations from Informatica developers: as developers check objects in and out, MRS seamlessly translates these actions into the necessary operations on the underlying version control system. Integrating the Model Repository Service with an external version control system is a single-step process, as demonstrated in the screenshot here:

BDM integration with an external version control system such as Git

 

When integrated with a version control system, the MRS preserves the latest (current) version of an object in the model repository and all other versions in the external version control system.

Versioned objects

Developers can check objects into or out of the version control system, and can perform such operations on multiple objects at a time.

Versioned objects

Version History

All the historical versions stored in the external version control system can be directly accessed from the Developer tool itself. The View Version History menu opens the Version History pane.

Collaboration

Multiple Informatica developers can work on related objects and collaborate with each other, editing and operating on those objects in parallel. For example, consider a mapping with some reusable and non-reusable transformations. While one developer (developer-1) is editing the mapping, another (developer-2) can edit the data object, and a third (developer-3) can update the reusable transformations used within the mapping. Depending on the complexity of the mapping, multiple users can edit several of its components at the same time. Users can also edit mapplets, workflows, and other related objects simultaneously.

Collaboration in BDM

 

Administrators and other super users can view edits in progress from the Administrator console, as described in the next section.

Collaboration Locks in Admin Console

 

Intent-based object locking

BDM has a built-in capability to acquire write locks on the objects that developers edit. A classic lock-acquisition mechanism would acquire the write lock on an object as soon as a user opens it in the workspace. While this eliminates accidental overwrites, it often becomes an administration overhead when large teams are involved. A developer may simply wish to keep a read-only copy of an object open in the workspace as a reference for something else they are working on. With development teams spread across the globe, acquiring a write lock the moment a user opens an object makes collaboration a nightmare. BDM instead uses intent-based object locking to provide a more seamless collaboration experience: it acquires a write lock on an object only on the first attempt to edit it. This way, many users can have the object open in their workspaces without interfering with the active editors. Intent-based locking is available for all top-level objects, including mappings, profiles, and workflows. Locks acquired by developers are automatically released when the objects are closed.

 

Developers have complete visibility into the objects that are locked by other developers, the time at which each lock was acquired, and other details.

Locked objects in Developer

Administrators and users with elevated privileges can use the Administrator console to similarly manage object locks and release locks that are no longer valid or active.

Intent based locks in Administrator

Summary

Informatica's Big Data Management has capabilities that allow multiple developers to work in parallel in a version-control-enabled model repository. Developers can check objects into and out of the model repository, and these operations are seamlessly propagated to an external version control system such as Git. Big Data Management automatically maintains locks on objects while allowing users to contribute and collaborate.

Introduction

Informatica® Big Data Management allows users to build big data pipelines that can be seamlessly ported onto any big data ecosystem, such as Amazon AWS or Azure HDInsight. A pipeline built in Big Data Management (BDM) is known as a mapping and typically defines a data flow from one or more sources to one or more targets, with optional transformations in between. Mappings and other associated data objects are stored in a Model Repository via a Model Repository Service (MRS). In the design-time environment, mappings are often organized into folders within projects. A mapping can refer to objects across projects and folders. Mappings can be grouped together into a workflow for orchestration; a workflow defines the sequence of execution of various objects, including mappings.

Deployment overview

Mappings, workflows, and other objects developed by Informatica developers are stored in the model repository that the MRS is integrated with. These design-time objects are deployed to the run-time Data Integration Service (DIS) for execution. In a typical enterprise, there is more than one Informatica environment, and code developed in the Development domain is deployed to several non-production environments, such as QA and UAT, before being deployed to Production. While the Development environment contains both design-time and run-time services, it is not necessary for the subsequent environments to be configured with both. To deploy objects from one environment to another, the objects must be added to containers called applications. Applications can be deployed to a run-time DIS or to an application archive (.iar) file. The application archive file can subsequently be deployed to Data Integration Services in the same or a different domain, as depicted below.

BDM Deployment Process

 

There are two recommended deployment models: the classic deployment model and the CI/CD (agile) deployment model. The sections below describe the migration and deployment of objects in each model.

Classic deployment

In classic deployment model, the following process is followed:

  1. The objects to be deployed are first deployed to the run-time DIS of the Development environment
  2. Once unit testing is complete, the objects are migrated to the next environment's MRS (such as QA) via XML export/import or via application export
  3. From the QA MRS, the application is rebuilt and deployed to the QA DIS
  4. Once functional testing is complete, the objects are migrated from the QA MRS to the Production MRS via XML export/import or via application export
  5. From the Production MRS, the application is rebuilt and deployed to the Production DIS

 

Classic deployment model in BDM

 

In this approach, a design-time copy of the mappings and workflows is maintained in the MRS of every environment. The application is rebuilt in each environment and deployed to the corresponding DIS. During the migration of objects from one MRS to another, one of the available replacement strategies can be selected; these include replacing objects from the source upon conflict, reusing the objects in the target repository, and so on. If, upon conflict, the objects in the target repository are not replaced from the source, the application built in each environment may not match the others, because dependency resolution can happen against different versions of the objects, or against different objects altogether.

Agile deployment

In Agile deployment model, the following process is followed:

  1. An application archive is built from the Development repository
  2. The application archive (.iar) file is uploaded to a version control system such as Git
  3. The application archive (.iar) file is then downloaded from the version control system and deployed to the Development DIS using the infacmd CLI
  4. Once unit testing is complete, the same step is repeated to deploy the application to the QA DIS
  5. Once functional testing is complete, the same step is repeated to deploy the application to the Production DIS
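The five steps above can be sketched as a single script built around the two infacmd commands BDM provides for this purpose (oie deployApplication to build the archive, dis deployApplication to deploy it). All domain, service, and application names below are placeholders, and every command is echoed rather than executed so the sketch stands alone; in practice, each environment would use its own domain and credentials.

```shell
#!/bin/sh
# Agile deployment sketch: build one .iar once, deploy it everywhere.
# Commands are echoed by default so the sketch runs without Informatica
# or Git configured; remove the RUN indirection to execute for real.
RUN=${RUN:-echo}

IAR_DIR=/opt/infa/archives     # placeholder output directory
APP=App_Sales                  # placeholder application name

# 1. Build the application archive from the Development repository
$RUN infacmd.sh oie deployApplication -dn Domain_Dev -un deploy_user \
    -sdn Native -rs MRS_Dev -ap Project_Sales/$APP -od $IAR_DIR

# 2. Upload the archive to version control for audit and rollback
$RUN git -C $IAR_DIR add $APP.iar
$RUN git -C $IAR_DIR commit -m "Build $APP"

# 3-5. Deploy the same archive to each environment's DIS in turn
#      (each environment's own domain and credentials in practice)
for DIS in DIS_Dev DIS_QA DIS_Prod; do
    $RUN infacmd.sh dis deployApplication -dn Domain_Dev -un deploy_user \
        -sdn Native -sn $DIS -a $APP -f $IAR_DIR/$APP.iar
done
```

Because the very same .iar file travels through every environment, there is no per-environment rebuild and therefore no drift between what was tested and what runs in Production.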

 

Agile deployment in BDM

In this approach, a single application archive file is used across all environments, so consistency is assured. Though not common, the application archive can optionally be imported into an MRS to maintain a design-time copy of the objects.

Automation

The infacmd CLI can be used to perform deployment in an automated manner. Both deployment models described above can be automated using the CLI. Automation servers such as Jenkins can be used to automate the overall deployment process, as described in the blog: Continuous delivery with Informatica BDM.

Summary

In Big Data Management, there are several ways to migrate and deploy objects from one environment to another. Customers can choose the approach that best suits their needs. All approaches can be automated using the infacmd CLI and automation tools such as Jenkins.

Introduction

Informatica® Big Data Management allows users to build big data pipelines that can be seamlessly ported onto any big data ecosystem, such as Amazon AWS or Azure HDInsight. A pipeline built in Big Data Management (BDM) is known as a mapping and typically defines a data flow from one or more sources to one or more targets, with optional transformations in between. Mappings and other associated data objects are stored in a Model Repository via a Model Repository Service (MRS). In the design-time environment, mappings are often organized into folders within projects. A mapping can refer to objects across projects and folders. Mappings can be grouped together into a workflow for orchestration; a workflow defines the sequence of execution of various objects, including mappings.

 

Deployment process overview

For mappings and workflows to be deployed and executed at run time, they are grouped into applications. An application is a container that holds executable objects such as mappings and workflows. Applications are defined in the Developer tool and deployed to a Data Integration Service for execution. Once deployed, the Data Integration Service persists a copy of the application. An application can also be deployed to a file known as an Informatica application archive (.iar) file, which can subsequently be deployed to a Data Integration Service in the same or a different domain. The overall deployment process flow in BDM is as shown here:

BDM Deployment Process

Automation

The process of deploying a design-time application to an Informatica application archive (.iar) file can be executed via the infacmd CLI with the Object Import Export (oie) plugin. A sample deploy application command is as follows:

infacmd.sh oie deployApplication -dn $infaDomainName -un $infaUserName -pd $infaPassword -sdn $infaSecurityDomain -rs $designTimeMRSName -ap $applicationPath -od $Output_Directory

 

The above example uses several user-defined environment variables, which can be named per individual organization standards. The password provided is case-sensitive. Alternatively, an encrypted password string can be stored in the predefined environment variable INFA_DEFAULT_DOMAIN_PASSWORD; when an encrypted password is used, the -pd option is not required. This command is documented in detail in the Informatica documentation at Command Reference Guide → infacmd OIE Command Reference → Deploy Application.
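One way to populate the user-defined variables that the sample command expects is shown below. Every value is purely illustrative, and the assembled command is printed rather than executed so the sketch is self-contained.

```shell
#!/bin/sh
# Illustrative values for the user-defined variables in the sample command.
# Substitute your organization's own names; these are examples only.
infaDomainName="Domain_Dev"                 # Informatica domain
infaUserName="deploy_user"                  # domain user
infaSecurityDomain="Native"                 # security domain
designTimeMRSName="MRS_Dev"                 # design-time Model Repository Service
applicationPath="Project_Sales/App_Sales"   # path of the application to deploy
Output_Directory="/opt/infa/archives"       # where the .iar file is written

# With an encrypted password stored in INFA_DEFAULT_DOMAIN_PASSWORD,
# the -pd option can be omitted from the command line.
INFA_DEFAULT_DOMAIN_PASSWORD="<encrypted-password>"  # placeholder value
export INFA_DEFAULT_DOMAIN_PASSWORD

# Assemble the command; printed here instead of executed so the sketch
# stands on its own.
cmd="infacmd.sh oie deployApplication -dn $infaDomainName -un $infaUserName \
-sdn $infaSecurityDomain -rs $designTimeMRSName -ap $applicationPath -od $Output_Directory"
echo "$cmd"
```

Note that the assembled command carries no -pd option, relying entirely on the exported INFA_DEFAULT_DOMAIN_PASSWORD variable.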

 

Once the application archive file is created, it can optionally be checked into Git or another version control system for audit and tracking purposes.

 

Subsequently, the application archive file can be deployed to a Data Integration Service in the same or a different domain. Typically, the application archive file is created in a Development domain and is eventually deployed to the QA, UAT, and Production domains. This can be achieved via the infacmd CLI with the Data Integration Service (dis) plugin. A sample deployment command is as follows:

infacmd.sh dis deployApplication -dn $infaDomainName -un $infaUserName -pd $infaPassword -sdn $infaSecurityDomain -sn $dataIntegrationServiceName -a $applicationName -f $applicationArchiveFileName

 

This command is documented in detail in the Informatica documentation at Command Reference Guide → infacmd DIS Command Reference → Deploy Application. Once deployment is successful, the listApplications and listApplicationObjects commands in the dis plugin can be used to list the deployed applications and their contents, respectively. This information can be used for post-deployment verification and sanity checks.
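A post-deployment check built on those two commands might look like the following dry-run sketch. The connection options mirror the dis deployApplication sample above; the -a option on listApplicationObjects is an assumption to verify against the Command Reference, and all names are placeholders.

```shell
#!/bin/sh
# Post-deployment sanity check (sketch): list deployed applications and
# the contents of one of them. Commands are echoed by default so the
# sketch runs without an Informatica install.
RUN=${RUN:-echo}

# Which applications are deployed on the QA DIS?
$RUN infacmd.sh dis listApplications \
    -dn Domain_QA -un deploy_user -sdn Native -sn DIS_QA

# What objects does a given application contain?
# (-a as the application option is an assumption -- verify in the docs.)
$RUN infacmd.sh dis listApplicationObjects \
    -dn Domain_QA -un deploy_user -sdn Native -sn DIS_QA -a App_Sales
```

In an automated pipeline, the output of these commands can be grepped for the expected application and object names to fail the build early if the deployment is incomplete.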

 

Integration with Jenkins

The CLI commands described above can be used to initiate the deployment process from within a Jenkins job. A "Build Step" of type "Execute Shell" can be added to the Jenkins job and configured to execute one of the infacmd commands, as shown in the example below.

 

BDM deployment in Jenkins

 

A sample template file for Jenkins is attached (Jenkins-Template-App-Deployment). The template contains the commands to:

  1. Create an Informatica application archive (.iar) file
  2. Commit the application archive file to Git
  3. Deploy the application to the DIS
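A minimal body for such an "Execute Shell" build step might look like the sketch below. The $WORKSPACE variable is supplied by Jenkins itself; the credential variable, the $INFA_HOME/isp/bin path (the typical install layout), and all Informatica names are illustrative, and the command is echoed so the sketch runs anywhere.

```shell
#!/bin/sh
# Sketch of a Jenkins "Execute Shell" build step body. INFA_HOME and the
# injected password variable would be configured on the Jenkins node;
# the command is echoed here rather than executed.
RUN=${RUN:-echo}
WORKSPACE=${WORKSPACE:-/var/lib/jenkins/workspace/bdm-deploy}  # set by Jenkins

# Encrypted password injected via Jenkins credentials (illustrative name),
# letting us omit -pd from the command line.
INFA_DEFAULT_DOMAIN_PASSWORD="${INFA_PWD:-<encrypted-password>}"
export INFA_DEFAULT_DOMAIN_PASSWORD

$RUN "$INFA_HOME/isp/bin/infacmd.sh" dis deployApplication \
    -dn Domain_QA -un deploy_user -sdn Native \
    -sn DIS_QA -a App_Sales -f "$WORKSPACE/archives/App_Sales.iar"
```

Keeping the password out of the step body (injected by Jenkins, consumed via INFA_DEFAULT_DOMAIN_PASSWORD) avoids leaking credentials into the console log of the build.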

 

Summary

Informatica BDM jobs can be deployed using Jenkins without any need for third-party plugins. infacmd CLI commands can be used directly in Jenkins, just as they can be used in an enterprise scheduling tool.

 

Contributors

  • Keshav Vadrevu, Principal Product Manager
  • Paul Siddal, Big Data Presales Specialist