Configuring an Informatica domain with multiple Kerberos realms

Version 10

    Background & overview

    A Kerberos Key Distribution Center (KDC) contains a copy of the Kerberos database. The master KDC contains the writable copy of the database, which is replicated to the slave KDCs at regular intervals. A Kerberos realm is a set of managed nodes that share the same Kerberos database. A Kerberos principal is a user (or service) that is recognized by the Kerberos system and typically has the format:

    principalName/instanceName@REALM
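
    For example, a hypothetical HDFS service principal for a service running on cluster host node01.example.com in the realm EXAMPLE.COM (placeholder names) would be:

    hdfs/node01.example.com@EXAMPLE.COM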

    The realm name is often the same as the DNS domain name in uppercase letters; for example, hosts in the DNS domain example.com typically belong to the realm EXAMPLE.COM. Informatica Big Data Management (BDM) supports Kerberos authentication for the Informatica domain as well as integration with Kerberos-authenticated Hadoop clusters. When integrating with Kerberos-enabled clusters, BDM relies on Kerberos for both metadata import into the Developer tool and job execution through the Data Integration Service (DIS). The following sections describe how to configure an Informatica domain with multiple Kerberos realms.

     

    Informatica recommends integrating each Informatica domain with a single Kerberos realm.

     

    Server-side configuration

    To integrate an Informatica domain with multiple Kerberos realms, a multi-node domain is required. Informatica can be installed on multiple physical nodes, or multiple instances of the Informatica software can be configured on the same physical node with different sets of ports. If the Kerberos realms are used with different Hadoop distributions, multiple Data Integration Services (DIS) are required. In this configuration, each DIS is integrated with a unique Kerberos realm and Hadoop distribution.

    For example, consider a scenario where the Informatica domain Domain_01 needs to be integrated with a Cloudera cluster that relies on Kerberos realm REALM-01 and a Hortonworks cluster that relies on Kerberos realm REALM-02. Such a setup requires two Data Integration Services, DIS_01 and DIS_02. DIS_01 is configured on node node_01 and DIS_02 is configured on a different node, node_02. Since multiple Hadoop distributions are involved, multiple Cluster Configuration Objects (CCO) are also created. For example, CCO_Cloudera_CDH and CCO_HortonWorks_HDP are created, where CCO_Cloudera_CDH is used to execute jobs submitted to DIS_01 on node_01 that rely on Kerberos realm REALM-01, and CCO_HortonWorks_HDP is used to execute jobs submitted to DIS_02 on node_02 that rely on Kerberos realm REALM-02. Steps for this configuration are as follows:

    • Create DIS_01 through the Administrator console or the CLI
    • Create DIS_02 through the Administrator console or the CLI
    • For each DIS, configure the following properties:
      • In the Administrator console, on the Manage tab > Services and Nodes sub-tab, select the DIS. In the workspace on the right, go to the Properties tab > Execution Options:
        • Specify the Kerberos keytab location in the Hadoop Kerberos Keytab property
        • Specify the Kerberos service principal name (SPN) in the Hadoop Kerberos Service Principal Name property
        • Set the Hadoop Distribution Directory property based on the Hadoop distribution that this DIS needs to connect to. For example, /opt/Informatica/services/shared/hadoop/cloudera_cdh5u8

      • In the Administrator console, on the Manage tab > Services and Nodes sub-tab, select the DIS. In the workspace on the right, go to the Processes tab > Custom Properties:
        • Add a custom property named JVMOption with the value -Djava.security.krb5.conf=$INFA_HOME/services/shared/security/krb5.conf

    (Screenshot: DIS custom property configuration)

      • At the operating system level, copy the krb5.conf file to the following locations (a sample multi-realm krb5.conf is shown after this list):
        1. $INFA_HOME/services/shared/security
        2. $INFA_HOME/java/jre/lib/security
    • Perform any post install configuration as documented in the Hadoop Integration Guide
    • Restart the Data Integration Service
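
    A single krb5.conf must describe every realm that the domain integrates with. The following is a minimal sketch for the REALM-01/REALM-02 scenario above; the KDC host names and the domain-to-realm mappings are placeholders that must be replaced with the actual values for your environment:

    [libdefaults]
        default_realm = REALM-01
        dns_lookup_realm = false
        dns_lookup_kdc = false

    [realms]
        REALM-01 = {
            kdc = kdc01.cdh.example.com
            admin_server = kdc01.cdh.example.com
        }
        REALM-02 = {
            kdc = kdc02.hdp.example.com
            admin_server = kdc02.hdp.example.com
        }

    [domain_realm]
        .cdh.example.com = REALM-01
        .hdp.example.com = REALM-02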

     

    Client-side configuration

    Each Developer tool installation is configured to integrate with a specific Hadoop distribution and Kerberos realm. The Developer tool can be configured to use multiple Kerberos realms and Hadoop distributions by using the -data option in run.bat. This requires making multiple copies of the batch file in the Developer tool installation location. For each copy of run.bat, perform the following configuration:

    • Make multiple copies of the run.bat file - each batch script will point to one Kerberos realm and one Hadoop distribution
    • Edit each run.bat file in the $INFA_HOME\clients\DeveloperClient location

    • Add the following content to the run.bat file above the developerCore.exe line. Replace the paths, filenames, principal name and the keytab name accordingly
      • set KRB5CCNAME=C:\secureLocation\infaKRB_cc
      • ..\java\jre\bin\kinit.exe -J-D"java.security.krb5.conf=C:\secureLocation\krb5.conf" -k -t C:\secureLocation\principal.keytab principalName/hostName@REALM
      • Where:
        • C:\secureLocation\infaKRB_cc is the location of the Kerberos credentials (ticket) cache
        • C:\secureLocation\krb5.conf is the location of the Kerberos configuration file
        • C:\secureLocation\principal.keytab is the Kerberos keytab file to be used for metadata import
        • principalName/hostName@REALM is the Kerberos service principal name used for the metadata import
    • Replace the developerCore.exe line with the following:
      • developerCore.exe -data C:\HDP_WS -vmargs -DINFA_HADOOP_DIST_DIR=hadoop\hortonworks_2.6 -Djava.security.krb5.conf=C:\secureLocation\krb5.conf
      • Where:
        • C:\HDP_WS is a workspace directory dedicated to this Kerberos realm and Hadoop distribution combination; each copy of run.bat must point to its own workspace through the -data option
        • C:\secureLocation\krb5.conf is the actual path of the Kerberos configuration file
        • hadoop\hortonworks_2.6 points to the Hadoop distribution that the client is expected to connect to. Each Hadoop distribution is represented as a subdirectory within the Informatica client installation location $INFA_HOME\clients\DeveloperClient\hadoop. This value must match the value of the property -DINFA_HADOOP_DIST_DIR in developerCore.ini
    • Overall, the modified run.bat looks like the sketch shown below:
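
      The following is a minimal sketch for the hypothetical Hortonworks/REALM-02 instance from the example above; all paths, the principal name, and the keytab name are placeholders to be replaced with actual values:

      @echo off
      rem Point the Kerberos ticket cache to a secure, realm-specific location
      set KRB5CCNAME=C:\secureLocation\infaKRB_cc

      rem Obtain a Kerberos ticket for this realm before launching the Developer tool
      ..\java\jre\bin\kinit.exe -J-D"java.security.krb5.conf=C:\secureLocation\krb5.conf" -k -t C:\secureLocation\principal.keytab principalName/hostName@REALM-02

      rem Launch the Developer tool with a dedicated workspace and Hadoop distribution
      developerCore.exe -data C:\HDP_WS -vmargs -DINFA_HADOOP_DIST_DIR=hadoop\hortonworks_2.6 -Djava.security.krb5.conf=C:\secureLocation\krb5.conf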

    • Restart the Developer tool by launching it through run.bat instead of double-clicking the Developer.exe file
    • Perform the steps to configure the Cluster Configuration Object (CCO) on the client side to enable metadata import. This is detailed in the Hadoop Integration Guide
    • Perform any post install configuration as documented in the Hadoop Integration Guide
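
    To verify that the kinit step in run.bat obtained a ticket, the klist utility bundled with the same JRE can be pointed at the configured cache. This is an optional sanity check, not part of the documented procedure; the cache path is the placeholder used above, and klist options can vary slightly across JRE versions:

    ..\java\jre\bin\klist.exe -c C:\secureLocation\infaKRB_cc

    A successful kinit shows a krbtgt/REALM-02@REALM-02 entry for the realm in question.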

     

    Support information

    • This approach is certified on Big Data Management 10.2. Prior versions are not certified.
    • This approach is currently supported for the BDM Developer tool only
    • This approach is certified only when the Hadoop cluster is Kerberized and the Informatica domain is not Kerberized
    • Other client tools, such as Analyst and Reference Data Management, are not supported
    • Only the Big Data Management product is certified with this approach. Other Big Data and non-Big Data products, including Informatica Data Quality, Enterprise Information Catalog, and Intelligent Data Lake, are not supported

     


    Authors

    • Naveen Babu Garla
    • Keshav Vadrevu

     

    Applicable to

    • Informatica Big Data Management 10.1.1 and above