
Big Data Management

17 Posts authored by: Rashmi Rekha Gogoi

What are we announcing?

Informatica 10.2.2 HotFix 1 Service Pack 2

 

Who would benefit from this release?

This release is for all Big Data customers and prospects who want to take advantage of updated Hadoop distribution support as well as fixes to the core platform, connectivity, and other functionality. You can apply this service pack after you install or upgrade to Informatica 10.2.2 HotFix 1. If you are already on 10.2.2 HotFix 1 Service Pack 1, you can install 10.2.2 HotFix 1 Service Pack 2 directly.

 

What’s in this release?

Big Data PAM

Distribution Support:

  • Cloudera CDH: 6.2, 6.1, 5.16, 5.15, 5.14, 5.13
  • Hortonworks HDP: 2.6.x, 3.1.x
  • MapR: 6.0.1 with MEP 5.0, 6.1 with MEP 6.0
  • Azure HDInsight: 3.6.x WASB, ADLS Gen1, ADLS Gen2
  • Amazon EMR: 5.16.x, 5.20
  • Google Cloud Dataproc 1.3
  • Databricks 5.1, 5.3
  • WANdisco enabled CDH 5.16 and HDP 2.6.5 on RH7

Big Data Streaming

  • Support for latest Cloudera, Hortonworks, MapR, HDInsight and EMR versions
  • Bug fixes and improvements

Enterprise Data Catalog

  • This update provides bug fixes for functional and performance improvements. Informatica recommends that Enterprise Data Catalog customers on 10.2.2 HotFix 1 / 10.2.2 HotFix 1 Service Pack 1 apply this service pack.

Enterprise Data Preparation

  • Functional and performance improvements
  • EDP now supports WANdisco enabled HDP 2.6.5 on RH

 

Release Notes & Product Availability Matrix (PAM)

The Informatica Global Customer Support Team is excited to announce an all-new technical webinar and demo series – Meet the Experts – in partnership with our technical product experts and product management. These technical sessions are designed to encourage interaction and knowledge gathering around some of our latest innovations and capabilities across Data Integration, Data Quality, Big Data, and more. In these sessions, we will strive to provide you with as many technical details as possible, including new features and functionality, and where relevant, show you a demo or product walk-through as well.

 

Topic and Agenda

 

  • Topic: Meet the Experts Webinar - "End-to-End Data Engineering for AI & Analytics on Microsoft Azure"
  • Date: Wednesday, 23 October 2019
  • Time: 9:00 AM Pacific Daylight Time (PDT)
  • Duration: 1 Hour
  • Webinar Registration Link: Webinars – Webcasts | Informatica Talks | Informatica
  • Speakers:
    • Sumeet Agrawal, Product Management, Informatica 
    • Vamshi Sriperumbudur, Product Marketing, Informatica

 

Successful next-generation AI and analytics projects require you to ingest, process, and govern all types of data at all latencies. It can be a challenge—but it doesn't have to be.

 

Our complimentary webinar, "End-to-End Data Engineering for AI & Analytics on Microsoft Azure," shows you how Informatica's Data Engineering portfolio can help you ingest all types of data into the Azure Data Lake Store (ADLS), batched or in real-time. You will learn about:

 

  • Building data pipelines to feed ADLS with data engineering integration in a Spark serverless mode
  • Leveraging Spark Structured Streaming for real-time analytics that extend to the edge
  • Finding, preparing, and operationalizing trusted data with Informatica Enterprise Data Preparation
  • Performing powerful search, data lineage, and impact analysis with Enterprise Data Catalog

 

Whether you've just started to consider Azure for analytics and AI or you're already using it, you won't want to miss this webinar and demo.

What are we announcing?

Informatica 10.2.2 HotFix 1 Service Pack 1

Who would benefit from this release?

This release is for all Big Data customers and prospects who want to take advantage of updated Hadoop distribution support as well as fixes to the core platform, connectivity, and other functionality. You can apply this service pack after you install or upgrade to Informatica 10.2.2 HotFix 1.

What’s in this release?

Big Data PAM

Distribution Support:

  • Cloudera CDH: 6.2, 6.1, 5.16, 5.15, 5.14, 5.13
  • Hortonworks HDP: 2.6.x, 3.1.x
  • MapR: 6.0.1 with MEP 5.0, 6.1 with MEP 6.0
  • Azure HDInsight: 3.6.x WASB, ADLS Gen1, ADLS Gen2
  • Amazon EMR: 5.16.x, 5.20
  • Google Cloud Dataproc 1.3
  • Databricks 5.1, 5.3

Big Data Streaming

  • Apache Kafka version 2.3.x support

Enterprise Data Catalog

  • The update provides bug fixes for functional and performance improvements. Informatica recommends that Enterprise Data Catalog customers on 10.2.2 HF1 apply this service pack.

Enterprise Data Preparation

  • Ability to change delimiters and text qualifiers during file preparation

Release Notes & Product Availability Matrix (PAM)


 

Topic and Agenda

 

 

To get the most value from your data, you need to maintain a robust, stable production environment. Informatica Big Data Management (BDM) has been enhanced to help you do that with integrated DevOps and DataOps.

 

Join our complimentary webinar, "Operationalize Big Data Management With Integrated DevOps and DataOps," to learn more about what's new in BDM.

 

You will learn about:

  • Leveraging version control systems like Git
  • Invoking Informatica BDM processes from open source technologies like Jenkins
  • Using concurrency, stability, and other operationalization enhancements

 

Don't miss this opportunity to operationalize your big data management and extract more value from your big data.

What are we announcing?

Informatica 10.2.2 HotFix 1

 

Who would benefit from this release?

This release is for all Big Data and Enterprise Data Catalog customers and prospects who want to take advantage of the new capabilities and updated Hadoop distribution support. The release also includes fixes to core platform and connectivity. It includes support for new environments as well as fixes to support stable deployments.

 

What’s in this release?

This release includes Big Data Management, Big Data Quality, Big Data Streaming, Enterprise Data Catalog, and Enterprise Data Preparation capabilities.

  • Big Data PAM Update
    • Distribution Support:
      • Cloudera CDH: 6.2, 6.1, 5.16, 5.15, 5.14, 5.13
      • Hortonworks HDP: 2.6.x, 3.1 (Tech Preview)
      • MapR: 6.0.1 with MEP 5.0, 6.1 with MEP 6.0
      • Azure HDInsight: 3.6.x WASB, ADLS Gen1
      • Amazon EMR: 5.16.x, 5.20
      • Databricks 5.1
    • New Relational Systems:
      • Oracle 18c (Source/Target)
  • Enterprise Data Catalog Updates
    • Scanners
      • SAP HANA (Metadata Only): New scanner for SAP HANA that can extract object and lineage metadata. Lineage metadata includes calculation view to table lineage. Profiling is not supported.
      • ADLS Gen 2 (Metadata Only): New scanner for Azure Data Lake Store Gen 2 to extract metadata from files and folders. All formats supported by the ADLS Gen 1 scanner are supported for Gen 2. Profiling is not supported.
      • Profiling Warehouse Scanner: Extract profiling and domain discovery statistics from an IDQ or a BDQ profiling warehouse. Users who have already run profiling and enterprise discovery in IDQ/BDQ can now extract these profiling results and visualize them in EDC.
      • SAP PowerDesigner: Extract database model objects from physical diagrams, including internal lineage. Model objects can be linked to physical objects from other scanners.
      • (Tech Preview) Lineage Extraction from Stored Procedures: Ability to extract data lineage at the column level for stored procedures in Oracle and SQL Server.
      • (Tech Preview) Oracle Data Integrator: Ability to extract data lineage at the column level with transformation logic from Oracle Data Integration.
      • (Tech Preview) IBM DataStage: Ability to extract data lineage at the column level with transformation logic from IBM DataStage jobs.
      • Enhanced MS SQL Server scanner: Support for Windows-based authentication using the EDC agent.
    • Scanner Framework
      • Case insensitive linking: Ability to mark resources as case sensitive/insensitive. A new link ID is now generated for every object based on the above property. Useful for automatic lineage linking where ETL/BI tools refer to the object using a different case compared to the data source.
      • Offline scanner: Support added for Sybase, IBM DB2 LUW, IBM DB2 z/OS, Netezza, MySQL, JDBC, PowerCenter, Informatica Platform, File Systems, Tableau, MicroStrategy, Hive, HDFS, Cloudera Navigator, and Atlas.
      • Custom Scanner Enhancements: The following new features are available for users of custom scanners:
        • Pre-Scripts: Users can now configure pre-scripts that are run before scanner execution. This allows running any custom extraction jobs or setup tasks.
        • File Path: Users can now configure a file path to pick up the scanner assets CSV, instead of uploading it to the catalog. The file should be either mounted or copied to the Informatica Domain machine with read permissions. This helps with automating and scheduling custom scanner runs.
      • Custom Scanner Framework Enhancements: The following new features are available for the developers of custom scanners:
        • Assign icons:  Ability to assign icons to types in the custom model
        • Detailed Lineage: Custom scanners can now include detailed lineage views which are rendered like transformation lineage from any native scanner.
        • Custom relationships: Ability to add custom relationships to be displayed in the relationship diagrams
      • Business User Experience
        • Search Operators: New search operators - AND, OR, NOT, double quotes, title: and description: for advanced search queries.
        • Search Tabs: Administrators can now create “Search Tabs” designed to personalize search experience by user groups and individual users. These search tabs are created with pre-selected facets that apply to a set of users/groups. EDC creates the following search tabs by default: “All”, “Data Objects”, “Data Elements”, “Business Terms”, “Reports” and “Resources”.
      • EDC Plug-In

        • Enterprise Data Catalog Tableau Extension: Enterprise Data Catalog Extension is a native extension for Tableau dashboards that you can use in Tableau Desktop, Tableau Server, and all web browsers supported by Tableau version 2018.2.x onwards.
      • Supportability
        • Progress logs for re-index and bulk import/export
        • The log collection utility is now expanded to support Cloudera in 10.2.2 HF1
      • (Tech Preview) Data Provisioning
        • (Tech Preview) Data Provisioning: After discovery, users can now move data to a target where it can be analyzed. EDC works with IICS to provision data for end users. Credentials are supplied by the users for both the source and the target.
          • Supported Sources in this release: Oracle, SQL Server
          • Supported Targets in this release: Amazon S3, Tableau Online, Oracle, Azure SQL DB
        • (Tech Preview) Live Data Preview: Users can now preview source data at the table level by providing source credentials.
      • CLAIRE
        • Intelligent Glossary Associations: The tech preview capability in 10.2.2 for linking glossaries to technical metadata is now GA. Additionally, EDC now supports auto-association of glossaries to objects at the table/file level.
      • PAM
        • Deployment Support
          • Cloudera: CDH 6.2, 6.1, 5.16, 5.15, 5.14
          • Hortonworks: HDP 2.6.x, (Tech Preview) HDP 3.1
        • Source Support
          • Hive, HDFS on CDH 6.1, 6.2
          • Hive, HDFS on HDP 3.1
          • Oracle Data Integrator 11g, 12c
          • Profile Warehouse on Oracle, SQL Server and IBM DB2 for Informatica 10.1.1 HF1, 10.2, 10.2.1, 10.2.2
          • SAP PowerDesigner 7.5.x to 16.x
          • SAP Hana DB 2.0
  • EDP Updates
    • Hadoop distribution support (aligned with Big Data Management)
    • Performance improvement in Preparation
    • Search alignment with Enterprise Data Catalog: Alignment with EDC in terms of search results and user experience (example: search tabs)
  • Connectivity Updates
    • Sqoop mapping with override query using aliases support in Spark mode
    • PAM certification for HBase for ecosystems: Cloudera, Hortonworks, MapR, AWS and Azure.
    • "--boundary-query" for specifying custom SQL query support for Sqoop import
  • Platform PAM Update
    • Oracle 18c - added
    • JVM support update: Azul OpenJDK 1.8.0_212 – updated
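For readers unfamiliar with the Sqoop "--boundary-query" option mentioned in the Connectivity Updates above: it replaces Sqoop's default MIN/MAX split computation with your own SQL. A minimal sketch of how such an import command might be assembled (the connection string, table, and column names are hypothetical placeholders, not from this release):

```python
# Sketch: assembling a Sqoop import that uses --boundary-query to control
# how data is split across mappers. All identifiers are hypothetical.

boundary_sql = (
    "SELECT MIN(order_id), MAX(order_id) FROM orders WHERE status = 'CLOSED'"
)

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",
    "--table", "ORDERS",
    "--split-by", "ORDER_ID",
    # Custom boundary query overrides Sqoop's default MIN/MAX over the table.
    "--boundary-query", boundary_sql,
    "--target-dir", "/data/orders",
]

print(" ".join(sqoop_cmd))
```

The boundary query must return exactly two values (the lower and upper split bounds), which Sqoop then divides among its mappers.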

 

Release Notes & Product Availability Matrix (PAM)

 

Informatica 10.2.2 HotFix 1 Release Notes: https://docs.informatica.com/big-data-management/shared-content-for-big-data/10-2-2-hotfix-1/big-data-release-notes/abstract.html

 

PowerExchange Adapters 10.2.2 HotFix 1 Release Notes: https://docs.informatica.com/data-integration/powerexchange-adapters-for-informatica/10-2-2-hotfix-1/powerexchange-adapters-for-informatica-release-notes/abstract.html

 

PAM for Informatica 10.2.2 HotFix 1: https://network.informatica.com/docs/DOC-18280

 

You can download the Hotfixes from here.


 

Topic and Agenda

 

 

If you need to integrate and ingest large amounts of data at speed and scale, Informatica has two new big data cloud services to help.

 

Join our complimentary Meet the Experts webinar on July 16 to discover the capabilities of Informatica Intelligent Cloud Services (IICS) Integration at Scale and IICS Ingestion at Scale. You will learn:

  • How to lower overall TCO with CLAIRE-based auto scaling and provisioning of serverless Spark support
  • How to manage streaming and IoT data with real-time monitoring and lifecycle management
  • How to accelerate AI/ML and advanced analytics projects with Informatica Enterprise Data Preparation and DataRobot

 

If you want to create proof of concept for a big data project in just six weeks, turn your data lake into a modern data marketplace, and more, you won't want to miss this deep dive and demo.


 

Topic and Agenda

 

 

Once the host approves your request, you will receive a confirmation email with instructions for joining the meeting.

 

Here is the agenda for the webinar:

  • Spark Architecture
    • Spark Integration with BDM
    • Spark shuffle
    • Spark dynamic allocation
  • Journey from Hive, Blaze, to Spark
  • Spark troubleshooting and self-service
  • Spark Monitoring
  • References
  • Q & A
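As background for the "Spark dynamic allocation" agenda item above: dynamic allocation is a standard Spark feature controlled by a handful of configuration properties, and it depends on the external shuffle service so that shuffle files survive executor release. The property names below are standard Spark configuration keys; the values are illustrative only, not BDM recommendations:

```python
# Sketch: standard Spark properties behind dynamic allocation and the
# external shuffle service it depends on. Values are illustrative.

spark_conf = {
    "spark.dynamicAllocation.enabled": "true",
    # Dynamic allocation needs shuffle files to outlive executors;
    # the external shuffle service serves them after release.
    "spark.shuffle.service.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "2",
    "spark.dynamicAllocation.maxExecutors": "50",
    # How long an executor may sit idle before it is released.
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
}

for key, value in sorted(spark_conf.items()):
    print(f"{key}={value}")
```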

 

Speaker Details

 

The session will be presented by Vijay Vipin and Ramesh Jha, both Informatica BDM SMEs. They have been supporting our customers for over five years and have developed deep expertise across all aspects of the BDM product portfolio.

What are we announcing?

Informatica 10.2.1 Service Pack 2

Who would benefit from this release?

This release is for all Big Data customers and prospects who want to take advantage of updated Hadoop distribution support as well as fixes to core platform, connectivity, and other functionality. You can apply this service pack after you install or upgrade to Informatica 10.2.1.

What’s in this release?

Big Data PAM Update

Applies to Big Data Management, Big Data Quality, and Big Data Streaming

  • Distribution Support
    • Cloudera CDH: 5.11.x, 5.12.x, 5.13.x, 5.14.x, 5.15.x
    • Hortonworks HDP: 2.5.x, 2.6.x
    • MapR 6.0 with MEP 5.x
    • Amazon EMR 5.14.x
    • Azure HDInsight 3.6.x

Enterprise Data Lake

  • Bug fixes for functional and performance improvements

Enterprise Data Catalog

  • Bug fixes for functional and performance improvements

Informatica recommends that all Enterprise Data Catalog customers on 10.2.1 apply this service pack.

Informatica 10.2.1 SP2 Release Notes

PAM for Informatica 10.2.1 SP2

You can download the Hotfixes from here.

What are we announcing?

Informatica 10.2.2 Service Pack 1

 

Who would benefit from this release?

This release is for all Big Data customers and prospects who want to take advantage of updated compute cluster support, updated streaming capabilities and security enhancements as well as fixes to core platform, connectivity, and other functionality. You can apply this service pack after you install or upgrade to Informatica 10.2.2.

 

What’s in this release?

 

Big Data PAM Update

Applies to Big Data Management, Big Data Quality, and Big Data Streaming

 

  • Distribution Support:
    • Cloudera CDH: 5.15, 5.16, 6.1
    • Hortonworks HDP: 2.6.5, 3.1 (Tech Preview)
    • MapR: 6.0.1 with MEP 5.0, 6.1 MEP 6.0
    • Azure HDInsight: 3.6.x WASB
    • Amazon EMR 5.16.x, EMR 5.20
    • Databricks 5.1
  • Security Enhancements:
    • Security enhancements for AWS. The following security mechanisms on AWS are now supported:
      • At rest:
        • SSE-S3
        • SSE-KMS
        • CSE-KMS
      • In transit:
        • SSE-SE
        • SSE-KMS
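As a rough illustration of what the at-rest options above mean on the S3 API side (a minimal sketch using standard S3 request parameters; the helper function and its mapping are our own illustration, not part of BDM's configuration):

```python
# Sketch: the extra S3 upload parameters that correspond to the at-rest
# encryption modes listed above. BDM configures these internally; this
# helper is illustrative only.

def sse_upload_args(mode, kms_key_id=None):
    """Return S3 put_object-style kwargs for a given encryption mode."""
    if mode == "SSE-S3":
        # S3-managed keys: server encrypts with AES-256.
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        # Server-side encryption with an AWS KMS key.
        args = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id:
            args["SSEKMSKeyId"] = kms_key_id
        return args
    if mode == "CSE-KMS":
        # Client-side encryption: data is encrypted locally with a KMS
        # data key before upload, so no server-side header is sent.
        return {}
    raise ValueError(f"unknown mode: {mode}")

print(sse_upload_args("SSE-KMS", "alias/my-key"))
```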

 

Big Data Streaming

  • Connectivity and Cloud
    • New connectivity: Native connectivity to Amazon S3 targets
    • Connectivity enhancements: Filename port support for HDFS targets
  • Stream processing and analytics
    • Message header support in streaming sources: JMS standard headers support
    • Enhanced MapR distribution support:
      • Support for Kafka in MapR distributions
      • Support for secured MapR Streams

Connectivity

  • Security Enhancements:
    • Certified SQL Server for SSL support with Sqoop

 

Enterprise Data Lake - now renamed to Enterprise Data Preparation

  • Product Rename:
    • With this release, Informatica Enterprise Data Lake is now renamed to Informatica Enterprise Data Preparation.
  • Distribution Support:
    • Cloudera CDH: 5.15, 5.16, 6.1
    • Hortonworks HDP: 2.6.5, 3.1 (Tech Preview)
    • MapR: 6.0.1 with MEP 5.0, 6.1 MEP 6.0
    • Azure HDInsight: 3.6.x WASB
    • Amazon EMR 5.16.x, EMR 5.20
  • Functional Improvements:
    • Users can preview and prepare Avro and Parquet files in the data lake.
    • Users can revert all data type inferencing within a single worksheet during data preparation.
    • Administrators can disable automatic data type inferencing for all worksheets in all projects.

 

Enterprise Data Catalog

  • Distribution Support Updates for EDC External Cluster Support:
    • Cloudera CDH: 6.1
    • Hortonworks HDP: 3.1 (Tech Preview)

 

Release Notes & Product Availability Matrix (PAM)

 

Informatica 10.2.2 SP1 Release Notes: https://docs.informatica.com/big-data-management/shared-content-for-big-data/10-2-2-service-pack-1/big-data-release-notes.html

 

PAM for Informatica 10.2.2 SP1:   https://network.informatica.com/docs/DOC-18072#comment-37896

Executive Summary:

 

Informatica Big Data Management (BDM) and Informatica Big Data Quality (BDQ) mappings that perform Decimal manipulations on source data larger than 256 MB can potentially produce data inconsistencies in decimal port values when executed on the Blaze engine. This issue is tracked as bug BDM-24814 and is known to manifest under the following conditions:

 

  1. Active transformations with Decimal ports

     Potential data loss and dropped rows:

     • Filters and Routers with Decimal datatype ports in the filter condition
     • Joiners with join condition ports of Decimal datatype

  2. Passive transformations with Decimal ports

     Potential data inconsistency, with Decimal columns changed to NULL:

     • Expressions with decimal manipulation

 

Affected Software

 

Informatica BDM/BDQ 10.0.x

Informatica BDM/BDQ 10.1.x

Informatica BDM/BDQ 10.2.0, 10.2.0 HF1, 10.2.0 HF2

Informatica BDM/BDQ 10.2.1, 10.2.1 SP1

Informatica BDM/BDQ 10.2.2

 

Suggested Actions

 

Step 1: Refer to the Executive Summary and knowledge base article KB-575249 to identify whether you are impacted.

Step 2: If impacted, perform the following task to resolve the issue:

  • BDM/BDQ 10.2.1 – Apply Service Pack 2 (tentative release date is mid-May)
  • BDM/BDQ 10.2.2 – Apply Service Pack 1 (tentative release date is mid-May)
  • BDM/BDQ 10.1.1 HF1 – Apply the Emergency Bug Fix (EBF) available for download from https://tsftp.informatica.com/updates/Informatica10/10.1.1 HotFix1/EBF-14519
  • Other BDM/BDQ versions: Please reach out to Informatica Global Customer Support

 

Informatica strongly recommends applying this patch for all Informatica environments that fall into the problem scope defined in the executive summary.

 

Frequently Asked Questions (FAQs) related to this advisory:

 

Q1: What is the scope of this advisory?

A: This advisory applies to Informatica Big Data Management 10.0, 10.1.0, 10.1.1, 10.2.0, 10.2.1, and 10.2.2, and only to mappings running in Hadoop pushdown mode on the Blaze engine. This advisory is not applicable if you are using other Informatica platform products such as Informatica Data Quality or PowerCenter.

 

Q2: I am using one of the affected product versions and also have other Emergency Bug Fixes (EBFs) applied. What should I do?

A: You might need a combination EBF that includes the previous fix(es) as well as the fix for the issue covered in this advisory. Please contact Informatica Support to confirm if you would need a combination EBF.

 

Q3: Whom should I contact for additional questions?
A: For all questions related to this advisory, please contact your nearest Informatica Global Customer Support center.

https://www.informatica.com/services-and-training/support-services/contact-us.html

 

Disclaimer

INFORMATICA LLC PROVIDES THIS INFORMATION ‘AS IS’ WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT.

 

Revisions

V1.0 (April 22, 2019): Customer advisory published


 

Topic and Agenda

 

 

Once the host approves your request, you will receive a confirmation email with instructions for joining the meeting.

 

Here is the agenda for the webinar.

 

1. Overview of Blaze architecture and components

2. Blaze configuration (hadoopEnv.properties and beyond)

3. Logs location and collection

4. Common issues and troubleshooting

5. Tips and Tricks

 

This session is intended for BDM customers who execute their mappings, profiles, and scorecards using the Blaze execution engine. By the end of this session, customers will have insight into the Blaze architecture, the components and services associated with Blaze, how to troubleshoot the most common issues, and how to access and provide the logs that Informatica Global Customer Support requires for troubleshooting.

 

Speaker Details

 

The presenter for this session is Sujata, an Informatica GCS veteran who has supported the IDQ and BDM products for the past five years and specializes in troubleshooting Blaze-related issues.



What are we announcing?

Informatica Big Data Release 10.2.2

 

Who would benefit from this release?

This release is for all customers and prospects who want to take advantage of the latest Big Data Management, Big Data Quality, Big Data Streaming, Enterprise Data Catalog, and Enterprise Data Lake capabilities.

 

What’s in this release?

This update provides the latest ecosystem support, security, connectivity, cloud, and performance while improving the end-to-end user experience.

 

Big Data Management (BDM)

 

Enterprise Class

  • Zero client configuration: Developers can now import the metadata from Hadoop clusters without configuring Kerberos Keytabs and configuration files on individual workstations by leveraging the Metadata Access Service. Metadata Access Service now supports OS Profiles (when enabled) and can be executed on multiple nodes as a GRID
  • Mass ingestion: Data analysts can now ingest relational data into HDFS and Hive for both initial and incremental loads. Mass Ingestion service can now fetch incremental data based on date columns or numeric columns, persist the last values fetched in the previous run and automatically use them in the subsequent runs.
  • Sqoop enhancements: The Sqoop connector now supports high levels of concurrency and the ability to fetch incremental data
  • Bitbucket Support: Big Data Management administrators can now configure BitBucket (in addition to Perforce, SVN, and Git) as the external versioning repository
  • Go-Live assistance: Release managers can now incrementally deploy objects into applications instead of overwriting the entire applications
  • Robustness & Concurrency: The Data Integration Service is now highly robust and can process 6 times more concurrent requests than it did in previous releases. The startup time of the Data Integration Service is improved by 2x.
  • Resilience: The Data Integration Service can now automatically reparent to the jobs that continue to run on the Hadoop clusters - even after the Data Integration Service experiences a crash or unexpected restart
  • Queuing: The Data Integration Service is now enhanced to queue the jobs submitted to it and persist the requests, so that requests do not have to be resubmitted if the Data Integration Service experiences a crash or unexpected restart
  • REST Operations Hub: Operations teams can now perform REST queries that fetch the job status, row level statistics and other monitoring information for deployed mappings
  • Dynamic mappings: Dynamic mappings can now be used across various data types and various ecosystems including AWS and Azure 
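The mass ingestion behavior described above (persisting the last values fetched and reusing them on the next run) is the classic incremental-load watermark pattern. A minimal, product-independent sketch, with file and field names of our own choosing (the Mass Ingestion Service handles this internally):

```python
# Sketch: incremental ingestion with a persisted watermark. Names here
# are illustrative, not part of the Mass Ingestion Service.
import json
from pathlib import Path

STATE_FILE = Path("ingest_state.json")
STATE_FILE.unlink(missing_ok=True)  # start fresh for this demo

def load_watermark(default=0):
    """Read the last ingested key persisted by the previous run."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())["last_id"]
    return default

def ingest(rows):
    """Ingest only rows newer than the persisted watermark."""
    last_id = load_watermark()
    new_rows = [r for r in rows if r["id"] > last_id]
    if new_rows:
        STATE_FILE.write_text(
            json.dumps({"last_id": max(r["id"] for r in new_rows)})
        )
    return new_rows

print(ingest([{"id": 1}, {"id": 2}]))  # initial load: both rows
print(ingest([{"id": 2}, {"id": 3}]))  # incremental load: only id 3
```

The same idea applies whether the watermark column is a numeric key or a date column, as the announcement notes.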

Advanced Spark

  • Advanced Data Integration: Spark now supports high precision decimals and executes the Python transformation many times faster than previous releases
  • Dynamic mappings: Complex data types such as Arrays, Structs and Maps can now be used in dynamic mappings
  • Debugging made easy: With the introduction of automatic Spark-based data preview, developers can now debug advanced Spark mappings that contain complex types and stateful functions as easily as they preview native mappings
  • CLAIRE integration: Big Data Management now integrates with Intelligent Structure Discovery (that is part of Informatica Intelligent Cloud Services) to provide machine learning capabilities in parsing the complex file formats such as Weblogs

Cloud and Connectivity

  • Core connectivity:
    • SQOOP connector is optimized to run faster and is designed to eliminate staging wherever possible
    • HBase sources and targets can be used with dynamic mappings
    • Schema drift is now supported. Changes in the source systems can now be applied on to the Hive targets
    • Data can now be loaded to Hive in Native mode
  • Amazon ecosystem:
    • Developer productivity is increased with the ability to use S3 and Redshift sources and targets in dynamic mappings
    • S3 data objects now support wild card characters in the file names.
    • File names can be dynamically generated for S3 objects using the target based FileName port
    • Many additional properties in the Redshift data objects can now be parameterized.
  • Azure ecosystem:
    • Developer productivity is increased manifold with the ability to use ADLS and WASB sources and targets in dynamic mappings
    • Intelligent Structure Discovery is now supported with Azure ADLS and WASB
  • Azure Databricks: BDM now offers support for managed cluster computation on Azure Databricks
  • Containerization: Implementation teams can now build docker containers of BDM images and deploy them per their enterprise needs

 

Platform PAM Update

 

  • Operating System Update:
    • RHEL - 7.3 & 6.7 - Added
    • RHEL - 7.0 , 7.1 ,7.2 and 6.5 , 6.6 - Dropped
    • SUSE 12 SP2 - Added
    • SUSE 12 SP0 & SP1 -  Dropped
    • SUSE 11 SP4 - Added
    • SUSE 11 SP2 & SP3 - Dropped
  • Database support:
    • Azure SQL DB  - Added
  • Authentication Support:
    • Windows 2012 R2 & 2016 (LDAP and Kerberos) - Added
    • Azure Active Directory (LDAP only) - Added
  • Tomcat Support:
    • v 7.0.88 (No update)
  • JVM Support Update:
    • Azul OpenJDK 1.8.0_192 - Added
    • Oracle Java - Removed
    • Effective Informatica 10.2.2, the Informatica platform supports Azul OpenJDK instead of Oracle Java, because Oracle changed its Java licensing policy and ended public updates for Java 8 in January 2019. Azul OpenJDK comes bundled with the product.
  • Model Repository - Versioned Controlled
    • BitBucket Server 5.16 Linux (hosting service for repositories) - Added
    • Perforce - 2017.2 - updated
    • Visual SVN - 3.9 - updated
    • Collabnet Subversion Edge - 5.2.2 - updated
  • Others
    • Microsoft Edge Browser (Win 10) 40.15 - updated
    • Internet Explorer -11.x - updated
    • Google Chrome- 68.0.x - updated
    • Safari - 11.1.2 ( MacOS 10.13 High Sierra) - updated
    • Adobe Flash Player  - 27.x - updated

Informatica Docker Utility

  • Use the Informatica Docker Utility to create a custom Docker container image for Big Data Management and then run the image to create an Informatica domain. The Informatica Docker Utility provides a quick and easy way to install the Informatica domain in Docker containers.
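Conceptually, running the resulting image looks like any other Docker deployment. The sketch below assembles such a command; the image name, container name, and port range are hypothetical placeholders, not the utility's actual output:

```python
# Sketch only: what launching a containerized Informatica domain might
# look like. "informatica/bdm-domain:10.2.2" is a hypothetical image name.

docker_run = [
    "docker", "run", "-d",
    "--name", "infa-domain",
    "--hostname", "infa-node1",
    "-p", "6005-6010:6005-6010",  # domain service ports (illustrative)
    "informatica/bdm-domain:10.2.2",
]

print(" ".join(docker_run))
```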

 

Big Data Streaming (BDS)

 

Enterprise Class Streaming Data Integration

  • CLAIRE integration: Big Data Streaming now integrates with Intelligent Structure Discovery (that is part of Informatica Intelligent Cloud Services) to provide machine learning capabilities in parsing the complex file formats and support dynamically evolving schemas
  • Resilience: Data Integration Service can now automatically reparent to the jobs that continue to run on the Hadoop clusters - even after DIS experiences a crash or unexpected restart
  • Queuing: The Data Integration Service is now enhanced to queue the jobs submitted to it and persist the requests, so that requests do not have to be resubmitted in case the Data Integration Service experiences a crash or unexpected restart
  • Incremental Deployment: Release managers can now incrementally deploy objects into applications instead of overwriting entire applications
  • Latest Spark Version support: Big Data Streaming now supports Spark 2.3.1
  • Java Transformation support on HType Data

 

Advanced Cloud Support

  • Amazon ecosystem:
    • Profile based authentication support for AWS Kinesis service
    • Cross Account authentication support for AWS Kinesis Streams service
    • Support for secure EMR cluster with Kerberos authentication
  • Azure ecosystem:
    • Support for deploying streaming mappings with an Azure EventHub source in Azure cloud with an HDInsight cluster

 

Enhanced Streaming Data Processing and Analytics

  • Spark Structured Streaming support
    • Big Data Streaming now supports processing based on event time
    • Big Data Streaming can now ingest “out of order” data and process it in the order in which it was generated at the source
  • Message Header support
    • Better streaming data processing based on message metadata
    • Supports header metadata-based analytics without parsing the complete message
  • Machine Learning and Advanced Analytics support
    • Supports execution of Python script in streaming mapping with improved performance
  • Latest Apache Kafka version support
    • Big Data Streaming now supports Kafka 2.0

 

Intelligent Structure Discovery

 

Intelligent Structure Discovery is now integrated with Big Data Management and Big Data Streaming on Spark to allow high performance parsing of various file types with data drift handling.

  • Performance enhancements
    • Improved runtime performance for some use cases (JSON and XML) by up to 10X compared to the previous release
  • Improved handling of Data Drift with an Unassigned Port
    • Data not identified by the model will be routed to the unassigned port and not dropped
  • Data Type propagation
    • Intelligent Structure Discovery automatically discovers the data types of model fields. When the model is imported into the platform, the identified fields are propagated to the transformation with the corresponding data types
  • Handling of “Special” Characters
    • In Intelligent Structure Discovery models, characters that do not comply with the platform naming convention are automatically replaced with a compliant character
  • Enhanced Parsing Engine
    • Improved handling of XML files (Attributes and namespaces).
    • Supports discovery and parsing of multi-sheet Excel files
  • Improved Design Time
    • The Intelligent Cloud design environment is enhanced with Find functionality and the ability to apply actions to multiple elements (for example, rename)

 

Enterprise Data Lake (EDL)

 

Core Data Preparation

 

  • New Advanced Data Preparation functions: Users can utilize more than 50 new advanced data preparation functions for statistical, text, math, and date/time manipulations. Window functions help with calculations over a data window, such as rolling average, rolling sum, lead/lag, fill, and sessionize. The Cluster and Categorize function uses phonetic algorithms to cluster data and then helps users easily standardize it. The Delete Duplicate Rows function helps remove exact duplicates from the data.
  • Apply Active Rules: Users can apply external pre-built rules with active transformations to support DQ processing like fuzzy matching and consolidation. Expert users can use the Informatica Developer tool to build complex rules, including active transformations, and then expose them to data preparation users. This helps collaboration, standardization, extensibility, and re-usability.
  • Data Preparation for Avro and Parquet files: Users can add Avro and Parquet files to a project, in addition to Hive tables and other file formats such as delimited files and JSONL files. This eliminates the need to create a Hive table on top of the files. Users can structure the hierarchical data in row-column form by extracting specific attributes from the hierarchy, and can expand (or explode) arrays into rows in the worksheet.

 

Self-service and Collaboration

  • Functional and UX Improvements: Users can apply conditions during aggregation and reorder sheets as needed. The recipe panel clearly shows steps that failed during data and recipe refresh. The “back-in-time” functionality is now more on-demand, improving the user experience for edit step, copy/paste step, and so on.
  • Improved CLAIRE based recommendations: Improved user productivity with additional CLAIRE based recommendations for alternate assets that are upstream or related by PK-FK. During a join, users are prompted to review the sampling criteria if there is low overlap of join keys. Users also get new data prep suggestions based on data types, which are handy shortcuts to frequently used functions.
  • Ability to add recipe comments: Users can add comments to recipe steps and view comments by other users for better collaboration and auditing.
  • Save mappings for Recipes: Users have the option to save the mappings corresponding to recipes for a worksheet instead of performing full at-scale execution and creating a new output table. This way, expert IT users can inspect the mapping and execute it at the appropriate time and resource levels.

 

Enterprise Focus

  • Support for S3, ADLS/WASB and MapR-FS files: Users can prepare data files directly from various file systems such as AWS S3, Azure ADLS and WASB, and MapR-FS, in addition to HDFS.
  • Spark Execution: Spark is now the default execution engine for better performance. It also allows data prep users to execute rules built with mapplets that use advanced transformations, such as the Python transformation.
  • Autoscaling on AWS EMR: Customers can start with a minimal number of EMR nodes and then auto-scale based on resource-consumption rules, lowering the overall total cost of ownership for data lakes on AWS.
  • Integration with Informatica Dynamic Data Masking: Data protection and governance are improved using Informatica Dynamic Data Masking. Based on DDM policies, data is masked at various touch points such as preview, prepare, publish, and download.
  • Scalability improvements: Performance, scalability, and longevity improvements have been made in various services to support enterprise-scale deployments with large numbers of users.

 

Enterprise Data Catalog (EDC)

 

  • Collaboration: Data Analysts, Data Scientists, and Line of Business users can now find the most relevant, most trusted datasets for their analytic needs faster with Enterprise Data Catalog (EDC) v10.2.2. EDC 10.2.2 includes both top-down and bottom-up collaboration capabilities that bring to the forefront otherwise deeply siloed knowledge about the trustworthiness and usefulness of datasets. This new capability helps data consumers save weeks, sometimes months, of effort in finding and using the right dataset.
    • Dataset Certifications: With EDC v10.2.2, Subject Matter Experts, Data Stewards, and Data Owners can certify datasets and data elements, adding context information such as data usage and constraints. Using EDC’s machine learning based semantic search, certified datasets surface at the top of search results, guiding users to them among all the similarly named datasets in the organization.
    • Reviews and Ratings: Data consumers like Data Analysts and Data Scientists can now review and rate datasets. EDC pushes highly rated datasets to the top of search results. New facets are available to narrow search results to highly rated datasets only.
  • Questions and Answers: Users can use a new question-and-answer platform that allows subject matter experts to answer the most common questions from data consumers. This helps data consumers find experts, ask questions, and see answers in the context of the dataset. For subject matter experts, this means less work and more reuse of information, as they need not respond to multiple emails and phone calls about the same queries on data.
  • Change Notifications: With change notifications, EDC provides data consumers an easy way to stay on top of any metadata changes happening to their data assets. Users can follow any dataset in the catalog; whenever scanners detect changes to it, both in-app and email notifications are sent to the user. Additionally, super users like database administrators, stewards, and owners can follow entire databases and other metadata resources to get notified of any changes happening in the database.
  • Intelligent Business Glossary Associations: One of the most important and most tedious data governance tasks is associating business glossary terms with physical data assets. In EDC v10.2.2, the glossary association process is much easier. The CLAIRE based AI engine matches the right business glossary terms with the right physical assets at the data element level. This method uses the data domain discovery and data similarity capabilities to power automatic glossary associations, with the goal of making the data stewards and business analysts responsible for this task about 2X more productive.
    • Business Glossary Assignment Report: EDC v10.2.2 includes a new business glossary assignment report at the resource level to help data stewards understand glossary association coverage for a resource in one place. Data stewards can also curate (accept/reject) all glossary recommendations from this new report.
  • Metadata and Profile Filters: Catalog and profile only selected metadata from databases, data warehouses, and big data sources. Users can provide both inclusion and exclusion criteria to filter the datasets that are cataloged and profiled. The filter criteria can be a list of names or regular expressions that are matched against table/view names.
  • Remote Metadata Scanner: Catalog metadata from data sources that are behind a firewall or are remote with port restrictions. With EDC v10.2.2, a direct network connection from the catalog to the data source is no longer required. The Remote Metadata Scanner Utility can be downloaded and set up on a server close to the data source or in the same network, and the extracted metadata can be uploaded to the catalog. Currently, metadata scanning is supported only for Oracle, SQL Server, and Teradata.
  • New Scanners
    • Workday: Manage Workday metadata for governance, risk/compliance, and self-service analytics
    • Google BigQuery: Manage Google BigQuery metadata for governance, risk/compliance, and self-service analytics
  • Performance Improvements: EDC v10.2.2 includes a new graph schema that improves the performance of tasks like parameter assignment (63x faster), resource purge (5x faster) and re-index (2.5x faster). Additionally, there are all round scanner performance improvements in the areas of auto-connection assignments (340x faster), SAP Business Objects scanner (1.5x faster), Oracle scanner (2x faster) and IBM Cognos scanner (2x faster).
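As a rough illustration of how the inclusion/exclusion name filters described above typically behave, the sketch below applies a regular-expression include filter followed by an exclude filter to a list of table names. The table names and patterns are made up for the example; EDC itself evaluates the filters inside the scanner, not via grep.

```shell
# Candidate table names, as a scanner might enumerate them.
printf '%s\n' sales.orders sales.tmp_orders sales.customers hr.employees > tables.txt

# Inclusion criterion: only tables in the 'sales' schema.
# Exclusion criterion: drop anything whose name matches 'tmp'.
grep -E '^sales\.' tables.txt | grep -Ev 'tmp'
```

The surviving names (here `sales.orders` and `sales.customers`) are the only datasets that would be cataloged and profiled.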

 

PAM for Informatica 10.2.2: https://network.informatica.com/docs/DOC-18072

 

Informatica 10.2.2 Release Notes

 

Introduction

Customers use the Informatica Big Data Management (BDM) product to access metadata (using the Developer client tool) as well as data (through the Data Integration Service) for Hadoop-based sources (HDFS files, HBase, Hive, MapR-DB) as well as non-Hadoop sources. One of the major pain points with accessing Hadoop data sources is the non-trivial effort that goes into configuring access to the Hadoop systems, including Kerberos configuration using the kinit tool, keytab files, and so on. This document explains how Metadata Access Service simplifies the configuration effort and also enables more secure metadata access from Hadoop data sources.

Metadata access process without Metadata Access Service

Before Metadata Access Service, importing metadata from a Hadoop data source such as HDFS files, HBase, Hive, or MapR-DB involved the following steps. Note that these steps had to be performed on every developer client installation that needed to be configured to access metadata from a Hadoop data source.

  1. The Informatica developer needs to execute the kinit command on the developer client box to obtain and cache the initial ticket-granting ticket from the KDC server. This requires providing the appropriate keytab and krb5 configuration files to individual developers and asking them to execute the command manually before requesting metadata through the Developer client. Since keytab files contain sensitive information, distributing them to each developer box and relying on developers to run these commands manually requires careful handling on the customer side.
  2. If the cluster is SSL enabled, the developer needs to import the corresponding certificates into the JRE keystore of each developer client installation using the keytool command.
  3. The developer also needs to export the cluster configuration XML from the Informatica Administrator console, manually extract the zip file, and place the contents into the 'conf' folder under the appropriate Hadoop distribution folder of the developer client installation.
  4. The developer also needs to update the variable 'INFA_HADOOP_DIST_DIR' defined in the 'developerCore.ini' file under each client installation when connecting to a Hadoop distribution other than the default Cloudera version.
  5. Finally, after performing the above steps on each developer client box, the developer can launch the import wizard for the data source in the Developer client to import and save the metadata.
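Concretely, steps 3 and 4 above might look like the following shell sketch. All paths and folder names here are illustrative, not the real Informatica client install layout, and steps 1-2 (kinit / keytool) appear only as comments because they require a live KDC and the cluster's SSL certificate:

```shell
# Steps 1-2, shown for context only (need a real KDC and certificate):
#   kinit -kt developer.keytab developer@EXAMPLE.COM
#   keytool -importcert -alias cluster-ca -file cluster-ca.crt \
#           -keystore "$JAVA_HOME/jre/lib/security/cacerts"

CLIENT_HOME=./informatica-client            # hypothetical client install root
mkdir -p "$CLIENT_HOME/hadoop/hortonworks_2.6/conf"

# Step 3: place the exported cluster configuration XML into the 'conf' folder
# under the matching Hadoop distribution folder.
echo '<configuration/>' > core-site.xml     # stands in for the exported XML
cp core-site.xml "$CLIENT_HOME/hadoop/hortonworks_2.6/conf/"

# Step 4: point INFA_HADOOP_DIST_DIR at the non-default distribution folder
# in developerCore.ini.
echo '-DINFA_HADOOP_DIST_DIR=hadoop/cloudera_cdh' > "$CLIENT_HOME/developerCore.ini"
sed -i 's|INFA_HADOOP_DIST_DIR=.*|INFA_HADOOP_DIST_DIR=hadoop/hortonworks_2.6|' \
  "$CLIENT_HOME/developerCore.ini"
cat "$CLIENT_HOME/developerCore.ini"
```

Multiply this by every developer client box in the organization, and the maintenance burden becomes clear.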

Performing these steps on each developer client box (a customer may have tens or even hundreds of developer client installations) is a big hassle. Metadata Access Service was introduced to ease this configuration and to provide an improved security architecture for metadata access from Hadoop data sources.

 

Metadata access using Metadata Access Service

Metadata Access Service enables metadata access to Hadoop data sources (HDFS files, HBase, Hive, and MapR-DB) from the Developer client tool. It is a mandatory service that must be created before any metadata can be imported from the listed Hadoop sources via the Developer tool. The service can be created either in the Informatica Administrator console or from the command line using the infacmd tool.
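As an example, a command-line creation might look like the sketch below. This is only indicative: the `mas` plugin and the generic `-dn`/`-nn`/`-un`/`-pd`/`-sn` options follow the usual infacmd conventions, and the exact command and option names should be verified against the infacmd Command Reference for your release.

```shell
# Hypothetical invocation; verify the plugin and option names in the
# infacmd Command Reference (-dn domain, -nn node, -un user,
# -pd password, -sn name of the new Metadata Access Service).
infacmd.sh mas CreateService -dn MyDomain -nn node01 \
  -un Administrator -pd '<password>' -sn MAS_Service
```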

Metadata Access Service needs to be configured only once by the Informatica administrator (similar to existing services like the Data Integration Service, which is used for data access by Informatica mappings). If a single Metadata Access Service is configured and enabled, it is picked up automatically by the developer client installations; otherwise, a default Metadata Access Service can be selected once at the developer client level, and the selection is cached until changed. Metadata Access Service provides many features and advantages compared to accessing metadata directly from the developer client tools.

  1. Metadata Access Service enables configuration of Kerberos-specific attributes, such as the keytab location and principal name, in a single location. There is no longer a need to run the kinit command on any developer client box, since the appropriate keytab file location and Service Principal Name can be provided in the MAS configuration in the Administrator console. This is also more secure, as sensitive information no longer needs to be distributed to developer boxes.
  2. Multiple Metadata Access Services can be configured, either for load balancing (to reduce the load on a single service process), to connect to a different Hadoop distribution type (Cloudera, Hortonworks, MapR, and so on), or for a system with a different configuration (keytab, Service Principal Name). Hence, there is no need to perform steps like downloading cluster configuration files into the local Hadoop distribution folder on the developer client. If multiple services are configured, the developer can select the appropriate Metadata Access Service in the Developer client as the default service for the current session (similar to how the default DIS service is selected).
  3. Configuring access to SSL-enabled clusters is also simplified, as the SSL certificates for the clusters need to be imported (using the keytool command) on a single node (where MAS is configured to run) rather than on each developer client box.
  4. The developer can also enable the option to use the logged-in user as the impersonation user, similar to DIS, so that the credentials of the user currently logged in to the developer client box are used when accessing Hadoop resources.
  5. Centralized logging is supported: any metadata access error message is captured and persisted in a centralized location (in addition to being shown in a popup dialog, as earlier), just like for other services such as DIS, and can be viewed (with features like filtering) in the Informatica Administrator console service log. Without Metadata Access Service, error messages were shown only in a pop-up dialog in the developer tool and could not be retrieved once the dialog was closed.
  6. Metadata Access Service can be configured to interact with the Developer client tool over either HTTP or the more secure HTTPS protocol, just like other Informatica services such as DIS. The administrator can configure the appropriate port (HTTP/HTTPS) and the keystore/truststore file locations and passwords (if HTTPS is enabled) as part of the Metadata Access Service configuration, through the Administrator console or infacmd.
  7. A backup node is supported for high availability. If the primary node where Metadata Access Service runs goes down, the service comes back up on the backup node automatically.
  8. The local Hadoop distribution folders on the developer client boxes are now used during metadata access only when metadata is accessed from a local file on the same box, for example when a local Avro or Parquet file is used to import metadata without a Hadoop File System connection object. Hence, the 'INFA_HADOOP_DIST_DIR' variable needs to be configured (or updated) only if metadata is imported from a local file; when a connection is used to import metadata, the variable is no longer required on the developer client side. The size of the Informatica Hadoop distribution folder on the developer client is also significantly reduced (by more than 1 GB), as most of the Hadoop distribution jars/files now need to be deployed only with the Informatica server-side installation.

Summary

'Metadata Access Service' provides a significant architectural improvement, resulting in better security and easier configuration for the Informatica Big Data Management client-side developer tool. It reduces the amount of time Informatica developers and administrators need to spend configuring connectivity to Hadoop adapters like HDFS files, HBase, Hive, and MapR-DB for importing metadata into the repository.

Author

Sandeep Kathuria, Senior Staff Engineer

What are we announcing?

Informatica 10.2.1 Service Pack 1

Who would benefit from this release?

This release is for all Big Data Management and Big Data Quality customers and prospects who want to take advantage of fixes to the core platform and connectivity by installing the new Service Pack or upgrading from previous versions.

What’s in this release?

Big Data Management

  • This update provides the latest Service Pack, improving the user experience. Informatica recommends that all big data customers apply this Service Pack.

Enterprise Data Catalog

  • This update provides bug fixes for functional and performance improvements. Informatica recommends that all Enterprise Data Catalog customers on 10.2.1 apply this Service Pack.

Enterprise Data Lake

  • This update provides bug fixes for functional and performance improvements.
  • It also includes support for Spark engine execution and Autoscaling in AWS EMR deployments. 

 

Informatica 10.2.1 Service Pack 1 Big Data Release Notes

 

PAM for Informatica 10.2.1

Dear Customer,

 

The Informatica Global Customer Support Team is excited to announce an all-new technical webinar and demo series – Meet the Experts – in partnership with our technical product experts and Product Management. These technical sessions are designed to encourage interaction and knowledge gathering around some of our latest innovations and capabilities across Data Integration, Data Quality, Big Data, and more. In these sessions, we will strive to provide you with as many technical details as possible, including new features and functionalities, and where relevant, show you a demo or product walk-through as well.

 

Topic and Agenda

 

Topic: Meet the Experts Webinar - Sizing and Tuning for Spark in Informatica Big Data 10.2.1

Date: 22 August 2018

Time: 8:00 AM PST

Duration: 1 Hour

 

Informatica Big Data Management is the industry’s best solution for faster, more flexible, and more repeatable data ingestion and integration on Hadoop. Hundreds of organizations have adopted Informatica Big Data Management to take advantage of the power of Hadoop without the risks and delays of manual and specialized approaches. Join this webinar to learn best practices for performance tuning, sizing, and security that will help you get the most out of Big Data Management.

 

Learn about:

 

  • Sizing & Capacity Planning for Informatica’s platform and the underlying Hadoop cluster
  • Special Sizing Guidelines for Cloud environments like AWS and Azure
  • Optimal Deployment Architectures
  • Performance Tuning Tips for getting the most out of engines like Apache Spark

 

Speaker: Vishal Kamath, Senior Manager, Performance

 

-------------------------------------------------------

To register for this meeting

-------------------------------------------------------

1. Go to https://informatica-events.webex.com/informatica-events/j.php?RGID=r3184b4bb4fb135c8bc85c3f88874273c

2. Register for the meeting.

3. Check for the confirmation email with instructions on how to join

 

To view in other time zones or languages, please click the link:

https://informatica-events.webex.com/informatica-events/j.php?RGID=rd12dbd328d01a6f900a9533009559810

 

 

-------------------------------------------------------

For assistance

-------------------------------------------------------

1. Go to https://informatica-events.webex.com/informatica-events/mc

2. On the left navigation bar, click "Support".

 

You can also contact us at:

network@informatica.com

 

Regards,

MeetTheExperts Team