What are we announcing?
We see strong interest among our customers in starting or expanding their big data initiatives. Informatica provides the most comprehensive big data management solution to enable customers to quickly turn big data into business value.
As part of Informatica 10.1.1, we are excited to release two new products:
- Enterprise Information Catalog (EIC), the next-generation, business-user-oriented, enterprise-wide metadata catalog.
- Informatica Intelligent Streaming (IIS), which enables continuous data capture and processing for real-time analytics on streaming data.
In this release, we also added new features, updated product availability matrix (PAM) support, improved performance, and expanded connectivity for our existing products.
Who would benefit from this release?
This release is for all customers and prospects who need big data management, data quality, and data integration solutions.
What’s in this release?
The new capabilities in this release are summarized below.
Big Data Management (BDM)
- Expanded and updated support for Hadoop distributions:
- Cloudera CDH 5.8
- Hortonworks 2.5
- IBM BigInsights 4.2
- AWS EMR 4.6
- AWS EMR 5.0
- Microsoft Azure HDInsight 3.4
- Teradata Connector for Hadoop (TDCH) through Sqoop
- Cassandra version upgrade
- Blaze support for HBase
- Silent configuration option for Cloudera Manager and Ambari-based distributions
- Azure HDInsight configuration
- Deployment of Informatica binaries using Ambari Stacks and Services
- Ambari integration
- Eliminated the need to install relational database clients, through:
- JDBC support for Lookup transformation
- JDBC support for data quality transformations
- Advanced Hive functionality with support for:
- Create, append, and truncate tables
- Partitioned and bucketed tables
- Char, Varchar, and Decimal (precision 38) data types
- Quoted identifiers in column names
- SQL-based authorization
- SQL overrides and Hive views
- Partitioning support for the Data Masking Option
- Advanced transformations: Update Strategy, Sorter (with global sort order), and Data Processor
- Summary report capabilities in the Blaze Job Monitor
- Enhanced Spark capability with:
- Spark 2.0 support
- Security support for Sentry, Ranger, operating system profiles, and transparent encryption
- Hive and Sqoop lookup support
- Java transformation support
- Binary data type support
- HBase support
- Performance optimizations
- Enhanced cloud support
- Single-click BDM image deployment on the Microsoft Azure and Amazon AWS marketplaces
- Connectivity for Amazon S3 and Amazon Redshift
- Better security on the MapR Hadoop cluster with support for MapR tickets
- Additional connectivity
• Sqoop for the Blaze and Spark execution modes
- Enhanced installation and configuration
- Enhanced the Blaze engine with advanced capabilities
• Hive connected and unconnected lookups
- Enhanced workflows with support for nested gateways and Control tasks
Intelligent Data Lake (IDL)
- Improvements in data preparation
• Ability to select columns, filter rows, and randomize sampling
- Lookup function
- Sentry storage and table-level authorization
- Cloudera CDH 5.8, Hortonworks 2.5
- Microsoft Edge 38.14 (Windows 10), Safari 9.1.2
- SUSE 12
- Data preview and ingestion from external RDBMS sources (using JDBC through Sqoop)
- Publication performance improvements with Blaze
- Export data from the lake to an external RDBMS (using JDBC through Sqoop)
- Export data from the lake as a Tableau data extract file
- Granular activity tracking for import, export, publish, copy, delete, etc.
- Enhanced security support
• Ranger storage, table, row-level authorization, and masking policy support
- Updated PAM Support
Enterprise Information Catalog (EIC) – New standalone product with the 10.1.1 release
Enterprise Information Catalog helps data architects, data stewards, and data consumers analyze and understand large volumes of metadata in the enterprise. Users can extract metadata for many objects, organize the metadata based on business concepts, and view data lineage and relationship information for each object. In essence, it is the ‘Google’ for the enterprise, providing a unified view of all data assets and their relationships.
Enterprise Information Catalog maintains a catalog. The catalog serves as a centralized repository that stores all the metadata extracted from different external sources. Enterprise Information Catalog extracts metadata from external sources such as databases, data warehouses, business glossaries, data integration resources, or business intelligence reports. For ease of search, the catalog maintains an indexed inventory of all the assets in an enterprise. Assets represent the data objects such as tables, columns, reports, views, and schemas. Metadata and inferred statistical information in the catalog include profile results, information about data domains, and information about data relationships.
An early version of EIC was part of the BDM package. In 10.1.1, EIC is a standalone product.
New capabilities and enhancements in this release are:
• Effective Metadata Management
- Business Glossary Integration: Integrated Business Glossary aligns business concepts with technical data assets. It also improves the accuracy of searches for data assets that use business terminology and lets users navigate relationships.
- Column Level Lineage and Impact Analysis: Column- and metric-level data lineage tracks data from origin to destination through multiple ETL flows. The detailed visualization supports impact analysis, helping users assess the impact of any change. It also helps identify the right source for any specific field in a given report, file, or table.
- Resource-Level Security: With resource-level security, catalog administrators can restrict metadata access to specific users and groups, ensuring controlled visibility of non-public resources in the catalog.
- Synonym Support: Users can upload a synonym file directly to the catalog. Search then uses these synonyms to match asset names, so users can find an asset by referring to it with a synonym.
- Smart Domains (Domain by Example): With smart domains, catalog users can associate domains with data assets directly in the catalog. The system learns from these associations and automatically associates the domain with similar columns across the enterprise.
- Data Similarity: Data similarity uses machine learning techniques to cluster similar columns and compute the extent to which the data in two columns is the same. Smart domains use data similarity internally for domain propagation, and it is also available as a relationship in the column relationship diagram.
- Domain Curation: Users can approve or reject existing domain associations in rule-based and smart domains.
- Domain Proximity: Columns describing the same entity are generally found together in data assets. Domain proximity utilizes these groupings while performing inference, penalizing conformance when proximal domains are not found in the same table or file.
- Domain Management: New domain management capabilities allow users to add, view, and edit domains directly from LDM Administrator.
• Open and Extensible Platform
- Universal Connectivity Framework: EIC 10.1.1 allows users to connect to a broad range of enterprise data sources including databases, data warehouses, big data systems, BI systems, cloud applications and more. This connectivity is provided through metadata bridges from our partner MITI.
- New Connectivity
- Hive/HDFS on EMR and HDInsight: Metadata scanning support for Hive on EMR and HDInsight distributions.
- OBIEE: Support for extracting BI report metadata from OBIEE
- SAP R/3: New scanner for SAP applications
- Microsoft SSIS: New scanner for extracting lineage metadata from SSIS
• Performance Enhancements
- Profiling on Blaze is up to 25X faster than Hive on MapReduce
- 50% faster metadata ingestion than in 10.1
- Similarity Inference that scales linearly with additional resources.
- Simplified cluster configuration which uses the Ambari or Cloudera Manager URL to determine other parameters automatically.
- Pre-validation checks that report all deployment errors up front, so they can be fixed quickly instead of through an iterative process
- Improved logging with removal of redundant messages
• PAM Support
• Hadoop Distribution Deployment support
- Cloudera 5.8
- Hortonworks 2.5
- New versions added for existing scanners
- IBM DB2 11.1
- Microsoft SQL Server 2016
- New Scanners
- Hive/HDFS on EMR 5.0
- Hive/HDFS on HDInsight 3.4
- OBIEE 11
- Microsoft SSIS 2008R2 and 2012
- SAP R/3 5 and 6
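Column-level impact analysis, as described under Column Level Lineage and Impact Analysis, amounts to walking a directed lineage graph: downstream from a changed column to find everything it affects, and upstream from a report field to find where its data originates. The following is a minimal Python sketch of that idea, assuming lineage is available as a plain adjacency map; the column names and function names are hypothetical, and this is not EIC's internal implementation or API.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set

# Hypothetical lineage model: each column maps to the columns it feeds.
Lineage = Dict[str, List[str]]


def reachable(graph: Lineage, start: str) -> Set[str]:
    """Breadth-first search: every node reachable from `start`, excluding `start`."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # a cycle could lead back to the starting column
    return seen


def invert(graph: Lineage) -> Lineage:
    """Reverse every edge so traversal runs from targets back to sources."""
    reverse: Lineage = defaultdict(list)
    for source, targets in graph.items():
        for target in targets:
            reverse[target].append(source)
    return dict(reverse)


def impact_of_change(graph: Lineage, column: str) -> Set[str]:
    """Downstream columns affected if `column` changes."""
    return reachable(graph, column)


def origins_of(graph: Lineage, column: str) -> Set[str]:
    """Upstream columns that feed `column` and have no sources of their own."""
    reverse = invert(graph)
    return {c for c in reachable(reverse, column) if not reverse.get(c)}


if __name__ == "__main__":
    lineage: Lineage = {
        "crm.customer.email": ["stg.customer.email"],
        "stg.customer.email": ["dw.dim_customer.email"],
        "dw.dim_customer.email": ["report.churn.email", "report.campaign.email"],
    }
    # ['dw.dim_customer.email', 'report.campaign.email', 'report.churn.email']
    print(sorted(impact_of_change(lineage, "stg.customer.email")))
    # {'crm.customer.email'}
    print(origins_of(lineage, "report.churn.email"))
```

Breadth-first search with a visited set keeps the traversal linear in the size of the graph and safe against cyclic lineage; a production catalog would also carry transformation and asset metadata on each edge.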
Informatica Intelligent Streaming (IIS) – New product with the 10.1.1 release
Informatica Intelligent Streaming enables customers to design data flows to continuously capture, prepare, and process streams of data with the same powerful graphical user interface, design language, and administration tools used in Informatica's Big Data Management.
Out of the box, IIS provides pre-built, high-performance connectors such as Kafka, JMS, HDFS, NoSQL databases, and enterprise messaging systems, as well as the data transformations needed to define the customer's data integration logic without writing code.
Informatica Intelligent Streaming builds on the best of open source technologies in an easy-to-use, enterprise-grade offering. In tandem with BDM's data processing capabilities, it provides a single platform on which customers can discover insights and build models, then operationalize them to run in near real time and capture and realize the value of high-velocity data.
It will significantly reduce the time and effort organizations require to build, run and maintain streaming-based data integration architectures and allow them to focus on building low-latency data delivery mechanisms for real-time reporting, alerting and/or visualizations.
Initially built to run on the Streaming libraries in Apache Spark, IIS can scale out horizontally and vertically to handle petabytes of data while honoring business service level agreements (SLAs). Because whole classes of data flows are generated automatically at runtime from design patterns, the business logic is only loosely coupled to the runtime technology, so the same logic can be applied to next-generation frameworks as they mature.
- IIS provides the following capabilities:
- Allows users to create and execute streaming (continuous-processing) mappings
- Uses Spark Streaming as the execution engine, which provides high scalability and availability
- Provides management and monitoring capabilities of streams at runtime
- At-least-once delivery guarantees
- Granular lifecycle controls based on the number of rows processed or the time of execution
- IIS comes with the following Streaming/Messaging/Big Data adapters
- Source: Kafka, JMS
- Target: Kafka, JMS, HBase, Hive, HDFS
- IIS in combination with VDS can also source data from various Streaming sources such as Syslog, TCP, UDP, flat file, MQTT, etc.
- IIS supports the following data types and formats (only for payloads with simple or flat hierarchies)
- IIS supports the following transformations:
- (New with IIS) Window transformation is added for streaming use cases with the option of sliding and tumbling windows
- Filter, Expression, Union, Router, Aggregate, Joiner, Lookup, Java and Sorter transformations can be used with streaming mappings and are executed on Spark
- Lookup transformations can be used with flat file, HDFS, Sqoop, and Hive
- Hadoop Distribution Support
- Cloudera 5.8
- Apache Spark 2.0
- Cloudera Distributed Spark 1.6
- Hortonworks 2.5
- Apache Spark 2.0
- Security Support
- Kerberized Hadoop Cluster Support
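The Window transformation listed above offers tumbling and sliding windows. The difference between the two can be shown with event timestamps alone. This Python sketch is only an illustration under assumed semantics (non-negative integer event times and half-open windows [start, start + size)); the function names are hypothetical, and it does not reproduce how IIS or Spark evaluates windows. A tumbling window assigns each event to exactly one non-overlapping window, while a sliding window that advances by a smaller slide assigns an event to every window that covers it.

```python
from collections import defaultdict
from typing import Dict, List


def tumbling_counts(times: List[int], size: int) -> Dict[int, int]:
    """Count events per non-overlapping window [start, start + size)."""
    counts: Dict[int, int] = defaultdict(int)
    for t in times:
        # Each event falls into exactly one tumbling window.
        counts[(t // size) * size] += 1
    return dict(counts)


def sliding_counts(times: List[int], size: int, slide: int) -> Dict[int, int]:
    """Count events per overlapping window [start, start + size); a window starts every `slide` units."""
    counts: Dict[int, int] = defaultdict(int)
    for t in times:
        # Latest window start at or before t, then walk back through every
        # earlier window that still covers t (start > t - size).
        start = (t // slide) * slide
        while start > t - size and start >= 0:
            counts[start] += 1
            start -= slide
    return dict(counts)


if __name__ == "__main__":
    events = [1, 4, 6, 11]  # hypothetical event times, e.g. seconds
    print(tumbling_counts(events, size=10))          # {0: 3, 10: 1}
    print(sliding_counts(events, size=10, slide=5))  # {0: 3, 5: 2, 10: 1}
```

With a size of 10 and a slide of 5, the event at time 6 is counted in both [0, 10) and [5, 15). When the slide equals the size, the sliding result collapses to the tumbling one.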
Platform PAM Update
- Operating System Update:
• Solaris 11
• Windows 10 client support
- Database Support Update:
• SQL Server 2016
• IBM DB2 11.1
• Oracle RAC / SCAN certification for PC and Mercury
- Web Browser Update:
• Chrome 54.x
• Microsoft Edge Browser (Windows 10)
- Tomcat Support Update:
• v 7.0.70
- JVM Support Update:
• Oracle Java 1.8.0_102
• IBM JDK 184.108.40.206
- V10.1.1 PAM Highlights: https://kb.informatica.com/proddocs/PAM%20and%20EOL/1/PAM%20for%20Informatica%20Platform%20v10.1.1.xlsx?myk=10%201%201%20pam
Informatica Upgrade Advisor
- Informatica Upgrade Advisor assesses an existing Informatica environment for upgrade readiness. The tool runs a list of rules and produces an upgrade readiness report. Effective in version 10.1.1, you can run the Informatica Upgrade Advisor to check for actions that you need to take before you perform an upgrade.
Informatica Data Quality (IDQ)
- Exception management
- Task-based data security features
- Centralized auditing enabling enterprise-wide deployment
- Nested parallel execution for improved performance
- Terminate workflow task
- Reference data pushdown optimizations for Hadoop
- No database driver installation required on compute nodes for reference data
- Synchronized pushdown of address validation data on compute nodes
- Address validation
- AV 5.9 integration
- ISTAT for Italy
- INE Code for Spain
PowerExchange Mainframe and CDC
- PAM updates
- Improved or extended functionality
- Windows 10 client support
- DB2UDB V11.1
- i5/OS 7.3
- SQL Server 2016
- Solaris support re-established
- New functionality
- SQL Server access from Linux
- SMF reporting enhancements
- “LOB” data type support for DB2 reads (through data maps)
- Netezza Multiple Schema Support
- Available in multiple places within Metadata Manager (lineage, catalog, etc.)
- Support for both single and multiple schemas
- All artifacts for all schemas can be viewed within the Catalog object
- Addresses use cases where a table is part of one schema and the corresponding view is part of another schema
- Platform XConnect Improvements
- Removed the need for workflow dependencies to be deployed to applications for metadata load
Profiling and Discovery
- Scorecard Dashboard Drilldown
- Scorecard dashboards now let users drill down into the details and navigate to actionable results
- A separate drilldown pane on the Scorecard home page displays the drilldown results
- Hive/Hadoop Connection Merge for Blaze Mode
- Hive and Hadoop connections are merged and appear as “Hadoop” for run-time environments
- Blaze is the preferred mode of big data execution, and Hive is used as a fallback option for functional issues
- For customers upgrading to 10.1.1, the execution mode of profiles created before 10.1.1 switches to Blaze after the upgrade
- Blaze Support for Profiling Drilldown
- Both profile and scorecard drilldown operations are now pushed down to Blaze (when the execution mode is set to Blaze)
- Drastically reduces profiling drilldown time by leveraging performance-optimized Blaze environments instead of the Data Integration Service
- Profile-level logs remain available, while logs for YARN jobs are available under the Blaze Grid Manager