How to bring custom metadata into EDC (ETL example)

Version 3

    In this article, we show how to define a new model for importing metadata into the catalog to represent programs (and their logic) that move data across systems in the enterprise. Programs should be searchable objects in the catalog and should also appear in the lineage diagram, so that each program's role in the data lineage is easy to understand. Programs are classified into categories, because we want to represent “Mapping” programs as well as other types of programs such as reporting programs, archiving programs, etc.

     

    Create a model

     

    Follow this document for more information on models and how to manage them.

     

    Model for program logic representation

     

    We want to represent the program as an object in the catalog that can contain members (incoming parameters and outgoing outputs). The model that represents the program logic can therefore be derived from the core model using core.DataSet and core.DataElement. The program should also be part of a category. We can represent the model this way:

     

    Program Logic Model Diagram

     

     

    Here is the representation of the model in the EDC administration UI

     

    EDC Model Admin

     

    The model XML file can be viewed here:

     

    Custom-Scanners/proglogic_indent.xml at master · Informatica-EIC/Custom-Scanners · GitHub

     

    Create a resource type

    To create a resource that loads the metadata, we first need a resource type. The resource type references the models used to import the metadata, as well as the class(es) used to create the connection endpoints that link to other resources. A resource type can point to multiple models, which means that all classes and attributes of those models are available for loading. In this case, we create a resource type that uses the Proglogic model we created and uses ProgramType for connection types. The connection type determines the level at which automatic connection assignment is performed, following the object hierarchy to match against the other resource (e.g. Database/Schema/Table).

     

    EDC Resource Type Admin

     

    Create a resource and load the metadata

    To load the metadata, we need to create a resource that allows us to provide the metadata as CSV files. The metadata is split into two types of files:

    • Object files: any file name is allowed; each file contains the list of objects (class instances) to create in the catalog.
    • Link file: the file name must be links.csv; it contains the links between the objects loaded as part of the resource.

    When creating the resource, you can provide the files as a zip file. The CSV files must be at the root of the zip file, and their header rows must match the headers in the sample file that you can download for the resource type.
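
    The packaging rules above (object files with any name, a link file named links.csv, everything at the root of the zip) can be sketched in Python. This is a minimal sketch, not an Informatica tool: the column names in OBJECT_HEADERS and LINK_HEADERS, the "example.proglogic" class and association names, and the identities are assumptions for illustration only. The real headers must be copied from the sample file downloaded for the resource type.

```python
import csv
import io
import zipfile

# Assumed column names, for illustration only. Copy the real headers from
# the sample file downloaded for the resource type.
OBJECT_HEADERS = ["class", "identity", "core.name", "core.description"]
LINK_HEADERS = ["association", "fromObjectIdentity", "toObjectIdentity"]


def rows_to_csv(headers, rows):
    """Serialize a header row plus data rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)
    return buffer.getvalue()


def build_metadata_zip(object_files, link_rows):
    """Return the bytes of a zip holding the object files and links.csv.

    object_files maps a file name (any name) to its list of data rows.
    Every entry is written at the root of the archive, with no folders.
    """
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for filename, rows in object_files.items():
            if "/" in filename or "\\" in filename:
                raise ValueError(f"{filename!r} must sit at the zip root")
            if filename == "links.csv":
                raise ValueError("links.csv is reserved for the link file")
            zf.writestr(filename, rows_to_csv(OBJECT_HEADERS, rows))
        # The link file name is fixed: it must be links.csv.
        zf.writestr("links.csv", rows_to_csv(LINK_HEADERS, link_rows))
    return archive.getvalue()


# Hypothetical content: one category, one program, one port.
objects = [
    ["example.proglogic.ProgramCategory", "Mapping", "Mapping", ""],
    ["example.proglogic.Program", "Mapping/LoadCustomers", "LoadCustomers", ""],
    ["example.proglogic.ProgramPort", "Mapping/LoadCustomers/in_id", "in_id", ""],
]
links = [
    ["example.proglogic.CategoryProgram", "Mapping", "Mapping/LoadCustomers"],
    ["example.proglogic.ProgramPort", "Mapping/LoadCustomers", "Mapping/LoadCustomers/in_id"],
]
zip_bytes = build_metadata_zip({"objects.csv": objects}, links)
```

    The archive is built in memory; writing zip_bytes to a file gives the zip to upload when creating the resource.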

     

    EDC Resource admin

     

    The content file can be found here

     

    • The object file looks like the example below and creates the ProgramCategory, the Program, and its ports

     

    EDC Objects CSV

     

    The resulting program overview looks like this:

     

    EDC Program Overview

     

    The links file contains the parent-child associations

     

    EDC Links CSV
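
    As an illustration of how parent-child rows could be generated rather than typed by hand, here is a minimal Python sketch. It assumes that object identities are slash-separated paths (category/program/port) and that each hierarchy level has its own association. Both the identity convention and the association names in ASSOCIATION_BY_DEPTH are hypothetical; the real names come from the model.

```python
# Hypothetical association names, keyed by the depth of the parent object.
# The real association names are defined in the model.
ASSOCIATION_BY_DEPTH = {
    1: "example.proglogic.CategoryContainsProgram",
    2: "example.proglogic.ProgramContainsPort",
}


def parent_child_links(identities, separator="/"):
    """Derive (association, parent identity, child identity) rows.

    The parent of an object is its identity with the last path segment
    removed. Top-level objects have no parent inside the resource and
    produce no row.
    """
    known = set(identities)
    rows = []
    for identity in sorted(known):
        parent, found, _ = identity.rpartition(separator)
        if not found:
            continue  # top-level object such as a category
        if parent not in known:
            raise ValueError(f"parent {parent!r} of {identity!r} is not loaded")
        depth = parent.count(separator) + 1
        rows.append((ASSOCIATION_BY_DEPTH[depth], parent, identity))
    return rows


link_rows = parent_child_links([
    "Mapping",
    "Mapping/LoadCustomers",
    "Mapping/LoadCustomers/in_customer_id",
    "Mapping/LoadCustomers/out_customer_key",
])
for row in link_rows:
    print(",".join(row))
```

    Deriving the links from the identities keeps the object files and the link file consistent: a port cannot be linked to a program that is not part of the load.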

     

    The program port overview looks like this:

     

    EDC Program Port Overview

     

     

    Create lineage with external objects in the catalog

     

    To link the program we have created to other metadata sources, we need to import lineage information. For this, you can use a custom lineage resource, which imports the lineage from a CSV file.

     

     

    EDC Custom Lineage admin

     

    An example of the custom lineage file can be found here

     

    EDC Custom Lineage CSV

     

    Summary lineage at the program level is provided by the DataSetDataFlow associations from the core model.

    Field-level lineage is provided by the DirectionalDataFlow associations from the core model.
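
    The relationship between the two association types can be sketched in Python: each field-to-field flow yields one DirectionalDataFlow row, and each distinct pair of datasets yields one DataSetDataFlow row. This is a minimal sketch under stated assumptions: the column names in LINEAGE_HEADERS, the "core." prefix on the association names, the blank connection columns, and the object paths are illustrative and must be checked against the sample custom lineage file.

```python
import csv
import io

# Assumed header row, for illustration only. Check the sample custom
# lineage file for the real column names.
LINEAGE_HEADERS = [
    "Association", "From Connection", "To Connection",
    "From Object", "To Object",
]


def lineage_rows(field_flows):
    """Build dataset-level and field-level lineage rows.

    field_flows is an iterable of
    (from_dataset, from_field, to_dataset, to_field) tuples.
    """
    dataset_rows = []
    field_rows = []
    seen_pairs = set()
    for from_ds, from_field, to_ds, to_field in field_flows:
        # One summary row per distinct dataset pair.
        if (from_ds, to_ds) not in seen_pairs:
            seen_pairs.add((from_ds, to_ds))
            dataset_rows.append(
                ["core.DataSetDataFlow", "", "", from_ds, to_ds])
        # One detailed row per field-to-field flow.
        field_rows.append([
            "core.DirectionalDataFlow", "", "",
            f"{from_ds}/{from_field}", f"{to_ds}/{to_field}",
        ])
    return dataset_rows + field_rows


# Hypothetical flows: a source table feeds a program, which feeds a target.
flows = [
    ("CRM/dbo/CUSTOMER", "ID", "Mapping/LoadCustomers", "in_customer_id"),
    ("CRM/dbo/CUSTOMER", "NAME", "Mapping/LoadCustomers", "in_name"),
    ("Mapping/LoadCustomers", "out_customer_key", "DWH/dim/CUSTOMER", "CUSTOMER_KEY"),
]
rows = lineage_rows(flows)

output = io.StringIO()
writer = csv.writer(output, lineterminator="\n")
writer.writerow(LINEAGE_HEADERS)
writer.writerows(rows)
lineage_csv = output.getvalue()
```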

     

     

    Here is the lineage view at the program level:

     

    EDC Table Level lineage

     

     

    Here is the lineage view at the field level:

     

    EDC Field Level Lineage

     

     

    Validate the custom metadata content

     

    Before loading custom metadata, you can validate the content of the zip file you created to make sure that it will be imported properly. The following knowledge base article details how to use the validation utility.

     

    https://kb.informatica.com/h2l/HowTo%20Library/1/1270-HowToUseCustomMetadataValidationUtilitywithEnterpriseDataCatalog.p…
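
    Independently of the validation utility, a quick local pre-check can catch packaging mistakes before upload. The Python sketch below is not the Informatica utility and does not replace it. It only checks the packaging rules stated in this article (CSV files at the root of the zip, a link file named links.csv) plus the presence of a header row; the function name precheck_zip is hypothetical.

```python
import csv
import io
import zipfile


def precheck_zip(zip_bytes):
    """Return a list of packaging problems found in a metadata zip."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        names = zf.namelist()
        if "links.csv" not in names:
            problems.append("links.csv is missing from the zip root")
        for name in names:
            # Zip entry names always use "/" as the path separator.
            if "/" in name:
                problems.append(f"{name}: not at the zip root")
                continue
            if not name.lower().endswith(".csv"):
                problems.append(f"{name}: not a CSV file")
                continue
            text = zf.read(name).decode("utf-8-sig")
            header = next(csv.reader(io.StringIO(text)), [])
            if not any(cell.strip() for cell in header):
                problems.append(f"{name}: header row is empty")
    return problems


# Example: an archive whose only file sits in a subfolder.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as zf:
    zf.writestr("data/objects.csv", "class,identity\n")
for problem in precheck_zip(sample.getvalue()):
    print(problem)
```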