Advanced custom lineage metadata ingestion

Version 1

    A previous article showed how to document transformation logic using custom metadata. This article shows how to use the custom metadata framework features available in EDC 10.4 to define the transformation logic in more detail and display it more clearly in the catalog.

    Define a model

    EDC ships with a pre-built model called the ETL model. You can derive the model that represents your metadata from it.

     

    ETL Model

     

    • Derive from the “ETL model” to create your data flow objects
      • A Mapping instance renders the source/target panel and the controlling asset panel

    Mapping overview

      • Transformations, Groups, and Fields are secondary objects (not searchable, no overview)
    • Use associations to define lineage at multiple levels (summary, detailed, control)
      • core.DataSetDataFlow: Summary lineage association between two data sets (tables, views, etc.)

     

    Summary lineage data set level

     

      • core.DirectionalDataFlow: Summary lineage association between two data elements (table columns, file fields, etc.)

     

    Summary lineage data element level

     

      • com.infa.ldm.etl.DetailedDataSetDataFlow: Transformation logic lineage association between two data sets. Use the com.infa.ldm.etlcore.Operation attribute to document the transformation.

     

    Detailed lineage dataset level

     

      • com.infa.ldm.etl.DetailedDataFlow: Transformation logic lineage association between two data elements. Use the com.infa.ldm.etlcore.Operation attribute to document the transformation.
      • com.infa.ldm.etl.DetailedDataSetDataElementDataFlow: Transformation logic lineage association between the data elements of data sets and transformation logic objects (e.g. a mapping)

     

    Detailed lineage data element level

     

      • com.infa.ldm.etl.DetailedDataSetMappingSetDataFlow: Lineage association that renders the asset summary lineage and impact analysis. Set it between data sets (tables, etc.) and a transformation logic object (e.g. a mapping).

     

    Asset Summary lineage and impact

     

     

      • core.DataSetControlFlow: Lineage association that renders the asset control summary. Set it between data sets (tables, etc.).

     

    Control summary
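
    The associations listed above can be collected into a small lookup table, which may help when lineage files are generated by a script. This is only a sketch: the association names come from this article, while the level labels, endpoint descriptions, and the helper function are illustrative and not part of EDC.

```python
# Association names come from the list above; the level labels, endpoint
# descriptions, and helper function are illustrative, not part of EDC.
ASSOCIATIONS = {
    "core.DataSetDataFlow": ("summary", "data set to data set"),
    "core.DirectionalDataFlow": ("summary", "data element to data element"),
    "com.infa.ldm.etl.DetailedDataSetDataFlow": ("detailed", "data set to data set"),
    "com.infa.ldm.etl.DetailedDataFlow": ("detailed", "data element to data element"),
    "com.infa.ldm.etl.DetailedDataSetDataElementDataFlow": (
        "detailed",
        "data element of a data set to transformation logic object",
    ),
    "com.infa.ldm.etl.DetailedDataSetMappingSetDataFlow": (
        "asset summary",
        "data set to transformation logic object",
    ),
    "core.DataSetControlFlow": ("control", "data set to data set"),
}


def associations_for(level):
    """Return the association names recorded for one lineage level, sorted."""
    return sorted(name for name, (lvl, _) in ASSOCIATIONS.items() if lvl == level)


summary_associations = associations_for("summary")
```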

     

     

    Build custom metadata

    To show how to build custom metadata, we work through an example. We define a custom model with the package name org.demo.custom.etl001 and the following classes. Each class has a super class so that it inherits some of the behavior of the out-of-the-box ETL model:

    Class name        Super class
    Mapping           com.infa.ldm.etlcore.Mapping, core.DataSet
    Transformation    com.infa.ldm.etlcore.Transformation
    Group             com.infa.ldm.etlcore.Group
    Field             com.infa.ldm.etlcore.Field

     

    We also reuse icons from the out-of-the-box model so that the custom objects ingested into the catalog are displayed with an icon. The model definition contains a zip file named icons.zip, as shown below:

     

    Model definition zip file

     

    Model definition zip file icons

     

     

    Define objects

    In our example, we want to represent a data mapping called “mapping_aggregatorTx”, which is composed of three transformations and propagates fields among those transformations, together with the associated data transformations.

    We define these objects in objects_Agg.csv (the file can have any name, as long as it is prefixed with “objects_”).

     

    Object CSV file
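
    As a rough illustration of what an objects file could contain, the Python sketch below renders a few objects as CSV text. The header columns (class, identity, core.name, core.description), the '/'-separated identities, and the transformation name are assumptions made for the example; only the package name and the mapping name come from this article, so check the attached example for the exact layout.

```python
import csv
import io

PACKAGE = "org.demo.custom.etl001"  # package name from this article

# Assumed header layout; confirm the exact column names in the attached example.
OBJECT_HEADER = ["class", "identity", "core.name", "core.description"]


def build_objects_csv(rows):
    """Render (class_name, identity, name, description) tuples as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(OBJECT_HEADER)
    for class_name, identity, name, description in rows:
        # Qualify the short class name with the custom model package.
        writer.writerow([f"{PACKAGE}.{class_name}", identity, name, description])
    return buf.getvalue()


# Hypothetical objects: only the mapping name is taken from the article.
rows = [
    ("Mapping", "mapping_aggregatorTx", "mapping_aggregatorTx", "Aggregation mapping"),
    ("Transformation", "mapping_aggregatorTx/agg", "agg", "Aggregator transformation"),
    ("Group", "mapping_aggregatorTx/agg/in", "in", "Input group"),
    ("Field", "mapping_aggregatorTx/agg/in/amount", "amount", "Input field"),
]
objects_csv = build_objects_csv(rows)
```

    Writing objects_csv to a file whose name starts with “objects_” would then match the naming rule described above.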

     

    Define links between objects

    We define the parent-child relationships between the objects in links_agg.csv (the file can have any name, as long as it is prefixed with “links_”).

     

    Link CSV file
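
    The parent-child rows can be derived mechanically when object identities are hierarchical. The sketch below assumes '/'-separated identities and a three-column header (association, fromObjectIdentity, toObjectIdentity); both are assumptions, and the association name passed in is a placeholder to be replaced with the parent-child association defined in your own model.

```python
import csv
import io

# Assumed header layout; check the attached example for the exact column names.
LINK_HEADER = ["association", "fromObjectIdentity", "toObjectIdentity"]


def parent_child_links(identities, association):
    """Yield one (association, parent, child) row per identity with a known parent.

    Assumes identities are '/'-separated paths such as 'mapping/transformation'.
    """
    known = set(identities)
    for identity in identities:
        parent, sep, _ = identity.rpartition("/")
        if sep and parent in known:
            yield (association, parent, identity)


def build_links_csv(identities, association):
    """Render the derived parent-child rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(LINK_HEADER)
    writer.writerows(parent_child_links(identities, association))
    return buf.getvalue()


# Hypothetical identities; only the mapping name is taken from the article.
identities = [
    "mapping_aggregatorTx",
    "mapping_aggregatorTx/agg",
    "mapping_aggregatorTx/agg/in",
]
# Placeholder association name, not a real EDC association.
links_csv = build_links_csv(identities, "HYPOTHETICAL.ParentChildAssociation")
```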

     

     

    Define lineage

    We define the lineage relationships between the objects in lineage_agg1.csv (the file can have any name, as long as it is prefixed with “lineage_”). Note that we have added the attribute com.infa.ldm.etlcore.Operation to the links for which we want to display the data transformation applied to the fields.

     

    Lineage CSV file
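
    To illustrate how the Operation attribute might sit alongside a lineage link, here is a minimal Python sketch. The column layout (Association, From Connection, To Connection, From Object, To Object, followed by the attribute column), the object identities, and the SUM(amount) expression are assumptions for the example; the association and attribute names come from this article. Verify the header against the attached example before using it.

```python
import csv
import io

OPERATION = "com.infa.ldm.etlcore.Operation"  # attribute named in this article

# Assumed column layout; verify against the attached example.
LINEAGE_HEADER = [
    "Association",
    "From Connection",
    "To Connection",
    "From Object",
    "To Object",
    OPERATION,
]


def build_lineage_csv(flows):
    """Render (association, from_object, to_object, operation) tuples as CSV text.

    An operation of None leaves the Operation column empty for that link.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(LINEAGE_HEADER)
    for association, src, dst, operation in flows:
        # Connection columns are left empty in this sketch.
        writer.writerow([association, "", "", src, dst, operation or ""])
    return buf.getvalue()


# Hypothetical field identities and transformation expression.
flows = [
    ("com.infa.ldm.etl.DetailedDataFlow",
     "mapping_aggregatorTx/src/out/amount",
     "mapping_aggregatorTx/agg/in/amount",
     None),
    ("com.infa.ldm.etl.DetailedDataFlow",
     "mapping_aggregatorTx/agg/in/amount",
     "mapping_aggregatorTx/agg/out/total",
     "SUM(amount)"),
]
lineage_csv = build_lineage_csv(flows)
```

    Only the second link carries an Operation value, mirroring the article's point that the attribute is added only to the links whose transformation should be displayed.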

     

     

    See the attached example to start ingesting advanced custom lineage metadata.