We saw in a previous article that we could document transformation logic using custom metadata. In this article, we will see how to leverage Enterprise Data Catalog (EDC) functionality to define transformation logic in more detail and display it more clearly in the catalog, using the new custom metadata framework features available in version 10.4.
Define a model
EDC provides a pre-built model called the ETL model. You can derive the model that represents your metadata from it:
- Derive from the “ETL model” to create your Data Flow objects.
- Mapping instances render the source/target panel and the controlling asset panel.
- Transformations, groups, and fields are secondary objects (not searchable, no overview).
- Use associations to define lineage at multiple levels (summary, detailed, control):
- core.DataSetDataFlow: summary lineage association between two data sets (tables, views, etc.).
- core.DirectionalDataFlow: summary lineage association between two data elements (table columns, file fields, etc.).
- com.infa.ldm.etl.DetailedDataSetDataFlow: transformation logic lineage association between two data sets. Use the com.infa.ldm.etlcore.Operation attribute to document the transformation.
- com.infa.ldm.etl.DetailedDataFlow: transformation logic lineage association between two data elements. Use the com.infa.ldm.etlcore.Operation attribute to document the transformation.
- com.infa.ldm.etl.DetailedDataSetDataElementDataFlow: transformation logic lineage association between the data elements of data sets and transformation logic objects (e.g., mappings).
- com.infa.ldm.etl.DetailedDataSetMappingSetDataFlow: lineage association used to render asset summary lineage and impact analysis. Set it between data sets (tables, etc.) and transformation logic objects (e.g., mappings).
- core.DataSetControlFlow: lineage association used to render the asset control summary. Set it between data sets (tables, etc.).
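When preparing lineage files, it can help to keep the association list above at hand as a small lookup table. The sketch below only restates that list in Python. The "level" grouping and the `takes_operation` flag are our own reading of the descriptions, not an EDC API, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageAssociation:
    name: str              # fully qualified association name, as listed above
    level: str             # "summary", "detailed" or "control" (our reading of the list)
    connects: str          # what the association links, paraphrased from the list
    takes_operation: bool  # True where the list says to use com.infa.ldm.etlcore.Operation


_ROWS = [
    LineageAssociation("core.DataSetDataFlow", "summary", "data sets", False),
    LineageAssociation("core.DirectionalDataFlow", "summary", "data elements", False),
    LineageAssociation("com.infa.ldm.etl.DetailedDataSetDataFlow", "detailed", "data sets", True),
    LineageAssociation("com.infa.ldm.etl.DetailedDataFlow", "detailed", "data elements", True),
    LineageAssociation("com.infa.ldm.etl.DetailedDataSetDataElementDataFlow", "detailed",
                       "data elements and transformation logic objects", False),
    LineageAssociation("com.infa.ldm.etl.DetailedDataSetMappingSetDataFlow", "summary",
                       "data sets and transformation logic objects", False),
    LineageAssociation("core.DataSetControlFlow", "control", "data sets", False),
]

# Index the associations by their fully qualified name for quick lookups.
ASSOCIATIONS = {a.name: a for a in _ROWS}


def associations_for(level: str) -> list[str]:
    """Return the sorted names of the associations at the given lineage level."""
    return sorted(a.name for a in _ROWS if a.level == level)
```

For example, `associations_for("control")` returns only `core.DataSetControlFlow`.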
Build custom metadata
To illustrate how to build custom metadata, we will work through an example. We define a custom model with the package name org.demo.custom.etl001, containing the following classes, each with a superclass from which it inherits some of the behavior of the out-of-the-box ETL model:
We also reuse icons from the out-of-the-box model so that the custom objects we ingest into the catalog are displayed with icons. The model definition includes a zip file named icons.zip, structured as follows:
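The exact layout of icons.zip is not reproduced here. The following sketch is therefore only a minimal illustration of packaging icon files with Python's standard `zipfile` module. It assumes a flat archive of PNG files whose names match the icons your model references; the layout, the file format, and the `build_icons_zip` helper are all assumptions.

```python
import zipfile
from pathlib import Path


def build_icons_zip(icon_dir: str, zip_path: str = "icons.zip") -> list[str]:
    """Package every .png file in icon_dir into a flat zip archive (assumed layout)."""
    icons = sorted(Path(icon_dir).glob("*.png"))
    if not icons:
        raise ValueError(f"no .png icons found in {icon_dir}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for icon in icons:
            # arcname=icon.name stores the file at the archive root, without folders.
            zf.write(icon, arcname=icon.name)
    return [icon.name for icon in icons]
```

If the model expects a different folder structure inside the archive, adjust `arcname` accordingly.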
In our example, we want to represent a data mapping called “mapping_aggregatorTx”, which is composed of three transformations and propagates fields among those transformations, applying the associated data transformations along the way.
We define these objects in objects_Agg.csv (the file can have any name, but it must be prefixed with “objects_”).
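As a starting point, the object file can be generated rather than written by hand. This is a minimal sketch under several assumptions. The `class,identity,core.name` column layout is assumed, and should be checked against the attached example. The class names (`Mapping`, `Transformation`, `Field`) are hypothetical stand-ins for the classes defined in the model. The transformation and field names are invented for illustration. Only the package name, the mapping name, and the “objects_” prefix rule come from this article.

```python
import csv
import os

PKG = "org.demo.custom.etl001"  # package name of the example model
MAPPING = "mapping_aggregatorTx"
# Hypothetical transformations and their fields; replace with your own.
TRANSFORMATIONS = {
    "sq_orders": ["ORDER_ID", "AMOUNT"],
    "agg_sales": ["AMOUNT", "TOTAL_AMOUNT"],
    "tgt_sales": ["TOTAL_AMOUNT"],
}


def object_rows():
    """Yield (class, identity, core.name) rows; identities are path-like and unique."""
    yield (f"{PKG}.Mapping", MAPPING, MAPPING)  # hypothetical class name
    for tx, fields in TRANSFORMATIONS.items():
        tx_id = f"{MAPPING}/{tx}"
        yield (f"{PKG}.Transformation", tx_id, tx)  # hypothetical class name
        for field in fields:
            yield (f"{PKG}.Field", f"{tx_id}/{field}", field)  # hypothetical class name


def write_objects_csv(path: str) -> int:
    """Write the object file and return the number of objects written."""
    if not os.path.basename(path).startswith("objects_"):
        raise ValueError("object files must be prefixed with 'objects_'")
    rows = list(object_rows())
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["class", "identity", "core.name"])  # assumed header
        writer.writerows(rows)
    return len(rows)
```

Calling `write_objects_csv("objects_Agg.csv")` writes one mapping, three transformations, and five fields.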
Define links between objects
We define the parent-child relationships between the objects in links_agg.csv (the file can have any name, but it must be prefixed with “links_”).
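Parent-child links can be generated in the same way. The following sketch assumes an `association,fromObjectIdentity,toObjectIdentity` column layout. It also assumes two hypothetical parent-child associations, `MappingTransformation` and `TransformationField`. Use the association names your model actually defines. The identities mirror the hypothetical ones used for the object file.

```python
import csv
import os

PKG = "org.demo.custom.etl001"
MAPPING = "mapping_aggregatorTx"
# Same hypothetical transformations and fields as in the object file.
TRANSFORMATIONS = {
    "sq_orders": ["ORDER_ID", "AMOUNT"],
    "agg_sales": ["AMOUNT", "TOTAL_AMOUNT"],
    "tgt_sales": ["TOTAL_AMOUNT"],
}


def link_rows():
    """Yield (association, parent identity, child identity) rows."""
    for tx, fields in TRANSFORMATIONS.items():
        tx_id = f"{MAPPING}/{tx}"
        # Hypothetical association: mapping -> transformation.
        yield (f"{PKG}.MappingTransformation", MAPPING, tx_id)
        for field in fields:
            # Hypothetical association: transformation -> field.
            yield (f"{PKG}.TransformationField", tx_id, f"{tx_id}/{field}")


def write_links_csv(path: str) -> int:
    """Write the link file and return the number of links written."""
    if not os.path.basename(path).startswith("links_"):
        raise ValueError("link files must be prefixed with 'links_'")
    rows = list(link_rows())
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["association", "fromObjectIdentity", "toObjectIdentity"])  # assumed header
        writer.writerows(rows)
    return len(rows)
```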
We define the lineage relationships between the objects in lineage_agg1.csv (the file can have any name, but it must be prefixed with “lineage_”). Note that we have added the com.infa.ldm.etlcore.Operation attribute to the links for which we want to display the data transformation applied to the fields.
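To illustrate the Operation attribute, the sketch below writes a lineage file in which only the aggregation step carries an operation. It makes four assumptions. The column layout is assumed, with `com.infa.ldm.etlcore.Operation` as an extra column. `com.infa.ldm.etl.DetailedDataFlow` is used between transformation fields. The identities are the hypothetical ones from the previous sketches. The expression `SUM(AMOUNT)` is an invented example. An empty cell means no operation is documented for that link.

```python
import csv
import os

MAPPING = "mapping_aggregatorTx"
OPERATION = "com.infa.ldm.etlcore.Operation"
FLOW = "com.infa.ldm.etl.DetailedDataFlow"  # detailed lineage between data elements

# (association, from identity, to identity, operation); "" when no operation is shown.
LINEAGE = [
    (FLOW, f"{MAPPING}/sq_orders/AMOUNT", f"{MAPPING}/agg_sales/AMOUNT", ""),
    (FLOW, f"{MAPPING}/agg_sales/AMOUNT", f"{MAPPING}/agg_sales/TOTAL_AMOUNT", "SUM(AMOUNT)"),
    (FLOW, f"{MAPPING}/agg_sales/TOTAL_AMOUNT", f"{MAPPING}/tgt_sales/TOTAL_AMOUNT", ""),
]


def write_lineage_csv(path: str) -> int:
    """Write the lineage file and return how many links document an operation."""
    if not os.path.basename(path).startswith("lineage_"):
        raise ValueError("lineage files must be prefixed with 'lineage_'")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["association", "fromObjectIdentity", "toObjectIdentity", OPERATION])  # assumed header
        writer.writerows(LINEAGE)
    return sum(1 for row in LINEAGE if row[3])
```

Keeping the operation in its own column allows links to leave it empty, so the transformation logic is shown only where it adds information.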
See the attached example to get started with ingesting advanced custom lineage metadata.