Custom Metadata Model in EDC

Version 2

    What’s a model?

     

    In EDC, you can define models to represent metadata coming from different systems. There are predefined models in the catalog that are used to load metadata for supported sources.

    The predefined models can also be reused to define a new model by extension or derivation (an object class can have a superclass), or simply to define new resource types so that metadata from unsupported sources can be loaded via CSV files.

     

    One of the models that can be reused easily is the core model. It defines the principal classes needed to represent most metadata sources.

     

    EDC Core Model

     

    The core model is the base for many of the scanners, allowing us to represent most datastore-type sources (relational databases, file systems, etc.):

    • Data Source is the main container (e.g. Database, root dir)
    • Data Set is the entity that holds the data (e.g. Table, View, Files)
    • Data Element is the individual information storage (e.g. fields, attributes, etc)
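
    The three-level containment above can be sketched in Python as nested data classes. This is only an illustration of the hierarchy, not the catalog's own representation; the function element_paths and the sample names (sales_db, orders, order_id, amount) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataElement:
    """Individual information storage, e.g. a field or attribute."""
    name: str


@dataclass
class DataSet:
    """Entity that holds the data, e.g. a table, view or file."""
    name: str
    elements: List[DataElement] = field(default_factory=list)


@dataclass
class DataSource:
    """Main container, e.g. a database or root directory."""
    name: str
    data_sets: List[DataSet] = field(default_factory=list)


def element_paths(source: DataSource) -> List[str]:
    """List every data element as 'source/data set/element'."""
    return [
        f"{source.name}/{ds.name}/{el.name}"
        for ds in source.data_sets
        for el in ds.elements
    ]


# Hypothetical example: one database holding one table with two columns.
sales_db = DataSource(
    "sales_db",
    [DataSet("orders", [DataElement("order_id"), DataElement("amount")])],
)

for path in element_paths(sales_db):
    print(path)
```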

    Lineage links are relationships that define data movement:

    • DataSourceDataFlow, lineage link between 2 data sources
    • DataSetDataFlow, lineage link between 2 data sets
    • DirectionalDataFlow, lineage link between 2 data elements

    Other relationships can be parent-child relationships or other types of relationship:

    • DataSourceDataSet, parent-child relationship between a data source and a data set
    • DataSetDataElement, parent-child relationship between a data set and a data element
    • PkFkRelatedDataSets, JoinRelatedDataSets and LookupRelatedDatasets, relationships between data sets that represent, respectively, a primary key/foreign key relationship, data sets joined together in a mapping, and a data set used as a lookup to enrich data
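
    As a quick reference, the associations listed above can be written as a table of (from class, to class) pairs, with a small check that a link connects the right kinds of object. This is a sketch only: the helper is_valid_link is hypothetical, and the class names are written without any package prefix the catalog may use.

```python
from typing import Dict, Tuple

# (from class, to class) for each core-model association named above.
# Class names are plain labels for this sketch, without any package prefix.
ASSOCIATIONS: Dict[str, Tuple[str, str]] = {
    # Lineage links
    "DataSourceDataFlow": ("DataSource", "DataSource"),
    "DataSetDataFlow": ("DataSet", "DataSet"),
    "DirectionalDataFlow": ("DataElement", "DataElement"),
    # Parent-child relationships
    "DataSourceDataSet": ("DataSource", "DataSet"),
    "DataSetDataElement": ("DataSet", "DataElement"),
    # Other relationships between data sets
    "PkFkRelatedDataSets": ("DataSet", "DataSet"),
    "JoinRelatedDataSets": ("DataSet", "DataSet"),
    "LookupRelatedDatasets": ("DataSet", "DataSet"),
}


def is_valid_link(association: str, from_class: str, to_class: str) -> bool:
    """Return True if the association exists and joins these two classes."""
    return ASSOCIATIONS.get(association) == (from_class, to_class)


print(is_valid_link("DataSetDataElement", "DataSet", "DataElement"))
print(is_valid_link("DirectionalDataFlow", "DataSet", "DataSet"))
```

    The second call returns False because DirectionalDataFlow links data elements, not data sets.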

    Deriving classes from the core model lets you benefit from the features available for the core model:

    • Attributes defined for the core model classes
    • The DataSet class has certification, review and comment, and question and answer capabilities
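
    The effect of derivation on attributes can be pictured with a minimal sketch in which a class collects the attributes of its superclass chain before its own. The ModelClass type, the custom class name CustomTable and all attribute names are hypothetical and chosen for illustration; they are not taken from the core model definition.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelClass:
    """A model class with an optional superclass and its own attributes."""
    name: str
    superclass: Optional["ModelClass"] = None
    attributes: List[str] = field(default_factory=list)

    def all_attributes(self) -> List[str]:
        """Inherited attributes first, then the ones defined on this class."""
        inherited = self.superclass.all_attributes() if self.superclass else []
        return inherited + self.attributes


# Hypothetical attribute names, for illustration only.
core_data_set = ModelClass("DataSet", attributes=["name", "description"])
custom_table = ModelClass(
    "CustomTable", superclass=core_data_set, attributes=["retention_days"]
)

print(custom_table.all_attributes())
```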

    The model definitions are available in the EDC administration UI, on the Manage > Models page. You can also download the model definition XML by selecting a model and clicking the export button.

    The model XML definition file contains all the information required to define the model in the catalog. It contains:

    • Model (package) information (name, description, version)
    • Classes available in the model
    • Attributes associated with each of the classes
    • Associations between the classes defined in the model
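
    One way to get a first overview of an exported model file is to count the element tags it contains. The sketch below uses only the Python standard library and makes no assumption about the EDC schema; the function tag_inventory is hypothetical, and the tag names in SAMPLE are invented placeholders, not the real model XML.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_inventory(xml_text: str) -> Counter:
    """Count how often each element tag occurs in an XML document."""
    root = ET.fromstring(xml_text)
    # Drop any '{namespace}' prefix so the tag names are easier to read.
    return Counter(el.tag.split("}")[-1] for el in root.iter())


# Placeholder document: these tag names are invented for the demo and are
# NOT the real EDC model schema. Pass the text of an exported model instead.
SAMPLE = """<package name="demo">
  <class name="A"/>
  <class name="B"/>
  <attribute name="x"/>
</package>"""

for tag, count in sorted(tag_inventory(SAMPLE).items()):
    print(f"{tag}: {count}")
```

    To run it against a real export, read the downloaded file into a string and pass it to tag_inventory.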

    The next sections provide templates that make it easier to build a model.

     

    Model template

    Below is a template for the XML file that you build in order to create a model.

     

    EDC Model Template

     

    It is also available on GitHub at this link: https://github.com/Informatica-EIC/Custom-Scanners/blob/master/Model_Templates/model_template.xml

     

    For each of the sections (classes, attributes, associations), a template is also available on GitHub.

     

    Class template

    EDC Class Template

    https://github.com/Informatica-EIC/Custom-Scanners/blob/master/Model_Templates/class_template.xml

     

    Attribute template

    EDC Attribute Template

     

    https://github.com/Informatica-EIC/Custom-Scanners/blob/master/Model_Templates/attribute_template.xml

     

    Association template

     

    EDC Association Template

     

    https://github.com/Informatica-EIC/Custom-Scanners/blob/master/Model_Templates/association_template.xml