Enterprise Data Catalog: Mind map & Lifecycle

Version 1

    EDC Mind map

     

    EDC Lifecycle

     

     

    Ingestion

    An external or Informatica data source is ingested through an EDC resource. A resource is a repository object that represents an external data source in EDC. The external data source is managed as an EDC data asset at the resource level.

    The metadata of every data asset ingested through a resource is cataloged in EDC. For data assets such as tables or views from an RDBMS or Hive, basic profiling is also executed.

    Profiling

    Enterprise data discovery and profiling is an optional operation that can be executed if desired.

    During an enterprise discovery or profiling activity, the catalog service passes control to the Data Integration Service (DIS). The DIS stores the intermediate results in the associated profiling warehouse. As a final step, the profiling information is fetched and written to the EDC catalog inside the Hadoop cluster.

    EDC profiles can be executed on the native, Hive, or Blaze execution engine.
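
    To make "basic profiling" concrete, the following is a minimal sketch of the kind of column statistics a profile typically collects. It is an illustration only and does not show how EDC or the DIS computes profiles; the function name profile_column and the chosen statistics are assumptions.

```python
from collections import Counter

def profile_column(values):
    """Return basic statistics for one column; None marks a null value."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(counts),
        # Most frequent non-null value, or None for an all-null column.
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }

# Hypothetical sample of a country-code column.
sample = ["DE", "FR", None, "DE", "US"]
stats = profile_column(sample)
print(stats)
```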

    Classify

    Technical metadata is enhanced by assigning semantic meaning to the technical assets. EDC provides data domains and composite domains that enrich the metadata with semantic meaning.
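
    As a rough illustration of how a data domain can attach semantic meaning to a column, the sketch below tags a column with a domain when enough of its values match a pattern. The domain names, the regular expressions, and the 0.75 threshold are hypothetical and are not taken from EDC.

```python
import re

# Hypothetical domain rules: domain name -> pattern that a value must match.
DOMAIN_RULES = {
    "Email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "US_ZIP": re.compile(r"^\d{5}(-\d{4})?$"),
}

def infer_domains(values, threshold=0.75):
    """Return the domains whose pattern matches at least `threshold` of the non-empty values."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return []
    matched = []
    for name, pattern in DOMAIN_RULES.items():
        ratio = sum(1 for v in non_empty if pattern.match(v)) / len(non_empty)
        if ratio >= threshold:
            matched.append(name)
    return matched

# Four of the five values look like e-mail addresses.
emails = ["a@example.com", "b@example.org", "c@example.net", "d@example.com", "n/a"]
print(infer_domains(emails))
```

    A threshold below 100% lets a column be classified even when a few values are dirty, at the cost of occasional false positives.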

    Index

    Metadata cataloged during ingestion is indexed for faster search and retrieval.
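
    The reason indexing speeds up search can be shown with a small inverted index, which maps each word to the assets that mention it so that a query does not have to scan every asset. This is a generic sketch, not a description of the EDC index; the asset names and descriptions are made up.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(assets):
    """Map each token to the set of asset ids whose description contains it."""
    index = defaultdict(set)
    for asset_id, description in assets.items():
        for token in tokenize(description):
            index[token].add(asset_id)
    return index

def search(index, query):
    """Return the asset ids whose description contains every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical cataloged assets with short descriptions.
assets = {
    "sales.CUSTOMER": "customer master table",
    "sales.ORDERS": "customer orders table",
    "hr.EMPLOYEE": "employee records",
}
index = build_index(assets)
print(search(index, "customer table"))
```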

    Annotate

    Business definitions can be assigned to technical assets by associating Axon business glossary terms with EDC technical assets.

    Discover

    All assets that have been ingested, classified, and annotated can be discovered through catalog search. Users can curate the metadata during the discovery process.

    User interaction

    Users interact with EDC through two separate web-based user interfaces.

    • Catalog Administrator - Using Catalog Administrator, users can:
      • Monitor scheduled or ad hoc executions of metadata scans.
      • Manage resource security, i.e. who can do what with the metadata resources.
      • Create new resources or edit existing resource information.
      • Create custom attributes, and customize or complete missing lineage inside EDC.

     

    • Catalog search - This is the primary interface for EDC users. Using the catalog UI, users can:
      • Search for data assets.
      • Curate and enrich metadata.
      • Relate physical data assets to logical data assets.

      The search interface completes the search text with matching assets in the catalog. Even if the user types the text incorrectly, the EDC engine compares the typed text with the assets in the catalog and tries to autocorrect it.
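
    The completion and autocorrect behavior described above can be approximated with prefix matching and fuzzy string matching. The sketch below uses Python's standard difflib module; it is not the algorithm EDC uses, and the asset names and the 0.6 similarity cutoff are assumptions.

```python
import difflib

# Hypothetical asset names known to the catalog.
ASSET_NAMES = ["customer", "customer_address", "orders", "order_items", "employee"]

def complete(prefix, names=ASSET_NAMES):
    """Return the asset names that start with the typed prefix."""
    p = prefix.lower()
    return [n for n in names if n.startswith(p)]

def autocorrect(text, names=ASSET_NAMES):
    """Return the closest asset name to a possibly misspelled term, or None."""
    matches = difflib.get_close_matches(text.lower(), names, n=1, cutoff=0.6)
    return matches[0] if matches else None

print(complete("ord"))
print(autocorrect("custmer"))
```

    Raising the cutoff makes the correction more conservative, so fewer unrelated assets are suggested for heavily misspelled input.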