If you are using both Enterprise Data Catalog and Google Catalog, you may want to replicate some of the tags from Google Catalog into EDC.

 

This post contains sample code/scripts to enable tag replication, and instructions on how to install and use them.

 

If you have comments/feedback about this process - leave them here.

 

Features

  • extracts the tag model from google catalog (does not currently work for policy tags - leave a comment if you need/want this)
  • uses a configuration file to map google catalog tags to EDC custom attributes
    default - matching attribute/tag labels will automatically be synchronized.  you can also choose to map google catalog tags to different custom attributes in EDC
  • for any table/view or file in google catalog that has tags:-
    • will lookup the object in EDC, if found - generates an entry into a bulk import (csv) file
  • if there are changes to the EDC content, will submit a bulk import job via API
  • uses google provided command-line utility scripts to extract google catalog tag metadata

 

Process Flow

[image: google data catalog tag sync - process flow]

this tag replicator process will use 3 google data catalog utility scripts to get details about the tag templates & tag usage & will then import those values into EDC via a bulk import csv file.

  • datacatalog-util tag-templates export - generates a csv file with a list of all of the tag fields
  • datacatalog-util tags export - exports tag values into csv files for each tag template
  • datacatalog-util filesets export - exports tag values for filesets
  • gcp_tag_sync.py
    • reads the csv file output from the google catalog scripts
    • connects to EDC - to lookup the object that a tag or set of tags is connected to
      • if found - will compare the current values in EDC to the values in the exported tags from google catalog
        • if different - will add an entry for the EDC bulk import csv file (the file name is time-stamped, so you can see what changes over time)
    • if there are entries in the EDC bulk import file - will import the csv file into EDC using the API
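The compare-and-write part of the flow above can be sketched roughly as follows. This is a minimal illustration, not the code in gcp_tag_sync.py: the function names are hypothetical, the "id" header column is an assumption about the bulk import csv layout, and the file name pattern is simply copied from the sample output later in this post.

```python
import csv
from datetime import datetime


def bulk_import_filename(now: datetime) -> str:
    """Build a time-stamped name like edc_bulk_import_20210809_10_43_30_278.csv."""
    millis = now.microsecond // 1000
    return f"edc_bulk_import_{now:%Y%m%d_%H_%M_%S}_{millis:03d}.csv"


def changed_values(current: dict, incoming: dict) -> dict:
    """Return only the incoming attribute values that differ from what EDC holds."""
    return {
        attr_id: value
        for attr_id, value in incoming.items()
        if str(current.get(attr_id, "")) != str(value)
    }


def write_bulk_import(path: str, rows: dict, attr_ids: list) -> int:
    """Write one csv row per changed object and return the number of rows.

    rows maps an EDC object id to its changed attribute values.
    The "id" header column is an assumption, check it against your EDC version.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id"] + attr_ids)
        for object_id, values in rows.items():
            writer.writerow([object_id] + [values.get(a, "") for a in attr_ids])
    return len(rows)
```

Because only changed values are kept, a run where nothing differs produces no rows, which matches the "only import if there are entries" step.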

 

Example Screen prints:

google data catalog has this google bigquery table - with tags

[image: Google data catalog - customers_cdc bigquery table - with tags]

after replication into edc -

[image: EDC bigquery table after tag replication]

Pre-requisites

  • python 3.6+
  • internet connection, to download dependent libraries, like datacatalog-*, requests via pip
  • a google data catalog project, that has objects with tags
  • google service account credentials file - .json format, with access to the project(s) you want to use for tag replication.
    this file will contain the private key, email address etc used to authenticate/connect to GCP
  • EDC v10.4.x, or 10.5+
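Before running the sync, it can help to confirm that the .json credentials file really contains the private key and email address mentioned above. The check below is a small sketch and not part of the downloadable scripts; the function names are hypothetical, and the field names are the ones normally found in a Google service account key file.

```python
import json

# Fields normally present in a Google service account key file
REQUIRED_KEYS = ("type", "project_id", "private_key", "client_email")


def missing_credential_fields(key_data: dict) -> list:
    """Return the required fields that are absent or empty in the parsed key file."""
    return [key for key in REQUIRED_KEYS if not key_data.get(key)]


def check_credentials_file(path: str) -> list:
    """Load a .json credentials file and report any missing fields."""
    with open(path, encoding="utf-8") as handle:
        return missing_credential_fields(json.load(handle))
```

An empty list from check_credentials_file means the file at least has the expected shape; it does not prove that the account has access to the project.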

 

Installation

  • download the scripts code from this post (see attachment at end): google_catalog_edc_sync-0.1.0.zip
  • unzip to your preferred folder location - can be on edc/infa server, or any other machine
    • note: the start script is created/tested for linux & macOS only.  you can convert gcp_catalog_tag_replicator.sh to a powershell script
  • make sure all .sh files are executable
    chmod +x *.sh
  • create a python virtual environment - e.g.
    python3 -m venv .venv
  • if you change the name/folder of the virtual environment (.venv) you will also need to edit gcp_catalog_tag_replicator.sh

  • activate your virtual environment.  Linux/macOS
    source .venv/bin/activate
    Windows - .venv\Scripts\activate.ps1
  • download required python packages (requires internet connection).
    pip install -r requirements.txt

    Note: the google-cloud-datacatalog packages have dependencies on large libraries like pandas and numpy - these could take a few minutes to download

Configuration

  • EDC Configuration
    • to setup a connection to EDC (creates a .env file with url and credentials)
      python setupConnection.py
      enter edc url - e.g. https://catalogserver:9085
      enter user & pwd or security domain\user and pwd
      • Note:  if you don't want to use this process - set values for INFA_EDC_URL and INFA_EDC_AUTH environment variables
    • see setting up EDC API connection for an example
    • create any EDC custom attributes required to store Google Data Catalog tag values
  • Google Data Catalog Configuration
    • copy the service account .json file to the current folder (or edit gcp_catalog_tag_replicator.sh to reference its location)
    • edit gcp_catalog_tag_replicator.sh - setting values for
      • GOOGLE_APPLICATION_CREDENTIALS - with the .json credentials file
        e.g. export GOOGLE_APPLICATION_CREDENTIALS=edc-datacatalog-sandbox-xxx6x6b3b5dc.json
      • PROJECT_ID - project to use when connecting to google data catalog
      • optionally - set the .env file with catalog credentials - default is .env -  could use alternate files for dev/test/prod EDC instances
  • Tag Sync Settings
    • edit attribute_map.csv - to list the google data catalog tags that you want to replicate. 
    • this file has 2 columns gc_attribute & edc_attribute_label
    • for each gc_attribute - if an EDC custom attribute exists with the same name, you do not need to add an entry in the edc_attribute_label column (it will assume it is mapped to the same custom attribute label)
    • if you need to map to an attribute in EDC with a different label, then enter the attribute label in the edc_attribute_label column
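The mapping rules for attribute_map.csv can be illustrated with a short sketch. It assumes the two column names described above and a lookup of EDC custom attribute labels to ids; resolve_attribute_map is a hypothetical helper written for this example, not the function used by the scripts.

```python
import csv
import io


def resolve_attribute_map(map_csv: str, edc_attributes: dict) -> tuple:
    """Resolve each gc_attribute to an EDC custom attribute.

    map_csv        - text of attribute_map.csv (gc_attribute, edc_attribute_label)
    edc_attributes - EDC custom attribute label -> attribute id
    Returns (resolved, unmapped).
    """
    resolved, unmapped = {}, []
    for row in csv.DictReader(io.StringIO(map_csv)):
        gc_label = (row.get("gc_attribute") or "").strip()
        if not gc_label:
            continue
        # An empty edc_attribute_label means "use the same label in EDC"
        edc_label = (row.get("edc_attribute_label") or "").strip() or gc_label
        if edc_label in edc_attributes:
            resolved[gc_label] = {"id": edc_attributes[edc_label], "label": edc_label}
        else:
            unmapped.append(gc_label)
    return resolved, unmapped
```

Tags that end up in the unmapped list are the ones that need either a new EDC custom attribute or an explicit edc_attribute_label entry.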

 

 

Execution

to start the google data catalog to EDC tag sync - use the script gcp_catalog_tag_replicator.sh - it does not take any parameters.

when the process runs - it will create a bulk import csv file, for import into EDC

 

execute the gcp_catalog_tag_replicator.sh script - output should look like below

 

./gcp_catalog_tag_replicator.sh > log.txt

extracting objects from google data catalog

using GOOGLE_APPLICATION_CREDENTIALS=edc-datacatalog-sandbox-bdf6a6b3b5dc.json

using PROJECT_ID=edc-datacatalog-sandbox

INFO:root:

INFO:root:===> Export Tag Templates [STARTED]

INFO:root:

INFO:root:Exporting the Tag Templates...

INFO:root:

INFO:root:GET Tag Template: projects/edc-datacatalog-sandbox/locations/us-central1/tagTemplates/data_discoverability

INFO:root:--------------------------------------------------

INFO:root:     Found!

INFO:root:

INFO:root:GET Tag Template: projects/edc-datacatalog-sandbox/locations/us-central1/tagTemplates/quality_template

INFO:root:--------------------------------------------------

INFO:root:     Found!

INFO:root:

INFO:root:GET Tag Template: projects/edc-datacatalog-sandbox/locations/us-central1/tagTemplates/dg_template

INFO:root:--------------------------------------------------

INFO:root:     Found!

INFO:root:Check the generated file at: ./out/edc-datacatalog-sandbox/gcp/tag_templates.csv

INFO:root:

INFO:root:==== Export Tag Templates [FINISHED] =============

INFO:root:

INFO:root:===> Export Tags [STARTED]

INFO:root:

INFO:root:Exporting the Tags...

INFO:root:

INFO:root:Looking for Tags from Template: data_discoverability...

INFO:root:No Tags found for Template: data_discoverability.

INFO:root:

INFO:root:Looking for Tags from Template: quality_template...

INFO:root:Loading Tags from Entry: projects/edc-datacatalog-sandbox/locations/us/entryGroups/@bigquery/entries/cHJvamVjdHMvZWRjLWRhdGFjYXRhbG9nLXNhbmRib3gvZGF0YXNldHMvY2F0YWxvZ19kYXRhc2V0L3RhYmxlcy9jdXN0b21lcnNfY2Rj...

INFO:root:==> Tags from Template: quality_template exported.

INFO:root:

INFO:root:Looking for Tags from Template: dg_template...

INFO:root:Loading Tags from Entry: projects/edc-datacatalog-sandbox/locations/us/entryGroups/@bigquery/entries/cHJvamVjdHMvZWRjLWRhdGFjYXRhbG9nLXNhbmRib3gvZGF0YXNldHMvdHBjZHMvdGFibGVzL3dlYl9zYWxlcw...

INFO:root:Loading Tags from Entry: projects/edc-datacatalog-sandbox/locations/us/entryGroups/@bigquery/entries/cHJvamVjdHMvZWRjLWRhdGFjYXRhbG9nLXNhbmRib3gvZGF0YXNldHMvY2F0YWxvZ19kYXRhc2V0L3RhYmxlcy9jdXN0b21lcnNfY2Rj...

INFO:root:==> Tags from Template: dg_template exported.

INFO:root:

INFO:root:SUMMARY TABLE

INFO:root:

+------------------------------------------------------------------------------------------+--------------+------------------------+------------------------+---------------------------+-------------------------+---------------------------+------------------------------+-------------------------+

| template_name                                                                            |   tags_count |   tagged_entries_count |   tagged_columns_count |   tag_string_fields_count |   tag_bool_fields_count |   tag_double_fields_count |   tag_timestamp_fields_count |   tag_enum_fields_count |

|------------------------------------------------------------------------------------------+--------------+------------------------+------------------------+---------------------------+-------------------------+---------------------------+------------------------------+-------------------------|

| projects/edc-datacatalog-sandbox/locations/us-central1/tagTemplates/data_discoverability |            0 |                      0 |                      0 |                         0 |                       0 |                         0 |                            0 |                       0 |

| projects/edc-datacatalog-sandbox/locations/us-central1/tagTemplates/quality_template     |            6 |                      0 |                      6 |                         0 |                       0 |                        24 |                            0 |                       0 |

| projects/edc-datacatalog-sandbox/locations/us-central1/tagTemplates/dg_template          |            2 |                      2 |                      0 |                         3 |                       6 |                         0 |                            0 |                       9 |

+------------------------------------------------------------------------------------------+--------------+------------------------+------------------------+---------------------------+-------------------------+---------------------------+------------------------------+-------------------------+

INFO:root:

INFO:root:Check the generated summary file at: ./out/edc-datacatalog-sandbox/gcp/summary.csv

INFO:root:Check additional files for templates with tags at: ./out/edc-datacatalog-sandbox/gcp

INFO:root:

INFO:root:==== Export Tags [FINISHED] =============

INFO:root:

INFO:root:===> Export Filesets [STARTED]

INFO:root:

INFO:root:Exporting the Filesets...

INFO:root:Check the generated file at: ./out/edc-datacatalog-sandbox/gcp/filesets.csv

INFO:root:

INFO:root:==== Export Filesets [FINISHED] =============

google catalog export complete

starting edc tag replication...

reading input files from ./out/edc-datacatalog-sandbox/gcp

reading tag_templates file ./out/edc-datacatalog-sandbox/gcp/tag_templates.csv

field id/label = data_expert /

field id/label = data_owner /

field id/label = max_date / Most Recent Date

field id/label = mean_val / Mean Value

field id/label = std_dev / Standard Deviation

field id/label = score / Quality Score

field id/label = date_created / Date Created

field id/label = null_values / Number of Null Values

field id/label = min_val / Min Value

field id/label = median_val / Median Value

field id/label = unique_values / Number of Unique Values

field id/label = max_val / Max Value

field id/label = zeros / Number of Zero Values

field id/label = count / Number of Values

field id/label = data_category_competitor / Data Category Competitor

field id/label = data_asset_expert / Data Asset Expert

field id/label = data_category_location / Data Category Location

field id/label = data_asset_owner / Data Asset Owner

field id/label = data_category_customer / Data Category Customer

field id/label = data_creation / Data Creation Time

field id/label = data_retention / Data Retention

field id/label = broad_data_category / Broad Data Category

field id/label = data_domain / Data Domain

field id/label = data_confidentiality / Data Confidentiality

field id/label = data_category_financial / Data Category Financial

field id/label = data_category_hippa / Data Category Health

field id/label = data_origin / Data Origin

field id/label = environment / Environment

field id/label = data_ownership / Data Ownership

field id/label = data_category_employee / Data Category Employee

field id/label = data_asset_documentation / Data Asset Documentation

finished reading tag templates

reading common env/env file/cmd settings

ready to check .env file .env

loading from .env file .env

read edc url from .env value=https://infalab01.infaaws.com:9085

replacing edc url with value from .env

replacing edc auth with INFA_EDC_AUTH value from .env

finished reading common env/.env/cmd parameters

validating connection to https://infalab01.infaaws.com:9085

api status code=200

validated connection: 200 {'releaseVersion': '10.5.0.1', 'buildVersion': '58', 'buildDate': 'Mon Jun 14 08:41:18 UTC 2021'}

listing edc custom attributes

edc custom attributes found = 13

label=Quality Score id=com.infa.appmodels.ldm.LDM_5bb82b5b_aa54_427c_a653_812727176a26

label=GCP Source id=com.infa.appmodels.ldm.LDM_5f52d16e_2956_4e03_94b8_68e88c468aa4

label=Data Confidentiality id=com.infa.appmodels.ldm.LDM_8fbce877_d7cd_4146_9bc1_f1711cd910f7

label=Tag id=com.infa.appmodels.ldm.LDM_2bf2b0cc_3496_4078_9dc1_823331a1a1f4

label=Broad Data Category id=com.infa.appmodels.ldm.LDM_8572c55b_3319_4150_8560_993afe1a4bb2

label=Data Category id=com.infa.appmodels.ldm.LDM_e2bff05a_c6d1_4464_8119_f7087ed0c171

label=Business Owner id=com.infa.appmodels.ldm.LDM_a14752b3_88c8_4ad1_a61f_d445197addf4

label=Data Retention id=com.infa.appmodels.ldm.LDM_ea7efef2_ac7f_447c_ac80_bb4b2dc096ca

label=Has PII id=com.infa.appmodels.ldm.LDM_45b40d8b_bbbd_43e5_b7c2_5bafa920ffd6

label=Department id=com.infa.ldm.ootb.enrichments.departmentName

label=Business Description id=com.infa.ldm.ootb.enrichments.businessDescription

label=Display Name id=com.infa.ldm.ootb.enrichments.displayName

label=URL id=com.infa.ldm.ootb.enrichments.URL

reading attribute mapping file attribute_map.csv

attributre map: Quality Score edc=

found using gcp label Quality Score - id=com.infa.appmodels.ldm.LDM_5bb82b5b_aa54_427c_a653_812727176a26

attributre map: Standard Deviation edc=

cannot find correspinding attribute for Standard Deviation

attributre map: Number of Unique Values edc=

cannot find correspinding attribute for Number of Unique Values

attributre map: Number of Null Values edc=

cannot find correspinding attribute for Number of Null Values

attributre map: Mean Value edc=

cannot find correspinding attribute for Mean Value

attributre map: Max Value edc=

cannot find correspinding attribute for Max Value

attributre map: Number of Values edc=

cannot find correspinding attribute for Number of Values

attributre map: Number of Zero Values edc=

cannot find correspinding attribute for Number of Zero Values

attributre map: Date Created edc=

cannot find correspinding attribute for Date Created

attributre map: Min Value edc=

cannot find correspinding attribute for Min Value

attributre map: Most Recent Date edc=

cannot find correspinding attribute for Most Recent Date

attributre map: Median Value edc=

cannot find correspinding attribute for Median Value

attributre map: Data Category Employee edc=

cannot find correspinding attribute for Data Category Employee

attributre map: Data Origin edc=GCP Source

found using edc label GCP Source - id=com.infa.appmodels.ldm.LDM_5f52d16e_2956_4e03_94b8_68e88c468aa4

attributre map: Data Retention edc=

found using gcp label Data Retention - id=com.infa.appmodels.ldm.LDM_ea7efef2_ac7f_447c_ac80_bb4b2dc096ca

attributre map: Data Asset Documentation edc=URL

found using edc label URL - id=com.infa.ldm.ootb.enrichments.URL

attributre map: Data Category Customer edc=

cannot find correspinding attribute for Data Category Customer

attributre map: Data Category Location edc=

cannot find correspinding attribute for Data Category Location

attributre map: Data Ownership edc=

cannot find correspinding attribute for Data Ownership

attributre map: Data Creation Time edc=

cannot find correspinding attribute for Data Creation Time

attributre map: Data Category Health edc=

cannot find correspinding attribute for Data Category Health

attributre map: Broad Data Category edc=

found using gcp label Broad Data Category - id=com.infa.appmodels.ldm.LDM_8572c55b_3319_4150_8560_993afe1a4bb2

attributre map: Data Asset Owner edc=

cannot find correspinding attribute for Data Asset Owner

attributre map: Data Confidentiality edc=

found using gcp label Data Confidentiality - id=com.infa.appmodels.ldm.LDM_8fbce877_d7cd_4146_9bc1_f1711cd910f7

attributre map: Data Asset Expert edc=

cannot find correspinding attribute for Data Asset Expert

attributre map: Data Category Competitor edc=

cannot find correspinding attribute for Data Category Competitor

attributre map: Environment edc=

cannot find correspinding attribute for Environment

attributre map: Data Domain edc=Data Category

found using edc label Data Category - id=com.infa.appmodels.ldm.LDM_e2bff05a_c6d1_4464_8119_f7087ed0c171

attributre map: Data Category Financial edc=

cannot find correspinding attribute for Data Category Financial

updated attribute map = {'Quality Score': {'id': 'com.infa.appmodels.ldm.LDM_5bb82b5b_aa54_427c_a653_812727176a26', 'label': 'Quality Score'}, 'Data Origin': {'id': 'com.infa.appmodels.ldm.LDM_5f52d16e_2956_4e03_94b8_68e88c468aa4', 'label': 'GCP Source'}, 'Data Retention': {'id': 'com.infa.appmodels.ldm.LDM_ea7efef2_ac7f_447c_ac80_bb4b2dc096ca', 'label': 'Data Retention'}, 'Data Asset Documentation': {'id': 'com.infa.ldm.ootb.enrichments.URL', 'label': 'URL'}, 'Broad Data Category': {'id': 'com.infa.appmodels.ldm.LDM_8572c55b_3319_4150_8560_993afe1a4bb2', 'label': 'Broad Data Category'}, 'Data Confidentiality': {'id': 'com.infa.appmodels.ldm.LDM_8fbce877_d7cd_4146_9bc1_f1711cd910f7', 'label': 'Data Confidentiality'}, 'Data Domain': {'id': 'com.infa.appmodels.ldm.LDM_e2bff05a_c6d1_4464_8119_f7087ed0c171', 'label': 'Data Category'}}

initializing edc bulk import file: out/edc_bulk_import_20210809_10_43_30_278.csv

reading summary csv - ./out/edc-datacatalog-sandbox/gcp/summary.csv

found template: data_discoverability

found template: quality_template

found template: dg_template

reading tag file: ./out/edc-datacatalog-sandbox/gcp/data_discoverability.csv

  file not found ./out/edc-datacatalog-sandbox/gcp/data_discoverability.csv skipping tag import for template

reading tag file: ./out/edc-datacatalog-sandbox/gcp/quality_template.csv

processing row 1

processing row 2

processing row 3

processing row 4

linked resource=//bigquery.googleapis.com/projects/edc-datacatalog-sandbox/datasets/catalog_dataset/tables/customers_cdc score=3.0 for column=address

finding bigquery item...

finding q=core.autoSuggestMatchId:/CATALOG_DATASET/CUSTOMERS_CDC/ADDRESS

searching using parms: {'offset': 0, 'pageSize': 10, 'q': 'core.autoSuggestMatchId:/CATALOG_DATASET/CUSTOMERS_CDC/ADDRESS', 'fq': 'id:*/edc-datacatalog-sandbox/*'}

exactly 1 object found...

processing row 5

processing row 6

processing row 7

processing row 8

linked resource=//bigquery.googleapis.com/projects/edc-datacatalog-sandbox/datasets/catalog_dataset/tables/customers_cdc score=3.0 for column=city

finding bigquery item...

finding q=core.autoSuggestMatchId:/CATALOG_DATASET/CUSTOMERS_CDC/CITY

searching using parms: {'offset': 0, 'pageSize': 10, 'q': 'core.autoSuggestMatchId:/CATALOG_DATASET/CUSTOMERS_CDC/CITY', 'fq': 'id:*/edc-datacatalog-sandbox/*'}

exactly 1 object found...

processing row 9

processing row 10

processing row 11

linked resource=//bigquery.googleapis.com/projects/edc-datacatalog-sandbox/datasets/catalog_dataset/tables/customers_cdc score=3.0 for column=country

finding bigquery item...

finding q=core.autoSuggestMatchId:/CATALOG_DATASET/CUSTOMERS_CDC/COUNTRY

searching using parms: {'offset': 0, 'pageSize': 10, 'q': 'core.autoSuggestMatchId:/CATALOG_DATASET/CUSTOMERS_CDC/COUNTRY', 'fq': 'id:*/edc-datacatalog-sandbox/*'}

exactly 1 object found...

processing row 12

processing row 13

processing row 14

processing row 15

processing row 16

linked resource=//bigquery.googleapis.com/projects/edc-datacatalog-sandbox/datasets/catalog_dataset/tables/customers_cdc score=3.0 for column=phone

finding bigquery item...

finding q=core.autoSuggestMatchId:/CATALOG_DATASET/CUSTOMERS_CDC/PHONE

searching using parms: {'offset': 0, 'pageSize': 10, 'q': 'core.autoSuggestMatchId:/CATALOG_DATASET/CUSTOMERS_CDC/PHONE', 'fq': 'id:*/edc-datacatalog-sandbox/*'}

exactly 1 object found...

processing row 17

linked resource=//bigquery.googleapis.com/projects/edc-datacatalog-sandbox/datasets/catalog_dataset/tables/customers_cdc score=4.0 for column=postalcode

finding bigquery item...

finding q=core.autoSuggestMatchId:/CATALOG_DATASET/CUSTOMERS_CDC/POSTALCODE

searching using parms: {'offset': 0, 'pageSize': 10, 'q': 'core.autoSuggestMatchId:/CATALOG_DATASET/CUSTOMERS_CDC/POSTALCODE', 'fq': 'id:*/edc-datacatalog-sandbox/*'}

exactly 1 object found...

processing row 18

processing row 19

processing row 20

processing row 21

processing row 22

processing row 23

processing row 24

linked resource=//bigquery.googleapis.com/projects/edc-datacatalog-sandbox/datasets/catalog_dataset/tables/customers_cdc score=3.0 for column=state

finding bigquery item...

finding q=core.autoSuggestMatchId:/CATALOG_DATASET/CUSTOMERS_CDC/STATE

searching using parms: {'offset': 0, 'pageSize': 10, 'q': 'core.autoSuggestMatchId:/CATALOG_DATASET/CUSTOMERS_CDC/STATE', 'fq': 'id:*/edc-datacatalog-sandbox/*'}

exactly 1 object found...

reading tag file: ./out/edc-datacatalog-sandbox/gcp/dg_template.csv

processing row 1

linked resource=//bigquery.googleapis.com/projects/edc-datacatalog-sandbox/datasets/tpcds/tables/web_sales data_domain=SALES for column=

finding bigquery item...

finding q=core.autoSuggestMatchId:/TPCDS/WEB_SALES

searching using parms: {'offset': 0, 'pageSize': 10, 'q': 'core.autoSuggestMatchId:/TPCDS/WEB_SALES', 'fq': 'id:*/edc-datacatalog-sandbox/*'}

exactly 1 object found...

processing row 2

linked resource=//bigquery.googleapis.com/projects/edc-datacatalog-sandbox/datasets/tpcds/tables/web_sales broad_data_category=CONTENT for column=

finding bigquery item...

reading from cache //bigquery.googleapis.com/projects/edc-datacatalog-sandbox/datasets/tpcds/tables/web_sales/

processing row 3

processing row 4

linked resource=//bigquery.googleapis.com/projects/edc-datacatalog-sandbox/datasets/catalog_dataset/tables/customers_cdc broad_data_category=CONTENT for column=

finding bigquery item...

finding q=core.autoSuggestMatchId:/CATALOG_DATASET/CUSTOMERS_CDC

searching using parms: {'offset': 0, 'pageSize': 10, 'q': 'core.autoSuggestMatchId:/CATALOG_DATASET/CUSTOMERS_CDC', 'fq': 'id:*/edc-datacatalog-sandbox/*'}

exactly 1 object found...

processing row 5

processing row 6

processing row 7

processing row 8

linked resource=//bigquery.googleapis.com/projects/edc-datacatalog-sandbox/datasets/catalog_dataset/tables/customers_cdc data_asset_documentation=http://wiki.corp.com/customer_data_asset for column=

finding bigquery item...

reading from cache //bigquery.googleapis.com/projects/edc-datacatalog-sandbox/datasets/catalog_dataset/tables/customers_cdc/

processing row 9

processing row 10

linked resource=//bigquery.googleapis.com/projects/edc-datacatalog-sandbox/datasets/catalog_dataset/tables/customers_cdc data_origin=SALESFORCE for column=

finding bigquery item...

reading from cache //bigquery.googleapis.com/projects/edc-datacatalog-sandbox/datasets/catalog_dataset/tables/customers_cdc/

processing row 11

linked resource=//bigquery.googleapis.com/projects/edc-datacatalog-sandbox/datasets/catalog_dataset/tables/customers_cdc data_domain=MARKETING for column=

finding bigquery item...

reading from cache //bigquery.googleapis.com/projects/edc-datacatalog-sandbox/datasets/catalog_dataset/tables/customers_cdc/

processing row 12

processing row 13

processing row 14

processing row 15

linked resource=//bigquery.googleapis.com/projects/edc-datacatalog-sandbox/datasets/catalog_dataset/tables/customers_cdc data_confidentiality=SENSITIVE for column=

finding bigquery item...

reading from cache //bigquery.googleapis.com/projects/edc-datacatalog-sandbox/datasets/catalog_dataset/tables/customers_cdc/

processing row 16

processing row 17

processing row 18

linked resource=//bigquery.googleapis.com/projects/edc-datacatalog-sandbox/datasets/catalog_dataset/tables/customers_cdc data_retention=30_DAYS for column=

finding bigquery item...

reading from cache //bigquery.googleapis.com/projects/edc-datacatalog-sandbox/datasets/catalog_dataset/tables/customers_cdc/

writing bulk import row for biqquery-datacatalog-sandbox://edc-datacatalog-sandbox/catalog_dataset/customers_cdc

finished

**************************************************

        edc searches : 8

    edc objects found: 8

objects to be updated: 1

edc objects not found: 0

      skipped objects: 0

        unmapped tags: 13 values: {'Data Category Financial', 'Number of Values', 'Number of Null Values', 'Data Category Customer', 'Data Category Location', 'Number of Unique Values', 'Data Asset Expert', 'Data Category Competitor', 'Data Asset Owner', 'Environment', 'Data Category Employee', 'Data Ownership', 'Data Category Health'}

      missing objects:

      skipped objects:

**************************************************

ready to upddate edc using out/edc_bulk_import_20210809_10_43_30_278.csv for 1 objects

url=https://infalab01.infaaws.com:9085/access/2/catalog/jobs/objectImports

{'packages': 'com.infa.appmodels.ldm,com.infa.ldm.ootb.enrichments'}

{'file': ('edc_bulk_import_20210809_10_43_30_278.csv', <_io.TextIOWrapper name='./out/edc_bulk_import_20210809_10_43_30_278.csv' mode='rt' encoding='UTF-8'>, 'text/csv')}

response=200

Google cloud tag replication finished
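The q and fq search values printed in the log above appear to be derived from the linked resource and column name. The sketch below reproduces that pattern for BigQuery tables; the parsing is inferred from the log output only and edc_search_params is a hypothetical name, so treat it as an illustration rather than the script's actual lookup code.

```python
def edc_search_params(linked_resource: str, column: str = "") -> dict:
    """Build EDC search parameters for a BigQuery linked resource.

    Expects //bigquery.googleapis.com/projects/<p>/datasets/<d>/tables/<t>
    and mirrors the q/fq values shown in the sample log.
    """
    parts = linked_resource.strip("/").split("/")
    project = parts[parts.index("projects") + 1]
    dataset = parts[parts.index("datasets") + 1]
    table = parts[parts.index("tables") + 1]
    match_id = f"/{dataset}/{table}"
    if column:
        match_id += f"/{column}"
    return {
        "offset": 0,
        "pageSize": 10,
        # the log shows the match id in upper case
        "q": f"core.autoSuggestMatchId:{match_id.upper()}",
        # restrict results to ids that contain the project name
        "fq": f"id:*/{project}/*",
    }
```

For a table-level tag the column argument is left empty, which gives a query such as core.autoSuggestMatchId:/TPCDS/WEB_SALES.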

 

Setting up an EDC API Connection

use python setupConnection.py

python setupConnection.py                           

edc api configuration utility:

 

enter catalog url - http(s):\<server>:port : https://infalab01.infaaws.com:9085

enter user id: admin

password for user=admin:

 

validating that the information you entered is valid...

validating connection to https://infalab01.infaaws.com:9085

  api status code=200

valid connection

  {'releaseVersion': '10.5.0.1', 'buildVersion': '58', 'buildDate': 'Mon Jun 14 08:41:18 UTC 2021'}

 

to make this a repeatable process - you can set the following environment variables, or an .env* file

 

to set an env variable - linux/mac:

  export INFA_EDC_URL=https://infalab01.infaaws.com:9085

  export INFA_EDC_AUTH="Basic YWRtaW46YWRtaW4="

 

for Powershell:

  $env:INFA_EDC_URL="https://infalab01.infaaws.com:9085"

  $env:INFA_EDC_AUTH="Basic YWRtaW46YWRtaW4="

 

for windows cmd:

  set INFA_EDC_URL=https://infalab01.infaaws.com:9085

  set INFA_EDC_AUTH=Basic YWRtaW46YWRtaW4=

 

 

or - create a .env file with those settings.

  Note:  if you create a file named '.env' - it will be automatically used by other scripts, or you can over-ride with the -v setting

 

a .env file in the current folder does not exist, should i create it? (y or n)?:y

creating .env.....

file created  .env
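If you prefer to set INFA_EDC_AUTH by hand instead of running setupConnection.py, the value shown above is a standard HTTP Basic authentication string: "Basic " followed by the base64 encoding of user:password. A minimal sketch (basic_auth_value is a hypothetical helper name, not part of the scripts):

```python
import base64


def basic_auth_value(user: str, password: str) -> str:
    """Return an HTTP Basic auth value for the given user and password."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


if __name__ == "__main__":
    # admin/admin gives the sample value used in this post
    print(basic_auth_value("admin", "admin"))
```

Note that base64 is an encoding, not encryption, so the .env file should be protected like any other file that holds a password.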