Hi Felipe Cabral,
For the dataflow below, we can try the following:
DB2 -> Mapping -> flat file -> Python -> Postgres
DB2 -> Mapping --> Can be achieved via connection assignment after the PowerCenter resource loads are run
Mapping -> flat file --> Can be achieved via connection assignment after the PowerCenter resource loads are run
flat file -> Python
We can create a custom model for Python, load the Python contents, and bring the metadata into EDC.
Once that is done, custom lineage links can be created for
flat file -> Python and Python -> Postgres
Alternatively, use a custom lineage resource to load only summary lineage between the flat file and Postgres.
Postgres metadata can be ingested into EDC as a JDBC or a UCF resource (preferably the former).
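The custom lineage links above are loaded from a CSV file. A minimal sketch of what that file could look like is below; the resource names (`FlatFileRes`, `PythonRes`, `PostgresRes`) and object paths are illustrative placeholders, not names from your environment:

```csv
Association,From Connection,To Connection,From Object,To Object
core.DirectionalDataFlow,,,FlatFileRes://Files/output.csv,PythonRes://scripts/main.py
core.DirectionalDataFlow,,,PythonRes://scripts/main.py,PostgresRes://mydb/public/target_table
```

Each row creates one lineage link from the `From Object` to the `To Object`, identified by their full object ID paths in EDC.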
Let me know if this helps
To add some information: creating the Python program as custom metadata did not work for us. We spent a lot of time trying different approaches, without success. The problem is that after creating the custom metadata, we could not associate the object in the custom data lineage, even when we supplied the full object ID path.
The solution for this case was:
1 - Create an empty file in the directory (example command line: echo > main.py)
2 - Create a flat file resource connection pointing to the directory where the empty file was created
After the process, the generated object ID could be referenced in the .csv of the custom data lineage.
So, we can complete the whole process.
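To make the workaround concrete: once the empty main.py has been scanned as a file object, its object ID can be referenced in the custom lineage CSV like any other object. The resource names and paths below are only illustrative examples, not values from a real environment:

```csv
Association,From Connection,To Connection,From Object,To Object
core.DirectionalDataFlow,,,FlatFileRes://Files/source.csv,FileScanRes://scripts/main.py
core.DirectionalDataFlow,,,FileScanRes://scripts/main.py,PostgresRes://mydb/public/target_table
```

The main.py object here stands in for the Python program in the lineage diagram, even though its contents are empty.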
I hope that this information can be helpful.
I am not sure I understand - are you trying to represent the Python files using the file system scanner? If so, that will work only if you select "unstructured" when configuring the scanner. This will create an unstructured file object called main.py. This file can then be linked using custom lineage.
Exactly what I have done! I mapped the main.py as an unstructured file.
The problem did not happen when creating the Python program as custom metadata, but when referencing its object ID in the custom data lineage. That does not work!
The same problem appears when we try to associate a UCF connection with a PowerCenter mapping/workflow, or with any process that needs to be associated with a resource.
I believe this may be a bug in EDC that needs to be investigated!
In my case, I do not need to view the Python code in my data lineage, just to represent the object in the "workflow".
To do that, I followed the steps described above!
I hope that this post can be helpful to the community!
Why don't you use custom lineage to create summary lineage between the flat file and Postgres?
I have done it and worked perfectly.
So, I posted a new reply just to share my experience with custom metadata, used to build the Python program; it didn't work for me.
I created a new empty file and used the unstructured resource connection to bring it into EDC, as I described previously.
To build the data lineage, I used a custom data lineage resource, and it works fine.
Thanks so much .
I'm looking for something similar at a customer. They have lots of data stored in Hive tables. Using Python or Scala, they transform and/or load data from these Hive tables to, e.g., S3 or other targets. So, my question is: how do I get the Python code visible in EDC using a custom model? From the replies in this thread, a main.py is mentioned. What process creates this main.py file? If you have any documentation about loading Python and Scala into EDC, could you please share it?
Thank you for your time and help.