9 Replies Latest reply on Apr 19, 2021 3:48 AM by Reza Joemrati

    How to make data lineage .out files

    Felipe Cabral Active Member

      Hi all,

      We are currently using EDC 10.2.1.

      We have approximately 250 mappings whose targets are flat files.

      Each mapping reads from DB2 and writes to a flat file. The flat file is then read by a Python process, which writes the data to Postgres.

      DB2 -> Mapping -> flat file -> Python -> Postgres

       

       

      I'm looking for solutions to the following:

      A quick way to create data lineage for flat files with an extension other than .csv. I found a KB article saying that it is not possible to create data lineage through connection assignment for files other than .csv. However, a note at the bottom mentions new features to be released in EDC 10.2.1. Is it possible now? ( https://kb.informatica.com/faq/7/Pages/20/523750.aspx?myk=Enterprise%20data%20catalog%20flat%20file)

      I know that Python programs aren't supported natively, but we can create a custom object to show the Python program in the data lineage.

      We would like to know whether we need to create a custom program containing the same ports that come from the flat file, or whether we can create just one "generic" program without ports and use it for every data lineage.

       

       

      Example A

      DB2 -> Mapping -> flat file -> Python -> Postgres

      DB2 -> Mapping -> flat file -> Python -> Postgres

      DB2 -> Mapping -> flat file -> Python -> Postgres

      Example B

      DB2 -> Mapping -> flat file ->             -> Postgres

      DB2 -> Mapping -> flat file -> Python -> Postgres

      DB2 -> Mapping -> flat file ->             -> Postgres

        • 1. Re: How to make data lineage .out files
          Srinivas Pai Guru

          Hi Felipe Cabral

           

          For the data flow below, you can try the following:

          DB2 -> Mapping -> flat file -> Python -> Postgres

            DB2 -> Mapping --> Can be achieved via connection assignment after the PC (PowerCenter) resource loads are run

            Mapping -> flat file  --> Can be achieved via connection assignment after the PC (PowerCenter) resource loads are run

           

            flat file -> Python

            Create a custom model for Python and load the Python metadata into EDC.

            Once that is done, custom lineage can be used to create links for

            flat file -> Python  and Python -> Postgres

            OR

            Use the custom lineage resource type to load only summary lineage between the flat file and Postgres.

          Postgres metadata can be ingested into EDC as a JDBC or a UCF resource (preferably the former).
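
          With roughly 250 flat files, the summary lineage option above is easier to manage if the lineage file is generated rather than typed by hand. The following is only a minimal sketch: the column layout and the `core.DataSetDataFlow` association name are assumptions based on the custom lineage CSV template and should be checked against your EDC version, and every object id shown is hypothetical.

```python
import csv
import io

# Assumed column layout of the custom lineage CSV; verify against the
# template shipped with your EDC version before loading anything.
HEADER = ["Association", "From Connection", "To Connection",
          "From Object", "To Object"]

# Assumed association name for dataset-level (summary) links.
DATASET_FLOW = "core.DataSetDataFlow"


def summary_lineage_rows(pairs, association=DATASET_FLOW):
    """Build one row per (source object id, target object id) pair."""
    return [[association, "", "", src, dst] for src, dst in pairs]


def to_csv_text(rows):
    """Serialize the header plus the lineage rows to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical object ids: copy the real ones from the catalog.
pairs = [
    ("FlatFiles://exports/customers.out", "PostgresDW://dw/public/customers"),
    ("FlatFiles://exports/orders.out", "PostgresDW://dw/public/orders"),
]

lineage_csv = to_csv_text(summary_lineage_rows(pairs))
print(lineage_csv)
```

          The list of pairs could come from any inventory of the mappings, so the same script can be rerun whenever a flat file or target table is added.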

           

           

          Let me know if this helps

           

           

          Regards

          Srinivas

          • 2. Re: How to make data lineage .out files
            Felipe Cabral Active Member

            Hi all,

            Just to add more information: creating the Python program as custom metadata did not work for us. We spent a lot of time trying different ways to do it, without success. The problem is that, after creating the custom metadata, we could not associate the object in the custom data lineage, even when we supplied the full "object id path".

            The solution for this case was:

            1 - Create an empty file in the directory (example command line: echo > main.py)

            2 - Create a flat file resource connection that points to the directory where the empty file was created

            After this process, the generated object id could be used to reference the object in the custom data lineage .csv.

            So, we were able to complete the whole process.
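
            The two steps above, plus the lineage links that pass through the placeholder, can be sketched roughly as follows. This is an illustration only: the temporary directory stands in for the folder the resource scans, the association name `core.DataSetDataFlow` is an assumption, and all object ids are hypothetical (the real ones are generated by EDC after the resource load).

```python
import tempfile
from pathlib import Path

ASSOCIATION = "core.DataSetDataFlow"  # assumed association name


def create_placeholder(directory, name="main.py"):
    """Create an empty file that stands in for the Python program."""
    path = Path(directory) / name
    path.touch(exist_ok=True)
    return path


def lineage_through_script(flat_file_id, script_id, table_id):
    """Return two links: flat file -> script and script -> table."""
    return [
        (ASSOCIATION, flat_file_id, script_id),
        (ASSOCIATION, script_id, table_id),
    ]


# A temporary directory stands in for the folder the resource scans.
with tempfile.TemporaryDirectory() as scan_dir:
    placeholder = create_placeholder(scan_dir)
    placeholder_size = placeholder.stat().st_size

# Hypothetical object ids: use the ids EDC generates after the load.
links = lineage_through_script(
    "FlatFiles://exports/customers.out",
    "PythonJobs://main.py",
    "PostgresDW://dw/public/customers",
)
for association, source, target in links:
    print(f"{association}: {source} -> {target}")
```

            Each tuple corresponds to one row of the custom data lineage .csv, with the placeholder's object id used once as a target and once as a source.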

            I hope that this information can be helpful.

             

            Kind Regards,

             

            Felipe Cabral

            • 3. Re: How to make data lineage .out files
              Darren Wrigley Guru

              Hi Felipe,

               

              I am not sure I understand. Are you trying to represent the Python files using the file system scanner? If so, that will work only if you select unstructured when configuring the scanner. This will create an unstructured file object called main.py, which can then be linked using custom lineage.

              • 4. Re: How to make data lineage .out files
                Felipe Cabral Active Member

                Hi Darren!

                 

                That is exactly what I did! I mapped main.py as an unstructured file.

                The problem did not occur when creating the Python program as custom metadata, but when referencing its object id in the custom data lineage. It does not work!

                The same problem appeared when we tried to associate a UCF connection with a PowerCenter mapping/workflow, or with any process that needs to be associated with a resource.

                I believe this may be a "bug" in EDC that needs to be investigated!

                In my case, I do not need to view the Python code in my data lineage; I just need to represent the object in the "workflow".

                To do that, I followed the steps described earlier.

                 

                I hope that this post can be helpful to the community!

                 

                Kind Regards,

                 

                Felipe Cabral

                • 5. Re: How to make data lineage .out files
                  Srinivas Pai Guru

                  Felipe,

                   

                   

                  Why don't you use custom lineage to create summary lineage between the flat file and Postgres?

                   

                   

                  Regards

                  Srinivas

                   

                  • 6. Re: How to make data lineage .out files
                    Felipe Cabral Active Member

                    Hi Srinivas

                     

                    I have done that and it worked perfectly.

                    I posted the new reply just to share my experience with custom metadata: using it to represent the Python program didn't work for me.

                    As I said previously, I created a new empty file and used the unstructured resource connection to bring it into EDC.

                    To build the data lineage, I used custom data lineage, and it works fine.

                    Thanks so much.

                     

                    Kind regards

                     

                    Felipe Cabral

                    • 7. Re: How to make data lineage .out files
                      Reza Joemrati New Member

                      Hi Darren,

                       

                      I'm looking for something similar for a customer. They have lots of data stored in Hive tables. Using Python or Scala, they transform and/or load data from these Hive tables into, e.g., S3 or other targets. So my question is: how do I make the Python code visible in EDC using a custom model? The replies in this thread mention a main.py. What process creates this main.py file? If you have any documentation about loading Python and Scala into EDC, could you please share it?

                      Thank you for your time and help.

                      Reza

                      • 8. Re: How to make data lineage .out files
                        Darren Wrigley Guru

                        You could start by reviewing these articles, which show how to use the custom metadata framework:

                         

                        How to bring custom metadata in EDC (ETL example)

                        Advanced custom lineage metadata ingestion