3 Replies Latest reply on Jun 16, 2020 9:39 AM by Darren Wrigley

    Unit Testing a Data Lineage

    Noor Basha Shaik Guru

      Hello all,


      What are some best practices and proven approaches for unit testing data lineage? Assume I want to ingest metadata from two sources: Informatica PowerCenter and an Oracle database. There are hundreds of tables and complex mappings.


      How do I test whether the inferred lineage is accurate? One way is to trace the connections at the PowerCenter level and check that the same connections appear in the lineage. But that could be very manual and time-consuming.


      Is it really possible to automate this? If not, is there a better approach that still includes some automation?




        • 1. Re: Unit Testing a Data Lineage
          Darren Wrigley Guru

          It is hard to automate this, as you would need to know what the correct result from the mapping is.


          What I usually look for is:

          • Did connection assignment happen for all connections?
          • Were there any missing links?
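
          The missing-links check can be partly scripted once the lineage links have been exported as (source, target) pairs. This is only a minimal sketch: the export format, the function name, and the column identifiers are all hypothetical, not an EDC format. It flags target columns that no link feeds.

```python
def find_unlinked_targets(target_columns, links):
    """Return target columns that have no inbound lineage link.

    target_columns: iterable of column identifiers expected to be loaded.
    links: iterable of (source_id, target_id) pairs exported from the catalog
           (this export format is an assumption, not an EDC format).
    """
    linked = {target for _source, target in links}
    return sorted(col for col in target_columns if col not in linked)


# Hypothetical identifiers, for illustration only.
targets = ["DWH.CUSTOMER.ID", "DWH.CUSTOMER.NAME", "DWH.CUSTOMER.REGION"]
links = [
    ("SRC.CUST.CUST_ID", "DWH.CUSTOMER.ID"),
    ("SRC.CUST.CUST_NAME", "DWH.CUSTOMER.NAME"),
]
print(find_unlinked_targets(targets, links))
```

          A column reported here is a candidate for manual review; it does not prove the lineage is wrong, only that no link reaches it.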
          • 2. Re: Unit Testing a Data Lineage
            Sharad Suryavanshi Active Member

            We have the same issue: hundreds of mappings with heavily parameterized calculations at the column level. The missing-links check is only useful for identifying missing connections between objects, not for verifying column-level calculations.


            I'm surprised that this has not been answered with an approach for verifying that all my PC content is pulled correctly into EDC. How do we confirm to the business that the lineage is good at every hop?

            • 3. Re: Unit Testing a Data Lineage
              Darren Wrigley Guru

              The lineage within the PC metadata (detailed lineage) should be OK: the scanner should be importing all of the transformations and ports, along with the lineage and operations used.


              You can use the relationships API endpoint to query summary and/or detailed lineage relationships.

              Even at the summary level, the PC scanner stores some operation information (com.infa.ldm.etl.pc.Operation) that can be used to identify what expression logic is used.
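
              One possible way to turn the relationships endpoint into a regression check is to compare the links it returns against a small, hand-verified baseline. The sketch below makes several assumptions that should be checked against the REST API documentation for your EDC version: the path `/access/2/catalog/data/relationships`, the query parameters `seed`, `association`, `depth` and `direction`, the association name `core.DirectionalDataFlow`, and a response whose `items` carry `outId` (source) and `inId` (target). The host, object ids, and sample payload are made up, and no HTTP call is made here.

```python
from urllib.parse import urlencode


def relationships_url(base_url, seed_id, association="core.DirectionalDataFlow",
                      depth=1, direction="OUT"):
    """Build a relationships query URL (path and parameter names are assumptions)."""
    query = urlencode({
        "seed": seed_id,
        "association": association,
        "depth": depth,
        "direction": direction,
    })
    return f"{base_url.rstrip('/')}/access/2/catalog/data/relationships?{query}"


def links_from_response(payload):
    """Extract (source, target) pairs, assuming items carry outId and inId."""
    return {(item["outId"], item["inId"]) for item in payload.get("items", [])}


def diff_lineage(expected, actual):
    """Compare a hand-verified baseline with the links the catalog returned."""
    return {
        "missing": sorted(expected - actual),
        "unexpected": sorted(actual - expected),
    }


# Hand-written sample standing in for a real API response (hypothetical ids).
sample_payload = {
    "items": [
        {"outId": "ora://SRC/CUST/CUST_ID", "inId": "ora://DWH/CUSTOMER/ID"},
        {"outId": "ora://SRC/CUST/CUST_NAME", "inId": "ora://DWH/CUSTOMER/REGION"},
    ]
}

# Baseline built by hand from a few mappings whose logic is known to be correct.
expected_links = {
    ("ora://SRC/CUST/CUST_ID", "ora://DWH/CUSTOMER/ID"),
    ("ora://SRC/CUST/CUST_NAME", "ora://DWH/CUSTOMER/NAME"),
}

url = relationships_url("https://edc.example.com:9085/", "ora://SRC/CUST/CUST_ID")
report = diff_lineage(expected_links, links_from_response(sample_payload))
print(url)
print(report)
```

              The baseline still has to be built by hand for a representative set of mappings, so this reduces the manual effort on reruns rather than removing it.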