3 Replies Latest reply on Oct 7, 2021 3:50 PM by Darren Wrigley

    Spark code scanner in EDC 10.5

    Sridhar Raju New Member

      Hi Team,

       

      We are using Spark for ETL processing between the data sources.

      We would like to extract the lineage from Spark and capture it in EDC.

      Does EDC 10.5 support extracting lineage information from Spark?

       

      I went through the PAM for Informatica Platform v10.5 document but couldn't figure out if it's feasible.

       

      Regards,

      Sridhar

        • 1. Re: Spark code scanner in EDC 10.5
          user186817 Guru

          Hi Sridhar,

           

          Are you talking about Informatica mappings executed using the Spark engine or your own Spark applications created outside Informatica?

           

          Regards,
          Lluís

          • 2. Re: Spark code scanner in EDC 10.5
            Sridhar Raju New Member

            Hi Lluís,

             

            In our architecture, Spark is used as the ETL tool from Level 0 to Level 1.

            So it is outside Informatica, and Spark SQL is used for the ETL process.

            Do we have scanners available to extract this information from Spark SQL code?

            • 3. Re: Spark code scanner in EDC 10.5
              Darren Wrigley Guru

              Is the SQL separate from the code, e.g. in a .sql file vs. embedded in Python/Scala/Java code? If yes, we can probably scan it with a DB-specific scripts scanner. What is the target DB?
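
To check which case applies, a rough sketch like the one below can list the SQL that is embedded as string literals in PySpark code. It is only an illustration built on Python's standard `ast` module, not part of any EDC scanner. The helper name `embedded_sql` and the sample table names are made up for the example, and it only sees literal strings passed directly to a `.sql(...)` call, not SQL built up in variables or f-strings.

```python
import ast


def embedded_sql(source):
    """Return string literals passed as the first argument to any `.sql(...)` call."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "sql"
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            found.append(node.args[0].value)
    return found


# Hypothetical PySpark snippet: one literal statement, one built elsewhere.
sample = '''
df = spark.sql("INSERT INTO l1.orders SELECT * FROM l0.orders_raw")
df2 = spark.sql(query)
'''

statements = embedded_sql(sample)
for stmt in statements:
    print(stmt)
```

Statements found this way could be written out to a .sql file; statements assembled at runtime (like `query` above) would still need manual review.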

               

              We don't have a Spark scanner at the moment. The closest to that is a new Databricks notebooks scanner that will scan PySpark code (Python code using the Spark API), looking for file and database I/O. This is the first scanner that reads code that uses the Spark API, and what we learn from it will be used for a more generalized scanner at some point.
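
To give a feel for what "looking for file and database I/O" in PySpark code can mean, here is a minimal sketch using Python's standard `ast` module. It is not how the Databricks notebooks scanner is implemented; the method lists, the helper names `attr_chain` and `find_io`, and the sample paths and table names are all assumptions made for illustration. It only recognises calls that go through `.read` or `.write` and only captures targets given as literal strings.

```python
import ast

# Assumed subsets of DataFrameReader / DataFrameWriter methods to look for.
READ_METHODS = {"csv", "parquet", "json", "orc", "text", "load", "table", "jdbc"}
WRITE_METHODS = {"csv", "parquet", "json", "orc", "text", "save",
                 "saveAsTable", "insertInto", "jdbc"}


def attr_chain(node):
    """Flatten e.g. df.write.mode(...).parquet into ['df', 'write', 'mode', 'parquet']."""
    names = []
    while True:
        if isinstance(node, ast.Attribute):
            names.append(node.attr)
            node = node.value
        elif isinstance(node, ast.Call):
            node = node.func  # step through intermediate calls such as .mode(...)
        elif isinstance(node, ast.Name):
            names.append(node.id)
            break
        else:
            break
    return names[::-1]


def find_io(source):
    """Return (direction, method, target) for each read/write call found in the source."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        chain = attr_chain(node.func)
        method = chain[-1]
        target = None  # stays None when the first argument is not a literal string
        if node.args and isinstance(node.args[0], ast.Constant) \
                and isinstance(node.args[0].value, str):
            target = node.args[0].value
        if "read" in chain[:-1] and method in READ_METHODS:
            results.append(("read", method, target))
        elif "write" in chain[:-1] and method in WRITE_METHODS:
            results.append(("write", method, target))
    return results


# Hypothetical Level 0 -> Level 1 job.
sample = '''
raw = spark.read.csv("/data/l0/orders.csv", header=True)
raw.write.mode("overwrite").saveAsTable("l1.orders")
'''

io_calls = find_io(sample)
for call in io_calls:
    print(call)
```

Pairing the reads and writes found in one job gives a coarse source-to-target view; column-level lineage would need far more than this.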