3 Replies Latest reply on Apr 29, 2021 9:44 AM by Darren Wrigley

    How do I capture lineage in a data lake where Spark SQL is used to transform the data?

    John Quillinan Guru

      I have a data lake where the data sources are accessible via Hive. Data is transformed using Spark SQL between the layers or zones in the data lake. EDC supports Hive via a native connector.

       

      How do I capture the lineage between the layers/zones when Spark SQL is used to process the data?