I am going to assume that any SQL processing/statements are embedded into code and will use DataFrames to store/process/aggregate and eventually write data.
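For concreteness, here is a minimal sketch of that pattern, written in Scala since that is confirmed below; the SQL statement, table names, and output path are purely illustrative:

```scala
import org.apache.spark.sql.SparkSession

object EmbeddedSqlExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("embedded-sql-example")
      .getOrCreate()

    // SQL statement embedded in code, returning a DataFrame
    val orders = spark.sql("SELECT customer_id, amount FROM sales.orders")

    // Further processing/aggregation on the DataFrame
    val totals = orders
      .groupBy("customer_id")
      .sum("amount")

    // Eventually write the aggregated result out
    totals.write.mode("overwrite").parquet("/data/output/customer_totals")

    spark.stop()
  }
}
```

Lineage in this style of job exists only in the code itself (the SQL text plus the DataFrame operations), which is what makes it hard to extract automatically.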
What language are you using? Since Spark is not a language itself, it would usually be implemented using Python/Scala/Java/R.
Given this, it is a difficult problem to solve. We are working on a longer-term strategy for language analysis (like PySpark), but that may take some time to implement.
If you have a mapping spec, then a custom solution might be possible, assuming the mapping spec is accurate.
If you have any examples, you can direct message me and we can take a look to see what might be possible.
It is Spark Scala. If I heard you correctly, you are suggesting a Custom Lineage Resource?
I believe that the environment includes Atlas. Could I extract lineage from Atlas? If so, do the developers need to be doing anything specific in order for Atlas to capture lineage?
You could try using the Atlas scanner. If you open Atlas, can you see lineage for Spark SQL jobs?
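On the question of whether developers need to do anything specific: Atlas generally does not capture Spark lineage on its own; a listener such as the Hortonworks Spark Atlas Connector (spark-atlas-connector) is usually registered on each job, with the connector jar and an atlas-application.properties file on the job's classpath. A minimal sketch, assuming that connector is available in your environment (the class names below are the connector's, but verify against your distribution):

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: assumes the spark-atlas-connector jar and an
// atlas-application.properties file are already on the classpath.
val spark = SparkSession.builder()
  .appName("lineage-enabled-job")
  // Register the Atlas event tracker so reads/writes are reported to Atlas
  .config("spark.extraListeners",
          "com.hortonworks.spark.atlas.SparkAtlasEventTracker")
  .config("spark.sql.queryExecutionListeners",
          "com.hortonworks.spark.atlas.SparkAtlasEventTracker")
  .config("spark.sql.streaming.streamingQueryListeners",
          "com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker")
  .getOrCreate()
```

The same settings can be passed as --conf options to spark-submit instead of being set in code. If the listener is in place, table-level reads and writes from Spark SQL jobs should show up as lineage in the Atlas UI, which is a quick way to check whether your environment is already capturing it.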