1 Reply Latest reply on Mar 18, 2019 3:23 PM by Srinivas Pai

    Why do EDC Purge jobs dump huge amounts of data into Phoenix / exdocs?

    Scott Lee Active Member

      I see this pattern in EDC purge job logs where there are many pairs of entries looking something like this:

      2019-03-15 14:49:08,365 [ExecuteThread-{Task_{resource}_Purge_Purge}] INFO  com.infa.products.ldm.ingestion.client.phoenix.dao.PhoenixIngestionClientDAO- Phoenix query execution time [3,971] ms for [Write Exdocs in bulk].
      2019-03-15 14:52:40,813 [ExecuteThread-{Task_{resource}_Purge_Purge}] INFO  com.infa.products.ldm.ingestion.client.phoenix.dao.PhoenixIngestionClientDAO- Total exdocs being added in bulk [1,000]
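
      To put numbers on this, a rough sketch along these lines could total the bulk writes in a purge log. The function name `summarize_purge_log` and both regular expressions are my own and assume the messages always look like the two lines quoted above; they are not part of any EDC tooling.

```python
import re

# Patterns mirror the two message shapes quoted above (assumed stable).
QUERY_RE = re.compile(
    r"Phoenix query execution time \[([\d,]+)\] ms for \[Write Exdocs in bulk\]"
)
COUNT_RE = re.compile(r"Total exdocs being added in bulk \[([\d,]+)\]")


def _to_int(text):
    """Convert a log number such as '3,971' to an int."""
    return int(text.replace(",", ""))


def summarize_purge_log(lines):
    """Sum reported bulk-write time and exdoc counts across log lines."""
    batches = 0
    write_ms = 0
    exdocs = 0
    for line in lines:
        match = QUERY_RE.search(line)
        if match:
            batches += 1
            write_ms += _to_int(match.group(1))
            continue
        match = COUNT_RE.search(line)
        if match:
            exdocs += _to_int(match.group(1))
    return {"batches": batches, "write_ms": write_ms, "exdocs": exdocs}
```

      This only adds up the query times that Phoenix reports. In the pair quoted above, the two timestamps are more than three minutes apart while the reported query time is about four seconds, so the summed figure may understate the wall-clock duration.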

      Even for resources with few assets, these can go on for dozens of minutes.  I know that Phoenix is a SQL-on-HBase engine, but there may be something else going on here. Can someone explain what this step does and why it is needed?


      One more data point: the Resource's contents disappear almost immediately if you search the catalog, so the exdocs writes appear to be a post-hoc step.  I just wish it would finish sooner so I can reload.


      If this step is unavoidable, any advice on tuning it would also be appreciated.