0 Replies Latest reply on Nov 28, 2021 5:17 AM by Bar Albo

    Running Similarity Profiling based only on Name is not working

    Bar Albo New Member

      Hello,

       

      I am using Informatica EDC v10.5.1 and I am having trouble running the similarity profiling feature:

       

      Problem 1: The custom JDBC Scanner option for profiling large tables, "-DcustomSampling", overrides the number of rows set in the Sampling Option of the "Similarity Profile Data Preparation and Value Frequency Settings" section.

       

      In the JDBC Scanner resource (in my case for a Vertica database), when I enable the similarity profile option in the "Similarity Profile Data Preparation and Value Frequency Settings" section, the profiling takes far too long. To scan large tables I use the custom option -DcustomSampling. Based on run times, this option clearly overrides the number of rows set in the Sampling Option of the Similarity Profile section, which makes the profiling too slow.

       

      For example, with "-DcustomSampling"='1000000' and the Sampling Option in the Similarity Profile section set to First N Rows with N=10, the N=10 setting is not honored.

      The profiling uses N=1000000 instead of the expected 10.
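
      To make the expected versus observed behavior concrete, here is a small Python sketch. It is only a model of the precedence I expected compared with what the run times suggest. It is not EDC code, and the function and parameter names are hypothetical names of my own.

```python
from typing import Optional


def expected_rows(custom_sampling: Optional[int], first_n_rows: Optional[int]) -> Optional[int]:
    """Precedence I expected: the Similarity Profile 'First N Rows' value wins when set."""
    return first_n_rows if first_n_rows is not None else custom_sampling


def observed_rows(custom_sampling: Optional[int], first_n_rows: Optional[int]) -> Optional[int]:
    """Precedence the run times suggest: the custom sampling value wins when set."""
    return custom_sampling if custom_sampling is not None else first_n_rows


# Values from my example: custom sampling of 1000000, First N Rows with N=10.
print("expected:", expected_rows(1_000_000, 10))
print("observed:", observed_rows(1_000_000, 10))
```

      The observed value is inferred from run times only, since I have not found a log line that reports the row count actually used.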

       

      Problem 2: Running the Similarity Discovery resource based on Name only does not work unless the Value Frequency profile is also run in the JDBC resource (the job reaches completion but does nothing).

       

      I don't really need the Value Frequency profile from the "Similarity Profile Data Preparation and Value Frequency Settings" section, but I still want to run the similarity profile based on name. As far as I understand, in the current version it is not possible to run the Similarity Discovery resource based only on Name without enabling the similarity profile in that section. Because name-based similarity requires only metadata (the column names) and no profiling of any sort (especially not Value Frequency), I expected it to work.
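
      To illustrate why I think name-based similarity needs only metadata, here is a minimal Python sketch that compares column names without reading any row data. It uses the standard library's difflib and is not how EDC computes similarity. The column names and the threshold are made-up values for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two column names (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similar_columns(columns, threshold=0.8):
    """Return (name_a, name_b, score) for every pair scoring at or above the threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        score = name_similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


# Hypothetical column names; only metadata is used, no column values are read.
columns = ["customer_id", "CUSTOMER_ID", "cust_id", "order_date"]
for a, b, score in similar_columns(columns):
    print(a, b, score)
```

      Nothing in this comparison depends on column values, which is why I expected name-only discovery to work without the Value Frequency profile.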

       

      For people who need only the name-based Similarity Profile (which should require only column names and no value frequencies), running the Value Frequency profile is redundant (and in my case too slow).

       

       

      Thanks,