2 Replies Latest reply on May 18, 2020 6:49 PM by Utkarsh Pandey

    Looking for more info on "Save Source Data" option

    Brian T Vance Active Member

      Hi All,

       

      I'm trying to understand the "Save Source Data" option under the "Similarity Profile Data Preparation and Value Frequency Settings".

       

      In the documentation, I read the following:

       

      Save Source Data: Choose one of the following options:

      • Yes. The profiling scanner prepares data to discover similar columns based on column names, column patterns, unique values, and value frequencies. It also computes value frequencies. The scanner then persists the computed information in Apache HBase. The computed information persists in Apache HBase until you choose to delete or purge the resource.
      • No. The profiling scanner prepares data to discover similar columns based on column names, column patterns, and unique values. The scanner then persists the computed information in Apache HBase. The computed information persists in Apache HBase until you choose to delete or purge the resource.

       

      It seems like the only difference is that if you choose "Yes" then the scanner will discover the value frequencies. Otherwise, they are exactly the same. Is my understanding correct on this?

       

      For some sources, we do not want the ability to view any data at all. Even though this could be handled via security, we would rather not have the data present at all. This option seems like a good way to handle that while still letting us see all the other available metadata.

       

      Thanks!

        • 1. Re: Looking for more info on "Save Source Data" option
          Ken Guyette Guru

          That is correct. Save Source Data = Value Frequency. One other note: to see the value frequency, you need to set the privileges on the resource in Catalog Administrator to "Metadata and Data Read" or "All Permissions". In our experience, "Read" or "Read and Write" will not show it.

          • 2. Re: Looking for more info on "Save Source Data" option
            Utkarsh Pandey New Member

            Hi Brian,

             

            Profiling

             

            Only imports summary-level information about the profiled objects, such as % null, distinct, non-distinct, and discovered patterns. No data is stored in the catalog for this process.

             

             

            Value-Frequencies

             

            It is an additional/optional component that can also be imported.

             

            In this case, the unique values and the count of occurrences of each value are stored in the catalog (encrypted).
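
To make the distinction concrete, here is a rough Python sketch of the two kinds of output. It is only an illustration of the concept, not how the profiling scanner is implemented: the function names and the sample column are made up, and the reading of "non-distinct" as "values that occur more than once" is an assumption.

```python
from collections import Counter


def profile_summary(values):
    """Summary-level statistics only; no actual column values are kept."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "pct_null": 100.0 * (total - len(non_null)) / total if total else 0.0,
        # Number of different non-null values in the column.
        "distinct": len(counts),
        # Assumption: "non-distinct" means values occurring more than once.
        "non_distinct": sum(1 for c in counts.values() if c > 1),
    }


def value_frequencies(values):
    """Unique values with their occurrence counts; this exposes real data."""
    return dict(Counter(v for v in values if v is not None))


# Hypothetical sample column.
column = ["NY", "CA", "NY", None, "TX", "NY"]
summary = profile_summary(column)
freqs = value_frequencies(column)
```

The summary holds only aggregate numbers, while the frequency table contains the source values themselves, which is why leaving value frequencies off keeps the data out of the catalog.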

             

             

            You can control who can see these column value frequencies using the following two settings at the service level (Informatica Administrator console):

             

             

            View Data

             

            View Sensitive Data  (meaning any objects connected to PII data domains)

             

            In the Administrator Console, navigate to the Security -> Users tab. Select the user, go to the Privileges section, click the Edit option, and choose the Catalog Service privileges.

             

            [Attached screenshot: Annotation 2020-04-22 130311.png]