7 Replies Latest reply on Dec 21, 2020 7:29 PM by Eugene Yeo

    Rule to Check for Dupes?

    Active Member

      Is there any way to create a rule (or a rule specification) that will check for duplicate values?

       

      Example: Let's say I have a file with 4 records.  One of the fields is AccountNumber.  I want to make sure that AccountNumber is unique.  If not unique - I want this to show up on a Scorecard as 'bad data'.

       

      AccountNumber

      1

      2

      3

      3

       

      I want to show that the last two records violate a uniqueness rule.

       

      I don't see how to create a rule to test for duplicates.

       

      How can I accomplish what I'm trying to do?

       

      Ideally I would like to see this 'violation' in both a Profile and a Scorecard.

        • 1. Re: Rule to Check for Dupes?
          PJGeisinger Guru

          Your duplicate rule requires comparison across rows. Therefore, a simple expression rule (which evaluates a single row at a time) is not sufficient.

           

          In the Analyst tool, create a mapping spec which uses the aggregator transform to produce a count of a non-null field with the Group By fields including all of your chosen key (just Account Number in your example above). Your output should include all fields in the chosen key and also the computed row count for that key (output by aggregator).

           

          Profile this mapping spec and add row count to a scorecard. Valid value for row count is 1. Drill down will show detail of Account Number.

          • 2. Re: Rule to Check for Dupes?
            Active Member

            Thank you Patrick.

             

            I was able to create a mapping specification.  I selected 'Virtual Target'.  I assume this is correct for my purposes because I do not need to store the output of this mapping (I just need to show/use the results in either Profile and/or Scorecard).

             

            So the mapping outputs the Account Number and it's associated record count within the dataset.

             

            Example: 

            AccountNumber          Count

            1                                 1

            2                                 1

            3                                 2

             

            Which tells me that account number 3 is duplicated.

             

            My next question is - how do I use this in my Profile?  I thought that I could use this as I would a rule, in the Profile - but not the case.

             

            How do I get this onto my Scorecard?

             

            (I'm assuming that it must be part of the Profile first before it can be added to Scorecard)

            • 3. Re: Rule to Check for Dupes?
              PJGeisinger Guru

              Yes, you must fist run the Profile on the Mapping Spec. Then,go to the detailed view for the column 'Count' that you specify in your response. From the detail view of the column 'Count', you will choose 'Add to Scorecard' from the action menu. By adding the 'Count' column to the scorecard as a metric, you will walk through the wizard and select Count value of 1 as the only valid value. Run the scorecard, and you will see the metric (with default name Count unless you changed it in the wizard), shows 3 Total rows and 1 invalid row for a score of 67%. By right-clicking the metric and selecting Drilldown you will be able to see which Account numbers were valid (unique or row count =1) or invalid (duplicated or row count >1).

              • 4. Re: Rule to Check for Dupes?
                vaishali joshi New Member

                Hi Everyone,

                 

                we referred this thread to create a mapping spec on a particular metric e.g. id and created count column for tracking the duplicates. This approach is valid to check one metric i.e. in this case 'id'.

                 

                In our requirements, we need to do this check on around 40 metrics that may or may not be exposed in different scorecards. so do we have to create 40 different mapping specs and then follow the same steps as done for id column or we have a workaround on the same. Many thanks!

                • 5. Re: Rule to Check for Dupes?
                  Eugene Yeo New Member

                  Are there any permissions required to add the column in the mapping spec to a scorecard? I don't see an option to 'Add to Scorecard' from the action menu from thee detail view.

                  • 6. Re: Rule to Check for Dupes?
                    Namratha Dhoolipalla Guru

                    Hi Eugene,

                     

                    You cannot add a Column from a Rule Specification to a Scorecard.

                    A scorecard is the graphical representation of valid values for a column in a profile. You can add columns to a scorecard only from a Profile.

                     

                    Thanks,

                    Namratha

                    1 of 1 people found this helpful
                    • 7. Re: Rule to Check for Dupes?
                      Eugene Yeo New Member

                      Thanks Namratha,

                       

                      I figured out what was wrong/missing. I was trying to add (to the scorecard) the column from the profile summary in the mapping spec, instead of a column from a profile created from the mapping spec.

                       

                      To get it working:

                      1. After creating the mapping spec in Analyst,  go to Library > Mapping Specifications > (go to actions on the Mapping Spec that was created) Create Profile.

                      2. Go to the newly created profile. The column from the newly created profile can then be added to a scorecard.

                       

                      Regards,

                      Eugene Yeo