9 Replies Latest reply on May 12, 2020 6:54 PM by Abhishek Singh

    Implementing multiple match rules in IDQ

    Abhishek Singh Active Member

      Hello,

       

      I have to implement multiple match rules in IDQ in the following way:

       

      First Name AND Last Name AND Gender -  All Exact

      First Name AND Last Name AND (Mobile OR Phone) - All Exact

      First Name AND Last Name - Fuzzy AND Email - Exact

       

      My query regarding the transformation:

       

      1) Key Generator - How should I choose the strategy and the field when generating keys in the Key Generator transformation? Should I pick the attribute that has the most distinct values and the fewest nulls?

      2) Match - Which strategies (for both exact and fuzzy matching) should I apply in the Match transformation to implement the above match rules?

       

      Thanks,

      Abhi

        • 1. Re: Implementing multiple match rules in IDQ
          Abhishek Singh Active Member

          Could anyone please respond here?

          • 2. Re: Implementing multiple match rules in IDQ
            Abhishek Singh Active Member

            Could you please respond to my query? This is critical. Thank you for your time.

            • 3. Re: Implementing multiple match rules in IDQ
              Robert Whelan Guru

              Hi,

              The general advice is to build the key on a field that has the fewest nulls (ideally none) and gives the best chance of grouping potential matches. For this reason it is usually recommended to avoid keying on a field you will also match on; however, as you want exact matches, I'll suggest the opposite.

               

              • First Name AND Last Name AND Gender -  All Exact
                • As you only want to match records where all 3 fields are an exact match, I would suggest merging the data and passing it to a Key Generator using the String strategy. Set the Length to 0, as this will build the group based on the full merged string.
                • As you have already grouped the records based on the string, no matching is required; each group is a set of matching records.
              • First Name AND Last Name AND (Mobile OR Phone) - All Exact
                • Same as above, no Match Tx required.
              • First Name AND Last Name - Fuzzy AND Email - Exact
                • As you only want records where email is an exact match, use the String strategy to group on email.
                • For the Match strategy on first and last name, use either Edit Distance or Bigram.
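As a rough illustration of the grouping idea above, here is a plain Python sketch (not IDQ itself). A full-string group key stands in for the Key Generator with the String strategy and Length 0, and a bigram comparison stands in for the Match Tx's Bigram strategy. The record layout, the helper names, the trim/upper-case normalisation, the Dice-coefficient formula and the 0.8 threshold are all assumptions made for this sketch; IDQ's own scoring may well differ.

```python
from collections import defaultdict
from itertools import combinations


def group_key(record, fields):
    """Concatenate the full values of the chosen fields into one key.
    Trimming and upper-casing is an assumption of this sketch."""
    return "|".join(record[f].strip().upper() for f in fields)


def group_records(records, fields):
    """Group records that share an identical key (exact match on all fields)."""
    groups = defaultdict(list)
    for rec in records:
        groups[group_key(rec, fields)].append(rec)
    return groups


def bigrams(text):
    """Return the adjacent character pairs of the normalised text."""
    t = text.strip().upper()
    return [t[i:i + 2] for i in range(len(t) - 1)]


def bigram_similarity(a, b):
    """Dice coefficient over character bigrams, between 0.0 and 1.0."""
    ba, bb = bigrams(a), bigrams(b)
    if not ba or not bb:
        # Strings shorter than 2 characters have no bigrams: fall back to equality.
        return 1.0 if a.strip().upper() == b.strip().upper() else 0.0
    remaining = list(bb)
    shared = 0
    for gram in ba:
        if gram in remaining:
            remaining.remove(gram)  # count each bigram of b at most once
            shared += 1
    return 2.0 * shared / (len(ba) + len(bb))


records = [
    {"id": 1, "first": "John", "last": "Doe", "gender": "M", "email": "jd@example.com"},
    {"id": 2, "first": "john", "last": "Doe", "gender": "M", "email": "jd@example.com"},
    {"id": 3, "first": "Jon", "last": "Doe", "gender": "M", "email": "jd@example.com"},
    {"id": 4, "first": "Jane", "last": "Roe", "gender": "F", "email": "jr@example.com"},
]

# Rule 1: every group with more than one member is a set of exact duplicates.
exact_groups = group_records(records, ["first", "last", "gender"])
rule1_ids = [[r["id"] for r in g] for g in exact_groups.values() if len(g) > 1]

# Rule 3: group on email, then compare the names fuzzily inside each group.
THRESHOLD = 0.8  # hypothetical cut-off, tune against real data
email_groups = group_records(records, ["email"])
fuzzy_pairs = []
for group in email_groups.values():
    for a, b in combinations(group, 2):
        score = min(bigram_similarity(a["first"], b["first"]),
                    bigram_similarity(a["last"], b["last"]))
        if score >= THRESHOLD:
            fuzzy_pairs.append((a["id"], b["id"], score))
```

Short names share few bigrams, so a pair like "John" and "Jon" scores low here; that is one reason Edit Distance may suit short name fields better.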
              • 4. Re: Implementing multiple match rules in IDQ
                Abhishek Singh Active Member

                Thanks, Robert for your response.

                 

                 

                1) In the first match rule I wrongly wrote Gender; it should be Birthdate. If I follow the approach you described for the first rule, records with nulls in any of these fields will be treated as duplicates, which shouldn't be the case.

                 

                John Doe null

                John Doe null

                 

                or

                 

                John  Doe 17-09-1985

                null   Doe 17-09-1985

                 

                The above pairs should not be regarded as duplicates. Records are duplicates only when all 3 fields are populated. In that case, how should the matching rule be implemented, particularly the strategies and weighting scores?

                 

                Note: There is also a Name column in the table. Can we use this column to generate the group key in the Key Generator transformation, using Soundex with length 10?

                 

                2) The second match rule requires First Name and Last Name to be populated, and either Mobile OR Phone to be populated. How would the implementation of this rule differ from that of the first rule?

                 

                3) As there are 3 match rules, should we use multiple Match transformations in a single mapping and collate the results using a Decision transformation? What would the if-then-else statement in the Decision transformation be?

                 

                4) I have to mark the master record as master and each duplicate as a duplicate of that master. How can we implement this as well?

                • 5. Re: Implementing multiple match rules in IDQ
                  Robert Whelan Guru

                  Hi,

                  1) If you only wish to identify duplicates, you could simply filter out any records containing nulls so they never get passed to the Key Gen or Match Tx.

                   

                  2) For the second rule, you could create the Key Gen on firstname & lastname using the String strategy, which will only group records that have exactly the same firstname & lastname, and then use a Match Tx to match on the phone fields.

                   

                  3) For a Decision Tx I'd suggest totaling the scores from all your match strategies and making a decision based on that total. The Developer Transformation Guide explains the syntax.

                   

                  4) How do you determine which is the Master record?
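To make points 1 to 3 above a little more concrete, here is a minimal plain Python sketch (not IDQ syntax). It filters out records with missing name fields, groups on the full first name and last name, scores the two phone fields inside each group, and decides on the total. The record layout, the `is_populated` and `exact_score` helpers, and the cut-off of 1.0 are assumptions for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def is_populated(value):
    """Treat None and blank strings as missing."""
    return value is not None and str(value).strip() != ""


def exact_score(a, b):
    """1.0 when both values are populated and identical, otherwise 0.0."""
    return 1.0 if is_populated(a) and is_populated(b) and a == b else 0.0


records = [
    {"id": 1, "first": "John", "last": "Doe", "mobile": "555-0100", "phone": None},
    {"id": 2, "first": "John", "last": "Doe", "mobile": "555-0100", "phone": "555-0199"},
    {"id": 3, "first": "John", "last": "Doe", "mobile": None, "phone": None},
    {"id": 4, "first": None, "last": "Doe", "mobile": "555-0100", "phone": None},
]

# Step 1: drop records whose name fields are missing, so nulls never form a group.
candidates = [r for r in records
              if is_populated(r["first"]) and is_populated(r["last"])]

# Step 2: group on the full first name + last name (stand-in for the Key Gen).
groups = defaultdict(list)
for rec in candidates:
    groups[(rec["first"], rec["last"])].append(rec)

# Step 3: inside each group, score the phone fields and total the scores.
matches = []
for group in groups.values():
    for a, b in combinations(group, 2):
        total = (exact_score(a["mobile"], b["mobile"])
                 + exact_score(a["phone"], b["phone"]))
        # Mobile OR Phone: one matching field is enough, so require a total >= 1.
        if total >= 1.0:
            matches.append((a["id"], b["id"], total))
```

Because a missing value scores 0.0 rather than counting as equal, two records that both lack a phone number are never paired on that field.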

                  • 6. Re: Implementing multiple match rules in IDQ
                    Abhishek Singh Active Member

                    Hi Robert, Thanks for the response.

                     

                    I have a few more queries here:

                     

                    1) Should all the fields marked as Exact match be part of the group key in the Key Generator transformation?

                     

                    2) If a field is marked as Exact but sits in an OR condition, should it go into the Match transformation with the appropriate strategy and weighting score?

                     

                    3) If a field is marked as Fuzzy, should it go into the Match transformation?

                     

                    4) Which ports from each of the Match transformations should I connect to the Decision transformation?

                     

                    5) What should the next transformation in the mapping be if I have to separate the master record from the rest of the records, with each record labelled as master or as the corresponding child?

                     

                    Please let me know if I should send the mapping XML so that you can get more familiar with this.

                     

                    Thanks.

                    • 7. Re: Implementing multiple match rules in IDQ
                      Robert Whelan Guru

                      1) For fields marked as exact match, the Key Gen Tx can effectively replace the Match Tx: if you use the String strategy with Length 0, the full string is considered, so any records with the same key are an exact match.

                       

                      2) Fields with an OR condition can be handled by the Key Gen or the Match Tx. The important point is that the Match Tx is designed to match records which are not necessarily an exact match, and this comes with a processing overhead. So if another Tx can be used because we are only looking for exact matches, it will save resources.

                      If you are matching millions of records this is important; for smaller data sets, either Tx is fine.

                       

                      3) Yes, if you need to pair non-identical records the Match Tx should be used.

                       

                      4) It's good practice to pass all the ports through each Tx, as it makes the mapping easier to read. If a port is not used in any of the Tx functions, it is simply passed through with no impact on performance.

                       

                      5) How are you deciding which is the master record?

                      • 8. Re: Implementing multiple match rules in IDQ
                        Abhishek Singh Active Member

                        Thanks Robert. Just to confirm few things:

                         

                        1) For all the fields that are part of match rules marked EXACT, is it fine if I remove the special characters from the fields in an Exp Tx before they are used in the Key Gen Tx, the Match Tx, or both?

                         

                        2) I have multiple match rules to implement. Should my mapping look something like the one below?

                         

                        (picture for representation purpose only)

                         

                        Or should I implement the match rules in parallel (not sequentially as above) and collate them using a Decision Tx, followed by exception and duplicate record management?

                         

                        3) Master is decided based on below logic

                         

                                  a) In a duplicate set, if only 1 record exists from source 'X' and all the other duplicate records are from the legacy system, then the record from source 'X' will be considered the Master record.

                                  b) In a duplicate set, if multiple records exist from source 'X' along with other duplicates from the legacy system, then the oldest record from source 'X' will be considered the Master record.

                                  c) In a duplicate set, if no record exists from source 'X' and all duplicates are from the legacy system, then the oldest record from the legacy system will be considered the Master record.

                         

                                  Now, what should my mapping look like if I have to separate the master record from the rest of the records, with each record labelled as master or as the corresponding child?
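The master rules a) to c) above can be sketched in plain Python as follows (this is not IDQ syntax). The `source` and `created` fields, the `pick_master` and `label_set` helpers, and the reading of "oldest" as "earliest creation date" are assumptions for illustration; the thread does not say which column carries the record age.

```python
from datetime import date


def pick_master(duplicate_set, preferred_source="X"):
    """Pick the master of one duplicate set.

    Rules a) and b): if any record comes from the preferred source, the oldest
    of those wins (a single such record is trivially the oldest).
    Rule c): otherwise the oldest record overall (all legacy) wins.
    """
    preferred = [r for r in duplicate_set if r["source"] == preferred_source]
    pool = preferred if preferred else duplicate_set
    return min(pool, key=lambda r: r["created"])


def label_set(duplicate_set, preferred_source="X"):
    """Return copies of the records tagged with a role and their master's id."""
    master = pick_master(duplicate_set, preferred_source)
    return [
        dict(r, role="MASTER" if r is master else "DUPLICATE", master_id=master["id"])
        for r in duplicate_set
    ]


# Example for rule b): two records from source X plus one legacy record.
set_b = [
    {"id": 10, "source": "X", "created": date(2018, 3, 1)},
    {"id": 11, "source": "X", "created": date(2015, 7, 9)},
    {"id": 12, "source": "LEGACY", "created": date(2001, 1, 1)},
]
labelled = label_set(set_b)
```

If two records share the earliest date, `min` keeps whichever comes first in the set, so a tie-breaker column would be needed for a deterministic result.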

                        • 9. Re: Implementing multiple match rules in IDQ
                          Abhishek Singh Active Member


                          Hi Robert,

                          Apart from my queries above, could you please tell me how exactly we can enrich the master records?

                          The aim is to update any missing or incorrect values in the master from the corresponding duplicate records.

                           

                          Thanks,

                          Abhi
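One possible reading of the enrichment question, as a minimal plain Python sketch (not IDQ syntax): fill only the missing values of the master from its duplicates, taking the first populated value found. The `enrich_master` helper, the field names and the "first populated value wins" rule are assumptions. Correcting incorrect values would need a rule for what counts as incorrect, which the thread does not give, so the sketch leaves populated values alone.

```python
def is_populated(value):
    """Treat None and blank strings as missing."""
    return value is not None and str(value).strip() != ""


def enrich_master(master, duplicates, fields):
    """Return a copy of the master with missing fields filled from duplicates.

    Duplicates are scanned in the order given; the first populated value wins.
    Values already populated on the master are never overwritten.
    """
    enriched = dict(master)
    for field in fields:
        if is_populated(enriched.get(field)):
            continue
        for dup in duplicates:
            if is_populated(dup.get(field)):
                enriched[field] = dup[field]
                break
    return enriched


master = {"id": 1, "first": "John", "last": "Doe", "email": None, "mobile": ""}
duplicates = [
    {"id": 2, "first": "John", "last": "Doe", "email": "jd@example.com", "mobile": None},
    {"id": 3, "first": "John", "last": "Doe", "email": "other@example.com", "mobile": "555-0100"},
]
enriched = enrich_master(master, duplicates, ["email", "mobile"])
```

The order of the duplicates decides which value is taken, so sorting them first (for example by source or by date) is one way to express a preference.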
