4 Replies Latest reply on Jan 30, 2019 1:23 PM by Meenakshi Mahapatra

    Writing functional duplicate addresses to a flat file in IDQ

    Meenakshi Mahapatra New Member

      Hi,

       

      We are using the Informatica Address Validator to produce accurate, mailable addresses. We now want to find the duplicates among the addresses output by the Address Validator and write them to a flat file, along with the primary key associated with each address in the source table.

       

      e.g.  PrimaryKey   Address (from Address Validator)
            111          ABC
            198          ABC
            223          XYZ
            556          XYZ

       

      We want to write these results into a flat file.

        • 1. Re: Writing functional duplicate addresses to a flat file in IDQ
          Robert Whelan Guru

          Add a Match Tx (you could also use an Aggregator if you only want to check for exact matches) after the Address Validator.

          If using a Match Tx, filter the results by clusterSize and write any records where clusterSize is > 1 to your flat file.

          If using an Aggregator, you want to write any groups with > 1 record to the flatfile.
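
Outside of IDQ, the Aggregator idea for exact matches can be sketched in a few lines of Python. This is only an illustration of the logic (group by address, keep groups with more than one record), not Informatica code. The function names are made up for the example, and the row `777 / QRS` is a hypothetical non-duplicate added to the sample from the question.

```python
import csv
import io
from collections import defaultdict


def find_exact_duplicates(rows):
    """Group (primary_key, address) pairs by address and keep groups with > 1 record."""
    groups = defaultdict(list)
    for primary_key, address in rows:
        groups[address].append(primary_key)
    return {address: keys for address, keys in groups.items() if len(keys) > 1}


def write_duplicates(duplicates, out):
    """Write one line per duplicate record to a CSV-style flat file."""
    writer = csv.writer(out)
    writer.writerow(["PrimaryKey", "Address"])
    for address, keys in duplicates.items():
        for key in keys:
            writer.writerow([key, address])


# Sample rows from the question, plus one hypothetical unique address (777).
rows = [(111, "ABC"), (198, "ABC"), (223, "XYZ"), (556, "XYZ"), (777, "QRS")]

duplicates = find_exact_duplicates(rows)

# An in-memory buffer stands in for the flat file here.
buffer = io.StringIO()
write_duplicates(duplicates, buffer)
print(buffer.getvalue())
```

In a real run you would pass an open file handle instead of the `StringIO` buffer.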

          • 2. Re: Writing functional duplicate addresses to a flat file in IDQ
            Meenakshi Mahapatra New Member

            I tried using a Match Tx as you have explained above but it was not generating the Group keys properly.

            If I understood the Match Tx documentation correctly, the Group key makes the Match Tx compare records and find duplicates only within the same group. But in my case, I need to find duplicate complete addresses across all the rows, not only within a group.

            I tried both the String and Soundex strategies for key generation, but it still didn't assign correct group keys, so the clusters and link scores were not calculated properly.

            FYI, I used the Bigram Distance match strategy (weight 0.5) with the Completeaddress field (from the Address Validator) as the match field in the Match Tx.

             

            Now I am going to try the second option of using an Aggregator.

            • 3. Re: Writing functional duplicate addresses to a flat file in IDQ
              Robert Whelan Guru

              Your understanding of the Group Key is correct; however, it is aimed at ensuring a balance between performance and good match results. While it would be ideal to match every record against all the others in the dataset, this can take days if you have millions of records.
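
To give a rough feel for the performance side, here is a small Python sketch that counts pairwise comparisons with and without grouping. It assumes every record is compared with every other record in its group, and that 1,000,000 records are spread evenly over 1,000 groups. Both numbers are hypothetical, and this is not a description of how the Match Tx is implemented internally.

```python
from collections import Counter


def pair_count(n):
    """Number of distinct pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def comparisons_with_groups(group_keys):
    """Sum the pairwise comparisons inside each group."""
    sizes = Counter(group_keys)
    return sum(pair_count(size) for size in sizes.values())


total_records = 1_000_000   # hypothetical dataset size
group_total = 1_000         # hypothetical number of groups

# Without a group key: every record is compared with every other record.
ungrouped = pair_count(total_records)

# With a group key: records are only compared within their own group.
keys = [i % group_total for i in range(total_records)]
grouped = comparisons_with_groups(keys)

print(f"ungrouped comparisons: {ungrouped:,}")
print(f"grouped comparisons:   {grouped:,}")
```

Under these assumptions the grouped run needs roughly a thousand times fewer comparisons. The price is that duplicates landing in different groups are never compared.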

               

              When choosing a Group Key, it is recommended not to use a field that you will then use in your Match strategies. However, in your specific use case you have already standardized the addresses using the AV Tx and only want to find exact duplicates. Therefore, I'd suggest building the Group Key from one or more of the fields you will match on.

              In fact, you could probably just use the Key Generator Tx to find the duplicates. If you include enough of the address elements in a composite key, each group should contain only identical records, and you can filter the output so that only groups of > 1 record are written to your flat file.
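
As a minimal sketch of the composite key idea in plain Python (not the Key Generator's actual String strategy): the key below is simply the uppercased, whitespace-normalized concatenation of the chosen fields. The field names `street`, `sub_building` and `postcode` and the sample records are hypothetical stand-ins for the Address Validator output ports.

```python
from collections import defaultdict

# Hypothetical field names standing in for the Address Validator output ports.
KEY_FIELDS = ("street", "sub_building", "postcode")


def composite_key(record, fields=KEY_FIELDS):
    """Join the full, normalized value of every key field with a separator."""
    parts = []
    for field in fields:
        value = str(record.get(field, ""))
        # Uppercase and collapse runs of whitespace so trivial differences vanish.
        parts.append(" ".join(value.upper().split()))
    return "|".join(parts)


records = [
    {"pk": 1, "street": "12 Main St", "sub_building": "Apt 4", "postcode": "10001"},
    {"pk": 2, "street": "12 MAIN  ST", "sub_building": "APT 4", "postcode": "10001"},
    {"pk": 3, "street": "12 Main St", "sub_building": "Apt 5", "postcode": "10001"},
]

groups = defaultdict(list)
for record in records:
    groups[composite_key(record)].append(record["pk"])

# Only groups with more than one record are duplicates.
duplicate_groups = {key: pks for key, pks in groups.items() if len(pks) > 1}
print(duplicate_groups)
```

Because the full sub-building value is part of the key, `Apt 4` and `Apt 5` end up in different groups. A key that truncates or drops that field would merge them.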

              • 4. Re: Writing functional duplicate addresses to a flat file in IDQ
                Meenakshi Mahapatra New Member

                I tried creating the Group key with the String strategy, using multiple fields from the Address Validator, i.e. AddressElementsStreetCompleteWithNumber1, AddressElementsSub_buildingComplete1 and LastLineElementsPostcodeComplete. However, it was unable to parse the apartment/unit numbers, and it put similar addresses with different apartment/unit numbers into the same cluster, tagging them as duplicates, which is not the expected result.