9 Replies Latest reply on May 1, 2020 11:48 AM by Erik Little

    DEI/DEQ - Data Validation between source and target

    Erik Little Active Member

      DEQ 10.4.0.1

       

      I appreciate any ideas on how to accomplish this

       

      Similar to my question on making sure the number of rows read from the source matches the number of rows written to the target.  This validation would make sure that the data matches the source after it has been written.  I would think that there would need to be a filter applied to ensure that only the records that were read/written for that run are evaluated but that's fairly straight forward.

      I know that I can use a match transformation to do a field match (dual Source).   The "Data Quality Management for Developers" class only has one source in its example and I can't find any documentation on using a dual source. 

       

      do I need to use two "key generator" transformations or one?

       

      The match algorithms are confusing to say the least.  I wish they were explained from the context of data type.  For instance if you wanted to validate that the account number in source matches what was moved to the target.  I think it would be Hamming Distance to evaluate account numbers.

       

      If anyone has a cheat sheet for picking the algorithm based on data type that would be great.  The help documentation only mentions string next to each of the algorithm examples.

       

      Thanks,

      Erik

        • 1. Re: DEI/DEQ - Data Validation between source and target
          user126898 Guru

          Hey Erik,

           

          Is there a key field(s) that we can join the source and target datasets to compare rows to each other?

           

          thanks,

          Scott

          • 2. Re: DEI/DEQ - Data Validation between source and target
            Erik Little Active Member

            user126898

             

            I am using the all cusomters csv that informatica gave us as test data just to get the steps down.  In it there is a Record ID column which appears to be unique so I think that is what can be used.

             

            Thanks,

            Erik

            • 3. Re: DEI/DEQ - Data Validation between source and target
              user126898 Guru

              In the case you have a key you can match on between source and target you could just use an MD5 function in an expression transformation.  Basically you concat all the fields on a data row besides the key field into the MD5 which is a checksum output.  You join to the two systems based on the key field(s).  If the two MD5 fields match then the records are the same if not then there is an issue.

               

              This is a much easier approach then trying to use the very complex match/comparsion transformations.

               

              thanks,

              Scott

              • 4. Re: DEI/DEQ - Data Validation between source and target
                Erik Little Active Member

                user126898

                 

                I think this is what I need to do to accomplish what you are saying.

                 

                 

                Create a Mapping: 

                 

                 

                Source: read the data --> create an expression transformation, create a variable port and concat all the fields into the variable, and then perform the md5 on the varible.  Pass the variable to an Out port.  

                 

                 

                Target: read the data --> create an expression transformation, create a variable port and concat all the fields into the variable, and then perform the md5 on the varible.  Pass the variable to an Out port.

                 

                 

                In the Mapping join the 2 expressions on the id field and evaluate if the md5's match.

                 

                 

                Does this sound like what you are describing?

                 

                Thanks,

                Erik

                • 5. Re: DEI/DEQ - Data Validation between source and target
                  user126898 Guru

                  --my comments.

                   

                  Create a Mapping:

                   

                   

                  Source: read the data --> create an expression transformation, create a variable port and concat all the fields into the variable, and then perform the md5 on the varible.  Pass the variable to an Out port. 

                       --you dont need to use a variable port.  You can do the concat and MD5 directly in the expression of the output port

                   

                   

                  Target: read the data --> create an expression transformation, create a variable port and concat all the fields into the variable, and then perform the md5 on the varible.  Pass the variable to an Out port.

                       --you dont need to use a variable port.  You can do the concat and MD5 directly in the expression of the output port

                   

                  In the Mapping join the 2 expressions on the id field and evaluate if the md5's match.

                       --use a joiner transform to join the two data streams, joining on the id field.  then use an expression to compare the MD5 fields.

                   

                  If anything does not match you can write the rows out to a file/table for future analysis.  You can also put a email notification in the workflow if any records are written to the target send an email to someone to check the bad records.

                   

                   

                   

                  thanks,

                  Scott

                  • 6. Re: DEI/DEQ - Data Validation between source and target
                    Erik Little Active Member

                    user126898

                    I figured out I can't use the CONCAT function (CONCAT only works on 2 strings) in a port expression but I did find that this code validates with in a port expression.

                    MD5(ORDER_NO||PO_NUM||ORDER_DATE||SHIP_DATE||COMPANY||ADDR1||ADDR2||ADDR3||ADDR4||COUNTRY_CODE||CONTACT||TITLE||PHONE||EMAIL||SP_NO||ITEM_NO||QUANTITY||ITEM_DSC||SUPID||UNIT_COST||PAY_MTHD||PAY_TERM||CURRENCY)

                     

                    I have my join set to the RECORD_ID = RECORD_ID1 and o_SOURCE_MD5 = o_TARGET_MD5.  I couldn't figure out where to do an expression on the MD5 in the joiner except in the Join tab. 

                     

                     

                    It seems to have worked.  I will need to do some testing to verify that it works as expected.

                    Thanks again for all your help!  Once I confirm that it does I will mark as answered.

                    • 7. Re: DEI/DEQ - Data Validation between source and target
                      user126898 Guru

                      after the joiner I would pass the records to an expression and do a test on the MD5 fields and then pass to a filter which you can output to any thing.  The joiner works like you have it but if you want to do some additional testing you can use the expression approach.

                       

                      glad it is working out for now.

                      • 8. Re: DEI/DEQ - Data Validation between source and target
                        Erik Little Active Member

                        Here are the two options I have found.  I read the 2 different sources, concat all the fields except the key field into an MD5, at the joiner i join on the key field and the MD5 field, I can then write it out.



                        This one starts off like the above except it differs a the joiner.  The joiner just filters based on the Key field, then the filter is where I remove the records that don't match on the MD5.  Then I write it out.





                        • 9. Re: DEI/DEQ - Data Validation between source and target
                          Erik Little Active Member

                          I got my mapping to join and filter using the MD5 and then I wrote out the records that matched to a file and the ones that didn't match to a different file.

                           

                           

                          I have used the rows read and rows written from a workflow as part of an email notification in a workflow.  I want to get the number of rows written for the match file and the number of rows written for the failed to match file.  I am having trouble figuring out how to capture that information and put it into a variable that I can pass to the workflow notification task.