
    Interview question

    DEEPIKA R Active Member

      Hi All,

       

      Question: Suppose the source has 1 billion records. The session failed after loading 50K records, and when the job is re-run it throws a unique constraint error. How do you solve this?

      My answer: It is caused by duplicate data, because we are trying to reload the same data again.

      1. Choose "Resume from last checkpoint" in the session properties (to load from where it stopped).

      2. If the target doesn't have many records, truncate the table with a pre-SQL query and reload the data.

       

      May I know the other alternatives to load the data?
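
      To make option 1 concrete, here is a minimal Python sketch of the checkpoint idea: progress is persisted after every commit, so a re-run skips the rows that were already loaded instead of re-inserting them and hitting the unique constraint. This is not how PowerCenter implements recovery internally; the file names, the insert_row callback and the commit interval are all hypothetical.

        import csv

        CHECKPOINT_FILE = "load_checkpoint.txt"   # hypothetical checkpoint location
        SOURCE_FILE = "source_records.csv"        # hypothetical source extract


        def read_checkpoint():
            """Return the number of rows committed in a previous run (0 if none)."""
            try:
                with open(CHECKPOINT_FILE) as f:
                    return int(f.read().strip() or 0)
            except FileNotFoundError:
                return 0


        def write_checkpoint(rows_loaded):
            """Persist progress after every commit so a re-run can skip loaded rows."""
            with open(CHECKPOINT_FILE, "w") as f:
                f.write(str(rows_loaded))


        def load(insert_row, commit_interval=10_000):
            """Skip rows committed in a previous run, then continue loading from there."""
            already_loaded = read_checkpoint()
            last_row = already_loaded
            with open(SOURCE_FILE, newline="") as f:
                for i, row in enumerate(csv.reader(f), start=1):
                    if i <= already_loaded:
                        continue               # loaded before the failure, do not re-insert
                    insert_row(row)            # caller-supplied write to the target
                    last_row = i
                    if i % commit_interval == 0:
                        write_checkpoint(i)    # checkpoint follows each commit
            write_checkpoint(last_row)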

        • 1. Re: Interview question
          JanLeendert Wijkhuijs Active Member

          Hi Deepika,

          I would investigate the cause of the failure in the first place. You want to avoid situations where the process fails after 50K records. What if it fails again after 150K records?
          And there is a third option as well: adapt the mapping to check whether each record is already in the target and filter out the ones that are.

          Regards,
          JanLeendert

          • 2. Re: Interview question
            Seema Yaligar Seasoned Veteran

            Hi Deepika,

             

            You can use a Lookup transformation to check for existing records and insert only the new ones.
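
            As a rough illustration of that lookup idea outside of PowerCenter, the Python sketch below compares each source row's key against the keys already present in the target and inserts only the missing rows. The names (load_new_rows_only, existing_keys, insert_row) are hypothetical and are not Informatica objects.

              def load_new_rows_only(source_rows, existing_keys, insert_row, key_index=0):
                  """Insert only the rows whose key is not already present in the target."""
                  inserted = 0
                  for row in source_rows:
                      key = row[key_index]
                      if key in existing_keys:
                          continue             # already loaded, skip to avoid the unique constraint error
                      insert_row(row)          # caller-supplied write to the target
                      existing_keys.add(key)   # also guards against duplicates inside the source itself
                      inserted += 1
                  return inserted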

            Alternatively, there is a "Truncate target table" option; instead of pre-SQL, this option can be used to truncate the target table before loading.

             

            Thanks,

            Seema

            • 3. Re: Interview question
              DEEPIKA R Active Member

              Hi Seema,

               

              Thanks for replying. When we have that many records it takes a lot of time to match them, which decreases performance. The target already has billions of records, which we are not supposed to truncate.

               

               

              Regards,

              Deepika

              • 4. Re: Interview question
                JanLeendert Wijkhuijs Active Member

                Hi Deepika,

                In that case it can be useful to create a hash value (e.g. an MD5 hash) for each record and compare on that instead of comparing every column.
                Another option would be to add some metadata to the target table (e.g. an insert timestamp or the session start time) so you can delete the most recently inserted records before re-running.
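
                As an illustration of the hash idea, here is a small Python sketch that builds an MD5 hash over all columns of a row (comparable to hashing a concatenation of ports in an Expression transformation) and keeps only the rows whose hash is not yet in the target. The helper names and the target_hashes set are hypothetical.

                  import hashlib

                  def row_md5(row, separator="\x1f"):
                      """MD5 hash over all columns of a row; None becomes an empty string."""
                      joined = separator.join("" if col is None else str(col) for col in row)
                      return hashlib.md5(joined.encode("utf-8")).hexdigest()

                  def filter_changed_rows(source_rows, target_hashes):
                      """Yield only the rows whose hash is not yet present in the target."""
                      for row in source_rows:
                          if row_md5(row) not in target_hashes:
                              yield row

                Comparing one hash per row is much cheaper than comparing every column, which helps with the performance concern when the target already holds billions of records.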

                Hope this helps.

                Regards,
                JanLeendert
