3 Replies Latest reply on Nov 19, 2020 6:45 AM by Nico Heinze

    IDQ - strategies reading data from source

    anna Di Martino New Member

      Hi all,

       

      we are starting working on IDQ 10.4 and we would appreciate you suggestions about best practices related to the business case described below.

       

      We need to put "under DQ control" some tables coming from an Oracle DB. We have designed some mappping and DQ rules in order to check data related to some logical entities, where a logical entity is the result of joining two or more tables. Now, our question is: what is the best method to process the source data?

       

      Here some options we are evaluating:

       

      1) create a mapping composed by 3 kinds of transformations: READ, DECISION, EXCEPTION, where the "READ" one directly reads data from source systems (in incremental o full way) and writes results of DQ rules into bad/issue/reject/good tables created into a new "DQ data DB"

       

      2) create two mappings:

        • the first one with READ + WRITE transformations in order to write source data in our  "DQ data DB" staging area
        • the second one with READ, DECISION, EXCEPTION, working in the same way described in solution 1) but starting from copied data

       

      3) using PWC for data extraction process that ends writing source data in our  "DQ data DB" staging area and then IDQ only for applying DQ controls starting from data copied by PWC

       

      4) other options.

       

      Thanks in advance for your support.

      Anna

       

       

       

       

        • 1. Re: IDQ - strategies reading data from source
          Nico Heinze Guru

          Hi Anna,

           

          there is one (in my opinion strong) reason in favour of #2 or #3:

          A staging DB (assuming it's "physically" different from the DB where the original source data are stored) will allow you to impose less burden on the original source DB; just reading the data from there and simply staging them to a staging DB will definitely create the least possible traffic on the source DB server. In many customer environments the source systems are under heavy load almost constantly, meaning that each additional task put onto these servers will make the source system slower, so this staging approach can really help to decrease this additional load.

           

          Now on to #2 vs. #3. I would consider two things. First, how much knowledge about PowerCenter do you have in-house? In the long run you will need someone on-site who can help with PowerCenter related questions, even though this someone need not necessarily be busy with PowerCenter all day long. So you need some PowerCenter experience in-house.

          On the other side as far as I can tell PowerCenter needs less system resources than IDQ, so approach #3 will probably help you use the system resources of the IDQ machine better than performing this staging process in IDQ itself.

          But please be aware that you need a PowerCenter license to do this in PowerCenter, as far as I understand the license model of Informatica the IDQ license alone does not allow you to run PowerCenter. So this probably will incur additional costs for you.

           

          There is a fourth approach, assuming you're willing to work more with PowerCenter.

          You can develop your load processes in IDQ (and this way utilise the features of the Developer GUI) and then transfer those processes to PowerCenter, including the IDQ transformations. This might be an option for you, and in fact you may get performance gains this way.

          Again, assuming you do have PowerCenter know-how in-house and an appropriate license.

           

          In short: from a performance perspective I would prefer approach #3 (staging in PowerCenter and then running IDQ to check the data) or #4 (developing the staging and checking processes in IDQ and then executing them in PowerCenter).

           

          Regards,

          Nico

          • 2. Re: IDQ - strategies reading data from source
            anna Di Martino New Member

            Hi Nico and thaks a lot for your fast and precise reply!

            About PWC, our customer has the license and we also have some skilled persons belonging to another group but, if not really necessary, we would prefer to keep the DQ branch separated from PWC branch becasue of people, deployment processes and so on.

            For this reason we would proceed with option #2 trying to implement it in the most performing way. About this this we have two more questions:
            a) from a mapping point of you, do you think that the best way to load data from a source DB to a target DB, is two use 2 mappings (READ + WRITE and READ + DECISION + EXCEPTION) as we are supposing?
            b) about the first mapping (READ + WRITE) is there any best practice? On the READ transformation we didn't find any particular option while on the WRITE one it should be possible to work on the option "LOAD TYPE = BULK (instead of NORMAL)". The BULK setting should speed up the loading process but our test didn't show any difference on timing.

            Thanks again!
            Anna

            • 3. Re: IDQ - strategies reading data from source
              Nico Heinze Guru

              Hi Anna,

               

              from my personal perspective I would opt for approach #2, meaning to separate the staging process from the analysis process. For one simple reason: debugging and extending the system is so far easier with cleanly separated functionalities.

               

              Other than that I'm the wrong person to respond any further. I've never worked with IDQ myself, so I don't have any real experience with it. Let's hope other people more knowledgeable in IDQ than me will respond here.

               

              Regards,

              Nico