5 Replies Latest reply on Sep 18, 2020 3:42 AM by Nico Heinze

    Significant increase in profile time from 8M to 10M records. Configuration issue?

    Aris Fernandez New Member

      I am new to Informatica and I have been tasked with understanding the company's IDQ installation so that it can be used with our data.

      I have been trying to run profiles. For a long time, profiles larger than 1–3 million records would crash the server; after a recent specs upgrade, larger profiles now run without issue. I tried to follow the recommended specs found here for individual IDQ use. The machine currently has a 4-core CPU, 30 GB of RAM, and a 1 TB drive (4x2 TB is recommended, but that seems like too much to me).


      I was able to run a profile with 8.5 million records (12 columns and 3 rules), and it took 2 hours and 48 minutes. That is about 1.19 milliseconds per record, which is in line with smaller profiles I have run before. But when I ran 10 million records last night, it took 10 hours and 14 minutes, i.e. 3.68 ms per record: more than three times the per-record time of a profile with only 15% fewer records.
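For reference, the per-record figures above can be reproduced with a quick back-of-the-envelope calculation (my own sketch, not part of the original post; it only restates the run times given above):

```python
# Convert a total profile run time into milliseconds per record.
def ms_per_record(hours: int, minutes: int, records: int) -> float:
    total_ms = (hours * 3600 + minutes * 60) * 1000
    return total_ms / records

rate_8_5m = ms_per_record(2, 48, 8_500_000)    # 8.5M-record profile
rate_10m = ms_per_record(10, 14, 10_000_000)   # 10M-record profile

print(f"8.5M run: {rate_8_5m:.2f} ms/record")            # 1.19 ms
print(f"10M run:  {rate_10m:.2f} ms/record")             # 3.68 ms
print(f"slowdown factor: {rate_10m / rate_8_5m:.1f}x")   # 3.1x
```

So the 10M run is not just 18% slower in total, as linear scaling would predict, but roughly 3x slower per record.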


      The data source is a logical data object that reads from a database, so I thought maybe the source DB was the issue. But I checked, and the time it takes to reach "Read [#] rows, read [0] error rows for source table [table] instance name [instance]" scales roughly in proportion to the record count for both profiles.


      The real difference is in the time between "Writer run started" and "Writer run completed" in the logs. For the 8.5M case, the writer run takes 2 hours 14 minutes, while for 10M it takes 9 hours 32 minutes.


      And last night was the first time a 10M run finished at all. Previously (with just 12 columns and no rules), when I tried 10M records, it would just get stuck and nothing would happen; the DIS temp directory would stop growing after a while [stuck at a few MB]. For the profiles I am referring to now, the 8.5M case produced a disTemp of over 300 GB, while for last night's 10M run, disTemp grew to 400 GB.


      Could this be related to the DIS properties/settings, or could it be something else, like the profile warehouse DB not being adequate? Below are images with the current setup. I have also attached the logs for both profile runs.


      I apologize for the long thread message.


      DIS settings:

      Data Integration Service properties


      DIS process settings:

      DIS process properties

        • 1. Re: Significant increase in profile time from 8M to 10M records. Configuration issue?
          Nico Heinze Guru

          Not that I have much experience with IDQ, but to me such a sudden increase in runtime always indicates that the memory which was sufficient so far no longer is.

          Can you please try to re-run the profile with 8M records and monitor system resources (in particular RAM and CPU usage) during that time? I suspect that the machine gets close to some memory limit with 8M records, and that this limit is exceeded when running the profile against 10M records.




          • 2. Re: Significant increase in profile time from 8M to 10M records. Configuration issue?
            Robert Whelan Guru


            For both tests, you mention that 12 columns are chosen. Are both tests on the same dataset, i.e. are the 8.5m records a subset of the 10m records?

            If they are the same dataset then it would suggest you're hitting some resource bottleneck as Nico suggested.


            If they are different datasets, then a direct comparison cannot be made, as the processing of the data will have too many variations (e.g. column precision and data variation, which impact the calculation of frequencies, etc.).

            • 3. Re: Significant increase in profile time from 8M to 10M records. Configuration issue?
              Aris Fernandez New Member

              The source of the data is the same (the same Postgres table through JDBC), but the data is pulled from the DB at random (random rows, randomly sorted on top of that). Since the table it comes from only has ~20 million records, I would wager that a good chunk of the rows in the 8.5M set and the 10M set are the same (in a different order).
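As a rough sanity check on that wager (my own sketch, assuming both samples are drawn uniformly at random, without replacement, from the same ~20M-row table), the expected overlap between the two samples can be computed directly:

```python
# Each of the n1 rows in sample 1 appears in sample 2 with probability
# n2 / total, so the expected number of shared rows is n1 * n2 / total.
def expected_overlap(n1: int, n2: int, total: int) -> float:
    return n1 * n2 / total

overlap = expected_overlap(8_500_000, 10_000_000, 20_000_000)
print(f"expected shared rows: {overlap:,.0f}")                  # 4,250,000
print(f"share of the 8.5M sample: {overlap / 8_500_000:.0%}")   # 50%
```

Under that assumption, about half of the 8.5M sample would also be in the 10M sample, so "a good chunk" is plausible, though the two datasets are still far from identical.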


              Just to make sure, I will try again with flat files or a fixed query, just to compare. But given that earlier attempts at 10M-record profiles left the process pretty much stuck, I don't think the data is the problem.


              Now, if it is a bottleneck as Nico says, could it be due to a configuration issue? The computer has more RAM than the specs I found here, and those claim it can handle 10M-record profiles.

              • 4. Re: Significant increase in profile time from 8M to 10M records. Configuration issue?
                Robert Whelan Guru


                Thanks for confirming regarding the dataset.

                You mention the selection of records is random. Are you using the profile configuration to specify a random selection? If so, the next test I'd try is to profile the first 8.5m records and then the first 10m. This ensures overlap in the records processed; if you see the same difference, it's most likely a resource problem.


                Regarding configuration, it seems unlikely that adding 1.5m records would cause such a deterioration in performance if the current configuration were adequate for 8.5m records.

                Looking at the settings you shared, I can see a few of the profile options are changed from the defaults e.g. Maximum Profile Execution Pool Size where the default is 10. Were these changed based on recommendations or from your own testing?


                I would also echo Nico's comments above about monitoring CPU & Memory during both tests and see how your resources are being used.

                • 5. Re: Significant increase in profile time from 8M to 10M records. Configuration issue?
                  Nico Heinze Guru

                  It may well be that running a profile against 8.5 million records and a profile against 10 million records will both run smoothly on the same machine, but that need not always be the case (hence the suggestion to monitor the machine). Let me explain with an example:


                  Let's assume that you work on a Windows server with 32 GB of RAM. Out of these 32 GB, the operating system, the Informatica domain, and the application services eat up a total of 5 GB, leaving the machine with 27 GB of free memory.

                  Let's further assume that the profile on 8.5 million records needs a total of 14 GB of RAM. All fine so far.

                  Now let's estimate how much RAM the profile for 10 million records will need. The first approximation is: 14 GB * (10 / 8.5), roughly 16.5 GB.

                  And here comes the problem: this is more than half of the total memory of the machine. And this means (at least as far as I understand Windows) that Windows will start to swap memory to disk (namely to the swap file).

                  And swapping memory to disk means loads of disk I/O and hence a considerable delay in processing. Not by 10% or 20%; it's more likely to be a factor of 3 or more. Disk I/O is so much slower than RAM access that (in my personal experience) factors of 5–10 are not uncommon when Windows starts swapping.

                  In other words: if the profile on 8.5 million records runs for a total of 160 minutes, I wouldn't be surprised if the same profile on 10 million records needed 12 hours.
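The scenario above can be sketched numerically (my own illustration of the reasoning; the 32 GB, 5 GB, and 14 GB figures are the hypothetical numbers from the example, and the half-of-RAM swap threshold is the stated working assumption, not a documented Windows rule):

```python
# Scale the profile's memory need linearly with record count and check it
# against the hypothesized "half of total RAM" threshold at which Windows
# is assumed to start swapping heavily.
TOTAL_RAM_GB = 32          # hypothetical server RAM
BASELINE_NEED_GB = 14.0    # hypothetical need of the 8.5M-record profile
BASELINE_RECORDS = 8_500_000

def estimated_need_gb(records: int) -> float:
    # First approximation: memory need scales linearly with record count.
    return BASELINE_NEED_GB * records / BASELINE_RECORDS

for records in (8_500_000, 10_000_000):
    need = estimated_need_gb(records)
    swapping = need > TOTAL_RAM_GB / 2
    print(f"{records:>10,} records -> ~{need:.1f} GB "
          f"({'likely swapping' if swapping else 'below threshold'})")
```

On this model, the 8.5M run stays just under the 16 GB line while the 10M run crosses it, which would match the observed cliff in runtime rather than a gradual slowdown.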

                  This is no joke; I mean it. It need not be the case for you, but it MAY be.


                  Hence my suggestion to monitor RAM and CPU usage constantly during the profile run.