1 Reply Latest reply on Sep 13, 2017 12:27 PM by Nico Heinze

    Joiner transformation for performance improvement

    Prathap S Seasoned Veteran

      Hi All,

       

      I have a question on joiner transformation,If we select master as fewer rows and detail as larger rows then it improve performance as join based on master rows.

       

      But in the below scenario how it improve the performance,since whichever way it joins,same number of joins required..

       

      TAB1:                    TAB2:

       

      col1                        col2     

      -------                      -------  

      1                                1  

      1                                1 

                                        1

                                        1

                                        1

        • 1. Re: Joiner transformation for performance improvement
          Nico Heinze Guru

          This is one case in which this simplified statement about the Joiner input streams does not really apply. The truth is a bit more complicated.

           

          It turns out that you have to find out which data stream contains the smallest number of distinct records. For example, suppose we have a master stream of 10,000 records and a detail stream of 50,000 records. Then the simplified version of the Joiner statement says, use the smaller input stream as the master stream.

          However, let's suppose that those 10,000 records are distinct. Let's also suppose that the detail stream consists of only 500 distinct records, each occurring 100 times in the data stream.

          In this case the second input stream (with 500 distinct records) should be taken as the master stream. This will ensure that the cache file for the master stream is as small as possible, which in turn speeds up building the cache file and improves session performance this way (not to mention that less RAM will be needed for the cache).

           

          Your case is even more interesting from a theoretical point of view. With 2 resp. 5 records it should not matter which data stream is master and which one is detail. But in reality you might have 10,000s repetitions of each record in the respective input stream. In this case you have to try and find out yourself whether it works better to define stream #1 as master and stream #2 as detail or the other way round.

          To be honest I can't tell which one makes more sense in your sample case.

           

          Regards,

          Nico

          1 of 1 people found this helpful