5 Replies Latest reply on Jul 21, 2021 9:51 AM by Nico Heinze

    Max for multiple Ports or Columns

    Igor Pinto New Member

      Hi

       

      I have this information in a table and i need to get for eatch Client_id the line whit the max Date_Start and the Max Date_End but i don't want to use two aggregators:

       

      for client_id 1 i sould have this line

      07/12/2006; 07/12/2006; 1; B

      for client_id 1 i sould have this line

      11/05/2020; 08/07/2021; 2; Y

       

       

      Date_StartDate_End
      Client_ID
      Description
      07/12/200607/12/20061B
      07/12/200220//12/20121A
      11/03/200020/12/20111C
      11/05/202008/06/20212B
      11/05/202008/07/20212Y

       

      Can you help me to find de best way to do it without using one agg for the max Date_Start and another for the max Date_End.

        • 1. Re: Max for multiple Ports or Columns
          JanLeendert Wijkhuijs Guru

          Hi,

           

          Your question in combination with your examples is a little confusing.
          To identify the line with the highest Date_Start you can read the data sorted on Client_ID and Date_Start (descending).

          When you create a variable port being the last port in an expression var_prev_Client_ID and put in Client_ID, you can create another variable port (integer named var_ident_Date_Start) above it in which you identify whether it is the row with the highest Date_Start.
          IIF(Client_ID <> var_prev_Client_ID, TRUE, FALSE)
          You can do the same for the highest Date_End.Later in the mapping you can filter on var_ident_Date_Start is TRUE or var_ident_Date_End is TRUE.

          If you just need the highest values you can use one aggregator in which you do the aggregations for multiple columns.

          Regards,
          JanLeendert

          • 2. Re: Max for multiple Ports or Columns
            Nico Heinze Guru

            Another option which needs zero or one Sorter(s), one Joiner, and one Filter and works like this:

             

            Sort the data by Client_ID. Depending on the source system type and the data type of the client ID, this can be done either in the source system or using a Sorter transformation.

             

            Forward the original data into an EXP.

             

            Within the EXP, identify several flags.

            First you need to know whether the current record belongs to the same Client_ID as the previoss record. Let's name this flag port v_is_new_client (type Integer).

            Second you need to identify the smallest Start_Date per client ID. For a new record per client ID (meaning "v_is_new_client" is set to TRUE), simply store the current start date to a variable port for the start date, otherwise store either the current start date (if it's smaller than the current variable port) or keep the previous start date.

            Similarly you need to identify the largest End_Date per client ID. For a new record per client ID (meaning "v_is_new_client" is set to TRUE), simply store the current end date to a variable port for the end date, otherwise store either the current end date (if it's larger than the current variable port) or keep the previous end date.

             

            Forward the output of the EXP to a Joiner which joins the original input stream with the output of the EXP by client ID.

            Forward the output of the Joiner to a Filter transformation. This Filter lets pass through only records with start date = <minimum start date identified by the EXP> or end date = <maximum end date identified by the EXP>.

             

            Regards,

            Nico

            • 3. Re: Max for multiple Ports or Columns
              Nico Heinze Guru

              Forgot to mention that - because the data must be sorted prior to being fed into the EXP - the Joiner can (and should) be set to Sorted Input, meaning that cache file sizes will be as small as possible and that the processing time will be shortened as far as possible as well.

               

              Regards,

              Nico

              • 4. Re: Max for multiple Ports or Columns
                Vlad Ponomarenko Guru

                Hi,

                 

                As I understand your example, DATE_START has more priority under DATE_END since the third line
                "07/12/2002      20//12/2012      1      A"is ignored.


                So you can use RANK TX with
                Group by port = client_id

                Rank port = to_char(DATE_START,'YYYYMMDD')||to_char(DATE_END,'YYYYMMDD').
                Top/Bottom = Top
                Number of Ranks =1

                Regards, Vlad

                • 5. Re: Max for multiple Ports or Columns
                  Nico Heinze Guru

                  Good idea, but still it's advisable to feed the Ranker with sorted input. Just a side note.

                   

                  Regards,

                  Nico