Suppose you have a source table consisting of 4 million records.
Further suppose that each record in this table has a total width of approx. 23 KB.
Now multiply 23 KB by 4 million, and you get 92 GB of disk space.
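For the record, here is that back-of-the-envelope math as a few lines of Python (decimal units, i.e. 1 GB = 10^9 bytes, as in the figures above):

```python
# Cache-size estimate from the example above: 4 million records x ~23 KB each.
rows = 4_000_000
row_width_bytes = 23 * 1_000          # approx. 23 KB per record (decimal KB)
total_gb = rows * row_width_bytes / 1e9
print(total_gb)                       # 92.0 -> roughly 92 GB of cache/disk space
```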
That said, there might be ways to avoid building such huge cache files. However, we cannot tell without further analysis and more information. For example: what are your sources, how large are they, how are they structured (a rough description is enough), how are the sources related, and what does the mapping do?
And last but not least: which version of the PowerCenter / Informatica platform do you work with, and on what OS / hardware platform?
Thanks for responding to the question.
We are pulling data from two SQL Server tables and loading the result into a flat file. There is a foreign-key relationship between the two tables.
We have an Expression immediately after each of the two Source Qualifiers, then a Sorter:
SQ --> EXP --> SRT (asc on PK) --> joins the JNR below
SQ --> EXP --> SRT (asc on FK, desc on update date field) --> AGG --> SRT --> JNR (normal join of both sources) --> EXP --> TRGT
It's a concurrent workflow.
1st run - 124K, 316K
2nd run - 108K, 274K
3rd run - 322K, 894K
4th run - 371K,1009K
5th run - 459K,1278K
6th run - 20K, 58K
7th run - 7K, 18K
Informatica 10.2, UNIX OS
Please let me know how I can reduce the size of the Sorter temp files being generated.
That's not sufficient information. How large is each record on average? The Sorter almost always sorts based on the "real" data width, not on the defined maximum width.
Below are the session statistics:
Sorter - For Transformation [srt_C], memory used is  bytes and row size is 202664 bytes.
TRANSF_1_4_1 SORT_40422 SORT_40422 End of output from Sorter Transformation [srt_C]. Processed 459021 rows (93027031944 input bytes; 471752245248 temp I/O bytes). : (Sun Oct 06 22:17:59 2019)
TRANSF_1_4_1 SORT_40426 Performance Warning! Sorter Transformation [srt_C] required 6-pass sort with 471752245248 temp I/O bytes. The cache size was so small that it required multiple passes of intermediate disk I/O with the temp file. This severely impacted the performance. The cache size should only need to be a small fraction of input data size, in addition to the minimum size about 8MB, to achieve reasonable performance. Please try to set it to at least 427MB for 2-pass sort (1-pass temp I/O), or better 91503MB for 1-pass in-memory sort.
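The log's own numbers hang together, by the way. A quick sanity check in Python (all values copied from the session statistics above; the interpretation of the I/O ratio is mine):

```python
# Sanity-check the figures from the session log above.
input_bytes   = 93_027_031_944    # input bytes reported for srt_C
temp_io_bytes = 471_752_245_248   # temp I/O bytes reported
rows          = 459_021           # rows processed

row_size = input_bytes // rows          # average row width
io_ratio = temp_io_bytes / input_bytes  # how many times the data hit disk

print(row_size)             # 202664 -> matches the "row size is 202664 bytes" line
print(round(io_ratio, 2))   # ~5.07 -> each byte passed through the temp file ~5 times,
                            # consistent with the "6-pass sort" warning
```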
That message states it clearly: the Sorter needs its 92 GB because that's the size of the data to be processed. The only thing you can do is improve the speed of the session by increasing the cache size from the current setting; the log itself recommends at least 427 MB for a 2-pass sort, which should already speed up the session considerably.
Is there a way that I can reduce the size of the temp file (PMSORT, ~92 GB) that is being created?
There might be (depending on your source system), but I cannot tell you whether you can go this way.
The Sorter is one of those transformations that obey the so-called "transformation scope". Usually this transformation scope is set to All Input, meaning that such transformations cache all input data before they start processing.
If you are able to somehow "pre-sort" the data into smaller chunks before feeding them into the mapping, you can use a Transaction Control and an EXP to break the data stream into "transactions" (actually just larger chunks of records). Then you can set the Sorter to Transformation Scope = Transaction, making it cache only the records of each "transaction" rather than the whole input stream.
Of course this requires that the transformations following the Sorter (if they can be set up this way) are set to Transformation Scope = Transaction as well; otherwise the whole approach has no positive effect.
The whole thing is not really complicated, but may I advise you to first play around with a small data set, in order to get a feeling for how this "transactioning" works, before you change this huge mapping. All necessary details can be found in the Designer and Transformation guides for PowerCenter; nonetheless, "transaction scoping" is not the best-explained feature of PowerCenter.
I just noticed that I forgot to mention an example of what this "pre-sorting" might mean.
Imagine you have to process measurement data from 2,500 weather measuring stations all over the country.
Further imagine that the sensor data are not sorted within the overall stream, but that you receive the data as one single file per measurement station.
In such a case you can make use of the "transaction scoping": use an EXP and a Transaction Control to start a new "transaction" with each new input file; then use a Sorter (set to Transformation Scope = Transaction) to sort the data per input file (i.e. per measurement station); now you can process the sensor data per "transaction" (meaning per input file).
In such cases you can often save many GB of disk space and RAM for caches. And in this example you can easily save 99% of the disk space and RAM needed to cache all input data.
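To illustrate the principle outside of PowerCenter, here is a small Python sketch of the difference between sorting the whole stream at once and sorting per "transaction" (per station file). The station IDs and readings are made up for the example:

```python
# Illustrative sketch only (plain Python, not PowerCenter): the effect of
# Transformation Scope = Transaction on a Sorter. Stations and readings are invented.

def readings_per_station():
    """Yield one unsorted chunk of (station, timestamp) rows per input file."""
    yield [("A", 3), ("A", 1), ("A", 2)]   # file for station A
    yield [("B", 9), ("B", 7)]             # file for station B

# Scope = All Input: every row must be cached before sorting -> peak memory ~ total rows.
all_rows = [row for chunk in readings_per_station() for row in chunk]
all_rows.sort(key=lambda row: row[1])

# Scope = Transaction: each chunk is sorted on its own -> peak memory ~ largest chunk.
per_chunk = [sorted(chunk, key=lambda row: row[1]) for chunk in readings_per_station()]

print(per_chunk)   # [[('A', 1), ('A', 2), ('A', 3)], [('B', 7), ('B', 9)]]
```

With 2,500 stations of roughly equal size, the per-chunk variant caches about 1/2,500 of the data at any time, which is where the "save 99% of the disk space and RAM" figure comes from.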
I hope this makes it clear why I said you have to check whether this "transaction scoping" might be helpful in your case or not.