I have the same problem. Did you get a chance to find a solution?
Can you try adding the following run-time property to your mapping: spark.sql.shuffle.partitions. Set the value to something like 10 and then rerun your mapping. This may affect performance, as you will have fewer partitions and larger data going through the executors.
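For anyone who wants to see the effect of this property outside of Informatica, here is a minimal PySpark sketch. The input path, column name, and output path are placeholders; the point is only that lowering spark.sql.shuffle.partitions reduces the number of post-shuffle partitions, and therefore the number of output files, at the cost of larger partitions per executor.

```python
from pyspark.sql import SparkSession

# Minimal sketch: lowering spark.sql.shuffle.partitions reduces the number
# of partitions (and therefore output files) produced after a shuffle.
spark = SparkSession.builder.appName("shuffle-partitions-demo").getOrCreate()

# Default is 200; a small value such as 10 yields fewer, larger output files.
spark.conf.set("spark.sql.shuffle.partitions", "10")

# Placeholder input path and a simple aggregation, just to force a shuffle.
df = spark.read.parquet("/data/input")
result = df.groupBy("key").count()

# The write now produces roughly 10 part files instead of 200.
result.write.mode("overwrite").parquet("/data/output")
```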
Yes, I had applied the same property to resolve the issue, but my performance was impacted with datasets of around 200 GB. Probably the best approach is to merge the files after they are created.
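If merging after the load is the route you take, a separate compaction step in plain Spark is one way to do it. This is only a sketch under assumptions: the paths and the target file count of 50 are placeholders, and this is not an Informatica feature, just a follow-up job that rewrites the mapping's output into fewer files.

```python
from pyspark.sql import SparkSession

# Minimal sketch of compacting small output files after the load, assuming
# the mapping has already written Parquet to /data/output (placeholder path).
spark = SparkSession.builder.appName("merge-small-files").getOrCreate()

df = spark.read.parquet("/data/output")

# coalesce() avoids a full shuffle, so a large dataset is not redistributed
# across the cluster; 50 is a placeholder target file count.
df.coalesce(50).write.mode("overwrite").parquet("/data/output_merged")
```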
Hi Vardhan,
Did you find any solution for this issue? Is there any property we can set at run time, or a way to merge the files after the load?
Our prod is down due to this, so your quick response is much appreciated.
You can try controlling the number of partitions using any of the following flags, but you may face a performance issue:
Hi Puneeth,
Thanks for your response. We are using BDM 10.2.1, and as far as I know Spark.CoalescePartitions only works from Informatica 10.2.2 onwards.
If the above values are working for you in 10.2.1, can you please suggest the value for each component so I can give it a try in my environment, and also where we have to set this property: in the mapping run-time properties or in the Hadoop Spark properties?
Awaiting your response. Thanks in advance!
If you are on version 10.4, you can also use the Flex.PartitionSpec2 flag.
It is a little tricky to use, but it gives you more control once you know how it works.
I recommend raising a Support Case to get detailed instructions.
There's an EBF to get the Spark.CoalescePartitions flag on top of Informatica 10.2.1.
Please raise a Support case to get detailed instructions from a GCS engineer.
You can refer to the KB article below:
Setting these properties could impact performance, so please be careful when overriding them at the Hadoop connection level.