Can you please tell us the total number of partition folders present in HDFS for the Hive table across its 2-3 partition columns?
Since the query goes through all partitions, that is most likely why it takes so long. I believe you would see the same delay when running MSCK REPAIR TABLE via beeline.
Thanks for the reply.
We have many partitions (6000+). We also believe that the scan of all partitions is the root cause of the delay. Is there a way to avoid this with Blaze or DIS properties?
Unfortunately, when Blaze writes natively into Hive, it does so by writing the data into partitioned HDFS folders and then running MSCK REPAIR TABLE so that Hive discovers the new partitions as well as any changes to existing ones.
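To illustrate why this step slows down as the partition count grows, here is a rough conceptual sketch in Python. It is not Hive's or Blaze's actual implementation; the function name and the year/month partition layout are made up for illustration. The point is that every partition folder has to be listed and compared against what the metastore already knows, even when only one folder is new.

```python
def missing_partitions(hdfs_dirs, metastore_partitions):
    """Return partition folders that exist on HDFS but are not yet registered.

    Conceptual model only: a repair has to look at every folder,
    so the work scales with the total number of partitions.
    """
    known = set(metastore_partitions)
    return sorted(d for d in hdfs_dirs if d not in known)


# Hypothetical table partitioned by year and month.
hdfs_dirs = ["year=2023/month=11", "year=2023/month=12", "year=2024/month=01"]
registered = ["year=2023/month=11", "year=2023/month=12"]

to_add = missing_partitions(hdfs_dirs, registered)
print(to_add)
```

With 6000+ folders, the listing and comparison touch all of them on every load, which matches the delay described above.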
Can you please tell me if Spark can be used here instead of Blaze?
You can try adding hive.msck.path.validation=skip in the JDBC Hive connection string as below:
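A minimal sketch of what such a connection string might look like, assuming the standard HiveServer2 URL layout in which Hive configuration properties follow a `?` and are separated by `;`. The host, port, and database below are placeholders, and the helper function is hypothetical, written only to show where the property goes.

```python
def with_hive_conf(base_url: str, conf: dict) -> str:
    """Append Hive configuration properties to a HiveServer2 JDBC URL.

    Assumes the layout jdbc:hive2://host:port/db;sess_vars?hive_conf#hive_vars.
    """
    # Keep any trailing "#hive_vars" section at the end of the URL.
    url, sep, hive_vars = base_url.partition("#")
    pairs = ";".join(f"{key}={value}" for key, value in conf.items())
    # Start the conf list with "?" unless one is already present.
    joiner = ";" if "?" in url else "?"
    return f"{url}{joiner}{pairs}{sep}{hive_vars}"


# Placeholder host, port, and database; replace with your own values.
url = with_hive_conf(
    "jdbc:hive2://hive-host.example.com:10000/default",
    {"hive.msck.path.validation": "skip"},
)
print(url)
```

Note that this property controls how MSCK treats directory names that are not valid partition names, so whether it shortens the run in your environment would need to be tested.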