What does "huge" mean?
50 million records?
Each record containing 25 KB of data?
A column profile for 180 distinct columns?
Please be a bit more precise about what you're trying to do.
In the meantime we reduced the number of records by subsetting, and discovered the attributes with SQL.
But our starting point was a partitioned table with > 2 billion records.
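For anyone following along, here is a rough sketch of what "subset first, then discover the attributes with SQL" could look like. It is not what was actually run in this thread: SQLite stands in for the real database, and the table `records`, its columns, the helper names, and the `rowid % n` sampling rule are all made up for illustration. On a partitioned table you would more likely restrict the query to one partition instead.

```python
import sqlite3


def build_demo_table(n_rows=1000):
    """Create a small in-memory table (hypothetical schema, demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER, country TEXT, amount REAL)")
    countries = ["DE", "FR", None]  # None becomes NULL to exercise null counting
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?)",
        [(i, countries[i % 3], i * 1.5) for i in range(1, n_rows + 1)],
    )
    return conn


def profile_columns(conn, table, columns, sample_mod=1):
    """Basic per-column profile over a subset of rows.

    The subset keeps rows whose rowid is divisible by `sample_mod`
    (an assumed, deliberately simple sampling rule). Table and column
    names are interpolated into the SQL, so they must be trusted input.
    """
    profile = {}
    for col in columns:
        query = (
            f'SELECT COUNT(*), COUNT(*) - COUNT("{col}"), '
            f'COUNT(DISTINCT "{col}"), MIN("{col}"), MAX("{col}") '
            f'FROM "{table}" WHERE rowid % ? = 0'
        )
        rows, nulls, distinct, lo, hi = conn.execute(query, (sample_mod,)).fetchone()
        profile[col] = {
            "rows": rows,          # rows in the subset
            "nulls": nulls,        # NULL values in this column
            "distinct": distinct,  # distinct non-NULL values
            "min": lo,
            "max": hi,
        }
    return profile


if __name__ == "__main__":
    conn = build_demo_table()
    # Profile every 10th row instead of the full table.
    for col, stats in profile_columns(conn, "records", ["country", "amount"], 10).items():
        print(col, stats)
```

Each column costs one aggregate pass over the subset, so the runtime scales with both the number of columns and the subset size. That is why cutting the row count first makes such a difference.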
2 billion records is not really a small starting point.
Also, profiling is a complex task that involves a whole lot of fuzzy logic and cross-analysis of many, many, MANY data sources. It's really a complicated thing to do.
As for calling even a basic profile on 2 bn records "running slow"... I would never expect a fast result there, to be honest. Not under these preconditions.