How to improve the performance of slow Spark jobs that use a DataFrame and a JDBC connection?

All of the aggregation operations are performed only after the whole dataset has been retrieved into memory as a DataFrame. So doing the count in Spark will never be as efficient as doing it directly in Teradata. Sometimes it's worth pushing some of the computation into the database by creating views and then mapping those …
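One way to push such work into the database without creating a view is to pass a subquery as Spark's JDBC `dbtable` option: the database executes the aggregation and Spark only receives the (small) result. The sketch below illustrates the idea; the host, database, and `orders` table are hypothetical placeholders, and the options dict would be handed to `spark.read.format("jdbc").options(**opts).load()`.

```python
def pushdown_subquery(query, alias="t"):
    """Spark's JDBC 'dbtable' option accepts a parenthesized subquery with
    an alias; the database runs it, so only the result crosses the wire."""
    return "({}) {}".format(query, alias)

# Hypothetical Teradata connection details -- replace with your own.
opts = {
    "url": "jdbc:teradata://teradata-host/DATABASE=sales",
    "driver": "com.teradata.jdbc.TeraDriver",
    # The count runs inside Teradata instead of after a full table scan in Spark.
    "dbtable": pushdown_subquery("SELECT COUNT(*) AS cnt FROM orders"),
}
```

Compared with loading the whole table and calling `.count()` on the DataFrame, only a single row ever leaves the database.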