How do I bucket the range of values in a column and count how many values fall into each interval in Scala?

You can use the Spark ML Bucketizer from Scala. There's a good example in the Spark documentation:
https://spark.apache.org/docs/2.2.0/ml-features.html#bucketizer

After you apply the Bucketizer, the DataFrame has an extra column holding a bucket index. The indices start at 0 (e.g. indices 0, 1, and 2 might correspond to the values 1-5, 6-10, and 11-15, respectively). You can then do a .groupBy and .agg (or use SQL) to get a count of the records in each bucket.
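Here is a minimal sketch of the two steps together, assuming Spark (with spark-mllib) is on the classpath and the column is of type Double. The names BucketCounts, countPerBucket, "value" and "bucket", as well as the sample data and split points, are made up for illustration and do not come from the linked docs.

```scala
import org.apache.spark.ml.feature.Bucketizer
import org.apache.spark.sql.{DataFrame, SparkSession}

object BucketCounts {
  // Hypothetical helper: adds a "bucket" column with Bucketizer,
  // then counts how many rows fall into each bucket index.
  def countPerBucket(df: DataFrame, inputCol: String, splits: Array[Double]): DataFrame = {
    val bucketizer = new Bucketizer()
      .setInputCol(inputCol)
      .setOutputCol("bucket")
      .setSplits(splits)

    bucketizer
      .transform(df)      // appends the bucket index as a Double (0.0, 1.0, ...)
      .groupBy("bucket")
      .count()            // one row per non-empty bucket
      .orderBy("bucket")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("bucket-counts")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data only
    val df = Seq(1.0, 3.0, 5.0, 6.0, 10.0, 12.0, 15.0).toDF("value")

    // Split points must be strictly increasing. Each bucket is [lower, upper),
    // except the last one, which also includes its upper bound.
    val splits = Array(1.0, 6.0, 11.0, 16.0)

    countPerBucket(df, "value", splits).show()
    spark.stop()
  }
}
```

Two things to watch for: a value outside the split range makes the transform fail, so use Double.NegativeInfinity and Double.PositiveInfinity as the outer splits if the data is unbounded; and buckets with no rows simply do not appear in the grouped result.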
