Your code is almost correct – with two issues:
- The return type of your function is `DataFrame`, but the last line is `aggregated.show()`, which returns `Unit`. Remove the call to `show` to return `aggregated` itself, or just return the result of `agg` immediately.
- `DataFrame.groupBy` expects arguments as follows: `col1: String, cols: String*`, so you need to pass matching arguments: the first column, and then the rest of the columns as a list of arguments. You can do that as follows: `df.groupBy(cols.head, cols.tail: _*)`
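To see the `: _*` expansion on its own, here is a minimal sketch in plain Scala (no Spark needed). `describeGroupBy` and `VarargsDemo` are hypothetical names made up for illustration; the function only mirrors the `(col1: String, cols: String*)` parameter shape and is not part of Spark.

```scala
object VarargsDemo {
  // Hypothetical stand-in mirroring the parameter shape (col1: String, cols: String*).
  def describeGroupBy(col1: String, cols: String*): String =
    (col1 +: cols).mkString("groupBy(", ", ", ")")

  def main(args: Array[String]): Unit = {
    val cols = List("country", "city", "year")

    // describeGroupBy(cols) would not compile: a List[String] is not a String.
    // `: _*` expands the tail into individual varargs arguments.
    println(describeGroupBy(cols.head, cols.tail: _*)) // prints "groupBy(country, city, year)"

    // A single-element list works too: its tail is empty, so no varargs are passed.
    println(describeGroupBy("country", Nil: _*)) // prints "groupBy(country)"
  }
}
```

Note that `cols.head` throws a `NoSuchElementException` on an empty list, so this pattern assumes at least one column name is supplied.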
Altogether, your function would be:
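import org.apache.spark.sql.DataFrame

def groupAndAggregate(df: DataFrame, aggregateFun: Map[String, String], cols: List[String]): DataFrame = {
  // Group by the first column name, then the remaining ones expanded as varargs
  val grouped = df.groupBy(cols.head, cols.tail: _*)
  val aggregated = grouped.agg(aggregateFun)
  aggregated
}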
Or, a similar shorter version:
def groupAndAggregate(df: DataFrame, aggregateFun: Map[String, String], cols: List[String]): DataFrame = {
  df.groupBy(cols.head, cols.tail: _*).agg(aggregateFun)
}
If you do want to call `show` within your function:
def groupAndAggregate(df: DataFrame, aggregateFun: Map[String, String], cols: List[String]): DataFrame = {
  val grouped = df.groupBy(cols.head, cols.tail: _*)
  val aggregated = grouped.agg(aggregateFun)
  aggregated.show()
  aggregated
}
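For completeness, a usage sketch that calls the function on a small DataFrame. It assumes Spark SQL is on the classpath and runs a local session; the object name, the sample data, and the column names are made up for illustration. The `case Nil` branch is an addition that is not in the code above: it guards against an empty `cols` list, where `cols.head` would otherwise throw.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object GroupAndAggregateDemo {
  // Same logic as above, plus a guard for an empty column list (an assumption, not in the original).
  def groupAndAggregate(df: DataFrame, aggregateFun: Map[String, String], cols: List[String]): DataFrame =
    cols match {
      case head :: tail => df.groupBy(head, tail: _*).agg(aggregateFun)
      case Nil          => df.agg(aggregateFun) // no grouping columns: aggregate over the whole DataFrame
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("group-and-aggregate-demo")
      .getOrCreate()
    import spark.implicits._

    // Illustrative sample data
    val sales = Seq(
      ("US", "NYC", 10),
      ("US", "NYC", 5),
      ("FR", "Paris", 7)
    ).toDF("country", "city", "amount")

    // Map of column name -> aggregate function name, as accepted by agg
    val result = groupAndAggregate(sales, Map("amount" -> "sum"), List("country", "city"))
    result.show()

    spark.stop()
  }
}
```

Keeping `show()` at the call site rather than inside the function leaves the function free of side effects, so the caller decides whether the result is printed, written out, or transformed further.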