combine text from multiple rows in pyspark
One option is to use pyspark.sql.functions.collect_list() as the aggregate function.

from pyspark.sql.functions import collect_list

grouped_df = spark_df.groupby('category').agg(collect_list('name').alias("name"))

This will collect the values for name into a list and the resultant output will look like:

grouped_df.show()
#+--------+--------+
#|category|name    |
#+--------+--------+
#|A       |[A1, A2]|
#|B       |[B1, B2]|
#+--------+--------+

Update 2019-06-10: If you wanted your …