Array Intersection in Spark SQL

Since Spark 2.4, the array_intersect function can be used directly in SQL:

spark.sql(
  "SELECT array_intersect(array(1, 42), array(42, 3)) AS intersection"
).show()
+------------+
|intersection|
+------------+
|        [42]|
+------------+

and in the Dataset API:

import org.apache.spark.sql.functions.array_intersect
import spark.implicits._  // provides toDF and $"..."; already in scope in spark-shell

Seq((Seq(1, 42), Seq(42, 3)))
  .toDF("a", "b")
  .select(array_intersect($"a", $"b") as "intersection")
  .show()
+------------+
|intersection|
+------------+
|        [42]|
+------------+
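
For a case with repeated values, here is a minimal sketch that applies the same function to two array columns. The sample data, the column names and the local SparkSession settings are assumptions made for illustration; in spark-shell the `spark` value already exists and the builder line can be skipped.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.array_intersect

// Local session for illustration only (assumed setup, not part of the original answer).
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("array-intersect-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data: the first row has repeated values,
// the second row has no elements in common.
val df = Seq(
  (Seq(1, 2, 2, 3), Seq(2, 3, 3, 4)),
  (Seq(5, 6), Seq(7, 8))
).toDF("a", "b")

// One intersection per row, computed from columns a and b.
val result = df.select(array_intersect($"a", $"b") as "intersection")
result.show()
```

The function is documented as returning the intersection without duplicates, so the first row should come out as [2, 3] and the second as an empty array.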

Equivalent functions are also present in the other languages: pyspark.sql.functions.array_intersect in PySpark and SparkR::array_intersect in SparkR.
