You can use the “left anti” join type, either with the DataFrame API or with SQL (the DataFrame API supports everything SQL does, including any join condition you need):
DataFrame API:
df.as("table1").join(
  df2.as("table2"),
  $"table1.name" === $"table2.name" && $"table1.age" === $"table2.howold",
  "leftanti"
)
SQL:
sqlContext.sql(
  """SELECT table1.* FROM table1
    | LEFT ANTI JOIN table2
    | ON table1.name = table2.name AND table1.age = table2.howold
  """.stripMargin)
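For the SQL version to work, both DataFrames must first be registered as temporary views under the names used in the query (the view names "table1"/"table2" here just match the query above):

// Spark 2.0+; on older versions use df.registerTempTable("table1") instead
df.createOrReplaceTempView("table1")
df2.createOrReplaceTempView("table2")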
NOTE: there's also a shorter, more concise way of creating the sample data without specifying the schema separately, using tuples and the implicit toDF method, and then "fixing" the automatically-inferred schema where needed:
import spark.implicits._

val df = List(
  ("mike", 26, true),
  ("susan", 26, false),
  ("john", 33, true)
).toDF("name", "age", "isBoy")
import java.text.SimpleDateFormat
import org.apache.spark.sql.types.DateType

val format = new SimpleDateFormat("yyyy-MM-dd")

val df2 = List(
  ("mike", "grade1", 45, "baseball", new java.sql.Date(format.parse("1957-12-10").getTime)),
  ("john", "grade2", 33, "soccer", new java.sql.Date(format.parse("1978-06-07").getTime)),
  ("john", "grade2", 32, "golf", new java.sql.Date(format.parse("1978-06-07").getTime)),
  ("mike", "grade2", 26, "basketball", new java.sql.Date(format.parse("1978-06-07").getTime)),
  ("lena", "grade2", 23, "baseball", new java.sql.Date(format.parse("1978-06-07").getTime))
).toDF("name", "grade", "howold", "hobby", "birthday")
  .withColumn("birthday", $"birthday".cast(DateType))
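To make the semantics concrete: a left anti join keeps exactly the rows of the left side that have no match on the right under the join condition. Here's a plain-Scala sketch of that logic over the same sample data (no Spark needed; the value names are made up for illustration):

// (name, age) pairs from df, and the (name, howold) keys from df2
val left = List(("mike", 26), ("susan", 26), ("john", 33))
val rightKeys = Set(("mike", 45), ("john", 33), ("john", 32), ("mike", 26), ("lena", 23))

// left anti join on name/age == name/howold: keep only unmatched left rows
val antiJoined = left.filterNot(rightKeys.contains)
// antiJoined == List(("susan", 26))

So with this data, only the "susan" row survives: "mike"/26 and "john"/33 both have a match in df2 and are dropped.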