Pyspark filter dataframe by columns of another dataframe

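A left anti join is what you're looking for. It returns only the rows of df1 that have no match in df2 on the join columns, and it keeps only df1's columns: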

df1.join(df2, ["userid", "group"], "leftanti")

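The same result can also be obtained with a left outer join. Keep only the rows where a column that comes from df2 (here pick) is null, then drop that column. This assumes pick is never null in df2 itself: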

(df1
    .join(df2, ["userid", "group"], "leftouter")
    # rows without a match in df2 have nulls in df2's columns
    .where(df2["pick"].isNull())
    # remove the column that came from df2
    .drop(df2["pick"]))
