Everything passed to a UDF is interpreted as a column or a column name. To pass a literal, you have two options:
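- Pass the argument using currying:

    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import BooleanType

    def comparatorUDF(n):
        # n is captured by the closure; the returned UDF takes one column
        return udf(lambda c: c == n, BooleanType())

    df.where(comparatorUDF("Bonsanto")(col("name")))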
This can be used with an argument of any type as long as it is serializable.
- Use a SQL literal with the current implementation, which receives both arguments as columns:

    from pyspark.sql.functions import col, lit

    df.where(comparatorUDF(col("name"), lit("Bonsanto")))
This works only with supported types (strings, numerics, booleans). For non-atomic types, see "How to add a constant column in a Spark DataFrame?".
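The currying option is an ordinary Python closure, so it can also carry a value that is not a simple literal, such as a collection. Below is a minimal sketch in plain Python, with Spark deliberately left out so it runs anywhere. The helper `make_comparator` and the sample names are hypothetical and only illustrate the pattern.

```python
def make_comparator(allowed):
    """Return a one-argument predicate with `allowed` baked in (hypothetical helper)."""
    allowed = frozenset(allowed)  # captured by the closure below

    def is_allowed(value):
        # Only the column value is passed at call time; `allowed` comes from the closure.
        return value in allowed

    return is_allowed


# In PySpark the returned function would be wrapped with udf(..., BooleanType())
# and applied to col("name"); here it is called directly on sample values.
is_known = make_comparator(["Bonsanto", "Alice"])
results = [is_known(name) for name in ["Bonsanto", "Bob", None]]
print(results)
```

The design point is that the extra argument never goes through Spark's column machinery, so it does not need a SQL literal representation; it only needs to be serializable along with the function.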