How to pass a constant value to Python UDF?

Everything passed to a UDF is interpreted as a column or a column name. To pass a literal, you have two options:

  1. Pass argument using currying:

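    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import BooleanType

    def comparatorUDF(n):
        # n is captured in the closure; the returned UDF takes only the column
        return udf(lambda c: c == n, BooleanType())

    df.where(comparatorUDF("Bonsanto")(col("name")))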
    

    This can be used with an argument of any type as long as it is serializable.

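  2. Use a SQL literal with your current implementation, that is, a UDF that takes the value as a second column argument rather than the curried version above:

    from pyspark.sql.functions import col, lit
    
    df.where(comparatorUDF(col("name"), lit("Bonsanto")))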
    

    This works only with supported literal types (strings, numerics, booleans). For non-atomic types, see "How to add a constant column in a Spark DataFrame?"
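To see why the currying in option 1 works, it may help to look at the same pattern without Spark. The following is a minimal plain-Python sketch, not the PySpark code itself: `make_comparator` is a hypothetical name, and the list of names stands in for a column. The constant is fixed when the factory is called, so the returned function needs only the per-row value.

```python
def make_comparator(n):
    # n is bound once here and captured by the closure below
    return lambda c: c == n

# Fix the constant first, then apply the resulting one-argument function per value
is_bonsanto = make_comparator("Bonsanto")

names = ["Bonsanto", "Alice"]  # stand-in for the values of a "name" column
matches = [is_bonsanto(name) for name in names]
print(matches)
```

Wrapping the returned function in `udf(..., BooleanType())`, as `comparatorUDF` does, gives Spark a one-argument UDF with the constant already baked in, which is why the constant never has to be passed as a column.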
