Pyspark: Replacing value in a column by searching a dictionary

You can use either na.replace:

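deviceDict = {'Tablet': 'Mobile', 'Phone': 'Mobile', 'PC': 'Desktop'}

df = spark.createDataFrame([
    ('Tablet', ), ('Phone', ), ('PC', ), ('Other', ), (None, )
], ["device_type"])

df.na.replace(deviceDict, subset=["device_type"]).show()
+-----------+
|device_type|
+-----------+
|     Mobile|
|     Mobile|
|    Desktop|
|      Other|
|       null|
+-----------+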
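To make the row-by-row behavior concrete, here is a plain-Python model of what na.replace with a dict does to each value. It is not Spark code, and model_replace is a hypothetical helper for illustration: matched values are swapped, while unmatched values and nulls pass through unchanged.

```python
# Plain-Python model of DataFrame.na.replace with a dict (illustration only, not Spark code).
device_dict = {'Tablet': 'Mobile', 'Phone': 'Mobile', 'PC': 'Desktop'}
values = ['Tablet', 'Phone', 'PC', 'Other', None]

def model_replace(value, mapping):
    """Hypothetical helper: return the mapped value if `value` is a key, else `value` itself."""
    return mapping.get(value, value)

replaced = [model_replace(v, device_dict) for v in values]
print(replaced)  # ['Mobile', 'Mobile', 'Desktop', 'Other', None]
```

This matches the table above: 'Other' and null are left untouched.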

or a map literal:

from itertools import chain
from pyspark.sql.functions import create_map, lit

mapping = create_map([lit(x) for x in chain(*deviceDict.items())])

df.select(mapping[df['device_type']].alias('device_type')).show()
+-----------+
|device_type|
+-----------+
|     Mobile|
|     Mobile|
|    Desktop|
|       null|
|       null|
+-----------+
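The chain(*deviceDict.items()) expression turns the dictionary into the alternating key, value sequence that create_map takes as arguments. A plain-Python sketch of that flattening, plus a model of the lookup (dict.get standing in for the Spark map access; this is an illustration, not Spark code):

```python
from itertools import chain

device_dict = {'Tablet': 'Mobile', 'Phone': 'Mobile', 'PC': 'Desktop'}

# create_map expects alternating key, value arguments; chain(*items()) flattens
# [('Tablet', 'Mobile'), ('Phone', 'Mobile'), ...] into one flat list.
flat = list(chain(*device_dict.items()))
print(flat)  # ['Tablet', 'Mobile', 'Phone', 'Mobile', 'PC', 'Desktop']

# Model of the map lookup: keys that are absent (including null) give None.
values = ['Tablet', 'Phone', 'PC', 'Other', None]
looked_up = [device_dict.get(v) for v in values]
print(looked_up)  # ['Mobile', 'Mobile', 'Desktop', None, None]
```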

Note that the latter solution converts values not present in the mapping to NULL. If this is not the desired behavior, you can add coalesce:

from pyspark.sql.functions import coalesce

df.select(
    coalesce(mapping[df['device_type']], df['device_type']).alias('device_type')
).show()
+-----------+
|device_type|
+-----------+
|     Mobile|
|     Mobile|
|    Desktop|
|      Other|
|       null|
+-----------+
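The coalesce fallback can be modeled in plain Python as "take the mapped value if it is not null, otherwise keep the original". The coalesce function below is a hypothetical stand-in written for illustration, not the Spark function itself:

```python
device_dict = {'Tablet': 'Mobile', 'Phone': 'Mobile', 'PC': 'Desktop'}
values = ['Tablet', 'Phone', 'PC', 'Other', None]

def coalesce(*candidates):
    """Hypothetical stand-in for Spark's coalesce: first non-None argument, else None."""
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return None

# Map lookup first, original value as the fallback.
result = [coalesce(device_dict.get(v), v) for v in values]
print(result)  # ['Mobile', 'Mobile', 'Desktop', 'Other', None]
```

For a string-to-string mapping like this one, the result has the same rows as the na.replace version.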
