PySpark: mapping multiple columns
From my understanding, you can build a lookup map from the columns of reference_df (assuming it is a small dataframe):

    map_key   = concat_ws('\0', PrimaryLookupAttributeName, PrimaryLookupAttributeValue)
    map_value = OutputItemNameByValue

and then use that map to look up the corresponding values in df1:

    from itertools import chain
    from pyspark.sql.functions import collect_set, array, concat_ws, lit, col, create_map
    …