Tuning parameters for implicit pyspark.ml ALS matrix factorization model through pyspark.ml CrossValidator

Ignoring technical issues, strictly speaking neither method is correct given the input generated by ALS with implicit feedback.

  • you cannot use RegressionEvaluator because, as you already know, the prediction can be interpreted as a confidence value and is represented as a floating-point number roughly in the range [0, 1], while the label column is just an unbounded integer. These values are clearly not comparable.
  • you cannot use BinaryClassificationEvaluator because, even if the prediction can be interpreted as a probability, the label doesn't represent a binary decision. Moreover, the prediction column has an invalid type and cannot be used directly with BinaryClassificationEvaluator.
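
To see why the first point matters, here is a minimal plain-Python sketch (no Spark needed). The counts and predictions are made-up illustration values, and `rmse` is a helper written for this example, not a Spark API. It compares a prediction list that ranks items in the same order as the counts with one that ranks them in reverse.

```python
import math

def rmse(predictions, labels):
    """Root-mean-square error between two equal-length sequences."""
    assert len(predictions) == len(labels)
    squared = [(p - y) ** 2 for p, y in zip(predictions, labels)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical interaction counts (the unbounded "rating" column).
counts = [1, 3, 12, 40]

# Hypothetical confidence-like predictions, all within [0, 1].
good_order = [0.10, 0.30, 0.70, 0.95]  # same ranking as the counts
bad_order = [0.95, 0.70, 0.30, 0.10]   # reversed ranking

rmse_good = rmse(good_order, counts)
rmse_bad = rmse(bad_order, counts)

# Both errors are driven by the magnitude of the counts, not by ranking quality.
print(round(rmse_good, 2), round(rmse_bad, 2))
```

With these numbers both errors come out slightly above 20 and differ by less than 1, so RMSE against raw counts says almost nothing about which model ranks items better.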

You can try to convert one of the columns so that the input fits the requirements, but this is not really a justified approach from a theoretical perspective, and it introduces additional parameters which are hard to tune. For example, you could:

  • map the label column to the [0, 1] range and use RMSE.

  • convert the label column to a binary indicator with a fixed threshold, and extend ALS / ALSModel to return the expected column type. Assuming the threshold value is 1, it could look something like this:

    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.recommendation import ALS, ALSModel
    from pyspark.sql.functions import udf, col
    # Spark 2.0+ evaluators expect pyspark.ml.linalg vectors;
    # on Spark 1.x import these from pyspark.mllib.linalg instead.
    from pyspark.ml.linalg import DenseVector, VectorUDT
    
    class BinaryALS(ALS):
        def fit(self, df):
            # This wrapper only makes sense for implicit feedback
            assert self.getImplicitPrefs()
            model = super(BinaryALS, self).fit(df)
            # Re-wrap the fitted JVM model in the subclass defined below
            return ALSBinaryModel(model._java_obj)
    
    class ALSBinaryModel(ALSModel):
        def transform(self, df):
            transformed = super(ALSBinaryModel, self).transform(df)
            # Treat prediction x as P(label = 1): rawPrediction = [1 - x, x]
            as_vector = udf(lambda x: DenseVector([1 - x, x]), VectorUDT())
            return transformed.withColumn(
                "rawPrediction", as_vector(col("prediction")))
    
    # Add binary label column
    # (for integer counts, rating > 0 is the same as a threshold of 1)
    with_binary = dfCounts.withColumn(
        "label_binary", (col("rating") > 0).cast("double"))
    
    als_binary_model = BinaryALS(implicitPrefs=True).fit(with_binary)
    
    evaluatorB = BinaryClassificationEvaluator(
        metricName="areaUnderROC", labelCol="label_binary")
    
    evaluatorB.evaluate(als_binary_model.transform(with_binary))
    ## 1.0
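
The first option (mapping the label to [0, 1] and using RMSE) can be sketched in the same spirit. The snippet below is plain Python rather than Spark so that it runs on its own. The clip-and-divide rule, the `cap` parameter, the helper names and the sample values are all assumptions made for illustration; nothing above prescribes a particular mapping.

```python
import math

def cap_and_scale(count, cap):
    """Map a count to [0, 1] by clipping it to [0, cap] and dividing by cap."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    return min(max(count, 0), cap) / cap

def rmse(predictions, labels):
    """Root-mean-square error between two equal-length sequences."""
    assert len(predictions) == len(labels)
    squared = [(p - y) ** 2 for p, y in zip(predictions, labels)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical interaction counts and model predictions for five (user, item) pairs.
counts = [0, 1, 3, 12, 40]
predictions = [0.05, 0.20, 0.45, 0.90, 0.98]

# Score the same predictions under several (arbitrary) caps.
scores = {}
for cap in (5, 10, 50):
    scaled = [cap_and_scale(c, cap) for c in counts]
    scores[cap] = rmse(predictions, scaled)
    print(cap, [round(s, 2) for s in scaled], round(scores[cap], 3))
```

The predictions never change, yet the RMSE moves with `cap`. This is exactly the kind of extra parameter that is hard to tune, because picking it also picks what "a good score" means.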
    

Generally speaking, material about evaluating recommender systems with implicit feedback is largely missing from textbooks. I suggest you read eliasah's answer about evaluating this kind of recommender.
