How does MapReduce work in Apache Spark and Scala?

Assume, for example, that these are coordinates.

Are (x,y) and (y,x) the same coordinates? Certainly not!

Therefore, MapReduce must not assume by default that the order of the elements in a tuple is irrelevant. (That does not mean it can't be done, only that the system must not treat it as the default behavior.)

If you want order-insensitive behavior, emit each tuple in a canonical order, so that (x,y) and (y,x) end up as the same key:

# Emit each pair with the smaller element first, so both orderings yield the same tuple.
if x < y:
    pairs.append((x, y))
else:
    pairs.append((y, x))
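
To make the difference concrete, here is a minimal, self-contained sketch in plain Python (no Spark required) that counts pairs in a map/shuffle/reduce style, once with the tuples left as they are and once with them normalized as above. The names `canonical`, `count_pairs`, and `sample` are hypothetical and only serve the illustration; this is not how Spark implements grouping internally.

```python
from collections import defaultdict
from functools import reduce


def canonical(pair):
    """Return the pair with its smaller element first, so (x, y) and (y, x) match."""
    x, y = pair
    return (x, y) if x < y else (y, x)


def count_pairs(pairs, normalize=False):
    """Count occurrences of each pair using explicit map, shuffle, and reduce steps."""
    # Map: emit (key, 1) for every pair, optionally normalizing the key first.
    mapped = [((canonical(p) if normalize else p), 1) for p in pairs]

    # Shuffle: group all values that share the same key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Reduce: sum the values of each group.
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


# Hypothetical sample data: (1, 2) appears twice, (2, 1) once.
sample = [(1, 2), (2, 1), (3, 4), (1, 2)]

ordered_counts = count_pairs(sample)                    # (1, 2) and (2, 1) stay separate keys
unordered_counts = count_pairs(sample, normalize=True)  # both orderings share the key (1, 2)
```

With the default, `(1, 2)` and `(2, 1)` are counted separately; with `normalize=True` they are merged into one key. In PySpark the same idea would presumably be a `map` that normalizes each tuple followed by `reduceByKey`, but that is an assumption about your job, not something the original question shows.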
