How does MapReduce work in Apache Spark and Scala?

Assume, for example, that these tuples are coordinates.

Are (x,y) and (y,x) the same coordinates? Certainly not!

Therefore, MapReduce must not assume by default that the order of elements in a tuple is irrelevant. (That does not mean order-insensitive grouping can't be done, only that the system must not treat it as the default behavior.)

If you want order-insensitive behavior, simply emit each tuple in a canonical order:

# Put the smaller element first so that (x, y) and (y, x)
# both produce the same tuple.
if x < y:
    pairs.append((x, y))
else:
    pairs.append((y, x))
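
To see the effect end to end, here is a minimal sketch in plain Python (no Spark required) that normalizes each pair and then counts occurrences per key. The helper names `normalize` and `count_unordered` and the sample data are made up for illustration; they are not part of any Spark API.

```python
from collections import Counter

def normalize(pair):
    """Return the pair with its smaller element first (hypothetical helper)."""
    x, y = pair
    return (x, y) if x < y else (y, x)

def count_unordered(pairs):
    """Map step: normalize each pair. Reduce step: count occurrences per key."""
    return Counter(normalize(p) for p in pairs)

# Illustrative data: ("a", "b") and ("b", "a") should be treated as one key.
pairs = [("a", "b"), ("b", "a"), ("a", "c")]
counts = count_unordered(pairs)
```

In PySpark, the same idea would presumably look like `rdd.map(lambda p: (normalize(p), 1)).reduceByKey(lambda a, b: a + b)`, assuming `rdd` holds the pairs. The normalization happens in the map step, so the framework itself never has to guess whether order matters.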
