Skewed dataset join in Spark?

Pretty good article on how it can be done: https://datarus.wordpress.com/2015/05/04/fighting-the-skew-in-spark/

Short version:

- Add a random element to the large RDD and create a new join key with it
- Add a random element to the small RDD, using explode/flatMap to increase the number of entries, and create a new join key
- Join the RDDs on the new join key, which will now be distributed better …
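The three steps above can be sketched without a cluster. This is a minimal plain-Python model of the salting idea, not Spark code and not the linked article's implementation: the names `NUM_SALTS`, `salt_large`, `explode_small` and `join` are hypothetical, and the lists stand in for RDDs of (key, value) pairs. In Spark the same shape would typically be a `map` on the large RDD, a `flatMap` on the small one, and a `join` on the composite key.

```python
import random
from collections import Counter, defaultdict

NUM_SALTS = 4  # hypothetical replication factor; tune it to the observed skew


def salt_large(records, num_salts, rng):
    """Step 1: attach a random salt to each (key, value) of the large, skewed side."""
    return [((key, rng.randrange(num_salts)), value) for key, value in records]


def explode_small(records, num_salts):
    """Step 2: replicate each (key, value) of the small side once per possible salt."""
    return [((key, salt), value) for key, value in records for salt in range(num_salts)]


def join(left, right):
    """Step 3: plain inner join on the composite (key, salt) key."""
    index = defaultdict(list)
    for k, v in right:
        index[k].append(v)
    return [(k, (lv, rv)) for k, lv in left for rv in index.get(k, [])]


rng = random.Random(0)
# "hot" is the skewed key: 1000 rows versus 10 rows for "cold".
large = [("hot", i) for i in range(1000)] + [("cold", i) for i in range(10)]
small = [("hot", "H"), ("cold", "C")]

salted = join(salt_large(large, NUM_SALTS, rng), explode_small(small, NUM_SALTS))

# Drop the salt to recover the rows of the original, unsalted join.
result = [(key, pair) for (key, _salt), pair in salted]

# Count rows per salted "hot" key: the hot key is now spread over several join keys.
hot_spread = Counter(k for k, _ in salted if k[0] == "hot")
```

The trade-off is that the small side grows by a factor of `NUM_SALTS`, so this only pays off when that side is small enough for the replication to be cheap.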

How to do a join query in MongoDB?

To get everything with just one query using the $lookup feature of the aggregation framework, try this:

    db.User.aggregate([
        // First step is to extract the "friends" field to work with the values
        { $unwind: "$friends" },
        // Lookup all the linked friends from the User collection
        { $lookup: { from: "User", localField: "friends", …
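To make the two stages concrete, here is a rough plain-Python model of what `$unwind` followed by `$lookup` does to the documents. It is a sketch under assumptions, not the truncated pipeline above: the excerpt cuts off before the remaining `$lookup` options, so matching on `_id` and the output field name `friendsData` are guesses made for illustration, and `unwind` and `lookup` are hypothetical helper names, not MongoDB driver functions.

```python
def unwind(docs, field):
    """Like $unwind: emit one copy of each document per element of doc[field].

    Documents whose array is empty or missing produce no output.
    """
    out = []
    for doc in docs:
        for item in doc.get(field, []):
            out.append({**doc, field: item})
    return out


def lookup(docs, from_docs, local_field, foreign_field, as_field):
    """Like $lookup: attach the list of from_docs whose foreign_field equals local_field."""
    return [
        {**doc, as_field: [f for f in from_docs if f.get(foreign_field) == doc.get(local_field)]}
        for doc in docs
    ]


# Sample data, invented for this example: "friends" holds the _id values of other users.
users = [
    {"_id": 1, "name": "Ann", "friends": [2, 3]},
    {"_id": 2, "name": "Bob", "friends": [1]},
    {"_id": 3, "name": "Cy", "friends": []},
]

# Assumption: friends are matched against _id and stored under "friendsData".
joined = lookup(unwind(users, "friends"), users, "friends", "_id", "friendsData")

# One (user, friend) pair per unwound document.
pairs = [(d["name"], d["friendsData"][0]["name"]) for d in joined]
```

Each user appears once per friend after the unwind step, which is why a later grouping stage is usually needed to fold the results back into one document per user.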