How to get element by Index in Spark RDD (Java)

This is possible by first indexing the RDD. The zipWithIndex transformation provides stable indexing, numbering each element in its original order with a Long index starting at 0.

Given: rdd = (a,b,c)

val withIndex = rdd.zipWithIndex // ((a,0),(b,1),(c,2))

To look up an element by index, this form is not useful, because lookup searches by key and here the index is the value. First we need to make the index the key:

val indexKey = withIndex.map{case (k,v) => (v,k)}  //((0,a),(1,b),(2,c))

Now it’s possible to use the lookup action, which is available on pair RDDs, to find an element by key:

val b = indexKey.lookup(1L) // Seq(b): lookup returns a Seq of all values for the key
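
Put together, the steps above might look like the following minimal sketch. The object name, app name and the local[2] master are placeholders chosen for illustration, and it assumes Spark is on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LookupByIndex {
  def main(args: Array[String]): Unit = {
    // App name and local master are illustrative placeholders.
    val conf = new SparkConf().setAppName("lookup-by-index").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq("a", "b", "c"))

      // Pair each element with its position: ((a,0),(b,1),(c,2)). Indices are Long.
      val withIndex = rdd.zipWithIndex()

      // Swap each pair so the index becomes the key: ((0,a),(1,b),(2,c)).
      val indexKey = withIndex.map { case (value, index) => (index, value) }

      // lookup returns all values stored under the key, as a Seq.
      val found: Seq[String] = indexKey.lookup(1L)

      // headOption avoids an exception when the index is out of range.
      println(found.headOption) // Some(b)
    } finally {
      sc.stop()
    }
  }
}
```

Since lookup returns a Seq, an index that doesn't exist gives an empty result rather than an error, which is why the sketch reads the value with headOption.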

If you’re expecting to use lookup often on the same RDD, I’d recommend caching the indexKey RDD to improve performance, so the indexing isn’t recomputed on every lookup.
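
As a rough sketch of that advice, the helper below builds the index-keyed RDD once and caches it. The names IndexedLookup and indexed, and the numPartitions parameter, are hypothetical. The partitionBy step is an optional extra that goes beyond the advice above: when an RDD has a known partitioner, lookup only searches the partition the key maps to.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object IndexedLookup {
  // Hypothetical helper: returns a cached RDD keyed by element index.
  def indexed[T: ClassTag](rdd: RDD[T], numPartitions: Int): RDD[(Long, T)] =
    rdd.zipWithIndex()                                 // (element, index)
      .map(_.swap)                                     // (index, element)
      .partitionBy(new HashPartitioner(numPartitions)) // optional: lets lookup scan one partition
      .cache()                                         // keep the result for repeated lookups
}
```

Usage would be along the lines of val byIndex = IndexedLookup.indexed(rdd, 4) followed by as many byIndex.lookup(i) calls as needed. Note that cache is lazy, so the first action still pays the full cost of building the index.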

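How to do this using the Java API is an exercise left for the reader. The same three steps exist there: zipWithIndex on JavaRDD, mapToPair to swap each tuple, and lookup on JavaPairRDD.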
