Spark ML VectorAssembler returns strange output

There is nothing strange about the output. Your vector has many zero elements, so Spark used its sparse representation.

To explain further:

Your vector is composed of 18 elements (its size, or dimension).

The indices [0,1,6,9,14,17] of the vector hold the non-zero elements, which are, in order, [17.0,15.0,3.0,1.0,4.0,2.0]. Every index not listed is 0.0.
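To make the mapping concrete, here is a minimal pure-Python sketch (not Spark's API) that models the printed `(size, indices, values)` triple and looks up an element by position. The helper `sparse_get` is hypothetical; it assumes the indices are sorted ascending, as they are in Spark's `SparseVector`.

```python
from bisect import bisect_left

# The triple as Spark prints it: (18,[0,1,6,9,14,17],[17.0,15.0,3.0,1.0,4.0,2.0])
SIZE = 18
INDICES = [0, 1, 6, 9, 14, 17]
VALUES = [17.0, 15.0, 3.0, 1.0, 4.0, 2.0]


def sparse_get(size, indices, values, i):
    """Return element i of a sparse vector (hypothetical helper).

    Positions not listed in `indices` are implicitly 0.0.
    """
    if not 0 <= i < size:
        raise IndexError(f"index {i} out of range for size {size}")
    pos = bisect_left(indices, i)  # binary search for i among the stored indices
    if pos < len(indices) and indices[pos] == i:
        return values[pos]
    return 0.0


# Pair each stored index with its value: element 0 is 17.0, element 6 is 3.0, ...
print(dict(zip(INDICES, VALUES)))
print(sparse_get(SIZE, INDICES, VALUES, 6))  # stored -> 3.0
print(sparse_get(SIZE, INDICES, VALUES, 7))  # not stored -> 0.0
```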

A sparse vector representation saves space by storing only the non-zero entries, which makes the vector smaller and lets operations skip the zeros. More on sparse representation here.
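As a rough illustration of the saving, the sketch below estimates the payload of each layout. It assumes 8-byte doubles for values and 4-byte ints for indices (the element types Spark's vectors use) and ignores JVM object overhead; the function names are hypothetical.

```python
DOUBLE_BYTES = 8  # one float64 value
INT_BYTES = 4     # one int32 index


def dense_bytes(size):
    # A dense vector stores every element, zeros included.
    return size * DOUBLE_BYTES


def sparse_bytes(nnz):
    # A sparse vector stores one index and one value per non-zero element.
    return nnz * (INT_BYTES + DOUBLE_BYTES)


size, nnz = 18, 6  # the vector from the question
print(dense_bytes(size), sparse_bytes(nnz))  # 144 72

# The gap widens as vectors get longer and sparser, e.g. one-hot encoded features:
print(dense_bytes(10_000), sparse_bytes(6))  # 80000 72
```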

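You can, of course, convert that sparse representation to a dense one, but it comes at a cost: every zero is then stored explicitly, so memory grows with the vector's size rather than with its number of non-zero elements.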
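Densifying simply materializes every zero. Here is a minimal pure-Python sketch of the conversion; the helper `to_dense` is hypothetical and only models the idea, it is not Spark code.

```python
def to_dense(size, indices, values):
    """Expand a (size, indices, values) sparse triple into a full list."""
    dense = [0.0] * size      # allocate every position up front
    for i, v in zip(indices, values):
        dense[i] = v          # fill in the stored non-zeros
    return dense


dense = to_dense(18, [0, 1, 6, 9, 14, 17], [17.0, 15.0, 3.0, 1.0, 4.0, 2.0])
print(dense)
print(len(dense), sum(1 for x in dense if x != 0.0))  # 18 6
```

In PySpark itself, `SparseVector.toArray()` from `pyspark.ml.linalg` returns the dense values as a NumPy array, and in Spark 3.0 and later `pyspark.ml.functions.vector_to_array` does the same for a whole DataFrame column.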

If you are interested in getting feature importances, I advise you to take a look at this.
