Create single row dataframe from list of list PySpark

I find it’s useful to think of the argument to createDataFrame() as a list of tuples where each entry in the list corresponds to a row in the DataFrame and each element of the tuple corresponds to a column.

You can get your desired output by making each element in the list a tuple:

data = [([1.1, 1.2],), ([1.3, 1.4],), ([1.5, 1.6],)]
dataframe = sqlCtx.createDataFrame(data, ['features'])
dataframe.show()
#+----------+
#|  features|
#+----------+
#|[1.1, 1.2]|
#|[1.3, 1.4]|
#|[1.5, 1.6]|
#+----------+

Or if changing the source is cumbersome, you can equivalently do:

data = [[1.1, 1.2], [1.3, 1.4], [1.5, 1.6]]
dataframe = sqlCtx.createDataFrame(map(lambda x: (x, ), data), ['features'])
dataframe.show()
#+----------+
#|  features|
#+----------+
#|[1.1, 1.2]|
#|[1.3, 1.4]|
#|[1.5, 1.6]|
#+----------+

Leave a Comment