How to check if spark dataframe is empty?

For Spark 2.1.0, my suggestion would be to use head(n: Int) or take(n: Int) with isEmpty, whichever one has the clearest intent to you.

df.head(1).isEmpty
df.take(1).isEmpty

with Python equivalent:

len(df.head(1)) == 0  # or bool(df.head(1))
len(df.take(1)) == 0  # or bool(df.take(1))

Using df.first() and df.head() will both return the java.util.NoSuchElementException if the DataFrame is empty. first() calls head() directly, which calls head(1).head.

def first(): T = head()
def head(): T = head(1).head

head(1) returns an Array, so taking head on that Array causes the java.util.NoSuchElementException when the DataFrame is empty.

def head(n: Int): Array[T] = withAction("head", limit(n).queryExecution)(collectFromPlan)

So instead of calling head(), use head(1) directly to get the array and then you can use isEmpty.

take(n) is also equivalent to head(n)…

def take(n: Int): Array[T] = head(n)

And limit(1).collect() is equivalent to head(1) (notice limit(n).queryExecution in the head(n: Int) method), so the following are all equivalent, at least from what I can tell, and you won’t have to catch a java.util.NoSuchElementException exception when the DataFrame is empty.

df.head(1).isEmpty
df.take(1).isEmpty
df.limit(1).collect().isEmpty

I know this is an older question so hopefully it will help someone using a newer version of Spark.

More Related Contents:

Leave a Comment Cancel reply