For Spark 2.1.0, my suggestion would be to use head(n: Int) or take(n: Int) with isEmpty, whichever one reads most clearly to you.
df.head(1).isEmpty
df.take(1).isEmpty
with the Python equivalents:
len(df.head(1)) == 0  # or: not df.head(1)
len(df.take(1)) == 0  # or: not df.take(1)
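If you want to give the check a name, it is easy to wrap. A minimal sketch for the Scala side; isEmptyDF is a hypothetical helper name, not part of the Spark 2.1.0 API:
import org.apache.spark.sql.Dataset

// Hypothetical helper: true when the Dataset/DataFrame has no rows.
// Works for DataFrame too, since DataFrame is just Dataset[Row].
def isEmptyDF[T](ds: Dataset[T]): Boolean = ds.head(1).isEmpty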
df.first() and df.head() will both throw a java.util.NoSuchElementException if the DataFrame is empty, because first() calls head() directly, which calls head(1).head.
def first(): T = head()
def head(): T = head(1).head
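To see the difference in behavior, here is a minimal sketch, assuming a SparkSession named spark is already in scope:
import spark.implicits._

val empty = Seq.empty[Int].toDF("value")
empty.head(1).isEmpty  // true, no exception
// empty.first()       // would throw java.util.NoSuchElementException
// empty.head()        // same, since head() is just head(1).head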
head(1) returns an Array, so calling head on that Array throws the java.util.NoSuchElementException when the DataFrame is empty.
def head(n: Int): Array[T] = withAction("head", limit(n).queryExecution)(collectFromPlan)
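That failure mode is plain Scala collection behavior, nothing Spark-specific. For illustration:
Array.empty[Int].head        // throws java.util.NoSuchElementException
Array.empty[Int].headOption  // None, the safe alternative
Array.empty[Int].isEmpty     // true, which is what the suggestion above relies on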
So instead of calling head(), use head(1) directly to get the underlying Array, and then you can use isEmpty.
take(n) is also equivalent to head(n)…
def take(n: Int): Array[T] = head(n)
And limit(1).collect() is equivalent to head(1) (notice limit(n).queryExecution in the head(n: Int) method above), so the following are all equivalent, at least from what I can tell, and you won't have to catch a java.util.NoSuchElementException when the DataFrame is empty.
df.head(1).isEmpty
df.take(1).isEmpty
df.limit(1).collect().isEmpty
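Putting it together, a sketch of the check used in a guard, assuming df is any DataFrame already in scope:
if (df.head(1).isEmpty) {
  println("DataFrame is empty")
} else {
  // Safe to call first() now that we know at least one row exists.
  println(s"First row: ${df.first()}")
}
All three variants only need to retrieve a single row to the driver, so none of them scans the full DataFrame.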
I know this is an older question, so hopefully it will help someone using a newer version of Spark.