Finally I found the reason: YARN kills the executor container because it exceeds its allotted memory (the heap plus the off-heap overhead). The fix is to raise the value of spark.yarn.driver.memoryOverhead, spark.yarn.executor.memoryOverhead, or both.
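A minimal sketch of how these settings can be passed at submit time. The overhead values are in MiB; the 1024 figures, the application class, and the jar name are placeholders to adapt to your job. (Note that in Spark 2.3+ these properties were renamed to spark.driver.memoryOverhead and spark.executor.memoryOverhead.)

```shell
# Raise the YARN memory overhead for both driver and executors (values in MiB).
spark-submit \
  --master yarn \
  --conf spark.yarn.driver.memoryOverhead=1024 \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  --class com.example.MyApp \
  my-app.jar
```

As a rule of thumb, the overhead defaults to max(384 MiB, 10% of the executor memory), so jobs with heavy off-heap usage (e.g. PySpark workers or native libraries) often need it raised explicitly.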