What are the concepts of application, job, stage, and task in Spark?

An application corresponds to your main program: the driver process that runs your main function and creates a SparkContext, plus the executors it acquires on the cluster to do the work.
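A minimal sketch of what an application looks like (the object name and app name here are illustrative): the driver runs this main function, and the SparkContext it creates is what acquires executors on the cluster.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCountApp {
  def main(args: Array[String]): Unit = {
    // One SparkContext per application; the driver plus the executors
    // it launches together make up the "application".
    val conf = new SparkConf().setAppName("WordCountApp")
    val sc = new SparkContext(conf)

    // ... build RDDs and invoke actions here ...

    sc.stop()
  }
}
```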

When you invoke an action on an RDD (for example count or collect), a “job” is created and submitted to the scheduler. Transformations alone are lazy and do not trigger any work; jobs are the units of work submitted to Spark.
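For example (a hedged sketch, assuming a SparkContext named sc already exists), only the actions below produce jobs; the map call on its own does nothing:

```scala
val nums = sc.parallelize(1 to 1000)   // no job yet
val squares = nums.map(x => x * x)     // still no job: map is a lazy transformation
val n = squares.count()                // action -> job #1 submitted
val total = squares.reduce(_ + _)      // action -> job #2 submitted
```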

Each job is divided into “stages” at shuffle boundaries: narrow transformations such as map and filter are pipelined together inside one stage, while wide transformations such as reduceByKey or join require a shuffle and therefore start a new stage.

Each stage is further divided into tasks, one task per partition of the RDD, so tasks are the smallest units of work in Spark and run in parallel on the executors (see the sketch below).
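A sketch putting stages and tasks together (the input path is illustrative; sc is an existing SparkContext): the reduceByKey forces a shuffle, so the single job here has two stages, and the first stage launches one task per input partition.

```scala
// With 4 input partitions, stage 1 runs 4 tasks (one per partition);
// stage 2 runs after the shuffle, one task per shuffle partition.
val lines = sc.textFile("hdfs:///path/to/input", minPartitions = 4)

val counts = lines
  .flatMap(_.split("\\s+"))   // narrow: pipelined into stage 1
  .map(word => (word, 1))     // narrow: still stage 1
  .reduceByKey(_ + _)         // wide: shuffle boundary, starts stage 2

counts.count()                // action: submits the job
```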
