When a Spark application is submitted to a cluster, the cluster allocates the resources requested by the application for the entire duration of the application's lifecycle. These resources cannot be shared with other applications, as they are dedicated to that application.
This paradigm suits batch processing applications: the application is submitted, it churns through a huge amount of data, and when it is done (the main program exits and the driver JVM terminates), the cluster reclaims the resources and makes them available to other applications. A batch application usually does need its resources for most of its lifecycle.

However, what if the application is not a batch job? What if it is a server that serves users on request, or a streaming application that handles a variable load? For such applications, high resource demand occurs only at peak time; during idle time, allocating high capacity that goes unused is a waste of resources. This is where Spark Dynamic Allocation comes to our aid. The main idea is this: the Spark application requests minimal (or even no) resources during idle time, but when there are tasks to be performed, it requests more resources to complete them. When the load subsides, Spark releases those resources back to the cluster. In this way, we can utilize our cluster's resources in an efficient way.
Let’s learn how to utilize the Spark Cluster in an efficient manner.
In order to enable dynamic allocation, the cluster must be configured to have an external shuffle service. This is needed in order to retain shuffle data when an executor is removed. All cluster managers used by Spark support an external shuffle service. Internally, the ExecutorAllocationManager is responsible for the dynamic allocation of executors. Dynamic allocation scales executors up and down according to the following policies:
- Scale Up Policy requests new executors when there are pending tasks and increases the number of executors exponentially, since executors start slowly and the Spark application may need slightly more soon after.
- Scale Down Policy removes executors that have been idle for spark.dynamicAllocation.executorIdleTimeout seconds.
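These two policies are governed by a handful of timing properties. Here is a rough sketch of how they can be tuned; the property names come from Spark's dynamic-allocation configuration, but the values shown are illustrative placeholders, not recommendations:

```scala
import org.apache.spark.SparkConf

// Illustrative timing knobs for the scale-up and scale-down policies.
val timingConf = new SparkConf()
  // Scale up: request executors once tasks have been pending for this long...
  .set("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")
  // ...and keep requesting (exponentially more) while the backlog persists.
  .set("spark.dynamicAllocation.sustainedSchedulerBacklogTimeout", "1s")
  // Scale down: remove an executor that has been idle for this long.
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
  // Idle executors that hold cached data are kept around longer before removal.
  .set("spark.dynamicAllocation.cachedExecutorIdleTimeout", "600s")
```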
Below is the list of properties required to enable dynamic allocation.
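As a minimal sketch, the core properties are the ones below (the executor counts are placeholders; size them for your own cluster). Note that the external shuffle service itself must also be running on every worker node, and how it is set up depends on the cluster manager:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Minimal set of application-side properties to turn dynamic allocation on.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  // Required so shuffle files survive executor removal.
  .set("spark.shuffle.service.enabled", "true")
  // Placeholder bounds on how far Spark may scale down/up.
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "20")

val spark = SparkSession.builder()
  .appName("dynamic-allocation-example")
  .config(conf)
  .getOrCreate()
```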
Make sure that spark.dynamicAllocation.initialExecutors is equal to or greater than spark.dynamicAllocation.minExecutors. If not, you will see the following WARN message in the logs:
spark.dynamicAllocation.initialExecutors less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
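For example, a valid combination (values are placeholders) keeps initialExecutors at or above minExecutors; setting it lower triggers the warning above and the setting is ignored:

```scala
import org.apache.spark.SparkConf

val startupConf = new SparkConf()
  .set("spark.dynamicAllocation.minExecutors", "2")
  // Valid: 4 >= 2. A value below minExecutors would be ignored with a WARN.
  .set("spark.dynamicAllocation.initialExecutors", "4")
```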
Conclusion
For a long-running Spark application with a substantial amount of idle time, it is more efficient to use dynamic allocation and free the cluster's resources for other needs during those idle periods. This still allows the long-running application to use significant resources at peak time. Configuring your applications wisely will provide a good balance between smart allocation and performance.
References:
https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation