Description
Affected Stackable version
No response
Affected Apache Spark-on-Kubernetes version
No response
Current and expected behavior
Disclaimer: we are still on 25.3.0 with Spark but are currently upgrading to 25.11.0. If this has been fixed in the meantime, just let me know, but I did not find a related issue matching my problem.
We found the cause of the situation where two or more instances of the same Spark application run at the same time. You can reproduce it by deleting the spark-submit pod ungracefully (which happens on evictions or node outages), and even by shutting the pod down gracefully. After a couple of seconds a new spark-submit pod boots up, starts a new driver, and that driver starts its executors. The problem: the old driver and its executors are still running. This leads to corrupted data when two Spark applications update the same dataset at the same time.
I expect YARN-like behaviour: when the submit is killed, either a) the whole application should fail, or b) the application should not be restarted and the already running application can continue its work.
The issue should be easily reproducible; a sketch of the steps is below.
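For illustration, a minimal reproduction sketch. The application name spark-pi, the label selector, and the pod name placeholder are assumptions; actual names and labels depend on your SparkApplication and operator version.

```shell
# List the pods of a running SparkApplication
# (label selector is an assumption; adjust to whatever labels your operator version sets)
kubectl get pods -l app.kubernetes.io/instance=spark-pi

# Ungracefully delete the spark-submit pod to simulate an eviction / node outage
# (<submit-pod-name> is a placeholder for the actual pod name)
kubectl delete pod <submit-pod-name> --grace-period=0 --force

# After a few seconds a new spark-submit pod appears and launches a second driver,
# while the old driver and its executors keep running
kubectl get pods -l app.kubernetes.io/instance=spark-pi
```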
Possible solution
No response
Additional context
No response
Environment
No response
Would you like to work on fixing this bug?
None