
BUG: spark application can have multiple drivers at once #646

@maxgruber19

Description


Affected Stackable version

No response

Affected Apache Spark-on-Kubernetes version

No response

Current and expected behavior

Disclaimer: we're still on 25.3.0 with Spark but are currently upgrading to 25.11.0. If this was fixed in between, just let me know, but I didn't find an existing issue matching my problem.

We found the cause of the situation where two or more instances of the same Spark application run at the same time. You can reproduce it by deleting the spark-submit pod ungracefully (which happens in case of evictions or node outages), and it also occurs when the pod is shut down gracefully. After a couple of seconds a new spark-submit pod boots up and starts a new driver, which in turn starts its executors. The problem: the old driver and its executors are still running. This leads to corrupted data when two Spark apps update the same dataset at the same time.

I expect YARN-like behaviour: when the submit is killed, either a) the whole application should fail, or b) the application should not be restarted and the already running application can continue its work.

The issue should be easily reproducible; a rough reproduction sketch follows below.
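
A minimal reproduction sketch, assuming a long-running SparkApplication named `my-spark-app` in namespace `default`; the pod name is an illustrative assumption, not necessarily the exact name the operator generates:

```shell
# 1. Wait until the submit pod, driver and executors are up.
kubectl -n default get pods

# 2. Delete the spark-submit pod ungracefully (simulating an eviction or a
#    node outage); a graceful delete shows the same behaviour.
#    "my-spark-app-submit" is a placeholder for the actual submit pod name.
kubectl -n default delete pod my-spark-app-submit --grace-period=0 --force

# 3. After a few seconds a new submit pod is created, which starts a second
#    driver and a second set of executors while the old ones keep running.
kubectl -n default get pods
```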

Possible solution

No response

Additional context

No response

Environment

No response

Would you like to work on fixing this bug?

None
