Description
When running very large jobs via Spark, we sometimes see the ephemeral volumes of the executors fill up endlessly. This exhausts the node's ephemeral storage, which then has consequences for other pods on that node because they run out of ephemeral storage as well. We had some Trino downtime because of this.
The volume causing it in the executor looks like the following; it seems to be some kind of spill volume.
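For reference (the original snippet is not reproduced here), and assuming the default Spark-on-Kubernetes behaviour when `spark.local.dir` is unset, the executor's scratch space is typically a size-unbounded `emptyDir` volume, which draws from the node's ephemeral storage. A hypothetical excerpt from an executor pod spec:

```yaml
# Hypothetical excerpt; volume name and mount path follow the defaults
# Spark on Kubernetes generates for executor scratch space.
volumes:
  - name: spark-local-dir-1
    emptyDir: {}                # no sizeLimit, so spill can grow until the node disk is full
containers:
  - name: spark-kubernetes-executor
    volumeMounts:
      - name: spark-local-dir-1
        mountPath: /var/data/spark-<uuid>   # shuffle/spill scratch directory
```

Because the `emptyDir` has no `sizeLimit`, heavy spilling is bounded only by the node's disk.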
A second issue arising out of this is a second driver being spawned in a way I can't explain yet: in rare cases, the Spark application suddenly has two running drivers. I'll do my best to provide further details on that.
Question
Is there a way to configure ephemeral storage limits (and possibly other properties) globally for all Spark applications the operator applies? It's acceptable for us to fail a single application, but it's a big problem if any application can kill nodes by starting to spill to the node's filesystem.
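One operator-agnostic way to enforce this globally (a sketch, not a confirmed feature of this operator) is a Kubernetes `LimitRange` in the namespace where the applications run: it applies a default ephemeral-storage limit to every container that does not set one, so a runaway executor gets evicted instead of filling the node. The namespace name and sizes below are placeholders:

```yaml
# Hypothetical LimitRange; adapt namespace and sizes to your cluster.
apiVersion: v1
kind: LimitRange
metadata:
  name: spark-ephemeral-storage
  namespace: spark-jobs          # namespace where the operator schedules pods
spec:
  limits:
    - type: Container
      defaultRequest:
        ephemeral-storage: 1Gi   # request applied when a container sets none
      default:
        ephemeral-storage: 10Gi  # limit applied when a container sets none
```

With this in place, the kubelet evicts a pod whose containers exceed the ephemeral-storage limit, which fails that one application but protects the rest of the node.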