Handle CPU limit exceeded in Python workers #5543
It would be really useful to have this available to Rust as well - could we expose this under a
Seems to me that Rust would also use
CodSpeed Performance Report: Merging #5543 will not alter performance.
The generated output of
How do we make sure that this only terminates the current Python request and not the entire Python runtime?
Every time we enter Python code we call
Could rapid requests cause the CPU interrupt to be cleared before it has a chance to be processed?
As far as I understand, the isolate can only be working on one request at a time, so as long as we clear the interrupt request when we enter a request we should be okay. However, there is a potential problem: we don't clear the interrupt request when entering an already scheduled async task. I will fix that.
Hmm, I don't think it's true that the isolate can only be working on one request at a time. Imagine a request waiting many seconds on some i/o: the isolate should remain responsive to other requests during that time, so it must be possible for it to serve other requests even while some are in progress. I think you should add a test for this case and ensure it works as expected.
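The concurrency property the reviewer wants tested can be illustrated with a small asyncio sketch (the request names and timings here are invented; a real test would run against the isolate itself):

```python
import asyncio

async def slow_request():
    # Simulates a request that spends a long time waiting on i/o.
    await asyncio.sleep(0.2)
    return "slow done"

async def fast_request():
    return "fast done"

async def main():
    # One event loop stands in for one isolate: the fast request must
    # complete while the slow one is still parked on its await.
    slow = asyncio.create_task(slow_request())
    fast = await fast_request()
    assert not slow.done()  # the slow request is still in flight
    return fast, await slow
```

Running `asyncio.run(main())` returns `("fast done", "slow done")`: the fast request finishes while the slow one is still pending, which is the responsiveness the isolate needs to preserve.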
It's a bit nuanced. Only a single request will hold the isolate lock at any given time, so in that sense it's true. While a request is waiting on i/o, it will release the lock so that other requests can use it. But keep in mind that we have the microtask queue, which is always global to the isolate. When the task queue gets drained during one request, it can include tasks from other requests. If those aren't scheduling i/o, then the work from those other requests is progressing with this request. This is why we needed the cross-request promise signaling. That arranges it so that promise continuations happen under the correct IoContext, but there are still things like
So for async requests, failing the promise with the limit-exceeded error is necessary, as opposed to a synchronous failure? Or is it okay for async requests to possibly have the limit exceeded trigger for a request that was not itself CPU-heavy, if other CPU work is being done at the time?
I have it set up to clear the signal request every time we enter Python, regardless of whether it is the same request or a different one. This means that if we exit Python just as the CPU limit triggers, we will always clear the signal. If the request keeps going asynchronously, we may then hit the hard limit instead. But I think the chance that it exits at exactly the right time to trigger this behavior is very low.
If we call `TerminateExecution()`, it will exit Python execution without unwinding the stack or cleaning up the runtime state. This leaves the Python runtime in a permanently messed up state and all further requests will fail.

This adds a new `cpuLimitNearlyExceededCallback` to the limit enforcer and hooks it up so that it triggers a SIGINT inside of Python. This can be used to raise a `CpuLimitExceeded` Python error into the runtime. If this error is ignored, then we'll hit the hard limit and be terminated. If we ever do call `TerminateExecution()` on a Python isolate, we should condemn the isolate, but that is left as a TODO.

To trigger the SIGINT inside of Python, we have to set two addresses:

1. We set `emscripten_signal_clock` to `0` to make the Python eval breaker check for a signal on the next tick.
2. We set `_Py_EMSCRIPTEN_SIGNAL_HANDLING` to `1` to make Python check the signal clock.

We also have to set `Module.Py_EmscriptenSignalBuffer` to a buffer containing the number of the signal we wish to trip (`SIGINT`, aka 2). When we start a request we set `_Py_EMSCRIPTEN_SIGNAL_HANDLING` to `0` to avoid the ongoing cost of calling out to JavaScript to check the buffer when no signal is set, and we put a 2 into `Py_EmscriptenSignalBuffer`.

The most annoying aspect of this is that the symbol `emscripten_signal_clock` is not exported. For Pyodide 0.28.2, I manually located the address of this symbol and hard-coded it. For the next Pyodide, we'll make sure to export it.
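On the Python side, the effect of the tripped SIGINT can be sketched with an ordinary signal handler. The `CpuLimitExceeded` class and handler below are illustrative (the class is named after the PR's error, but the PR wires the real signal through the Emscripten signal buffer rather than through `signal.signal`):

```python
import signal

class CpuLimitExceeded(Exception):
    """Illustrative error for the soft CPU limit."""

def _on_sigint(signum, frame):
    # Raising here unwinds the Python stack cleanly, running finally
    # blocks and context managers on the way out -- unlike v8's
    # TerminateExecution(), which leaves the runtime unusable.
    raise CpuLimitExceeded("soft CPU limit reached")

signal.signal(signal.SIGINT, _on_sigint)

# Guest code can catch the error; if it swallows or ignores it, the
# hard limit eventually terminates the isolate anyway.
try:
    signal.raise_signal(signal.SIGINT)  # stands in for the limit enforcer
except CpuLimitExceeded as exc:
    print(exc)  # prints: soft CPU limit reached
```

The key property this sketch shows is that the soft limit surfaces as a catchable, stack-unwinding exception rather than an abrupt termination.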