Skip to content

Conversation

@hoodmane
Copy link
Contributor

If we call TerminateExecution(), it will exit Python execution without unwinding the stack or cleaning up the runtime state. This leaves the Python runtime in a permanently messed up state and all further requests will fail.

This adds a new cpuLimitNearlyExceededCallback to the limit enforcer and hooks it up so that it triggers a SIGINT inside of Python. This can be used to raise a CpuLimitExceeded Python error into the runtime. If this error is ignored, then we'll hit the hard limit and be terminated. If we ever do call TerminateExecution() on a Python isolate, we should condemn the isolate, but that is left as a TODO.

To trigger the SIGINT inside of Python, we have to set two addresses:

  1. we set emscripten_signal_clock to 0 to make the Python eval breaker check for a signal on the next tick.
  2. we set _Py_EMSCRIPTEN_SIGNAL_HANDLING to 1 to make Python check the signal clock.

We also have to set Module.Py_EmscriptenSignalBuffer to a buffer with the number of the signal we wish to trip in it (SIGINT aka 2).

When we start a request we set _Py_EMSCRIPTEN_SIGNAL_HANDLING to 0 to avoid ongoing costs of calling out to JavaScript to check the buffer when no signal is set, and we put a 2 into Py_EmscriptenSignalBuffer.

The most annoying aspect of this is that the symbol emscripten_signal_clock is not exported. For Pyodide 0.28.2, I manually located the address of this symbol and hard coded it. For the next Pyodide, we'll make sure to export it.

@hoodmane hoodmane requested review from a team as code owners November 18, 2025 00:03
@guybedford
Copy link
Contributor

guybedford commented Nov 18, 2025

It would be really useful to have this available to Rust as well - could we expose this under a cloudflare:wasm/internal or similar?

@hoodmane
Copy link
Contributor Author

Seems to me that Rust would also use Worker::Isolate::from(js).setCpuLimitNearlyExceededCallback() but we'll need different C++ logic for it.

@codspeed-hq
Copy link

codspeed-hq bot commented Nov 18, 2025

CodSpeed Performance Report

Merging #5543 will not alter performance

Comparing hoodmane/python-cpu-limiter (49ec0b1) with main (1a26e6a)

Summary

✅ 57 untouched
⏩ 30 skipped1

Footnotes

  1. 30 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@hoodmane hoodmane force-pushed the hoodmane/python-cpu-limiter branch from c2b31e5 to e3aebb2 Compare November 18, 2025 00:45
@cloudflare cloudflare deleted a comment from daveth3t3chg33k Nov 18, 2025
@hoodmane hoodmane force-pushed the hoodmane/python-cpu-limiter branch 4 times, most recently from 60464b9 to 4303149 Compare November 18, 2025 21:41
@github-actions
Copy link

github-actions bot commented Nov 18, 2025

The generated output of @cloudflare/workers-types matches the snapshot in types/generated-snapshot 🎉

@danlapid
Copy link
Collaborator

How do we make sure that this only terminates the current python request and not the entire python runtime?

@hoodmane
Copy link
Contributor Author

Every time we enter Python code we call setupRequestSignalHandling() which clears the interrupt request. So if we exit from Python exactly as we requested the cpu interrupt, then we'll clear it as we call back in.

@dom96
Copy link
Contributor

dom96 commented Nov 19, 2025

Could rapid requests cause the cpu interrupt to be cleared before it has a chance to be processed?

@hoodmane
Copy link
Contributor Author

As far as I understand, the isolate only can be working on one request at a time, so as long as we clear the request for an interrupt when we enter a request we should be okay.

However, there is a potential problem that we don't clear the interrupt request when entering an already scheduled async task. I will fix that.

@hoodmane hoodmane force-pushed the hoodmane/python-cpu-limiter branch 6 times, most recently from 1839c05 to d264c85 Compare November 20, 2025 01:20
@dom96
Copy link
Contributor

dom96 commented Nov 20, 2025

Hmm, I don't think it's true that the isolate can only be working on one request at a time. Imagine a request waiting many seconds on some IO, an isolate should remain responsive to other requests during that time, so it should be possible for it to respond to other requests even if others are currently in progress. I think you should add a test for this case and ensure it works as expected.

@jasnell
Copy link
Collaborator

jasnell commented Nov 20, 2025

It's a bit nuanced. Only a single request will hold the isolate lock at any given time, so in that sense it's true. While a request is waiting on i/o, it will release the lock so that other requests can use it. But keep in mind that we have the microtask queue, which is always global to the isolate. When the task queue gets drained during one request, it can include tasks from other requests. If those aren't scheduling i/o, then the work from those other requests are progressing with this request. This is why we needed the cross-request promise signaling. That arranges it so that promise continuations happen under the correct iocontext, but there are still things like queueMicrotask() which may still end up crossing over and running under a different IoContext.

@guybedford
Copy link
Contributor

So for async requests, failing the promise with the limit exceeded error is necessary as opposed to a synchronous failure?

Or is it okay for async requests to possibly have the limit exceeded trigger for a request that was not itself cpu heavy if other cpu work is being done at the time?

@hoodmane hoodmane force-pushed the hoodmane/python-cpu-limiter branch from d264c85 to b48a742 Compare November 20, 2025 20:12
@hoodmane
Copy link
Contributor Author

I have it set up to clear the signal request every time we enter Python, regardless of whether it is the same request or a different one. So this means that if we exit Python just as the CPU limit triggers we will always clear the signal. If the request keeps going asynchronously, this may mean we hit the hard limit instead. But I think the chance that it exits exactly at the right time to trigger this behavior is very low.

@hoodmane hoodmane force-pushed the hoodmane/python-cpu-limiter branch from b48a742 to 64b1c67 Compare November 20, 2025 22:47
@hoodmane hoodmane force-pushed the hoodmane/python-cpu-limiter branch 3 times, most recently from 2e210ac to d4413da Compare November 24, 2025 22:11
If we call `TerminateExecution()`, it will exit Python execution without
unwinding the stack or cleaning up the runtime state. This leaves the
Python runtime in a permanently messed up state and all further requests
will fail.

This adds a new `cpuLimitNearlyExceededCallback` to the limit enforcer and
hooks it up so that it triggers a SIGINT inside of Python. This can be used
to raise a `CpuLimitExceeded` Python error into the runtime. If this error
is ignored, then we'll hit the hard limit and be terminated. If we ever do
call `TerminateExecution()` on a Python isolate, we should condemn the isolate,
but that is left as a TODO.

To trigger the SIGINT inside of Python, we have to set two addresses:
1. we set `emscripten_signal_clock` to `0` to make the Python eval breaker check
   for a signal on the next tick.
2. we set `_Py_EMSCRIPTEN_SIGNAL_HANDLING` to 1 to make Python check the signal clock.

We also have to set `Module.Py_EmscriptenSignalBuffer` to a buffer with the number
of the signal we wish to trip in it (`SIGINT` aka 2).

When we start a request we set `_Py_EMSCRIPTEN_SIGNAL_HANDLING` to 0 to avoid
ongoing costs of calling out to JavaScript to check the buffer when no signal is set,
and we put a 2 into `Py_EmscriptenSignalBuffer`.

The most annoying aspect of this is that the symbol `emscripten_signal_clock` is not
exported. For Pyodide 0.28.2, I manually located the address of this symbol and hard
coded it. For the next Pyodide, we'll make sure to export it.
@hoodmane hoodmane force-pushed the hoodmane/python-cpu-limiter branch from d4413da to 49ec0b1 Compare November 24, 2025 22:19
@hoodmane hoodmane merged commit 1952c72 into main Nov 24, 2025
31 of 34 checks passed
@hoodmane hoodmane deleted the hoodmane/python-cpu-limiter branch November 24, 2025 23:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants