fix: split inproc handler thread #1446

supervacuus · 2025-11-10T10:59:44Z

This is a more elaborate, long-term fix for getsentry/sentry-java#4830 than #1444.

It also finishes the work done in #1088 and fixes the issues raised in #1353 and #906.

So, while the driver for this PR is a downstream issue that exposes the signal-unsafety of some parts of the current inproc implementation, it also addresses a much broader range of concerns that regularly affect inproc users on all platforms.

At a high level, it introduces a separate handler thread for inproc, which the signal handler (or UEF on Windows) wakes after it exchanges crash context data.

The idea is to restrict the signal handler/UEF to the smallest possible set of system calls (or at least to the subset documented in the signal-safety man page), while the handler thread can execute functions outside that set (with limitations, since thread synchronization and heap allocations are still problematic). This allows us to reuse stdio functionality like formatters without running squarely into UB territory or having to rewrite all utilities as async-signal-safe versions, as in #1444.
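As a rough illustration of this hand-off (a minimal POSIX sketch, not the code in this PR; the pipe-based wake-up and acknowledgement and all names such as `g_wake_pipe` or `run_handoff_demo` are assumptions made for the example), the signal handler only calls `write()` and `read()`, while the formatting happens on the handler thread:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* All names are hypothetical and only illustrate the hand-off. */
static int g_wake_pipe[2]; /* signal handler -> handler thread */
static int g_ack_pipe[2];  /* handler thread -> signal handler */
static volatile sig_atomic_t g_signum; /* stand-in for the crash context */
static char g_report[64];  /* written by the handler thread only */

static void *handler_thread_main(void *arg)
{
    char byte;
    (void)arg;
    /* Sleep in read() until the signal handler writes a wake-up byte. */
    while (read(g_wake_pipe[0], &byte, 1) == 1) {
        /* We are not in signal context here, so snprintf() is acceptable. */
        snprintf(g_report, sizeof(g_report), "caught signal %d", (int)g_signum);
        /* Acknowledge so that the blocked signal handler can return. */
        if (write(g_ack_pipe[1], &byte, 1) != 1) {
            break;
        }
    }
    return NULL;
}

static void on_signal(int signum)
{
    int saved_errno = errno;
    char byte = 1;
    g_signum = signum; /* exchange the "crash context" */
    /* write() and read() are both on the async-signal-safe list. */
    if (write(g_wake_pipe[1], &byte, 1) == 1) {
        while (read(g_ack_pipe[0], &byte, 1) == -1 && errno == EINTR) {
        }
    }
    errno = saved_errno;
}

/* Installs the handler, raises a harmless SIGUSR1, returns 0 on success. */
static int run_handoff_demo(void)
{
    pthread_t tid;
    struct sigaction sa;
    char expected[64];

    if (pipe(g_wake_pipe) != 0 || pipe(g_ack_pipe) != 0) return -1;
    if (pthread_create(&tid, NULL, handler_thread_main, NULL) != 0) return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0) return -1;

    raise(SIGUSR1); /* returns once the handler thread has acknowledged */

    close(g_wake_pipe[1]); /* EOF ends the handler thread's loop */
    pthread_join(tid, NULL);

    snprintf(expected, sizeof(expected), "caught signal %d", SIGUSR1);
    return strcmp(g_report, expected) == 0 ? 0 : 1;
}
```

The demo uses SIGUSR1 so that the process survives; a real crash handler would capture the full context and would not return to the faulting code in the same way.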

There are a few considerable changes to mention:

- Since we run the event construction in a separate handler thread, backtrace() or any other unwinder that starts from the "current" instruction address is useless (ignoring the fact that backtrace() was always signal-unsafe to begin with, which itself was a source of crashes, hangs, or empty stack traces).
- This means we require a "user context"-based stack walker in inproc, which we already partially acknowledged in #1088 (Using libunwind for mac, since backtrace do not expect thread context…) and #1233 (fix: support musl on Linux).
- On Linux, this PR requires libunwind (the nongnu implementation, not the LLVM one, which is a pure C++ exception unwinder). This is a breaking change, at least in the sense that users now need an additional dependency at build time and at runtime. It also means that "general" Linux usage is now the same as in musl libc environments.
- On macOS, we provide a user-context stack walker based on frame-pointer records for arm64 and x86-64, and use the system-provided libunwind for the default stack trace from a call site. It turned out that the system-provided libunwind wasn't safe enough to use in the context of the signal handler (it either led to hangs or had issues with escaping the trampoline). This means users can now use inproc on macOS again (provided their code is compiled without omitting frame pointers, which is the default on macOS).
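To illustrate what a frame-pointer walk from a user context involves, here is a minimal sketch. It is not the walker from this PR: the names `stack_bounds_sketch_t`, `fp_walk_sketch`, and `fp_walk_demo` are hypothetical, and the simple bounds struct stands in for whatever memory-mapping validation the real code performs. It assumes the usual arm64/x86-64 frame record layout, where the frame pointer points at the saved caller frame pointer followed by the return address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: address range that is known to be mapped stack memory. */
typedef struct {
    uintptr_t lo; /* lowest valid address (inclusive) */
    uintptr_t hi; /* one past the highest valid address */
} stack_bounds_sketch_t;

/* Walks frame records starting at fp and stores the return addresses.
 * Returns the number of frames written to frames[]. */
static size_t fp_walk_sketch(uintptr_t fp, stack_bounds_sketch_t stack,
    void **frames, size_t max_frames)
{
    size_t n = 0;
    while (n < max_frames) {
        /* Never dereference a frame pointer that is unaligned or does not
         * leave room for a full two-pointer record inside the bounds. */
        if (fp < stack.lo || fp >= stack.hi
            || stack.hi - fp < 2 * sizeof(uintptr_t)
            || fp % sizeof(uintptr_t) != 0) {
            break;
        }
        const uintptr_t *record = (const uintptr_t *)fp;
        uintptr_t next_fp = record[0];     /* saved caller frame pointer */
        uintptr_t return_addr = record[1]; /* address to resume in caller */
        if (return_addr == 0) {
            break;
        }
        frames[n++] = (void *)return_addr;
        /* The stack grows down, so caller records sit at higher addresses.
         * Anything else indicates the end of the chain or corruption. */
        if (next_fp <= fp) {
            break;
        }
        fp = next_fp;
    }
    return n;
}

/* Builds a fake three-record "stack" and walks it. */
static size_t fp_walk_demo(void **frames, size_t max_frames)
{
    static uintptr_t fake[8];
    stack_bounds_sketch_t bounds;
    fake[0] = (uintptr_t)&fake[2];
    fake[1] = 0x1000;
    fake[2] = (uintptr_t)&fake[4];
    fake[3] = 0x2000;
    fake[4] = 0; /* end of the chain */
    fake[5] = 0x3000;
    bounds.lo = (uintptr_t)&fake[0];
    bounds.hi = (uintptr_t)&fake[8];
    return fp_walk_sketch((uintptr_t)&fake[0], bounds, frames, max_frames);
}
```

In a signal handler the starting frame pointer would come from the register state in the ucontext rather than from the current call site, which is what makes this kind of walker usable from a separate handler thread.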

Further improvements/fixes (summarizing the 30 commits, which I didn't want to squash):

- The libunwind-based unwinder modules now also validate retrieved ucontext pointers against the memory mappings (on Linux and macOS).
- Replaced all remaining __sync functions with __atomic ones (especially in the signal-handler blocking logic and the spinlock).
- Made the use of C++ new with std::nothrow consistent throughout the affected backend code (including the initialization of crashpad_state_t, which still used malloc and memset although it has std::atomic members).
- Cleaned up the CMake configure phase of the integration test suite.
- Ensured that test fixtures do not end up in macOS bundles.
- Fixed build issues with by-default PIE and LTO builds.
- musl is no longer a special-case "Linux" in the build script.
- Fixed a couple of warnings and test-case instabilities.
- Introduced a macos-26 build config.
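For readers unfamiliar with the difference between the two builtin families, the following is a minimal sketch of a spinlock written with the GCC/Clang `__atomic` builtins, with the legacy `__sync` equivalents noted in comments. The type and function names are hypothetical and this is not the spinlock from the PR; the main practical difference is that the `__atomic` builtins take an explicit memory order.

```c
/* Hypothetical type and function names; GCC/Clang builtins only. */
typedef struct {
    int locked; /* 0 = free, 1 = held */
} spinlock_sketch_t;

/* Returns 1 if the lock was acquired, 0 if it was already held. */
static int spin_try_lock_sketch(spinlock_sketch_t *lock)
{
    /* Legacy style: __sync_lock_test_and_set(&lock->locked, 1) == 0 */
    return __atomic_exchange_n(&lock->locked, 1, __ATOMIC_ACQUIRE) == 0;
}

static void spin_lock_sketch(spinlock_sketch_t *lock)
{
    while (!spin_try_lock_sketch(lock)) {
        /* Spin on a plain load so that waiters do not keep writing. */
        while (__atomic_load_n(&lock->locked, __ATOMIC_RELAXED)) {
        }
    }
}

static void spin_unlock_sketch(spinlock_sketch_t *lock)
{
    /* Legacy style: __sync_lock_release(&lock->locked) */
    __atomic_store_n(&lock->locked, 0, __ATOMIC_RELEASE);
}
```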

TODOs:

- Finish the x86-64 stackwalker for macOS (and clean up the code)
- Figure out if we need the libbacktrace fallback at all and how to handle it
- Provide a module-level description of the new mechanism in inproc
- Decide on having the change
- Update documentation
  - Advanced usage might be outdated with respect to signal handling of inproc
  - Remove mentions of inproc not working on macOS
  - Clarify the new libunwind dependency on Linux

* use `std::nothrow` `new` consistently to keep exception-free semantics for allocation
* rename static crashpad_handler to have no module-public prefix
* use `nullptr` for arguments where we previously used 0 to clarify that those are pointers
* eliminate the `memset()` of the `crashpad_state_t` initialization since it now contains non-trivially constructable fields (`std::atomic`) and replace it with `new` and an empty value initializer.

…rancy guard
* up to now, we've been serializing signal handling even though we didn't know whether it was a runtime signal or one we should be handling
* this meant that we blocked all our critical sections during a managed exception
* it also meant that we blocked any concurrent managed exceptions
* it also meant that we introduced a race window during the time when we chained, because incoming signals on other threads would have been next in line before we even completed the current signal handler

By moving it completely outside our synchronization, we truly chain at start and don't interfere until we know we must.

cursor · 2025-11-20T10:57:55Z

src/backends/sentry_backend_inproc.c

+        }
+        if (sentry__atomic_fetch(&g_handler_should_exit)) {
+            break;
+        }


Bug: Handler thread exits without processing crash

The handler thread checks g_handler_should_exit immediately after waking from the semaphore, before checking g_handler_has_work. If shutdown is initiated after the signal handler signals the semaphore but before the handler thread processes the work flag, the crash event will be lost because the thread exits without processing it. The same issue exists on UNIX at lines 833-835. The check for g_handler_should_exit needs to happen after verifying and processing any pending work to ensure crashes are never dropped during shutdown.

I am open to discussion about this. I am a big fan of letting the shutdown request overrule any others. In this case, it is unlikely they will happen at the same time, but it would have to be either-or. As such, it isn't really a bug, but rather a policy decision.
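To make the either-or concrete, here is a minimal sketch of the two possible orderings after a wake-up. The enum, the function, and the `shutdown_wins` switch are hypothetical and only model the decision; they are not the actual loop in sentry_backend_inproc.c.

```c
#include <stdbool.h>

typedef enum {
    HANDLER_ACTION_WAIT,    /* nothing to do, go back to sleep */
    HANDLER_ACTION_PROCESS, /* handle the pending crash */
    HANDLER_ACTION_EXIT     /* leave the handler thread loop */
} handler_action_sketch_t;

/* Decides what the handler thread does after it wakes up. */
static handler_action_sketch_t decide_after_wakeup_sketch(
    bool should_exit, bool has_work, bool shutdown_wins)
{
    if (shutdown_wins) {
        /* Ordering as in the diff: the exit flag is checked first, so a
         * crash that races with shutdown is dropped. */
        if (should_exit) {
            return HANDLER_ACTION_EXIT;
        }
        return has_work ? HANDLER_ACTION_PROCESS : HANDLER_ACTION_WAIT;
    }
    /* Ordering suggested in the review: pending work is always handled
     * before the exit flag is honored. */
    if (has_work) {
        return HANDLER_ACTION_PROCESS;
    }
    return should_exit ? HANDLER_ACTION_EXIT : HANDLER_ACTION_WAIT;
}
```

The two policies only differ when both flags are set at the same time, which is the unlikely race described above.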

src/unwinder/sentry_unwinder_libunwind_mac.c

cursor · 2025-11-20T11:20:57Z

src/backends/sentry_backend_inproc.c

+#    endif
+
+#    ifdef SENTRY_PLATFORM_UNIX
+    sentry__enter_signal_handler();


Bug: Concurrent signal handlers corrupt shared handler state

On UNIX, dispatch_ucontext calls sentry__leave_signal_handler() at line 1064 before waiting for the handler thread to complete. This allows a second signal to arrive and call sentry__enter_signal_handler() successfully, then proceed to overwrite the global g_handler_state structure at lines 1042-1061 while the handler thread is still reading from it at line 853. The single global g_handler_state variable has no synchronization protecting concurrent access between multiple signal handlers and the handler thread, leading to potential data corruption when multiple crashes occur in quick succession across different threads.

This is correct, and was a recurring topic during the development of the changes in this PR. I first wanted to have feedback on how to proceed with the rest. The solution here will be a two-stage blocking mechanism, which I have successfully experimented with in previous commits. However, since the signal handler blocking must also support the other backend handlers, I wanted to have a first review.

supervacuus · 2025-11-20T11:32:06Z

@jpnurmi: I primarily added you regarding the chain-at-start handler strategy. The most significant change in that regard is that we no longer block anything when chaining at the start (see 6b6e545 for details).

@vaind: I primarily added you here because I know you consume inproc downstream and may also be affected by changes to the unwinder-to-platform mapping in the root CMake script. I don't think that any of the Windows changes will cause particular issues downstream, but differing build configurations could cause some pain.

supervacuus added 30 commits November 6, 2025 11:10

fix: split inproc with a handler thread

01eb6e1

prevent warning on write() return value

7eb3ca8

get rid of unsused dispatch parameter

0cff127

and another unused return value for write()

1265007

fix trivial windows compile def issues to run tests locally

b57995c

make tests a bit less brittle

5a11ea2

ensure that find_cp_path isn't built on windows.

b83fc37

eliminate local handler strategy declaration in signal handler

c78f425

turn off painful warning noise

116c4a5

add libunwind to the CI dependencies

b4747c7

ensure we build the example with PIE disabled when doing a static bui…

d21c180

…ld, since libraries like libunwind.a might be packaged without PIC.

extend handler crash to windows

b0822d3

further clean of inproc

8b93cca

review and fix/remove remaining TODOs from branch changes

c4f662a

eliminate multichar warning in crashpad backend when using GCC

84b3a7a

ensure libunwind in benchmark and codeql workflows

ef7b03b

fix Linux ARM64 build (warning in libunwind)

6ae25c3

provide an x86_64 ucontext stackwalker for macOS

a0195db

bump lower-end GCC to 10.5.0

8a63e64

remove obsolete ASAN patch from CI

c1fd0a8

ensure lower-end GCC also finds libunwind

7c97dbb

disable LTO for the example too

524d8b4

use clang diagnostics only for __clang__

1c378c3

install 32-bit libunwind package for 32-bit Linux CI test config

4912a82

fix weird linker asymmetry

0c24c4f

provide empty path-suffix so that CMake can find libunwind on platfor…

70dd979

…ms with architecture prefixes (32-bit Linux)

remove can_lock mechanism from the handler block API

4449859

use PRIx64 without ull cast in sentry__value_new_addr()

4ce7940

fix off-by-one bug when returning the trace length after walking the …

8954928

…stack also ensure to get the first frame harmonize libunwind usage

isolation pinpoint fine, but only exclude when using ucontext.

fcee5a5

supervacuus mentioned this pull request Nov 13, 2025

fix: silence nontrivial memcall warning for clang >= 20 getsentry/crashpad#141

Merged

supervacuus added 20 commits November 13, 2025 18:34

silence clang-20 warning for crashpad

eda314b

update crashpad after merging to getsentry

89d95d2

we know the walk crashes in aarch64 tsan, now introduce stack bound r…

4d5114b

…eader in the libunwind walker for Linux and log as much as possible to understand where the actual crash happens

format

13dfdbc

add unistd.h for musl

72612cf

eliminate the logs and ensure we never walk the callers if the SP is …

df92728

…in unmapped memory

format

bdf6b25

fix invalid find_mem_range() return value check

7e4e1fa

remove macOS left-overs in the libbacktrace unwinder module

0966118

extract fp_walk for the macOS unwinder so we can also provide a stack…

549293a

…-trace from an arbitrary frame-pointer

for targets that must use backtrace() as an unwinder we fall back u…

be0b245

…nd running the deferred code directly inside the signal handler. Nothing changes for them.

instead of spinning on atomic in the signal-handler add ACK pipe/sema…

1c8dbf9

…phore on the return channel and let the OS block and wait. Also check the return value of startup_handler_thread in the initialization and propagate the failure.

actually check the FP against mach_vm_region bounds in the validation

1d85149

Merge branch 'master' into fix/split_inproc_handler_thread

206294a

check mach_vm_region bounds only on macOS builds

cd130d3

ensure we conditionally return on acquire in block_for_signal_handler

9e9d933

update changelog

a011327

Merge branch 'master' into fix/split_inproc_handler_thread

fdc82a8

add inproc module-level docs for developers

4a62c6e

supervacuus marked this pull request as ready for review November 20, 2025 10:52

supervacuus requested review from JoshuaMoelans and jpnurmi November 20, 2025 10:52

cursor bot reviewed Nov 20, 2025

fix: return the number of frames when doing a stack walk from an FP d…

7d6142f

…irectly.

supervacuus requested a review from vaind November 20, 2025 11:19

cursor bot reviewed Nov 20, 2025
