Skip to content

Add multithreaded tests for cleanup of handler when stopping search#18

Closed
saksham-chawla wants to merge 2 commits intocodex/replay-pr-138-basefrom
codex/replay-pr-138-head
Closed

Add multithreaded tests for cleanup of handler when stopping search#18
saksham-chawla wants to merge 2 commits intocodex/replay-pr-138-basefrom
codex/replay-pr-138-head

Conversation

@saksham-chawla
Copy link
Collaborator

@saksham-chawla saksham-chawla commented Mar 18, 2026

What
This change introduces a suite of new multithreaded tests to verify the robustness of the StopFindService mechanism, addressing the potential for race conditions identified in the issue eclipse-score#86. It specifically targets corner cases related to handler cleanup that could lead to use-after-free errors when StopFindService is called from within a FindServiceHandler or from multiple threads concurrently.

The original implementation had a conceptual flaw where a FindServiceHandler could be destroyed prematurely if StopFindService was called from within the handler itself. While this was fixed with a "late removal" mechanism, the existing test coverage was insufficient to guarantee the fix was robust under various concurrent scenarios.

This PR adds four new standalone test applications, each targeting a specific corner case:

  1. find_stop_find_test
    -Scenario: This is the fundamental re-entrancy test. It starts a service discovery, and when the FindServiceHandler is
    invoked, it immediately calls StopFindService on the very handle that triggered the callback.
    - Purpose: This test directly verifies that the "late removal" of the handler is working correctly. If the handler's resources
    were deallocated immediately upon the StopFindService call, the handler would crash before it could finish executing. The
    success of this test proves that the framework safely waits for the handler to complete before cleanup, preventing a use-
    after-free error.

  2. find_long_running_handler_test:
    - Scenario: The FindServiceHandler calls StopFindService on its own handle and then continues to execute for a
    significant duration (by sleeping) before returning.
    - Purpose: Directly stresses the "late removal" mechanism. It ensures that the handler's resources remain valid for its
    entire lifetime, even after the stop request has been processed, preventing use-after-free errors.

  3. find_inter_stop_test:
    - Scenario: Two independent service discovery operations are active. The handler for xpad/cp60/MapApiLanesStamped
    is triggered and immediately calls StopFindService on the handle for xpad/cp60/MapApiLanesStamped_B.
    - Purpose: Checks for unintended side effects or interference between separate, concurrent discovery operations,
    ensuring that stopping one does not corrupt the state of another.

  4. find_concurrent_stop_test:
    - Scenario: Tests thread-safety by having two threads (the FindServiceHandler and a dedicated stopper_thread) call
    StopFindService on the same handle concurrently.
    - Purpose: Verifies that the internal state management of the discovery process is protected against race conditions
    and that the system handles simultaneous stop requests gracefully without deadlocking or crashing.

Why
To prevent regressions and ensure the stability of the service discovery framework, we need comprehensive tests that cover critical race conditions. These tests simulate aggressive, concurrent usage patterns that were not previously covered, increasing confidence in the "late removal" fix for handlers and the overall thread-safety of the mw::com runtime.


In addition to the standalone applications, the PR also mirrors these multithreaded scenarios in new GTest cases within proxy_base_test.cpp to validate StopFindService behavior at the unit-test level.

Architecture Diagram:

flowchart TD
    subgraph Test_Applications
        A["find_stop_find_test.cpp (+116 lines)"]
        B["find_long_running_handler_test.cpp (+120 lines)"]
        C["find_inter_stop_test.cpp (+159 lines)"]
        D["find_concurrent_stop_test.cpp (+136 lines)"]
        E[/"mw_com_config.json (+34 lines)"/]
        F["BUILD (+51 lines)"]
    end

    subgraph Core_Components
        G["ProxyBase"]
        H["ServiceDiscovery"]
        I["FindServiceHandler"]
    end

    subgraph Unit_Tests
        J["proxy_base_test.cpp (+209 lines)"]
        K["ServiceDiscoveryMock"]
    end

    A -->|tests re-entrancy| G
    B -->|stresses late removal| G
    C -->|tests inter-op stop| G
    D -->|tests concurrent stop| G

    G -->|calls| H
    H -->|invokes| I
    I -->|calls StopFindService| G

    J -->|tests multithreaded scenarios| G
    J -->|uses| K
    K -->|mocks| H

    F -->|builds| A
    F -->|builds| B
    F -->|builds| C
    F -->|builds| D
    F -->|builds| J
    A -->|uses| E
    B -->|uses| E
    C -->|uses| E
    D -->|uses| E
Loading

This summary was automatically generated by @propel-code-bot

@saksham-chawla saksham-chawla marked this pull request as draft March 18, 2026 09:49
@saksham-chawla saksham-chawla marked this pull request as ready for review March 18, 2026 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants