fix(modules): auto-restart crashed watchers and fix toggle for dead processes#121

Merged
ErikBjare merged 2 commits into ActivityWatch:master from TimeToBuildBob:bob/fix-module-restart
Mar 2, 2026

@TimeToBuildBob
Contributor

@TimeToBuildBob TimeToBuildBob commented Feb 27, 2026

Summary

Fixes three interrelated bugs in aw-qt's module management that prevented crashed watchers from being properly restarted:

  • Two-click toggle bug: toggle() checked self.started flag instead of self.is_alive(), so clicking a crashed module's menu item first called stop() (only cleaning state), requiring a second click to actually restart it
  • No auto-restart after startup: check_module_status() ran once after 2 seconds, then incorrectly scheduled rebuild_modules_menu instead of itself — so after the initial check, module crashes were never detected again
  • Dialog restart button broken: restart_button.clicked.connect(module.start) didn't pass the required testing parameter
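
The two-click behavior can be reproduced in isolation. The following is a minimal sketch, not the aw-qt source: `FakeProcess`, `crash()`, and `toggle_old()` are hypothetical names introduced for illustration, and `stop()` only clears state rather than terminating a real process.

```python
from typing import Optional


class FakeProcess:
    """Hypothetical stand-in for a subprocess handle (not part of aw-qt)."""

    def __init__(self) -> None:
        self.returncode: Optional[int] = None

    def poll(self) -> Optional[int]:
        # None means "still running", mirroring subprocess.Popen.poll()
        return self.returncode


class Module:
    def __init__(self, name: str) -> None:
        self.name = name
        self.started = False
        self._process: Optional[FakeProcess] = None

    def start(self, testing: bool) -> None:
        self._process = FakeProcess()
        self.started = True

    def stop(self) -> None:
        # Sketch only: clears state; a real stop() would also end the process
        self._process = None
        self.started = False

    def is_alive(self) -> bool:
        return self._process is not None and self._process.poll() is None

    def toggle_old(self, testing: bool) -> None:
        # Buggy variant: trusts the flag, so a crashed module is only "stopped"
        if self.started:
            self.stop()
        else:
            self.start(testing)

    def toggle(self, testing: bool) -> None:
        # Fixed variant: decide based on whether the process is actually alive
        if self.is_alive():
            self.stop()
        else:
            if self.started:
                self.stop()  # clean up stale state left by the crashed process
            self.start(testing)


def crash(module: Module) -> None:
    """Simulate a crash: the process exits but the started flag stays set."""
    assert module._process is not None
    module._process.returncode = 1
```

With `toggle_old()`, a crashed module needs two calls before `is_alive()` is true again; with `toggle()`, one call is enough.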

Changes

  • Module.toggle() now checks is_alive() to decide between stop/start, and cleans up state for dead processes before restarting
  • check_module_status() now properly reschedules itself every 5 seconds (was a one-shot timer due to scheduling bug)
  • Crashed modules are automatically restarted up to 3 times, with system tray notifications
  • Once the maximum number of auto-restarts is exceeded, falls back to the existing manual restart dialog
  • Manual toggle or dialog restart resets the auto-restart counter
  • Added unit tests for toggle behavior with crashed processes (8 tests)
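
As a rough illustration of the self-rescheduling check with the fixed restart limit described in this list, here is a sketch that avoids Qt entirely. `StatusChecker` and the injected `schedule`, `restart`, and `show_dialog` callables are assumptions made for this example; `schedule` stands in for the Qt timer used by the tray icon.

```python
from typing import Callable, Dict, List

MAX_AUTO_RESTARTS = 3
CHECK_INTERVAL_MS = 5000  # 5-second polling interval, as described above


class StatusChecker:
    """Hypothetical helper; in aw-qt this logic lives on the tray icon."""

    def __init__(
        self,
        get_unexpected_stops: Callable[[], List[str]],
        restart: Callable[[str], None],
        show_dialog: Callable[[str], None],
        schedule: Callable[[int, Callable[[], None]], None],
    ) -> None:
        self.get_unexpected_stops = get_unexpected_stops
        self.restart = restart
        self.show_dialog = show_dialog
        self.schedule = schedule
        self.restart_counts: Dict[str, int] = {}

    def check_module_status(self) -> None:
        for name in self.get_unexpected_stops():
            count = self.restart_counts.get(name, 0)
            if count < MAX_AUTO_RESTARTS:
                self.restart_counts[name] = count + 1
                self.restart(name)
            else:
                # Limit reached: hand over to the manual restart dialog
                self.show_dialog(name)
        # The fix: reschedule this method, not an unrelated callback
        self.schedule(CHECK_INTERVAL_MS, self.check_module_status)

    def reset_restart_count(self, name: str) -> None:
        # Called on manual toggle or dialog restart
        self.restart_counts.pop(name, None)
```

Injecting `schedule` keeps the sketch testable without an event loop: a test can collect the scheduled callbacks in a list and invoke them by hand.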

Test plan

  • Unit tests for Module.toggle() with running, stopped, and crashed processes
  • Unit tests for is_alive() edge cases
  • Unit tests for get_unexpected_stops() detection
  • Manual test: kill a watcher process, verify aw-qt detects crash and auto-restarts within 5 seconds
  • Manual test: kill a watcher 3+ times rapidly, verify dialog appears after max restarts

Refs: ActivityWatch/aw-watcher-window#101, #103


Important

Fixes module management bugs in aw-qt by improving toggle logic, adding auto-restart, and updating dialog restart functionality.

  • Behavior:
    • Module.toggle() now checks is_alive() instead of self.started to decide between stop/start, and cleans up state for dead processes before restarting.
    • check_module_status() now reschedules itself every 5 seconds to detect module crashes and auto-restart them up to 3 times, with notifications.
    • After max auto-restarts, a manual restart dialog appears, and manual toggle or dialog restart resets the auto-restart counter.
  • Tests:
    • Added unit tests for Module.toggle() with running, stopped, and crashed processes in test_manager.py.
    • Added unit tests for is_alive() edge cases in test_manager.py.
    • Added unit tests for get_unexpected_stops() detection in test_manager.py.
  • Misc:
    • Fixed restart_button.clicked.connect() to pass the testing parameter in trayicon.py.

This description was created by Ellipsis for b31d618.

@TimeToBuildBob
Contributor Author

@greptileai review

@ellipsis-dev ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed everything up to b31d618 in 8 seconds.
  • Reviewed 255 lines of code in 3 files
  • Skipped 0 files when reviewing.

@greptile-apps

greptile-apps bot commented Feb 27, 2026

Greptile Summary

This PR successfully fixes three interrelated bugs in aw-qt's module management system:

  • Two-click toggle bug fixed: Module.toggle() now checks is_alive() instead of the started flag to determine if a module is running. When a crashed module (dead process but started=True) is toggled, it now cleans up state and restarts in a single action instead of requiring two clicks.

  • Timer scheduling bug fixed: check_module_status() now correctly reschedules itself every 5 seconds instead of incorrectly scheduling rebuild_modules_menu. This was a copy-paste error that prevented ongoing crash detection after the initial 2-second check.

  • Restart button parameter bug fixed: The dialog's restart button now correctly passes the testing parameter to module.start() through a wrapper function, preventing a TypeError.
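
The wrapper pattern for the restart button can be sketched without Qt. `ClickedSignal` below is a hypothetical stand-in that calls its slots with no arguments, which is a simplification and not Qt's signal API; `connect_restart()` and `started_with` are likewise names invented for this example.

```python
from typing import Callable, List


class ClickedSignal:
    """Hypothetical stand-in for a button's clicked signal (not Qt's API)."""

    def __init__(self) -> None:
        self._slots: List[Callable[[], None]] = []

    def connect(self, slot: Callable[[], None]) -> None:
        self._slots.append(slot)

    def emit(self) -> None:
        # Slots are called with no arguments in this simplified model
        for slot in self._slots:
            slot()


class Module:
    def __init__(self, name: str) -> None:
        self.name = name
        self.started_with: List[bool] = []  # records each `testing` value

    def start(self, testing: bool) -> None:
        self.started_with.append(testing)


def connect_restart(signal: ClickedSignal, module: Module, testing: bool) -> None:
    # Wrap start() so the required `testing` argument is always supplied
    def on_restart() -> None:
        module.start(testing)

    signal.connect(on_restart)
```

In this model, connecting `module.start` directly fails with a `TypeError` when the signal fires, because `testing` is never supplied; the wrapper closes over the value instead.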

The auto-restart implementation is well-designed with proper state management, restart count tracking, and user notification. Manual actions (toggle or dialog restart) correctly reset the auto-restart counter. The comprehensive unit tests cover the critical scenarios including crashed process handling.

The code follows Qt's single-threaded event loop model correctly, avoiding race conditions. All state transitions are handled properly, and the 5-second polling interval provides reasonable responsiveness without excessive overhead.

Confidence Score: 5/5

  • This PR is safe to merge with no identified issues
  • The implementation correctly fixes all three described bugs with clean, well-tested code. The toggle logic now properly handles crashed processes, the timer scheduling bug is fixed with a simple one-line change, and the restart button parameter issue is resolved. The auto-restart feature is implemented with appropriate safeguards (max attempts, counter reset on manual action, proper state cleanup). Unit tests provide good coverage of edge cases. Qt's single-threaded event model prevents race conditions. No code smells or security issues detected.
  • No files require special attention

Important Files Changed

| Filename | Overview |
|---|---|
| aw_qt/manager.py | Fixed toggle logic to check process liveness instead of flag, enabling single-click restart of crashed modules |
| aw_qt/trayicon.py | Added auto-restart with counter tracking, fixed timer scheduling bug, corrected restart button parameter passing, and reset counter on manual actions |
| tests/test_manager.py | Added comprehensive unit tests for toggle behavior with crashed processes, is_alive edge cases, and unexpected stop detection |

Sequence Diagram

sequenceDiagram
    participant Timer as QTimer
    participant TrayIcon
    participant Manager
    participant Module
    participant User

    Note over Timer,Module: Auto-restart flow (every 5s)
    Timer->>TrayIcon: check_module_status()
    TrayIcon->>Manager: get_unexpected_stops()
    Manager-->>TrayIcon: [crashed modules]
    
    alt restart count < 3
        TrayIcon->>Module: stop() (cleanup)
        TrayIcon->>Module: start(testing)
        TrayIcon->>TrayIcon: increment restart_counts[name]
        TrayIcon->>User: showMessage (notification)
    else restart count >= 3
        TrayIcon->>User: show_module_failed_dialog()
        TrayIcon->>Module: stop() (cleanup)
    end
    
    Timer->>TrayIcon: check_module_status() (5s later)
    
    Note over User,Module: Manual restart flow
    User->>TrayIcon: clicks menu toggle / restart button
    TrayIcon->>TrayIcon: reset restart_counts[name]
    TrayIcon->>Module: toggle(testing)
    alt is_alive()
        Module->>Module: stop()
    else started flag set (crashed)
        Module->>Module: stop() (cleanup)
        Module->>Module: start()
    else not started
        Module->>Module: start()
    end

Last reviewed commit: b31d618

@greptile-apps greptile-apps bot left a comment

3 files reviewed, no comments


class TrayIcon(QSystemTrayIcon):
    MAX_AUTO_RESTARTS = 3
Member

What if crashes happen infrequently but consistently? For example, the aw-watcher-macos crashes we've occasionally been seeing (no more than weekly) could in theory add up to 3+ crashes over the uptime of a multi-week session (if there are no proper reboots or system crashes).

It might be better to use some kind of time-based backoff here, or a "max tries within a short period" limit after which no more restarts are attempted.

Contributor Author

Good point — fixed in bfeb138.

Changed from an absolute counter to a sliding 10-minute window:

  • Tracks timestamps of each restart instead of just a count
  • Only counts restarts within the last 10 minutes (RESTART_WINDOW_SECONDS = 600)
  • A module that crashes once a week will always be auto-restarted
  • A module crash-looping (3+ times in 10 min) gets the dialog

Old timestamps are pruned when new restarts are recorded, so no memory leak. Manual restart/toggle still resets the history.
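
A minimal sketch of the sliding-window idea follows. It reuses the names mentioned in this thread (`RESTART_WINDOW_SECONDS`, `_restart_timestamps`, `_recent_restart_count`, `_record_restart`), but the method bodies are assumptions, and the `RestartTracker` class, the injectable `clock`, `should_auto_restart()`, and `reset()` are hypothetical additions for illustration.

```python
import time
from typing import Callable, Dict, List

MAX_AUTO_RESTARTS = 3
RESTART_WINDOW_SECONDS = 600  # 10 minutes


class RestartTracker:
    """Hypothetical container; the PR keeps this state on the tray icon."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so tests can control time
        self._restart_timestamps: Dict[str, List[float]] = {}

    def _recent(self, name: str) -> List[float]:
        # Timestamps that still fall inside the sliding window
        cutoff = self._clock() - RESTART_WINDOW_SECONDS
        return [t for t in self._restart_timestamps.get(name, []) if t > cutoff]

    def _recent_restart_count(self, name: str) -> int:
        return len(self._recent(name))

    def _record_restart(self, name: str) -> None:
        # Prune expired entries while recording, so the list stays bounded
        self._restart_timestamps[name] = self._recent(name) + [self._clock()]

    def should_auto_restart(self, name: str) -> bool:
        return self._recent_restart_count(name) < MAX_AUTO_RESTARTS

    def reset(self, name: str) -> None:
        # Manual restart or toggle clears the history
        self._restart_timestamps.pop(name, None)
```

With this shape, a module that crashes once a week never has more than one timestamp inside the window, while three restarts within 10 minutes make `should_auto_restart()` return `False` until the oldest entries age out.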

…rocesses

Three bugs fixed:

1. **Module toggle required two clicks for crashed modules**: `toggle()` checked
   `self.started` instead of `self.is_alive()`, so clicking a crashed module
   first called `stop()` (just cleaning state), then required a second click
   to actually start it.

2. **Crashed modules never auto-restarted**: `check_module_status()` ran once
   at startup, then incorrectly scheduled `rebuild_modules_menu` instead of
   rescheduling itself. After the first 2-second check, module crashes were
   never detected again.

3. **Dialog restart button missing testing arg**: `restart_button.clicked.connect(module.start)`
   didn't pass the `testing` parameter.

Changes:
- `Module.toggle()` now uses `is_alive()` to decide stop vs start
- `check_module_status()` reschedules itself every 5s (was one-shot)
- Crashed modules are auto-restarted up to 3 times with tray notifications
- After max restarts, shows dialog with manual restart option
- Manual toggle/restart resets the auto-restart counter
- Added unit tests for toggle behavior with crashed processes

Refs: ActivityWatch/aw-watcher-window#101, ActivityWatch#103

Address review feedback: instead of permanently disabling auto-restart
after 3 crashes (which could span weeks), use a sliding 10-minute
window. A module is only considered "crash-looping" if it crashes 3+
times within 10 minutes. Infrequent crashes (e.g. weekly) will always
be auto-restarted.

Changes:
- Replace _restart_counts (Dict[str, int]) with
  _restart_timestamps (Dict[str, List[float]])
- Add _recent_restart_count() for sliding window check
- Add _record_restart() with old timestamp pruning
- RESTART_WINDOW_SECONDS = 600 (10 minutes, configurable)
@TimeToBuildBob TimeToBuildBob force-pushed the bob/fix-module-restart branch from bfeb138 to 8e8264a on March 1, 2026, 10:17
@TimeToBuildBob
Contributor Author

Updated: time-windowed crash backoff

Rebased on latest master (picks up CI runner fix from #120) and addressed Erik's review feedback:

Change: Replaced absolute restart counter with a sliding 10-minute window. A module that crashes infrequently (e.g. weekly) will always be auto-restarted. Only rapid crash-looping (3+ crashes in 10 minutes) triggers the failure dialog.

This handles the aw-watcher-macos scenario Erik described — rare but consistent crashes across multi-week sessions won't hit the limit.

@ErikBjare ErikBjare merged commit f9a6653 into ActivityWatch:master Mar 2, 2026
3 checks passed

2 participants