fix(modules): auto-restart crashed watchers and fix toggle for dead processes#121
Conversation
|
@greptileai review |
There was a problem hiding this comment.
Important
Looks good to me! 👍
Reviewed everything up to b31d618 in 8 seconds. Click for details.
- Reviewed
255lines of code in3files - Skipped
0files when reviewing. - Skipped posting
0draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
Workflow ID: wflow_wmfaEtqVwjM75yA0
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Greptile SummaryThis PR successfully fixes three interrelated bugs in aw-qt's module management system:
The auto-restart implementation is well-designed with proper state management, restart count tracking, and user notification. Manual actions (toggle or dialog restart) correctly reset the auto-restart counter. The comprehensive unit tests cover the critical scenarios including crashed process handling. The code follows Qt's single-threaded event loop model correctly, avoiding race conditions. All state transitions are handled properly, and the 5-second polling interval provides reasonable responsiveness without excessive overhead. Confidence Score: 5/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Timer as QTimer
participant TrayIcon
participant Manager
participant Module
participant User
Note over Timer,Module: Auto-restart flow (every 5s)
Timer->>TrayIcon: check_module_status()
TrayIcon->>Manager: get_unexpected_stops()
Manager-->>TrayIcon: [crashed modules]
alt restart count < 3
TrayIcon->>Module: stop() (cleanup)
TrayIcon->>Module: start(testing)
TrayIcon->>TrayIcon: increment restart_counts[name]
TrayIcon->>User: showMessage (notification)
else restart count >= 3
TrayIcon->>User: show_module_failed_dialog()
TrayIcon->>Module: stop() (cleanup)
end
Timer->>TrayIcon: check_module_status() (5s later)
Note over User,Module: Manual restart flow
User->>TrayIcon: clicks menu toggle / restart button
TrayIcon->>TrayIcon: reset restart_counts[name]
TrayIcon->>Module: toggle(testing)
alt is_alive()
Module->>Module: stop()
else started flag set (crashed)
Module->>Module: stop() (cleanup)
Module->>Module: start()
else not started
Module->>Module: start()
end
Last reviewed commit: b31d618 |
|
|
||
|
|
||
| class TrayIcon(QSystemTrayIcon): | ||
| MAX_AUTO_RESTARTS = 3 |
There was a problem hiding this comment.
What if crashes happen infrequently but consistently? Like for the aw-watcher-macos crashes we've been rarely having (not more than weekly), it could in theory have crashed 3+ times across the uptime of a multi-week session (if no proper reboots or system crashes)
Might be better to use some type of time-based backoff here? Or a "max-tries in a short period" triggering no more tries.
There was a problem hiding this comment.
Good point — fixed in bfeb138.
Changed from an absolute counter to a sliding 10-minute window:
- Tracks timestamps of each restart instead of just a count
- Only counts restarts within the last 10 minutes (
RESTART_WINDOW_SECONDS = 600) - A module that crashes once a week will always be auto-restarted
- A module crash-looping (3+ times in 10 min) gets the dialog
Old timestamps are pruned when new restarts are recorded, so no memory leak. Manual restart/toggle still resets the history.
…rocesses Three bugs fixed: 1. **Module toggle required two clicks for crashed modules**: `toggle()` checked `self.started` instead of `self.is_alive()`, so clicking a crashed module first called `stop()` (just cleaning state), then required a second click to actually start it. 2. **Crashed modules never auto-restarted**: `check_module_status()` ran once at startup, then incorrectly scheduled `rebuild_modules_menu` instead of rescheduling itself. After the first 2-second check, module crashes were never detected again. 3. **Dialog restart button missing testing arg**: `restart_button.clicked.connect(module.start)` didn't pass the `testing` parameter. Changes: - `Module.toggle()` now uses `is_alive()` to decide stop vs start - `check_module_status()` reschedules itself every 5s (was one-shot) - Crashed modules are auto-restarted up to 3 times with tray notifications - After max restarts, shows dialog with manual restart option - Manual toggle/restart resets the auto-restart counter - Added unit tests for toggle behavior with crashed processes Refs: ActivityWatch/aw-watcher-window#101, ActivityWatch#103
Address review feedback: instead of permanently disabling auto-restart after 3 crashes (which could span weeks), use a sliding 10-minute window. A module is only considered "crash-looping" if it crashes 3+ times within 10 minutes. Infrequent crashes (e.g. weekly) will always be auto-restarted. Changes: - Replace _restart_counts (Dict[str, int]) with _restart_timestamps (Dict[str, List[float]]) - Add _recent_restart_count() for sliding window check - Add _record_restart() with old timestamp pruning - RESTART_WINDOW_SECONDS = 600 (10 minutes, configurable)
bfeb138 to
8e8264a
Compare
Updated: time-windowed crash backoffRebased on latest master (picks up CI runner fix from #120) and addressed Erik's review feedback: Change: Replaced absolute restart counter with a sliding 10-minute window. A module that crashes infrequently (e.g. weekly) will always be auto-restarted. Only rapid crash-looping (3+ crashes in 10 minutes) triggers the failure dialog. This handles the aw-watcher-macos scenario Erik described — rare but consistent crashes across multi-week sessions won't hit the limit. |
Summary
Fixes three interrelated bugs in aw-qt's module management that prevented crashed watchers from being properly restarted:
toggle()checkedself.startedflag instead ofself.is_alive(), so clicking a crashed module's menu item first calledstop()(only cleaning state), requiring a second click to actually restart itcheck_module_status()ran once after 2 seconds, then incorrectly scheduledrebuild_modules_menuinstead of itself — so after the initial check, module crashes were never detected againrestart_button.clicked.connect(module.start)didn't pass the requiredtestingparameterChanges
Module.toggle()now checksis_alive()to decide between stop/start, and cleans up state for dead processes before restartingcheck_module_status()now properly reschedules itself every 5 seconds (was a one-shot timer due to scheduling bug)Test plan
Module.toggle()with running, stopped, and crashed processesis_alive()edge casesget_unexpected_stops()detectionRefs: ActivityWatch/aw-watcher-window#101, #103
Important
Fixes module management bugs in aw-qt by improving toggle logic, adding auto-restart, and updating dialog restart functionality.
Module.toggle()now checksis_alive()instead ofself.startedto decide between stop/start, and cleans up state for dead processes before restarting.check_module_status()now reschedules itself every 5 seconds to detect module crashes and auto-restart them up to 3 times, with notifications.Module.toggle()with running, stopped, and crashed processes intest_manager.py.is_alive()edge cases intest_manager.py.get_unexpected_stops()detection intest_manager.py.restart_button.clicked.connect()to pass thetestingparameter intrayicon.py.This description was created by
for b31d618. You can customize this summary. It will automatically update as commits are pushed.