hmon: approach to monitors rework#81
Conversation
License Check Results🚀 The license check job ran with the Bazel command: bazel run //:license-checkStatus: Click to expand output |
|
The created documentation from the pull request is available at: docu-html |
fda115c to
9e958f1
Compare
9e958f1 to
88e93ee
Compare
88e93ee to
f08a948
Compare
There was a problem hiding this comment.
Pull request overview
This pull request refactors the health monitoring library to remove unnecessary abstractions, improve type safety, and better organize the codebase. The changes introduce separate MonitorTag and DeadlineTag types to replace the generic IdentTag, eliminate the MonitorEvaluator trait in favor of direct usage of Arc<DeadlineMonitorInner>, and reorganize the supervisor API client code into a dedicated module with proper feature flags.
Changes:
- Introduced
MonitorTagandDeadlineTagtypes for better type safety and clarity - Removed
MonitorEvaluatortrait abstraction andMonitorEvalHandlewrapper - Moved supervisor API client implementations to dedicated module with mutually exclusive feature flags
- Improved documentation throughout the codebase
- Fixed Cargo build configuration with proper feature flag management
Reviewed changes
Copilot reviewed 26 out of 28 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| src/health_monitoring_lib/rust/tag.rs | New module implementing MonitorTag and DeadlineTag with FFI-safe memory management |
| src/health_monitoring_lib/rust/worker.rs | Updated to use DeadlineMonitorInner directly and import SupervisorAPIClient from new module |
| src/health_monitoring_lib/rust/supervisor_api_client/*.rs | New module structure with feature-gated implementations |
| src/health_monitoring_lib/rust/lib.rs | Refactored monitor state management using MonitorState enum |
| src/health_monitoring_lib/rust/ffi.rs | Updated FFI functions to use new tag types |
| src/health_monitoring_lib/rust/deadline/*.rs | Updated to use new tag types and expose DeadlineMonitorInner |
| src/health_monitoring_lib/rust/common.rs | Removed IdentTag, MonitorEvaluator, and related types |
| src/health_monitoring_lib/cpp/include/score/hm/tag.h | New C++ tag types for FFI compatibility |
| src/health_monitoring_lib/cpp/include/score/hm/*.h | Updated to use new tag types |
| src/health_monitoring_lib/cpp/*.cpp | Updated implementations to use new tag types |
| src/health_monitoring_lib/Cargo.toml | Added feature flags for supervisor API client selection |
| Cargo.toml | Set default-members and removed global monitor_rs dependency |
| src/health_monitoring_lib/BUILD | Added crate_features configuration |
| .github/workflows/lint_clippy.yml | Updated to test both feature flag configurations |
| examples/* | Updated to use new tag types |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
f08a948 to
df8135d
Compare
df8135d to
08e6e6d
Compare
08e6e6d to
6afc701
Compare
6afc701 to
94920a7
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Remove unnecessary abstraction from monitors. - Improve docs. - Move `FFIBorrowed` and `FFIHandle` to main `ffi` module. - Small esthetics fixes.
- Restore some parts. - Separate errors into groups.
- Add `HealthMonitorError`. - Not fully utilized in this change, required for future monitors. - Simplify implementation of `start` and `build`. - No longer must be reimplemented for FFI. - New unit tests for `lib.rs`.
aa819a8 to
d00cc2e
Compare
pawelrutkaq
left a comment
There was a problem hiding this comment.
Cpp aligment in next PR ?
3aefb18 to
c313b59
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/health_monitoring_lib/rust/deadline/deadline_monitor.rs:276
- This
debug_assert!(snapshot.is_stopped(), "Deadline snapshot cannot be both running and stopped")is internally inconsistent:DeadlineStateSnapshot::set_running()does not clear the stopped bit (tests even assert both can be true), so the message is misleading and the assertion doesn't validate what it claims. Either fix the state flags to makerunning/stoppedmutually exclusive, or update/remove this assertion/message to match the intended semantics.
debug_assert!(
snapshot.is_stopped(),
"Deadline snapshot cannot be both running and stopped"
);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Rename trait. - Fix docs. - Fix some behaviors.
c313b59 to
0eca8fb
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/health_monitoring_lib/rust/deadline/deadline_monitor.rs:276
- The
debug_assert!message says a snapshot "cannot be both running and stopped", butDeadlineStateSnapshotintentionally hasis_stopped()remain true even whenset_running()is called (see deadline_state tests). Updating the assertion/message to reflect the actual state semantics would avoid confusion when debugging.
debug_assert!(
snapshot.is_stopped(),
"Deadline snapshot cannot be both running and stopped"
);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for (tag, monitor) in monitors_to_collect.iter_mut() { | ||
| match monitor.take() { | ||
| Some(DeadlineMonitorState::Taken(handle)) => { | ||
| if monitors.push(handle).is_err() { | ||
| // Should not fail since we preallocated enough capacity | ||
| return Err("Failed to push monitor handle".to_string()); | ||
| Some(MonitorState::Taken(handle)) => { | ||
| if collected_monitors.push(handle).is_err() { | ||
| // Should not fail - capacity was preallocated. | ||
| error!("Failed to push monitor handle."); | ||
| return Err(HealthMonitorError::WrongState); | ||
| } | ||
| }, |
There was a problem hiding this comment.
collect_given_monitors mutates the map with take() as it iterates. If there are multiple monitors and the function returns early (e.g., due to an Available(_) entry), any previously-seen Taken(handle) entries will have been removed and replaced with None, making subsequent retries/start attempts fail with an invalid state. Consider a two-pass approach (first validate all entries are Taken without mutating, then collect) or otherwise ensure state is restored on early-return paths.
| let supervisor_api_cycle_ms = self.supervisor_api_cycle.as_millis() as u64; | ||
| let internal_processing_cycle_ms = self.internal_processing_cycle.as_millis() as u64; | ||
| if !supervisor_api_cycle_ms.is_multiple_of(internal_processing_cycle_ms) { |
There was a problem hiding this comment.
Cycle validation casts Duration::as_millis() (u128) to u64, which can truncate for large durations and lead to incorrect validation. Consider keeping the values as u128 (or explicitly validating the cast) to avoid silent overflow/truncation.
| let supervisor_api_cycle_ms = self.supervisor_api_cycle.as_millis() as u64; | |
| let internal_processing_cycle_ms = self.internal_processing_cycle.as_millis() as u64; | |
| if !supervisor_api_cycle_ms.is_multiple_of(internal_processing_cycle_ms) { | |
| let supervisor_api_cycle_ms = self.supervisor_api_cycle.as_millis(); | |
| let internal_processing_cycle_ms = self.internal_processing_cycle.as_millis(); | |
| if supervisor_api_cycle_ms % internal_processing_cycle_ms != 0 { |
Uh oh!
There was an error while loading. Please reload this page.