Skip to content

Conversation

@heywji
Copy link
Contributor

@heywji heywji commented Dec 9, 2025

When guest triggers panic and QEMU exits quickly (especially on RHEL10), the test may fail with MonitorSocketError due to socket connection being reset while checking QMP events.

ID: 1863
Signed-off-by: Wenkang Ji wji@redhat.com

Summary by CodeRabbit

  • Bug Fixes

    • Improved resilience of event monitoring by adding graceful error handling for closed QMP connections with appropriate null responses.
  • Refactor

    • Optimized event detection mechanism for better reliability and robustness.

✏️ Tip: You can customize this high-level summary in your review settings.

When guest triggers panic and QEMU exits quickly (especially on RHEL10),
the test may fail with MonitorSocketError due to socket connection being
reset while checking QMP events.

Signed-off-by: Wenkang Ji <wji@redhat.com>
@coderabbitai
Copy link

coderabbitai bot commented Dec 9, 2025

Walkthrough

This change modifies an event detection function in qemu/tests/pvpanic.py to replace direct polling of a named QMP event with iteration through a list of events returned by get_events(). Error handling has been added to catch MonitorSocketError exceptions, log a warning, and return None instead of propagating the error. This allows the function to gracefully handle closed QMP connections.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Logic refactoring verification: The shift from direct get_event(name) polling to iterating through get_events() requires verification that the new approach produces equivalent behavior and correctly identifies target events.
  • Error handling scope: Confirm that MonitorSocketError handling is appropriate and that returning None on socket errors aligns with expected control flow and caller expectations.
  • Edge cases: Review whether the iteration logic properly handles empty event lists, malformed events, or other corner cases that may arise from closed connections.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title directly and clearly summarizes the main change: fixing a race condition in QMP event checking in the pvpanic test.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
qemu/tests/pvpanic.py (1)

54-67: Move MonitorSocketError handling outside _do_check() to avoid timeout waste and log spam

When MonitorSocketError is caught inside _do_check() and returns None (falsy), wait_for() continues polling until the full timeout expires, logging "QMP connection closed" on every iteration. Since wait_for() propagates exceptions from its callback and stops only on a truthy return, this turns a terminal condition into a slow, noisy timeout.

Instead:

  • Simplify _do_check() to a clean "event present?" predicate returning only True/False
  • Wrap the wait_for() call in try/except MonitorSocketError to log once and return None immediately

This preserves the external behavior (returns True on success, None otherwise) without wasting the full timeout when QMP closes:

     def _do_check(vm, event_names):
-        try:
-            events = vm.monitor.get_events()
-            for event in events:
-                if event.get("event") in event_names:
-                    LOG_JOB.info(
-                        "Receive qmp %s event notification", event.get("event")
-                    )
-                    vm.monitor.clear_event(event.get("event"))
-                    return True
-        except MonitorSocketError:
-            LOG_JOB.warning("QMP connection closed")
-            return None
+        events = vm.monitor.get_events()
+        for event in events:
+            if event.get("event") in event_names:
+                LOG_JOB.info(
+                    "Receive qmp %s event notification", event.get("event")
+                )
+                vm.monitor.clear_event(event.get("event"))
+                return True
         return False
 
-    LOG_JOB.info("Try to get qmp events %s in %s seconds!", event_names, timeout)
-    return wait_for(lambda: _do_check(vm, event_names), timeout, 5, 5)
+    LOG_JOB.info("Try to get qmp events %s in %s seconds!", event_names, timeout)
+    try:
+        return wait_for(lambda: _do_check(vm, event_names), timeout, 5, 5)
+    except MonitorSocketError:
+        LOG_JOB.warning("QMP connection closed while waiting for events %s", event_names)
+        return None
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f03b470 and 239916d.

📒 Files selected for processing (1)
  • qemu/tests/pvpanic.py (2 hunks)
🔇 Additional comments (1)
qemu/tests/pvpanic.py (1)

7-7: Import of MonitorSocketError looks correct

The new import cleanly scopes the QMP error handling to this module and matches its later usage in check_qmp_events(). No issues here.

@heywji
Copy link
Contributor Author

heywji commented Dec 9, 2025

Test Result: PASS

 (1/2) Host_RHEL.m10.u2.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.2.x86_64.io-github-autotest-qemu.pvpanic_basic.nmi.q35:  PASS (61.82 s)
 (2/2) Host_RHEL.m10.u2.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.8.0.x86_64.io-github-autotest-qemu.pvpanic_basic.nmi.q35: STARTED
 (2/2) Host_RHEL.m10.u2.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.8.0.x86_64.io-github-autotest-qemu.pvpanic_basic.nmi.q35:  PASS (54.19 s)
RESULTS    : PASS 2 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

@heywji
Copy link
Contributor Author

heywji commented Dec 9, 2025

@leidwang Hi Leidong, could you please help review this patch?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant