@vyadavmsft
Collaborator

This is a bug fix for SerialConsole panic detection. It eliminates false positives from harmless boot messages while still reliably detecting real kernel panics.

Changes:

  • Tightened panic patterns to require specific markers:

    • 'Kernel panic - not syncing:', 'panic:', 'Oops:'
    • 'soft/hard lockup - CPU#<num> stuck' (requires CPU number)
    • 'rcu_sched self-detected stall' (specific RCU stall marker)
  • Removed overly broad patterns that caused false positives:

    • Generic 'watchdog' (now requires CPU# for lockup detection)
    • Generic 'RIP:' (stack traces are not panics)
    • Generic 'hung task', 'stall' (too broad)
  • Expanded ignore list for harmless boot messages:

    • 'NMI watchdog.*permanently disabled'
    • 'Perf NMI watchdog permanently disabled'
    • Generic watchdog enabled/disabled messages
    • RCU grace period messages (not stalls)
    • hung_task_timeout_secs configuration messages

This eliminates false positives like 'NMI watchdog: Perf NMI watchdog permanently disabled' while still catching real kernel crashes, lockups, and RCU stalls.
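
As a rough illustration of the approach described above, here is a minimal Python sketch. It is not the code from this PR: the names `PANIC_PATTERNS`, `IGNORE_PATTERNS`, and `find_panic_lines`, as well as the exact regular expressions, are assumptions reconstructed from the bullet lists in the description.

```python
import re
from typing import List

# Hypothetical pattern lists modeled on the PR description; the names and the
# exact regexes are assumptions, not the patterns actually used in the PR.
PANIC_PATTERNS = [
    re.compile(r"Kernel panic - not syncing:"),
    re.compile(r"\bpanic:"),
    re.compile(r"\bOops:"),
    # Lockups must name a CPU number, so a bare "watchdog" no longer matches.
    re.compile(r"(?:soft|hard) lockup - CPU#\d+ stuck"),
    re.compile(r"rcu_sched self-detected stall"),
]

IGNORE_PATTERNS = [
    # Also covers "Perf NMI watchdog permanently disabled".
    re.compile(r"NMI watchdog.*permanently disabled"),
    re.compile(r"watchdog.*\b(?:enabled|disabled)\b", re.IGNORECASE),
    re.compile(r"rcu.*grace.period", re.IGNORECASE),
    re.compile(r"hung_task_timeout_secs"),
]


def find_panic_lines(console_log: str) -> List[str]:
    """Return the console lines that look like a real panic, lockup, or RCU stall."""
    hits = []
    for line in console_log.splitlines():
        # Lines on the ignore list are skipped before any panic pattern is tried.
        if any(p.search(line) for p in IGNORE_PATTERNS):
            continue
        if any(p.search(line) for p in PANIC_PATTERNS):
            hits.append(line)
    return hits
```

Called on a captured serial log, `find_panic_lines` would skip a line such as 'NMI watchdog: Perf NMI watchdog permanently disabled' and a bare 'RIP:' stack-trace line, while still returning a 'Kernel panic - not syncing:' line. Checking the ignore list first keeps the sketch simple, but it also means a line matching both lists is dropped, which is a trade-off a real implementation may want to revisit.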

@vyadavmsft force-pushed the vyadav_fix_panic_handler branch from 8287e04 to c1999db on October 24, 2025 17:41
@vyadavmsft closed this on Oct 27, 2025
@vyadavmsft
Collaborator Author

We need to work on a long-term solution with Vijay and others. The immediately problematic line has already been addressed in another PR.
