@akky16
Collaborator

@akky16 akky16 commented Jun 17, 2025

- Update apml_manager to call APML APIs to register for uevents from the APML Alert_L driver.

  • The APML library provides APIs to register for, monitor, and unregister from RAS and temperature alerts raised by the APML Alert_L driver.
  • The alertEventHandler can handle alerts from sockets 0 and 1.
  • On an alert, monitor_ras_alert returns the socket, the die ID, and the Alert_L source information.

@abinayaddhandapani
Collaborator

@akky16 Has this been verified on 2P systems by injecting an error on the P1 socket?

@akky16 akky16 force-pushed the ras-app-alertl-integ_sp7 branch from aabb8b8 to f238690 Compare June 18, 2025 09:50
@akky16 akky16 force-pushed the ras-app-alertl-integ_sp7 branch from f238690 to e23abaf Compare June 26, 2025 10:25
@abinayaddhandapani
Collaborator

@akky16 A new change has been introduced to support the ADDC flow for ShutdownEvents.
Please find the reference code in #111.

This also needs to be taken care of by the Alert_L module.

@nchatrad nchatrad force-pushed the ras-app-alertl-integ_sp7 branch from e23abaf to c38e16b Compare August 5, 2025 14:50
- Modify configure() to call APML API:
        The configure() function is modified to replace userspace
gpiolib with the APML API "apml_register_udev_monitor()" which
registers for uevents from the Alert_L driver associated with RAS events.

- Create alertSrcHandler() to monitor RAS alerts:
        The alertSrcHandler() function calls the APML API
"monitor_ras_alert()" to monitor RAS alerts from the Alert_L driver.
A single instance of alertSrcHandler() can handle alerts from
multiple sockets. Monitoring the Alert_L GPIO and clearing
the RMI Status (0x2) and RMI RASstatus (0x4c) registers on a RAS alert
are managed by the APML driver.

These changes do not alter the overall structure or CPER creation.
Requesting, binding and monitoring of GPIOs can be removed, as this
functionality is now handled by the Alert_L driver. Only one thread
is required to monitor RAS alerts, regardless of the number of
sockets.

This implementation allows multiple user applications to register
for uevents and get the alert source information simultaneously.

Tested fields - Verified on Morocco system (P0 and P1).
CPER files are generated upon FATAL error injection for the socket.

Reviewed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Sathya Priya Kumar <SathyaPriya.K@amd.com>
Signed-off-by: Akshay Gupta <akshay.gupta@amd.com>
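
The single-thread monitoring described in the commit message above can be sketched as follows. This is a minimal illustration, not the code in this PR: `AlertInfo`, `MonitorFn`, `HandlerFn` and `runAlertLoop()` are hypothetical names, and `MonitorFn` merely stands in for a blocking call such as monitor_ras_alert(), whose real signature is not shown in this conversation.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <optional>

// Hypothetical alert record; the structure used by the APML library may differ.
struct AlertInfo {
    uint8_t socket;   // socket that raised the alert
    uint8_t die;      // die ID within that socket
    uint32_t source;  // alert source bits
};

// Stand-in for a blocking monitor call (assumption, not the APML signature).
// Returns std::nullopt when monitoring stops.
using MonitorFn = std::function<std::optional<AlertInfo>()>;
using HandlerFn = std::function<void(const AlertInfo&)>;

// One loop serves every socket: each alert carries its own socket ID,
// so no per-socket thread or per-socket GPIO binding is needed.
// Returns the number of alerts handled per socket.
std::map<uint8_t, unsigned> runAlertLoop(const MonitorFn& monitor,
                                         const HandlerFn& handle)
{
    std::map<uint8_t, unsigned> handledPerSocket;
    while (auto alert = monitor()) {
        handle(*alert);
        ++handledPerSocket[alert->socket];
    }
    return handledPerSocket;
}
```

In this sketch the handler would be the place to harvest MCA data and create the CPER record; passing the monitor as a callable keeps the loop testable without hardware.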
Support receiving interrupt for MCA Shutdown events

On an MCA shutdown error, ALERT_L is triggered.
The Alert_L module checks RasStatusReg[Bit6] to identify the
shutdown event and sends the event to user space.

Upon receiving the event, the RAS app follows the fatal error path to
harvest the MCA banks, create a CPER record and take recovery action.

Tested fields: Unit test done for the shutdown error on a Congo system.

Signed-off-by: Sathya Priya Kumar <SathyaPriya.K@amd.com>
Signed-off-by: Akshay Gupta <akshay.gupta@amd.com>
Reviewed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
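
The bit check described in the commit message above can be sketched as follows. This is a hedged illustration only: it assumes the RAS status register is read as an 8-bit value, and `kShutdownBit`, `ErrorPath`, `isShutdownEvent()` and `pathForStatus()` are hypothetical names rather than identifiers from the Alert_L module or the RAS app.

```cpp
#include <cstdint>

// Bit 6 of the RAS status register flags an MCA shutdown event
// (per the commit message). The 8-bit register width is an assumption.
constexpr uint8_t kShutdownBit = 1u << 6;

// Hypothetical classification of how an alert is processed.
enum class ErrorPath {
    Fatal,  // harvest MCA banks, create a CPER record, take recovery action
    Other   // any other status bits; not modeled in this sketch
};

// True when the shutdown bit is set in the given status value.
constexpr bool isShutdownEvent(uint8_t rasStatus)
{
    return (rasStatus & kShutdownBit) != 0;
}

// A shutdown event is routed to the fatal error path.
constexpr ErrorPath pathForStatus(uint8_t rasStatus)
{
    return isShutdownEvent(rasStatus) ? ErrorPath::Fatal : ErrorPath::Other;
}
```

Masking a single bit keeps the check independent of whatever else is set in the register.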
@akky16 akky16 force-pushed the ras-app-alertl-integ_sp7 branch from c38e16b to 597c735 Compare August 13, 2025 13:04