dcgmi health check error, but temp is ok #270

@47oo

Why does the DCGM health log report a thermal slowdown when the GPU temperature is not high (75°C)?
This is the DCGM error message:
ERROR [1514002:1514004] [[Health]] Detected a WARNING in health system Thermal: 'Detected clocks event due to thermal violation in GPU 2. Verify that the cooling on this machine is functional, including external, thermal material interface, fans, and any other components.
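
One way to narrow this down (a sketch, not a confirmed diagnosis) is to read the clock throttle/event reasons bitmask for GPU 2 and check which bits are set, since the warning refers to a clocks event and not to a temperature reading. The helper below only decodes such a bitmask. The bit values are what I understand `nvml.h` to define for `nvmlClocksThrottleReason*`, so verify them against your installed header; the names `THROTTLE_REASONS`, `decode_throttle_reasons`, and `has_thermal_event` are hypothetical.

```python
from typing import List

# Assumed bit values, taken from nvml.h (nvmlClocksThrottleReason*).
# Verify against the header shipped with your driver before relying on them.
THROTTLE_REASONS = {
    0x001: "gpu_idle",
    0x002: "applications_clocks_setting",
    0x004: "sw_power_cap",
    0x008: "hw_slowdown",
    0x010: "sync_boost",
    0x020: "sw_thermal_slowdown",
    0x040: "hw_thermal_slowdown",
    0x080: "hw_power_brake_slowdown",
    0x100: "display_clock_setting",
}

# Bits that indicate a thermal cause (software or hardware thermal slowdown).
THERMAL_BITS = 0x020 | 0x040


def decode_throttle_reasons(mask: int) -> List[str]:
    """Return the names of all known reason bits set in `mask`."""
    return [name for bit, name in THROTTLE_REASONS.items() if mask & bit]


def has_thermal_event(mask: int) -> bool:
    """True if either thermal slowdown bit is set in `mask`."""
    return bool(mask & THERMAL_BITS)
```

The mask itself has to come from the driver, for example from `nvidia-smi --query-gpu=clocks_throttle_reasons.active --format=csv` or from `pynvml.nvmlDeviceGetCurrentClocksThrottleReasons(handle)`. Whether those are available depends on your driver and library versions. Calling `decode_throttle_reasons(0x20)` would then point at a software thermal slowdown, while `0x40` would point at a hardware one.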
