mla tool fails to parse dcgm_eud.mle with "Failed to find Mle header" #274

@wang-mask

Description

Environment

  • DCGM Version: 3.3.6
  • GPU: NVIDIA Datacenter GPUs (8-GPU system)

Steps to Reproduce

  1. Run EUD diagnostics:
    sudo dcgmi diag -r eud -p "eud.suite_level=4,eud.passthrough_args='run_tests=compute,memory,hsio'"
  2. Parse the detailed log file with mla:
  $ sudo /usr/share/nvidia/diagnostic/mla /var/log/nvidia-dcgm/dcgm_eud.mle
  Error populating db! Check /var/log/nvidia-dcgm/dcgm_eud.debug for more information.
  /var/log/nvidia-dcgm/dcgm_eud.mle did not parse correctly.
  Attempting to analyze partial data!
  No content available to write to reportType bgl for file /var/log/nvidia-dcgm/dcgm_eud.mle 
  RC: Failed to find Mle header

  $ cat /var/log/nvidia-dcgm/dcgm_eud.debug
  [<func>                                            ]: MODS LOG ANALYZER DEBUG LOG: Thu Jan 22 19:14:26 2026
  
  [<func>                                            ]: MLA Version: 20.152
  [<func>                                            ]: Processing file '/var/log/nvidia-dcgm/dcgm_eud.mle' and  writing to '/var/log/nvidia-dcgm/dcgm_eud.mle'
  [<func>                                            ]: Error: could not find MLE header information!
  [<func>                                            ]: Log did not parse correctly. MLA will attempt to analyze partial data!
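
Since mla reports that it cannot find a header, a first check is whether the file is readable, non-empty, and what its leading bytes look like. The sketch below is only a generic inspection helper: `inspect_log` is a hypothetical name, and it assumes nothing about the MLE format itself. It uses only `wc` and `od`.

```shell
# Hypothetical helper (not part of DCGM or mla): report the size and the
# first 64 bytes of a file so an empty or truncated log is easy to spot.
inspect_log() {
    f="$1"
    if [ ! -r "$f" ]; then
        echo "unreadable: $f"
        return 1
    fi
    # wc -c may pad its output with spaces on some systems; strip them.
    size=$(wc -c < "$f" | tr -d ' ')
    echo "size_bytes=$size"
    if [ "$size" -eq 0 ]; then
        echo "empty file"
        return 2
    fi
    # Hex dump of at most the first 64 bytes, with hex offsets.
    od -A x -t x1 -N 64 "$f"
}
```

After sourcing the function in a shell that can read the log (root may be required for /var/log/nvidia-dcgm), it can be called as `inspect_log /var/log/nvidia-dcgm/dcgm_eud.mle`. A return status of 2 means the file exists but is empty.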

Could you help me identify the cause of this error, or point me to the correct way to generate and parse the raw EUD logs?
Thanks!
