-
Notifications
You must be signed in to change notification settings - Fork 1
AmdSmiPlugin part 1 #42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: development
Are you sure you want to change the base?
Conversation
| except Exception: | ||
| return None | ||
|
|
||
| def _vu(self, v: object, unit: str, *, required: bool = False) -> ValueUnit | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: may be good to add some more descriptive function names for _vu, _vu_req, _nz.
|
|
||
| return out | ||
|
|
||
| def _smi_try(self, fn, *a, default=None, **kw): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type hints should be added here
|
|
||
| return out | ||
|
|
||
| def _get_soc_pstate(self, h) -> StaticSocPstate | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type hints should be added, and docstring updated to reflect them.
| except ValidationError: | ||
| return None | ||
|
|
||
| def _get_xgmi_plpd(self, h) -> StaticXgmiPlpd | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type hints should be added, and docstring updated to reflect them.
| except ValidationError: | ||
| return None | ||
|
|
||
| def _get_cache_info(self, h) -> list[StaticCacheInfoItem]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type hints should be added, and docstring updated to reflect them. Would also be good to use a more descriptive variable name.
|
|
||
| return out | ||
|
|
||
| def _get_clock(self, h) -> StaticClockData | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type hints should be added, and docstring updated to reflect them.
| """Collect AmdSmi data from system | ||
| Args: | ||
| args (_type_, optional): _description_. Defaults to None. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring placeholder needs to be updated.
| self._log_event( | ||
| category=EventCategory.PLATFORM, | ||
| description=f"{key} is not consistent across all GPUs", | ||
| priority=EventPriority.ERROR, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is inconsistency here always an error condition? Just wondering if maybe 'warning' would be more appropriate for the priority here
| if args.l0_to_recovery_count_error_threshold is None: | ||
| args.l0_to_recovery_count_error_threshold = self.L0_TO_RECOVERY_COUNT_ERROR_THRESHOLD | ||
| if args.l0_to_recovery_count_warning_threshold is None: | ||
| args.l0_to_recovery_count_warning_threshold = ( | ||
| self.L0_TO_RECOVERY_COUNT_WARNING_THRESHOLD | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be better to have these default values in the analyzer args model itself rather than having them as class variables and setting them manually like this.
| if args.expected_memory_partition_mode or args.expected_compute_partition_mode: | ||
| self.check_expected_memory_partition_mode( | ||
| data.partition, | ||
| args.expected_memory_partition_mode, | ||
| args.expected_compute_partition_mode, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this check does not depend on the 'static' attribute of data, why is it being skipped when static is None?
Functionality supported:
How to run:
It is the users responsibility to have Python API installed: https://rocm.docs.amd.com/projects/amdsmi/en/latest/reference/amdsmi-py-api.html#