test: Added RDMA validation script for waagent, ibverbs tools, and Azure persistent naming #77
base: main
@@ -0,0 +1,105 @@

```bash
#!/usr/bin/env bash
# These are templates, not actual shell scripts, so tell shellcheck to
# ignore the templated parts
# shellcheck disable=all
{{ ansible_managed | comment }}
{{ "system_role:hpc" | comment(prefix="", postfix="") }}
# shellcheck enable=all
# SPDX-License-Identifier: MIT
#
# RDMA Validation Script
# Usage: test-rdma.sh
#

# This is test code, and some operations are expected to fail. Hence we can't
# use set -e to automatically exit the script if something fails.
set -u

fail()
{
    echo Failed: "$1"
    exit 1
}

require_file() {
    local path="$1"
    [[ -e "$path" ]] || fail "missing file: $path"
}

require_executable() {
    local path="$1"
    [[ -x "$path" ]] || fail "not executable: $path"
}

require_cmd() {
    local cmd="$1"
    command -v "$cmd" >/dev/null 2>&1 || fail "missing command in PATH: $cmd"
}

sys_vendor() {
    if [[ -r /sys/class/dmi/id/sys_vendor ]]; then
        cat /sys/class/dmi/id/sys_vendor
    else
        echo ""
    fi
}

is_systemd() {
    [[ "$(ps -p 1 -o comm= 2>/dev/null || true)" == "systemd" ]]
}
```
Collaborator: Do we need to check whether this is a systemd environment, even though we are running in a RHEL 9.6 systemd environment?

Contributor: I think it doesn't hurt, and it future-proofs this code.
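The `is_systemd` helper decides by reading the command name of PID 1 through `ps`. A minimal sketch of an alternative, assuming the same convention that `sd_booted(3)` uses (testing whether `/run/systemd/system` exists as a directory), is below. `is_systemd_booted` and its optional root-prefix argument are hypothetical and not part of this PR.

```bash
#!/usr/bin/env bash
# Hypothetical alternative to is_systemd(): succeed when the system was booted
# with systemd, using the same test as sd_booted(3), i.e. whether
# /run/systemd/system exists and is a directory.
# The optional first argument is a root prefix (an assumption, added only so
# the helper can be exercised against a scratch directory in tests).
is_systemd_booted() {
    local root="${1:-}"
    [[ -d "${root}/run/systemd/system" ]]
}

# Usage, following the skip pattern in main():
# if ! is_systemd_booted; then
#     echo Test Passed: "not running systemd; systemd unit checks skipped"
#     return 0
# fi
```

Both checks are heuristics. One practical difference: `ps` comes from procps, which minimal images may omit, and in that case `is_systemd` (because of its `|| true`) silently reports "not systemd", while the directory test has no such dependency.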
```bash
main() {
    echo
    echo "Testing waagent RDMA flag"
    require_file /etc/waagent.conf
    grep -Fxq "OS.EnableRDMA=y" /etc/waagent.conf || fail "expected 'OS.EnableRDMA=y' in /etc/waagent.conf"
    echo Test Passed: "waagent RDMA flag is set"

    echo
    echo "Testing RDMA userland tools"
    require_cmd ibv_devinfo
    echo Test Passed: "RDMA tools are present (ibv_devinfo)"

    # Azure persistent RDMA naming artifacts/services (Azure only)
    if [ "$(sys_vendor)" != "Microsoft Corporation" ]; then
        echo
        echo "Testing Azure persistent RDMA naming (skip: not Azure)"
        echo Test Passed: "not running on Azure; Azure persistent RDMA naming checks skipped"
        return 0
    fi

    if ! is_systemd; then
        echo
        echo "Testing Azure persistent RDMA naming (skip: not systemd)"
        echo Test Passed: "not running systemd; systemd unit checks skipped"
        return 0
    fi

    echo
    echo "Testing Azure persistent RDMA naming artifacts"
    require_executable /usr/sbin/azure_persistent_rdma_naming.sh
    require_executable /usr/sbin/azure_persistent_rdma_naming_monitor.sh
    require_file /etc/systemd/system/azure_persistent_rdma_naming.service
    require_file /etc/systemd/system/azure_persistent_rdma_naming_monitor.service
    require_file /etc/udev/rules.d/99-azure-persistent-rdma-naming.rules
    echo Test Passed: "Azure persistent RDMA naming artifacts exist"

    echo
    echo "Testing Azure persistent RDMA naming services"
    require_cmd systemctl
    systemctl is-enabled azure_persistent_rdma_naming.service >/dev/null 2>&1 || fail "azure_persistent_rdma_naming.service not enabled"
    systemctl is-enabled azure_persistent_rdma_naming_monitor.service >/dev/null 2>&1 || fail "azure_persistent_rdma_naming_monitor.service not enabled"

    # azure_persistent_rdma_naming.service is Type=oneshot, so it may not remain
    # "active" after it runs. Treat "failed" as an error; other states are OK.
    if [ "$(systemctl is-failed azure_persistent_rdma_naming.service 2>/dev/null || true)" = "failed" ]; then
        fail "azure_persistent_rdma_naming.service is in failed state"
    fi

    # Monitor service should be continuously running.
    systemctl is-active azure_persistent_rdma_naming_monitor.service >/dev/null 2>&1 || fail "azure_persistent_rdma_naming_monitor.service not active"
    echo Test Passed: "Azure persistent RDMA naming services look healthy"
}

main "$@"
```
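`main()` encodes two matching rules inline: the waagent flag must appear as an exact, literal line, and the oneshot naming service counts as healthy unless `systemctl is-failed` reports `failed`. A sketch that factors both rules into reusable helpers might look like this; `has_exact_line` and `unit_not_failed` are hypothetical names, not functions from this PR.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if FILE contains LINE as a complete line.
# -F treats LINE as a fixed string (so the "." in OS.EnableRDMA is literal),
# -x requires the whole line to match, and -q suppresses output.
# As with the check in main(), a commented-out or whitespace-padded entry
# does not match.
has_exact_line() {
    local file="$1" line="$2"
    grep -Fxq -- "$line" "$file"
}

# Hypothetical helper: treat a unit as unhealthy only when
# `systemctl is-failed` prints "failed". A Type=oneshot unit that ran and
# exited normally may report "inactive" (or "active" with RemainAfterExit=yes),
# and both are accepted here.
unit_not_failed() {
    local unit="$1" state
    state="$(systemctl is-failed "$unit" 2>/dev/null || true)"
    [[ "$state" != "failed" ]]
}

# Usage inside main():
# has_exact_line /etc/waagent.conf "OS.EnableRDMA=y" ||
#     fail "expected 'OS.EnableRDMA=y' in /etc/waagent.conf"
# unit_not_failed azure_persistent_rdma_naming.service ||
#     fail "azure_persistent_rdma_naming.service is in failed state"
```

Because bash shell functions take precedence over commands found in `PATH`, `unit_not_failed` can be exercised off-host by defining a stub function named `systemctl` that prints a chosen state.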
Nothing is templated in this file except for `ansible_managed` and the `"system_role:hpc"` fingerprint. This is fine, but is it intended?

@spetrosi It is done only to follow the existing convention: the previous tests are also kept in template form.