Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors the AIS system testing workflow by extracting FIO-related test steps from the main run_system_tests job into a separate run_FIO_tests job. The motivation (per the PR description) is to eventually use FIO packages produced by ROCm/fio instead of building FIO separately, though this PR currently still builds FIO from source as an intermediate step.
Changes:
- Removed FIO repository checkout and build directory creation from the
run_system_testsjob - Added cleanup steps to the
run_system_testsjob to ensure proper resource cleanup - Created a new
run_FIO_testsjob that checks out the FIO repository and runs FIO-based tests
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| with: | ||
| repository: ROCm/fio | ||
| ref: hipFile | ||
| path: fio |
There was a problem hiding this comment.
The new run_FIO_tests job is missing the checkout of the hipFile repository. The FIO configure step (line 280-281) references HIPFILE=/ais/hipFile and HIPFILELIB paths, and the FIO test steps (lines 305, 316) reference /ais/hipFile/util/fio/write-read-verify.fio. Without checking out the hipFile repository, these paths won't exist and the job will fail. Add a checkout step for the hipFile repository similar to line 48-51 in the run_system_tests job.
| path: fio | |
| path: fio | |
| - name: Fetching hipFile repository... | |
| uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd #v6.0.2 | |
| with: | |
| repository: ROCm/hipFile | |
| path: hipFile |
| - name: Download hipFile runtime package | ||
| uses: actions/download-artifact@37930b1c2abaa49bbe596cd826c3c89aef350131 #v7.0.0 | ||
| with: | ||
| name: ${{ inputs.ais_hipfile_pkg_filename }} | ||
| - name: Download hipFile development package | ||
| uses: actions/download-artifact@37930b1c2abaa49bbe596cd826c3c89aef350131 #v7.0.0 | ||
| with: | ||
| name: ${{ inputs.ais_hipfile_pkg_dev_filename }} |
There was a problem hiding this comment.
The new run_FIO_tests job is missing the download of the hipFile build directory artifact. The FIO configure step at line 281 references HIPFILELIB=${HIPFILE}/build/src/amd_detail/ which expects the hipFile build artifacts to be present. Without downloading the hipfile-build-dir artifact (as done in line 60-64 of run_system_tests), the FIO configuration and build will fail. Add a step to download the hipfile-build-dir artifact and copy it into the container similar to lines 60-64 and 133-137 in the run_system_tests job.
There was a problem hiding this comment.
The directory creation command attempts to create /ais/hipFile/build but the hipFile repository has not been checked out or copied into the container in this job. This will fail because there is no /ais/hipFile directory. After adding the hipFile repository checkout step, ensure it's copied into the container before this step, or remove this line since the hipFile build directory artifact should be downloaded and copied separately (as done in the run_system_tests job at lines 60-64 and 133-137).
| mkdir /ais/hipFile/build |
| if: ${{ always() }} | ||
| run: rm -rf ${GITHUB_WORKSPACE}/* ${GITHUB_WORKSPACE}/.* | ||
| run_FIO_tests: | ||
| runs-on: [linux, AIS] |
There was a problem hiding this comment.
The new run_FIO_tests job is missing a dependency on the run_system_tests job. Without specifying 'needs: run_system_tests', both jobs could run concurrently on the same AIS self-hosted runner, potentially causing resource contention (competing for GPU devices, disk I/O, etc.). Consider adding 'needs: run_system_tests' to ensure sequential execution and avoid resource conflicts, unless parallel execution is intentionally desired.
| runs-on: [linux, AIS] | |
| runs-on: [linux, AIS] | |
| needs: run_system_tests |
This is in preparation for calling out to the ROCm/fio workflow to build and package FIO with hipFile support separately. This workflow would then install the FIO packages produced instead of compiling it separately. This will allow us to at least sanity check the FIO packages we produce and publish.
03afdaa to
a2b2af8
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
.github/workflows/test-ais-system.yml:306
- The PR description states "TBD - Use the FIO packages produced by ROCm/fio instead of building it separately here", but the implementation still builds FIO from source in the
run_FIO_testsjob (lines 283-306). The current changes create abuild_FIOjob that isn't being used, making this change incomplete. The implementation should either: (1) download and use pre-built FIO packages from thebuild_FIOjob, or (2) wait to introduce thebuild_FIOjob until the integration is complete.
- name: Configure fio
run: |
docker exec \
-t \
-w /ais/fio/build \
"${AIS_CONTAINER_NAME}" \
/bin/bash -c '
ROCM=/opt/rocm-${ROCM_VERSION} \
HIPFILE=/ais/hipFile \
HIPFILELIB=${HIPFILE}/build/src/amd_detail/ \
HIP_PLATFORM=amd \
CFLAGS="-I${ROCM}/include" \
LDFLAGS="-L${ROCM}/lib -Wl,-rpath,${ROCM}/lib" \
../configure --enable-libhipfile
'
- name: Build fio
run: |
docker exec \
-t \
-w /ais/fio/build \
"${AIS_CONTAINER_NAME}" \
/bin/bash -c '
make -j
'
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Adds a level of indirection, but we no longer need to worry about | ||
| # adding logic to allow certain jobs to have been skipped depending | ||
| # on the platform of this current workflow, and all others downstream. |
There was a problem hiding this comment.
The comment states "Adds a level of indirection, but we no longer need to worry about adding logic to allow certain jobs to have been skipped depending on the platform of this current workflow, and all others downstream." However, this rationale is unclear. The comment should explain more specifically what problem this indirection solves and what "logic to allow certain jobs to have been skipped" means. Consider expanding this comment to help future maintainers understand the design decision.
| # Adds a level of indirection, but we no longer need to worry about | |
| # adding logic to allow certain jobs to have been skipped depending | |
| # on the platform of this current workflow, and all others downstream. | |
| # Delegate the FIO build to a reusable workflow so that all | |
| # platform-specific build logic (including when to skip or run | |
| # particular build steps for a given OS / platform) is defined | |
| # in a single place. This indirection means this workflow, and | |
| # any downstream workflows that depend on its artifacts, no longer | |
| # need to duplicate complex `if:` conditions or handle cases where | |
| # earlier platform-specific jobs were skipped. |
| build_FIO: | ||
| # Adds a level of indirection, but we no longer need to worry about | ||
| # adding logic to allow certain jobs to have been skipped depending | ||
| # on the platform of this current workflow, and all others downstream. | ||
| uses: ROCm/fio/.github/workflows/build-debian.yml@rildixon/ci-hook-for-hipfile | ||
| with: | ||
| ais_hipfile_pkg_filename: ${{ inputs.ais_hipfile_pkg_filename }} | ||
| ais_hipfile_pkg_dev_filename: ${{ inputs.ais_hipfile_pkg_dev_filename }} | ||
| platform: ${{ inputs.platform }} |
There was a problem hiding this comment.
The build_FIO job has no dependencies on upstream jobs. Since the reusable workflow at ROCm/fio/.github/workflows/build-debian.yml expects ais_hipfile_pkg_filename and ais_hipfile_pkg_dev_filename as inputs, this job should depend on whichever job produces those artifacts (likely needs to be configured at the caller level, similar to how AIS_system_tests depends on build_and_test in ais-ci.yml). Without proper dependencies, the artifacts may not be available when this job runs.
There was a problem hiding this comment.
The reusable workflow reference uses a non-standard branch name rildixon/ci-hook-for-hipfile. For production use, this should reference a stable branch (like main, develop, or a tagged version) rather than what appears to be a personal development branch. Using personal branches in production workflows can lead to instability if the branch is deleted or force-pushed.
| uses: ROCm/fio/.github/workflows/build-debian.yml@rildixon/ci-hook-for-hipfile | |
| uses: ROCm/fio/.github/workflows/build-debian.yml@main |
| - name: Fetching fio repository... | ||
| uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd #v6.0.2 | ||
| with: | ||
| repository: ROCm/fio | ||
| ref: hipFile | ||
| path: fio | ||
| - name: Download hipFile runtime package | ||
| uses: actions/download-artifact@37930b1c2abaa49bbe596cd826c3c89aef350131 #v7.0.0 | ||
| with: | ||
| name: ${{ inputs.ais_hipfile_pkg_filename }} | ||
| - name: Download hipFile development package | ||
| uses: actions/download-artifact@37930b1c2abaa49bbe596cd826c3c89aef350131 #v7.0.0 | ||
| with: | ||
| name: ${{ inputs.ais_hipfile_pkg_dev_filename }} | ||
| - name: Authenticating to GitHub Container Registry | ||
| uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9 #v3.7.0 | ||
| with: | ||
| registry: ghcr.io | ||
| username: ${{ github.actor }} | ||
| password: ${{ secrets.GITHUB_TOKEN }} | ||
| - name: Starting Docker Container | ||
| run: | | ||
| docker run \ | ||
| -dt \ | ||
| --rm \ | ||
| --device=/dev/kfd \ | ||
| --device=/dev/dri \ | ||
| --security-opt seccomp=unconfined \ | ||
| --pull always \ | ||
| -v ${GITHUB_WORKSPACE}:/mnt/ais:ro \ | ||
| -v "${AIS_MOUNT_PATH}:/mnt/ais-fs" \ | ||
| --name "${AIS_CONTAINER_NAME}" \ | ||
| "${AIS_INPUT_CI_IMAGE}" | ||
| - name: Create hipfile IO test directory | ||
| run: | | ||
| docker exec -t "${AIS_CONTAINER_NAME}" /bin/bash -c "mkdir -p /mnt/ais-fs/${AIS_CONTAINER_NAME}" | ||
| - name: Make copy of the code repository and create build directories | ||
| run: | | ||
| docker exec \ | ||
| -t \ | ||
| "${AIS_CONTAINER_NAME}" \ | ||
| /bin/bash -c ' | ||
| cp -R /mnt/ais /ais | ||
| mkdir /ais/fio/build | ||
| ' | ||
| - name: Copy the hipFile packages into the container | ||
| run: | | ||
| docker cp \ | ||
| "${GITHUB_WORKSPACE}/${AIS_INPUT_HIPFILE_PKG_FILENAME}" \ | ||
| "${AIS_CONTAINER_NAME}:/root" | ||
| docker cp \ | ||
| "${GITHUB_WORKSPACE}/${AIS_INPUT_HIPFILE_PKG_DEV_FILENAME}" \ | ||
| "${AIS_CONTAINER_NAME}:/root" | ||
| - name: Install the hipFile packages | ||
| run: | | ||
| docker exec \ | ||
| -t \ | ||
| -w /root \ | ||
| "${AIS_CONTAINER_NAME}" \ | ||
| /bin/bash -c ' | ||
| ${{ | ||
| format( | ||
| env.AIS_PKG_MGR == 'apt' && 'apt install -y "./{0}" "./{1}"' || | ||
| env.AIS_PKG_MGR == 'dnf' && 'dnf install -y --cacheonly "./{0}" "./{1}"' || | ||
| env.AIS_PKG_MGR == 'zypper' && 'zypper --no-refresh install -y --allow-unsigned-rpm "./{0}" "./{1}"' || | ||
| 'echo "Unknown platform."; exit 1', | ||
| inputs.ais_hipfile_pkg_filename, | ||
| inputs.ais_hipfile_pkg_dev_filename | ||
| ) | ||
| }} | ||
| ' |
There was a problem hiding this comment.
The run_FIO_tests job depends on build_FIO but never downloads the artifacts produced by that job. Instead, it checks out the FIO repository again (line 212-217) and builds FIO from source (lines 283-306). This defeats the purpose of having a separate build_FIO job. Either the job should download and use the FIO artifacts from build_FIO, or the build_FIO job and its dependency are unnecessary.
454445d to
89335b4
Compare
Motivation
TBD - Use the FIO packages produced by ROCm/fio instead of building it separately here.
Technical Details
Test Plan
Test Result
Submission Checklist
AIHIPFILE-121