Enable the build and test workflow to push images to non-Pelican repos #3022
base: main
Conversation
This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.
@jhiemstrawisc I would suggest reviewing this PR commit-by-commit, because the commit messages contain more information about what I had in mind for each of the files being touched here. If your eyes glaze over, I'd focus on

As I write this comment, the history at https://github.com/brianaydemir/pelicanplatform-pelican/actions has the record of me dogfooding this PR, so to speak.
Ironically, this PR isn't structured like that. But perhaps we can take this opportunity to demonstrate an alternative workflow: at reviewer discretion, they can accept the workflow runs from the fork on which the PR is based. There's also a lesson to be learned here: Take care to structure commits so that there's one that updates dependencies, container image definitions, and the like; and a second that changes code as needed. (I accidentally squashed too many commits together during development….)
jhiemstrawisc left a comment
Overall this feels pretty solid; most of my comments are minor nits, questions, or additional threads I hope to tempt you into pulling 😉
Can you also document the process for developers to trigger these tests in our CONTRIBUTE.md so its description lives somewhere other than this PR?
This isn't yet a complete review because I haven't yet tested the workflow. I hope to do that shortly.
```go
// Set XRD_PLUGINCONFDIR so XRootD can load the pelican protocol handler
// This allows the cache to handle pelican:// URLs
t.Setenv("XRD_PLUGINCONFDIR", "/Users/bbockelm/projects/xrdcl-curl/build/release_dir/etc/xrootd/client.plugins.d/")
```
I've been trying to leave breadcrumbs wherever bad AI code was merged, especially in cases where I think a reviewer really should have caught it. It looks like this was merged without a second human review in #2900.
Is this just an observation, or a request for me to leave a comment somewhere? If the latter, I'm not sure where you had in mind.
go.mod (Outdated)
```
go 1.25.0

toolchain go1.25.5
```
I'm going through commit-by-commit like you suggested, so maybe I'll see the answer to this later, but why is pinning the toolchain to this specific version needed?
I ask because it looks like BrianB recently made the explicit choice to remove the toolchain definition. See:
2f798fa#diff-33ef32bf6c23acb95f5902d7097b7a1d5128ca061167ec0716715b0b9eeaa5f6R6
Indeed, I'd like to avoid setting toolchain unless necessary (unless I'm misunderstanding the point of the toolchain line?).
In my understanding, toolchain pins a quite specific version of the toolchain (as in, "you shall use 1.25.5, not 1.25.4 or the latest 1.25.x patch release"). Unless there's a reason why we don't want to pick up things like security fixes in patch releases, I think it's better to keep the line out.
Dug a little in GitHub history and it seems like #2247 introduced pinning the toolchain. It was labelled as a dependabot roll-up ... but it might have been inadvertent?
(Again, this is just based on my understanding of the toolchain line; let me know if I got it wrong)
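For reference, the two directives under discussion look roughly like this (the module path is hypothetical); per my reading of the Go docs, the `go` line states a minimum language version for the module, while `toolchain` names a specific toolchain release for the go command to prefer:

```
module example.com/pelican  // hypothetical module path

go 1.25.0           // minimum Go language version required by the module

toolchain go1.25.5  // names a specific toolchain release; without this
                    // line, any toolchain satisfying the 'go' line may
                    // be used, including newer patch releases
```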
So, the goal here is to have one location that defines the Go version to use, and go.mod seems as good a place as any because it already expresses an opinion on that.
I think the actual problem is that unless Go and/or GoReleaser automatically update to the latest version of the toolchain, even when the currently installed version already satisfies any stated requirements (e.g., the go line in go.mod), we've effectively been pinning the toolchain version in images/Dockerfile since at least #1877.
That said, there's an API that can be used to look up what the latest version is.
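For what it's worth, the lookup could be sketched like this. The endpoint `https://go.dev/dl/?mode=json` is real (it's what setup-go consults), but the JSON below is an illustrative sample rather than a live response, and I'm assuming a newest-first list of releases with `version` and `stable` fields:

```python
import json

# Illustrative sample of the response shape (an assumption, not live data).
# In practice you'd fetch https://go.dev/dl/?mode=json instead.
SAMPLE = '[{"version": "go1.25.5", "stable": true}, {"version": "go1.24.9", "stable": true}]'

def latest_stable(payload: str) -> str:
    """Return the first (newest) stable toolchain version in the payload."""
    releases = json.loads(payload)
    return next(r["version"] for r in releases if r["stable"])

print(latest_stable(SAMPLE))  # prints "go1.25.5"
```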
Honestly, not 100% sure what Go's behavior is here. One of the nifty (scary?) things introduced last year is the ability to have the runtime and compiler be independent -- go version N compiler can compile code for N+2, for example.
So, despite the CI/CD using 1.25.5 at https://github.com/PelicanPlatform/pelican/blob/main/images/Dockerfile#L39, it's a touch unclear what the behavior is.
If we do want to start freezing the toolchain (which I'm still not 100% convinced on), it seems like we need another metafile so we can set it in precisely one place (contrast with the various XRootD plugins...)
```python
def main():
    with open(DEFAULTS_FILE, mode="r", encoding="utf-8") as fp:
```
Since you're touching this, is there any reason not to also double-check parameters that define osdf_default against config/resources/osdf.yaml?
We also define some other keys in the parameters.yaml file that I'm not totally sure what to do about, validation-wise. They are root_default, client_default, and server_default. I don't think we can address these here.
On the one hand, I don't think any other default shows up at https://docs.pelicanplatform.org/parameters.
On the other hand, what's the worst that could happen if I add the check:
```
RuntimeError:
./config/resources/osdf.yaml: Xrootd.ManagerHost: The 'osdf_default' key is not in './docs/parameters.yaml'
./config/resources/osdf.yaml: Xrootd.SummaryMonitoringHost: The 'osdf_default' key is not in './docs/parameters.yaml'
./config/resources/osdf.yaml: Xrootd.DetailedMonitoringHost: The 'osdf_default' key is not in './docs/parameters.yaml'
./config/resources/osdf.yaml: Federation.DiscoveryUrl: The 'osdf_default' key is not in './docs/parameters.yaml'
./config/resources/osdf.yaml: Federation.TopologyNamespaceUrl: Expected 'https://topology.opensciencegrid.org/osdf/namespaces?production=1', but found 'https://topology.opensciencegrid.org/osdf/namespaces' for 'osdf_default' in './docs/parameters.yaml'
./config/resources/osdf.yaml: Topology.DisableDowntime: The 'osdf_default' key is not in './docs/parameters.yaml'
./config/resources/osdf.yaml: Topology.DisableCacheX509: The 'osdf_default' key is not in './docs/parameters.yaml'
./config/resources/osdf.yaml: Topology.DisableOriginX509: The 'osdf_default' key is not in './docs/parameters.yaml'
./config/resources/osdf.yaml: Topology.DisableCaches: The 'osdf_default' key is not in './docs/parameters.yaml'
./config/resources/osdf.yaml: Topology.DisableOrigins: The 'osdf_default' key is not in './docs/parameters.yaml'
```
As for the rest, see my responses to other comments.
```python
# If the defaults file has a value, it must match the parameters
# file. The reverse need not be true (the default value may be set
# via other means, i.e., code).
if default != parameters[key]["default"]:
```
This script still feels inadequate to me since the bulk of our defaults aren't actually defined in config/resources/defaults.yaml.
What if instead the solution did something like:
- Build Pelican.
- In a clean environment with no existing Pelican configuration, run `pelican config dump` or `pelican config dump --json` (whichever format is easier to work with).
- Check the output of this against `docs/parameters.yaml`.
Not only might this work with params that aren't defined via config/resources/defaults.yaml, but because some of our defaults use bashisms that reference other parameters:

```yaml
name: Origin.Url
description: |+
  The origin's configured URL, as reported to XRootD. This is the file transfer endpoint for the origin.
type: url
default: https://${Server.Hostname}:${Origin.Port}
components: ["origin"]
```
we could actually evaluate correctness in these more complex cases.
Feel free to say "out of scope," but I thought I'd bring it up since you took the time to clean things up here.
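If it helps frame the follow-up issue, here's a minimal sketch of the dump-and-compare idea. The flat JSON shape for `pelican config dump --json` is purely an assumption on my part; the real output format is unverified:

```python
import json

def compare_dump(dump_json: str, documented_defaults: dict) -> list:
    """Compare a (hypothetical) flat config dump against documented defaults.

    Assumption: the dump is a flat JSON object mapping parameter names to
    their effective values. Returns one error string per mismatch.
    """
    effective = json.loads(dump_json)
    errors = []
    for key, documented in documented_defaults.items():
        if key in effective and effective[key] != documented:
            errors.append(f"{key}: documented {documented!r}, effective {effective[key]!r}")
    return errors

# Illustrative data only; not real Pelican defaults.
dump = '{"Origin.Port": 8443, "Server.WebPort": 8444}'
docs = {"Origin.Port": 8443, "Server.WebPort": 9999}
print(compare_dump(dump, docs))  # prints ["Server.WebPort: documented 9999, effective 8444"]
```

The advantage over text comparison is that templated defaults like `https://${Server.Hostname}:${Origin.Port}` would already be expanded by the running binary before the comparison happens.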
Yeah, I would rather defer this work to another issue and PR. It's a good suggestion, but it's significantly beyond the current approach of some basic text comparisons.
> We also define some other keys in the parameters.yaml file I'm not totally sure what to do about validation wise. They are root_default, client_default and server_default. I don't think we can address these here.
For the moment, I'd like the validation to be of the form: obvious what it intends to check, obvious that it does so correctly, and (to the cognoscenti) obvious where it is inadequate.
```python
# file. The reverse need not be true (the default value may be set
# via other means, i.e., code).
if default != parameters[key]["default"]:
    errors.append(f"Default value does not match the parameters file: {key}")
```
Can you modify the error message to provide both the deduced and expected values, and leave a note about which two files should be inspected to correct the issue?
If you worry about clutter from printing the filepaths multiple times in the case that multiple defaults fail this check, you could add that part in the join under if errors a few lines below.
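Something like this, perhaps; the function name is a placeholder, not the script's actual code, though the two paths are the files this thread is about:

```python
# Hypothetical sketch of the richer error message being requested.
DEFAULTS_FILE = "config/resources/defaults.yaml"
PARAMETERS_FILE = "docs/parameters.yaml"

def mismatch_message(key, found, expected):
    # Name both values and both files so the fix is obvious from the
    # message alone, without re-running the script.
    return (
        f"{key}: found {found!r} in {DEFAULTS_FILE}, "
        f"but {PARAMETERS_FILE} says the default is {expected!r}"
    )

print(mismatch_message("Origin.Port", 8443, 8444))
```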
See this comment for the new phrasing.
I'm not concerned about repetitiveness or verbosity, because the only time these errors should appear is for newly added parameters (or the mythical day when these checks are improved even further).
```dockerfile
#
# Reference: https://docs.docker.com/reference/dockerfile/#automatic-platform-args-in-the-global-scope
#
# ARG BUILDPLATFORM
```
Not that you touched these here, but it wasn't clear to me why these ARGs are left commented -- are you trying to enumerate the variables used to distinguish cross-compilation stuff like "building on" and "building for"?
I'll try to clarify the comment.
Basically, I'm too lazy to open up a web browser, so it's sometimes nice to have a "quick reference" or "cheatsheet" of sorts in the file itself.
And if you're not me, I imagine that it might be nice to have all the possible build args listed in a single location, at the top of the file, even if you don't need to ever set them. (The only thing worse than needing to update a magic string is… updating a magic string buried in the middle of a file you otherwise don't care about.)
```dockerfile
RUN --mount=type=bind,source=go.mod,target=/context/go.mod \
    --mount=type=bind,source=web_ui/frontend/package.json,target=/context/package.json \
    --mount=type=cache,id=dnf-${BUILDPLATFORM},target=/var/cache/dnf,sharing=locked \
    <<ENDRUN
```
I think I get what's going on here (keeping image layers clean by avoiding COPY of these context files, providing a common cache for dnf), but I've never had to cook something like this up for myself and definitely don't know how to assess whether this accomplishes its goal, so I'm limited in my ability to review this section very effectively. I trust you know what you're doing!
A lot of that magic revolves around controlling what the commands being run inside the container being built have access to vs. what files are actually persisted in the final container image.
--mount=type=bind is fairly straightforward in that it just makes files and directories available from the build context (here, the Git repo, essentially) inside the container being built. It's like a COPY line, but without the persistence. (To paraphrase Einstein.)
--mount=type=cache is a bit trickier. Here, I'm using it as a persistent scratch or temp space: multiple build stages benefit from the dnf cache (it is not exactly fast to rebuild from scratch), but actually including the cache in the final image is a waste of space. Better to let the Docker daemon hold on to that directory itself. And if it loses the cache, that's not a functional or correctness problem.
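As a self-contained illustration of the two mount types (base image and installed package chosen arbitrarily, not taken from this PR's Dockerfile):

```dockerfile
# syntax=docker/dockerfile:1
FROM almalinux:9

# type=bind: go.mod is readable during this RUN but is never persisted in a
# layer (unlike COPY). type=cache: /var/cache/dnf persists across builds in
# the daemon's cache but never lands in the final image.
RUN --mount=type=bind,source=go.mod,target=/context/go.mod \
    --mount=type=cache,target=/var/cache/dnf,sharing=locked \
    dnf install -y jq && head -n 1 /context/go.mod
```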
```shell
useradd -m -s /bin/bash alice
# Add some normal user accounts for testing.
# The inspiration for their names: https://en.wikipedia.org/wiki/Alice_and_Bob#Cast_of_characters.
for name in alice bob carol dave; do
```
Apparently you don't care about security because you didn't let Eve join in the fun 😛
640k users should be enough for anyone.
```shell
if ${PUSH_DEV} || ${PUSH_SERVER}; then
    IS_PRIMARY_GITHUB_REPO=false

    case '${{ github.repository }}' in
```
This makes me want to learn more bash!
Shell quoting is a fraught exercise, doubly so in a workflow like this, where the script that's being run is what we get after GitHub mangles the text.
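The pattern is easier to see with the GitHub expression replaced by a plain function argument, which also sidesteps the quoting hazards. Only the brianaydemir mapping comes from this PR; the fallback branch here is hypothetical:

```shell
# Sketch of the repo-to-Harbor-project mapping from the workflow.
registry_repo_for() {
    case "$1" in
        brianaydemir/pelicanplatform-pelican)
            # Mapping taken from this PR's diff.
            echo "brian.aydemir.r1" ;;
        *)
            # Hypothetical fallback; the real workflow's behavior for
            # unrecognized repos isn't shown in this hunk.
            echo "unknown" ;;
    esac
}

registry_repo_for brianaydemir/pelicanplatform-pelican
```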
```shell
# images somewhere.

brianaydemir/pelicanplatform-pelican)
    REGISTRY_REPO="brian.aydemir.r1" ;;
```
A little hard to piece this together -- is REGISTRY_REPO always going to be tied to hub.opensciencegrid.org later on?
As a practical matter, I would bet yes.
As a theoretical matter, there's no need to assume it. In fact, the way this workflow constructs image names in general is in need of cleaning up, which I'm doing in #3051. Look forward to the "fix" there.
We want to make sure that we're doing a clean install of dependencies. See also:
- https://docs.npmjs.com/cli/v8/commands/npm-ci
- https://github.com/actions/setup-node/blob/main/docs/advanced-usage.md#working-with-lockfiles

- Update to Python 3.14
- Update the action to trigger on pushes to 'main' and release branches
- Format the scripts using `black`
- Fix assorted logic errors and inadequacies

- Update to v4
- Add TypeScript
- Fix the regex for release branches to allow multi-digit components
- Remove comments that are out-of-date; add a link to GitHub Docs

- Use 'package.json' to determine the Node.js version
- Use 'go.mod' to determine the Go version
- Restrict them to running on the PelicanPlatform/pelican repo

- Use 'go.mod' to determine the Go version
- Update the action to trigger on pushes to 'main' and release branches

- Use 'package.json' to determine the Node.js version
- Use 'go.mod' to determine the Go version
- Update the action to trigger on pushes to 'main' and release branches

- Use 'go.mod' to determine the Go version
- Fix up the Next.js cache action based on the current guide
Also, install HTCondor into the correct container. We need it to be available in the GitHub Actions build-and-test workflow. I'm also making some assorted tweaks to try and reduce the layer count of the final images, while also trying to avoid putting things in layers where they're not really needed. I'm also trying to be consistent about creating user accounts before installing software that might rely on those accounts existing.
- Use 'package.json' to determine the Node.js version
- Use 'go.mod' to determine the Go version
- Set up Go, because the testing container image no longer does so
- Fix up the Next.js cache action based on the current guide
Now that the testing container actually has Condor installed into it, it turns out that this test was assuming that it was running as non-root. When running as root, we need to be more precise about which user is running what, and what they need access to. Because we're assuming the existence of a HTCondor installation, we might as well use its condor_config_val to determine the LIBEXEC directory, rather than (incorrectly) trying to piece the path together ourselves. Also, given how this is run within GitHub Actions, it's arguably better to depend on how 'go test' runs tests than it is to assume that the repo checkout has been marked as safe.
More precisely, the most recent version for the language version specified in `go.mod`. While I have some concerns about the underlying API endpoint not being well documented, it's also used by the setup-go action in our GitHub Actions workflows, so as a practical matter, any changes to the API would have quite the blast radius. An alternative is `dnf install golang-1.25.*` (or similar), but I don't know how quickly AlmaLinux's AppStream updates.
I believe I've addressed all the comments, even if I didn't necessarily add a reply. (If there's a nice, concise listing of the comments that fits on one screen, I haven't managed to find it in GitHub's UI….) I shouldn't have force-pushed over any previously existing commits. The significant change is getting rid of
Summary
The main innovation of this PR is introducing a scheme by which a developer can run Pelican's build-and-test workflow on a branch in their fork of the main PelicanPlatform/pelican repo. This includes running the workflow in a version of the testing container that has been updated to take into account changes the branch has made to Pelican's dependencies and the workflows themselves.
As a demonstration and test of this scheme, this PR includes an update to the definition of the testing container so that the build-and-test workflow actually runs the `TestHTCondorPlugin` unit test, and also fixes the test so that it runs as intended.

The Scheme
Requirements:
The developer has a project on hub.opensciencegrid.org that they can:
The basic idea:
1. The developer updates `build-and-test.yml` with their GitHub fork and corresponding project on hub.opensciencegrid.org. (This needs to be done only once, as the change can be merged into the PelicanPlatform/pelican repo.)
2. The developer enables the running of GitHub Actions on their fork. (They are disabled by default.)
3. The developer adds `PELICAN_HARBOR_ROBOT_USER` and `PELICAN_HARBOR_ROBOT_PASSWORD` repository secrets to their fork, with contents befitting their names.
4. As needed, the developer creates a new (pre-)release on their fork. The tag should be a semver tag following Pelican's release tagging scheme. The branch should be the one they will be using to create pull requests against PelicanPlatform/pelican.
Note
To be more precise, the developer will still need to make two PRs against PelicanPlatform/pelican: One to update dependencies or similar, so that a new version of the testing container is built. And a second with all of the changes that need to be tested in that updated container. The idea and point here is to provide a way that the developer can test the combined effects of both PRs before submitting them.
While this isn't quite what #2502 asked for, I feel like it's close enough.
Other Details
The entirety of the build-and-test workflows could use a bit of love, which I intend to take care of when I tackle #2997.
In the meantime, the number of files that #2995 had to touch is disappointing, so I've gone through and updated everything else to use `go.mod` and `package.json` as the authoritative version specifiers for Go and Node.