
Conversation

@brianaydemir
Contributor

@brianaydemir brianaydemir commented Jan 22, 2026

Summary

The main innovation of this PR is introducing a scheme by which a developer can run Pelican's build-and-test workflow on a branch in their fork of the main PelicanPlatform/pelican repo. This includes running the workflow in a version of the testing container that has been updated to take into account changes the branch has made to Pelican's dependencies and the workflows themselves.

As a demonstration and test of this scheme, this PR includes an update to the definition of the testing container so that the build-and-test workflow actually runs the TestHTCondorPlugin unit test, and also fixes the test so that it runs as intended.

The Scheme

Requirements:

  • The developer has a project on hub.opensciencegrid.org in which they can:

    1. Push and pull container images
    2. Create a "robot" account with similar push/pull privileges

The basic idea:

  1. The developer updates build-and-test.yml with their GitHub fork and corresponding project on hub.opensciencegrid.org. (This needs to be done only once, as the change can be merged into the PelicanPlatform/pelican repo.)

  2. The developer enables the running of GitHub Actions on their fork. (They are disabled by default.)

  3. The developer adds PELICAN_HARBOR_ROBOT_USER and PELICAN_HARBOR_ROBOT_PASSWORD repository secrets to their fork, with contents befitting their names.

  4. As needed, the developer creates a new (pre-)release on their fork. The tag should be a semver tag following Pelican's release tagging scheme. The branch should be the one they will be using to create pull requests against PelicanPlatform/pelican. (A rough CLI sketch of steps 3 and 4 follows.)
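
For concreteness, steps 3 and 4 might look something like the following with the GitHub CLI. This is an untested sketch; the fork name, Harbor robot account, tag, and branch are placeholders to substitute with your own.

FORK="<github-user>/pelicanplatform-pelican"

# Step 3: add the Harbor robot credentials as repository secrets on the fork.
gh secret set PELICAN_HARBOR_ROBOT_USER     --repo "$FORK" --body 'robot$<harbor-project>+<robot-name>'
gh secret set PELICAN_HARBOR_ROBOT_PASSWORD --repo "$FORK" --body "$(cat /path/to/robot-password)"

# Step 4: cut a pre-release on the fork from the branch under test, using a
# semver tag that follows Pelican's release tagging scheme (the tag is made up here).
gh release create v7.24.0-rc.1 --repo "$FORK" --target my-feature-branch --prerelease \
    --notes "Fork-only pre-release for exercising the build-and-test workflow"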

Note

To be more precise, the developer will still need to make two PRs against PelicanPlatform/pelican: one to update dependencies or similar, so that a new version of the testing container is built, and a second with all of the changes that need to be tested in that updated container. The point here is to give the developer a way to test the combined effect of both PRs before submitting them.

While this isn't quite what #2502 asked for, I feel like it's close enough.

Other Details

The entirety of the build-and-test workflows could use a bit of love, which I intend to take care of when I tackle #2997.

In the meantime, the number of files that #2995 had to touch is disappointing, so I've gone through and updated everything else to use go.mod and package.json as the authoritative version specifiers for Go and Node.

@brianaydemir brianaydemir added this to the v7.24 milestone Jan 22, 2026
@brianaydemir brianaydemir added bug Something isn't working test Improvements to the test suite infrastructure GitHub Actions, Release management, and CI internal Internal code improvements, not user-facing labels Jan 22, 2026
@github-advanced-security

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

@brianaydemir
Contributor Author

@jhiemstrawisc I would suggest reviewing this PR commit-by-commit, because the commit messages contain more information about what I had in mind for each of the files being touched here. If your eyes glaze over, I'd focus on TestHTCondorPlugin, because that's something that was added specifically to catch release blockers sooner, so it's unfortunate that it hasn't actually been running in our CI workflow.

As I write this comment, the history at https://github.com/brianaydemir/pelicanplatform-pelican/actions has the record of me dogfooding this PR, so to speak.

@brianaydemir brianaydemir marked this pull request as ready for review January 22, 2026 23:47
@brianaydemir
Contributor Author

To be more precise, the developer will still need to make two PRs against PelicanPlatform/pelican

Ironically, this PR isn't structured like that. But perhaps we can take this opportunity to demonstrate an alternative workflow: at reviewer discretion, they can accept the workflow runs from the fork on which the PR is based.

There's also a lesson to be learned here: Take care to structure commits so that there's one that updates dependencies, container image definitions, and the like; and a second that changes code as needed. (I accidentally squashed too many commits together during development….)

Member

@jhiemstrawisc jhiemstrawisc left a comment

Overall this feels pretty solid; most of my comments are minor nits, questions, or additional threads I hope to tempt you into pulling 😉

Can you also document the process for developers to trigger these tests in our CONTRIBUTE.md so its description lives somewhere other than this PR?

This isn't yet a complete review because I haven't yet tested the workflow. I hope to do that shortly.


// Set XRD_PLUGINCONFDIR so XRootD can load the pelican protocol handler
// This allows the cache to handle pelican:// URLs
t.Setenv("XRD_PLUGINCONFDIR", "/Users/bbockelm/projects/xrdcl-curl/build/release_dir/etc/xrootd/client.plugins.d/")
Member

I've been trying to leave breadcrumbs wherever bad AI code was merged, especially in cases where I think a reviewer really should have caught it. It looks like this was merged without a second human review in #2900.

Contributor Author

Is this just an observation, or a request for me to leave a comment somewhere? If the latter, I'm not sure where you had in mind.

go.mod Outdated

go 1.25.0

toolchain go1.25.5
Member

I'm going through commit-by-commit like you suggested, so maybe I'll see the answer to this later, but why is pinning the toolchain to this specific version needed?

I ask because it looks like BrianB recently made the explicit choice to remove the toolchain definition. See:
2f798fa#diff-33ef32bf6c23acb95f5902d7097b7a1d5128ca061167ec0716715b0b9eeaa5f6R6

Collaborator

Indeed, I'd like to avoid setting toolchain unless necessary (unless I'm misunderstanding the point of the toolchain line?).

In my understanding, toolchain pins a quite specific version of the toolchain (as in, "you shall use 1.25.5, not 1.25.4 or the latest 1.25.x patch release"). Unless there's a reason why we don't want to pick up things like security fixes in patch releases, I think it's better to keep the line out.

Dug a little in GitHub history and it seems like #2247 introduced pinning the toolchain. It was labelled as a dependabot roll-up ... but it might have been inadvertent?

(Again, this is just based on my understanding of the toolchain line; let me know if I got it wrong)

Contributor Author

So, the goal here is to have one location that defines the Go version to use, and go.mod seems as good a place as any because it already expresses an opinion on that.

I think the actual problem is that unless Go and/or GoReleaser automatically update to the latest version of the toolchain, even when the currently installed version already satisfies any stated requirements (e.g., the go line in go.mod), we've effectively been pinning the toolchain version in images/Dockerfile since at least #1877.

That said, there's an API that can be used to look up what the latest version is.
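
For instance, something like the following, as a rough and untested sketch: it queries https://go.dev/dl/?mode=json (the same endpoint the setup-go action uses), and assumes curl and jq are available and that go.mod carries a three-part go directive.

GO_LANG_VERSION="$(sed -n 's/^go //p' go.mod)"    # e.g. "1.25.0"
GO_MINOR="${GO_LANG_VERSION%.*}"                  # e.g. "1.25"
LATEST="$(curl -fsSL 'https://go.dev/dl/?mode=json' \
    | jq -r --arg m "go${GO_MINOR}." '.[] | select(.stable) | .version | select(startswith($m))' \
    | head -n 1)"
echo "Latest toolchain for go ${GO_MINOR}: ${LATEST}"    # e.g. "go1.25.5"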

Collaborator

Honestly, not 100% sure what Go's behavior is here. One of the nifty (scary?) things introduced last year is the ability to have the runtime and compiler be independent -- go version N compiler can compile code for N+2, for example.

So, despite the CI/CD using 1.25.5 at https://github.com/PelicanPlatform/pelican/blob/main/images/Dockerfile#L39, it's a touch unclear what the behavior is.

If we do want to start freezing the toolchain (which I'm still not 100% convinced on), it seems like we need another metafile so we can set it in precisely one place (contrast with the various XRootD plugins...)



def main():
    with open(DEFAULTS_FILE, mode="r", encoding="utf-8") as fp:
Member

Since you're touching this, is there any reason not to also double check parameters that define osdf_default against config/resources/osdf.yaml?

We also define some other keys in the parameters.yaml file that I'm not totally sure what to do about, validation-wise. They are root_default, client_default and server_default. I don't think we can address these here.

Contributor Author

@brianaydemir brianaydemir Feb 1, 2026

On the one hand, I don't think anything other than default shows up at https://docs.pelicanplatform.org/parameters.

On the other hand, what's the worst that could happen if I add the check:

RuntimeError: 
./config/resources/osdf.yaml: Xrootd.ManagerHost: The 'osdf_default' key is not in './docs/parameters.yaml'
./config/resources/osdf.yaml: Xrootd.SummaryMonitoringHost: The 'osdf_default' key is not in './docs/parameters.yaml'
./config/resources/osdf.yaml: Xrootd.DetailedMonitoringHost: The 'osdf_default' key is not in './docs/parameters.yaml'
./config/resources/osdf.yaml: Federation.DiscoveryUrl: The 'osdf_default' key is not in './docs/parameters.yaml'
./config/resources/osdf.yaml: Federation.TopologyNamespaceUrl: Expected 'https://topology.opensciencegrid.org/osdf/namespaces?production=1', but found 'https://topology.opensciencegrid.org/osdf/namespaces' for 'osdf_default' in './docs/parameters.yaml'
./config/resources/osdf.yaml: Topology.DisableDowntime: The 'osdf_default' key is not in './docs/parameters.yaml'
./config/resources/osdf.yaml: Topology.DisableCacheX509: The 'osdf_default' key is not in './docs/parameters.yaml'
./config/resources/osdf.yaml: Topology.DisableOriginX509: The 'osdf_default' key is not in './docs/parameters.yaml'
./config/resources/osdf.yaml: Topology.DisableCaches: The 'osdf_default' key is not in './docs/parameters.yaml'
./config/resources/osdf.yaml: Topology.DisableOrigins: The 'osdf_default' key is not in './docs/parameters.yaml'

As for the rest, see my responses to other comments.

# If the defaults file has a value, it must match the parameters
# file. The reverse need not be true (the default value may be set
# via other means, i.e., code).
if default != parameters[key]["default"]:
Member

This script still feels inadequate to me since the bulk of our defaults aren't actually defined in config/resources/defaults.yaml.

What if instead the solution did something like:

  1. Build Pelican.
  2. In a clean environment with no existing Pelican configuration, run pelican config dump or pelican config dump --json (whichever format is easier to work with).
  3. Check the output of this against docs/parameters.yaml.

Not only might this work with params that aren't defined via config/resources/defaults.yaml, but because some of our defaults use bashisms that reference other parameters:

name: Origin.Url
description: |+
  The origin's configured URL, as reported to XRootD. This is the file transfer endpoint for the origin.
type: url
default: https://${Server.Hostname}:${Origin.Port}
components: ["origin"]

we could actually evaluate correctness in these more complex cases.

Feel free to say "out of scope," but I thought I'd bring it up since you took the time to clean things up here.
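
Roughly, and only as an untested sketch of steps 1 and 2 (the comparison in step 3 would still need to be written):

# 1. Build Pelican however it is normally built (make / goreleaser / go build).
# 2. Dump the effective defaults from a clean environment, so that nothing from
#    the developer's own configuration or environment variables leaks in.
SCRATCH="$(mktemp -d)"
env -i HOME="${SCRATCH}" PATH="${PATH}" \
    ./pelican config dump --json > "${SCRATCH}/defaults.json"
# 3. Compare "${SCRATCH}/defaults.json" against docs/parameters.yaml, which would
#    also cover computed defaults like https://${Server.Hostname}:${Origin.Port}.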

Contributor Author

Yeah, I would rather defer this work to another issue and PR. It's a good suggestion, but it's significantly beyond the current approach of some basic text comparisons.

Contributor Author

And from another comment:

We also define some other keys in the parameters.yaml file that I'm not totally sure what to do about, validation-wise. They are root_default, client_default and server_default. I don't think we can address these here.

For the moment, I'd like the validation to be of the form: Obvious what it intends to check, obvious that it does so correctly, and (to the cognoscenti) obvious where it is inadequate.

# file. The reverse need not be true (the default value may be set
# via other means, i.e., code).
if default != parameters[key]["default"]:
errors.append(f"Default value does not match the parameters file: {key}")
Member

Can you modify the error message to provide both the deduced and expected values, and leave a note about which two files should be inspected to correct the issue?

If you worry about clutter from printing the filepaths multiple times in the case that multiple defaults fail this check, you could add that part in the join under if errors a few lines below.

Contributor Author

See this comment for the new phrasing.

I'm not concerned about repetitiveness or verbosity because the only time these errors should appear is for newly added parameters (or the mythical day when these checks are improved even further).

#
# Reference: https://docs.docker.com/reference/dockerfile/#automatic-platform-args-in-the-global-scope
#
# ARG BUILDPLATFORM
Member

Not that you touched these here, but it wasn't clear to me why these ARGs are left commented -- are you trying to enumerate the variables used to distinguish cross-compilation stuff like "building on" and "building for"?

Contributor Author

I'll try to clarify the comment.

Basically, I'm too lazy to open up a web browser, so it's sometimes nice to have a "quick reference" or "cheatsheet" of sorts in the file itself.

And if you're not me, I imagine that it might be nice to have all the possible build args listed in a single location, at the top of the file, even if you don't need to ever set them. (The only thing worse than needing to update a magic string is… updating a magic string buried in the middle of a file you otherwise don't care about.)

RUN --mount=type=bind,source=go.mod,target=/context/go.mod \
--mount=type=bind,source=web_ui/frontend/package.json,target=/context/package.json \
--mount=type=cache,id=dnf-${BUILDPLATFORM},target=/var/cache/dnf,sharing=locked \
<<ENDRUN
Member

I think I get what's going on here (keeping image layers clean by avoiding COPY of these context files, providing a common cache for dnf), but I've never had to cook something like this up for myself and definitely don't know how to assess whether this accomplishes its goal, so I'm limited in my ability to review this section very effectively. I trust you know what you're doing!

Contributor Author

A lot of that magic revolves around controlling what the commands being run inside the container being built have access to vs. what files are actually persisted in the final container image.

--mount=type=bind is fairly straightforward in that it just makes files and directories available from the build context (here, the Git repo, essentially) inside the container being built. It's like a COPY line, but without the persistence. (To paraphrase Einstein.)

--mount=type=cache is a bit trickier. Here, I'm using it as a persistent scratch or temp space: Multiple build stages benefit from the dnf cache (it is not exactly fast to rebuild from scratch), but actually including the cache in the final image is a waste of space. Better to let the Docker daemon hold on to that directory itself. And if it loses the cache, that's not a functional or correctness problem.

useradd -m -s /bin/bash alice
# Add some normal user accounts for testing.
# The inspiration for their names: https://en.wikipedia.org/wiki/Alice_and_Bob#Cast_of_characters.
for name in alice bob carol dave; do
Member

Apparently you don't care about security because you didn't let Eve join in the fun 😛

Contributor Author

640k users should be enough for anyone.

if ${PUSH_DEV} || ${PUSH_SERVER}; then
IS_PRIMARY_GITHUB_REPO=false

case '${{ github.repository }}' in
Member

This makes me want to learn more bash!

Contributor Author

Shell quoting is a fraught exercise, doubly so in a workflow like this, where the script that's being run is what we get after GitHub mangles the text.
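
To make the hazard concrete, here is a contrived sketch with a made-up fork name. GitHub substitutes ${{ ... }} expressions textually before the shell ever starts, so the quoting in the workflow file has to survive that substitution:

# What the workflow file says:  case '${{ github.repository }}' in
# What bash actually runs after GitHub's substitution:
case 'someuser/pelicanplatform-pelican' in
    PelicanPlatform/pelican)
        echo "primary repo" ;;
    *)
        echo "a fork" ;;
esac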

# images somewhere.

brianaydemir/pelicanplatform-pelican)
REGISTRY_REPO="brian.aydemir.r1" ;;
Member

Little hard to piece this together -- is REGISTRY_REPO always going to be tied to hub.opensciencegrid.org later on?

Contributor Author

As a practical matter, I would bet yes.

As a theoretical matter, there's no need to assume it. In fact, the way this workflow constructs image names in general is in need of cleaning up, which I'm doing in #3051. Look forward to the "fix" there.

- Update to Python 3.14

- Update the action to trigger on pushes to 'main' and release branches

- Format the scripts using `black`

- Fix assorted logic errors and inadequacies
- Update to v4

- Add TypeScript

- Fix the regex for release branches to allow multi-digit components

- Remove comments that are out-of-date; add a link to GitHub Docs
- Use 'package.json' to determine the Node.js version

- Use 'go.mod' to determine the Go version

- Restrict them to running on the PelicanPlatform/pelican repo
- Use 'go.mod' to determine the Go version

- Update the action to trigger on pushes to 'main' and release branches
- Use 'package.json' to determine the Node.js version

- Use 'go.mod' to determine the Go version

- Update the action to trigger on pushes to 'main' and release branches
- Use 'go.mod' to determine the Go version

- Fix up the Next.js cache action based on the current guide
Also, install HTCondor into the correct container. We need it to be
available in the GitHub Actions build-and-test workflow.

I'm also making some assorted tweaks to try and reduce the layer count of
the final images, while also trying to avoid putting things in layers where
they're not really needed.

I'm also trying to be consistent about creating user accounts before
installing software that might rely on those accounts existing.
- Use 'package.json' to determine the Node.js version

- Use 'go.mod' to determine the Go version

- Set up Go, because the testing container image no longer does so

- Fix up the Next.js cache action based on the current guide
Now that the testing container actually has Condor installed
into it, it turns out that this test was assuming that it was
running as non-root.

When running as root, we need to be more precise about which
user is running what, and what they need access to.

Because we're assuming the existence of an HTCondor installation,
we might as well use its condor_config_val to determine the
LIBEXEC directory, rather than (incorrectly) trying to piece the
path together ourselves.

Also, given how this is run within GitHub Actions, it's arguably
better to depend on how 'go test' runs tests than it is to assume
that the repo checkout has been marked as safe.
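
(As an aside, and only as a rough sketch: condor_config_val ships with HTCondor, so asking it for LIBEXEC avoids hard-coding a path; the directory shown is just an example.)

LIBEXEC_DIR="$(condor_config_val LIBEXEC)"    # e.g. /usr/libexec/condor
ls "${LIBEXEC_DIR}"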
More precisely, the most recent version for the language version specified
in `go.mod`.

While I have some concerns about the underlying API endpoint not being well
documented, it's also used by the setup-go action in our GitHub Actions
workflows, so as a practical matter, any changes to the API would have quite
the blast radius.

An alternative is `dnf install golang-1.25.*` (or similar), but I don't know
how quickly AlmaLinux's AppStream updates.
@brianaydemir brianaydemir force-pushed the actions-fixes branch 4 times, most recently from b315415 to 47d446e Compare February 2, 2026 18:06
@brianaydemir
Contributor Author

brianaydemir commented Feb 3, 2026

I believe I've addressed all the comments, even if I didn't necessarily add a reply. (If there's a nice, concise listing of the comments that fits on one screen, I haven't managed to find it in GitHub's UI….)

I shouldn't have force-pushed over any previously existing commits.

The significant change is getting rid of toolchain in go.mod and using the go.dev API in images/Dockerfile to find the most recent version of the toolchain to install. The configuration for the setup-go action has been updated similarly.



Development

Successfully merging this pull request may close these issues.

PRs that touch Dockerfile should run tests in a container image based on that Dockerfile
