Skip to content

Conversation

@jjudd
Copy link

@jjudd jjudd commented Dec 14, 2025

A few times a year across many thousands of builds we encounter a rare error about the DumpPlatformClassPath class being missing. Our Bazel setup uses dynamic execution, builds without the bytes, remote execution + remote caching, and path mapping.

The error we encounter is as follows:

Error: Could not find or load main class DumpPlatformClassPath
Caused by: java.lang.ClassNotFoundException: DumpPlatformClassPath

I'm guessing that this is happening due to some kind of Bazel bug that happens with our Bazel setup and tree artifacts, i.e., declare_directory.

Best I can tell this is happening because the DumpPlatformClassPath.class file is somehow not materializing correctly. I'm not 100% confident about that, but it's my leading hypothesis at this point in time.

This commit changes the actions in bootclasspath.bzl to not rely on tree artifacts. Instead, they rely JDK 11+'s ability to launch single-file programs (introduced in JEP 330). This avoids the javac action and declare_directory previously required to compile DumpPlatformClassPath.

Problem is this makes rules_java not compatible with JDK's older than 11.

I'm very open to alternative solutions to this, but I haven't yet come up with a robust, cross platform solution that avoids tree artifacts while also maintaining compatibility with JDKs older than 11.

I wanted to open this PR to get some discussion going.

@jjudd jjudd requested review from a team and comius as code owners December 14, 2025 00:47
@fmeum
Copy link
Contributor

fmeum commented Dec 14, 2025

Do you have any reason to believe that this issue is specific to this particular use of tree artifacts? They are used pervasively across many rulesets, so if this is a more general problem with either your setup or Bazel itself, then avoiding this one usage won't help much.

Compared to regular output files, tree artifacts that fail to have their contents materialized by the action don't fail the build and can thus result in a corrupted cache entry with some or all files missing. Is it possible that your executors sometimes fail to write out the file and/or upload it?

@jjudd jjudd force-pushed the bootclasspath-avoid-tree-artifact branch 4 times, most recently from 3b000c4 to ea91625 Compare December 14, 2025 21:04
…otclasspath

A few times a year across many thousands of builds we encounter a rare
error about the `DumpPlatformClassPath` class being missing. Our Bazel
setup uses dynamic execution, builds without the bytes, remote execution
+ remote caching, and path mapping.

The error we encounter is as follows:
```
Error: Could not find or load main class DumpPlatformClassPath
Caused by: java.lang.ClassNotFoundException: DumpPlatformClassPath
```

I'm guessing that this is happening due to some kind of Bazel bug that
happens with our Bazel setup and tree artifacts, i.e.,
declare_directory.

Best I can tell this is happening because the
`DumpPlatformClassPath.class` file is somehow not materializing
correctly. I'm not 100% confident about that, but it's my leading
hypothesis at this point in time.

This commit changes the actions in `bootclasspath.bzl` to, when
possible, not rely on tree artifacts. Instead, they rely JDK 11+'s
ability to launch single-file programs (introduced in JEP 330). This
avoids the `javac` action and `declare_directory` previously required to
compile `DumpPlatformClassPath`.
@jjudd jjudd force-pushed the bootclasspath-avoid-tree-artifact branch from ea91625 to 34809e2 Compare December 14, 2025 21:15
@jjudd
Copy link
Author

jjudd commented Dec 15, 2025

I've updated the PR so that it has JDK 8 compatibility. It now uses the new method if JDK >= 11 and the old method if JDK < 11. I believe CI is also now passing. The one failing build appears to be due to flakiness - it failed twice for two different reasons seemingly unrelated to this PR.

I do not have satisfactory answers to your questions. Our build has oodles and oodles of tree artifacts that we have no problems with. This action, in particular, has broken about 5 times this year and it breaks in the same way each time.

When it breaks it's particularly painful because it corrupts the cache or workspace.

I don't know what is unique about this action that makes this failure mode more likely. Maybe it's early enough in the build graph? Something else?

If it happened more often I imagine we could figure out what is going wrong and fix it. Considering it happens so rarely, I'm not confident in finding the root cause.

Copy link
Contributor

@fmeum fmeum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the fallback, this looks good to me now if the maintainers are okay with the added complexity.

@hvadehra
Copy link
Member

if the maintainers are okay with the added complexity.

Getting rid of the extra action is nice, so I think this is a nice improvement in the long run and we can drop the java < 11 case when we drop support for java 8 (I guess that will be EOY 2030 lol).

Copy link
Member

@hvadehra hvadehra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test for this.

Analysis only tests in test/toolchains/bootclasspath_tests.bzl should be good enough. I'm thinking two test cases with a transition on the java version?

@hvadehra
Copy link
Member

I'm curious if you've been running this patch in your env already and have observed the problem go away?

@jjudd
Copy link
Author

jjudd commented Dec 17, 2025

Thanks for taking a look! I'll add a test this week.

We've been running with this change since the weekend and the problem went away for branches that have it. The remote cache was poisoned by an action sometime around Saturday morning. We didn't clear the cache in order to test this fix. Branches with the fix haven't run into the bad action.

We've observed no ill effects of using this change locally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants