-
Notifications
You must be signed in to change notification settings - Fork 87
Attempt to fix missing class error by not using tree artifacts for bootclasspath #340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Attempt to fix missing class error by not using tree artifacts for bootclasspath #340
Conversation
|
Do you have any reason to believe that this issue is specific to this particular use of tree artifacts? They are used pervasively across many rulesets, so if this is a more general problem with either your setup or Bazel itself, then avoiding this one usage won't help much. Compared to regular output files, tree artifacts that fail to have their contents materialized by the action don't fail the build and can thus result in a corrupted cache entry with some or all files missing. Is it possible that your executors sometimes fail to write out the file and/or upload it? |
3b000c4 to
ea91625
Compare
…otclasspath A few times a year across many thousands of builds we encounter a rare error about the `DumpPlatformClassPath` class being missing. Our Bazel setup uses dynamic execution, builds without the bytes, remote execution + remote caching, and path mapping. The error we encounter is as follows: ``` Error: Could not find or load main class DumpPlatformClassPath Caused by: java.lang.ClassNotFoundException: DumpPlatformClassPath ``` I'm guessing that this is happening due to some kind of Bazel bug that happens with our Bazel setup and tree artifacts, i.e., declare_directory. Best I can tell this is happening because the `DumpPlatformClassPath.class` file is somehow not materializing correctly. I'm not 100% confident about that, but it's my leading hypothesis at this point in time. This commit changes the actions in `bootclasspath.bzl` to, when possible, not rely on tree artifacts. Instead, they rely JDK 11+'s ability to launch single-file programs (introduced in JEP 330). This avoids the `javac` action and `declare_directory` previously required to compile `DumpPlatformClassPath`.
ea91625 to
34809e2
Compare
|
I've updated the PR so that it has JDK 8 compatibility. It now uses the new method if JDK >= 11 and the old method if JDK < 11. I believe CI is also now passing. The one failing build appears to be due to flakiness - it failed twice for two different reasons seemingly unrelated to this PR. I do not have satisfactory answers to your questions. Our build has oodles and oodles of tree artifacts that we have no problems with. This action, in particular, has broken about 5 times this year and it breaks in the same way each time. When it breaks it's particularly painful because it corrupts the cache or workspace. I don't know what is unique about this action that makes this failure mode more likely. Maybe it's early enough in the build graph? Something else? If it happened more often I imagine we could figure out what is going wrong and fix it. Considering it happens so rarely, I'm not confident in finding the root cause. |
fmeum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the fallback, this looks good to me now if the maintainers are okay with the added complexity.
Getting rid of the extra action is nice, so I think this is a nice improvement in the long run and we can drop the java < 11 case when we drop support for java 8 (I guess that will be EOY 2030 lol). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test for this.
Analysis only tests in test/toolchains/bootclasspath_tests.bzl should be good enough. I'm thinking two test cases with a transition on the java version?
|
I'm curious if you've been running this patch in your env already and have observed the problem go away? |
|
Thanks for taking a look! I'll add a test this week. We've been running with this change since the weekend and the problem went away for branches that have it. The remote cache was poisoned by an action sometime around Saturday morning. We didn't clear the cache in order to test this fix. Branches with the fix haven't run into the bad action. We've observed no ill effects of using this change locally. |
A few times a year across many thousands of builds we encounter a rare error about the
DumpPlatformClassPathclass being missing. Our Bazel setup uses dynamic execution, builds without the bytes, remote execution + remote caching, and path mapping.The error we encounter is as follows:
I'm guessing that this is happening due to some kind of Bazel bug that happens with our Bazel setup and tree artifacts, i.e., declare_directory.
Best I can tell this is happening because the
DumpPlatformClassPath.classfile is somehow not materializing correctly. I'm not 100% confident about that, but it's my leading hypothesis at this point in time.This commit changes the actions in
bootclasspath.bzlto not rely on tree artifacts. Instead, they rely JDK 11+'s ability to launch single-file programs (introduced in JEP 330). This avoids thejavacaction anddeclare_directorypreviously required to compileDumpPlatformClassPath.Problem is this makes rules_java not compatible with JDK's older than 11.
I'm very open to alternative solutions to this, but I haven't yet come up with a robust, cross platform solution that avoids tree artifacts while also maintaining compatibility with JDKs older than 11.
I wanted to open this PR to get some discussion going.