Skip to content

Conversation

@khsrali
Copy link
Contributor

@khsrali khsrali commented Dec 1, 2025

Fix #6915
In StashCalculation, stashing could begin before the parent CalcJob that created the RemoteData source has finished execution.

Previously, StashCalculation could start immediately after being submitted, even if the CalcJob that created its source_node (RemoteData) was still running. This created a race condition where the stash operation might attempt to copy files before they were fully written or before the parent job had completed.

Solution

  • Added an awaitables list attribute to CalcJob to track PKs of CalcJobs that must complete before proceeding (generic solution)
  • Modified the Waiting state in tasks.py to check awaitables before executing transport tasks - if any awaitable is not in a terminal state (FINISHED, KILLED, EXCEPTED), the process sleeps for 5 seconds before retrying

@khsrali khsrali requested a review from agoscinski December 1, 2025 16:23
@codecov
Copy link

codecov bot commented Dec 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.59%. Comparing base (cd11f08) to head (0d060b7).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7115      +/-   ##
==========================================
+ Coverage   79.58%   79.59%   +0.02%     
==========================================
  Files         566      566              
  Lines       43517    43538      +21     
==========================================
+ Hits        34629    34651      +22     
+ Misses       8888     8887       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@GeigerJ2
Copy link
Contributor

GeigerJ2 commented Dec 2, 2025

As this PR touches the CalcJob (and therefore the entire AiiDA ecosystem), I'm pinging AiiDA veteran @mbercx here to have a look, as well, as I wouldn't feel comfortable myself making any decisions here without actually having used AiiDA in the past 2 years.

EDIT (from in-person discussion with @agoscinski): WorkChains did already have the concept of awaitables, so I guess the aiida-core native way to solve the issue of the StashCalculation not waiting for the previous calculation (and the data it produces that should actually be stashed), would be to wrap the two calculations in a WorkChain / workfunction. aiida-workgraph has actual data dependencies, so it would not be an issue there. Wondering if, conceptually, we want to add the "awaitables" feature to calculations, as well. Haven't thought about this enough. Finally, wondered how this only surfaced now, as the race condition would make any StashCalculation unusable, as they would never wait for their dependent CalcJob, no?

@mbercx
Copy link
Member

mbercx commented Dec 3, 2025

Thanks for the ping @GeigerJ2! I'm afraid I'd also have to spend a bit of time to better understand the issue and the solution here, which this week is tricky.

Looking at the changelog section that describes the StashCalculation, its use case seems to stash results for CalcJobs post-execution. I.e. the user realises that they want to stash some output files after they have already run a calculation. So I'm not sure that this fix is necessary? If a user knows a-priori that they want to stash certain files, they can just add this to the options, no?

If we do want the use case for StashCalculation to focus on already completed CalcJobs, we can also add some validation to the source_node to check that its creator is finished.

In any case, do we need to hold off on v2.7.2 to fix this? Well, at least the issue is still in the v2.7.2 project, this PR is in the v2.8.0 one, so I'm not sure. :) I'd like to have a closer look and give more thoughtful feedback, but will only have time to do so next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

StashCalculation does not wait until specified CalcJob finished

3 participants