-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Fix bug in CHASM activity schedule-to-close timer task validation #8720
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
chasm/lib/activity/library.go
Outdated
| func (l *library) Tasks() []*chasm.RegistrableTask { | ||
| return []*chasm.RegistrableTask{ | ||
| chasm.NewRegistrableSideEffectTask[*Activity, *activitypb.ActivityDispatchTask]( | ||
| chasm.NewRegistrableSideEffectTask( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this because these types are inferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had added it cuz my IDE had trouble inferring it, though everything compiles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK. Well shout out if you think we should keep it. I'd prefer to have the code be in line with Go rather than tracking IDE deficiencies, but then my IDE doesn't have a problem with it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea we can remove it. Just a slight annoyance for me, hopefully the IDE will address the issue soon.
|
|
||
| valid := TransitionTimedOut.Possible(activity) && task.Attempt == attempt.Count | ||
| return valid, nil | ||
| return TransitionTimedOut.Possible(activity), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the bug fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you're right since there could be multiple retry attempts within a scheduleToClose tasks. Good catch.
|
|
||
| valid := TransitionTimedOut.Possible(activity) && task.Attempt == attempt.Count | ||
| return valid, nil | ||
| return TransitionTimedOut.Possible(activity), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you're right since there could be multiple retry attempts within a scheduleToClose tasks. Good catch.
chasm/lib/activity/library.go
Outdated
| func (l *library) Tasks() []*chasm.RegistrableTask { | ||
| return []*chasm.RegistrableTask{ | ||
| chasm.NewRegistrableSideEffectTask[*Activity, *activitypb.ActivityDispatchTask]( | ||
| chasm.NewRegistrableSideEffectTask( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had added it cuz my IDE had trouble inferring it, though everything compiles.
|
|
||
| // ScheduleToCloseTimeoutTask is a pure task that enforces a timeout across the sequence of activity | ||
| // attempts. | ||
| message ScheduleToCloseTimeoutTask { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you can remove this altogether now. In place of the interface arg you can use _ any
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep it because we will have validation later when we support activity resets.
|
Thanks for catching this. |
| // If the activity has exhausted retries, mark the outcome failure as well but don't store duplicate failure info. | ||
| // Also reset the retry interval as there won't be any more retries. | ||
| if noRetriesLeft { | ||
| outcome.Variant = &activitypb.ActivityOutcome_Failed_{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fretz12 thanks for reviewing. After you reviewed, I added this commit which gets rid of this set-to-empty-struct on this line. Would you mind looking and seeing if you agree that it's unnecessary? I'd prefer not to do it because I don't want code relying on a special value set here -- I feel that the code should just be able to take the failure from the right place without any special empty value here. 0cbe6a9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with that as long as the outcome is filled out on any GET API responses if an activity has reached terminal state, whether from the Attempt or Outcome field stored internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed this change from the PR so that we can discuss it separately.
bergundy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test you added is okay but I am slightly worried that it will be flaky. I would recommend unit testing instead where you have more control over timing.
|
|
||
| // ScheduleToCloseTimeoutTask is a pure task that enforces a timeout across the sequence of activity | ||
| // attempts. | ||
| message ScheduleToCloseTimeoutTask { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep it because we will have validation later when we support activity resets.
tests/standalone_activity_test.go
Outdated
| require.Error(t, err) | ||
| } | ||
|
|
||
| func (s *standaloneActivityTestSuite) Test_ScheduleToCloseTimeout_WithRetry() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test could be flaky when CI is under load.
2 seconds is fairly short to ensure at least one attempt is issued.
| Message: "Retryable failure", | ||
| FailureInfo: &failurepb.Failure_ApplicationFailureInfo{ApplicationFailureInfo: &failurepb.ApplicationFailureInfo{ | ||
| NonRetryable: false, | ||
| NextRetryDelay: durationpb.New(1 * time.Second), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on timing, this may prevent the schedule to close timeout from firing because we would know there's not enough time for the next attempt and avoid scheduling it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have a test for this behavior if we don't yet.
tests/standalone_activity_test.go
Outdated
|
|
||
| // TestStartToCloseTimeout tests that a start-to-close timeout is recorded after the activity is started. | ||
| func (s *standaloneActivityTestSuite) TestStartToCloseTimeout() { | ||
| func (s *standaloneActivityTestSuite) Test_StartToCloseTimeout() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We typically don't put an underscore after the word Test in the codebase.
| func (s *standaloneActivityTestSuite) Test_StartToCloseTimeout() { | |
| func (s *standaloneActivityTestSuite) TestStartToCloseTimeout() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Agreed, I'm aware that some of the functional tests I've been writing involving timer tasks may be flaky. Let's address this in follow-on PRs. (They are doing their immediate job of verifying that the intended algorithm has been implemented.) |
333da11 to
4325002
Compare
What changed?
Why?
How did you test it?
Note
Decouples schedule-to-close timeout from attempt matching, updates proto to empty task payload, and adds a retry-based timeout test.
schedule-to-closetimeout validation to only requireTransitionTimedOutto be possible (remove attempt check) inactivity_tasks.go.ScheduleToCloseTimeoutTask, stop settingAttemptinstatemachine.go.activity.proto.v1.ScheduleToCloseTimeoutTaskan empty message; regenerate Go (tasks.pb.go).TestScheduleToCloseTimeout_WithRetryto verify schedule-to-close timeout across a retry.Written by Cursor Bugbot for commit 58f13fc. This will update automatically on new commits. Configure here.