
@dandavison (Contributor) commented Nov 29, 2025

What changed?

  • Fix bug: the schedule-to-close timer task validator incorrectly required the activity's attempt count at task execution time to equal its attempt count at task creation time
  • Add a schedule-to-close timeout test that fails when the bug fix is reverted
  • Do not set an empty struct as the outcome failure on attempt failure when retries are exhausted.
  • Improve doc comments

Why?

  • Standalone activity schedule-to-close timeout was broken: without this fix, it would not fire once the activity had moved past attempt 1
  • Setting an empty struct on attempt failure when retries are exhausted should not be necessary, and introducing special values that code might start to rely on is fragile.

How did you test it?

  • built
  • added new functional test(s)

Note

Decouples the schedule-to-close timeout from attempt matching, makes the task proto an empty payload, and adds a retry-based timeout test.

  • Activity execution:
    • In activity_tasks.go, relax schedule-to-close timeout validation to require only that TransitionTimedOut is possible (the attempt check is removed).
    • In statemachine.go, stop setting Attempt when scheduling ScheduleToCloseTimeoutTask.
  • Proto/Generated code:
    • Make activity.proto.v1.ScheduleToCloseTimeoutTask an empty message; regenerate Go (tasks.pb.go).
  • Tests:
    • Add TestScheduleToCloseTimeout_WithRetry to verify schedule-to-close timeout across a retry.

Written by Cursor Bugbot for commit 58f13fc.

@dandavison requested review from a team as code owners November 29, 2025 01:26
 func (l *library) Tasks() []*chasm.RegistrableTask {
 	return []*chasm.RegistrableTask{
-		chasm.NewRegistrableSideEffectTask[*Activity, *activitypb.ActivityDispatchTask](
+		chasm.NewRegistrableSideEffectTask(
Contributor Author

I removed this because these types are inferred.

Contributor

I had added it cuz my IDE had trouble inferring it, though everything compiles.

Contributor Author

Ah OK. Well shout out if you think we should keep it. I'd prefer to have the code be in line with Go rather than tracking IDE deficiencies, but then my IDE doesn't have a problem with it :)

Contributor

yea we can remove it. Just a slight annoyance for me, hopefully the IDE will address the issue soon.
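This thread turns on Go's type-argument inference: when every type parameter of a generic function can be unified from the types of its arguments, the compiler infers the type arguments, so writing them out is optional. A minimal sketch, using a hypothetical `newTask` constructor and stand-in types (not the real `chasm.NewRegistrableSideEffectTask`, whose full signature is not shown here), showing that the explicit and inferred calls produce values of the same type:

```go
package main

import "fmt"

// Hypothetical stand-ins for the component and task types; not the real chasm types.
type Activity struct{ ID string }
type ActivityDispatchTask struct{}

// registrableTask is a hypothetical analogue of a registrable task descriptor.
type registrableTask[C any, T any] struct {
	name    string
	execute func(c C, t T) error
}

// newTask is a hypothetical generic constructor. Both type parameters appear in
// the type of the execute argument, so Go can infer them from that argument.
func newTask[C any, T any](name string, execute func(c C, t T) error) registrableTask[C, T] {
	return registrableTask[C, T]{name: name, execute: execute}
}

func dispatch(a *Activity, _ *ActivityDispatchTask) error {
	fmt.Println("dispatching", a.ID)
	return nil
}

func main() {
	// Explicit type arguments, as in the removed line.
	explicit := newTask[*Activity, *ActivityDispatchTask]("dispatch", dispatch)
	// Inferred type arguments, as in the added line.
	inferred := newTask("dispatch", dispatch)

	// Both values have the same static type, so this assignment compiles.
	explicit = inferred
	_ = explicit.execute(&Activity{ID: "a1"}, &ActivityDispatchTask{})
}
```

Since both forms compile to the same type, the choice is purely stylistic; an IDE that fails to infer the types is a tooling limitation rather than a compiler one.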


-	valid := TransitionTimedOut.Possible(activity) && task.Attempt == attempt.Count
-	return valid, nil
+	return TransitionTimedOut.Possible(activity), nil
Contributor Author

This is the bug fix

Contributor

I believe you're right, since there could be multiple retry attempts within the lifetime of a single scheduleToClose task. Good catch.
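To see why the attempt check was wrong: the schedule-to-close timer task is created while attempt 1 is scheduled, but it must still fire after retries have increased the attempt count. The simplified model below uses hypothetical types (`activity`, `scheduleToCloseTask`, `timedOutPossible`) rather than the real CHASM code, and assumes `TransitionTimedOut.Possible` amounts to "the activity is not yet terminal". It contrasts the old and new validators:

```go
package main

import "fmt"

// activity is a hypothetical, simplified model of the activity state.
type activity struct {
	attemptCount int32
	terminal     bool // true once the activity has completed, failed, or timed out
}

// scheduleToCloseTask models the old task payload, which recorded the
// attempt count at the time the timer task was created.
type scheduleToCloseTask struct {
	attempt int32
}

// timedOutPossible stands in for TransitionTimedOut.Possible (assumed here to
// mean that the activity has not yet reached a terminal state).
func timedOutPossible(a *activity) bool { return !a.terminal }

// validateOld mirrors the buggy check: it also required the attempt count to
// be unchanged since the task was created.
func validateOld(a *activity, t scheduleToCloseTask) bool {
	return timedOutPossible(a) && t.attempt == a.attemptCount
}

// validateNew mirrors the fix: schedule-to-close spans all attempts, so only
// the state-machine transition matters.
func validateNew(a *activity) bool {
	return timedOutPossible(a)
}

func main() {
	// The timer task is created while attempt 1 is scheduled.
	task := scheduleToCloseTask{attempt: 1}
	a := &activity{attemptCount: 1}

	// Attempt 1 fails with a retryable error; the activity moves to attempt 2.
	a.attemptCount = 2

	// The schedule-to-close deadline is reached during attempt 2.
	fmt.Println("old validator fires:", validateOld(a, task)) // false: timeout silently dropped
	fmt.Println("new validator fires:", validateNew(a))       // true
}
```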


// ScheduleToCloseTimeoutTask is a pure task that enforces a timeout across the sequence of activity
// attempts.
message ScheduleToCloseTimeoutTask {
Contributor

@fretz12 Nov 29, 2025

I believe you can remove this altogether now. In place of the interface argument, you can use a blank parameter, _ any.

Member

Keep it because we will have validation later when we support activity resets.
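The two options in this thread can be compared in a small sketch. Go lets a function ignore a parameter by naming it `_`; the suggestion replaces the task parameter with `_ any`, whereas keeping the (now empty) message type preserves a typed parameter that later reset-related validation could read. The types and function names below are hypothetical stand-ins, not the generated protobuf or CHASM types:

```go
package main

import "fmt"

// Activity is a hypothetical, simplified activity component.
type Activity struct{ terminal bool }

// ScheduleToCloseTimeoutTask is a hypothetical stand-in for the generated,
// now empty, protobuf message. It carries no fields today.
type ScheduleToCloseTimeoutTask struct{}

// validateAny ignores its task argument entirely; any value is accepted, so
// a caller could pass the wrong task type without a compile error.
func validateAny(a *Activity, _ any) bool {
	return !a.terminal
}

// validateTyped also ignores the task today, but keeps the concrete type in
// the signature: only *ScheduleToCloseTimeoutTask values are accepted, and
// fields added to the message later can be read here by naming the parameter.
func validateTyped(a *Activity, _ *ScheduleToCloseTimeoutTask) bool {
	return !a.terminal
}

func main() {
	a := &Activity{}
	fmt.Println(validateAny(a, "not a task"))                    // true
	fmt.Println(validateTyped(a, &ScheduleToCloseTimeoutTask{})) // true
	// validateTyped(a, "not a task") would not compile.
}
```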

@fretz12 (Contributor) commented Nov 29, 2025

Thanks for catching this.

// If the activity has exhausted retries, mark the outcome failure as well but don't store duplicate failure info.
// Also reset the retry interval as there won't be any more retries.
if noRetriesLeft {
outcome.Variant = &activitypb.ActivityOutcome_Failed_{}
Contributor Author

@dandavison Nov 29, 2025

@fretz12 thanks for reviewing. After you reviewed, I added commit 0cbe6a9, which removes the set-to-empty-struct on this line. Would you mind taking a look and seeing whether you agree that it's unnecessary? I'd prefer not to do it because I don't want code relying on a special value set here; the code should be able to take the failure from the right place without any special empty value.

Contributor

@fretz12 Nov 29, 2025

I'm OK with that as long as the outcome is filled out in any GET API response once an activity has reached a terminal state, whether it comes from the Attempt or the Outcome field stored internally.

Contributor Author

@dandavison Nov 30, 2025

I've removed this change from the PR so that we can discuss it separately.
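The requirement raised in this thread, that a GET response reports the failure of a terminal activity whether it is stored on the last attempt or on the outcome, can in principle be met without a sentinel empty struct by reading the failure from wherever it actually lives. A minimal sketch with hypothetical types, statuses, and field names (`lastAttemptFailure`, `outcomeFailure`), not the real activitypb schema; where each kind of failure is stored is an assumption:

```go
package main

import "fmt"

// failure is a hypothetical stand-in for a failure proto message.
type failure struct{ message string }

// status is a hypothetical, simplified activity status.
type status int

const (
	statusRunning status = iota
	statusCompleted
	statusFailed   // retries exhausted or non-retryable failure
	statusTimedOut // e.g. schedule-to-close timeout
)

// activityState is a hypothetical view of stored state: attempt failures are
// assumed to live on the attempt, timeout failures on the outcome.
type activityState struct {
	status             status
	lastAttemptFailure *failure
	outcomeFailure     *failure
}

// failureForResponse returns the failure a GET response should report,
// reading it from where it is stored. No sentinel empty value on the outcome
// is needed to signal "look at the last attempt instead".
func failureForResponse(s activityState) *failure {
	switch s.status {
	case statusFailed:
		return s.lastAttemptFailure
	case statusTimedOut:
		return s.outcomeFailure
	default:
		// Running or completed activities report no failure, even if an
		// earlier attempt failed and was retried.
		return nil
	}
}

func main() {
	exhausted := activityState{status: statusFailed, lastAttemptFailure: &failure{"retryable failure"}}
	fmt.Println(failureForResponse(exhausted).message) // retryable failure

	timedOut := activityState{status: statusTimedOut, outcomeFailure: &failure{"schedule-to-close timeout"}}
	fmt.Println(failureForResponse(timedOut).message) // schedule-to-close timeout

	succeededAfterRetry := activityState{status: statusCompleted, lastAttemptFailure: &failure{"attempt 1 failed"}}
	fmt.Println(failureForResponse(succeededAfterRetry) == nil) // true
}
```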

Member

@bergundy left a comment

The test you added is okay but I am slightly worried that it will be flaky. I would recommend unit testing instead where you have more control over timing.


require.Error(t, err)
}

func (s *standaloneActivityTestSuite) Test_ScheduleToCloseTimeout_WithRetry() {
Member

This test could be flaky when CI is under load: 2 seconds is fairly short to guarantee that at least one attempt is issued.

Message: "Retryable failure",
FailureInfo: &failurepb.Failure_ApplicationFailureInfo{ApplicationFailureInfo: &failurepb.ApplicationFailureInfo{
NonRetryable: false,
NextRetryDelay: durationpb.New(1 * time.Second),
Member

Depending on timing, this may prevent the schedule-to-close timeout from firing: if we know there isn't enough time left for the next attempt, we avoid scheduling it.

Member

We should have a test for this behavior if we don't yet.


 // TestStartToCloseTimeout tests that a start-to-close timeout is recorded after the activity is started.
-func (s *standaloneActivityTestSuite) TestStartToCloseTimeout() {
+func (s *standaloneActivityTestSuite) Test_StartToCloseTimeout() {
Member

We typically don't put an underscore after the word Test in the codebase.

Suggested change
-func (s *standaloneActivityTestSuite) Test_StartToCloseTimeout() {
+func (s *standaloneActivityTestSuite) TestStartToCloseTimeout() {

Contributor Author

Fixed

@dandavison (Contributor Author)

> The test you added is okay but I am slightly worried that it will be flaky. I would recommend unit testing instead where you have more control over timing.

Agreed, I'm aware that some of the functional tests I've been writing involving timer tasks may be flaky. Let's address this in follow-on PRs. (They are doing their immediate job of verifying that the intended algorithm has been implemented.)

Base automatically changed from saa-id-policy to standalone-activity December 5, 2025 23:52
@dandavison force-pushed the saa-schedule-to-close-bug branch from 333da11 to 4325002 December 6, 2025 00:11
@dandavison merged commit 8ec58d8 into standalone-activity Dec 6, 2025
11 checks passed
@dandavison deleted the saa-schedule-to-close-bug branch December 6, 2025 00:21

4 participants