@gow gow commented Nov 24, 2025

What changed?

  • Prevents workflow task generation while the workflow is paused (e.g., on signals, activity completions, and child workflow completions).
  • Prevents activity task generation when a workflow is paused and an in-flight workflow task comes back with start-activity commands.
  • Prevents eager activity starts when the workflow is paused.

Why?

When a workflow is paused, we should not generate any new activity or workflow tasks.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Note

Prevents workflow/activities from scheduling or being updated while paused, adds guards across APIs and task generation, and introduces tests to verify paused behavior.

  • Behavior changes (pause-aware):
    • RespondWorkflowTaskCompleted: reject ForceCreateNewWorkflowTask if ms.IsWorkflowExecutionStatusPaused().
    • Activity scheduling in workflow_task_completed_handler: bypass activity task generation and disable eager start when paused.
    • signalWithStartWorkflow: skip scheduling a new workflow task when paused.
    • Update workflow APIs: UpdateWorkflow rejects updates if paused; shared update util only schedules WFT if not paused.
    • NDC reapplication: do not schedule WFT when paused.
    • Task generation: GenerateScheduleWorkflowTaskTasks and ScheduleWorkflowTask no-op when paused.
    • Mutable state: add IsWorkflowExecutionStatusPaused() implementation and interface plumbing.
  • Tests:
    • Unit tests for all pause guards (respond WFT completed, signal-with-start, NDC reapply, activity failures, update workflow).
    • Transfer queue test ensures that a paused parent doesn’t schedule a parent WFT when starting a child.
    • Functional tests validate pause/unpause flow, query behavior, and absence of WFTs between pause/unpause.

Written by Cursor Bugbot for commit 69e0e02.

@gow gow requested review from a team as code owners November 24, 2025 06:12
@gow gow requested review from spkane31 and yycptt November 24, 2025 06:13
err = s.SdkClient().SignalWorkflow(ctx, workflowID, runID, s.testEndSignal, "signal to complete the workflow")
s.NoError(err)

time.Sleep(2 * time.Second) // wait 2 seconds to give enough time to record the signal.

Is the sleep required with the s.EventuallyWithT below?

There's also a failure in tests that needs to be fixed but lgtm

Comment on lines 513 to 507
} else { // if not we bypass activity task generation if eager start activity is requested.
	bypassActivityTaskGeneration = eagerStartActivity
}

nit: looks like the else part is a duplication of L509.

- // Create a transfer task to schedule a workflow task
- if !mutableState.HasPendingWorkflowTask() {
+ // Create a transfer task to schedule a workflow task only if the workflow is in running status and there is no pending workflow task.
+ if !mutableState.HasPendingWorkflowTask() && mutableState.GetExecutionState().GetStatus() == enumspb.WORKFLOW_EXECUTION_STATUS_RUNNING {

I don't think workflow tasks scheduled on the task processing side go through this code path.

@gow gow Nov 24, 2025

Hmm, maybe intercepting in the task generator is better? Specifically in TaskGeneratorImpl.GenerateScheduleWorkflowTaskTasks().

Handled all call sites of mutableState.AddWorkflowTaskScheduledEvent() individually (there were about 7-8 places).

@gow gow force-pushed the cg/wf_pause_4_prevent_wft branch from c293cae to 3eba1df Compare December 5, 2025 00:12
@gow gow marked this pull request as draft December 5, 2025 00:12

gow commented Dec 5, 2025

This needs more work. Moving it back to draft.

@gow gow force-pushed the cg/wf_pause_4_prevent_wft branch 3 times, most recently from 6e866e3 to 2362882 Compare December 5, 2025 22:29
@gow gow marked this pull request as ready for review December 5, 2025 22:32
@gow gow force-pushed the cg/wf_pause_4_prevent_wft branch from 2362882 to 93c915a Compare December 5, 2025 22:34
@gow gow force-pushed the cg/wf_pause_4_prevent_wft branch from 93c915a to 69e0e02 Compare December 8, 2025 20:15

@cursor cursor bot left a comment

Bug: Missing pause check allows WFT scheduling event without task

When a workflow task completes while the workflow is paused, and conditions trigger scheduling a new workflow task (like buffered events at line 504 or wtFailedShouldCreateNewTask), the code at lines 562/568 calls AddWorkflowTaskScheduledEvent without checking pause status. This writes a WorkflowTaskScheduled event to history, but GenerateScheduleWorkflowTaskTasks skips transfer task generation due to the pause check. When the workflow is later unpaused, HasPendingWorkflowTask() returns true, so no new task is scheduled and no transfer task is generated for the existing pending task. This could leave the workflow stuck. The condition at lines 500-518 should also check IsWorkflowExecutionStatusPaused() to prevent scheduling when paused.
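The failure mode described above can be reproduced on a toy model. The sketch below is not the server's code: the Workflow struct, its fields, and the unpause behavior are assumptions taken from the description in this comment (the task generator skips while paused, and unpause schedules a workflow task only if none is pending). The guardPause flag toggles the caller-side pause check that the comment recommends.

```go
package main

import "fmt"

// Workflow is a toy model of the state involved in the reported bug.
type Workflow struct {
	Paused        bool
	PendingWFT    bool // models HasPendingWorkflowTask()
	History       []string
	TransferTasks []string
}

// generateTransferTask models the task generator, which skips while paused.
func (w *Workflow) generateTransferTask() {
	if w.Paused {
		return
	}
	w.TransferTasks = append(w.TransferTasks, "schedule_wft")
}

// addScheduledEvent models writing a WorkflowTaskScheduled event and then
// asking the generator for a transfer task. With guardPause set, the caller
// checks the pause status first and writes nothing while paused.
func (w *Workflow) addScheduledEvent(guardPause bool) {
	if guardPause && w.Paused {
		return
	}
	w.History = append(w.History, "WorkflowTaskScheduled")
	w.PendingWFT = true
	w.generateTransferTask()
}

// unpause models the assumed unpause path: schedule a workflow task only
// if there is no pending one.
func (w *Workflow) unpause() {
	w.Paused = false
	if !w.PendingWFT {
		w.addScheduledEvent(true)
	}
}

// stuck reports a pending workflow task with no transfer task to dispatch it.
func (w *Workflow) stuck() bool {
	return w.PendingWFT && len(w.TransferTasks) == 0
}

// run completes a workflow task while paused, then unpauses.
func run(guardPause bool) *Workflow {
	w := &Workflow{Paused: true}
	w.addScheduledEvent(guardPause)
	w.unpause()
	return w
}

func main() {
	fmt.Println("stuck without caller-side guard:", run(false).stuck())
	fmt.Println("stuck with caller-side guard:", run(true).stuck())
}
```

Without the guard, the event is written while the transfer task is skipped, so nothing ever dispatches the pending task. With the guard, nothing is written while paused and the unpause path schedules the task normally.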

service/history/api/respondworkflowtaskcompleted/api.go#L499-L568

newWorkflowTaskType := enumsspb.WORKFLOW_TASK_TYPE_UNSPECIFIED
if ms.IsWorkflowExecutionRunning() {
	if request.GetForceCreateNewWorkflowTask() || // Heartbeat WT is always of Normal type.
		wtFailedShouldCreateNewTask ||
		hasBufferedEventsOrMessages ||
		activityNotStartedCancelled ||
		// If the workflow has an ongoing transition to another deployment version, we should ensure
		// it has a pending wft so it does not remain in the transition phase for long.
		ms.GetDeploymentTransition() != nil {
		newWorkflowTaskType = enumsspb.WORKFLOW_TASK_TYPE_NORMAL
	} else if updateRegistry.HasOutgoingMessages(true) {
		// There shouldn't be any sent updates in the registry because
		// all sent but not processed updates were rejected by server.
		// Therefore, it doesn't matter if to includeAlreadySent or not.
		newWorkflowTaskType = enumsspb.WORKFLOW_TASK_TYPE_SPECULATIVE
	}
}
bypassTaskGeneration := request.GetReturnNewWorkflowTask() && wtFailedCause == nil
// TODO (alex-update): All current SDKs always set ReturnNewWorkflowTask to true
// which means that server always bypass task generation if WFT didn't fail.
// ReturnNewWorkflowTask flag needs to be removed.
if newWorkflowTaskType == enumsspb.WORKFLOW_TASK_TYPE_SPECULATIVE && !bypassTaskGeneration {
	// If task generation can't be bypassed (i.e. WFT has failed),
	// WFT must be created as Normal because speculative WFT by nature skips task generation.
	newWorkflowTaskType = enumsspb.WORKFLOW_TASK_TYPE_NORMAL
}
var newWorkflowTask *historyi.WorkflowTaskInfo
// Speculative workflow task will be created after mutable state is persisted.
if newWorkflowTaskType == enumsspb.WORKFLOW_TASK_TYPE_NORMAL {
	versioningStamp := request.WorkerVersionStamp
	if versioningStamp.GetUseVersioning() {
		if ms.GetAssignedBuildId() == "" {
			// old versioning is used. making sure the versioning stamp does not go through otherwise the
			// workflow will start using new versioning which may surprise users.
			// TODO: remove this block when deleting old wv [cleanup-old-wv]
			versioningStamp = nil
		} else {
			// new versioning is used. do not return new wft to worker if stamp build ID does not match wf build ID
			// let the task go through matching and get dispatched to the right worker
			if versioningStamp.GetBuildId() != ms.GetAssignedBuildId() {
				bypassTaskGeneration = false
			}
		}
	}
	if ms.GetDeploymentTransition() != nil {
		// Do not return new wft to worker if the workflow is transitioning to a different deployment version.
		// Let the task go through matching and get dispatched to the right worker
		bypassTaskGeneration = false
	}
	var newWTErr error
	// If we checked WT heartbeat timeout before and WT wasn't timed out,
	// then OriginalScheduledTime needs to be carried over to the new WT.
	if checkWTHeartbeatTimeout && !wtHeartbeatTimedOut {
		newWorkflowTask, newWTErr = ms.AddWorkflowTaskScheduledEventAsHeartbeat(
			bypassTaskGeneration,
			timestamppb.New(currentWorkflowTask.OriginalScheduledTime),
			enumsspb.WORKFLOW_TASK_TYPE_NORMAL, // Heartbeat workflow task is always of Normal type.
		)
	} else {
		newWorkflowTask, newWTErr = ms.AddWorkflowTaskScheduledEvent(bypassTaskGeneration, newWorkflowTaskType)

isPaused := false
for hist.HasNext() {
	event, err := hist.Next()
	s.NoError(err)

Bug: Test assertion uses wrong context inside EventuallyWithT

In assertWorkflowIsPaused, line 520 uses s.NoError(err) while all other assertions in the function correctly use require.NoError(t, err) with the passed *assert.CollectT. This function is called inside EventuallyWithT, which expects assertions to use the provided CollectT for retry logic. Using s.NoError() will cause the test to fail immediately on error rather than allowing EventuallyWithT to retry, leading to potentially flaky tests or incorrect failure behavior.

@yycptt yycptt left a comment

I think task_refresher needs to be updated as well so that it does not regenerate activity and workflow tasks. And yeah, I think I'd prefer that the logic live in the refresher rather than the task generator, which keeps the generator logic simple and straightforward.

	return nil, serviceerror.NewNotFound("Workflow task not found.")
}

// We don't accept the request to create a new workflow task if the workflow is paused.

Still seems weird/inconsistent to me that if the workflow task is not heartbeating we accept the result, but otherwise we drop the response and let it time out. But I understand this is what's been agreed on.

Comment on lines +428 to +430
if r.mutableState.GetExecutionState().Status == enumspb.WORKFLOW_EXECUTION_STATUS_PAUSED {
	return nil // we bypass task generation if the workflow is paused.
}

Hmm, I don't think you need this? The upper layer should have already prevented the workflow task from being created in mutable state in the first place.
