Skip to content

Conversation

@grutt
Copy link
Contributor

@grutt grutt commented Jun 12, 2025

Description

This addresses the race condition where multiple listeners can interfere with each other's active status

Race condition scenario:

  1. Worker A establishes ListenV2 connection, sets worker active=true with timestamp T1
  2. Worker B establishes ListenV2 connection, sets worker active=true with timestamp T2 (T2 > T1)
  3. Worker A disconnects, sets worker active=false with timestamp T1
  4. Since T1 <= T2, the database update succeeds and worker is marked inactive
  5. Worker B is still connected but worker appears inactive, causing heartbeat rejections

This polling goroutine ensures that as long as ANY ListenV2 connection is active,
the worker status remains active=true, preventing the race condition

Type of change

  • Bug fix (non-breaking change which fixes an issue)

@grutt grutt requested a review from Copilot June 12, 2025 12:32
@vercel
Copy link

vercel bot commented Jun 12, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
hatchet-docs Ready Ready Preview Comment Sep 11, 2025 3:43pm
hatchet-v0-docs Ready Ready Preview Comment Sep 11, 2025 3:43pm

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a background polling goroutine in ListenV2 to keep a worker’s active status true whenever any connection remains open, preventing unintended deactivation due to timestamp races.

  • Starts a ticker-based goroutine that checks and reactivates a worker every 2 seconds
  • Logs debug details on poller shutdown and status updates
  • Handles database errors gracefully during polling
Comments suppressed due to low confidence (1)

internal/services/dispatcher/server.go:507

  • Add unit or integration tests for this new polling behavior, verifying that the worker remains active while any listener is connected and that status updates use the correct timestamp ordering.
go func() {

// If worker is not active, set it to active
if !currentWorker.IsActive {
s.l.Debug().Msgf("worker %s active status poller: setting worker to active", request.WorkerId)
_, err := s.repo.Worker().UpdateWorkerActiveStatus(ctx, tenantId, request.WorkerId, true, sessionEstablished)
Copy link

Copilot AI Jun 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The polling goroutine reuses the original sessionEstablished timestamp, which can be older than later updates and reintroduce the race condition. Use the current time (e.g., time.Now()) for each update to ensure the timestamp always increases.

Suggested change
_, err := s.repo.Worker().UpdateWorkerActiveStatus(ctx, tenantId, request.WorkerId, true, sessionEstablished)
_, err := s.repo.Worker().UpdateWorkerActiveStatus(ctx, tenantId, request.WorkerId, true, time.Now())

Copilot uses AI. Check for mistakes.
// This polling goroutine ensures that as long as ANY ListenV2 connection is active,
// the worker status remains active=true, preventing the race condition
go func() {
ticker := time.NewTicker(2 * time.Second) // 2x heartbeat interval
Copy link

Copilot AI Jun 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Avoid hardcoding the polling interval. Extract 2 * time.Second into a named constant or configuration to ensure it stays in sync with your heartbeat settings.

Suggested change
ticker := time.NewTicker(2 * time.Second) // 2x heartbeat interval
ticker := time.NewTicker(pollingInterval) // 2x heartbeat interval

Copilot uses AI. Check for mistakes.
s.workers.DeleteForSession(request.WorkerId, sessionId)
}()

// HOTFIX: Start a background goroutine to poll and ensure worker stays active
Copy link

Copilot AI Jun 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] This large inline comment could be moved to a dedicated helper or summarized more concisely to improve readability. Consider extracting the poller into its own function with a clear docstring.

Copilot uses AI. Check for mistakes.
abelanger5
abelanger5 previously approved these changes Jun 12, 2025
// This addresses the race condition where multiple listeners can interfere with each other's active status
//
// Race condition scenario:
// 1. Worker A establishes ListenV2 connection, sets worker active=true with timestamp T1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would update this comment to indicate that it is the same worker that's establishing the connection, this is incorrect as written

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants