fix: enable Gunicorn worker recycling with graceful analytics flush #6762

Closed

gagantrivedi wants to merge 1 commit into main from fix/gunicorn-max-requests-graceful-flush

Conversation

@gagantrivedi (Member) commented Feb 24, 2026

Contributes to https://github.com/Flagsmith/pulumi/issues/162


Summary

  • Enable Gunicorn --max-requests (default 1000) and --max-requests-jitter (default 100) to recycle workers periodically, mitigating memory leaks
  • Add atexit handler to flush in-process analytics caches (APIUsageCache, FeatureEvaluationCache) via the task processor before a worker exits, preventing data loss during recycling
  • Rename internal flush methods for clarity: _flush_through_thread (hot path) vs _flush_through_task_processor (shutdown path)
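The shutdown path in the last two bullets can be sketched as follows. This is a minimal illustration with simplified cache internals and the task-processor enqueue stubbed out; the real `APIUsageCache` and `FeatureEvaluationCache` live in the Flagsmith codebase and differ in detail:

```python
import atexit
import logging

logger = logging.getLogger(__name__)


class APIUsageCache:
    """Simplified stand-in for the real in-process analytics cache."""

    def __init__(self) -> None:
        self._cache: dict = {}
        self.flushed = False

    def _flush_through_task_processor(self) -> None:
        # Shutdown path: enqueue via the task processor (.delay()) rather
        # than .run_in_thread(), which is unsafe during interpreter teardown.
        for key, value in self._cache.items():
            pass  # track_request.delay(kwargs={...}) in the real code
        self._cache = {}

    def flush_on_shutdown(self) -> None:
        self._flush_through_task_processor()
        self.flushed = True


api_usage_cache = APIUsageCache()
feature_evaluation_cache = APIUsageCache()  # stand-in; a separate class in reality


def flush_analytics_caches() -> None:
    # Registered via atexit in AppAnalyticsConfig.ready(); swallows
    # exceptions so one failing cache cannot abort the other flushes.
    for cache in (api_usage_cache, feature_evaluation_cache):
        try:
            cache.flush_on_shutdown()
        except Exception:
            logger.exception("Failed to flush analytics cache on shutdown")


atexit.register(flush_analytics_caches)
```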

Context

Without --max-requests, Gunicorn workers never recycle, so leaked memory accumulates indefinitely. Enabling worker recycling requires flushing any buffered analytics data before exit; otherwise, counts are silently lost. The shutdown flush uses .delay() (task queue enqueue) rather than .run_in_thread() to avoid thread-safety issues during Python interpreter teardown.

Both GUNICORN_MAX_REQUESTS and GUNICORN_MAX_REQUESTS_JITTER remain configurable via environment variables. Setting GUNICORN_MAX_REQUESTS=0 disables recycling entirely.
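Assuming a typical entrypoint script translates these variables into CLI flags (the script itself is not shown in this PR, so the exact mapping is an assumption), the logic might look like:

```shell
# Hypothetical entrypoint sketch: map env vars to Gunicorn flags.
GUNICORN_MAX_REQUESTS="${GUNICORN_MAX_REQUESTS:-1000}"
GUNICORN_MAX_REQUESTS_JITTER="${GUNICORN_MAX_REQUESTS_JITTER:-100}"

EXTRA_ARGS=""
if [ "${GUNICORN_MAX_REQUESTS}" != "0" ]; then
    # GUNICORN_MAX_REQUESTS=0 disables worker recycling entirely
    EXTRA_ARGS="--max-requests ${GUNICORN_MAX_REQUESTS} --max-requests-jitter ${GUNICORN_MAX_REQUESTS_JITTER}"
fi

echo "gunicorn app.wsgi:application ${EXTRA_ARGS}"
```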

Test plan

  • Unit tests for flush_on_shutdown on both cache classes (populated and empty)
  • Unit tests for flush_analytics_caches atexit handler (happy path and exception handling)
  • Unit test for AppAnalyticsConfig.ready() atexit registration
  • Manual verification: ran Gunicorn with --max-requests 3, confirmed atexit handler fires and logs flush on worker recycle

fix: enable Gunicorn worker recycling with graceful analytics flush

Enable --max-requests (default 1000) and --max-requests-jitter (default
100) for Gunicorn workers to mitigate memory leaks. Add atexit handler
to flush in-process analytics caches via the task processor before a
worker exits, preventing data loss during recycling.
@gagantrivedi gagantrivedi requested a review from a team as a code owner February 24, 2026 08:01
@gagantrivedi gagantrivedi requested review from Zaimwa9 and removed request for a team February 24, 2026 08:01
@vercel bot commented Feb 24, 2026

The latest updates on your projects.

3 Skipped Deployments

Project                      Deployment  Actions  Updated (UTC)
docs                         Ignored     Ignored  Feb 24, 2026 8:01am
flagsmith-frontend-preview   Ignored     Ignored  Feb 24, 2026 8:01am
flagsmith-frontend-staging   Ignored     Ignored  Feb 24, 2026 8:01am


github-actions bot added the api (Issue related to the REST API) and fix labels on Feb 24, 2026
@github-actions bot (Contributor) commented Feb 24, 2026

Docker builds report

Image                                              Build Status  Security report
ghcr.io/flagsmith/flagsmith-e2e:pr-6762            Finished ✅    Skipped
ghcr.io/flagsmith/flagsmith-api-test:pr-6762       Finished ✅    Skipped
ghcr.io/flagsmith/flagsmith-frontend:pr-6762       Finished ✅    Results
ghcr.io/flagsmith/flagsmith-api:pr-6762            Finished ✅    Results
ghcr.io/flagsmith/flagsmith-private-cloud:pr-6762  Finished ✅    Results

@codecov bot commented Feb 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.26%. Comparing base (deee405) to head (30c0df1).
⚠️ Report is 32 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6762   +/-   ##
=======================================
  Coverage   98.25%   98.26%           
=======================================
  Files        1312     1313    +1     
  Lines       48568    48642   +74     
=======================================
+ Hits        47722    47796   +74     
  Misses        846      846           


name = "app_analytics"

def ready(self) -> None:
    atexit.register(flush_analytics_caches)
Member commented:
Is gunicorn's worker_exit hook more suitable for this?

@gagantrivedi (Member, Author) replied Feb 25, 2026:

I don't see much difference between them for our use case. Can you elaborate a bit more on your reasoning?

Member replied:

IMO it'll better document our intent as we do not need to force-flush outside of worker context, and lower the mental fatigue of mapping out the worker lifecycle as we already manage it here.

On a technical level, worker_exit is a stronger guarantee that the code will run when we need it to run (i.e. when a worker is marked for recycling).
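For reference, the suggested hook would look roughly like this in a gunicorn.conf.py. worker_exit(server, worker) is a real Gunicorn server hook that runs in the worker process as it exits; the flush function here is a stub standing in for the app's handler:

```python
# gunicorn.conf.py sketch of the suggestion above.

flushed = []  # test scaffolding; the real handler talks to the task processor


def flush_analytics_caches() -> None:
    # Stub standing in for app_analytics' actual flush handler.
    flushed.append(True)


def worker_exit(server, worker):
    # Gunicorn calls this in the worker process just before it exits,
    # including when the worker hits max_requests and is recycled.
    flush_analytics_caches()
```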

@Zaimwa9 Zaimwa9 removed their request for review February 25, 2026 08:10
@Zaimwa9 Zaimwa9 assigned khvn26 and unassigned Zaimwa9 Feb 25, 2026
Comment on lines +43 to +54
for key, value in self._cache.items():
    track_request.delay(
        kwargs={
            "resource": key.resource.value,
            "host": key.host,
            "environment_key": key.environment_key,
            "count": value,
            "labels": dict(key.labels),
        }
    )
self._cache = {}
self._last_flushed_at = timezone.now()
Contributor commented:

Couldn't we defer the iteration to the task processor by just sending over self._cache itself (as JSON, for example)? I don't know how big this cache could be, but I don't love the idea of creating an indefinite number of tasks on a regular basis.

@gagantrivedi (Member, Author) replied:

Yeah, good shout! I will add a task for bulk tracking
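A bulk variant along those lines might hand the whole cache to a single task. This is a sketch: the bulk task name and payload shape are assumptions, and the task processor is stubbed so the example is self-contained:

```python
from dataclasses import dataclass, field


@dataclass
class _StubTask:
    """Records .delay() calls; stands in for a task-processor task."""

    calls: list = field(default_factory=list)

    def delay(self, kwargs: dict) -> None:
        self.calls.append(kwargs)


track_request_bulk = _StubTask()  # hypothetical bulk tracking task


def flush_cache_bulk(cache: dict) -> None:
    # One task for the whole cache: O(1) enqueues instead of O(n),
    # regardless of how many distinct keys accumulated between flushes.
    if cache:
        track_request_bulk.delay(
            kwargs={
                "requests": [
                    {"key": key, "count": count} for key, count in cache.items()
                ]
            }
        )
    cache.clear()
```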

@gagantrivedi (Member, Author) commented:

After further thought, --max-requests would just mask underlying issues rather than address them properly. We should be more proactive about identifying and resolving the root causes. We've tracked the issues we've found so far in https://github.com/Flagsmith/pulumi/issues/162 and will work through those. Closing this in favour of that approach.

Labels

api (Issue related to the REST API), fix

4 participants