fix: enable Gunicorn worker recycling with graceful analytics flush (#6762)
gagantrivedi wants to merge 1 commit into main from
Conversation
Enable `--max-requests` (default 1000) and `--max-requests-jitter` (default 100) for Gunicorn workers to mitigate memory leaks. Add an `atexit` handler to flush in-process analytics caches via the task processor before a worker exits, preventing data loss during recycling.
Codecov Report: ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##             main    #6762   +/-   ##
=======================================
  Coverage   98.25%   98.26%
=======================================
  Files        1312     1313     +1
  Lines       48568    48642    +74
=======================================
+ Hits        47722    47796    +74
  Misses        846      846
name = "app_analytics"

def ready(self) -> None:
    atexit.register(flush_analytics_caches)
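The `atexit.register` call above relies on handlers running at normal interpreter shutdown, after the program's own work finishes. A standalone sketch of that ordering (not the PR's code):

```python
import atexit
import subprocess
import sys

# Run a child interpreter that registers an atexit handler and then exits
# normally; the handler's output appears after the program's own output.
child = (
    "import atexit\n"
    "atexit.register(lambda: print('flushed caches'))\n"
    "print('handled requests')\n"
)
result = subprocess.run(
    [sys.executable, "-c", child], capture_output=True, text=True
)
print(result.stdout, end="")  # prints 'handled requests' then 'flushed caches'
```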
Is gunicorn's worker_exit hook more suitable for this?
I don't see much difference between them for our use case. Can you elaborate a bit more on your reasoning?
IMO it'll better document our intent as we do not need to force-flush outside of worker context, and lower the mental fatigue of mapping out the worker lifecycle as we already manage it here.
On a technical level, worker_exit is a stronger guarantee that the code will run when we need it to run (i.e. when a worker is marked for recycling).
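The `worker_exit` alternative discussed above would live in the Gunicorn config rather than in Django app setup. A sketch, assuming a hypothetical `flush_analytics_caches` helper (the import path is an assumption, not the PR's code):

```python
# gunicorn.conf.py -- sketch of the worker_exit hook alternative.
max_requests = 1000
max_requests_jitter = 100

def worker_exit(server, worker):
    """Called in the worker process just after it exits, including when
    the worker is recycled after hitting max_requests."""
    # Hypothetical import path for the flush helper discussed in this PR.
    from app_analytics.cache import flush_analytics_caches
    flush_analytics_caches()
```

Because the hook runs in the worker process, it sees the same in-process caches that the `atexit` handler would.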
for key, value in self._cache.items():
    track_request.delay(
        kwargs={
            "resource": key.resource.value,
            "host": key.host,
            "environment_key": key.environment_key,
            "count": value,
            "labels": dict(key.labels),
        }
    )
self._cache = {}
self._last_flushed_at = timezone.now()
Couldn't we defer the iteration to the task processor, e.g. by sending over `self._cache` itself as JSON? I don't know how big this cache could be, but I don't love the idea of creating an indefinite number of tasks on a regular basis.
Yeah, good shout! I will add a task for bulk tracking
After further thought,
Contributes to https://github.com/Flagsmith/pulumi/issues/162
Summary
- `--max-requests` (default 1000) and `--max-requests-jitter` (default 100) to recycle workers periodically, mitigating memory leaks
- `atexit` handler to flush in-process analytics caches (`APIUsageCache`, `FeatureEvaluationCache`) via the task processor before a worker exits, preventing data loss during recycling
- `_flush_through_thread` (hot path) vs `_flush_through_task_processor` (shutdown path)

Context
Without `--max-requests`, Gunicorn workers never recycle and memory leaks accumulate indefinitely. Enabling worker recycling requires flushing any buffered analytics data before exit, otherwise counts are silently lost. The shutdown flush uses `.delay()` (task queue enqueue) rather than `.run_in_thread()` to avoid thread-safety issues during Python interpreter teardown.

Both `GUNICORN_MAX_REQUESTS` and `GUNICORN_MAX_REQUESTS_JITTER` remain configurable via environment variables. Setting `GUNICORN_MAX_REQUESTS=0` disables recycling entirely.

Test plan
- `flush_on_shutdown` on both cache classes (populated and empty)
- `flush_analytics_caches` atexit handler (happy path and exception handling)
- `AppAnalyticsConfig.ready()` atexit registration
- `--max-requests 3`: confirmed atexit handler fires and logs flush on worker recycle
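The recycling behaviour from the test plan can be reproduced locally with aggressive settings. The WSGI module path below is a placeholder, not the project's actual entrypoint:

```shell
# Recycle each worker after ~3 requests to watch the shutdown flush fire.
# app.wsgi:application is a placeholder module path.
gunicorn app.wsgi:application \
  --workers 2 \
  --max-requests 3 \
  --max-requests-jitter 1
```

With the environment variables described above, the same settings can be supplied without changing the command line.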