-
-
Notifications
You must be signed in to change notification settings - Fork 23
Dataset cache purge cronjob #1211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #1211 +/- ##
===========================================
+ Coverage 85.93% 86.04% +0.11%
===========================================
Files 94 95 +1
Lines 3235 3261 +26
Branches 371 373 +2
===========================================
+ Hits 2780 2806 +26
Misses 380 380
Partials 75 75 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements a dataset cache purge cronjob that automatically marks datasets as stale after they've exceeded a configurable age threshold (default 24 hours). The implementation adds a new internal API endpoint for dataset lifecycle management and a Kubernetes CronJob to periodically trigger cache cleanup.
Key changes:
- Added new internal API endpoint
/servicex/internal/dataset-lifecyclefor dataset cache purging - Introduced Kubernetes CronJob to automatically purge old cached datasets every hour
- Refactored dataset deletion logic to enable reuse between manual and automated operations
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| servicex_app/routes.py | Registers the new DatasetLifecycleOps resource endpoint |
| servicex_app/resources/internal/dataset_lifecycle_ops.py | New endpoint implementation that purges datasets older than specified age |
| servicex_app/resources/datasets/get_all.py | Extracts dataset fetching logic into reusable function |
| servicex_app/resources/datasets/delete_dataset.py | Extracts deletion logic into reusable function |
| helm/servicex/values.yaml | Adds configuration parameters for dataset lifecycle cronjob |
| helm/servicex/templates/dataset-lifecycle/cronjob.yaml | Defines Kubernetes CronJob for periodic dataset cache cleanup |
| docs/deployment/reference.md | Documents new configuration parameters |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| env: | ||
| - name: LIFETIME | ||
| value: {{ .Values.datasetLifecycle.cacheLifetime }} | ||
| command: |
Copilot
AI
Nov 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The command should be formatted as an array with shell invocation, similar to the data-lifecycle cronjob. Change to use '/bin/sh' with '-c' flag for better error handling and consistency: command: ['/bin/sh', '-c', 'curl --request POST \"http://{{ .Release.Name }}-servicex-app:8000/servicex/internal/dataset-lifecycle?age=$LIFETIME\"']
| command: | |
| command: | |
| - /bin/sh | |
| - -c |
servicex_app/servicex_app/resources/internal/dataset_lifecycle_ops.py
Outdated
Show resolved
Hide resolved
| - name: LIFETIME | ||
| value: {{ .Values.datasetLifecycle.cacheLifetime }} | ||
| command: | ||
| - curl --request POST "http://{{ .Release.Name }}-servicex-app:8000/servicex/internal/dataset-lifecycle?age=$LIFETIME" |
Copilot
AI
Nov 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The curl command lacks error handling flags. Add '-f' (fail on HTTP errors) and '--silent' or '--show-error' for better error reporting: curl -f --silent --show-error --request POST ... This ensures the job fails appropriately when the API returns an error.
| - curl --request POST "http://{{ .Release.Name }}-servicex-app:8000/servicex/internal/dataset-lifecycle?age=$LIFETIME" | |
| - curl -f --silent --show-error --request POST "http://{{ .Release.Name }}-servicex-app:8000/servicex/internal/dataset-lifecycle?age=$LIFETIME" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
servicex_app/servicex_app/resources/internal/dataset_lifecycle_ops.py
Outdated
Show resolved
Hide resolved
servicex_app/servicex_app/resources/internal/dataset_lifecycle_ops.py
Outdated
Show resolved
Hide resolved
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
for more information, see https://pre-commit.ci
Co-authored-by: ponyisi <4177101+ponyisi@users.noreply.github.com>
Address #1167 , except the cache lifetime is reduced to one day.