Reorganize monitoring documentation #617 #660
base: main
panta-123
commented
Nov 11, 2025
- Add figures to illustrate monitoring setup
- Add configuration examples for common monitoring tools
- Give each monitoring tool its own section.
- Add a note that the dashboard might be outdated.
- Remove some development material, since these are operator docs, not developer docs.
- Listing metrics by hand is hard to maintain and risks going out of date, so code search links are added instead.
    metrics_port = 8080
    ```
    The used metrics can be found in following links (code search)
    - [Counter](https://github.com/search?q=repo%3Arucio%2Frucio+Metrics.Counter&type=code)
I like how elegant this solution is, but we should probably figure out a way to describe what these are actually monitoring. It's useful to have the list of names like this, though.
I can't see any other way than listing the names manually if we want to add descriptions.
Do you want me to add the list back?
Sorry, I was mostly musing out loud here. I think this is a good solution, but we should probably include auto-docs or something similar for core/metrics.
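As a rough illustration of the auto-docs idea, a small script could scan source text for metric calls and group the names by kind, which is roughly what the code search links do by hand. This is only a sketch: the call shape matched by the regular expression (`.counter(...)`, `.gauge(...)`, `.timer(...)` with a string name as the first argument) and the metric names in `SAMPLE` are assumptions for illustration, not taken from the Rucio code base.

```python
import re

# Assumed call shape: <obj>.counter("name"), <obj>.gauge(name="name"), <obj>.timer('name').
# Real call sites may look different; adjust the pattern to match them.
METRIC_CALL = re.compile(
    r"\.(counter|gauge|timer)\(\s*(?:name\s*=\s*)?['\"]([^'\"]+)['\"]",
    re.IGNORECASE,
)


def find_metrics(source: str) -> dict[str, set[str]]:
    """Group metric names found in source text by metric kind."""
    found: dict[str, set[str]] = {"counter": set(), "gauge": set(), "timer": set()}
    for kind, name in METRIC_CALL.findall(source):
        found[kind.lower()].add(name)
    return found


# Hypothetical source snippet, used only to exercise the scanner.
SAMPLE = '''
METRICS.counter(name="requests.submitted").inc()
METRICS.gauge("queue.size").set(10)
with METRICS.timer('transfer.duration'):
    pass
'''

metrics = find_metrics(SAMPLE)
for kind in sorted(metrics):
    print(kind, sorted(metrics[kind]))
```

A scan like this only recovers names, not descriptions, so it would complement rather than replace hand-written explanations of what each metric measures.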
    - [Gauge](https://github.com/search?q=repo%3Arucio%2Frucio+Metrics.gauge&type=code)
    - [Timer](https://github.com/search?q=repo%3Arucio%2Frucio+Metrics.timer&type=code)
    [Grafana Dashboard JSON](https://github.com/rucio/rucio/blob/master/tools/monitoring/visualization/rucio-internal.json) for Graphite is given here.
Can we include an example screenshot of what these dashboards end up looking like? Just so people know what they're getting.
Yes. People might not use these examples as they are. I can add a screenshot of what we have for our experiment, which is a little different from these but gives some idea of the dashboard.
    user_scope = rucio
    ```
    2. Prometheus
I think it is also worth noting that we have the whole [probes repo](https://github.com/rucio/probes), which can use Prometheus.
I am unsure how much this guide should talk about setting them up, but it's useful to note.
@voetberg, I gave the Probes docs a shot in a recent commit. Please have a look.
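For readers checking their own setup against the configuration fragments quoted in this review, the two settings visible in the diff (`metrics_port = 8080` and `user_scope = rucio`) can be read back with Python's standard `configparser`. This is a minimal sketch: the section name `monitor` and the helper `read_monitoring_settings` are assumptions for illustration, since the diff hunks shown here do not include the section header.

```python
import configparser

# The section name "monitor" is an assumption; only the two keys appear in the diff.
SAMPLE_CFG = """
[monitor]
metrics_port = 8080
user_scope = rucio
"""


def read_monitoring_settings(text: str, section: str = "monitor") -> dict:
    """Return the monitoring settings, using None for anything that is missing."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "metrics_port": parser.getint(section, "metrics_port", fallback=None),
        "user_scope": parser.get(section, "user_scope", fallback=None),
    }


settings = read_monitoring_settings(SAMPLE_CFG)
print(settings)
```

Using `fallback=None` keeps the check from raising when a key or the whole section is absent, which makes it easier to report what is missing.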
These are overall really good changes! I think it is worth it to talk about how these can be actually set up (e.g., what sort of pods should be running, what sort of infra people need to run these monitoring things, the exact executables to run a hermes daemon, etc). I am unsure if this is outside the scope of this PR though (mostly just want to call this out as something that's missing).
I think we can note that the hermes daemon is required and point to the daemon deployment doc.