Reorganize monitoring documentation #617 #660

panta-123 · 2025-11-11T15:03:49Z

- Add figures to illustrate the monitoring setup.
- Add configuration examples for common monitoring tools.
- Give each monitoring tool its own section.
- Add a note that the dashboards might be outdated.
- Remove some development material, since these are operator docs, not developer docs.
- Listing metrics is hard to keep track of and risks becoming outdated, so code-search links are added instead.


voetberg · 2025-11-11T16:50:49Z

docs/operator/monitoring.md

+     metrics_port = 8080
+     ```
+The used metrics can be found in following links (code search)
+- [Counter](https://github.com/search?q=repo%3Arucio%2Frucio+Metrics.Counter&type=code)


I like how elegant this solution is, but we should probably figure out a way to describe what these metrics actually monitor. Having the list of names like this is useful, though.

@voetberg,

I can't see any other way than listing the names manually if we want to add descriptions.
Do you want me to add the list back?

Sorry, I was mostly musing out loud here. I think this is a good solution, but we should probably include auto-docs or something similar for core/metrics.
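As a rough illustration of the auto-docs idea, here is a minimal Python sketch that lists the metric families a running daemon actually exposes, by reading the `# TYPE` lines of the Prometheus text exposition format. This is not existing Rucio tooling: it assumes the daemon serves its metrics over plain HTTP at `/metrics` on the configured `metrics_port` (8080 in the example config), and the helper names `metric_types` and `fetch_metric_types` and the metric name in the sample are hypothetical.

```python
import re
import urllib.request

# Matches Prometheus exposition lines such as "# TYPE some_metric counter".
_TYPE_LINE = re.compile(r"^# TYPE (\S+) (\S+)$")


def metric_types(exposition: str) -> dict[str, str]:
    """Map each metric family name to its declared type (counter, gauge, histogram, ...)."""
    types = {}
    for line in exposition.splitlines():
        match = _TYPE_LINE.match(line.strip())
        if match:
            types[match.group(1)] = match.group(2)
    return types


def fetch_metric_types(host: str = "localhost", port: int = 8080) -> dict[str, str]:
    # Assumption: metrics are served unauthenticated over HTTP at /metrics on metrics_port.
    with urllib.request.urlopen(f"http://{host}:{port}/metrics", timeout=5) as resp:
        return metric_types(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical sample payload, so the parser can be tried without a running daemon.
    sample = (
        "# HELP rucio_example Hypothetical counter\n"
        "# TYPE rucio_example counter\n"
        "rucio_example_total 3.0\n"
    )
    print(metric_types(sample))
```

A generated list like this still only gives names and types; human-readable descriptions would have to come from the `# HELP` lines, i.e. from whatever documentation strings the metrics are registered with.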

voetberg · 2025-11-11T16:51:39Z

docs/operator/monitoring.md

+- [Gauge](https://github.com/search?q=repo%3Arucio%2Frucio+Metrics.gauge&type=code)
+- [Timer](https://github.com/search?q=repo%3Arucio%2Frucio+Metrics.timer&type=code)
+
+[Grafana Dashboard JSON](https://github.com/rucio/rucio/blob/master/tools/monitoring/visualization/rucio-internal.json) for Graphite is given here. 


Can we include an example screenshot of what these dashboards end up looking like? Just so people know what they're getting.

Yes. People might not use these examples as they are. I can add a screenshot of what we have for our experiment, which is a little different from these, but it gives some idea of what the dashboard looks like.

voetberg · 2025-11-11T17:12:36Z

docs/operator/monitoring.md

+     user_scope = rucio
+     ```
+
+2. Prometheus


I think it is also worth noting that we have the whole [probes repo](https://github.com/rucio/probes), which can use Prometheus.

I'm unsure how much this guide should talk about setting them up, but it's useful to note.

@voetberg, I gave it a shot in a recent commit for the Probes docs. Please have a look.

voetberg · 2025-11-11T17:18:24Z

These are overall really good changes! I think it is worth it to talk about how these can be actually set up (e.g., what sort of pods should be running, what sort of infra people need to run these monitoring things, the exact executables to run a hermes daemon, etc). I am unsure if this is outside the scope of this PR though (mostly just want to call this out as something that's missing)

panta-123 · 2025-11-11T20:33:42Z

> These are overall really good changes! I think it is worth it to talk about how these can be actually set up (e.g., what sort of pods should be running, what sort of infra people need to run these monitoring things, the exact executables to run a hermes daemon, etc). I am unsure if this is outside the scope of this PR though (mostly just want to call this out as something that's missing)

I think we can state that the hermes daemon is required and point to the daemon deployment doc.
However, other infrastructure deployment strategies should not be covered, as they are outside the scope of the Rucio docs.
The available integrations with infrastructure are already listed in the doc, and the related config choices are described in this PR.

Reorganize monitoring documentation rucio#617

03c405a


panta-123 requested review from bari12 and voetberg November 11, 2025 15:03

voetberg reviewed Nov 11, 2025


panta-123 added 3 commits November 12, 2025 09:29

some language fixes rucio#617

2045fe4

Add Probes and its related info rucio#617

5dfdbd8

minor edits on Probes mermaid rucio#617

0ffa416
