Feature: Prometheus metrics for rustic forget and rustic forget --prune #1713
Description
I just learned that metrics are emitted only for rustic backup, not for rustic forget (see #1505, reply in thread).
Summary
rustic backup emits a rich set of Prometheus metrics (timing, data volume, blob counts, file/dir change stats). rustic forget and rustic forget --prune currently emit no metrics, creating an observability blind spot: you cannot alert on forgotten snapshots, track storage reclamation, or measure prune operation duration.
This issue proposes a set of metrics for both subcommands, modeled on the existing rustic_backup_* namespace.
Existing backup metrics — mapping to forget
✅ Direct equivalents (same concept, renamed)
| Backup metric | Proposed forget equivalent | Notes |
|---|---|---|
| rustic_backup_time | rustic_forget_time | Unix timestamp when operation started |
| rustic_backup_backup_start | rustic_forget_forget_start | Unix timestamp of forget phase start |
| rustic_backup_backup_end | rustic_forget_forget_end | Unix timestamp of forget phase end |
| rustic_backup_backup_duration | rustic_forget_forget_duration | Duration of forget phase (seconds) |
| rustic_backup_total_duration | rustic_forget_total_duration | Total duration incl. prune phase (seconds) |
For --prune, add a separate timing pair for the prune phase:
| Proposed metric | Notes |
|---|---|
| rustic_forget_prune_start | Unix timestamp of prune phase start |
| rustic_forget_prune_end | Unix timestamp of prune phase end |
| rustic_forget_prune_duration | Duration of prune phase only (seconds) |
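As a rough illustration of how the timing metrics above relate to each other, the sketch below derives them from raw phase timestamps. `ForgetTimings` and `timing_metrics` are hypothetical names used only for this example, not part of rustic, and the sketch assumes that the total duration runs from the operation start to the end of the last phase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ForgetTimings:
    """Hypothetical container for phase timestamps (Unix seconds)."""
    started: float                       # when the whole operation started
    forget_start: float
    forget_end: float
    prune_start: Optional[float] = None  # None when --prune is not used
    prune_end: Optional[float] = None


def timing_metrics(t: ForgetTimings) -> dict:
    """Derive the proposed timing metrics from raw phase timestamps."""
    metrics = {
        "rustic_forget_time": t.started,
        "rustic_forget_forget_start": t.forget_start,
        "rustic_forget_forget_end": t.forget_end,
        "rustic_forget_forget_duration": t.forget_end - t.forget_start,
    }
    last_end = t.forget_end
    # Prune timings are only emitted when a prune phase actually ran.
    if t.prune_start is not None and t.prune_end is not None:
        metrics["rustic_forget_prune_start"] = t.prune_start
        metrics["rustic_forget_prune_end"] = t.prune_end
        metrics["rustic_forget_prune_duration"] = t.prune_end - t.prune_start
        last_end = t.prune_end
    # Assumption: total duration spans operation start to the last phase end.
    metrics["rustic_forget_total_duration"] = last_end - t.started
    return metrics
```

Without a prune phase, the three rustic_forget_prune_* keys are simply absent, which matches the proposal that they are only emitted for --prune.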
↔️ Opposite-direction equivalents (prune only)
Backup adds data; prune removes it. These mirror the data_added family but in reverse. Only emitted when --prune is active.
| Backup metric | Proposed forget/prune equivalent | Notes |
|---|---|---|
| rustic_backup_data_added | rustic_forget_data_removed | Total bytes freed (uncompressed) |
| rustic_backup_data_added_files | rustic_forget_data_removed_files | File bytes freed (uncompressed) |
| rustic_backup_data_added_trees | rustic_forget_data_removed_trees | Tree/dir bytes freed (uncompressed) |
| rustic_backup_data_added_packed | rustic_forget_data_removed_packed | Storage bytes freed (after compression) |
| rustic_backup_data_added_files_packed | rustic_forget_data_removed_files_packed | File storage freed (compressed) |
| rustic_backup_data_added_trees_packed | rustic_forget_data_removed_trees_packed | Tree storage freed (compressed) |
| rustic_backup_data_blobs | rustic_forget_data_blobs_removed | Count of data blobs deleted |
| rustic_backup_tree_blobs | rustic_forget_tree_blobs_removed | Count of tree blobs deleted |
❌ Not applicable for forget
These backup metrics relate to filesystem scanning, which forget does not perform — no equivalents needed:
- rustic_backup_files_new/_changed/_unmodified
- rustic_backup_dirs_new/_changed/_unmodified
- rustic_backup_total_files_processed
- rustic_backup_total_dirs_processed
- rustic_backup_total_bytes_processed
- rustic_backup_total_dirsize_processed
New forget-specific metrics
Snapshot counts (core purpose of forget)
| Metric | Notes |
|---|---|
| rustic_forget_snapshots_total | Total snapshots in repo before forget |
| rustic_forget_snapshots_removed | Snapshots removed this run |
| rustic_forget_snapshots_kept | Snapshots kept this run |
Retention policy breakdown
Knowing why snapshots were kept helps validate policy configuration. Either as separate metrics or a single metric with a reason label:
rustic_forget_snapshots_kept{reason="last"}
rustic_forget_snapshots_kept{reason="hourly"}
rustic_forget_snapshots_kept{reason="daily"}
rustic_forget_snapshots_kept{reason="weekly"}
rustic_forget_snapshots_kept{reason="monthly"}
rustic_forget_snapshots_kept{reason="yearly"}
rustic_forget_snapshots_kept{reason="tag"}
rustic_forget_snapshots_kept{reason="within"}
A single labeled metric is preferable to 8 separate metrics: it is easier to query with sum by (reason) and needs only one metric name. The number of time series is the same either way, one per reason.
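To make the labeled variant concrete, here is a minimal sketch that renders per-reason counts in the Prometheus text exposition format. The function name `render_kept_by_reason`, the HELP text, and the choice of the gauge type are assumptions made for this example; how rustic would actually build and export the metric is not specified here.

```python
def render_kept_by_reason(kept: dict) -> str:
    """Render per-reason kept counts as Prometheus text exposition lines.

    `kept` maps a retention reason (e.g. "daily") to a snapshot count.
    """
    lines = [
        "# HELP rustic_forget_snapshots_kept Snapshots kept this run, by retention reason",
        # Assumption: a gauge, since the value describes a single run.
        "# TYPE rustic_forget_snapshots_kept gauge",
    ]
    # Sort by reason so the output is stable between runs.
    for reason, count in sorted(kept.items()):
        lines.append(f'rustic_forget_snapshots_kept{{reason="{reason}"}} {count}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_kept_by_reason({"last": 1, "daily": 7, "weekly": 4}), end="")
```

If one snapshot can be kept for several reasons at once, the per-reason values will not necessarily add up to the number of kept snapshots, so a query such as sum(rustic_forget_snapshots_kept) may overcount.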
Pack file metrics (prune only)
| Metric | Notes |
|---|---|
| rustic_forget_packs_removed | Pack files fully deleted (all blobs unused) |
| rustic_forget_packs_rewritten | Pack files rewritten (partial blob removal) |
| rustic_forget_packs_kept | Pack files left untouched |
Full proposed metric list
rustic forget (without --prune)
rustic_forget_time
rustic_forget_forget_start
rustic_forget_forget_end
rustic_forget_forget_duration
rustic_forget_total_duration
rustic_forget_snapshots_total
rustic_forget_snapshots_removed
rustic_forget_snapshots_kept{reason="..."}
Additional metrics for rustic forget --prune
rustic_forget_prune_start
rustic_forget_prune_end
rustic_forget_prune_duration
rustic_forget_data_removed
rustic_forget_data_removed_files
rustic_forget_data_removed_trees
rustic_forget_data_removed_packed
rustic_forget_data_removed_files_packed
rustic_forget_data_removed_trees_packed
rustic_forget_data_blobs_removed
rustic_forget_tree_blobs_removed
rustic_forget_packs_removed
rustic_forget_packs_rewritten
rustic_forget_packs_kept
Alerting use cases enabled
With these metrics, the following Prometheus alerts become possible:
- Forget not running: time() - rustic_forget_time > 86400
- Prune taking too long: rustic_forget_prune_duration > 3600
- Snapshot count growing unbounded: rustic_forget_snapshots_total > threshold
- Policy misconfiguration (no snapshots being removed): rustic_forget_snapshots_removed == 0 sustained over time
- Storage not being freed despite removals: rustic_forget_data_removed_packed == 0 when rustic_forget_snapshots_removed > 0
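The alert conditions listed above can also be read as plain predicates over the values of recent runs. The sketch below is only meant to spell out the logic; in practice these would be Prometheus alerting rules. The function name, the alert names, `max_snapshots` (standing in for the unspecified threshold), and `min_runs` (one possible reading of "sustained over time") are all assumptions.

```python
def firing_alerts(samples: list, now: float, max_snapshots: int, min_runs: int = 7) -> list:
    """Evaluate the proposed alert conditions.

    `samples` holds one dict of metric values per forget run, oldest first.
    Returns the (hypothetical) names of the alerts that would fire.
    """
    latest = samples[-1]
    alerts = []
    # Forget not running: last run started more than a day ago.
    if now - latest["rustic_forget_time"] > 86400:
        alerts.append("ForgetNotRunning")
    # Prune taking too long (the metric is absent without --prune).
    if latest.get("rustic_forget_prune_duration", 0) > 3600:
        alerts.append("PruneTooSlow")
    # Snapshot count growing unbounded.
    if latest["rustic_forget_snapshots_total"] > max_snapshots:
        alerts.append("SnapshotCountHigh")
    # Policy misconfiguration: nothing removed in any of the last min_runs runs.
    recent = samples[-min_runs:]
    if len(recent) >= min_runs and all(s["rustic_forget_snapshots_removed"] == 0 for s in recent):
        alerts.append("NoSnapshotsRemoved")
    # Snapshots were removed, but prune freed no storage.
    if latest["rustic_forget_snapshots_removed"] > 0 and latest.get("rustic_forget_data_removed_packed") == 0:
        alerts.append("NoStorageFreed")
    return alerts
```

The last check only fires when rustic_forget_data_removed_packed is present, so runs without --prune do not trigger it.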