Skip to content

Comments

Sync dev to main#150

Closed
tis24dev wants to merge 24 commits intomainfrom
dev
Closed

Sync dev to main#150
tis24dev wants to merge 24 commits intomainfrom
dev

Conversation

@tis24dev
Copy link
Owner

  • Enhance prefiltering and PBS staged restore
  • Add PBS notifications backup and API restore
  • Make PBS restore behavior interactive (UI-driven)
  • Add ctx cancellation and refactor restore code
  • Refactor restore functions, tighten error checks
  • Add prefilter-manual CLI and test adjustments
  • Add context cancellation support, refactor switches
  • Refactor function signatures & simplify logging
  • Improve PBS list parsing and notification cleanup

Refine backup prefiltering: track detailed stats, skip symlinks, and avoid normalizing known structured config paths (etc/proxmox-backup, etc/pve, etc/ssh, etc/pam.d, etc/systemd/system). Normalize helpers now return a (changed,bool) flag; config normalization is limited to safe text normalization (no sorting). Tests updated and a new structured-config test added. In orchestrator, add recovery for malformed/flattened proxmox-backup datastore.cfg by detecting duplicate keys and attempting to restore from pbs_datastore_inventory.json (including a lightweight inventory parser and duplicate-key detector). Update tests to cover recovery behavior.
Add support for backing up PBS notifications and for applying PBS config via the proxmox-backup-manager API.

Key changes:
- Collector: add BackupPBSNotifications and BackupPBSNotificationsPriv flags, include notifications.cfg and notifications-priv.cfg in PBS collection logic, and write a notifications_summary.json report; update tests.
- New file: collector_pbs_notifications_summary.go implements JSON summaries for notification targets, matchers and endpoints.
- Config: add RESTORE_PBS_APPLY_MODE and RESTORE_PBS_STRICT settings with parsing and template defaults; add PBS notifications config flags to parsing and defaults.
- Orchestrator: add pbs_api_apply.go implementing API-based restore/apply of PBS categories (with strict 1:1 reconciliation and an auto fallback to file-based apply) and wire collector overrides for notifications.
- Categories: update PBS category definitions to include command outputs and notification-related files.

These changes enable selective backup of PBS notification data and provide an API-first restore path (with configurable behavior and strict reconciliation option).
Move PBS restore reconciliation out of backup.env and into an interactive choice during restores. Introduces a PBSRestoreBehavior enum (Merge vs Clean 1:1) and threads it through RestorePlan; removes RESTORE_PBS_APPLY_MODE/RESTORE_PBS_STRICT config parsing and env template entries. Staged PBS apply logic now uses the selected behavior: API-based applies are preferred and file-based fallbacks are only allowed in Clean mode, while Merge mode skips destructive/API-unavailable actions. Updated CLI and TUI to prompt for the behavior, adjusted notification/apply helpers, tests, and documentation to reflect the interactive selection and new default apply semantics.
Collector: simplify Flush error assignment and add context cancellation checks (ctx.Err()) in aggregateBackupHistory and aggregateReplicationStatus, including periodic checks inside loops to allow early cancellation.

Restore/Access Control: ensure ctx is non-nil and check ctx.Err() early in applyPBSAccessControlFromStage and applyPVEAccessControlFromStage. Remove unused logger parameter from applyPBSACLSectionFormat and applyPBSACLLineFormat (they no longer accept a logger).

Restore/SDN: reserve ctx (_ = ctx) for future use and tighten error handling when stat'ing staged SDN (remove redundant nil check in the else-if condition).

Restore/TUI: simplify NIC repair branch by removing a redundant nil check on plan.Mapping and delete the unused promptOkTUI helper.
Simplify and tighten various restore-related internals:

- Consolidate service timeout variable formatting.
- Replace redundant "err != nil && !errors.Is(...)" checks with a single "!errors.Is(err, os.ErrNotExist)" in fs_atomic.go and restore_firewall.go.
- Improve confirmRestoreAction destination handling: clean and display the actual restore destination (root vs a subdirectory) and clarify warnings.
- Remove unused parameters: detectNodeForVM no longer accepts vmEntry and restartPVEFirewallService/extractHardlink no longer accept a logger; update callers accordingly.
- Adjust extractHardlink signature and callers/tests to match; update tests to remove logger construction and assert behavior against the new signatures.
- Minor test updates and formatting tweaks.

These changes reduce unused parameters, improve user-facing messaging during restore, and simplify error checks. Tests were updated to reflect the new function signatures.
Introduce a new cmd/prefilter-manual command to run the backup prefilter manually with configurable root, max-size and log-level flags. Tighten and clarify unit tests: adjust assertion message in collector_collectall_test.go, reformat table-driven cases in orchestrator/decrypt_test.go, and add a lint ignore for an intentional nil-context test in copyRawArtifactsToWorkdir.
Add context handling and cancellation checks across orchestrator operations: createBundle and decrypt TUI now accept/guard nil contexts and check ctx.Err(), io.Copy uses a context-aware reader (contextReader) to allow cancellation. Update tests to use context.TODO and rename one test accordingly. Refactor multiple if/else branches to switch statements for systemType in categories, compatibility and restore UI code, and adjust logging to include step index in logStep. Miscellaneous: reformat category literals, simplify an error check in restore_ha, adjust bundle test to use a switch on header names, and remove an unused renderEnvValue helper from config/upgrade.go.
Remove logger coupling from several APIs and tidy up related logic.

- Remove logger parameter from decryptArchiveWithSecretPrompt, buildNetworkPlanReport, targetNetworkEndpointFromConfig, applyPBSNodeCfgViaAPI, and validatePBSDatastoreReadOnly and update all callers.
- Drop unused defaultNetworkHealthOptions and remove applyPBSHostConfigsFromStage (cleanup of dead/unused code).
- Make guardMountPoint resilient to nil/expired contexts by ensuring ctx is non-nil and returning ctx.Err() early.
- Adjust NewWithDeps to avoid copying the full Config into base (only set DryRun) to reduce unintended coupling.
- Replace if/else with switch in RecreateDirectoriesFromConfig and ensure PBS datastore readonly warnings check logger != nil before logging.

These changes simplify APIs by decoupling logging responsibilities, fix a nil-context issue, and remove some unused helpers to make call sites clearer.
parsePBSListIDs: sanitize and validate candidate keys, error when none provided, and make row parsing stricter and more informative. The function now trims empty keys up-front, iterates using the cleaned key list, ensures values are non-empty strings, and returns a descriptive error (including row index and available keys) instead of silently skipping unparseable rows.

applyPBSNotificationsViaAPI: move matcher cleanup earlier in strict mode so matchers are removed before endpoints (preventing endpoint cleanup from being blocked by references). The matcher handling was consolidated (deduplicated) and matcher names are sorted for deterministic processing.
Copilot AI review requested due to automatic review settings February 12, 2026 01:59
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR synchronizes the development branch to main, bringing significant enhancements to PBS restore functionality and prefiltering safety. The changes introduce an interactive PBS restore behavior selection (Merge vs Clean 1:1), implement API-driven PBS staged restore with fallback to file-based restore, add PBS notifications backup and restore, improve prefilter safety for structured configs, add context cancellation support throughout, and refactor if-else chains to switch statements for better code clarity.

Changes:

  • Add interactive PBS restore behavior selection (Merge for existing PBS vs Clean 1:1 for fresh install)
  • Implement PBS API-driven staged restore using proxmox-backup-manager with file-based fallback
  • Add PBS notifications backup collection and API restore with credential protection
  • Improve prefilter safety by skipping structured config paths (etc/pve/, etc/proxmox-backup/, etc/ssh/, etc/pam.d/, etc/systemd/system/)
  • Add context cancellation support in bundle creation, backup aggregation, and decrypt operations
  • Refactor control flow to use switch statements instead of if-else chains for system type comparisons
  • Simplify error handling patterns (remove redundant err != nil checks before errors.Is)
  • Add PBS datastore.cfg recovery mechanism from inventory snapshots
  • Add manual prefilter CLI tool for testing
  • Update documentation for PBS restore behavior, prefilter safety, and API coverage

Reviewed changes

Copilot reviewed 57 out of 57 changed files in this pull request and generated no comments.

Show a summary per file
File Description
pbs_restore_behavior.go New type defining PBS restore behavior (Merge vs Clean)
pbs_api_apply.go Core PBS API apply utilities with credential redaction
pbs_notifications_api_apply.go PBS notifications API restore implementation
pbs_notifications_api_apply_test.go Tests for PBS notifications API apply
pbs_staged_apply.go Refactored PBS staged apply with API/file fallback logic
pbs_staged_apply_test.go Test for datastore.cfg recovery from inventory
collector_pbs_notifications_summary.go PBS notifications snapshot summary generation
optimizations.go Improved prefilter with structured config path exclusions
optimizations_structured_test.go Tests verifying prefilter skips structured configs
optimizations_helpers_test.go Tests for prefilter normalization functions
workflow_ui*.go Add SelectPBSRestoreBehavior to UI interfaces
restore_workflow_ui.go Interactive PBS behavior selection integration
restore.go Remove unused logger parameter, simplify function signatures
restore_notifications.go PBS notifications API restore with Merge/Clean logic
restore_access_control.go Add context validation, remove unused logger parameters
restore_sdn.go, restore_ha.go, restore_firewall.go Simplify error handling patterns, refactor to switch statements
restore_plan.go Add PBSRestoreBehavior field to RestorePlan
orchestrator.go Add context cancellation to bundle creation, remove duplicate config assignment
network_*.go Remove unused logger parameters
decrypt*.go Add context cancellation checks, remove unused logger parameters
config.go, backup.env Add BackupPBSNotifications and BackupPBSNotificationsPriv settings
collector_pbs.go Collect PBS notifications configs and write summary
categories.go Add PBS notification inventory paths, refactor to switch statements
deps.go Remove duplicate base.Config assignment
cmd/prefilter-manual/main.go New manual prefilter CLI tool for testing
Documentation files Extensive updates for PBS restore behavior, API coverage, prefilter safety
Test files Fix test names, update expectations, add context support

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Implement smart file chunking and reassembly with metadata and tests. Introduces chunkedFileMetadata, splitFile returning chunk/SHA256 info, robust writeChunk logic, and marker (.chunked) handling in chunkLargeFiles; adds ReassembleChunkedFiles, discoverChunks (numeric sort), concatenateChunks, validation and metadata application (chmod/chown/mtime). Enhance normalization/minification helpers (safe config normalization, json.Compact-based minify) and add extensive tests covering chunking, reassembly, JSON/minify behavior and config normalization. Update orchestrator to map chunk artifacts back to original paths (originalPathFromChunk), improve path matching for selective restores, and trigger reassembly after extract; also fix a decrypt test script argument.
Add a TestMain in internal/orchestrator to detect and fail if tests leave an artifact named "--progress" in the package directory, and to remove any stale artifact before running tests. Also harden the rclone-copy stub in internal/orchestrator/decrypt_test.go to treat empty or flag-like targets (starting with "--") as invalid destinations (prints error and exits 2), preventing tests from accidentally writing to a flag string.
Makefile: improve VERSION derivation to produce stable dev-style version strings (include tag/dev count/sha and .dirty handling) for both build and build-release targets.

internal/orchestrator/pbs_staged_apply.go: capture PBS API availability error, surface a descriptive error when merge mode skips API-applied PBS categories, and adjust logging for the merge-skip case. The function now returns an error if API-applied categories were skipped.

internal/orchestrator/restore_workflow_ui.go: consider logger.HasWarnings() when deciding final restore summary so logged warnings are reflected in the final message.

internal/orchestrator/restore_workflow_warnings_test.go: add a test (TestRunRestoreWorkflow_FinalSummaryReflectsLoggedWarnings) that asserts the final summary reports logged warnings; add required imports and test setup.
Ensure deselecting a category actually removes it from the selection map and update selection counting. ShowCategorySelectionMenuWithReader now uses len(selected) to determine how many categories are chosen and deletes map entries when a category is toggled off, instead of leaving false-valued keys. Added a test to exercise toggling a category off then on and to verify continue behavior when no categories are selected. Affected files: internal/orchestrator/selective.go, internal/orchestrator/additional_helpers_test.go.
Add secure staging directory lifecycle for restores: createRestoreStageDir() creates /tmp/proxsave/restore-stage-* with 0700 perms and a pid+sequence suffix; stageDestRoot() now includes pid. Implement cleanupOldRestoreStageDirs() to remove aged staging dirs and wire it into orchestrator startup and UI paths. Add preserve behavior via PROXSAVE_PRESERVE_RESTORE_STAGING (and preserve on warnings/network staged installs), auto-remove staging dir after successful clean restores, and log actions. Update docs to document 0700 perms and auto-clean behavior, and add tests covering stage dir creation and old-dir cleanup.
Introduce a single pveClusterServices list and use it for starting/stopping PVE services. stopPVEClusterServices now iterates the list in reverse so services are stopped in the correct order (pvestatd → pveproxy → pvedaemon → pve-cluster), and startPVEClusterServices uses the forward order. Update tests to assert stop order rather than just presence, and update restore documentation/diagrams/guides to reflect the new service stop sequence and formatting fixes.
Introduce a dedicated API-backed staged apply path for PBS: maybeApplyPBSConfigsViaAPIFromStage. File-based PBS configs are applied while PBS services remain stopped; API-backed categories (node, datastores, remotes, jobs, notifications) are applied in a final API phase which may temporarily start services and will attempt to stop them again after completion. Added fallback handling for merge/clean behaviors, improved error logging, and consolidated file-fallbacks. Update restore workflow to invoke the final PBS API staged apply and move PBS notifications to be applied as part of that phase. Add unit test (pbs_service_restart_order_test.go) to ensure services are not restarted during file-based staged apply and are started during the API phase. Also update docs to describe the new staged-API ordering.
Change AnalyzeBackupCategories to accept a context and scan archives streaming (O(1) memory), allowing cancellation and robust error handling. Avoid double-closing underlying files from decompression readers and detect category availability on-the-fly. Expose analyzeBackupCategoriesFunc to allow injection and fall back to a safe full restore with a user-facing message when analysis fails. Update collectArchivePaths to return errors and update tests: add a truncated-tar read error test and a workflow test that verifies fallback behavior; adjust existing tests to the new API and behavior. Misc: add required imports and minor housekeeping.
Refine compatibility detection and messages, enhance staging directory creation, and ensure firewall rollback cleans up scripts.

- ValidateCompatibility: clearer error/warning messages when current system or backup type cannot be detected; include backup/system type info where available and adjust tests.
- Tests: add tests for unknown backup type detection, firewall rollback script removal, and staging directory uniqueness; update restore manifest tests to include ProxmoxType.
- FakeFS: allow onDisk to accept already-mapped paths so helpers returning OS paths work with the fake FS.
- Staging: remove atomic sequence variable and switch to using MkdirTemp with a timestamp/pid pattern to create unique staging dirs; update tests accordingly.
- Firewall rollback: remove rollback script file when disarming and add corresponding test; log failures without failing the flow.

These changes improve robustness of restore operations and make tests more reliable.
Tighten file permissions and ensure rollback logs are created. Use OpenFile to create the backup archive with 0600, change various WriteFile calls to create files with 0600 (markers, scripts, location file) instead of more permissive modes, and create empty rollback log files before writing markers/scripts. Also remove redundant shebang lines from generated rollback scripts. Affected files: internal/orchestrator/backup_safety.go, network_apply.go, restore_access_control_ui.go, restore_firewall.go, restore_ha.go. These changes improve security of stored artifacts and rollback handling.
Detect unprivileged/container user-namespace contexts and treat known privilege-sensitive command failures as informational SKIPs instead of warnings. Adds collector_privilege_sensitive.go implementing detection via /proc/self/{uid_map,gid_map} and heuristics for commands (dmidecode, blkid, sensors, smartctl), wires a new DetectUnprivilegedContainer dep, and updates Collector.safeCmdOutput and captureCommandOutput to log SKIP with helpful details/restore hints. Includes unit tests for detection and behavior, and documentation updates (CLI, restore guides, troubleshooting) explaining SKIP labels and impact on automated /etc/fstab remapping. Minor formatting/field alignment changes in CollectorConfig/defaults.
Introduce PVESH_TIMEOUT configuration and propagate it through config, templates, docs and the orchestrator; the collector now applies a per-call timeout to pvesh invocations. Extend PVE storage parsing to capture runtime fields (active, enabled, status) from pvesh output with tolerant parsing of bool/int/string forms, include runtime info in logging, and skip storages that appear unavailable to reduce hangs. Also add helpers for formatting runtime metadata.

Separately, enhance lock-file handling in checks: parse pid/host/time metadata, perform same-host PID liveness checks (with injectable killFunc) and remove stale locks when appropriate; update related tests to cover PID-not-running behavior and adjust stat-failure test. Update unit tests for PVE storage parsing to cover runtime fields.
Introduce FS_IO_TIMEOUT (config/ENV) and integrate a new internal/safefs helper to perform filesystem probes (stat, readdir, statfs) with a configurable timeout to avoid hangs on unreachable network mounts. Propagate the FS IO timeout through CollectorConfig and use it in datastore/storage sampling, PXAR metadata collection, directory/file sampling and report generation; timeouts are handled gracefully with warnings and skips. Also: fix pvesh command context cancellation handling, tighten RunCommand/LookPath deps signatures, update tests to accept context/timeout parameters, and update documentation (config/CLI/restore/troubleshooting) to describe the new option and adjust service stop ordering and other doc cleanups.
Remove the legacy PXAR sampling implementation and related privilege-sensitive helpers/tests, and introduce a bounded filesystem sampler.

Details:
- Added fs_sampling_bounded.go and its test to provide a new bounded sampling mechanism.
- Removed large legacy PXAR sampling code: sampleDirectories, sampleFiles, computePxarWorkerRoots and many helper functions/types (deterministic shuffling, root selector, hashing, uniquePaths, etc.).
- Cleaned up Collector and CollectorConfig by removing obsolete fields (PxarIntraConcurrency, PxarScanFanoutLevel, PxarScanMaxRoots, PxarStopOnCap, PxarEnumWorkers, PxarEnumBudgetMs) and related cache/mutex fields (rootsCache, rootsMu).
- Deleted privilege-sensitive detection and its tests (collector_privilege_sensitive.*).
- Updated tests and usages to stop referencing removed config fields and adjust debug logging messages.
- Minor fixes/formatting: adjusted ownership check bracing in runtime_helpers.go, small struct/tag spacing changes, and other tidy-ups.

This refactor simplifies the collector by replacing the complex fanout-based sampling and privileged heuristics with a bounded sampling approach and removes deprecated code paths.
Adjust documentation and template comments around PXAR scanning and locking behavior. Changes include:

- docs/BACKUP_ENV_MAPPING.md: update legacy mappings for PXAR-related env vars (PXAR_STOP_ON_CAP, BACKUP_SMALL_PXAR, MAX_PXAR_SIZE) and clarify PXAR metadata sampling semantics.
- docs/CONFIGURATION.md: remove several low-level PXAR tuning options from the public config example, simplify PXAR include pattern default, and add a note that include/exclude patterns are reused for PVE datastore sampling.
- docs/TROUBLESHOOTING.md: clarify the lock file format and behavior (pid/host/time and PID liveness checks to avoid stuck locks).
- internal/config/templates/backup.env: update PXAR file pattern comments (localized phrasing) and clarify defaults.

These edits clean up deprecated/legacy config surface, clarify runtime behavior, and harmonize documentation with the Go implementation.
@tis24dev tis24dev closed this Feb 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant