Conversation
tis24dev
commented
Feb 10, 2026
- Clarify PVE ACL and silence domains warning
- Implement atomic writes and preserve ownership
- Update README.md
- tests: configure CheckerConfig with temp dirs
Clarify PVE access-control handling and avoid noisy warnings when domains.cfg is absent. Changes: - docs/CONFIGURATION.md, docs/RESTORE_GUIDE.md: expand BACKUP_PVE_ACL/docs to note users/roles/groups/ACL and that domains/realms are optional when not present. - internal/config/templates/backup.env: add explanatory comment for BACKUP_PVE_ACL. - internal/backup/collector_pve.go: introduce suppressNotFoundLog and mark domains.cfg as optional so it's not counted or warned about on default standalone PVE installs. - internal/orchestrator/.backup.lock: update PID and timestamp. Rationale: domains.cfg may not exist on default standalone installations; suppressing the warning and not marking it as missing reduces false-positive log noise and incorrect missing counts.
Add a centralized atomic write utility and directory metadata helpers (internal/orchestrator/fs_atomic.go) and unit tests (fs_atomic_test.go). Replace ad-hoc MkdirAll/WriteFile calls in staged-apply paths (network, PBS, PVE, firewall, HA, notifications) to use ensureDirExistsWithInheritedMeta and writeFileAtomic so staged restores are applied atomically and final ownership/permissions are enforced (avoiding umask issues). Update docs: README credits release testers and docs/RESTORE_GUIDE.md and TROUBLESHOOTING.md explain atomic/staged behavior and PBS permission recovery steps. Update internal backup lock metadata (.backup.lock). Also remove the previous duplicate writeFileAtomic in restore_access_control.go in favor of the new centralized implementation.
Remove stale .backup.lock and update orchestrator tests to create isolated temp directories for backup, logs and lock files. Tests now populate CheckerConfig (BackupPath, LogPath, LockDirPath, LockFilePath, MinDiskPrimaryGB, SafetyFactor, MaxLockAge), call Validate(), and register a cleanup to release the backup lock. This makes the tests self-contained and avoids reliance on a repository lock file, reducing flakiness.
There was a problem hiding this comment.
Pull request overview
Syncs development changes into main, primarily improving restore correctness/clarity for Proxmox (PVE/PBS) by making staged restore writes atomic while preserving expected permissions/ownership, and updating docs/tests accordingly.
Changes:
- Introduce
writeFileAtomic()+ensureDirExistsWithInheritedMeta()and migrate staged-apply paths to use them. - Reduce noise/clarify semantics around PVE access control + optional
domains.cfghandling. - Update tests and documentation to reflect new staged-apply behavior and fix pre-backup check test configuration.
Reviewed changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/orchestrator/restore_notifications.go | Switch staged notification config writes to atomic replace. |
| internal/orchestrator/restore_ha.go | Use inherited-metadata directory creation for /etc/pve/ha; remove trailing whitespace. |
| internal/orchestrator/restore_firewall.go | Use inherited-metadata directory creation during exact directory sync. |
| internal/orchestrator/restore_access_control.go | Remove local writeFileAtomic in favor of shared implementation. |
| internal/orchestrator/pve_staged_apply.go | Switch vzdump config writes to atomic replace; minor struct alignment. |
| internal/orchestrator/pbs_staged_apply.go | Switch PBS config writes to atomic replace. |
| internal/orchestrator/orchestrator_test.go | Configure CheckerConfig with temp dirs + validation; cleanup backup lock. |
| internal/orchestrator/network_staged_apply.go | Use inherited-metadata directory creation and atomic file writes for staged network overlay. |
| internal/orchestrator/fs_atomic_test.go | Add tests around mode enforcement and ownership/group behavior for atomic writes. |
| internal/orchestrator/fs_atomic.go | Add shared atomic write + inherited directory metadata helpers. |
| internal/orchestrator/.backup.lock | Remove committed runtime lock artifact. |
| internal/config/templates/backup.env | Clarify BACKUP_PVE_ACL meaning in template comment. |
| internal/backup/collector_pve.go | Silence/won’t-count missing domains.cfg to avoid warnings on default standalone installs. |
| docs/TROUBLESHOOTING.md | Add troubleshooting section for PBS permission issues after restore. |
| docs/RESTORE_GUIDE.md | Document atomic staged writes and clarify domains.cfg/ACL semantics. |
| docs/CONFIGURATION.md | Clarify BACKUP_PVE_ACL meaning in docs. |
| README.md | Add release testing acknowledgements section; keep repo activity section. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
internal/orchestrator/fs_atomic.go
Outdated
| if err := restoreFS.MkdirAll(dir, perm); err != nil { | ||
| return fmt.Errorf("mkdir %s: %w", dir, err) | ||
| } | ||
|
|
||
| f, err := restoreFS.Open(dir) | ||
| if err != nil { | ||
| return fmt.Errorf("open dir %s: %w", dir, err) | ||
| } | ||
| defer f.Close() | ||
|
|
||
| if os.Geteuid() == 0 && owner.ok { | ||
| if err := f.Chown(owner.uid, owner.gid); err != nil { | ||
| return fmt.Errorf("chown dir %s: %w", dir, err) | ||
| } | ||
| } | ||
| if err := f.Chmod(perm); err != nil { | ||
| return fmt.Errorf("chmod dir %s: %w", dir, err) | ||
| } |
There was a problem hiding this comment.
ensureDirExistsWithInheritedMeta uses MkdirAll(dir, perm) and then only fixes ownership/mode on the final directory. If dir contains multiple missing path components, intermediate directories created by MkdirAll will keep default ownership/mode (and umask effects), so metadata won’t be consistently inherited along the whole created path. Consider creating directories one segment at a time and applying inherited chown/chmod to each newly created segment (or walking back through created dirs).
| if err := restoreFS.MkdirAll(dir, perm); err != nil { | |
| return fmt.Errorf("mkdir %s: %w", dir, err) | |
| } | |
| f, err := restoreFS.Open(dir) | |
| if err != nil { | |
| return fmt.Errorf("open dir %s: %w", dir, err) | |
| } | |
| defer f.Close() | |
| if os.Geteuid() == 0 && owner.ok { | |
| if err := f.Chown(owner.uid, owner.gid); err != nil { | |
| return fmt.Errorf("chown dir %s: %w", dir, err) | |
| } | |
| } | |
| if err := f.Chmod(perm); err != nil { | |
| return fmt.Errorf("chmod dir %s: %w", dir, err) | |
| } | |
| // Determine the deepest existing ancestor of dir. | |
| existing := "" | |
| candidate := dir | |
| for { | |
| info, err := restoreFS.Stat(candidate) | |
| if err == nil && info != nil && info.IsDir() { | |
| existing = candidate | |
| break | |
| } | |
| if err != nil && !errors.Is(err, os.ErrNotExist) { | |
| return fmt.Errorf("stat %s: %w", candidate, err) | |
| } | |
| parent := filepath.Dir(candidate) | |
| if parent == candidate || parent == "." || parent == "" { | |
| break | |
| } | |
| candidate = parent | |
| } | |
| if existing == "" { | |
| // Fall back to current directory as the existing base if none was found. | |
| existing = "." | |
| } | |
| // Build the list of path components that need to be created, from the | |
| // nearest existing ancestor down to dir, in creation order. | |
| var toCreate []string | |
| cur := dir | |
| for { | |
| if cur == existing || cur == "" || cur == "." { | |
| break | |
| } | |
| toCreate = append([]string{cur}, toCreate...) | |
| parent := filepath.Dir(cur) | |
| if parent == cur { | |
| break | |
| } | |
| cur = parent | |
| } | |
| // Create each missing directory and apply inherited ownership and mode. | |
| for _, p := range toCreate { | |
| if err := restoreFS.MkdirAll(p, perm); err != nil { | |
| return fmt.Errorf("mkdir %s: %w", p, err) | |
| } | |
| f, err := restoreFS.Open(p) | |
| if err != nil { | |
| return fmt.Errorf("open dir %s: %w", p, err) | |
| } | |
| if os.Geteuid() == 0 && owner.ok { | |
| if err := f.Chown(owner.uid, owner.gid); err != nil { | |
| f.Close() | |
| return fmt.Errorf("chown dir %s: %w", p, err) | |
| } | |
| } | |
| if err := f.Chmod(perm); err != nil { | |
| f.Close() | |
| return fmt.Errorf("chmod dir %s: %w", p, err) | |
| } | |
| if err := f.Close(); err != nil { | |
| return fmt.Errorf("close dir %s: %w", p, err) | |
| } | |
| } |
| writeErr := func() error { | ||
| if len(data) == 0 { | ||
| return nil | ||
| } | ||
| _, err := f.Write(data) | ||
| return err | ||
| }() |
There was a problem hiding this comment.
writeFileAtomic writes + closes + renames the temp file without calling Sync() on the file (and ideally syncing the containing directory). Without fsync, a crash/power loss can result in the renamed file having incomplete data despite the rename succeeding. Consider calling f.Sync() after writing (before close/rename), and if you need stronger durability guarantees, also fsync the parent directory after the rename.
| if os.Geteuid() != 0 { | ||
| t.Skip("requires root") | ||
| } |
There was a problem hiding this comment.
These tests are skipped unless running as root, which will likely skip in most CI environments and leave the ownership-preservation code paths untested. Consider injecting the effective UID check (e.g., via a package-level var/function like geteuid) so tests can simulate root, and/or extending FakeFS to allow chown/chmod without requiring real root.
| if os.Geteuid() != 0 { | ||
| t.Skip("requires root") | ||
| } |
There was a problem hiding this comment.
Same issue as above: root-only skip means this behavior may never be exercised in CI. Prefer making the root-only branch testable via dependency injection for geteuid (and/or a fake chown layer) so group-inheritance logic is covered by default test runs.
Update atomic restore behavior to propagate ownership/permissions from the nearest existing parent when creating directories or writing files. ensureDirExistsWithInheritedMeta now walks up to find an existing ancestor, creates missing nested dirs, and applies inherited uid/gid and mode reliably (with proper error handling and closes). desiredOwnershipForAtomicWrite was adjusted to repair files that have root:root by inheriting the parent's group, prefer parent gid when appropriate, and handle existing files consistently. Added unit tests to cover repairing root group on existing dest and creating nested dirs with inherited metadata. Docs updated to note that staged restores enforce ownership/permissions for created parent directories and mention the repair of common root:root group regressions.
Add a default branch in discoverRcloneBackups to cancel and skip unsupported switch cases. Enhance writeFileAtomic to ensure durability: call f.Sync() after writing, introduce syncDir to open and fsync the parent directory (handling EINVAL/ENOTSUP as non-fatal for dir fsync), and return proper wrapped errors on failures. This ensures both file data and directory metadata are flushed for atomic writes.
Introduce test hooks (atomicGeteuid, atomicFileChown, atomicFileChmod, atomicFileSync) and refactor fs_atomic to use them instead of direct os/file calls to improve testability. Update tests to use a statOverrideFS and fakeFileInfo to simulate file metadata and capture chown calls, removing the need to run tests as root. Adjust expectations for seeded modes and add a new test to verify chown behavior when simulating root for nested directory creation.
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |