Executive Summary
userTransfer.php is a critical migration tool that has caused multiple incidents due to lack of observability, pre-flight checks, and autonomous operation support. This issue proposes a comprehensive redesign based on 5+ documented migration failures.
Documented Failure Modes
From sysadmin/memory/lessons/ — these are real incidents:
1. Silent Auth Failures (Fuckup Level: HIGH)
Incident: The eyfqehxs migration burned 31 passes (hours) with zero bytes transferred because the password didn't match. Nobody noticed until the customer complained days later.
Root cause: No pre-flight auth check. Script optimistically starts all passes assuming auth works.
2. Password Coordination Failures
Incident: Source lockdown changed password without coordinating with PMSS_USER_TRANSFER_PASSWORD. Transfer silently failed.
Root cause: No validation that provided password actually works before starting.
3. Unmonitored Transfers
Incident: Screen session started and forgotten. Transfer stalled, nobody knew for days.
Root cause: No progress reporting, no health checks, no notifications.
4. Source Not Locked Down Properly
Incident: Tried removing .rtorrentExecuteRun — ineffective. Cron watchdog restarted rtorrent within 2 minutes. Data kept changing during sync.
Root cause: Lockdown logic not integrated into userTransfer. Operator must know to rename .rtorrent.rc.
5. Disk Space Exhaustion
Incident: Transfer started without checking if target had enough space. Failed mid-transfer.
Root cause: No pre-flight disk space check.
6. No Resume Capability
Incident: Transfer interrupted, had to restart from scratch.
Root cause: Each pass is independent; no stateful checkpointing.
Design Principles for Ideal Tool
For Agent Operation (Primary User)
- Fire-and-forget with confidence — Start once, get notification on completion/failure
- Observable — Structured status file that agents can poll
- Self-healing — Retry transient failures (network blips, temp disk full)
- Clear failure signals — Exit immediately on unrecoverable errors (auth failure)
- Predictable — Estimate completion time based on data size and transfer speed
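The self-healing and clear-failure principles above can be sketched as a retry wrapper. This is a minimal illustration in Python rather than the tool's PHP; the exception classes `TransientError` and `FatalError`, the function name, and the backoff values are all hypothetical and not part of userTransfer.php.

```python
import time


class TransientError(Exception):
    """Hypothetical: a failure worth retrying (network blip, temporary disk full)."""


class FatalError(Exception):
    """Hypothetical: a failure that retrying cannot fix (for example, bad credentials)."""


def run_with_retry(step, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call step() until it succeeds, retrying only on TransientError.

    FatalError is not caught, so it propagates on the first occurrence.
    The delay doubles after each failed attempt (exponential backoff).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted; surface the last transient error
            sleep(base_delay * 2 ** (attempt - 1))
```

Keeping the two error classes separate is what lets an auth failure stop the run on the first pass instead of burning retries.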
For Customer Experience
- Fast — Maximize bandwidth (compression disabled, direct paths)
- Minimal disruption — Atomic cutover when ready
- Data integrity — Verify all files transferred correctly
- Communication — Provide progress updates for customer-facing replies
For Operator
- Hands-off — No babysitting required
- Alerting — Only notified on problems via webhook/Discord
- Audit trail — Full structured log of what happened
- Easy restart — Resume from checkpoint if interrupted
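One way resume-from-checkpoint could work is to record the last completed pass after each pass finishes. The sketch below is in Python for illustration only; the checkpoint file layout, the `last_completed_pass` key, and the function names are assumptions, not existing userTransfer.php behaviour.

```python
import json
import os


def load_checkpoint(path):
    """Return the last completed pass number, or 0 if no checkpoint exists."""
    try:
        with open(path) as handle:
            return int(json.load(handle)["last_completed_pass"])
    except FileNotFoundError:
        return 0


def save_checkpoint(path, pass_number):
    """Write the checkpoint via a temp file and rename so it is never half-written."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as handle:
        json.dump({"last_completed_pass": pass_number}, handle)
    os.replace(tmp_path, path)


def run_passes(total_passes, run_pass, checkpoint_path):
    """Run the passes that have not completed yet, recording progress after each one."""
    first = load_checkpoint(checkpoint_path) + 1
    for number in range(first, total_passes + 1):
        run_pass(number)  # an exception here leaves the checkpoint at the previous pass
        save_checkpoint(checkpoint_path, number)
```

If a pass raises, a later call to `run_passes` with the same checkpoint path restarts at that pass rather than at pass 1.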
Proposed Architecture
Phase 1: Pre-Flight (BLOCKING)
userTransfer.php --preflight USER SOURCE
Before any data transfer:
- Auth check — Verify SSH/password works with a simple ls command
- Disk space — Compare source size vs target free space
- Network connectivity — Verify route to source
- Source status — Check if rtorrent/qbittorrent running (warn if source not locked)
- Target status — Verify user exists, home writable
Exit with clear error if ANY pre-flight fails. No wasted passes.
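The blocking behaviour described above might look like the following Python sketch (the real tool is PHP). The 5% safety margin, the function names, and the convention that a check returns an error string are assumptions made for illustration.

```python
import shutil


def check_disk_space(source_bytes, target_path, margin=0.05):
    """Return an error string if target_path lacks room for source_bytes, else None.

    margin is an assumed safety buffer (5% by default), not a value from the proposal.
    """
    free = shutil.disk_usage(target_path).free
    needed = int(source_bytes * (1 + margin))
    if free < needed:
        return f"disk space: need {needed} bytes, only {free} free on {target_path}"
    return None


def run_preflight(checks):
    """Run (name, callable) pairs in order and stop at the first failure.

    Each callable returns None on success or an error string on failure.
    Returns (ok, message).
    """
    for name, check in checks:
        error = check()
        if error is not None:
            return False, f"pre-flight failed [{name}]: {error}"
    return True, "all pre-flight checks passed"
```

Ordering the auth check first means a wrong password is reported before any slower check runs.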
Phase 2: Source Lockdown (Optional, Integrated)
userTransfer.php --lockdown-source USER SOURCE
Integrated lockdown that actually works:
- SSH to source
- Rename .rtorrent.rc → .rtorrent.rc.migration-disabled
- Rename .qbittorrentEnable → .qbittorrentEnable.migration-disabled
- Kill running torrent processes
- Wait 2+ minutes
- Verify no respawn
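The wait-and-verify steps at the end of the lockdown could be sketched as a polling loop. This Python version is illustrative only: `is_running` is a hypothetical caller-supplied callable (for instance one that runs a process check over SSH on the source), and the 150-second window is an assumed value chosen to exceed the 2-minute watchdog restart seen in the incident.

```python
import time


def verify_no_respawn(is_running, window_seconds=150, interval_seconds=10,
                      sleep=time.sleep):
    """Poll is_running() across the window; return True only if no process ever appears.

    Returns False as soon as a respawn is seen, so the caller can abort
    before any data is synced from a source that is still changing.
    """
    elapsed = 0
    while elapsed < window_seconds:
        if is_running():
            return False
        sleep(interval_seconds)
        elapsed += interval_seconds
    return not is_running()  # one final check at the end of the window
```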
Phase 3: Transfer with Progress
During transfer, maintain status file:
// /var/run/pmss/userTransfer-USER.json
{
"user": "eyfqehxs",
"source": "le4-0-105-210funnelout.pulsedmedia.com",
"target": "le4-0-104-131symmetra.pulsedmedia.com",
"started": "2026-02-04T05:09:00Z",
"status": "running",
"phase": "main",
"pass": 5,
"total_passes": 31,
"bytes_transferred": 15000000000000,
"bytes_total": 36000000000000,
"percent": 41.7,
"eta_seconds": 7200,
"last_update": "2026-02-04T07:30:00Z",
"errors": []
}
Agent can poll this file. No log parsing required.
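For polling to be safe, the writer should never leave a half-written file behind. A minimal Python sketch of an atomic status update follows (the tool itself is PHP); the function name is hypothetical, and the field names are taken from the example above.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def write_status(path, status):
    """Write status as JSON via a temp file and rename, so a poller never sees a partial file."""
    status = dict(status)  # copy so the caller's dict is not modified
    total = status.get("bytes_total") or 0
    done = status.get("bytes_transferred", 0)
    status["percent"] = round(100 * done / total, 1) if total else 0.0
    status["last_update"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(status, handle, indent=2)
        os.replace(tmp_path, path)  # atomic on POSIX when both paths share a filesystem
    except BaseException:
        os.unlink(tmp_path)  # the temp file still exists if the write or rename failed
        raise
```

Note that `mkstemp` creates the file with mode 0600, so an `os.chmod` before the rename would be needed if the polling agent runs as a different user.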
Phase 4: Transfer Completion
On completion:
- Run final sync passes
- Run post-setup (permissions, ruTorrent rename)
- Verify data integrity (optional checksum mode)
- Restore source configs (optional, for rollback capability)
- Write completion status
- Send webhook notification (if configured)
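The optional checksum mode in the list above could compare per-file digests between source and target. The sketch below is Python for illustration and assumes both trees are readable as local paths; in practice the source manifest would have to be built on the source host and shipped over. The function names and the choice of SHA-256 are assumptions.

```python
import hashlib
import os


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full, "rb") as handle:
                # Read in 1 MiB chunks so large files are not loaded into memory at once.
                for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full, root)] = digest.hexdigest()
    return manifest


def compare_manifests(source, target):
    """Return (missing, mismatched): paths absent from target, and paths whose digests differ."""
    missing = sorted(p for p in source if p not in target)
    mismatched = sorted(p for p in source if p in target and source[p] != target[p])
    return missing, mismatched
```

A transfer would count as verified only when both returned lists are empty.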
Phase 5: Notification
userTransfer.php --webhook https://discord.webhook.url USER SOURCE
On completion/failure, POST structured payload:
{
"event": "migration_complete",
"user": "eyfqehxs",
"source": "funnelout",
"target": "symmetra",
"duration_seconds": 14400,
"bytes_transferred": 36000000000000,
"status": "success"
}
CLI Interface
Usage:
userTransfer.php [OPTIONS] LOCAL_USER REMOTE_HOST
userTransfer.php --preflight LOCAL_USER REMOTE_HOST
userTransfer.php --status LOCAL_USER
userTransfer.php --abort LOCAL_USER
Options:
--preflight Run pre-flight checks only, no transfer
--lockdown-source Lock down source before transfer (rename configs, kill processes)
--unlock-source Restore source configs after transfer
--status Show current transfer status (JSON)
--abort Abort running transfer gracefully
--webhook URL Send completion/failure notification to URL
--json-log Output structured JSON logs (for agent parsing)
--no-sleep Disable sleep between passes (for fast internal transfers)
--verify Verify checksums after transfer (slow but safe)
--resume Resume from last checkpoint
--password-from-stdin Read password from stdin (more secure than env)
--daemon Daemonize after starting (no screen needed)
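What the --webhook option would send can be sketched as follows, in Python for illustration (the tool is PHP). The field names come from the Phase 5 payload; the `migration_failed` event name and the `failure` status value are assumptions, since the proposal only shows the success case.

```python
import json
import urllib.request


def build_payload(user, source, target, duration_seconds, bytes_transferred, success):
    """Build the completion/failure payload using the field names from the Phase 5 example."""
    return {
        # "migration_failed" and "failure" are assumed names for the failure case.
        "event": "migration_complete" if success else "migration_failed",
        "user": user,
        "source": source,
        "target": target,
        "duration_seconds": duration_seconds,
        "bytes_transferred": bytes_transferred,
        "status": "success" if success else "failure",
    }


def send_webhook(url, payload, timeout=10):
    """POST payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Discord webhooks expect their own message format (for example a `content` field), so for a Discord URL the payload would likely need to be wrapped rather than posted as-is.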
Status File Location
/var/run/pmss/userTransfer-{USER}.json # Current status
/var/run/pmss/userTransfer-{USER}.pid # PID file for daemon mode
/var/log/pmss/userTransfer-{USER}.log # Human-readable log
/var/log/pmss/userTransfer-{USER}.jsonl # Machine-readable log (JSON lines)
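An agent polling the status file could use `last_update` to catch the stalled-and-forgotten case from the incidents above. A minimal Python sketch follows; the 15-minute staleness threshold and the function names are assumptions, and the timestamp format is the one used in the status example.

```python
import json
from datetime import datetime, timezone


def read_status(path):
    """Load the status JSON written by the transfer."""
    with open(path) as handle:
        return json.load(handle)


def is_stalled(status, now=None, max_age_seconds=900):
    """Return True if a running transfer has not updated its status recently.

    max_age_seconds (15 minutes) is an assumed threshold, not from the proposal.
    Transfers that are not in the "running" state are never reported as stalled.
    """
    if status.get("status") != "running":
        return False
    last = datetime.strptime(status["last_update"], "%Y-%m-%dT%H:%M:%SZ")
    last = last.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (now - last).total_seconds() > max_age_seconds
```

For example, `is_stalled(read_status("/var/run/pmss/userTransfer-eyfqehxs.json"))` could run from cron and raise an alert when it returns True.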
Agent Workflow
# 1. Pre-flight (fails fast if anything wrong)
php /scripts/util/userTransfer.php --preflight eyfqehxs funnelout
# 2. Start transfer with webhook notification
php /scripts/util/userTransfer.php --daemon --webhook "$DISCORD_URL" \
--lockdown-source eyfqehxs funnelout
# 3. Agent can check progress anytime
cat /var/run/pmss/userTransfer-eyfqehxs.json | jq '.percent'
# 4. Webhook fires on completion — agent doesn't need to poll
Minimum Viable Implementation (Phase 1)
If full redesign is too much, at minimum implement:
- --preflight flag — Auth + disk space check before transfer
- Exit code 2 for auth failure — Distinct from other errors
- Status file — Simple JSON with current pass/percent
- --abort flag — Graceful shutdown
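These 4 changes would have prevented or quickly surfaced most of the documented incidents; integrated source lockdown and resume would still need the fuller redesign.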
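A distinct exit code lets the calling agent react to an auth failure without parsing logs. The Python sketch below shows the agent side only; exit code 2 is from the list above, while the label strings and the treatment of every other non-zero code as a generic error are assumptions.

```python
import subprocess

EXIT_OK = 0
EXIT_AUTH_FAILURE = 2  # from the proposal; other non-zero codes are handled generically


def classify_exit(code):
    """Map a process exit code to a coarse outcome label (labels are hypothetical)."""
    if code == EXIT_OK:
        return "success"
    if code == EXIT_AUTH_FAILURE:
        return "auth_failure"
    return "error"


def run_and_classify(argv):
    """Run a command and classify its exit code."""
    return classify_exit(subprocess.run(argv).returncode)
```

An agent might call `run_and_classify(["php", "/scripts/util/userTransfer.php", "--preflight", "eyfqehxs", "funnelout"])` and escalate immediately on `auth_failure` instead of retrying.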
Related Lessons
- sysadmin/memory/lessons/infrastructure/20260201-migration-eyfqehxs-multiple-fuckups.md
- sysadmin/memory/lessons/infrastructure/20260204-usertransfer-stalled-migration-fuckup.md
- sysadmin/memory/lessons/infrastructure/20260201-usertransfer-password-must-match-source.md
- sysadmin/memory/lessons/operations/20260201-migration-source-lockdown-correct-method.md
- sysadmin/memory/lessons/operations/20260201-penny-migration-lessons.md
- sysadmin/memory/lessons/customer/20260201-migration-email-must-mention-transfer-time.md
- sysadmin/memory/lessons/pmss/20260131-usertransfer-php-must-run-from-target-server.md