Step-by-Step Instructions: Clean Up gh-pages History Before 2025
Date: October 27, 2025
Repository: lecture-python-programming.myst
Action: Remove all gh-pages commits before January 1, 2025
Summary of Changes
- Commits to remove: 128 (October 2020 - December 2024)
- Commits to keep: 3 (2025 deployments)
- Expected space savings: 300-350 MB (~50-58% reduction)
- Commits being preserved: bb75a51, a9cdec4, f13c8a0 (the three 2025 deployments)
Prerequisites
Required Tools
- git-filter-repo (recommended method):
pip install git-filter-repo
- Verify git-filter-repo installation:
git filter-repo --version
Before You Begin
- Inform all collaborators that history will be rewritten
- Ensure you have proper backup (see Step 1)
- Have write access to the remote repository
- Close any open pull requests that might conflict
Step 1: Create a Complete Backup
Create a full mirror backup of the repository before making any changes:
cd /Users/mmcky/work/quantecon
# Create a backup directory
mkdir -p repo-backups
cd repo-backups
# Create a complete mirror backup
git clone --mirror https://github.com/QuantEcon/lecture-python-programming.myst.git backup-$(date +%Y%m%d).git
# Verify backup
cd backup-$(date +%Y%m%d).git
git log gh-pages --oneline | head -5
# Count all commits on gh-pages
git rev-list --count gh-pages
cd ../..
Verification: The log shows the five most recent deployments, and the count should be 131.
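The commit count can also be checked with a small function, for example just before deleting anything. This is a minimal sketch: `verify_backup` is a hypothetical helper name, and the path and expected count in the usage comment are taken from the steps above.

```shell
# verify_backup is a hypothetical helper: it checks that a branch in a
# (mirror) repository holds the expected number of commits.
verify_backup() {
  local repo="$1" branch="$2" expected="$3" actual
  # rev-list --count prints the number of commits reachable from the branch
  actual=$(git -C "$repo" rev-list --count "$branch") || return 2
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $branch has $actual commits"
  else
    echo "MISMATCH: $branch has $actual commits, expected $expected" >&2
    return 1
  fi
}

# Example (run from /Users/mmcky/work/quantecon):
# verify_backup "repo-backups/backup-$(date +%Y%m%d).git" gh-pages 131
```

A non-zero return makes it easy to stop a script before any destructive step runs.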
Step 2: Prepare Your Working Repository
Navigate to your working repository and ensure it's clean:
cd /Users/mmcky/work/quantecon/lecture-python-programming.myst
# Check status - should be clean
git status
# Ensure you're on main branch
git checkout main
# Fetch latest from remote
git fetch --all
# Pull latest changes
git pull origin main
Expected output: "Already up to date" or successful pull.
Step 3: Find the First 2025 Commit
We need to identify the exact commit to use as the new base:
# Find the oldest commit in 2025
git log gh-pages --reverse --since="2025-01-01" --format="%H %ai %s" | head -1
Expected output:
bb75a513ab71050c5084fa0515c7bf06f557b6df 2025-03-11 04:33:15 +0000 deploy: 83d5f364...
Note this commit hash: bb75a513ab71050c5084fa0515c7bf06f557b6df
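Before rewriting anything, it may help to confirm the 128/3 split from the summary. The sketch below assumes the cutoff is midnight UTC on January 1, 2025, with the time of day spelled out to avoid ambiguity. `count_split` is a hypothetical helper name.

```shell
# count_split is a hypothetical helper: it prints how many commits on a
# branch are dated up to the cutoff and how many from the cutoff onward.
# Both filters use the committer date and are inclusive, so a commit
# stamped exactly at the cutoff would be counted on both sides.
count_split() {
  local branch="$1" cutoff="$2" before after
  before=$(git rev-list --count --before="$cutoff" "$branch") || return 1
  after=$(git rev-list --count --since="$cutoff" "$branch") || return 1
  echo "remove=$before keep=$after"
}

# Example; per the summary this should print remove=128 keep=3:
# count_split gh-pages "2025-01-01 00:00:00 +0000"
```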
Step 4: Create a New Orphan gh-pages Branch
We'll create a fresh gh-pages branch starting from the first 2025 commit:
# Checkout the first 2025 commit
git checkout bb75a513ab71050c5084fa0515c7bf06f557b6df
# Create a new orphan branch from this point
git checkout --orphan gh-pages-new
# Add all files from this commit
git add -A
# Commit with the original message and author date
git commit -m "deploy: 83d5f364194ef23492b74112617cfacc36b76758" \
  --date="2025-03-11 04:33:15 +0000"
Verification: You now have a fresh branch with just this one commit.
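Note that `--date` sets only the author date; the committer date defaults to the time the command runs. If the original timestamp should appear in both fields, one option is git's `GIT_COMMITTER_DATE` environment variable. This is a sketch, and `commit_with_dates` is a hypothetical wrapper name.

```shell
# commit_with_dates is a hypothetical wrapper: it commits the staged files
# using the same timestamp for the author date and the committer date.
commit_with_dates() {
  local message="$1" when="$2"
  GIT_COMMITTER_DATE="$when" git commit -q -m "$message" --date="$when"
}

# Example, using the message and date from Step 4:
# commit_with_dates "deploy: 83d5f364194ef23492b74112617cfacc36b76758" \
#   "2025-03-11 04:33:15 +0000"
```

Both dates can be inspected afterwards with `git log -1 --format="%ai | %ci"`.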
Step 5: Cherry-pick Remaining 2025 Commits
Apply the remaining 2025 commits on top:
# Cherry-pick the second 2025 commit
git cherry-pick a9cdec4a6524cf44e83e13123c1632b7625ed50d
# Cherry-pick the most recent commit
git cherry-pick f13c8a0c9ed6158922404b9f36f0bce30aaf39bc
Verification:
# Should show exactly 3 commits
git log --oneline
Expected output (the abbreviated hashes will differ from those shown here, because the commits are recreated on a new root; the messages should match):
f13c8a0 deploy: 2bfbcadc7992e8314094ab3e6ad3c0f00c97b209
a9cdec4 deploy: 65e3b703d040d47abb830441eaad458d645f4c69
bb75a51 deploy: 83d5f364194ef23492b74112617cfacc36b76758
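Recreating commits gives them new hashes, but the deployed content should be unchanged. One way to check this is to compare tree hashes: two commits with the same tree hash contain identical files. The sketch below uses a hypothetical helper name, `same_tree`.

```shell
# same_tree is a hypothetical helper: it succeeds when two commits point
# at the same tree, i.e. their file contents are identical.
same_tree() {
  local a b
  a=$(git rev-parse --verify --quiet "$1^{tree}") || return 2
  b=$(git rev-parse --verify --quiet "$2^{tree}") || return 2
  [ "$a" = "$b" ]
}

# Example: compare the rebuilt tip with the original latest deployment
# same_tree gh-pages-new f13c8a0c9ed6158922404b9f36f0bce30aaf39bc \
#   && echo "content matches"
```

A match here means the rebuilt branch would deploy exactly the same site as the original tip.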
Step 6: Replace Old gh-pages Branch
Now replace the old bloated branch with the clean one:
# Delete the old gh-pages branch
git branch -D gh-pages
# Rename the new branch to gh-pages
git branch -m gh-pages-new gh-pages
# Verify the branch
git log gh-pages --oneline
Verification: Should show only 3 commits.
Step 7: Check Repository Size (Before Force Push)
Check the current size:
# Current size
du -sh .git
# Run garbage collection
git reflog expire --expire=now --all
git gc --prune=now --aggressive
# New size
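du -sh .git
Note: Little or no reduction will show at this point, because the remote-tracking ref origin/gh-pages still points at the old history until the force push in Step 8. Each clone shrinks only after its owner runs the cleanup in Step 9.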
Step 8: Force Push to Remote
# First, do a dry-run to see what will happen
git push origin gh-pages --force --dry-run
# If dry-run looks good, do the actual force push
git push origin gh-pages --force
Verification: Check GitHub to confirm only 3 commits in gh-pages history.
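The result can also be checked from the command line rather than the GitHub UI. This sketch assumes the remote is named origin, as in the steps above. `remote_in_sync` is a hypothetical helper that compares the branch tip on the remote with the local one.

```shell
# remote_in_sync is a hypothetical helper: it succeeds when the branch tip
# on the remote equals the local branch tip.
remote_in_sync() {
  local remote="$1" branch="$2" remote_sha local_sha
  # ls-remote prints "<sha><TAB><ref>"; keep only the sha
  remote_sha=$(git ls-remote "$remote" "refs/heads/$branch" | cut -f1)
  local_sha=$(git rev-parse --verify --quiet "refs/heads/$branch") || return 2
  [ -n "$remote_sha" ] && [ "$remote_sha" = "$local_sha" ]
}

# Example, after the force push:
# remote_in_sync origin gh-pages && echo "remote gh-pages matches local"
```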
Step 9: Clean Up for All Team Members
Everyone who has cloned this repository must run these commands:
cd /path/to/lecture-python-programming.myst
# Fetch updated refs
git fetch origin
# If you're on gh-pages, switch away
git checkout main
# Delete local gh-pages
git branch -D gh-pages
# Get the new gh-pages
git checkout --track origin/gh-pages
# Clean up old objects
git reflog expire --expire=now --all
git gc --prune=now --aggressive
# Verify new size
du -sh .git
Step 10: Verify Everything Works
Test that the deployment still works:
- Check GitHub Pages site:
# Visit your GitHub Pages URL to confirm site is working
open https://quantecon.github.io/lecture-python-programming.myst/
- Verify git operations:
# On main branch
git checkout main
# Can still build and test
jb build lectures --path-output ./ -n -W --keep-going
# Can switch to gh-pages
git checkout gh-pages
# Has expected content
ls -la
Expected Results
Before Cleanup
- Repository size: ~603 MB
- gh-pages commits: 131
- History span: October 2020 - June 2025
After Cleanup
- Repository size: ~250-300 MB (after everyone runs gc)
- gh-pages commits: 3
- History span: March 2025 - June 2025
- Space savings: ~300-350 MB (50-58% reduction)
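The percentages follow from the sizes listed above. As a quick check in shell integer arithmetic (results are truncated to whole percent, and `percent_saved` is a hypothetical helper name):

```shell
# percent_saved is a hypothetical helper: whole-percent reduction when
# going from $1 MB to $2 MB.
percent_saved() {
  echo $(( ($1 - $2) * 100 / $1 ))
}

before=603                            # MB, from "Before Cleanup"
low=$(percent_saved "$before" 300)    # larger expected size after cleanup
high=$(percent_saved "$before" 250)   # smaller expected size after cleanup
echo "Expected reduction: ${low}-${high}%"
```

This prints `Expected reduction: 50-58%`, matching the range stated above.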
Rollback Procedure (If Needed)
If something goes wrong, you can restore from backup:
cd /Users/mmcky/work/quantecon/lecture-python-programming.myst
# Fetch refs from backup
git remote add backup /Users/mmcky/work/quantecon/repo-backups/backup-YYYYMMDD.git
# Restore gh-pages
git fetch backup
git branch -D gh-pages
git checkout -b gh-pages backup/gh-pages
# Force push to restore
git push origin gh-pages --force
Alternative Method: Using git-filter-repo
If you prefer using git-filter-repo (more automated):
# Create backup first (Step 1)
# Clone fresh copy
cd /tmp
git clone https://github.com/QuantEcon/lecture-python-programming.myst.git temp-cleanup
cd temp-cleanup
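# Filter to keep only 2025 commits on gh-pages
git filter-repo --refs refs/heads/gh-pages \
  --commit-callback '
# committer_date is raw bytes of the form b"<unix timestamp> <utc offset>",
# so compare the timestamp numerically rather than against a date string
cutoff = 1735689600  # 2025-01-01 00:00:00 UTC
if int(commit.committer_date.split()[0]) < cutoff:
    commit.skip()
' --force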
# Verify
git log gh-pages --oneline
# Add remote back (filter-repo removes it)
git remote add origin https://github.com/QuantEcon/lecture-python-programming.myst.git
# Force push
git push origin gh-pages --force
# Clean up temp directory
cd ..
rm -rf temp-cleanup
Troubleshooting
Issue: "refusing to delete the current branch"
Solution: Switch to main first: git checkout main
Issue: "remote rejected (protected branch)"
Solution: Temporarily disable branch protection on GitHub:
- Go to Settings → Branches → Branch protection rules
- Edit gh-pages protection
- Temporarily disable
- Force push
- Re-enable protection
Issue: Clone size hasn't reduced
Solution: Run garbage collection:
git reflog expire --expire=now --all
git gc --prune=now --aggressive
Issue: Build artifacts missing
Solution: Trigger a new deployment from main branch
Post-Cleanup Recommendations
- Update CI/CD to use shallow clones:
- uses: actions/checkout@v4
  with:
    fetch-depth: 1
- Consider deployment without history:
Use peaceiris/actions-gh-pages@v3 with force_orphan: true
- Schedule annual cleanup:
Add to calendar to review and clean old deploys yearly
- Monitor repository size:
# Add to monthly checks
du -sh .git
git count-objects -vH
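The monthly size check could be wrapped in a function that flags growth past a chosen limit. This is a sketch only: `check_git_size` is a hypothetical helper, and the 400 MB limit in the usage comment is an arbitrary illustration, not a recommendation from this repository.

```shell
# check_git_size is a hypothetical helper: it warns when a repository's
# .git directory exceeds a size limit given in MB.
check_git_size() {
  local repo="$1" limit_mb="$2" size_kb size_mb
  # du -sk reports the total in kilobytes, followed by a tab and the path
  size_kb=$(du -sk "$repo/.git" | cut -f1) || return 2
  size_mb=$(( size_kb / 1024 ))
  if [ "$size_mb" -gt "$limit_mb" ]; then
    echo "WARNING: .git is ${size_mb} MB (limit ${limit_mb} MB)"
    return 1
  fi
  echo "OK: .git is ${size_mb} MB"
}

# Example; the limit is an arbitrary illustration:
# check_git_size . 400
```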
Questions or Issues?
If you encounter any problems:
- Check the backup is intact
- Review the verification steps
- Consult the troubleshooting section
- You can always roll back using the backup
Remember: The GitHub Pages site will continue to work throughout this process. You're only cleaning up historical deployment snapshots.