Clean up gh-pages branch history (pre-2025) #419

@mmcky

Step-by-Step Instructions: Clean Up gh-pages History Before 2025

Date: October 27, 2025
Repository: lecture-python-programming.myst
Action: Remove all gh-pages commits before January 1, 2025

Summary of Changes

  • Commits to remove: 128 (October 2020 - December 2024)
  • Commits to keep: 3 (2025 deployments)
  • Expected space savings: 300-350 MB (~50-58% reduction of the current ~603 MB)
  • Commits being preserved: bb75a51, a9cdec4, f13c8a0 (the three 2025 deployments; full hashes appear in Steps 3 and 5)

Prerequisites

⚠️ IMPORTANT: This operation will rewrite git history. Coordinate with your team before proceeding.

Required Tools

  1. git-filter-repo (needed only for the alternative method described near the end; Steps 1-10 use plain git)

    pip install git-filter-repo
  2. Verify git-filter-repo installation:

    git filter-repo --version

Before You Begin

  • Inform all collaborators that history will be rewritten
  • Ensure you have proper backup (see Step 1)
  • Have write access to the remote repository
  • Close any open pull requests that might conflict

Step 1: Create a Complete Backup

⚠️ DO NOT SKIP THIS STEP

Create a full mirror backup of the repository before making any changes:

cd /Users/mmcky/work/quantecon

# Create a backup directory
mkdir -p repo-backups
cd repo-backups

# Create a complete mirror backup
git clone --mirror https://github.com/QuantEcon/lecture-python-programming.myst.git backup-$(date +%Y%m%d).git

# Verify backup: count the commits on gh-pages
cd backup-$(date +%Y%m%d).git
git rev-list --count gh-pages
cd ../..

Verification: The count should be 131, i.e. the full gh-pages history is in the backup.


Step 2: Prepare Your Working Repository

Navigate to your working repository and ensure it's clean:

cd /Users/mmcky/work/quantecon/lecture-python-programming.myst

# Check status - should be clean
git status

# Ensure you're on main branch
git checkout main

# Fetch latest from remote
git fetch --all

# Pull latest changes
git pull origin main

Expected output: "Already up to date" or successful pull.


Step 3: Find the First 2025 Commit

We need to identify the exact commit to use as the new base:

# Find the oldest commit in 2025
git log gh-pages --reverse --since="2025-01-01" --format="%H %ai %s" | head -1

Expected output:

bb75a513ab71050c5084fa0515c7bf06f557b6df 2025-03-11 04:33:15 +0000 deploy: 83d5f364...

Note this commit hash: bb75a513ab71050c5084fa0515c7bf06f557b6df
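Before rewriting anything, it can help to confirm that the split matches the summary (128 commits to remove, 3 to keep). The following is a minimal sketch, not part of the original procedure; `count_split` is a hypothetical helper name, and it assumes committer dates on gh-pages increase monotonically, which is typical for deploy branches.

```shell
# Hypothetical helper: count commits on a ref before and since a cutoff,
# using committer dates. Pass an explicit time to avoid ambiguity.
count_split() {
    ref=$1
    cutoff=$2
    old=$(git rev-list --count --before="$cutoff" "$ref")
    new=$(git rev-list --count --since="$cutoff" "$ref")
    echo "before=$old since=$new"
}
```

Usage, assuming a local gh-pages branch exists: `count_split gh-pages "2025-01-01 00:00:00"`. If the numbers differ from 128 and 3, stop and re-check the cutoff before continuing.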


Step 4: Create a New Orphan gh-pages Branch

We'll create a fresh gh-pages branch starting from the first 2025 commit:

# Checkout the first 2025 commit
git checkout bb75a513ab71050c5084fa0515c7bf06f557b6df

# Create a new orphan branch from this point
git checkout --orphan gh-pages-new

# Add all files from this commit
git add -A

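# Commit with the original message and author date
# (--date sets only the author date; the committer date will be the current time)
git commit -m "deploy: 83d5f364194ef23492b74112617cfacc36b76758" \
  --date="2025-03-11 04:33:15 +0000"

Verification: You now have a fresh branch with just this one root commit. Its hash differs from bb75a51 because it has no parent and a new committer date.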


Step 5: Cherry-pick Remaining 2025 Commits

Apply the remaining 2025 commits on top:

# Cherry-pick the second 2025 commit
git cherry-pick a9cdec4a6524cf44e83e13123c1632b7625ed50d

# Cherry-pick the most recent commit
git cherry-pick f13c8a0c9ed6158922404b9f36f0bce30aaf39bc

Verification:

# Should show exactly 3 commits
git log --oneline

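Expected output, newest first. The abbreviated hashes will differ from the original f13c8a0, a9cdec4, and bb75a51, because every commit is recreated on top of the new root; only the messages should match:

<new-hash> deploy: 2bfbcadc7992e8314094ab3e6ad3c0f00c97b209
<new-hash> deploy: 65e3b703d040d47abb830441eaad458d645f4c69
<new-hash> deploy: 83d5f364194ef23492b74112617cfacc36b76758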
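Because the rebuilt commits get new hashes, comparing hashes cannot confirm that the site content survived intact. A stronger check is to compare tree objects: if the tip of the new branch and the original tip commit point at the same tree, the deployed files are byte-for-byte identical. This is a sketch only; `same_tree` is a hypothetical helper, not something the original instructions define.

```shell
# Hypothetical helper: exit 0 when two commits point at identical trees,
# 1 when the trees differ, 2 when either revision cannot be resolved.
same_tree() {
    a=$(git rev-parse --verify --quiet "$1^{tree}") || return 2
    b=$(git rev-parse --verify --quiet "$2^{tree}") || return 2
    [ "$a" = "$b" ]
}
```

Usage: `same_tree gh-pages-new f13c8a0c9ed6158922404b9f36f0bce30aaf39bc && echo "content identical"`.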

Step 6: Replace Old gh-pages Branch

Now replace the old bloated branch with the clean one:

# Delete the old gh-pages branch
git branch -D gh-pages

# Rename the new branch to gh-pages
git branch -m gh-pages-new gh-pages

# Verify the branch
git log gh-pages --oneline

Verification: Should show only 3 commits.


Step 7: Check Repository Size (Before Force Push)

Check the current size:

# Current size
du -sh .git

# Run garbage collection
git reflog expire --expire=now --all
git gc --prune=now --aggressive

# New size
du -sh .git

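Note: The local size will not drop much yet, because the remote-tracking ref origin/gh-pages still points at the old history. The full reduction appears only after the force push and a fresh fetch and gc (Steps 8 and 9).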


Step 8: Force Push to Remote

⚠️ WARNING: This rewrites remote history. Ensure team is notified!

# First, do a dry-run to see what will happen
git push origin gh-pages --force --dry-run

# If dry-run looks good, do the actual force push
git push origin gh-pages --force

Verification: Check GitHub to confirm only 3 commits in gh-pages history.
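The same verification can be done from the command line by comparing the local branch tip with what the remote reports. A minimal sketch, assuming the remote is named origin; `remote_matches_local` is a hypothetical helper name.

```shell
# Hypothetical helper: exit 0 when <remote>'s <branch> points at the same
# commit as the local branch of the same name.
remote_matches_local() {
    remote=$1
    branch=$2
    local_sha=$(git rev-parse --verify --quiet "refs/heads/$branch") || return 2
    # ls-remote prints "<sha><TAB><refname>"; keep only the sha
    remote_sha=$(git ls-remote "$remote" "refs/heads/$branch" | cut -f1)
    [ -n "$remote_sha" ] && [ "$local_sha" = "$remote_sha" ]
}
```

Usage: `remote_matches_local origin gh-pages && echo "remote updated"`.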


Step 9: Clean Up for All Team Members

Everyone who has cloned this repository must run these commands:

cd /path/to/lecture-python-programming.myst

# Fetch updated refs
git fetch origin

# If you're on gh-pages, switch away
git checkout main

# Delete local gh-pages
git branch -D gh-pages

# Get the new gh-pages
git checkout --track origin/gh-pages

# Clean up old objects
git reflog expire --expire=now --all
git gc --prune=now --aggressive

# Verify new size
du -sh .git

Step 10: Verify Everything Works

Test that the deployment still works:

  1. Check GitHub Pages site:

    # Visit your GitHub Pages URL to confirm site is working
    # (open is macOS-specific; on Linux use xdg-open or paste the URL into a browser)
    open https://quantecon.github.io/lecture-python-programming.myst/
  2. Verify git operations:

    # On main branch
    git checkout main
    
    # Can still build and test
    jb build lectures --path-output ./ -n -W --keep-going
    
    # Can switch to gh-pages
    git checkout gh-pages
    
    # Has expected content
    ls -la

Expected Results

Before Cleanup

  • Repository size: ~603 MB
  • gh-pages commits: 131
  • History span: October 2020 - June 2025

After Cleanup

  • Repository size: ~250-300 MB (after everyone runs gc)
  • gh-pages commits: 3
  • History span: March 2025 - June 2025
  • Space savings: ~300-350 MB (50-58% reduction)

Rollback Procedure (If Needed)

If something goes wrong, you can restore from backup:

cd /Users/mmcky/work/quantecon/lecture-python-programming.myst

# Fetch refs from backup
git remote add backup /Users/mmcky/work/quantecon/repo-backups/backup-YYYYMMDD.git

# Restore gh-pages (switch to main first so the branch can be deleted)
git fetch backup
git checkout main
git branch -D gh-pages
git checkout -b gh-pages backup/gh-pages

# Force push to restore
git push origin gh-pages --force

Alternative Method: Using git-filter-repo

If you prefer using git-filter-repo (more automated):

# Create backup first (Step 1)

# Clone fresh copy
cd /tmp
git clone https://github.com/QuantEcon/lecture-python-programming.myst.git temp-cleanup
cd temp-cleanup

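# Create a local gh-pages branch (a fresh clone only has origin/gh-pages)
git branch gh-pages origin/gh-pages

# Filter to keep only 2025 commits on gh-pages.
# committer_date is raw bytes of the form b"<epoch seconds> <utc offset>",
# so compare the epoch value; 1735689600 is 2025-01-01 00:00:00 UTC.
git filter-repo --refs refs/heads/gh-pages \
  --commit-callback '
if int(commit.committer_date.split()[0]) < 1735689600:
    commit.skip()
' --force

# Verify: expect 3 commits, and check that the oldest one still contains the full site
git log gh-pages --oneline

# Re-add the remote only if filter-repo removed it
git remote get-url origin >/dev/null 2>&1 || \
  git remote add origin https://github.com/QuantEcon/lecture-python-programming.myst.git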

# Force push
git push origin gh-pages --force

# Clean up temp directory
cd ..
rm -rf temp-cleanup
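git-filter-repo hands dates to callbacks in git's raw format (epoch seconds followed by a UTC offset), so a date cutoff is easiest to express as a Unix timestamp. The sketch below computes that timestamp for a calendar date; `epoch_utc` is a hypothetical helper, and it assumes either GNU date or BSD/macOS date is available.

```shell
# Hypothetical helper: print the Unix timestamp for midnight UTC on a
# YYYY-MM-DD date. Tries GNU date syntax first, then BSD/macOS syntax.
epoch_utc() {
    date -u -d "$1 00:00:00" +%s 2>/dev/null ||
        date -j -u -f "%Y-%m-%d %H:%M:%S" "$1 00:00:00" +%s
}
```

For the cutoff used here, `epoch_utc 2025-01-01` prints 1735689600.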

Troubleshooting

Issue: "refusing to delete the current branch"

Solution: Switch to main first: git checkout main

Issue: "remote rejected (protected branch)"

Solution: Temporarily disable branch protection on GitHub:

  1. Go to Settings → Branches → Branch protection rules
  2. Edit gh-pages protection
  3. Temporarily disable
  4. Force push
  5. Re-enable protection

Issue: Clone size hasn't reduced

Solution: Run garbage collection:

git reflog expire --expire=now --all
git gc --prune=now --aggressive

Issue: Build artifacts missing

Solution: Trigger a new deployment from main branch


Post-Cleanup Recommendations

  1. Keep CI/CD checkouts shallow (fetch-depth: 1 is the actions/checkout default; avoid fetch-depth: 0 unless full history is needed):

    - uses: actions/checkout@v4
      with:
        fetch-depth: 1
  2. Consider deployment without history:
    Use peaceiris/actions-gh-pages@v3 with force_orphan: true

  3. Schedule annual cleanup:
    Add to calendar to review and clean old deploys yearly

  4. Monitor repository size:

    # Add to monthly checks
    du -sh .git
    git count-objects -vH
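The monthly check can be turned into a small script that fails once the repository grows past a chosen limit. This is a sketch under assumptions: `repo_size_kib` and `check_repo_size` are hypothetical helper names, and any threshold you pass is your own choice rather than a value from the instructions above.

```shell
# Hypothetical helper: print the on-disk size of git objects
# (loose plus packed) in KiB, as reported by git count-objects.
repo_size_kib() {
    git count-objects -v | awk '
        $1 == "size:" || $1 == "size-pack:" { total += $2 }
        END { print total + 0 }'
}

# Hypothetical helper: warn and return 1 when the repository
# exceeds a limit given in MiB.
check_repo_size() {
    limit_mib=$1
    size_kib=$(repo_size_kib)
    if [ "$size_kib" -gt $((limit_mib * 1024)) ]; then
        echo "WARNING: git objects use $((size_kib / 1024)) MiB (limit ${limit_mib} MiB)" >&2
        return 1
    fi
}
```

Usage: run `check_repo_size 400` from the repository root; a non-zero exit status makes it easy to wire into a scheduled job.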

Questions or Issues?

If you encounter any problems:

  1. Check the backup is intact
  2. Review the verification steps
  3. Consult the troubleshooting section
  4. You can always rollback using the backup

Remember: The GitHub Pages site will continue to work throughout this process. You're only cleaning up historical deployment snapshots.
