gh-repo-archive

A shell script for archiving all repositories from a GitHub organization or user. The script clones all repositories, creates a compressed archive, and optionally uploads it to AWS S3.

Prerequisites

This script depends on the following tools:

# Debian/Ubuntu
sudo apt-get install -y jq curl git

For S3 upload functionality, you'll also need the AWS CLI:

# Install AWS CLI (example for Ubuntu/Debian)
sudo apt-get install -y awscli
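
The AWS CLI also needs credentials that can write to the target bucket. If you haven't set these up yet, the standard interactive setup is:

# Configure AWS credentials (access key, secret key, default region)
aws configure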

Environment Variables

  • GITHUB_TOKEN: Required. A GitHub personal access token with permission to read the target organization's or user's repositories (see GitHub Token Permissions below)

Usage

Basic Usage

Clone and archive all repositories from an organization:

export GITHUB_TOKEN="your_github_token_here"
./gh-repo-archive -o <organization_name>

Clone and archive all repositories from a user:

export GITHUB_TOKEN="your_github_token_here"
./gh-repo-archive -u <username>

With S3 Upload

Archive organization repositories and upload to S3:

export GITHUB_TOKEN="your_github_token_here"
./gh-repo-archive -o <organization_name> -b s3://your-bucket-name/path/

Archive user repositories and upload to S3:

export GITHUB_TOKEN="your_github_token_here"
./gh-repo-archive -u <username> -b s3://your-bucket-name/path/

Command Line Options

  • -o, --organization: GitHub organization name to archive (required if not using -u)
  • -u, --user: GitHub user name to archive (required if not using -o)
  • -b, --bucket: S3 bucket path for uploading the archive (optional)
  • -n, --name: Custom archive filename (optional, defaults to <org/user>-gh-archive-<timestamp>.tar.gz)
  • -t, --tmp: Custom temporary directory path (optional, defaults to /tmp/<org/user>-gh-archive-<timestamp>)

How It Works

  1. Pre-flight checks: Validates that required tools (jq, curl, git, and the AWS CLI when S3 upload is requested) are installed
  2. Input validation: Ensures an organization or user name and a GitHub token are provided
  3. Fetches repository list: Uses the GitHub API to list all repositories for the specified organization or user
  4. Handles pagination: Automatically follows pagination links so that every repository is captured (see the sketch after this list)
  5. Clones repositories: Creates a local clone of each repository in a temporary directory
  6. Error handling: Provides detailed error messages for different failure scenarios
  7. Creates archive: Compresses all cloned repositories into a .tar.gz file
  8. Optional S3 upload: Uploads the archive to the specified S3 bucket if provided
  9. Cleanup: Removes temporary files and directories
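
For reference, listing repositories with pagination needs nothing beyond curl and jq. The script follows the API's pagination links; the sketch below uses a simpler page-counter loop instead, and the endpoint, page size, and $ORG variable are illustrative:

# Fetch repos page by page until the API returns an empty page
page=1
while :; do
  urls=$(curl -sf -H "Authorization: token $GITHUB_TOKEN" \
    "https://api.github.com/orgs/$ORG/repos?per_page=100&page=$page" | jq -r '.[].clone_url')
  [ -z "$urls" ] && break
  for url in $urls; do
    git clone "$url" || echo "failed to clone $url" >&2
  done
  page=$((page + 1))
done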

Output

The script creates an archive named <organization/user>-gh-archive-<timestamp>.tar.gz containing all cloned repositories.
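
Restoring is a plain tar extraction; the timestamped filename below is illustrative:

# Unpack all cloned repositories into the current directory
tar -xzf myorg-gh-archive-20240101-120000.tar.gz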

GitHub Token Permissions

Your GitHub token needs the following permissions:

  • repo (if accessing private repositories)
  • read:org (for organization repositories)
  • read:user (for user repositories)

For public repositories only, a token with public_repo scope is sufficient.
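
To check which scopes a classic token actually carries, you can inspect the x-oauth-scopes header on any authenticated API response (fine-grained tokens do not report scopes this way):

# Print the scopes attached to a classic personal access token
curl -sI -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/user | grep -i '^x-oauth-scopes:'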

Examples

# Make the script executable first
chmod +x gh-repo-archive

# Archive public repositories from 'myorg'
export GITHUB_TOKEN="ghp_xxxxxxxxxxxxxxxxxxxx"
./gh-repo-archive -o myorg

# Archive public repositories from user 'myuser'
export GITHUB_TOKEN="ghp_xxxxxxxxxxxxxxxxxxxx"
./gh-repo-archive -u myuser

# Archive organization and upload to S3
export GITHUB_TOKEN="ghp_xxxxxxxxxxxxxxxxxxxx"
./gh-repo-archive -o myorg -b s3://my-backup-bucket/github-archives/

# Archive user repositories with custom name and temp directory
export GITHUB_TOKEN="ghp_xxxxxxxxxxxxxxxxxxxx"
./gh-repo-archive -u myuser -n "my-custom-backup.tar.gz" -t "/home/user/temp"

Notes

  • The script automatically checks for required dependencies (jq, curl, git, and the AWS CLI) and exits with helpful error messages if any are missing
  • Uses temporary directories under /tmp/ by default, but this can be customized with the -t option
  • Archives are created in the current working directory by default, but the filename can be customized with the -n option
  • If S3 upload is configured, the script validates that the AWS CLI is available before proceeding
  • The script handles GitHub API pagination automatically to ensure all repositories are captured
  • Progress is displayed with colored output showing start/success/failure status for each repository
  • Provides specific error handling for common git clone issues (one way this can look in shell is sketched after this list):
    • Repository not found (404)
    • Repository already exists locally
    • Authentication failures
    • Other git-related errors with detailed messages
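
As an illustration only (the script's actual messages differ), classifying clone failures in shell usually means capturing git's stderr and matching on it. The helper name and patterns below are hypothetical:

# Hypothetical helper: clone a repo and classify common failure modes
clone_repo() {
  local url=$1 dest=$2 err
  if err=$(git clone "$url" "$dest" 2>&1); then
    echo "cloned $url"
  elif [ -e "$dest" ]; then
    echo "skipped $url: destination already exists" >&2
  elif echo "$err" | grep -qi 'not found'; then
    echo "failed $url: repository not found (404)" >&2
  elif echo "$err" | grep -qi 'authentication\|could not read'; then
    echo "failed $url: authentication failure" >&2
  else
    echo "failed $url: $err" >&2
  fi
}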

Disclaimer

This almost certainly would have been easier to write, read, and understand had it been written in a higher-level language such as Python.

But... why?

While this could have been implemented more simply in a higher-level language, it was an enjoyable exercise in advanced shell scripting techniques and working with APIs using basic command-line tools.

Ok, seriously

I needed a way to backup and archive my personal projects hosted on GitHub. After exploring GitHub's excellent API using basic tools like curl, I decided it would be interesting to build a complete solution using these fundamental command-line utilities.

...

Maybe you and I don't necessarily agree on what the meaning of "enjoyable" is.
