Repo2GPT is a Python application that clones a GitHub repository (or points at an existing local checkout) and produces:
- A repomap describing the directory structure plus key classes/functions per source file.
- A consolidated code bundle that merges the relevant source files into a single prompt-friendly text file.
Repo2GPT extracts richer function and type information for Python, JavaScript/TypeScript, Go, Rust, Ruby, and PHP source files, helping the repo map highlight the most relevant entry points in those ecosystems.
The tool now defaults to a code-centric include list so that dependency locks, build artefacts, and other filler stay out of your prompt window. When you do need to override the defaults, Repo2GPT recognises .gptignore / .gptinclude files as well as inline CLI switches.
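For instance, a repo-root `.gptignore` might look like the following. These are illustrative gitignore-style glob patterns (the format mirrors git2gpt's), not the tool's built-in defaults:

```
# .gptignore – keep generated and vendored files out of the snapshot
dist/
vendor/
*.min.js
*.lock
```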
With the virtual environment activated (optional), install the packages listed in `requirements.txt`:

```bash
pip install -r requirements.txt
```

Repo2GPT includes a pytest suite that covers critical helpers and a full repository processing path. After installing the dependencies, run:

```bash
pytest
```

Use the same command in continuous integration jobs once dependency installation has completed.
With everything set up, you can now use Repo2GPT:
```bash
python main.py <repo-url-or-path> \
  --copy both \
  --extra-include "*.yml" \
  --extra-extensions md
```

Key options:

- `--copy {map|code|both}` will push the generated outputs into the system clipboard, ready for pasting into your AI chat.
- `.gptignore` / `.gptinclude` files (at the repo root or supplied via `--gptignore` / `--gptinclude`) mirror the patterns used by popular alternatives such as git2gpt. Include patterns add to the default code-centric filter: code files remain eligible even when a `.gptinclude` exists, while non-code files require a matching include rule (see the sketch after this list).
- `--extra-ignore`, `--extra-include`, and `--extra-extensions` let you fine-tune experiment-specific filters without editing dotfiles.
- `--max-file-bytes` (default 500 KB) prevents enormous compiled or vendor files from exploding the output; pass `0` to disable.
- `--include {code|all}` toggles between the default code-centric include list and the legacy "include everything" behaviour (helpful for HTML/Markdown-heavy repos). `--include-all` remains as a backwards-compatible alias for `--include all`.
- `--enable-token-counts` prints an estimated token budget for each consolidated chunk; install the optional `tiktoken` package for model-aware counts.
- `--chunk-size` splits the consolidated output into numbered files once a chunk nears the requested token ceiling (set to `0` to disable).
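To make the include/ignore precedence concrete, here is a minimal sketch of the decision rule described above. It illustrates the documented behaviour using standard-library glob matching; it is not Repo2GPT's actual implementation:

```python
from fnmatch import fnmatch

def is_selected(path: str, is_code_file: bool,
                ignore_patterns: list[str], include_patterns: list[str]) -> bool:
    """Apply the documented precedence: ignores always win; code files are
    eligible by default; non-code files need a matching include rule."""
    if any(fnmatch(path, pat) for pat in ignore_patterns):
        return False
    if is_code_file:
        return True
    return any(fnmatch(path, pat) for pat in include_patterns)
```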
Repo2GPT writes `repomap.txt` and `consolidated_code.txt` to your current working directory unless you override the paths. If the targets live inside the repository directory, they are automatically excluded from the generated output.
Install the optional tokenizer dependency to obtain model-compatible counts:
```bash
pip install tiktoken
```

Then invoke Repo2GPT with the token helpers enabled:

```bash
python main.py <repo-url-or-path> \
  --enable-token-counts \
  --chunk-size 3500
```

The CLI now reports token usage per chunk, the total estimated budget, and whether the counts are approximate (when tiktoken is unavailable). When chunking is enabled, the primary consolidated file retains its original name and subsequent chunks are written as `<name>_partXX.ext`. Clipboard copies combine the numbered chunks with lightweight headings so you can paste the full series into your prompt workflow.
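For context, such an estimate can be produced along these lines. This is a minimal sketch, not Repo2GPT's actual implementation: it uses tiktoken's `get_encoding` API when the package is installed and falls back to a coarse characters-per-token heuristic otherwise:

```python
def estimate_tokens(text: str) -> tuple[int, bool]:
    """Return (token_count, is_approximate) for one consolidated chunk."""
    try:
        import tiktoken  # optional dependency for model-aware counts
        encoding = tiktoken.get_encoding("cl100k_base")
        return len(encoding.encode(text)), False
    except ImportError:
        # Fallback heuristic: roughly four characters per token.
        return max(1, len(text) // 4), True
```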
Repo2GPT now ships with a FastAPI-powered service that accepts repository processing jobs and streams progress back to clients. The API layers asynchronous job execution, on-disk persistence, and live Server-Sent Event (SSE) feeds on top of the existing snapshot engine.
Install the dependencies and start the application with Uvicorn:
```bash
pip install -r requirements.txt
uvicorn api.server:app --host 0.0.0.0 --port 8000
```

Set `REPO2GPT_STORAGE_ROOT` if you want processed artifacts and status files to live somewhere other than the default `~/.repo2gpt/jobs` directory. Jobs are persisted on disk so they survive restarts.
Create a job with `POST /jobs`. The payload must include a `source` describing where the repository comes from (`git`, `archive_url`, or `archive_upload`) and optional processing tweaks:
```json
{
  "source": {
    "type": "git",
    "url": "https://github.com/openai/repo2gpt.git",
    "ref": "main"
  },
  "chunk_token_limit": 3500,
  "enable_token_counts": true,
  "options": {
    "ignore_patterns": ["*.ipynb"],
    "allow_non_code": false
  }
}
```

The endpoint responds immediately with a job identifier while processing continues in the background. The work runs in an asynchronous task so request threads remain free.
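A minimal client sketch follows. The endpoints come from this section, but the `id` field on the create response and the `status` values on the job record are assumptions; check the actual response schema:

```python
import time

import requests

BASE = "http://localhost:8000"

# Submit a job using the payload shape shown above.
payload = {"source": {"type": "git", "url": "https://github.com/openai/repo2gpt.git"}}
created = requests.post(f"{BASE}/jobs", json=payload, timeout=30)
created.raise_for_status()
job_id = created.json()["id"]  # assumed field name

# Poll the job record until it leaves its in-progress states.
while True:
    job = requests.get(f"{BASE}/jobs/{job_id}", timeout=30).json()
    if job.get("status") in {"completed", "failed"}:  # assumed status values
        break
    time.sleep(2)

artifacts = requests.get(f"{BASE}/jobs/{job_id}/artifacts", timeout=30).json()
```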
- `GET /jobs/{id}` returns the job metadata, progress log, and any token statistics.
- `GET /jobs/{id}/artifacts` fetches the generated repo map and consolidated chunks once the job has completed.
- `GET /jobs/{id}/events` streams incremental updates as SSE messages. Clients receive progress notifications, chunk statistics, and final status changes in near real time.
- `GET /healthz` exposes a simple readiness probe for load balancers and orchestration systems.
SSE streams emit `status`, `progress`, `chunk`, `repomap`, and `tokens` events, each carrying structured JSON data. The server keeps connections alive with heartbeat comments so browsers do not time out on long-running repositories.
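For illustration, here is a bare-bones SSE consumer using only the requests library (a dedicated SSE client library would also work). It parses just the `event:`/`data:` framing and skips the heartbeat comments described above:

```python
import json

import requests

job_id = "YOUR-JOB-ID"  # identifier returned by POST /jobs
url = f"http://localhost:8000/jobs/{job_id}/events"

with requests.get(url, stream=True, timeout=None) as resp:
    event_type = None
    for line in resp.iter_lines(decode_unicode=True):
        if not line or line.startswith(":"):
            continue  # blank separators and keep-alive heartbeat comments
        field, _, value = line.partition(":")
        if field == "event":
            event_type = value.strip()
        elif field == "data":
            payload = json.loads(value.strip())
            print(event_type, payload)
```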
The web/ directory hosts a Vite + React dashboard that wraps the API with a friendly interface:
- Submit jobs from Git URLs, downloadable archives, or uploaded tar/zip bundles.
- Watch live progress updates rendered from the SSE stream, including per-chunk token counts.
- Preview the generated repo map and chunk contents, copy a shareable job link, or download the entire artifact bundle as a ZIP file.
- Trigger a Gemini File API upload using a stored access token and view success/error feedback inline.
The UI stores the API base URL, API token, and Gemini credentials in `localStorage`. To run it locally:
```bash
cd web
npm install
npm run dev
```

Pass the same API token that you configured on the FastAPI service. For production builds, run `npm run build` and serve the contents of `web/dist/` alongside the API (see docs/deployment.md for hosting suggestions).
Set the `REPO2GPT_API_KEY` environment variable to enforce API-key based authentication. When configured, every request must include an `X-API-Key` header that matches the configured secret. Leave the variable unset for unauthenticated local development.
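For example, with the Python requests library (the header name comes from the section above; the job identifier is a placeholder):

```python
import requests

# The X-API-Key value must match the server's REPO2GPT_API_KEY secret.
headers = {"X-API-Key": "super-secret"}
resp = requests.get("http://localhost:8000/jobs/YOUR-JOB-ID",
                    headers=headers, timeout=30)
resp.raise_for_status()
```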
The repository provides a production-ready Dockerfile. Build and run the container with:
```bash
docker build -t repo2gpt-api .
docker run --rm -p 8000:8000 -e REPO2GPT_API_KEY=super-secret \
  -e REPO2GPT_STORAGE_ROOT=/data/jobs -v $(pwd)/jobs:/data/jobs repo2gpt-api
```

For bare-metal or virtual machine deployments you can rely on Gunicorn's Uvicorn worker class:

```bash
gunicorn api.server:app -k uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000 --workers 2 --timeout 300
```

Make sure the `REPO2GPT_STORAGE_ROOT` directory is writable by the service account and, when authentication is enabled, store the API key securely (for example via environment-injected secrets).
Repo2GPT also exposes a Model Context Protocol server that lets IDE agents such as OpenAI Codex and Anthropic Claude Code request fresh repository snapshots over JSON-RPC.
Set the following environment variables before launching the server:
- `REPO2GPT_GITHUB_PAT` (or `GITHUB_TOKEN` / `GITHUB_PAT`) – optional GitHub personal access token for cloning private repositories.
- `REPO2GPT_GEMINI_API_KEY` (or `GEMINI_API_KEY` / `GOOGLE_API_KEY`) – optional Gemini File API credentials for downstream tooling.
- `REPO2GPT_GEMINI_MODEL` – override the default Gemini model identifier advertised to clients.
Install dependencies with `pip install -r requirements.txt`, then start both services in separate terminals:
```bash
uvicorn api.server:app --host 0.0.0.0 --port 8000               # REST API for background jobs
uvicorn integrations.mcp.server:app --host 0.0.0.0 --port 3030  # MCP endpoint for IDE agents
```

The MCP server listens for JSON-RPC requests on `/` and offers three tools via `listTools`:

- `processRepo` – clone or read a repository and return repo map + chunk artifacts.
- `listRecentJobs` – enumerate the most recent MCP jobs and their artifact descriptors.
- `getArtifact` – fetch the contents of a previously produced artifact.
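As a quick smoke test you can speak JSON-RPC 2.0 to the endpoint directly. This sketch assumes a standard request envelope; the parameter shape passed to `callTool` is an assumption, so consult the `listTools` response for each tool's real input schema:

```python
import requests

MCP_URL = "http://localhost:3030/"

def rpc(method: str, params: dict | None = None, req_id: int = 1) -> dict:
    """POST one JSON-RPC 2.0 request and return the decoded response."""
    body = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params or {}}
    resp = requests.post(MCP_URL, json=body, timeout=300)
    resp.raise_for_status()
    return resp.json()

print(rpc("listTools"))

# Hypothetical argument shape -- verify against the advertised schema.
result = rpc("callTool", {
    "name": "processRepo",
    "arguments": {"url": "https://github.com/openai/repo2gpt.git"},
})
```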
Use the `GET /healthz` endpoint on either service for basic readiness checks.
Once the MCP server is running, register it with your editor or hosted agent:
- OpenAI Codex – open the Codex tool configuration panel, choose Add MCP endpoint, and supply the server URL (`http://localhost:3030/`). Codex will call `initialize`, `listTools`, and then invoke `callTool` as you issue prompts.
- Anthropic Claude Code – from the Claude Code sidebar choose Custom tools → Connect endpoint, enter the same URL, and confirm. Claude Code will immediately list the available tools and let you trigger `processRepo` from chat.
Both clients automatically reuse the job identifiers returned by `processRepo` so you can later call `listRecentJobs` or `getArtifact` during the same session.
- Add AST traversal and mapping similar to ctags.
- Ship a web version or VS Code extension.
- Better language-specific parsers for the repo map summaries.
Repo2GPT is licensed under the terms of the MIT license. See LICENSE for more details.