A two-phase tool for creating comprehensive, public-facing documentation of all repositories within a GitHub organization.
# 1. Set up this indexer tool
./setup.sh
# Edit .env with your credentials
# 2. Collect repository metadata
source venv/bin/activate
python collect_repos.py --summary
# Generates: repositories.json
# 3. Prepare your public-facing repository
cd ../my-org-public-repos # Your target repo
cp ../github-repo-public-indexer/repositories.json .
# 4. Generate README with Cursor/Windsurf
# - Open Cursor/Windsurf in the target repo directory
# - Use PROMPT_TEMPLATE.md with repositories.json
# - Review and commit the generated README.md
Important: The tool fetches metadata via the GitHub API, so there is no need to clone all your org's repositories.
See USAGE.md and WORKFLOW.md for detailed instructions.
Organizations with multiple GitHub repositories often lack centralized, up-to-date documentation showing:
- What repositories exist and their purposes
- Current status (active, archived, experimental)
- Licensing information
- Points of contact and ownership
- Technology stacks and dependencies
A two-phase automated approach:
Phase 1: Data Collection
- Script fetches all repository metadata via GitHub API
- Generates structured JSON file with comprehensive information
- Handles pagination, rate limits, and errors gracefully
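The pagination and rate-limit handling listed for Phase 1 can be sketched without any GitHub dependency. This is a minimal illustration, not the tool's actual implementation: `collect_all`, the `fetch_page` callback, and the `(items, remaining, reset_at)` page shape are hypothetical names chosen for this sketch.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical page shape: (items, requests_remaining, reset_epoch_seconds).
Page = Tuple[List[Dict], int, float]


def collect_all(
    fetch_page: Callable[[int], Page],
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], float] = time.time,
) -> List[Dict]:
    """Walk pages starting at 1 until an empty page is returned."""
    repos: List[Dict] = []
    page = 1
    while True:
        items, remaining, reset_at = fetch_page(page)
        if not items:
            break  # An empty page marks the end of the listing.
        repos.extend(items)
        if remaining == 0:
            # Quota exhausted: wait for the reset time before the next request.
            sleep(max(0.0, reset_at - now()))
        page += 1
    return repos
```

Passing `sleep` and `now` as parameters keeps the loop testable with a fake clock; a real client would supply a `fetch_page` that wraps the GitHub API.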
Phase 2: Documentation Generation
- JSON file is provided to Cursor/Windsurf AI assistant
- AI analyzes data and generates human-readable README
- Organized, categorized, and easy-to-navigate documentation
- Separation of concerns: Data collection vs. presentation
- Reusability: JSON can be used for multiple outputs
- Flexibility: Different documentation styles from same data
- Context management: Avoid AI context limits
- Caching: Avoid re-fetching data during iteration
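As an illustration of the reusability point, the sketch below derives two different outputs from one cached JSON document. The field names (`repositories`, `name`, `description`, `license`, `archived`) and both helper functions are assumptions made for this example; the real structure is defined in JSON_SCHEMA.md.

```python
import json
from collections import Counter
from typing import Dict, List

# Hypothetical sample data; the real schema is defined in JSON_SCHEMA.md.
SAMPLE_JSON = """
{"repositories": [
  {"name": "web-app", "description": "Customer portal", "license": "MIT", "archived": false},
  {"name": "legacy-api", "description": null, "license": null, "archived": true},
  {"name": "cli-tools", "description": "Internal helpers", "license": "MIT", "archived": false}
]}
"""


def active_repo_list(repos: List[Dict]) -> str:
    """Output 1: a bullet list of non-archived repositories, sorted by name."""
    active = sorted((r for r in repos if not r["archived"]), key=lambda r: r["name"])
    return "\n".join(
        f"- {r['name']}: {r['description'] or 'No description'}" for r in active
    )


def license_counts(repos: List[Dict]) -> Counter:
    """Output 2: how many repositories use each license."""
    return Counter(r["license"] or "unknown" for r in repos)


repos = json.loads(SAMPLE_JSON)["repositories"]
print(active_repo_list(repos))
print(dict(license_counts(repos)))
```

Both functions read the same parsed data, so neither requires another round of API calls.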
Current Version: 0.1.0 (Pre-release)
- Requirements gathering
- Specification documentation
- JSON schema design
- Workflow documentation
- Prompt template creation
- Phase 1 script implementation (Python)
- Testing infrastructure (pytest, CI/CD)
- Achieve 95% test coverage
- Testing with sample organization
- Phase 2 validation with Cursor/Windsurf
- Refinement and optimization
- Production deployment (v1.0.0)
- USAGE.md - Installation and usage guide
- QUICK_REFERENCE.md - One-page summary
- TESTING.md - Testing guide and best practices
- CLAUDE.md - Guide for Claude Code development
- SPECIFICATION.md - Full project scope and requirements
- JSON_SCHEMA.md - Output data structure
- WORKFLOW.md - End-to-end process guide
- PROMPT_TEMPLATE.md - Template for Cursor/Windsurf
- CONTRIBUTING_TEMPLATE.md - Template CONTRIBUTING.md for index repositories
Python-based data collection tool with:
- GitHub API Client (src/github_client.py) - Authentication and rate limiting
- Metadata Collector (src/metadata_collector.py) - Extracts comprehensive repo data
- JSON Generator (src/json_generator.py) - Creates validated output
- CLI Interface (collect_repos.py) - User-friendly command-line tool with progress bars
Key features:
- Automatic rate limit handling
- Graceful error recovery
- Rich console output with progress tracking
- Configurable via CLI options or environment variables
- JSON validation and backup
- GitHub Enterprise Server support - Works with custom GitHub instances
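One way the "JSON validation and backup" feature could work is sketched below. This is an assumption about the approach, not the behavior of src/json_generator.py: the function name `write_with_backup` and the `.bak` and `.tmp` suffixes are hypothetical.

```python
import json
import shutil
from pathlib import Path
from typing import Optional


def write_with_backup(path: Path, data: dict) -> Optional[Path]:
    """Write data as JSON to path, keeping any previous file as <name>.bak.

    Returns the backup path, or None if there was no previous file.
    """
    # Serialize first so invalid data raises before any file is touched.
    payload = json.dumps(data, indent=2)

    backup = None
    if path.exists():
        backup = path.with_name(path.name + ".bak")
        shutil.copy2(path, backup)

    # Write to a temporary file, then swap it into place so readers
    # never observe a half-written output file.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(payload, encoding="utf-8")
    tmp.replace(path)
    return backup
```

Serializing before writing means a failed run leaves the previous repositories.json intact.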
- Test with Your Organization
  python collect_repos.py --org your-org-name --summary
- Generate Documentation
  - Use PROMPT_TEMPLATE.md with Cursor/Windsurf
  - Attach the generated repositories.json
  - Review and refine the output
- Automate Updates
  - Schedule regular runs (weekly/monthly)
  - Track changes over time
  - Keep documentation current
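Tracking changes between scheduled runs can be as simple as comparing two saved snapshots. The sketch below assumes each snapshot is a list of repository records keyed by a `name` field; that field and the `diff_runs` helper are hypothetical and not part of the tool.

```python
from typing import Dict, List


def diff_runs(old: List[Dict], new: List[Dict]) -> Dict[str, List[str]]:
    """Report repository names that were added, removed, or changed."""
    old_by_name = {r["name"]: r for r in old}
    new_by_name = {r["name"]: r for r in new}

    added = sorted(new_by_name.keys() - old_by_name.keys())
    removed = sorted(old_by_name.keys() - new_by_name.keys())
    # A repository counts as changed if any field differs between runs.
    changed = sorted(
        name
        for name in old_by_name.keys() & new_by_name.keys()
        if old_by_name[name] != new_by_name[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


last_week = [{"name": "web-app", "archived": False}, {"name": "old-lib", "archived": False}]
this_week = [{"name": "web-app", "archived": True}, {"name": "new-svc", "archived": False}]
print(diff_runs(last_week, this_week))
```

Keeping dated copies of repositories.json from each scheduled run would provide the inputs for a comparison like this.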
- Open Source Projects: Share your organization's repositories with the community
- Internal Teams: Document internal tools and libraries
- Compliance: Track licensing across all repositories
- Onboarding: Help new team members discover existing projects
- Portfolio: Showcase your organization's work
- Archival: Document repository history and evolution
- Python 3.8+ (tested on 3.8-3.12)
- PyGithub - GitHub API client
- Click - CLI framework
- Rich - Beautiful terminal output
- python-dotenv - Environment management
- pytest - Testing framework with 95% coverage target
Contributions welcome! Please:
- Follow the architecture in SPECIFICATION.md
- Ensure JSON output matches JSON_SCHEMA.md
- Add tests for new features
- Update documentation
- Parallel processing for large organizations
- API response caching
- Diff tool to compare runs
- GitHub Actions automation
- Web dashboard interface
- Multi-organization support
Apache License 2.0 - See LICENSE file for details.
For questions or suggestions, please open an issue.