Production-ready extraction pipeline using LangExtract, LangGraph, and MinIO for structured German business data extraction from Impressum/About pages.
- **LangExtract Integration** - Google's structured extraction library with German business prompts
- **LangGraph Workflow** - State-based orchestration
- **MinIO Storage** - S3-compatible object storage
- **Parallel Processing** - 5 workers via a thread pool
- **Retry Logic** - Exponential backoff (3 attempts)
- **Rate Limiting** - API quota protection (20 req/min)
- **Statistics** - Real-time metrics & JSON reports
- **Docker Ready** - Full containerization
- **Model Flexibility** - Gemini API or Ollama (local)
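The retry behavior listed above (3 attempts with exponential backoff) can be sketched in plain Python. This is an illustrative sketch only; the function names and delay values are not taken from the repo's actual implementation:

```python
import time

def with_retry(fn, attempts=3, base_delay=1.0):
    """Call fn(), retrying on failure with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Demo: a flaky call that succeeds on the third attempt
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient API error")
    return "ok"

print(with_retry(flaky, base_delay=0.01))  # prints: ok
```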
```bash
# Clone repository
git clone https://github.com/MrBozkay/langraph_extract_agent.git
cd langraph_extract_agent

# Configure environment
cp .env.example .env
# Edit .env with your GOOGLE_API_KEY and MinIO credentials

# Run with Docker
docker-compose up --build
```

```bash
# Setup virtual environment
./setup_venv.sh
source venv/bin/activate

# Configure
cp .env.example .env
nano .env  # Add your credentials

# Run production batch
python src/agents/run_batch_production.py
```

| Agent | Use Case | Speed | Features |
|---|---|---|---|
| Production Batch | Production deployment | ⚡⚡⚡ | Parallel, retry, rate limiting, object limiting |
| LangGraph | State tracking | ⚡ | Workflow visualization, object limiting |
| Simple Batch | Testing/Debug | ⚡ | Easy to understand |
**Recommendation:** Use `run_batch_production.py` for production.
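The 20 req/min quota protection could be enforced with a sliding-window limiter like the sketch below. The class and method names are hypothetical, not taken from the repo:

```python
import time
from collections import deque

class RateLimiter:
    """Block until a request slot is free within the sliding window."""
    def __init__(self, max_requests=20, window_seconds=60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.timestamps = deque()

    def acquire(self):
        now = time.monotonic()
        # Drop timestamps that have aged out of the window
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_requests:
            # Sleep until the oldest request leaves the window, then retry
            time.sleep(self.window - (now - self.timestamps[0]))
            return self.acquire()
        self.timestamps.append(time.monotonic())

limiter = RateLimiter(max_requests=3, window_seconds=0.2)  # tiny window for demo
start = time.monotonic()
for _ in range(4):  # the 4th call must wait for the window to slide
    limiter.acquire()
elapsed = time.monotonic() - start
print(f"4 acquires took {elapsed:.2f}s")
```

With `max_requests=20, window_seconds=60.0` this matches the 20 req/min setting.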
New Features:
- `list_objects()` with `limit` parameter (default: 50)
- `recursive=False` for non-recursive listing (default: `True`)
- Command-line parameters for `test_minio.py`
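The `limit` behavior can be sketched by wrapping a listing generator with `itertools.islice`, so the listing stops early instead of being exhausted. This only mirrors the idea; the real agent lists objects through the MinIO SDK, and the generator below is a stand-in:

```python
from itertools import islice

def fake_list_objects(prefix=""):
    """Stand-in for a MinIO listing generator (hypothetical data)."""
    for i in range(1000):
        yield f"{prefix}page_{i:04d}.html"

def list_objects(prefix="", limit=50):
    """Return at most `limit` object names without exhausting the listing."""
    return list(islice(fake_list_objects(prefix), limit))

names = list_objects(prefix="impressum/", limit=5)
print(names)  # only 5 names, even though 1000 objects exist
```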
```json
{
  "owner_name": "Hans Müller",
  "position": "Geschäftsführer",
  "company_name": "Mustermann GmbH",
  "email": "h.mueller@mustermann.de",
  "phone": "+49 123 456789",
  "website": "www.mustermann.de",
  "sector": "Consulting"
}
```

- Quick Start Guide - 5-minute setup
- Agent Selection Guide - Choose the right agent
- Production Deployment - Production best practices
- Contributing - How to contribute
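A record like the sample JSON above can be parsed and checked for required fields before storage. The dataclass below is a hypothetical sketch of such a check, not the repo's actual schema:

```python
import json
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class BusinessRecord:
    owner_name: str
    company_name: str
    position: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None
    website: Optional[str] = None
    sector: Optional[str] = None

def parse_record(raw: str) -> BusinessRecord:
    data = json.loads(raw)
    known = {f.name for f in fields(BusinessRecord)}
    # Keep only known keys so unexpected LLM output does not break construction;
    # missing required fields raise TypeError here.
    return BusinessRecord(**{k: v for k, v in data.items() if k in known})

record = parse_record(
    '{"owner_name": "Hans Müller", "company_name": "Mustermann GmbH", "sector": "Consulting"}'
)
print(record.company_name)  # prints: Mustermann GmbH
```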
```env
# MinIO (Remote or Local)
MINIO_ENDPOINT=your-minio-server:9000
MINIO_ACCESS_KEY=your-access-key
MINIO_SECRET_KEY=your-secret-key
MINIO_BUCKET_NAME=your-bucket

# LLM Model
GOOGLE_API_KEY=your-gemini-api-key
LANGEXTRACT_MODEL=gemini-2.0-flash-exp

# Production Settings
EXTRACTION_MAX_WORKERS=5
EXTRACTION_RETRY_COUNT=3
RATE_LIMIT_REQUESTS_PER_MINUTE=20
```

```bash
# Build image
docker build -t langraph-extract-agent .

# Run with docker-compose
docker-compose up -d

# View logs
docker-compose logs -f extraction-app

# Stop
docker-compose down
```

```bash
# Test MinIO connection (with parameters)
python test_minio.py --help                         # Show help
python test_minio.py                                # Default: 5 files, non-recursive
python test_minio.py --recursive --limit 20         # Recursive with 20 files
python test_minio.py --prefix "folder/" --limit 10  # Custom prefix

# Test extraction
python test_extraction.py

# Test production features
python test_production_features.py
```

- Processing Speed: ~2-3 files/second (5 workers)
- Success Rate: >95% (with retry logic)
- Extraction Time: ~2-5 seconds per file
- Memory Usage: Optimized with object limiting
- Network Efficiency: Non-recursive listing by default
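The 5-worker parallel stage behind these numbers can be sketched with `concurrent.futures`. The file names and the `extract_one` function are placeholders for the per-file LangExtract call:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def extract_one(name):
    """Placeholder for the per-file extraction call."""
    if name.endswith("broken.html"):
        raise ValueError("unparseable page")
    return {"file": name, "company_name": "Example GmbH"}

files = [f"page_{i}.html" for i in range(9)] + ["broken.html"]
results, failures = [], []

with ThreadPoolExecutor(max_workers=5) as pool:
    futures = {pool.submit(extract_one, f): f for f in files}
    for fut in as_completed(futures):
        try:
            results.append(fut.result())
        except Exception:
            failures.append(futures[fut])

success_rate = len(results) / len(files)
print(f"success rate: {success_rate:.0%}")  # prints: success rate: 90%
```

Collecting failures separately is what makes a retry pass (and the >95% success rate) possible without re-running files that already succeeded.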
Switch to local inference:
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama pull gpt-oss:20b
```

```env
# Update .env
LANGEXTRACT_MODEL=ollama/gpt-oss:20b
```

Contributions are welcome! See CONTRIBUTING.md for guidelines.
MIT License - see LICENSE file for details.
- LangExtract - Google's extraction library
- LangGraph - Workflow orchestration
- MinIO - S3-compatible object storage
Built with ❤️ for German business data extraction