A professional AI development environment with Docker containers for comparing AI agent frameworks through standardized multi-agent workflows. This system enables fair "apples-to-apples" comparison of CrewAI, LangChain, LangGraph, and LlamaIndex using identical agents, prompts, and workflows.
- Docker and Docker Compose installed
- NVIDIA GPU with NVIDIA Docker support (optional, for GPU acceleration)
- Please see the included NVIDIA_CUDA_Guide.md file for basic CUDA installation instructions
# Clone the repository
git clone https://github.com/yourusername/multi-agent-ai-research-system.git
cd multi-agent-ai-research-system
# Copy environment configuration
cp .env.example .env
# Start all services
docker-compose up

- Streamlit Web App: http://localhost:8501
- Jupyter Lab: http://localhost:8888 (token: your-secure-token-here)
- Ollama API: http://localhost:11434
- ChromaDB: http://localhost:8000
# Download the AI model (in a separate terminal)
docker-compose exec ollama ollama pull llama3.1:8b-instruct-q4_K_M
# Verify all services are running
docker-compose ps

- Ollama: Local LLM server with GPU support for privacy and performance
- ChromaDB: Vector database for embeddings and semantic search
- Jupyter Lab: Interactive development environment for AI experimentation
- Streamlit: Web interface for multi-agent analysis and framework comparison
multi-agent-ai-research-system/
├── docker-compose.yml               # Service orchestration
├── Dockerfile.jupyter               # Jupyter Lab container
├── Dockerfile.streamlit             # Streamlit app container
├── requirements.txt                 # Python dependencies
├── .env.example                     # Environment configuration template
├── notebooks/                       # AI framework implementations
│   ├── crewai_functions.py          # CrewAI implementation (baseline)
│   ├── langchain_functions.py       # LangChain standardized implementation
│   ├── langgraph_functions.py       # LangGraph standardized implementation
│   ├── llamaindex_functions.py      # LlamaIndex standardized implementation
│   └── streamlit_multi_agent_app.py # Main web interface
├── projects/                        # Analysis results and exports
├── data/                            # Data files and documents
└── models/                          # Local AI model storage
- Multi-Agent Analysis: Deploy teams of 5 specialized AI agents to analyze complex topics from multiple expert perspectives
- Framework Comparison: Run identical workflows across 4 different AI frameworks to compare performance and capabilities
- Standardized Evaluation: Ensure fair comparison by using identical agent roles, prompts, and workflows across all frameworks
- Containerized Environment: Complete development setup with one command - no dependency conflicts or setup issues
- Healthcare Domain Expert - Medical and clinical perspective (400-500 words)
- AI Technical Analyst - Engineering and implementation analysis (400-500 words)
- Healthcare Regulatory Specialist - Legal compliance and approval pathways (400-500 words)
- Healthcare Economics Analyst - Financial impact and business case (400-500 words)
- Strategic Content Synthesizer - Integrated executive report (1500-2000 words)
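A minimal sketch of how these five roles and their word-count targets might be captured in a single shared specification that every framework consumes (the variable and field names below are illustrative assumptions, not the project's actual code):

```python
# Hypothetical shared agent specification; names and fields are illustrative,
# not taken from the project's source files.
AGENT_SPECS = [
    {"role": "Healthcare Domain Expert",
     "focus": "Medical and clinical perspective", "word_target": (400, 500)},
    {"role": "AI Technical Analyst",
     "focus": "Engineering and implementation analysis", "word_target": (400, 500)},
    {"role": "Healthcare Regulatory Specialist",
     "focus": "Legal compliance and approval pathways", "word_target": (400, 500)},
    {"role": "Healthcare Economics Analyst",
     "focus": "Financial impact and business case", "word_target": (400, 500)},
    {"role": "Strategic Content Synthesizer",
     "focus": "Integrated executive report", "word_target": (1500, 2000)},
]
```

Defining the roles once and importing them into every framework module is one way to guarantee the agent definitions stay word-for-word identical.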
- Role: Reference implementation that others are standardized to match
- Strengths: Purpose-built for agent collaboration, intuitive crew management
- Use Case: Natural choice for multi-agent workflows
- Role: General-purpose LLM framework adapted for multi-agent analysis
- Strengths: Mature ecosystem, extensive integrations, production-ready
- Implementation: Custom agent class with manual coordination
- Role: Stateful graph-based workflow engine
- Strengths: Persistent state, visual workflows, sophisticated routing
- Implementation: Graph nodes with shared state management
- Role: RAG-focused framework adapted for agent workflows
- Strengths: Document processing excellence, query optimization
- Implementation: Tool-based approach with comprehensive diagnostics
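To keep the four implementations interchangeable, each framework module can expose the same entry point. The sketch below is an assumed shape for that interface, not the project's actual API:

```python
from typing import Protocol


class FrameworkAdapter(Protocol):
    """Hypothetical common interface each framework module would satisfy."""

    def run_analysis(self, topic: str) -> dict:
        """Run the five-agent workflow on `topic`; return outputs and metrics."""
        ...


def compare(adapters: dict[str, FrameworkAdapter], topic: str) -> dict[str, dict]:
    # Run the identical workflow through every framework and collect the outputs.
    return {name: adapter.run_analysis(topic) for name, adapter in adapters.items()}
```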
- One-Command Setup: Complete AI development environment in minutes
- GPU Acceleration: NVIDIA GPU support for faster AI model inference
- Isolated Dependencies: No conflicts with existing Python installations
- Persistent Storage: Data and models preserved between container restarts
- Service Orchestration: All components work together seamlessly
- Identical Agent Definitions: Same roles, goals, and backgrounds across all frameworks
- Identical Prompts: Word-for-word matching instructions for fair comparison
- Identical Workflows: Sequential execution with consistent context passing
- Identical Metrics: Standardized performance measurement and reporting
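As an illustration of the prompt standardization, a single template can be defined once and formatted identically by every adapter; the wording below is a placeholder, not the project's actual prompt:

```python
# Illustrative shared prompt; the project's real wording lives in the framework
# function files and is reused word-for-word across all four frameworks.
ANALYST_PROMPT = (
    "You are the {role}. Analyze the topic '{topic}' from your expert "
    "perspective in {min_words}-{max_words} words. Prior context:\n{context}"
)


def build_prompt(role: str, topic: str, min_words: int, max_words: int, context: str) -> str:
    # Every framework formats the same template, so prompt wording cannot
    # account for performance differences between frameworks.
    return ANALYST_PROMPT.format(role=role, topic=topic, min_words=min_words,
                                 max_words=max_words, context=context)
```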
- Real-time Metrics: Duration, word count, generation speed tracking
- Historical Trending: Performance comparison across multiple runs
- Framework Rankings: Automated performance leaderboards
- Visual Analytics: Interactive charts and performance visualization
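A minimal sketch of how duration, word count, and generation speed could be measured per run (the helper name and return fields are assumptions for illustration):

```python
import time


def timed_run(run_fn, topic: str) -> dict:
    """Wrap a framework's analysis call and record basic performance metrics."""
    start = time.perf_counter()
    output = run_fn(topic)                      # framework-specific analysis call
    duration = time.perf_counter() - start
    word_count = len(str(output).split())
    return {
        "duration_s": round(duration, 2),
        "word_count": word_count,
        "words_per_second": round(word_count / duration, 2) if duration else 0.0,
    }
```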
- Jupyter Integration: Interactive notebooks for experimentation and development
- Hot Reload: Code changes reflected immediately without container restart
- Comprehensive Logging: Detailed logs for debugging and monitoring
- Volume Mounting: Direct file editing on host machine
- Healthcare AI Assessment: Evaluate new medical AI technologies
- Technology Adoption Planning: Multi-perspective analysis of emerging tools
- Regulatory Strategy: Navigate complex approval processes
- Investment Analysis: Comprehensive due diligence for healthcare tech
- Performance Benchmarking: Compare AI framework efficiency and capabilities
- Architecture Analysis: Understand trade-offs between different approaches
- Development Planning: Choose optimal framework for specific use cases
- Academic Research: Study multi-agent system implementations
- Create a new implementation file in notebooks/
- Follow standardization guidelines from existing implementations
- Update Streamlit interface to include new framework
- Test with Docker environment
- Edit agent definitions in framework files
- Maintain consistency across all framework implementations
- Update documentation for any changes
- Verify standardization remains intact
# Add new services to docker-compose.yml
# Example: Adding Redis for caching
redis:
  image: redis:alpine
  ports:
    - "6379:6379"
  networks:
    - ai-network

# Start all services
docker-compose up
# Start in background
docker-compose up -d
# Stop all services
docker-compose down
# Rebuild containers
docker-compose up --build
# View logs
docker-compose logs
# View logs for specific service
docker-compose logs streamlit

# Enter Jupyter container
docker-compose exec jupyter bash
# Enter Streamlit container
docker-compose exec streamlit bash
# Restart specific service
docker-compose restart ollama
# Update AI model
docker-compose exec ollama ollama pull llama3.1:8b-instruct-q4_K_M
# Pull the embedding model, which is likely needed by the Jupyter Lab environment
docker-compose exec ollama ollama pull nomic-embed-text

# Check service status
docker-compose ps
# View resource usage
docker stats
# Clean up
docker-compose down -v # Removes volumes
docker system prune # Clean unused containers/images

- Concept Explanations: Clear explanations of agents, frameworks, and workflows
- Practical Examples: Real-world applications of multi-agent systems
- Comparative Learning: Understand different approaches to the same problem
- Hands-on Experience: Interactive exploration of AI capabilities in Jupyter
- Framework Comparison: Direct performance and capability assessment
- Implementation Examples: Complete, working code for each framework
- Best Practices: Standardization techniques for fair evaluation
- Docker Proficiency: Learn containerization for AI applications
- Standardized Benchmarking: Fair comparison methodology
- Performance Analytics: Detailed metrics and trending analysis
- Reproducible Results: Consistent evaluation across frameworks
- Extensible Platform: Foundation for additional research
Traditional AI framework comparisons often use different prompts, agents, or workflows, making it impossible to determine if performance differences are due to framework architecture or implementation differences.
By standardizing every aspect of the multi-agent workflow while preserving each framework's unique architecture, this system enables true "apples-to-apples" comparison of framework capabilities and performance.
The Docker-based setup eliminates the "it works on my machine" problem and provides a consistent, reproducible environment for AI development and research.
The system is optimized for NVIDIA GPUs but works on CPU-only systems:
# In docker-compose.yml - GPU enabled (default)
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]

# For CPU-only systems, comment out the deploy section

- Memory: Ollama requires 4-8GB RAM depending on model size
- Storage: Models require 4-7GB disk space
- CPU: Multi-threading support for parallel analysis
- Network: Internal Docker network for optimal service communication
We welcome contributions! See CONTRIBUTING.md for detailed guidelines on:
# Fork and clone the repository
git clone https://github.com/yourusername/multi-agent-ai-research-system.git
cd multi-agent-ai-research-system
# Start development environment
cp .env.example .env
docker-compose up
# Make changes and test
docker-compose down && docker-compose up --build

- Adding new AI frameworks
- Improving Docker configuration
- Enhancing performance analytics
- Improving documentation
- UI/UX improvements
- Adding test coverage
This project is licensed under the MIT License.
- CrewAI: For the foundational multi-agent framework
- LangChain: For the comprehensive LLM ecosystem
- LangGraph: For advanced stateful workflows
- LlamaIndex: For document processing capabilities
- Ollama: For local AI model deployment
- Streamlit: For the intuitive web interface framework
- Docker: For containerization and development environment consistency
Built with: Python, Docker, Streamlit, Ollama, CrewAI, LangChain, LangGraph, LlamaIndex
Hardware: Optimized for local GPU acceleration (supports NVIDIA GPUs)
Status: Active Development
Environment: Professional Docker-based development setup

# Multi-Agent AI Research System
A comprehensive web application for comparing AI agent frameworks through standardized multi-agent workflows. This system enables fair "apples-to-apples" comparison of CrewAI, LangChain, LangGraph, and LlamaIndex by using identical agents, prompts, and workflows across all frameworks.
- Multi-Agent Analysis: Deploy teams of 5 specialized AI agents to analyze complex topics from multiple expert perspectives
- Framework Comparison: Run identical workflows across 4 different AI frameworks to compare performance and capabilities
- Standardized Evaluation: Ensure fair comparison by using identical agent roles, prompts, and workflows across all frameworks
- Real-time Monitoring: Track analysis progress, performance metrics, and generate comprehensive reports
- Healthcare Domain Expert - Medical and clinical perspective (400-500 words)
- AI Technical Analyst - Engineering and implementation analysis (400-500 words)
- Healthcare Regulatory Specialist - Legal compliance and approval pathways (400-500 words)
- Healthcare Economics Analyst - Financial impact and business case (400-500 words)
- Strategic Content Synthesizer - Integrated executive report (1500-2000 words)
- Role: Reference implementation that others are standardized to match
- Strengths: Purpose-built for agent collaboration, intuitive crew management
- Use Case: Natural choice for multi-agent workflows
- Role: General-purpose LLM framework adapted for multi-agent analysis
- Strengths: Mature ecosystem, extensive integrations, production-ready
- Implementation: Custom agent class with manual coordination
- Role: Stateful graph-based workflow engine
- Strengths: Persistent state, visual workflows, sophisticated routing
- Implementation: Graph nodes with shared state management
- Role: RAG-focused framework adapted for agent workflows
- Strengths: Document processing excellence, query optimization
- Implementation: Tool-based approach with comprehensive diagnostics
- Identical Agent Definitions: Same roles, goals, and backgrounds across all frameworks
- Identical Prompts: Word-for-word matching instructions for fair comparison
- Identical Workflows: Sequential execution with consistent context passing
- Identical Metrics: Standardized performance measurement and reporting
- Real-time Metrics: Duration, word count, generation speed tracking
- Historical Trending: Performance comparison across multiple runs
- Framework Rankings: Automated performance leaderboards
- Visual Analytics: Interactive charts and performance visualization
- Web Interface: Clean, responsive Streamlit-based UI
- Predefined Topics: Curated healthcare AI research topics
- Custom Analysis: User-defined research topics and parameters
- Export Options: JSON data export and formatted text reports
- Progress Tracking: Real-time agent status and workflow progress
- Error Handling: Graceful failure recovery with detailed diagnostics
- Debug Mode: Comprehensive troubleshooting for framework issues
- Session Management: Persistent results and analysis history
- Mobile Responsive: Works across desktop, tablet, and mobile devices
- Healthcare AI Assessment: Evaluate new medical AI technologies
- Technology Adoption Planning: Multi-perspective analysis of emerging tools
- Regulatory Strategy: Navigate complex approval processes
- Investment Analysis: Comprehensive due diligence for healthcare tech
- Performance Benchmarking: Compare AI framework efficiency and capabilities
- Architecture Analysis: Understand trade-offs between different approaches
- Development Planning: Choose optimal framework for specific use cases
- Academic Research: Study multi-agent system implementations
- Local AI Models: Ollama integration for privacy and control
- Model: Llama 3.1 8B (quantized for efficiency)
- Hardware: Optimized for RTX 4070 GPU acceleration
- Privacy: No external API calls, complete data privacy
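Because all inference goes through the local Ollama server, a quick framework-independent sanity check is a direct call to Ollama's generate endpoint. The sketch below uses the `requests` library against Ollama's documented `/api/generate` route; treat it as an illustrative check rather than part of the application code:

```python
import requests

# Minimal sanity check against the local Ollama server (no external API calls).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:8b-instruct-q4_K_M",
        "prompt": "In one sentence, what is a multi-agent system?",
        "stream": False,   # return a single JSON response instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```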
- Modular Design: Each framework in separate, well-documented modules
- Standardized Interface: Consistent API across all implementations
- Error Isolation: Framework failures don't affect others
- Extensible: Easy to add new frameworks or modify existing ones
- Session Persistence: Results maintained across user interactions
- Performance Tracking: Comprehensive metrics storage and analysis
- Export Capabilities: Multiple output formats for different use cases
- History Management: Complete analysis audit trail
- Concept Explanations: Clear explanations of agents, frameworks, and workflows
- Practical Examples: Real-world applications of multi-agent systems
- Comparative Learning: Understand different approaches to the same problem
- Hands-on Experience: Interactive exploration of AI capabilities
- Framework Comparison: Direct performance and capability assessment
- Implementation Examples: Complete, working code for each framework
- Best Practices: Standardization techniques for fair evaluation
- Architecture Patterns: Different approaches to multi-agent coordination
- Standardized Benchmarking: Fair comparison methodology
- Performance Analytics: Detailed metrics and trending analysis
- Reproducible Results: Consistent evaluation across frameworks
- Extensible Platform: Foundation for additional research
Traditional AI framework comparisons often use different prompts, agents, or workflows, making it impossible to determine if performance differences are due to framework architecture or implementation differences.
By standardizing every aspect of the multi-agent workflow while preserving each framework's unique architecture, this system enables true "apples-to-apples" comparison of framework capabilities and performance.
- Developers: Make informed framework selection decisions
- Researchers: Conduct rigorous comparative studies
- Organizations: Evaluate AI technologies with confidence
- Community: Advance understanding of multi-agent system architectures
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull the required model
ollama pull llama3.1:8b-instruct-q4_K_M

# Clone the repository
git clone [your-repo-url]
cd multi-agent-ai-research-system
# Install dependencies
pip install -r requirements.txt

# Start the Streamlit app
streamlit run streamlit_multi_agent_app.py
# Open your browser to http://localhost:8501

multi-agent-ai-research-system/
├── crewai_functions.py           # CrewAI implementation (baseline)
├── langchain_functions.py        # LangChain standardized implementation
├── langgraph_functions.py        # LangGraph standardized implementation
├── llamaindex_functions.py       # LlamaIndex standardized implementation
├── streamlit_multi_agent_app.py  # Main web interface
├── requirements.txt              # Python dependencies
├── README.md                     # This file
├── docs/                         # Additional documentation
│   ├── framework_comparison.md   # Detailed framework analysis
│   └── standardization_guide.md  # Standardization methodology
└── examples/                     # Example analyses and outputs
The system tracks and compares:
- Execution Speed: Words generated per second
- Response Quality: Standardized word count targets
- Reliability: Success rates and error handling
- Resource Usage: Memory and computation efficiency
- Scalability: Performance with complex topics
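For example, once each framework's word count and duration are recorded, a simple leaderboard can be derived. The helper below is an illustrative sketch, and the sample inputs are dummy values rather than measured results:

```python
def leaderboard(metrics: dict[str, dict]) -> list[tuple[str, float]]:
    """Rank frameworks by generation speed (words per second), fastest first."""
    return sorted(
        ((name, m["word_count"] / m["duration_s"]) for name, m in metrics.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Dummy inputs for illustration only; real values come from actual analysis runs.
example = {
    "CrewAI": {"word_count": 3000, "duration_s": 180.0},
    "LangChain": {"word_count": 3000, "duration_s": 180.0},
}
print(leaderboard(example))
```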
How long these models take to complete an analysis depends on how many CUDA and Tensor cores your graphics card has and on the amount of available VRAM.
If you have less than 12 GB of VRAM, you will likely want to run a smaller Llama model.
We welcome contributions! Please see our Contributing Guidelines for details on:
- Adding new AI frameworks
- Improving standardization methodology
- Enhancing the user interface
- Adding new analysis domains beyond healthcare
- This repository welcomes contributions via pull requests. Direct pushes to the main branch are not permitted. Please fork the repository and submit your changes via pull request for review.
This project is licensed under the MIT License.
- CrewAI: For the foundational multi-agent framework
- LangChain: For the comprehensive LLM ecosystem
- LangGraph: For advanced stateful workflows
- LlamaIndex: For document processing capabilities
- Ollama: For local AI model deployment
- Streamlit: For the intuitive web interface framework
Built with: Python, Streamlit, Ollama, CrewAI, LangChain, LangGraph, LlamaIndex
Hardware: Optimized for local GPU acceleration (RTX 4070)
Status: Active Development
Contributions: Welcome