A professional AI development environment with Docker containers for comparing AI agent frameworks through standardized multi-agent workflows. This system enables fair "apples-to-apples" comparison of CrewAI, LangChain, LangGraph, and LlamaIndex using identical agents, prompts, and workflows.
- Docker and Docker Compose installed
- NVIDIA GPU with NVIDIA Docker support (optional, for GPU acceleration)
- Please see the included NVIDIA_CUDA_Guide.md file for basic CUDA installation instructions
# Clone the repository
git clone https://github.com/yourusername/multi-agent-ai-research-system.git
cd multi-agent-ai-research-system
# Copy environment configuration
cp .env.example .env
# Start all services
docker-compose up

- Streamlit Web App: http://localhost:8501
- Jupyter Lab: http://localhost:8888 (token: your-secure-token-here)
- Ollama API: http://localhost:11434
- ChromaDB: http://localhost:8000
# Download the AI model (in a separate terminal)
docker-compose exec ollama ollama pull llama3.1:8b-instruct-q4_K_M
# Verify all services are running
docker-compose ps

- Ollama: Local LLM server with GPU support for privacy and performance
- ChromaDB: Vector database for embeddings and semantic search
- Jupyter Lab: Interactive development environment for AI experimentation
- Streamlit: Web interface for multi-agent analysis and framework comparison
multi-agent-ai-research-system/
├── docker-compose.yml               # Service orchestration
├── Dockerfile.jupyter               # Jupyter Lab container
├── Dockerfile.streamlit             # Streamlit app container
├── requirements.txt                 # Python dependencies
├── .env.example                     # Environment configuration template
├── notebooks/                       # AI framework implementations
│   ├── crewai_functions.py          # CrewAI implementation (baseline)
│   ├── langchain_functions.py       # LangChain standardized implementation
│   ├── langgraph_functions.py       # LangGraph standardized implementation
│   ├── llamaindex_functions.py      # LlamaIndex standardized implementation
│   └── streamlit_multi_agent_app.py # Main web interface
├── projects/                        # Analysis results and exports
├── data/                            # Data files and documents
└── models/                          # Local AI model storage
- Multi-Agent Analysis: Deploy teams of 5 specialized AI agents to analyze complex topics from multiple expert perspectives
- Framework Comparison: Run identical workflows across 4 different AI frameworks to compare performance and capabilities
- Standardized Evaluation: Ensure fair comparison by using identical agent roles, prompts, and workflows across all frameworks
- Containerized Environment: Complete development setup with one command - no dependency conflicts or setup issues
- Healthcare Domain Expert - Medical and clinical perspective (400-500 words)
- AI Technical Analyst - Engineering and implementation analysis (400-500 words)
- Healthcare Regulatory Specialist - Legal compliance and approval pathways (400-500 words)
- Healthcare Economics Analyst - Financial impact and business case (400-500 words)
- Strategic Content Synthesizer - Integrated executive report (1500-2000 words)
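A minimal sketch of how these five roles and their word-count targets might be captured in a single shared specification that every framework consumes (the variable and field names below are illustrative assumptions, not the project's actual code):

```python
# Hypothetical shared agent specification; names and fields are illustrative,
# not taken from the project's source files.
AGENT_SPECS = [
    {"role": "Healthcare Domain Expert",
     "focus": "Medical and clinical perspective", "word_target": (400, 500)},
    {"role": "AI Technical Analyst",
     "focus": "Engineering and implementation analysis", "word_target": (400, 500)},
    {"role": "Healthcare Regulatory Specialist",
     "focus": "Legal compliance and approval pathways", "word_target": (400, 500)},
    {"role": "Healthcare Economics Analyst",
     "focus": "Financial impact and business case", "word_target": (400, 500)},
    {"role": "Strategic Content Synthesizer",
     "focus": "Integrated executive report", "word_target": (1500, 2000)},
]
```

Defining the roles once and importing them into every framework module is one way to guarantee the agent definitions stay word-for-word identical.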
- Role: Reference implementation that others are standardized to match
- Strengths: Purpose-built for agent collaboration, intuitive crew management
- Use Case: Natural choice for multi-agent workflows
- Role: General-purpose LLM framework adapted for multi-agent analysis
- Strengths: Mature ecosystem, extensive integrations, production-ready
- Implementation: Custom agent class with manual coordination
- Role: Stateful graph-based workflow engine
- Strengths: Persistent state, visual workflows, sophisticated routing
- Implementation: Graph nodes with shared state management
- Role: RAG-focused framework adapted for agent workflows
- Strengths: Document processing excellence, query optimization
- Implementation: Tool-based approach with comprehensive diagnostics
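To keep the four implementations interchangeable, each framework module can expose the same entry point. The sketch below is an assumed shape for that interface, not the project's actual API:

```python
from typing import Protocol


class FrameworkAdapter(Protocol):
    """Hypothetical common interface each framework module would satisfy."""

    def run_analysis(self, topic: str) -> dict:
        """Run the five-agent workflow on `topic`; return outputs and metrics."""
        ...


def compare(adapters: dict[str, FrameworkAdapter], topic: str) -> dict[str, dict]:
    # Run the identical workflow through every framework and collect the outputs.
    return {name: adapter.run_analysis(topic) for name, adapter in adapters.items()}
```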
- One-Command Setup: Complete AI development environment in minutes
- GPU Acceleration: NVIDIA GPU support for faster AI model inference
- Isolated Dependencies: No conflicts with existing Python installations
- Persistent Storage: Data and models preserved between container restarts
- Service Orchestration: All components work together seamlessly
- Identical Agent Definitions: Same roles, goals, and backgrounds across all frameworks
- Identical Prompts: Word-for-word matching instructions for fair comparison
- Identical Workflows: Sequential execution with consistent context passing
- Identical Metrics: Standardized performance measurement and reporting
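As an illustration of the prompt standardization, a single template can be defined once and formatted identically by every adapter; the wording below is a placeholder, not the project's actual prompt:

```python
# Illustrative shared prompt; the project's real wording lives in the framework
# function files and is reused word-for-word across all four frameworks.
ANALYST_PROMPT = (
    "You are the {role}. Analyze the topic '{topic}' from your expert "
    "perspective in {min_words}-{max_words} words. Prior context:\n{context}"
)


def build_prompt(role: str, topic: str, min_words: int, max_words: int, context: str) -> str:
    # Every framework formats the same template, so prompt wording cannot
    # account for performance differences between frameworks.
    return ANALYST_PROMPT.format(role=role, topic=topic, min_words=min_words,
                                 max_words=max_words, context=context)
```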
- Real-time Metrics: Duration, word count, generation speed tracking
- Historical Trending: Performance comparison across multiple runs
- Framework Rankings: Automated performance leaderboards
- Visual Analytics: Interactive charts and performance visualization
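A minimal sketch of how duration, word count, and generation speed could be measured per run (the helper name and return fields are assumptions for illustration):

```python
import time


def timed_run(run_fn, topic: str) -> dict:
    """Wrap a framework's analysis call and record basic performance metrics."""
    start = time.perf_counter()
    output = run_fn(topic)                      # framework-specific analysis call
    duration = time.perf_counter() - start
    word_count = len(str(output).split())
    return {
        "duration_s": round(duration, 2),
        "word_count": word_count,
        "words_per_second": round(word_count / duration, 2) if duration else 0.0,
    }
```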
- Jupyter Integration: Interactive notebooks for experimentation and development
- Hot Reload: Code changes reflected immediately without container restart
- Comprehensive Logging: Detailed logs for debugging and monitoring
- Volume Mounting: Direct file editing on host machine
- Healthcare AI Assessment: Evaluate new medical AI technologies
- Technology Adoption Planning: Multi-perspective analysis of emerging tools
- Regulatory Strategy: Navigate complex approval processes
- Investment Analysis: Comprehensive due diligence for healthcare tech
- Performance Benchmarking: Compare AI framework efficiency and capabilities
- Architecture Analysis: Understand trade-offs between different approaches
- Development Planning: Choose optimal framework for specific use cases
- Academic Research: Study multi-agent system implementations
- Create a new implementation file in notebooks/
- Follow standardization guidelines from existing implementations
- Update Streamlit interface to include new framework
- Test with Docker environment
- Edit agent definitions in framework files
- Maintain consistency across all framework implementations
- Update documentation for any changes
- Verify standardization remains intact
# Add new services to docker-compose.yml
# Example: Adding Redis for caching
redis:
  image: redis:alpine
  ports:
    - "6379:6379"
  networks:
    - ai-network

# Start all services
docker-compose up
# Start in background
docker-compose up -d
# Stop all services
docker-compose down
# Rebuild containers
docker-compose up --build
# View logs
docker-compose logs
# View logs for specific service
docker-compose logs streamlit

# Enter Jupyter container
docker-compose exec jupyter bash
# Enter Streamlit container
docker-compose exec streamlit bash
# Restart specific service
docker-compose restart ollama
# Update AI model
docker-compose exec ollama ollama pull llama3.1:8b-instruct-q4_K_M
# Pull the embedding model, which is likely needed by the Jupyter Lab environment
docker-compose exec ollama ollama pull nomic-embed-text

# Check service status
docker-compose ps
# View resource usage
docker stats
# Clean up
docker-compose down -v # Removes volumes
docker system prune # Clean unused containers/images

- Concept Explanations: Clear explanations of agents, frameworks, and workflows
- Practical Examples: Real-world applications of multi-agent systems
- Comparative Learning: Understand different approaches to the same problem
- Hands-on Experience: Interactive exploration of AI capabilities in Jupyter
- Framework Comparison: Direct performance and capability assessment
- Implementation Examples: Complete, working code for each framework
- Best Practices: Standardization techniques for fair evaluation
- Docker Proficiency: Learn containerization for AI applications
- Standardized Benchmarking: Fair comparison methodology
- Performance Analytics: Detailed metrics and trending analysis
- Reproducible Results: Consistent evaluation across frameworks
- Extensible Platform: Foundation for additional research
Traditional AI framework comparisons often use different prompts, agents, or workflows, making it impossible to determine if performance differences are due to framework architecture or implementation differences.
By standardizing every aspect of the multi-agent workflow while preserving each framework's unique architecture, this system enables true "apples-to-apples" comparison of framework capabilities and performance.
The Docker-based setup eliminates the "it works on my machine" problem and provides a consistent, reproducible environment for AI development and research.
The system is optimized for NVIDIA GPUs but works on CPU-only systems:
# In docker-compose.yml - GPU enabled (default)
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]

# For CPU-only systems, comment out the deploy section

- Memory: Ollama requires 4-8GB RAM depending on model size
- Storage: Models require 4-7GB disk space
- CPU: Multi-threading support for parallel analysis
- Network: Internal Docker network for optimal service communication
We welcome contributions! See CONTRIBUTING.md for detailed guidelines on:
# Fork and clone the repository
git clone https://github.com/yourusername/multi-agent-ai-research-system.git
cd multi-agent-ai-research-system
# Start development environment
cp .env.example .env
docker-compose up
# Make changes and test
docker-compose down && docker-compose up --build

- Adding new AI frameworks
- Improving Docker configuration
- Enhancing performance analytics
- Improving documentation
- UI/UX improvements
- Adding test coverage
This project is licensed under the MIT License.
- CrewAI: For the foundational multi-agent framework
- LangChain: For the comprehensive LLM ecosystem
- LangGraph: For advanced stateful workflows
- LlamaIndex: For document processing capabilities
- Ollama: For local AI model deployment
- Streamlit: For the intuitive web interface framework
- Docker: For containerization and development environment consistency
Built with: Python, Docker, Streamlit, Ollama, CrewAI, LangChain, LangGraph, LlamaIndex
Hardware: Optimized for local GPU acceleration (supports NVIDIA GPUs)
Status: Active Development
Environment: Professional Docker-based development setup

# Multi-Agent AI Research System
A comprehensive web application for comparing AI agent frameworks through standardized multi-agent workflows. This system enables fair "apples-to-apples" comparison of CrewAI, LangChain, LangGraph, and LlamaIndex by using identical agents, prompts, and workflows across all frameworks.
- Multi-Agent Analysis: Deploy teams of 5 specialized AI agents to analyze complex topics from multiple expert perspectives
- Framework Comparison: Run identical workflows across 4 different AI frameworks to compare performance and capabilities
- Standardized Evaluation: Ensure fair comparison by using identical agent roles, prompts, and workflows across all frameworks
- Real-time Monitoring: Track analysis progress, performance metrics, and generate comprehensive reports
- Healthcare Domain Expert - Medical and clinical perspective (400-500 words)
- AI Technical Analyst - Engineering and implementation analysis (400-500 words)
- Healthcare Regulatory Specialist - Legal compliance and approval pathways (400-500 words)
- Healthcare Economics Analyst - Financial impact and business case (400-500 words)
- Strategic Content Synthesizer - Integrated executive report (1500-2000 words)
- Role: Reference implementation that others are standardized to match
- Strengths: Purpose-built for agent collaboration, intuitive crew management
- Use Case: Natural choice for multi-agent workflows
- Role: General-purpose LLM framework adapted for multi-agent analysis
- Strengths: Mature ecosystem, extensive integrations, production-ready
- Implementation: Custom agent class with manual coordination
- Role: Stateful graph-based workflow engine
- Strengths: Persistent state, visual workflows, sophisticated routing
- Implementation: Graph nodes with shared state management
- Role: RAG-focused framework adapted for agent workflows
- Strengths: Document processing excellence, query optimization
- Implementation: Tool-based approach with comprehensive diagnostics
- Identical Agent Definitions: Same roles, goals, and backgrounds across all frameworks
- Identical Prompts: Word-for-word matching instructions for fair comparison
- Identical Workflows: Sequential execution with consistent context passing
- Identical Metrics: Standardized performance measurement and reporting
- Real-time Metrics: Duration, word count, generation speed tracking
- Historical Trending: Performance comparison across multiple runs
- Framework Rankings: Automated performance leaderboards
- Visual Analytics: Interactive charts and performance visualization
- Web Interface: Clean, responsive Streamlit-based UI
- Predefined Topics: Curated healthcare AI research topics
- Custom Analysis: User-defined research topics and parameters
- Export Options: JSON data export and formatted text reports
- Progress Tracking: Real-time agent status and workflow progress
- Error Handling: Graceful failure recovery with detailed diagnostics
- Debug Mode: Comprehensive troubleshooting for framework issues
- Session Management: Persistent results and analysis history
- Mobile Responsive: Works across desktop, tablet, and mobile devices
- Healthcare AI Assessment: Evaluate new medical AI technologies
- Technology Adoption Planning: Multi-perspective analysis of emerging tools
- Regulatory Strategy: Navigate complex approval processes
- Investment Analysis: Comprehensive due diligence for healthcare tech
- Performance Benchmarking: Compare AI framework efficiency and capabilities
- Architecture Analysis: Understand trade-offs between different approaches
- Development Planning: Choose optimal framework for specific use cases
- Academic Research: Study multi-agent system implementations
- Local AI Models: Ollama integration for privacy and control
- Model: Llama 3.1 8B (quantized for efficiency)
- Hardware: Optimized for RTX 4070 GPU acceleration
- Privacy: No external API calls, complete data privacy
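Because all inference goes through the local Ollama server, a quick framework-independent sanity check is a direct call to Ollama's generate endpoint. The sketch below uses the `requests` library against Ollama's documented `/api/generate` route; treat it as an illustrative check rather than part of the application code:

```python
import requests

# Minimal sanity check against the local Ollama server (no external API calls).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:8b-instruct-q4_K_M",
        "prompt": "In one sentence, what is a multi-agent system?",
        "stream": False,   # return a single JSON response instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```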
- Modular Design: Each framework in separate, well-documented modules
- Standardized Interface: Consistent API across all implementations
- Error Isolation: Framework failures don't affect others
- Extensible: Easy to add new frameworks or modify existing ones
- Session Persistence: Results maintained across user interactions
- Performance Tracking: Comprehensive metrics storage and analysis
- Export Capabilities: Multiple output formats for different use cases
- History Management: Complete analysis audit trail
- Concept Explanations: Clear explanations of agents, frameworks, and workflows
- Practical Examples: Real-world applications of multi-agent systems
- Comparative Learning: Understand different approaches to the same problem
- Hands-on Experience: Interactive exploration of AI capabilities
- Framework Comparison: Direct performance and capability assessment
- Implementation Examples: Complete, working code for each framework
- Best Practices: Standardization techniques for fair evaluation
- Architecture Patterns: Different approaches to multi-agent coordination
- Standardized Benchmarking: Fair comparison methodology
- Performance Analytics: Detailed metrics and trending analysis
- Reproducible Results: Consistent evaluation across frameworks
- Extensible Platform: Foundation for additional research
Traditional AI framework comparisons often use different prompts, agents, or workflows, making it impossible to determine if performance differences are due to framework architecture or implementation differences.
By standardizing every aspect of the multi-agent workflow while preserving each framework's unique architecture, this system enables true "apples-to-apples" comparison of framework capabilities and performance.
- Developers: Make informed framework selection decisions
- Researchers: Conduct rigorous comparative studies
- Organizations: Evaluate AI technologies with confidence
- Community: Advance understanding of multi-agent system architectures
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull the required model
ollama pull llama3.1:8b-instruct-q4_K_M

# Clone the repository
git clone [your-repo-url]
cd multi-agent-ai-research-system
# Install dependencies
pip install -r requirements.txt

# Start the Streamlit app
streamlit run streamlit_multi_agent_app.py
# Open your browser to http://localhost:8501

multi-agent-ai-research-system/
├── crewai_functions.py           # CrewAI implementation (baseline)
├── langchain_functions.py        # LangChain standardized implementation
├── langgraph_functions.py        # LangGraph standardized implementation
├── llamaindex_functions.py       # LlamaIndex standardized implementation
├── streamlit_multi_agent_app.py  # Main web interface
├── requirements.txt              # Python dependencies
├── README.md                     # This file
├── docs/                         # Additional documentation
│   ├── framework_comparison.md   # Detailed framework analysis
│   └── standardization_guide.md  # Standardization methodology
└── examples/                     # Example analyses and outputs
The system tracks and compares:
- Execution Speed: Words generated per second
- Response Quality: Standardized word count targets
- Reliability: Success rates and error handling
- Resource Usage: Memory and computation efficiency
- Scalability: Performance with complex topics
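For example, once each framework's word count and duration are recorded, a simple leaderboard can be derived. The helper below is an illustrative sketch, and the sample inputs are dummy values rather than measured results:

```python
def leaderboard(metrics: dict[str, dict]) -> list[tuple[str, float]]:
    """Rank frameworks by generation speed (words per second), fastest first."""
    return sorted(
        ((name, m["word_count"] / m["duration_s"]) for name, m in metrics.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Dummy inputs for illustration only; real values come from actual analysis runs.
example = {
    "CrewAI": {"word_count": 3000, "duration_s": 180.0},
    "LangChain": {"word_count": 3000, "duration_s": 180.0},
}
print(leaderboard(example))
```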
How long these models take to complete an analysis depends on how many CUDA and Tensor cores your graphics card has and on the amount of available VRAM.
If you have less than 12 GB of VRAM, you will likely want to run a smaller Llama model.
We welcome contributions! Please see our Contributing Guidelines for details on:
- Adding new AI frameworks
- Improving standardization methodology
- Enhancing the user interface
- Adding new analysis domains beyond healthcare
- This repository welcomes contributions via pull requests. Direct pushes to the main branch are not permitted. Please fork the repository and submit your changes via pull request for review.
This project is licensed under the MIT License.
- CrewAI: For the foundational multi-agent framework
- LangChain: For the comprehensive LLM ecosystem
- LangGraph: For advanced stateful workflows
- LlamaIndex: For document processing capabilities
- Ollama: For local AI model deployment
- Streamlit: For the intuitive web interface framework
Built with: Python, Streamlit, Ollama, CrewAI, LangChain, LangGraph, LlamaIndex
Hardware: Optimized for local GPU acceleration (RTX 4070)
Status: Active Development
Contributions: Welcome