Skip to content

griswold-tech/multi-agent-ai-research-system

Repository files navigation

πŸ€– Multi-Agent AI Research System

A professional AI development environment with Docker containers for comparing AI agent frameworks through standardized multi-agent workflows. This system enables fair "apples-to-apples" comparison of CrewAI, LangChain, LangGraph, and LlamaIndex using identical agents, prompts, and workflows.

πŸš€ Quick Start with Docker

Prerequisites

One-Command Setup

# Clone the repository
git clone https://github.com/yourusername/multi-agent-ai-research-system.git
cd multi-agent-ai-research-system

# Copy environment configuration
cp .env.example .env

# Start all services
docker-compose up

Access Your Services

First-Time Setup

# Download the AI model (in a separate terminal)
docker-compose exec ollama ollama pull llama3.1:8b-instruct-q4_K_M

# Verify all services are running
docker-compose ps

πŸ—οΈ Complete Development Environment

🐳 Docker Services

  • πŸ¦™ Ollama: Local LLM server with GPU support for privacy and performance
  • πŸ—ƒοΈ ChromaDB: Vector database for embeddings and semantic search
  • πŸ“Š Jupyter Lab: Interactive development environment for AI experimentation
  • 🌐 Streamlit: Web interface for multi-agent analysis and framework comparison

πŸ“ Project Structure

multi-agent-ai-research-system/
β”œβ”€β”€ docker-compose.yml           # Service orchestration
β”œβ”€β”€ Dockerfile.jupyter           # Jupyter Lab container
β”œβ”€β”€ Dockerfile.streamlit         # Streamlit app container
β”œβ”€β”€ requirements.txt             # Python dependencies
β”œβ”€β”€ .env.example                 # Environment configuration template
β”œβ”€β”€ notebooks/                   # AI framework implementations
β”‚   β”œβ”€β”€ crewai_functions.py      # CrewAI implementation (baseline)
β”‚   β”œβ”€β”€ langchain_functions.py   # LangChain standardized implementation
β”‚   β”œβ”€β”€ langgraph_functions.py   # LangGraph standardized implementation
β”‚   β”œβ”€β”€ llamaindex_functions.py  # LlamaIndex standardized implementation
β”‚   └── streamlit_multi_agent_app.py # Main web interface
β”œβ”€β”€ projects/                    # Analysis results and exports
β”œβ”€β”€ data/                        # Data files and documents
└── models/                      # Local AI model storage

🎯 What This System Does

Core Functionality

  • Multi-Agent Analysis: Deploy teams of 5 specialized AI agents to analyze complex topics from multiple expert perspectives
  • Framework Comparison: Run identical workflows across 4 different AI frameworks to compare performance and capabilities
  • Standardized Evaluation: Ensure fair comparison by using identical agent roles, prompts, and workflows across all frameworks
  • Containerized Environment: Complete development setup with one command - no dependency conflicts or setup issues

The 5-Agent Analysis Team

  1. Healthcare Domain Expert - Medical and clinical perspective (400-500 words)
  2. AI Technical Analyst - Engineering and implementation analysis (400-500 words)
  3. Healthcare Regulatory Specialist - Legal compliance and approval pathways (400-500 words)
  4. Healthcare Economics Analyst - Financial impact and business case (400-500 words)
  5. Strategic Content Synthesizer - Integrated executive report (1500-2000 words)

πŸ”§ Supported AI Frameworks

CrewAI (Baseline)

  • Role: Reference implementation that others are standardized to match
  • Strengths: Purpose-built for agent collaboration, intuitive crew management
  • Use Case: Natural choice for multi-agent workflows

LangChain (Standardized)

  • Role: General-purpose LLM framework adapted for multi-agent analysis
  • Strengths: Mature ecosystem, extensive integrations, production-ready
  • Implementation: Custom agent class with manual coordination

LangGraph (Standardized)

  • Role: Stateful graph-based workflow engine
  • Strengths: Persistent state, visual workflows, sophisticated routing
  • Implementation: Graph nodes with shared state management

LlamaIndex (Standardized)

  • Role: RAG-focused framework adapted for agent workflows
  • Strengths: Document processing excellence, query optimization
  • Implementation: Tool-based approach with comprehensive diagnostics

πŸš€ Key Features

🐳 Professional Docker Environment

  • One-Command Setup: Complete AI development environment in minutes
  • GPU Acceleration: NVIDIA GPU support for faster AI model inference
  • Isolated Dependencies: No conflicts with existing Python installations
  • Persistent Storage: Data and models preserved between container restarts
  • Service Orchestration: All components work together seamlessly

πŸ”„ Standardization Engine

  • Identical Agent Definitions: Same roles, goals, and backgrounds across all frameworks
  • Identical Prompts: Word-for-word matching instructions for fair comparison
  • Identical Workflows: Sequential execution with consistent context passing
  • Identical Metrics: Standardized performance measurement and reporting

πŸ“Š Performance Analytics

  • Real-time Metrics: Duration, word count, generation speed tracking
  • Historical Trending: Performance comparison across multiple runs
  • Framework Rankings: Automated performance leaderboards
  • Visual Analytics: Interactive charts and performance visualization

πŸ’» Development Experience

  • Jupyter Integration: Interactive notebooks for experimentation and development
  • Hot Reload: Code changes reflected immediately without container restart
  • Comprehensive Logging: Detailed logs for debugging and monitoring
  • Volume Mounting: Direct file editing on host machine

πŸ“Š Example Use Cases

Research Applications

  • Healthcare AI Assessment: Evaluate new medical AI technologies
  • Technology Adoption Planning: Multi-perspective analysis of emerging tools
  • Regulatory Strategy: Navigate complex approval processes
  • Investment Analysis: Comprehensive due diligence for healthcare tech

Framework Evaluation

  • Performance Benchmarking: Compare AI framework efficiency and capabilities
  • Architecture Analysis: Understand trade-offs between different approaches
  • Development Planning: Choose optimal framework for specific use cases
  • Academic Research: Study multi-agent system implementations

πŸ› οΈ Development and Customization

Adding New Frameworks

  1. Create new implementation file in notebooks/
  2. Follow standardization guidelines from existing implementations
  3. Update Streamlit interface to include new framework
  4. Test with Docker environment

Modifying Agent Behavior

  1. Edit agent definitions in framework files
  2. Maintain consistency across all framework implementations
  3. Update documentation for any changes
  4. Verify standardization remains intact

Extending the Environment

# Add new services to docker-compose.yml
# Example: Adding Redis for caching
redis:
  image: redis:alpine
  ports:
    - "6379:6379"
  networks:
    - ai-network

🐳 Docker Commands Reference

Basic Operations

# Start all services
docker-compose up

# Start in background
docker-compose up -d

# Stop all services
docker-compose down

# Rebuild containers
docker-compose up --build

# View logs
docker-compose logs

# View logs for specific service
docker-compose logs streamlit

Development Commands

# Enter Jupyter container
docker-compose exec jupyter bash

# Enter Streamlit container
docker-compose exec streamlit bash

# Restart specific service
docker-compose restart ollama

# Update AI model
docker-compose exec ollama ollama pull llama3.1:8b-instruct-q4_K_M
# This model will likely be needed to support your Jupyter-Lab environment
docker-compose exec ollama ollama pull nomic-embed-text

Troubleshooting

# Check service status
docker-compose ps

# View resource usage
docker stats

# Clean up
docker-compose down -v  # Removes volumes
docker system prune     # Clean unused containers/images

πŸŽ“ Educational Value

For AI Beginners

  • Concept Explanations: Clear explanations of agents, frameworks, and workflows
  • Practical Examples: Real-world applications of multi-agent systems
  • Comparative Learning: Understand different approaches to the same problem
  • Hands-on Experience: Interactive exploration of AI capabilities in Jupyter

For Developers

  • Framework Comparison: Direct performance and capability assessment
  • Implementation Examples: Complete, working code for each framework
  • Best Practices: Standardization techniques for fair evaluation
  • Docker Proficiency: Learn containerization for AI applications

For Researchers

  • Standardized Benchmarking: Fair comparison methodology
  • Performance Analytics: Detailed metrics and trending analysis
  • Reproducible Results: Consistent evaluation across frameworks
  • Extensible Platform: Foundation for additional research

πŸ† Why This Matters

Fair Comparison Problem

Traditional AI framework comparisons often use different prompts, agents, or workflows, making it impossible to determine if performance differences are due to framework architecture or implementation differences.

Our Solution

By standardizing every aspect of the multi-agent workflow while preserving each framework's unique architecture, this system enables true "apples-to-apples" comparison of framework capabilities and performance.

Professional Development Environment

The Docker-based setup eliminates the "it works on my machine" problem and provides a consistent, reproducible environment for AI development and research.

⚑ Performance Optimization

GPU Configuration

The system is optimized for NVIDIA GPUs but works on CPU-only systems:

# In docker-compose.yml - GPU enabled (default)
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]

# For CPU-only systems, comment out the deploy section

Resource Management

  • Memory: Ollama requires 4-8GB RAM depending on model size
  • Storage: Models require 4-7GB disk space
  • CPU: Multi-threading support for parallel analysis
  • Network: Internal Docker network for optimal service communication

🀝 Contributing

We welcome contributions! See CONTRIBUTING.md for detailed guidelines on:

Development Setup

# Fork and clone the repository
git clone https://github.com/yourusername/multi-agent-ai-research-system.git
cd multi-agent-ai-research-system

# Start development environment
cp .env.example .env
docker-compose up

# Make changes and test
docker-compose down && docker-compose up --build

Contribution Areas

  • πŸ”§ Adding new AI frameworks
  • 🐳 Improving Docker configuration
  • πŸ“Š Enhancing performance analytics
  • πŸ“ Improving documentation
  • 🎨 UI/UX improvements
  • πŸ§ͺ Adding test coverage

πŸ“„ License

This project is licensed under the MIT License.

πŸ™ Acknowledgments

  • CrewAI: For the foundational multi-agent framework
  • LangChain: For the comprehensive LLM ecosystem
  • LangGraph: For advanced stateful workflows
  • LlamaIndex: For document processing capabilities
  • Ollama: For local AI model deployment
  • Streamlit: For the intuitive web interface framework
  • Docker: For containerization and development environment consistency

Built with: Python, Docker, Streamlit, Ollama, CrewAI, LangChain, LangGraph, LlamaIndex
Hardware: Optimized for local GPU acceleration (supports NVIDIA GPUs)
Status: Active Development
Environment: Professional Docker-based development setup# πŸ€– Multi-Agent AI Research System

A comprehensive web application for comparing AI agent frameworks through standardized multi-agent workflows. This system enables fair "apples-to-apples" comparison of CrewAI, LangChain, LangGraph, and LlamaIndex by using identical agents, prompts, and workflows across all frameworks.

🎯 What This App Does

Core Functionality

  • Multi-Agent Analysis: Deploy teams of 5 specialized AI agents to analyze complex topics from multiple expert perspectives
  • Framework Comparison: Run identical workflows across 4 different AI frameworks to compare performance and capabilities
  • Standardized Evaluation: Ensure fair comparison by using identical agent roles, prompts, and workflows across all frameworks
  • Real-time Monitoring: Track analysis progress, performance metrics, and generate comprehensive reports

The 5-Agent Analysis Team

  1. Healthcare Domain Expert - Medical and clinical perspective (400-500 words)
  2. AI Technical Analyst - Engineering and implementation analysis (400-500 words)
  3. Healthcare Regulatory Specialist - Legal compliance and approval pathways (400-500 words)
  4. Healthcare Economics Analyst - Financial impact and business case (400-500 words)
  5. Strategic Content Synthesizer - Integrated executive report (1500-2000 words)

πŸ”§ Supported AI Frameworks

CrewAI (Baseline)

  • Role: Reference implementation that others are standardized to match
  • Strengths: Purpose-built for agent collaboration, intuitive crew management
  • Use Case: Natural choice for multi-agent workflows

LangChain (Standardized)

  • Role: General-purpose LLM framework adapted for multi-agent analysis
  • Strengths: Mature ecosystem, extensive integrations, production-ready
  • Implementation: Custom agent class with manual coordination

LangGraph (Standardized)

  • Role: Stateful graph-based workflow engine
  • Strengths: Persistent state, visual workflows, sophisticated routing
  • Implementation: Graph nodes with shared state management

LlamaIndex (Standardized)

  • Role: RAG-focused framework adapted for agent workflows
  • Strengths: Document processing excellence, query optimization
  • Implementation: Tool-based approach with comprehensive diagnostics

πŸš€ Key Features

Standardization Engine

  • Identical Agent Definitions: Same roles, goals, and backgrounds across all frameworks
  • Identical Prompts: Word-for-word matching instructions for fair comparison
  • Identical Workflows: Sequential execution with consistent context passing
  • Identical Metrics: Standardized performance measurement and reporting

Performance Analytics

  • Real-time Metrics: Duration, word count, generation speed tracking
  • Historical Trending: Performance comparison across multiple runs
  • Framework Rankings: Automated performance leaderboards
  • Visual Analytics: Interactive charts and performance visualization

User Experience

  • Web Interface: Clean, responsive Streamlit-based UI
  • Predefined Topics: Curated healthcare AI research topics
  • Custom Analysis: User-defined research topics and parameters
  • Export Options: JSON data export and formatted text reports
  • Progress Tracking: Real-time agent status and workflow progress

Advanced Capabilities

  • Error Handling: Graceful failure recovery with detailed diagnostics
  • Debug Mode: Comprehensive troubleshooting for framework issues
  • Session Management: Persistent results and analysis history
  • Mobile Responsive: Works across desktop, tablet, and mobile devices

πŸ“Š Example Use Cases

Research Applications

  • Healthcare AI Assessment: Evaluate new medical AI technologies
  • Technology Adoption Planning: Multi-perspective analysis of emerging tools
  • Regulatory Strategy: Navigate complex approval processes
  • Investment Analysis: Comprehensive due diligence for healthcare tech

Framework Evaluation

  • Performance Benchmarking: Compare AI framework efficiency and capabilities
  • Architecture Analysis: Understand trade-offs between different approaches
  • Development Planning: Choose optimal framework for specific use cases
  • Academic Research: Study multi-agent system implementations

πŸ—οΈ Technical Architecture

Backend Infrastructure

  • Local AI Models: Ollama integration for privacy and control
  • Model: Llama 3.1 8B (quantized for efficiency)
  • Hardware: Optimized for RTX 4070 GPU acceleration
  • Privacy: No external API calls, complete data privacy

Framework Integration

  • Modular Design: Each framework in separate, well-documented modules
  • Standardized Interface: Consistent API across all implementations
  • Error Isolation: Framework failures don't affect others
  • Extensible: Easy to add new frameworks or modify existing ones

Data Management

  • Session Persistence: Results maintained across user interactions
  • Performance Tracking: Comprehensive metrics storage and analysis
  • Export Capabilities: Multiple output formats for different use cases
  • History Management: Complete analysis audit trail

πŸŽ“ Educational Value

For AI Beginners

  • Concept Explanations: Clear explanations of agents, frameworks, and workflows
  • Practical Examples: Real-world applications of multi-agent systems
  • Comparative Learning: Understand different approaches to the same problem
  • Hands-on Experience: Interactive exploration of AI capabilities

For Developers

  • Framework Comparison: Direct performance and capability assessment
  • Implementation Examples: Complete, working code for each framework
  • Best Practices: Standardization techniques for fair evaluation
  • Architecture Patterns: Different approaches to multi-agent coordination

For Researchers

  • Standardized Benchmarking: Fair comparison methodology
  • Performance Analytics: Detailed metrics and trending analysis
  • Reproducible Results: Consistent evaluation across frameworks
  • Extensible Platform: Foundation for additional research

πŸ† Why This Matters

Fair Comparison Problem

Traditional AI framework comparisons often use different prompts, agents, or workflows, making it impossible to determine if performance differences are due to framework architecture or implementation differences.

Our Solution

By standardizing every aspect of the multi-agent workflow while preserving each framework's unique architecture, this system enables true "apples-to-apples" comparison of framework capabilities and performance.

Impact

  • Developers: Make informed framework selection decisions
  • Researchers: Conduct rigorous comparative studies
  • Organizations: Evaluate AI technologies with confidence
  • Community: Advance understanding of multi-agent system architectures

πŸš€ Quick Start

Prerequisites

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull the required model
ollama pull llama3.1:8b-instruct-q4_K_M

Installation

# Clone the repository
git clone [your-repo-url]
cd multi-agent-ai-research-system

# Install dependencies
pip install -r requirements.txt

Running the Application

# Start the Streamlit app
streamlit run streamlit_multi_agent_app.py

# Open your browser to http://localhost:8501

πŸ“ Project Structure

multi-agent-ai-research-system/
β”œβ”€β”€ crewai_functions.py          # CrewAI implementation (baseline)
β”œβ”€β”€ langchain_functions.py       # LangChain standardized implementation
β”œβ”€β”€ langgraph_functions.py       # LangGraph standardized implementation
β”œβ”€β”€ llamaindex_functions.py      # LlamaIndex standardized implementation
β”œβ”€β”€ streamlit_multi_agent_app.py # Main web interface
β”œβ”€β”€ requirements.txt             # Python dependencies
β”œβ”€β”€ README.md                    # This file
└── docs/                        # Additional documentation
    β”œβ”€β”€ framework_comparison.md  # Detailed framework analysis
    β”œβ”€β”€ standardization_guide.md # Standardization methodology
    └── examples/                # Example analyses and outputs

πŸ“Š Performance Metrics

The system tracks and compares:

  • Execution Speed: Words generated per second
  • Response Quality: Standardized word count targets
  • Reliability: Success rates and error handling
  • Resource Usage: Memory and computation efficiency
  • Scalability: Performance with complex topics

Please Note

Depending on how many CUDA and Tensors cores your graphics card has, along with the amount of available VRAM will determine how long these models take to complete.
If you have less than 12GB VRAM, then you will like want to consider running a small llama model.

🀝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details on:

  • Adding new AI frameworks
  • Improving standardization methodology
  • Enhancing the user interface
  • Adding new analysis domains beyond healthcare
  • This repository welcomes contributions via pull requests. Direct pushes to the main branch are not permitted. Please fork the repository and submit your changes via pull request for review.

πŸ“„ License

This project is licensed under the MIT License.

πŸ™ Acknowledgments

  • CrewAI: For the foundational multi-agent framework
  • LangChain: For the comprehensive LLM ecosystem
  • LangGraph: For advanced stateful workflows
  • LlamaIndex: For document processing capabilities
  • Ollama: For local AI model deployment
  • Streamlit: For the intuitive web interface framework

Built with: Python, Streamlit, Ollama, CrewAI, LangChain, LangGraph, LlamaIndex
Hardware: Optimized for local GPU acceleration (RTX 4070)
Status: Active Development
Contributions: Welcome

Releases

No releases published

Packages

 
 
 

Contributors

Languages