A production-ready MCP (Model Context Protocol) server that enables comprehensive research and analysis capabilities for Claude and other MCP-compatible AI assistants. This server integrates web and academic search functionality with an optional web interface for interactive research and AI-powered report generation.
# 1. Clone and navigate to the project
git clone https://github.com/DeepKariaX/Analysis-Alpaca-Researcher.git
cd Analysis-Alpaca-Researcher
# 2. Install dependencies (virtual environment recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -e .
# 3. Start the MCP server
python http_server.py
# Server runs on http://localhost:8001
# API documentation: http://localhost:8001/docs

- Features
- Architecture
- Installation
- Configuration
- Usage
- Web Interface
- API Reference
- Development
- Testing
- Deployment
- Troubleshooting
- Contributing
- Multi-Source Search: Combines DuckDuckGo web search and Semantic Scholar academic research
- Content Extraction: Intelligent extraction of relevant information from web pages
- Academic Integration: Direct access to scholarly articles and research papers
- Smart Formatting: Properly formatted research with citations and structured output
- Rate Limiting: Built-in retry logic and graceful handling of API limits
- Interactive Research: User-friendly web interface for conducting research
- Job Management: Track multiple research jobs with progress monitoring
- AI-Powered Reports: Generate comprehensive PDF reports using OpenAI, Anthropic, or Groq
- PDF Export: Download research results as properly named PDF files
- Real-time Updates: Live progress tracking with WebSocket-like polling
- Comprehensive Error Handling: Graceful degradation when services are unavailable
- Extensive Logging: Detailed logging for debugging and monitoring
- Configurable Settings: Environment-based configuration management
- Auto-Dependency Installation: Automatic installation of missing dependencies
- Modular Architecture: Easy to extend and customize
analysis_alpaca/
├── src/analysis_alpaca/    # Core MCP server implementation
│   ├── core/               # Server and research orchestration
│   ├── search/             # Search engine implementations
│   ├── models/             # Data models and schemas
│   ├── utils/              # Utility functions and helpers
│   └── exceptions/         # Custom exception handling
├── web_ui/                 # Optional web interface
│   ├── frontend/           # React.js frontend application
│   └── backend/            # FastAPI backend for web UI
├── tests/                  # Test suite
├── http_server.py          # HTTP API wrapper for MCP server
└── requirements.txt        # Unified dependencies
- MCP Server (src/analysis_alpaca/core/server.py)
  - FastMCP-based server exposing research tools to Claude
  - Main tool: deep_research() for comprehensive research (a minimal registration sketch follows this list)
  - Built-in prompt templates for structured research methodology
- Research Service (src/analysis_alpaca/core/research_service.py)
  - Orchestrates the entire research workflow
  - Coordinates web and academic searches
  - Manages content extraction and result formatting
  - Handles parallel execution and error recovery
- Search Implementations
  - WebSearcher: DuckDuckGo web search with result parsing
  - AcademicSearcher: Semantic Scholar API integration with retry logic
  - ContentExtractor: Web page content extraction and processing
- HTTP Server (http_server.py)
  - REST API wrapper for MCP functionality
  - Enables direct HTTP access to research capabilities
  - CORS-enabled for web interface integration
- Web Interface
  - Frontend: React.js application with PDF generation
  - Backend: FastAPI server for job management and AI report generation
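To make the MCP Server component concrete, here is a minimal sketch of how a FastMCP tool with the deep_research signature can be registered. It is an illustration only, not the contents of server.py; the tool body is a placeholder and the server name is an assumption.

```python
# Minimal FastMCP registration sketch (illustrative; not the actual server.py).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("analysis-alpaca")  # server name is an assumption

@mcp.tool()
async def deep_research(query: str, sources: str = "both", num_results: int = 2) -> str:
    """Perform comprehensive research on a topic (placeholder body)."""
    # The real tool delegates to ResearchService; this stub only shows the
    # parameter shape documented in the API reference below.
    return f"Results for {query!r} (sources={sources}, num_results={num_results})"

if __name__ == "__main__":
    mcp.run()
```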
- Python 3.8+ (recommended: Python 3.11+)
- Node.js 16+ (only if using web interface)
- npm or yarn (only if using web interface)
# Clone the repository
git clone https://github.com/DeepKariaX/Analysis-Alpaca-Researcher.git
cd Analysis-Alpaca-Researcher
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install the package
pip install -e .
# Or install with all optional dependencies
pip install -e ".[dev,ai]"

# Install frontend dependencies
cd web_ui/frontend
npm install
# Return to project root
cd ../..

Core Dependencies:
- httpx>=0.25.0 - HTTP client for API requests
- beautifulsoup4>=4.12.0 - HTML parsing for content extraction
- mcp>=0.1.0 - Model Context Protocol server framework
- fastapi>=0.104.0 - Web framework for HTTP API
- uvicorn>=0.24.0 - ASGI server for FastAPI
Optional AI Dependencies:
pip install -e ".[ai]" # Installs OpenAI, Anthropic, and Groq clientsDevelopment Dependencies:
pip install -e ".[dev]" # Installs testing and linting toolsCreate a .env file in the project root:
# Search Configuration
AA_MAX_RESULTS=5 # Maximum results per search
AA_DEFAULT_NUM_RESULTS=3 # Default number of results
AA_WEB_TIMEOUT=15.0 # Web search timeout (seconds)
AA_USER_AGENT="AnalysisAlpaca 1.0"
# Content Configuration
AA_MAX_CONTENT_SIZE=10000 # Maximum response size
AA_MAX_EXTRACTION_SIZE=150000 # Maximum content to extract
# Server Configuration
AA_LOG_LEVEL=INFO # Logging level (DEBUG, INFO, WARNING, ERROR)
AA_LOG_FILE="logs/research.log" # Optional log file path
AA_AUTO_INSTALL_DEPS=true # Auto-install missing dependencies
# AI Provider API Keys (Optional - for web interface)
OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here
GROQ_API_KEY=your_groq_key_here
# Web UI Configuration
MCP_SERVER_URL=http://localhost:8001 # URL of the MCP HTTP server

The system uses a hierarchical configuration approach:
- Default values in config.py
- Environment variables (override defaults)
- Optional .env file (overrides environment variables)
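As a rough illustration of that precedence, the snippet below resolves a setting from defaults, then environment variables, then an optional .env file. The helper is hypothetical and does not reproduce config.py; only the variable names come from the example above.

```python
# Hypothetical precedence illustration (not the real config.py).
import os

DEFAULTS = {"AA_MAX_RESULTS": "5", "AA_WEB_TIMEOUT": "15.0"}

def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines from an optional .env file."""
    values = {}
    try:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    values[key.strip()] = value.split("#")[0].strip().strip('"')
    except FileNotFoundError:
        pass
    return values

def get_setting(name: str) -> str:
    """Defaults < environment variables < .env file, as described above."""
    merged = {**DEFAULTS, **os.environ, **load_env_file()}
    return merged[name]

print(get_setting("AA_MAX_RESULTS"))
```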
Add to your Claude Desktop configuration:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
{
"mcpServers": {
"analysis-alpaca": {
"command": "/path/to/python",
"args": ["/path/to/analysis_alpaca/http_server.py"],
"env": {
"AA_MAX_RESULTS": "5",
"AA_LOG_LEVEL": "INFO"
}
}
}
}

# Start the HTTP API server
python http_server.py
# Server runs on http://localhost:8001
# API documentation available at http://localhost:8001/docs

The web interface provides a user-friendly way to interact with AnalysisAlpaca through a browser.
Requirements:
- Node.js 16+ and npm for the frontend
- The MCP HTTP server must be running (see above)
Setup:
# Install frontend dependencies
cd web_ui/frontend
npm install
cd ../..

Manual Startup (2 terminals required):
Terminal 1 - Backend API Server:
cd web_ui/backend
python main.py
# Backend runs on http://localhost:8000
# API documentation: http://localhost:8000/docs

Terminal 2 - Frontend Development Server:
cd web_ui/frontend
npm start
# Frontend runs on http://localhost:3000
# Access the web interface at http://localhost:3000

Complete Setup (3 servers total):
- MCP Server (Terminal 1): python http_server.py → http://localhost:8001
- Backend API (Terminal 2): cd web_ui/backend && python main.py → http://localhost:8000
- Frontend UI (Terminal 3): cd web_ui/frontend && npm start → http://localhost:3000
The main deep_research tool accepts these parameters:
- query (required): The research question or topic
- sources (optional): "web", "academic", or "both" (default: "both")
- num_results (optional): Number of sources to examine (default: 2)
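Outside of Claude, the same parameters can be exercised against the HTTP wrapper directly. The sketch below assumes the quick-start server is running on http://localhost:8001 and uses httpx, which is already a core dependency; it mirrors the curl example further down.

```python
# Illustrative request to the local HTTP wrapper using httpx (a core dependency).
import httpx

payload = {
    "query": "impact of artificial intelligence on healthcare",
    "sources": "both",    # "web", "academic", or "both"
    "num_results": 3,     # number of sources to examine
}

response = httpx.post("http://localhost:8001/deep_research", json=payload, timeout=60.0)
response.raise_for_status()
print(response.text)  # formatted research results with sources and content
```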
Research the latest developments in quantum computing using both web and academic sources.
Can you do comprehensive research on climate change mitigation strategies? Focus on academic sources and examine 3 results.
I need detailed information about the impact of artificial intelligence on healthcare. Use the deep_research tool with web sources only.
# Research via HTTP API
curl -X POST "http://localhost:8001/deep_research" \
-H "Content-Type: application/json" \
-d '{
"query": "artificial intelligence in healthcare",
"sources": "both",
"num_results": 3
}'

- Research Form: Interactive form to submit research queries
- Progress Tracking: Real-time progress updates with detailed logs
- Job Management: View and manage multiple research jobs
- AI Report Generation: Generate comprehensive reports using various LLM providers
- PDF Export: Download reports as properly named PDF files
- History: Browse previous research jobs and results
- OpenAI: GPT-4, GPT-3.5-turbo, and other models
- Anthropic: Claude 3 (Sonnet, Opus, Haiku)
- Groq: Fast inference with various open-source models
Downloaded reports use the format: {sanitized_title}_{source_type}.pdf
Example: artificial_intelligence_healthcare_web_academic.pdf
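The exact sanitization rules are an implementation detail of the web UI backend, but a plausible sketch of how such names can be produced is shown below (the helper is hypothetical).

```python
# Hypothetical sketch of the {sanitized_title}_{source_type}.pdf naming pattern.
import re

def report_filename(title: str, source_type: str) -> str:
    sanitized = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return f"{sanitized}_{source_type}.pdf"

print(report_filename("Artificial Intelligence & Healthcare", "web_academic"))
# artificial_intelligence_healthcare_web_academic.pdf
```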
deep_research: Perform comprehensive research on a topic.
Parameters:
- query (string, required): Research question or topic
- sources (string, optional): Source type ("web", "academic", "both")
- num_results (integer, optional): Number of sources to examine
Returns: Formatted research results with sources and content
Generate a structured research prompt for multi-stage research.
Parameters:
- topic (string, required): Topic to research
Returns: Comprehensive research prompt with methodology
POST /deep_research: Execute a research query via HTTP.
{
"query": "string",
"sources": "both",
"num_results": 2
}

Health check endpoint.
Interactive API documentation (Swagger UI).
Start a new research job.
Get research job status and results.
Get detailed progress for a research job.
analysis_alpaca/
├── src/analysis_alpaca/
│   ├── __init__.py
│   ├── config.py                 # Configuration management
│   ├── core/
│   │   ├── __init__.py
│   │   ├── server.py             # MCP server implementation
│   │   └── research_service.py   # Research orchestration
│   ├── search/
│   │   ├── __init__.py
│   │   ├── base.py               # Base searcher class
│   │   ├── web_search.py         # DuckDuckGo implementation
│   │   ├── academic_search.py    # Semantic Scholar implementation
│   │   └── content_extractor.py  # Content extraction
│   ├── models/
│   │   ├── __init__.py
│   │   └── research.py           # Data models
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── logging.py            # Logging utilities
│   │   └── text.py               # Text processing
│   └── exceptions/
│       ├── __init__.py
│       └── base.py               # Custom exceptions
├── web_ui/
│   ├── frontend/                 # React.js application
│   └── backend/                  # FastAPI backend
├── tests/                        # Test suite
├── http_server.py                # HTTP wrapper
├── requirements.txt              # Dependencies
├── pyproject.toml                # Package configuration
└── Makefile                      # Development commands
# Install with development dependencies
pip install -e ".[dev,ai]"
# Set up pre-commit hooks (optional)
pre-commit install
# Run tests
make test
# Code formatting
make format
# Linting
make lint
# Type checking
make type-check

- Create a new searcher class inheriting from BaseSearcher
- Implement the search() method
- Add the searcher to ResearchService
- Update configuration and documentation
Example:
from typing import List

from .base import BaseSearcher
from ..models.research import SearchResult  # assumed location of the SearchResult model

class NewSearcher(BaseSearcher):
    async def search(self, query: str, num_results: int) -> List[SearchResult]:
        # Implement search logic here
        pass

# Run all tests
make test
# Run with coverage
make test-cov
# Run specific test file
pytest tests/test_models.py
# Run with verbose output
pytest -v

- tests/test_models.py - Data model tests
- tests/test_utils.py - Utility function tests
- tests/conftest.py - Test configuration and fixtures
Tests use pytest and pytest-asyncio for async testing:
import pytest
from analysis_alpaca.models.research import ResearchQuery
@pytest.mark.asyncio
async def test_research_query():
query = ResearchQuery(query="test", sources="web", num_results=2)
assert query.query == "test"

FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install -e .
EXPOSE 8001
CMD ["python", "http_server.py"]For production, set these environment variables:
AA_LOG_LEVEL=WARNING
AA_LOG_FILE=/var/log/analysis-alpaca.log
AA_AUTO_INSTALL_DEPS=false
AA_MAX_RESULTS=3
AA_WEB_TIMEOUT=20.0

server {
listen 80;
server_name your-domain.com;
location / {
proxy_pass http://127.0.0.1:8001;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}

The application provides comprehensive logging. Monitor these key metrics:
- Research request rates
- Search success/failure rates
- Content extraction success rates
- Response times
- Error patterns
- The application is stateless and can be horizontally scaled
- Consider implementing Redis for caching search results
- Use a proper message queue for background processing in high-traffic scenarios
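If you do introduce a Redis cache for search results, a minimal sketch could look like the following. It assumes the redis Python client and a locally reachable Redis instance, neither of which ships with this project; run_search is a placeholder for the real search call.

```python
# Hypothetical Redis cache for search results (redis-py and a local Redis assumed).
import hashlib
import json

import redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

def cached_search(query: str, sources: str, num_results: int, ttl: int = 3600):
    """Return cached results when available; otherwise search and cache for `ttl` seconds."""
    key = "aa:search:" + hashlib.sha256(f"{query}|{sources}|{num_results}".encode()).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    results = run_search(query, sources, num_results)  # placeholder for the real search call
    cache.setex(key, ttl, json.dumps(results))
    return results
```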
# Ensure proper installation
pip install -e .
# Check Python path
python -c "import analysis_alpaca; print('OK')"# Increase timeout values
export AA_WEB_TIMEOUT=30.0
export AA_ACADEMIC_TIMEOUT=30.0

The system automatically handles Semantic Scholar rate limits with:
- Exponential backoff retry logic
- Graceful degradation (returns web results only)
- Request spacing
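The retry behaviour described above follows the standard exponential backoff pattern; a generic sketch is shown below. It is illustrative and does not reproduce academic_search.py.

```python
# Generic exponential backoff sketch (illustrative; not the actual academic_search.py).
import asyncio

import httpx

async def get_with_backoff(url: str, max_retries: int = 3) -> httpx.Response:
    delay = 1.0
    async with httpx.AsyncClient(timeout=15.0) as client:
        for attempt in range(max_retries + 1):
            response = await client.get(url)
            if response.status_code != 429:   # not rate limited
                response.raise_for_status()
                return response
            if attempt == max_retries:        # out of retries: surface the 429
                response.raise_for_status()
            await asyncio.sleep(delay)        # wait, then retry with a doubled delay
            delay *= 2
    raise RuntimeError("unreachable")
```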
- Check network connectivity
- Verify target site availability
- Some sites may block automated requests
# Increase content size limits
export AA_MAX_CONTENT_SIZE=15000
export AA_MAX_EXTRACTION_SIZE=200000

Enable detailed logging:
export AA_LOG_LEVEL=DEBUG
export AA_LOG_FILE="debug.log"
python http_server.py

View logs:
tail -f debug.log

- Check the logs for detailed error messages
- Verify your configuration against the examples
- Test with simple queries first
- Ensure all dependencies are properly installed
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make your changes with tests
- Run quality checks: make check-all
- Submit a pull request
The project uses:
- Black for code formatting
- isort for import sorting
- flake8 for linting
- mypy for type checking
Run all checks:
make check-all

Use conventional commits:
- feat: for new features
- fix: for bug fixes
- docs: for documentation
- test: for tests
- refactor: for refactoring
MIT License - see LICENSE file for details.
- Semantic Scholar for academic search API
- DuckDuckGo for web search functionality
- Model Context Protocol for the integration framework
- FastMCP for the server implementation
- React.js and FastAPI for the web interface
- Additional Search Providers
  - Google Scholar integration
  - Bing Academic search
  - ArXiv direct integration
- Enhanced Content Processing
  - PDF content extraction
  - Image and chart analysis
  - Table data extraction
- Performance Improvements
  - Redis caching layer
  - Async processing optimization
  - Response streaming
- Advanced Features
  - Citation graph analysis
  - Research trend detection
  - Multi-language support
- Enterprise Features
  - User authentication
  - Usage analytics
  - API rate limiting
  - Custom search domains
- v1.0.0 - Initial release with core research functionality
- v1.1.0 - Added web interface and PDF export
- v1.2.0 - Enhanced error handling and rate limiting
- Current - Comprehensive cleanup and documentation
For the latest updates and detailed changelog, visit the GitHub repository.