ra-mcp

⚠️ Work in Progress (WIP)

This repository is the result of two hackathons and is currently under development. It's more of a proof of concept than a production-ready solution. The codebase, documentation, and build processes are being continuously refined.

Please note:

Expect changes and breaking updates

APIs and interfaces may change without notice

Use in production environments at your own risk

Contributions and feedback are welcome as we work toward stability

MCPs for Riksarkivet

An MCP server and command-line tools for searching and browsing transcribed historical documents from the Swedish National Archives (Riksarkivet).

Features

Full-text search across millions of transcribed historical documents
Complete page transcriptions with accurate text extraction from historical manuscripts
Reference-based document browsing using official archive reference codes
Contextual search highlighting to identify relevant content quickly
High-resolution image access to original document scans via IIIF

Getting Started

MCP

Add ra-mcp to ChatGPT or Claude over streamable HTTP:

url: https://riksarkivet-ra-mcp.hf.space/mcp

Claude Code

claude mcp add --transport http ra-mcp https://riksarkivet-ra-mcp.hf.space/mcp

IDEs

cat > mcp.json <<'EOF'
{
  "mcpServers": {
    "ra-mcp": {
      "type": "streamable-http",
      "url": "https://riksarkivet-ra-mcp.hf.space/mcp",
      "note": "ra-mcp server (FastMCP) - via Streamable HTTP"
    }
  }
}
EOF

CLI

Install the CLI:

uv pip install ra-mcp
# or
uv add ra-mcp

How to Use

1. Search for Keywords

Find documents containing specific words or phrases:

# Basic search
uv run ra search "Stockholm"

# Wildcard search - single character (?)
uv run ra search "St?ckholm"  # Matches "Stockholm", "Stäckholm", etc.

# Wildcard search - multiple characters (*)
uv run ra search "Stock*"     # Matches "Stockholm", "Stocksund", "Stocken", etc.
uv run ra search "St*holm"    # Matches "Stockholm", "Strömholm", etc.
uv run ra search "*holm"      # Matches "Stockholm", "Söderholm", etc.

# Fuzzy search - find similar words
uv run ra search "Stockholm~"   # Matches "Stockholm", "Stokholm", "Stokholms", etc.
uv run ra search "Stockholm~1"  # Matches "Stockholm", "Stokholm" (max edit distance: 1)

# Proximity search - find words within distance
uv run ra search '"Stockholm trolldom"~10'  # "Stockholm" and "trolldom" within 10 words

# Boosting terms - increase relevance of specific terms
uv run ra search "Stockholm^4 trol*"  # Boost "Stockholm" relevance with wildcard
uv run ra search '("Stockholm dom*"^4 Reg*)'  # Boost entire phrase with wildcard

# Boolean operators - combine search terms
uv run ra search "(Stockholm AND trolldom)"  # Both terms required
uv run ra search "(Stockholm OR Göteborg)"  # Either term (or both)
uv run ra search "(Stockholm NOT trolldom)"  # Stockholm but not trolldom
uv run ra search "+Stockholm -trolldom"  # Require Stockholm, exclude trolldom

# Grouping - create complex queries with sub-queries
uv run ra search "((Stockholm OR Göteborg) AND troll*)"  # Either city, plus a word starting with "troll"
uv run ra search "((troll* OR häx*) AND (Stockholm OR Göteborg))"  # Complex grouping
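
The `~N` suffix in the fuzzy examples above limits how many single-character edits may separate the query from a match. The sketch below illustrates that idea with plain Levenshtein distance. It is not the search backend's implementation (which may, for example, count transpositions differently), and `levenshtein` is a hypothetical helper name.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions, and
    substitutions needed to turn `a` into `b`."""
    # previous[j] is the distance between a[:i-1] and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


for candidate in ("Stockholm", "Stokholm", "Stokholms"):
    print(candidate, levenshtein("Stockholm", candidate))
```

Under this measure "Stokholm" is one edit away and "Stokholms" two, which is consistent with `"Stockholm~1"` listing only the former.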

Search Options:

--max N - Maximum search results (default: 50)
--max-display N - Maximum results to display (default: 20)
--max-hits-per-vol N - Maximum hits to return per volume (default: 3)

Search Types:

| Type | Syntax | Example | Description |
| --- | --- | --- | --- |
| Exact | `"word"` | `"Stockholm"` | Find exact matches |
| Wildcard (single) | `?` | `"St?ckholm"` | Matches any single character |
| Wildcard (multiple) | `*` | `"Stock*"` | Matches zero or more characters |
| Fuzzy | `~` | `"Stockholm~"` | Finds similar terms based on edit distance (default: 2) |
| Fuzzy (custom) | `~N` | `"Stockholm~1"` | Finds similar terms with max edit distance N (0-2) |
| Proximity | `"word1 word2"~N` | `"Stockholm trolldom"~10` | Finds terms within N words of each other |
| Boosting | `^N` | `"Stockholm^4 trol*"` | Increases relevance of boosted term (default: 1) |
| Boolean AND | `AND` or `&&` | `(Stockholm AND trolldom)` | Both terms must be present |
| Boolean OR | `OR` or `\|\|` | `(Stockholm OR Göteborg)` | Either term (or both) must be present |
| Boolean NOT | `NOT` or `!` | `(Stockholm NOT trolldom)` | First term without second term |
| Required/Exclude | `+` / `-` | `+Stockholm -trolldom` | Require term (+) or exclude term (-) |
| Grouping | `(...)` | `((Stockholm OR Göteborg) AND troll*)` | Group clauses to form sub-queries |
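
When queries are assembled programmatically, small helpers can keep the operators from the table above consistent. The following is a minimal sketch. The function names (`fuzzy`, `proximity`, `boost`, `group`) are hypothetical and not part of ra-mcp; they only build query strings and do not contact the search service.

```python
from __future__ import annotations


def fuzzy(term: str, max_edits: int | None = None) -> str:
    """Append the fuzzy operator, optionally with an edit distance (0-2)."""
    if max_edits is None:
        return f"{term}~"
    if not 0 <= max_edits <= 2:
        raise ValueError("edit distance must be between 0 and 2")
    return f"{term}~{max_edits}"


def proximity(first: str, second: str, distance: int) -> str:
    """Quote two words and require them within `distance` words."""
    return f'"{first} {second}"~{distance}'


def boost(clause: str, factor: int) -> str:
    """Raise the relevance weight of a term or quoted phrase."""
    return f"{clause}^{factor}"


def group(operator: str, *clauses: str) -> str:
    """Join clauses with AND or OR and wrap them in parentheses."""
    if operator not in {"AND", "OR"}:
        raise ValueError("operator must be 'AND' or 'OR'")
    return "(" + f" {operator} ".join(clauses) + ")"


query = group("AND", group("OR", "Stockholm", "Göteborg"), "troll*")
print(query)
```

The resulting string can be passed as the argument to `ra search`; nesting `group` calls reproduces the grouping row of the table.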

2. Browse Specific Documents

When you find interesting documents, browse them directly:

# View single page
uv run ra browse "SE/RA/123" --page 5

# View page range
uv run ra browse "SE/RA/123" --pages "1-10"

# View specific pages with search highlighting
uv run ra browse "SE/RA/123" --page "5,7,9" --search-term "Stockholm"

Options:

--page or --pages - Page numbers (e.g., "5", "1-10", "5,7,9")
--search-term - Highlight this term in the text
--max-display N - Maximum pages to display (default: 20)
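
The page formats listed above ("5", "1-10", "5,7,9", and mixes such as "1,5,10-12") can be expanded into explicit page numbers. The sketch below shows one way to do that. `expand_pages` is a hypothetical helper, and cutting the list at `max_display` is an assumption about how the limit behaves, not a description of the CLI's internals.

```python
def expand_pages(spec: str, max_display: int = 20) -> list[int]:
    """Expand a page specification such as "1,5,10-12" into page numbers."""
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            first, last = (int(value) for value in part.split("-", 1))
            if first > last:
                raise ValueError(f"descending range: {part!r}")
            pages.extend(range(first, last + 1))  # ranges are inclusive
        else:
            pages.append(int(part))
    # Assumed behaviour: keep at most `max_display` pages, in order.
    return pages[:max_display]


print(expand_pages("1,5,10-12"))
```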

Output Features

🔍 Search Results

When you run a search, results are presented with:

Document grouping - Related pages grouped together for context
Institution & dates - Archive location and document dates
Page numbers - Specific pages containing your search terms
Highlighted snippets - Preview text with keywords emphasized
Browse commands - Ready-to-run commands for deeper exploration

Example output:

Document: SE/RA/310187/1 - Kommissorialrätt i Stockholm ang. trolldom
Institution: Riksarkivet i Stockholm/Täby | Date: 1676 - 1677
├─ Page 2: "... **trolldom** ..."
├─ Page 7: "... **Trolldoms** ..."
├─ Page 8: "... **Trolldoms**..."

Browse commands:
  uv run ra browse "SE/RA/310187/1" --page 7 --search-term "trolldom"
  uv run ra browse "SE/RA/310187/1" --pages "2,7,8,52,72" --search-term "trolldom"
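
The grouping and the ready-to-run browse commands shown in the example output can be reproduced in a few lines. This is only a sketch of the idea: the `(reference_code, page)` hit tuples and both function names are assumptions made for illustration, not the tool's internal data model.

```python
import shlex
from collections import defaultdict


def group_hits(hits):
    """Group (reference_code, page) hits into sorted page lists per document."""
    grouped = defaultdict(set)
    for reference_code, page in hits:
        grouped[reference_code].add(page)
    return {ref: sorted(pages) for ref, pages in grouped.items()}


def browse_command(reference_code, pages, search_term):
    """Build a shell-safe `ra browse` command for one document."""
    argv = [
        "uv", "run", "ra", "browse", reference_code,
        "--pages", ",".join(str(page) for page in pages),
        "--search-term", search_term,
    ]
    return shlex.join(argv)  # quotes arguments only where the shell needs it


hits = [("SE/RA/310187/1", 7), ("SE/RA/310187/1", 2),
        ("SE/RA/310187/1", 8), ("SE/RA/310187/1", 7)]
grouped = group_hits(hits)
commands = [browse_command(ref, pages, "trolldom")
            for ref, pages in grouped.items()]
print("\n".join(commands))
```

`shlex.join` omits quotes around arguments that are already shell-safe, so the printed command is equivalent to the quoted form in the example output.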

🔗 Available Resources

Each result provides direct access to:

| Resource | Description | Use Case |
| --- | --- | --- |
| ALTO XML | Structured transcription data with precise positioning | Text analysis, data extraction |
| IIIF Images | High-resolution document scans with zoom/crop support | Visual inspection, citations |
| Bildvisning | Interactive web viewer with search highlighting | Online browsing, sharing |
| Collections | IIIF metadata for document series | Understanding document context |
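
The zoom and crop support of IIIF images comes from the IIIF Image API URL pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The sketch below builds such a URL. The base URL and identifier are placeholders, not Riksarkivet's real endpoints, and `size="max"` follows Image API 3.0 (version 2 servers use `full` instead).

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Assemble an IIIF Image API request URL from its path segments."""
    encoded_id = quote(identifier, safe="")  # identifiers must be URL-encoded
    return (f"{base_url.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


# Placeholder values: crop an 800x600 region at (100, 200), scaled to 400px wide.
url = iiif_image_url("https://example.org/iiif", "page-0001",
                     region="100,200,800,600", size="400,")
print(url)
```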

Examples

Basic Workflow

Search for a keyword:
```
uv run ra search "Stockholm"
```

Browse specific documents:

uv run ra browse "SE/RA/123456" --page "10-15" --search-term "Stockholm"

Advanced Usage

# Targeted document browsing
uv run ra browse "SE/RA/760264" --pages "1,5,10-12" --search-term "trolldom"

# Large search with selective display
uv run ra search "trolldom" --max 100 --max-display 30

Technical Details

Riksarkivet APIs & Data Sources

This tool integrates with multiple Riksarkivet APIs to provide comprehensive access to historical documents:

Current Integrations

Search API - Primary endpoint for full-text search across transcribed materials (Documentation)
IIIF Collections - Access to digitized document collections via IIIF standard (Documentation)
ALTO XML - Structured text transcriptions with precise positioning data
IIIF Images - High-resolution document images with zoom and cropping capabilities
Bildvisning - Interactive document viewer with search highlighting
OAI-PMH - Metadata harvesting for archive records and references (Documentation)
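
ALTO XML stores each recognised word or line as a `String` element with a `CONTENT` attribute and position attributes, nested inside `TextLine` elements. As a rough sketch of how the plain text can be recovered with the standard library, assuming a document shaped like the inline sample (real files from Riksarkivet may use a different ALTO version or layout):

```python
import xml.etree.ElementTree as ET

SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Kommissorialrätt" HPOS="10" VPOS="20" WIDTH="300" HEIGHT="40"/>
      <SP/>
      <String CONTENT="i" HPOS="320" VPOS="20" WIDTH="15" HEIGHT="40"/>
      <SP/>
      <String CONTENT="Stockholm" HPOS="345" VPOS="20" WIDTH="200" HEIGHT="40"/>
    </TextLine>
    <TextLine>
      <String CONTENT="ang." HPOS="10" VPOS="70" WIDTH="60" HEIGHT="40"/>
      <SP/>
      <String CONTENT="trolldom" HPOS="80" VPOS="70" WIDTH="160" HEIGHT="40"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix so any ALTO version matches."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text: str) -> list[str]:
    """Return one string per TextLine, joining its String contents."""
    lines = []
    for element in ET.fromstring(xml_text).iter():
        if local_name(element.tag) != "TextLine":
            continue
        words = [child.get("CONTENT", "") for child in element
                 if local_name(child.tag) == "String"]
        lines.append(" ".join(words))
    return lines


print(alto_lines(SAMPLE))
```

Matching on the local tag name avoids hard-coding one ALTO namespace version.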

Additional Resources

The Riksarkivet Data Platform Wiki provides comprehensive documentation for building additional MCP integrations.

Experimental Features

Förvaltningshistorik - Semantic search interface (under evaluation)
AI-Riksarkivet HTRflow - Handwritten text recognition pipeline (PyPI package)

Troubleshooting

Common Issues

No results found: Try broader search terms or check spelling
Page not loading: Some pages may not have transcriptions available
Network timeouts: Tool includes retry logic, but very slow connections may time out
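
For scripts that call the service directly, retries with a growing delay are a common way to ride out short network problems. The sketch below shows the general pattern only. It is not the retry logic that ra-mcp itself uses, and `with_retries` and `flaky_fetch` are hypothetical names.

```python
import time


def with_retries(operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call `operation`, retrying on timeouts with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except (TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...


# Demonstration with a stand-in that fails twice before succeeding.
calls = []
delays = []


def flaky_fetch():
    calls.append(1)
    if len(calls) < 3:
        raise TimeoutError("simulated timeout")
    return "page text"


# Recording the delays instead of sleeping keeps the demo instant.
result = with_retries(flaky_fetch, sleep=delays.append)
print(result, delays)
```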

Getting Help

uv run ra --help
uv run ra search --help
uv run ra browse --help
uv run ra serve --help

MCP Server Development

# clone repo
git clone https://github.com/AI-Riksarkivet/ra-mcp.git

Running the MCP Server

# Install dependencies
uv sync && uv pip install -e .

# Run the main MCP server (stdio)
cd src/ra_mcp && uv run ra serve

# Run with SSE/HTTP transport on port 8000
cd src/ra_mcp && uv run ra serve --http

Testing with MCP Inspector

Use the MCP Inspector to test and debug the MCP server:

# Test the server interactively
npx @modelcontextprotocol/inspector uv run ra serve --http

The MCP Inspector provides a web interface to test server tools, resources, and prompts during development.

Building and Publishing with Dagger

The project uses Dagger for containerized builds and publishing to Docker registries. Pre-built images are available on Docker Hub.

Prerequisites

Dagger CLI installed
Docker registry credentials (for publishing)

Available Commands

Build locally:

dagger call build

Run tests:

dagger call test

Build and publish to Docker registry:

# Set environment variables

export DOCKER_PASSWORD="your-password"

# Build and publish
dagger call publish \
  --docker-username="username" \
  --docker-password=env:DOCKER_PASSWORD \
  --image-repository="riksarkivet/ra-mcp" \
  --tag="latest" \
  --source=.

Available Dagger Functions

build: Creates a production-ready container image using the Dockerfile
test: Runs the test suite using pytest with coverage reporting
publish: Builds and publishes container image to registry with authentication
build-local: Build with custom environment variables and registry settings

The Dagger configuration is located in .dagger/main.go and provides a complete CI/CD pipeline for the project.

Current MCP Server Implementation

The MCP server provides access to transcribed historical documents from the Swedish National Archives (Riksarkivet) through the tools and resources listed below:

🔧 Available Tools

1. search_transcribed

Search for keywords in transcribed materials with pagination support.

search_transcribed(
    keyword="trolldom",          # Search term
    offset=0,                    # Pagination offset (required)
    show_context=False,          # Full page text (default: False for more results)
    max_results=10,              # Maximum results to return
    max_hits_per_document=3      # Max hits per document
)

2. browse_document

Browse specific pages of a document by reference code.

browse_document(
    reference_code="SE/RA/310187/1",  # Document reference
    pages="7,8,52",                   # Page numbers or ranges
    highlight_term="trolldom",        # Optional keyword highlighting
    max_pages=20                       # Maximum pages to display
)
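
On the wire, MCP clients invoke tools such as these with a JSON-RPC 2.0 `tools/call` request. The sketch below only builds that message so its shape is visible. An MCP client library normally creates it for you and also handles the initialization handshake, which is omitted here; `tool_call_request` is a hypothetical helper.

```python
import json


def tool_call_request(request_id, name, arguments):
    """Build the JSON-RPC 2.0 message for an MCP `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tool_call_request(1, "browse_document", {
    "reference_code": "SE/RA/310187/1",
    "pages": "7,8,52",
    "highlight_term": "trolldom",
    "max_pages": 20,
})
print(json.dumps(request, indent=2, ensure_ascii=False))
```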

📚 Available Resources

riksarkivet://contents/table_of_contents - Complete guide index (Innehållsförteckning)
riksarkivet://guide/{filename} - Specific guide sections (e.g., '01_Domstolar.md', '02_Fangelse.md')

🔄 Typical Workflow

Search → search_transcribed("trolldom", offset=0) to find relevant documents
Paginate → Continue with offset=50, 100, 150... for comprehensive discovery
Browse → Use browse_document() to view specific pages with full transcriptions

💡 Search Strategy Tips

Start with show_context=False to maximize hit coverage
Use pagination (increasing offsets) to find all matches
Enable show_context=True only when you need full page text for specific hits
Browse specific pages for detailed examination with keyword highlighting
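
The pagination advice above amounts to a loop that raises `offset` until a batch comes back short. A minimal sketch follows. The `search` argument stands in for however your client calls `search_transcribed`, and the assumption that it returns a list of hits is made for illustration; `fake_search` exists only so the example runs offline.

```python
def collect_hits(search, keyword, page_size=50, max_requests=20):
    """Request successive pages of hits until one is not full."""
    hits = []
    for request_index in range(max_requests):
        batch = search(keyword=keyword,
                       offset=request_index * page_size,
                       max_results=page_size,
                       show_context=False)  # snippets only, for coverage
        hits.extend(batch)
        if len(batch) < page_size:
            break  # a short batch means there are no further pages
    return hits


# Stand-in for the real tool: 120 fake hits served in slices.
FAKE_INDEX = [f"hit-{n}" for n in range(120)]
requested_offsets = []


def fake_search(keyword, offset, max_results, show_context):
    requested_offsets.append(offset)
    return FAKE_INDEX[offset:offset + max_results]


all_hits = collect_hits(fake_search, "trolldom")
print(len(all_hits), requested_offsets)
```

The `max_requests` cap keeps a very common keyword from triggering an unbounded number of calls.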
