⚠️ Work in Progress (WIP)
This repository is the result of two hackathons and is currently under development. It's more of a proof of concept than a production-ready solution. The codebase, documentation, and build processes are being continuously refined.
Please note:
- Expect changes and breaking updates
- APIs and interfaces may change without notice
- Use in production environments at your own risk
- Contributions and feedback are welcome as we work toward stability
An MCP server and command-line tools for searching and browsing transcribed historical documents from the Swedish National Archives (Riksarkivet).
- Full-text search across millions of transcribed historical documents
- Complete page transcriptions with accurate text extraction from historical manuscripts
- Reference-based document browsing using official archive reference codes
- Contextual search highlighting to identify relevant content quickly
- High-resolution image access to original document scans via IIIF
Add ra-mcp to ChatGPT or Claude over streamable HTTP:
url: https://riksarkivet-ra-mcp.hf.space/mcp
claude mcp add --transport http ra-mcp https://riksarkivet-ra-mcp.hf.space/mcp
cat > mcp.json <<'EOF'
{
"mcpServers": {
"ra-mcp": {
"type": "streamable-http",
"url": "https://riksarkivet-ra-mcp.hf.space/mcp",
"note": "ra-mcp server (FastMCP) - via Streamable HTTP"
}
}
}
EOF
Install the CLI:
uv pip install ra-mcp
# or
uv add ra-mcp
Find documents containing specific words or phrases:
# Basic search
uv run ra search "Stockholm"
# Wildcard search - single character (?)
uv run ra search "St?ckholm" # Matches "Stockholm", "Stäckholm", etc.
# Wildcard search - multiple characters (*)
uv run ra search "Stock*" # Matches "Stockholm", "Stocksund", "Stocken", etc.
uv run ra search "St*holm" # Matches "Stockholm", "Strömholm", etc.
uv run ra search "*holm" # Matches "Stockholm", "Söderholm", etc.
# Fuzzy search - find similar words
uv run ra search "Stockholm~" # Matches "Stockholm", "Stokholm", "Stokholms", etc.
uv run ra search "Stockholm~1" # Matches "Stockholm", "Stokholm" (max edit distance: 1)
# Proximity search - find words within distance
uv run ra search '"Stockholm trolldom"~10' # "Stockholm" and "trolldom" within 10 words
# Boosting terms - increase relevance of specific terms
uv run ra search "Stockholm^4 trol*" # Boost "Stockholm" relevance with wildcard
uv run ra search '("Stockholm dom*"^4 Reg*)' # Boost entire phrase with wildcard
# Boolean operators - combine search terms
uv run ra search "(Stockholm AND trolldom)" # Both terms required
uv run ra search "(Stockholm OR Göteborg)" # Either term (or both)
uv run ra search "(Stockholm NOT trolldom)" # Stockholm but not trolldom
uv run ra search "+Stockholm -trolldom" # Require Stockholm, exclude trolldom
# Grouping - create complex queries with sub-queries
uv run ra search "((Stockholm OR Göteborg) AND troll*)" # Either city, combined with any word starting with "troll"
uv run ra search "((troll* OR häx*) AND (Stockholm OR Göteborg))" # Complex grouping
Search Options:
- `--max N` - Maximum search results (default: 50)
- `--max-display N` - Maximum results to display (default: 20)
- `--max-hits-per-vol N` - Maximum hits to return per volume (default: 3)
Search Types:
| Type | Syntax | Example | Description |
|---|---|---|---|
| Exact | `"word"` | `"Stockholm"` | Find exact matches |
| Wildcard (single) | `?` | `"St?ckholm"` | Matches any single character |
| Wildcard (multiple) | `*` | `"Stock*"` | Matches zero or more characters |
| Fuzzy | `~` | `"Stockholm~"` | Finds similar terms based on edit distance (default: 2) |
| Fuzzy (custom) | `~N` | `"Stockholm~1"` | Finds similar terms with max edit distance N (0-2) |
| Proximity | `"word1 word2"~N` | `"Stockholm trolldom"~10` | Finds terms within N words of each other |
| Boosting | `^N` | `"Stockholm^4 trol*"` | Increases relevance of boosted term (default: 1) |
| Boolean AND | `AND` or `&&` | `(Stockholm AND trolldom)` | Both terms must be present |
| Boolean OR | `OR` or `\|\|` | `(Stockholm OR Göteborg)` | Either term (or both) must be present |
| Boolean NOT | `NOT` or `!` | `(Stockholm NOT trolldom)` | First term without second term |
| Required/Exclude | `+` / `-` | `+Stockholm -trolldom` | Require term (+) or exclude term (-) |
| Grouping | `(...)` | `((Stockholm OR Göteborg) AND troll*)` | Group clauses to form sub-queries |
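When queries are assembled programmatically, small helpers can keep the operators from the table consistent. The following is a minimal sketch, not part of ra-mcp: every function name here is hypothetical, and the helpers only build query strings that you would then pass to `ra search`.

```python
from typing import Optional


def fuzzy(term: str, max_edits: Optional[int] = None) -> str:
    """Append the fuzzy operator (~), optionally with an edit distance of 0-2."""
    if max_edits is None:
        return f"{term}~"
    if not 0 <= max_edits <= 2:
        raise ValueError("edit distance must be 0, 1, or 2")
    return f"{term}~{max_edits}"


def proximity(words: list, distance: int) -> str:
    """Quote the words as a phrase and require them within `distance` words."""
    phrase = " ".join(words)
    return f'"{phrase}"~{distance}'


def boost(clause: str, factor: int) -> str:
    """Raise the relevance of a clause with the ^N operator."""
    return f"{clause}^{factor}"


def all_of(*clauses: str) -> str:
    """Group clauses so that every one of them must match (AND)."""
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses: str) -> str:
    """Group clauses so that at least one of them must match (OR)."""
    return "(" + " OR ".join(clauses) + ")"


# Rebuild the grouping example from the table above.
query = all_of(any_of("Stockholm", "Göteborg"), "troll*")
print(query)  # ((Stockholm OR Göteborg) AND troll*)
```

Remember to quote the resulting string in the shell, as in the examples above, so that characters such as `*`, `~`, and parentheses reach the tool unchanged.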
When you find interesting documents, browse them directly:
# View single page
uv run ra browse "SE/RA/123" --page 5
# View page range
uv run ra browse "SE/RA/123" --pages "1-10"
# View specific pages with search highlighting
uv run ra browse "SE/RA/123" --page "5,7,9" --search-term "Stockholm"
Options:
- `--page` or `--pages` - Page numbers (e.g., "5", "1-10", "5,7,9")
- `--search-term` - Highlight this term in the text
- `--max-display N` - Maximum pages to display (default: 20)
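The page formats accepted above (a single page, a range, or a comma-separated mix) can be illustrated with a short parser. This is a sketch of how such a specification could be expanded, assuming inclusive ranges; `parse_pages` is a hypothetical helper and not necessarily how ra-mcp interprets the option internally.

```python
def parse_pages(spec: str) -> list:
    """Expand a page specification such as "1,5,10-12" into sorted page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,2"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))  # ranges are treated as inclusive
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_pages("1,5,10-12"))  # [1, 5, 10, 11, 12]
```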
When you run a search, results are presented with:
- Document grouping - Related pages grouped together for context
- Institution & dates - Archive location and document dates
- Page numbers - Specific pages containing your search terms
- Highlighted snippets - Preview text with keywords emphasized
- Browse commands - Ready-to-run commands for deeper exploration
Example output:
Document: SE/RA/310187/1 - Kommissorialrätt i Stockholm ang. trolldom
Institution: Riksarkivet i Stockholm/Täby | Date: 1676 - 1677
├─ Page 2: "... **trolldom** ..."
├─ Page 7: "... **Trolldoms** ..."
├─ Page 8: "... **Trolldoms**..."
Browse commands:
uv run ra browse "SE/RA/310187/1" --page 7 --search-term "trolldom"
uv run ra browse "SE/RA/310187/1" --pages "2,7,8,52,72" --search-term "trolldom"
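If you post-process search results in a script, follow-up browse commands like the ones shown in the output can be generated rather than typed. The sketch below assumes you already have a reference code and a list of page numbers; `browse_command` is a hypothetical helper, and `shlex.join` from the standard library handles shell quoting.

```python
import shlex


def browse_command(reference_code, pages, search_term=None):
    """Build a shell-safe `ra browse` command for the given pages."""
    args = [
        "uv", "run", "ra", "browse", reference_code,
        "--pages", ",".join(str(page) for page in pages),
    ]
    if search_term:
        args += ["--search-term", search_term]
    # shlex.join quotes only the arguments that need quoting.
    return shlex.join(args)


print(browse_command("SE/RA/310187/1", [2, 7, 8], search_term="trolldom"))
```

The printed command may use different quoting than the examples above, but it is parsed by the shell into the same arguments.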
Each result provides direct access to:
| Resource | Description | Use Case |
|---|---|---|
| ALTO XML | Structured transcription data with precise positioning | Text analysis, data extraction |
| IIIF Images | High-resolution document scans with zoom/crop support | Visual inspection, citations |
| Bildvisning | Interactive web viewer with search highlighting | Online browsing, sharing |
| Collections | IIIF metadata for document series | Understanding document context |
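The zoom and crop support mentioned for IIIF images comes from the URL structure defined by the IIIF Image API: `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The sketch below builds such a URL. The base URL and identifier are placeholders, not real Riksarkivet values, and `size="max"` follows version 3 of the API (version 2 servers use `full`).

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation=0, quality="default", fmt="jpg"):
    """Assemble an IIIF Image API request URL from its path segments."""
    safe_identifier = quote(identifier, safe="")  # identifiers must be URL-encoded
    return (
        f"{base_url.rstrip('/')}/{safe_identifier}"
        f"/{region}/{size}/{rotation}/{quality}.{fmt}"
    )


# Placeholder server and identifier, for illustration only.
base = "https://iiif.example.org/image"

# Full page at the largest size the server offers.
print(iiif_image_url(base, "page-7"))

# Crop a region (x,y,width,height in pixels) and scale it to 400 pixels wide.
print(iiif_image_url(base, "page-7", region="100,200,800,600", size="400,"))
```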
- Search for a keyword:
  uv run ra search "Stockholm"
- Browse specific documents:
  uv run ra browse "SE/RA/123456" --page "10-15" --search-term "Stockholm"
# Targeted document browsing
uv run ra browse "SE/RA/760264" --pages "1,5,10-12" --search-term "trolldom"
# Large search with selective display
uv run ra search "trolldom" --max 100 --max-display 30
This tool integrates with multiple Riksarkivet APIs to provide comprehensive access to historical documents:
- Search API - Primary endpoint for full-text search across transcribed materials (Documentation)
- IIIF Collections - Access to digitized document collections via IIIF standard (Documentation)
- ALTO XML - Structured text transcriptions with precise positioning data
- IIIF Images - High-resolution document images with zoom and cropping capabilities
- Bildvisning - Interactive document viewer with search highlighting
- OAI-PMH - Metadata harvesting for archive records and references (Documentation)
The Riksarkivet Data Platform Wiki provides comprehensive documentation for building additional MCP integrations.
- Förvaltningshistorik - Semantic search interface (under evaluation)
- AI-Riksarkivet HTRflow - Handwritten text recognition pipeline (PyPI package)
- No results found: Try broader search terms or check spelling
- Page not loading: Some pages may not have transcriptions available
- Network timeouts: Tool includes retry logic, but very slow connections may time out
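If you call the tool from your own scripts over a slow connection, you can wrap calls in an additional retry layer. The following is a generic sketch of retrying with exponential backoff. It is not the retry logic built into ra-mcp, and `with_retries` is a hypothetical helper.

```python
import time
from typing import Callable


def with_retries(operation: Callable, attempts: int = 3, base_delay: float = 0.5,
                 sleep: Callable = time.sleep):
    """Run `operation`, retrying on timeouts with exponentially growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except (TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...


# Demonstration with a stand-in operation that times out twice, then succeeds.
state = {"calls": 0}


def flaky_request():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("simulated slow connection")
    return "ok"


recorded_delays = []
print(with_retries(flaky_request, sleep=recorded_delays.append))  # ok
print(recorded_delays)  # [0.5, 1.0]
```

Passing `sleep` as a parameter keeps the demonstration instant; in real use the default `time.sleep` applies.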
uv run ra --help
uv run ra search --help
uv run ra browse --help
uv run ra serve --help
# Clone the repository
git clone https://github.com/AI-Riksarkivet/ra-mcp.git
# Install dependencies
uv sync && uv pip install -e .
# Run the main MCP server (stdio)
cd src/ra_mcp && uv run ra serve
# Run with SSE/HTTP transport on port 8000
cd src/ra_mcp && uv run ra serve --http
Use the MCP Inspector to test and debug the MCP server:
# Test the server interactively
npx @modelcontextprotocol/inspector uv run ra serve --http
The MCP Inspector provides a web interface to test server tools, resources, and prompts during development.
The project uses Dagger for containerized builds and publishing to Docker registries. Pre-built images are available on Docker Hub.
- Dagger CLI installed
- Docker registry credentials (for publishing)
Build locally:
dagger call build
Run tests:
dagger call test
Build and publish to Docker registry:
# Set environment variables
export DOCKER_PASSWORD="your-password"
# Build and publish
dagger call publish \
--docker-username="username" \
--docker-password=env:DOCKER_PASSWORD \
--image-repository="riksarkivet/ra-mcp" \
--tag="latest" \
--source=.
- `build`: Creates a production-ready container image using the Dockerfile
- `test`: Runs the test suite using pytest with coverage reporting
- `publish`: Builds and publishes container image to registry with authentication
- `build-local`: Build with custom environment variables and registry settings
The Dagger configuration is located in .dagger/main.go and provides a complete CI/CD pipeline for the project.
The MCP server provides access to transcribed historical documents from the Swedish National Archives (Riksarkivet) through three primary tools and two resources:
Search for keywords in transcribed materials with pagination support.
search_transcribed(
keyword="trolldom", # Search term
offset=0, # Pagination offset (required)
show_context=False, # Full page text (default: False for more results)
max_results=10, # Maximum results to return
max_hits_per_document=3 # Max hits per document
)
Browse specific pages of a document by reference code.
browse_document(
reference_code="SE/RA/310187/1", # Document reference
pages="7,8,52", # Page numbers or ranges
highlight_term="trolldom", # Optional keyword highlighting
max_pages=20 # Maximum pages to display
)
- riksarkivet://contents/table_of_contents - Complete guide index (Innehållsförteckning)
- riksarkivet://guide/{filename} - Specific guide sections (e.g., '01_Domstolar.md', '02_Fangelse.md')
- Search → `search_transcribed("trolldom", offset=0)` to find relevant documents
- Paginate → Continue with `offset=50, 100, 150...` for comprehensive discovery
- Browse → Use `browse_document()` to view specific pages with full transcriptions
- Start with `show_context=False` to maximize hit coverage
- Use pagination (increasing offsets) to find all matches
- Enable `show_context=True` only when you need full page text for specific hits
- Browse specific pages for detailed examination with keyword highlighting
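The search-then-paginate pattern can be sketched as a loop that advances the offset until a short page signals the end of the results. This is an illustration only: `collect_all_hits` and `fake_search` are hypothetical, and the stand-in returns a plain list, whereas the real `search_transcribed` tool is invoked through an MCP client and its return shape is not specified here.

```python
from typing import Callable, List


def collect_all_hits(search: Callable[..., List[str]], keyword: str,
                     page_size: int = 50, max_pages: int = 10) -> List[str]:
    """Call `search` with increasing offsets until a page comes back short."""
    hits: List[str] = []
    for page_index in range(max_pages):
        batch = search(keyword=keyword, offset=page_index * page_size,
                       max_results=page_size)
        hits.extend(batch)
        if len(batch) < page_size:
            break  # fewer results than requested: nothing left to fetch
    return hits


def fake_search(keyword: str, offset: int, max_results: int) -> List[str]:
    """Stand-in for the real tool: serves slices of 120 made-up hits."""
    corpus = [f"{keyword} hit {i}" for i in range(120)]
    return corpus[offset:offset + max_results]


print(len(collect_all_hits(fake_search, "trolldom")))  # 120
```

The `max_pages` cap is a design choice that keeps a very common keyword from triggering an unbounded number of requests.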

