Rust service that downloads selected IMDb non-commercial datasets, builds a search index with Tantivy, and exposes a simple HTTP API with Axum.
- Downloads the official IMDb non-commercial TSV archives for names, titles, crew, principals, episodes, akas, and ratings.
- Stores compressed and decompressed TSV files in a configurable data directory.
- Builds Tantivy indices for titles (primary, original, and international AKA titles) and names, enabling multilingual full-text search.
- Async downloader with resumable streaming and background decompression.
- Filterable JSON API for titles (type, year range, genres, rating, vote counts) with optional ranking by rating or votes; title search also matches crew/cast names and tolerates minor typos.
- Dedicated name search API backed by the IMDb `name.basics.tsv` dataset with range/attribute filters and typo tolerance.
- Rust 1.85+ (the project uses the Rust 2024 edition, which requires Rust 1.85 or newer, and async/await).
- Sufficient disk space (the full dataset is tens of gigabytes once decompressed).
- Network access to https://datasets.imdbws.com.
⚠️ The IMDb datasets are licensed for non-commercial use only. Review the IMDb dataset terms before using this project and ensure compliance.
Configuration is supplied via environment variables (an optional .env file is loaded on startup):
| Variable | Default | Description |
|---|---|---|
| `IMDB_DATA_DIR` | `./data` | Directory where compressed and decompressed TSV files are stored. |
| `IMDB_INDEX_DIR` | `<IMDB_DATA_DIR>/tantivy_index` | Location of the Tantivy index. |
| `IMDB_BIND_ADDR` | `127.0.0.1:3000` | Address for the Axum HTTP server. |
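The fallback rules in the table can be sketched in a few lines of standard-library Rust. This is an illustration only, not the project's actual loader: the `Config` struct and the `resolve_config` and `config_from_env` functions are hypothetical names, and `.env` loading is left out.

```rust
use std::path::PathBuf;

/// Resolved settings, mirroring the three variables in the table above.
/// (Hypothetical type; the project's own config struct may differ.)
#[derive(Debug, PartialEq)]
pub struct Config {
    pub data_dir: PathBuf,
    pub index_dir: PathBuf,
    pub bind_addr: String,
}

/// Builds a `Config` from any key lookup, so the defaulting logic can be
/// exercised without touching the real process environment.
pub fn resolve_config<F>(lookup: F) -> Config
where
    F: Fn(&str) -> Option<String>,
{
    let data_dir = lookup("IMDB_DATA_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("./data"));
    // The index defaults to a subdirectory of the (possibly overridden) data dir.
    let index_dir = lookup("IMDB_INDEX_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(|| data_dir.join("tantivy_index"));
    let bind_addr = lookup("IMDB_BIND_ADDR").unwrap_or_else(|| "127.0.0.1:3000".to_string());
    Config { data_dir, index_dir, bind_addr }
}

/// Convenience wrapper that reads the real environment.
pub fn config_from_env() -> Config {
    resolve_config(|key| std::env::var(key).ok())
}
```

Note that overriding only `IMDB_DATA_DIR` also moves the default index location, since the index path is derived from the data directory.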
```bash
# Download datasets, build the index, and start the API server
cargo run --release
```

The first launch will download and decompress all required archives and build the index. Subsequent runs reuse the existing data and index. Delete the index directory if you need to force a rebuild after updating datasets.
Simple health check endpoint returning "ok".
Searches titles (movies, TV shows, etc.). Supported query parameters:
- `query` (optional) – search expression (multilingual via primary, original, and AKA titles).
- `limit` (optional) – max results (1–50, default 10).
- `title_type` – filter by exact title type (e.g. `movie`, `tvSeries`).
- `start_year_min`, `start_year_max` – inclusive production year range filters.
- `end_year_min`, `end_year_max` – inclusive range for series end year (defaults mirror start year behaviour).
- `min_rating`, `max_rating` – inclusive average rating range (floating-point).
- `min_votes`, `max_votes` – inclusive vote-count range.
- `genres` – repeatable parameter to require specific genres (e.g. `genres=Action&genres=Sci-Fi`).
- `sort` – one of `relevance` (default), `rating_desc`, `rating_asc`, `votes_desc`, `votes_asc`.
- Defaults (can be overridden): `title_type=movie,tvSeries`, `start_year_min=1980`, `end_year_min=1980`.
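To show how the repeatable `genres` parameter combines with the other filters, here is a minimal client-side sketch that assembles a request URL. It assumes the endpoint path is `/titles/search` and that the server speaks plain HTTP on the default bind address. `encode_component` and `title_search_url` are hypothetical helpers, not part of the project.

```rust
/// Percent-encodes a query component, leaving RFC 3986 unreserved characters as-is.
fn encode_component(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(byte as char)
            }
            // Everything else (spaces, '&', non-ASCII bytes, ...) is escaped.
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Joins key/value pairs into a title search URL. Repeating a key such as
/// `genres` simply emits it more than once, in the order given.
pub fn title_search_url(base: &str, params: &[(&str, &str)]) -> String {
    let query: Vec<String> = params
        .iter()
        .map(|(k, v)| format!("{}={}", encode_component(k), encode_component(v)))
        .collect();
    if query.is_empty() {
        format!("{}/titles/search", base)
    } else {
        format!("{}/titles/search?{}", base, query.join("&"))
    }
}
```

For example, passing `("query", "the matrix")`, `("genres", "Action")`, `("genres", "Sci-Fi")`, and `("sort", "rating_desc")` with the base `http://127.0.0.1:3000` yields a URL whose query string is `query=the%20matrix&genres=Action&genres=Sci-Fi&sort=rating_desc`.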
Response example:
```json
{
  "results": [
    {
      "tconst": "tt0133093",
      "primary_title": "The Matrix",
      "original_title": "The Matrix",
      "title_type": "movie",
      "start_year": 1999,
      "end_year": 1999,
      "genres": ["Action", "Sci-Fi"],
      "average_rating": 8.7,
      "num_votes": 1900000,
      "score": 13.24534
    }
  ]
}
```

Searches people from `name.basics.tsv`.
Parameters:
- `query` (optional) – text to search across primary names and professions.
- `limit` (optional) – max results (1–50, default 10).
- `birth_year_min`, `birth_year_max` – inclusive birth year range filters.
- `primary_profession` – repeatable parameter to require specific professions (e.g. `primary_profession=actor`).
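The `limit` bounds and the inclusive birth-year range can be read as the following small sketch. Two behaviours here are assumptions rather than documented facts: that out-of-range `limit` values are clamped instead of rejected, and that a person with no recorded birth year is excluded as soon as either bound is set. Both function names are hypothetical.

```rust
/// Applies the documented `limit` contract: default 10, bounded to 1..=50.
/// Clamping (rather than returning an error) is an assumption of this sketch.
pub fn effective_limit(requested: Option<usize>) -> usize {
    requested.unwrap_or(10).clamp(1, 50)
}

/// Inclusive range check for an optional field such as `birth_year`.
/// Assumption: a record with no value fails any filter that sets a bound.
pub fn in_inclusive_range(value: Option<u16>, min: Option<u16>, max: Option<u16>) -> bool {
    match value {
        // Both ends are inclusive, so `min == max == value` matches.
        Some(v) => min.map_or(true, |lo| v >= lo) && max.map_or(true, |hi| v <= hi),
        None => min.is_none() && max.is_none(),
    }
}
```

Under these assumptions, `birth_year_min=1964&birth_year_max=1964` matches exactly the people born in 1964.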
Response example:
```json
{
  "results": [
    {
      "nconst": "nm0000206",
      "primary_name": "Keanu Reeves",
      "birth_year": 1964,
      "primary_profession": ["actor", "producer"],
      "known_for_titles": ["tt0121765", "tt0133093", "tt0106519", "tt1375666"],
      "score": 14.87334
    }
  ]
}
```

Fetches a single title by its IMDb identifier (e.g. `tt0133093`). Returns the same payload shape as `/titles/search`.
Fetches a single person by their IMDb identifier (e.g. nm0000206). Returns the same payload shape as /names/search.
- `cargo fmt` and `cargo clippy` keep the codebase consistent.
- `cargo check` ensures the project builds without downloading datasets.
- Integration with observability is via `tracing`; control verbosity using `RUST_LOG`, e.g. `RUST_LOG=debug`.
- The current index includes title basics and ratings. Additional datasets are downloaded and available for future enrichment (e.g., principals, crew, episodes).
- IMDb datasets are updated daily; consider scheduling periodic re-download + re-index if you need fresh data.
- Large downloads may take time; the downloader skips files already present on disk.
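The skip-if-present behaviour can be approximated as below. This is a rough sketch, not the project's downloader: it treats any existing file as complete, whereas a resumable downloader would presumably also inspect partial files. `ARCHIVES` and `missing_archives` are hypothetical names; the file names are those published on datasets.imdbws.com for the seven datasets listed earlier.

```rust
use std::path::{Path, PathBuf};

/// Archive names for names, titles, crew, principals, episodes, akas, and ratings.
pub const ARCHIVES: [&str; 7] = [
    "name.basics.tsv.gz",
    "title.basics.tsv.gz",
    "title.crew.tsv.gz",
    "title.principals.tsv.gz",
    "title.episode.tsv.gz",
    "title.akas.tsv.gz",
    "title.ratings.tsv.gz",
];

/// Returns the paths of archives that are not yet present in `data_dir`.
/// Only existence is checked; file size and integrity are ignored here.
pub fn missing_archives(data_dir: &Path) -> Vec<PathBuf> {
    ARCHIVES
        .iter()
        .map(|name| data_dir.join(name))
        .filter(|path| !path.is_file())
        .collect()
}
```

Only the paths returned by `missing_archives` would then need to be fetched. To refresh the data, delete the old archives first, and delete the index directory to force a rebuild.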