Add vl-convert-fontsource crate and explicit font install#245

Draft
jonmmease wants to merge 34 commits into main from jonmmease/fontsource-install

@jonmmease jonmmease commented Feb 24, 2026

Summary

Add support for downloading, caching, and registering fonts from the Fontsource catalog (which includes Google Fonts and other open-source font families). Fonts are fetched on demand, cached to disk with LRU eviction, and loaded into fontdb for use in Vega/Vega-Lite rendering.

Motivation

vl-convert renders charts to static images using server-side text layout, which requires the actual font files. Previously, users had to manually download font files and point vl-convert at the directory. This PR adds automatic font downloading from Fontsource so users can register any of ~1,800 open-source font families by name.

Usage

CLI

# Use Roboto when converting a Vega-Lite spec to PNG
vl-convert vl2png -i chart.vl.json -o chart.png --fontsource-font "Roboto"

# Multiple fonts
vl-convert vl2svg -i chart.vl.json -o chart.svg \
  --fontsource-font "Roboto" \
  --fontsource-font "Playfair Display"

Python

import vl_convert as vlc

# Register a font globally (persists for the session)
vlc.register_fontsource_font("Roboto")

# Register specific variants only (smaller download)
vlc.register_fontsource_font("Roboto", variants=[(400, "normal"), (700, "normal")])

# Configure disk cache size (default 512 MB)
vlc.configure(fontsource_cache_size_mb=256)

Rust

use vl_convert_rs::text::register_fontsource_font;

// Register globally (persists for the session)
register_fontsource_font("Roboto", None).await?;

Per-request fonts via VlOpts/VgOpts:

use vl_convert_rs::converter::{VlOpts, VgOpts, FontsourceFontRequest};
use vl_convert_fontsource::{FontStyle, VariantRequest};

// Vega-Lite conversion with specific font variants
let svg = converter.vegalite_to_svg(vl_spec, VlOpts {
    fontsource_fonts: Some(vec![
        FontsourceFontRequest {
            family: "Roboto".into(),
            variants: Some(vec![VariantRequest { weight: 400, style: FontStyle::Normal }]),
        },
    ]),
    ..Default::default()
}).await?;

// Vega conversion with all variants of a font
let png = converter.vega_to_png(vg_spec, VgOpts {
    fontsource_fonts: Some(vec![
        FontsourceFontRequest {
            family: "Playfair Display".into(),
            variants: None, // downloads all available variants
        },
    ]),
    ..Default::default()
}, None, None).await?;

Per-request fonts don't remain in memory after the conversion, so they are a better fit for long-running server processes. This is also the foundation that auto font detection will build on in the following PR.

Architecture

New crate: vl-convert-fontsource

Self-contained crate with no dependencies on other vl-convert workspace crates. The core library (API client, disk cache, variant resolution) has no fontdb dependency — fontdb integration is behind an optional fontdb feature flag. This design means the crate could be spun out of the vl-convert workspace as an independent Fontsource client if there is future interest.

Core functionality (no fontdb required):

  • Metadata fetch: Downloads font metadata from the Fontsource API (/v1/fonts/{id}), cached to {cache_dir}/metadata/
  • Blob download: Fetches TTF files with parallel downloads (default 8 concurrent), cached to {cache_dir}/blobs/
  • LRU eviction: File-based LRU using mtime, with cross-process file locking (fs4)
  • Variant filtering: Download all variants or a specific subset of (weight, style) pairs
  • Cache integrity: Magic-bytes validation on cached blobs with self-healing (corrupt files are deleted and re-fetched)
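
The cache-integrity bullet above can be illustrated with a small, std-only sketch. It is not the crate's `read_blob`; the function names and the in-memory `HashMap` standing in for the on-disk blob directory are assumptions made for illustration. The four signatures checked are the standard sfnt container tags for TrueType, legacy Apple TrueType, OpenType/CFF, and font collections.

```rust
use std::collections::HashMap;

/// Returns true when `bytes` begins with a known sfnt container tag.
/// (Hypothetical helper; the real check lives inside the crate's cache code.)
pub fn has_font_magic(bytes: &[u8]) -> bool {
    // A file shorter than four bytes is treated as truncated.
    let Some(tag) = bytes.get(..4) else {
        return false;
    };
    matches!(
        tag,
        b"\x00\x01\x00\x00" // TrueType outlines
            | b"true"       // legacy Apple TrueType
            | b"OTTO"       // OpenType with CFF outlines
            | b"ttcf"       // TrueType/OpenType collection
    )
}

/// Reads a cached blob, deleting it and reporting a miss if it is corrupt.
/// The `HashMap` is an in-memory stand-in for the blob directory.
pub fn read_validated(cache: &mut HashMap<String, Vec<u8>>, key: &str) -> Option<Vec<u8>> {
    let valid = cache.get(key).map(|bytes| has_font_magic(bytes));
    match valid {
        Some(true) => cache.get(key).cloned(),
        Some(false) => {
            // Self-heal: drop the bad entry so the caller re-downloads it.
            cache.remove(key);
            None
        }
        None => None,
    }
}
```

A caller would treat `None` as a cache miss and fall through to the network fetch, which is what makes a corrupt entry self-healing rather than a hard error.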

With fontdb feature:

  • FontsourceDatabaseExt trait for batch register/unregister on fontdb::Database
  • RegisteredFontBatch for tracking registered face IDs

Public API: FontsourceClient, ClientConfig, LoadedFontBatch, VariantRequest, FontStyle, FontsourceError. With fontdb feature: FontsourceDatabaseExt, RegisteredFontBatch.
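
To make the variant-filtering behavior concrete, here is a minimal sketch of how a (weight, style) request might be resolved against the variants a font offers. The `VariantRequest` and `FontStyle` types below are local stand-ins shaped like the ones in the Usage section, and `select_variants` and `SelectError` are hypothetical names; the crate's actual resolution logic may differ.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FontStyle {
    Normal,
    Italic,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VariantRequest {
    pub weight: u16,
    pub style: FontStyle,
}

impl fmt::Display for VariantRequest {
    /// Formats as "400-normal".
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let style = match self.style {
            FontStyle::Normal => "normal",
            FontStyle::Italic => "italic",
        };
        write!(f, "{}-{}", self.weight, style)
    }
}

#[derive(Debug, PartialEq)]
pub enum SelectError {
    /// An explicit but empty variant list was supplied.
    EmptyRequest,
    /// Some requested variants are not offered by the font.
    NotAvailable(Vec<VariantRequest>),
}

/// `None` selects every available variant; `Some(list)` selects a subset.
pub fn select_variants(
    available: &[VariantRequest],
    requested: Option<&[VariantRequest]>,
) -> Result<Vec<VariantRequest>, SelectError> {
    let Some(requested) = requested else {
        return Ok(available.to_vec());
    };
    if requested.is_empty() {
        return Err(SelectError::EmptyRequest);
    }
    let mut selected = Vec::new();
    let mut missing = Vec::new();
    for variant in requested {
        // Skip duplicates so each variant is downloaded at most once.
        if selected.contains(variant) || missing.contains(variant) {
            continue;
        }
        if available.contains(variant) {
            selected.push(*variant);
        } else {
            missing.push(*variant);
        }
    }
    if missing.is_empty() {
        Ok(selected)
    } else {
        Err(SelectError::NotAvailable(missing))
    }
}
```

Collecting every missing variant before failing lets the error message list all of them at once, for example via the `Display` impl.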

Integration: worker font overlay system

The converter worker pool gains a font overlay mechanism:

  • FontBaselineSnapshot: Shared RwLock snapshot of the resolved fontdb::Database, cloned by workers at startup.
  • Per-request overlays: Vec<LoadedFontBatch> flows through the VlConvertCommand enum. Workers call register_fontsource_batch to temporarily load request-scoped fonts, then unregister_fontsource_batch after rendering. Fonts specified via the fontsource_fonts field of VlOpts/VgOpts are therefore transient.
  • Persistent registration: register_fontsource_font() adds fonts to the global FONT_CONFIG and bumps the baseline version, making them available to all subsequent requests.
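
The overlay flow described in the list above can be sketched roughly as follows. This is a simplified model, not the converter's code: `FontDb` is a hypothetical stand-in that stores family names instead of font faces, and `clone_baseline` and `with_overlay` are illustrative names for the snapshot clone and the register/render/unregister sequence.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for a font database; stores family names only.
#[derive(Debug, Clone, Default)]
pub struct FontDb {
    next_id: u32,
    faces: Vec<(u32, String)>,
}

impl FontDb {
    /// Registers each family and returns the face IDs that were created.
    pub fn register_batch(&mut self, families: &[&str]) -> Vec<u32> {
        families
            .iter()
            .map(|family| {
                let id = self.next_id;
                self.next_id += 1;
                self.faces.push((id, family.to_string()));
                id
            })
            .collect()
    }

    /// Removes the given face IDs. Unknown IDs are ignored, so calling
    /// this twice with the same IDs is harmless.
    pub fn unregister_batch(&mut self, ids: &[u32]) {
        self.faces.retain(|(id, _)| !ids.contains(id));
    }

    pub fn family_names(&self) -> Vec<String> {
        self.faces.iter().map(|(_, name)| name.clone()).collect()
    }
}

/// Copies the shared baseline so a worker can mutate its own database.
pub fn clone_baseline(baseline: &RwLock<FontDb>) -> FontDb {
    baseline.read().expect("baseline lock poisoned").clone()
}

/// Loads request-scoped fonts, runs `render`, then removes them again.
pub fn with_overlay<R>(
    db: &mut FontDb,
    overlay: &[&str],
    render: impl FnOnce(&FontDb) -> R,
) -> R {
    let ids = db.register_batch(overlay);
    let output = render(&*db);
    db.unregister_batch(&ids);
    output
}
```

Because the overlay is removed as soon as rendering returns, the worker's database goes back to the baseline set of fonts, which is the property that keeps memory flat in a long-running process.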

Other changes in converter.rs

  • New VlConvertCommand variants (VgToJpeg, VgToPdf, VlToJpeg, VlToPdf, SvgToPng, SvgToJpeg, SvgToPdf) route all conversion paths through the worker command channel. This was necessary so per-request fontsource fonts can be applied consistently before any render.
  • SVG-to-raster/PDF conversions that previously used free functions now go through the worker pool.

Environment variables

| Variable | Effect |
| --- | --- |
| VL_CONVERT_FONT_CACHE_DIR | Override cache directory (set to "none" to disable disk caching) |
| VL_CONVERT_FONTSOURCE_API_URL | Override API base URL (for mirrors or proxies) |

Default cache location: {dirs::cache_dir()}/vl-convert/fontsource/ (e.g., ~/Library/Caches/vl-convert/fontsource/ on macOS, ~/.cache/vl-convert/fontsource/ on Linux)
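
As a rough illustration of the cache-directory rules, the sketch below resolves the effective directory from the environment value and the platform cache directory. `resolve_cache_dir` and `CacheDirError` are hypothetical names. The sketch also assumes that an override path is used verbatim and that "none" is matched exactly, and it follows the commit notes in rejecting relative paths and falling back to no caching when no platform cache directory exists.

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq)]
pub enum CacheDirError {
    /// The override path was not absolute.
    Relative(PathBuf),
}

/// `env_value` is the content of VL_CONVERT_FONT_CACHE_DIR, if set.
/// `platform_cache` is what a call such as `dirs::cache_dir()` would return.
/// `Ok(None)` means persistent caching is disabled.
pub fn resolve_cache_dir(
    env_value: Option<&str>,
    platform_cache: Option<PathBuf>,
) -> Result<Option<PathBuf>, CacheDirError> {
    match env_value {
        // Explicit opt-out of disk caching.
        Some("none") => Ok(None),
        // Explicit override; assumed here to be used as-is.
        Some(value) => {
            let path = PathBuf::from(value);
            if path.is_absolute() {
                Ok(Some(path))
            } else {
                Err(CacheDirError::Relative(path))
            }
        }
        // Default location, or no caching when the platform has none.
        None => Ok(platform_cache.map(|base| base.join("vl-convert").join("fontsource"))),
    }
}
```

In real use the first argument would come from something like `std::env::var("VL_CONVERT_FONT_CACHE_DIR").ok().as_deref()`; keeping the function pure makes the three cases easy to test without touching the process environment.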

Testing

The integration test suite in vl-convert-fontsource/tests/load_fontsource.rs uses a custom TestServer that tracks per-endpoint hit counts and the maximum number of in-flight concurrent requests.

Coverage includes:

  • Cache behavior: cache hits skip network, corrupt metadata triggers re-fetch, corrupt blobs self-heal via magic-bytes check, LRU eviction keeps current request's fonts
  • Download correctness: variant filtering (empty/unavailable errors), all-variants-when-none-specified
  • Concurrency: parallel download bounding, in-process download deduplication
  • Lifecycle: register/unregister via FontsourceDatabaseExt, idempotent unregister, async/blocking parity
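
The "LRU eviction keeps current request's fonts" case can be pictured with a small planning function. This is a sketch under assumptions: `BlobEntry` and `plan_eviction` are hypothetical names, and the entries are passed in as plain values rather than read from the blob directory. It orders candidates by mtime, as the description says the cache does, and never selects a protected key.

```rust
use std::collections::HashSet;
use std::time::SystemTime;

#[derive(Debug, Clone)]
pub struct BlobEntry {
    pub key: String,
    pub size_bytes: u64,
    pub mtime: SystemTime,
}

/// Returns the keys to delete, oldest first, until the cache fits in
/// `max_bytes`. Keys in `protected` (the current request's blobs) are
/// never evicted, even if that leaves the cache over budget.
pub fn plan_eviction(
    entries: &[BlobEntry],
    max_bytes: u64,
    protected: &HashSet<String>,
) -> Vec<String> {
    let mut total: u64 = entries.iter().map(|entry| entry.size_bytes).sum();
    let mut candidates: Vec<&BlobEntry> = entries
        .iter()
        .filter(|entry| !protected.contains(&entry.key))
        .collect();
    // Least recently used first, approximated by modification time.
    candidates.sort_by_key(|entry| entry.mtime);

    let mut evicted = Vec::new();
    for entry in candidates {
        if total <= max_bytes {
            break;
        }
        total -= entry.size_bytes;
        evicted.push(entry.key.clone());
    }
    evicted
}
```

Separating the plan from the file deletion keeps the policy testable in memory; a real implementation would still need the cross-process lock around the scan and the deletes.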

Existing vl-convert-rs end-to-end tests continue to pass.

Known Limitations

  • No metadata TTL: Cached font metadata is served until it is manually deleted. This is acceptable in practice because fonts rarely change on Fontsource.
  • TTF downloads: Downloads TTF files (larger than WOFF2) because fontdb requires TTF/OTF. Fontsource provides TTF for all fonts, so no fonts are excluded by this.
  • CLI downloads all variants: --fontsource-font downloads all available variants. The Python API supports variant selection via the variants parameter.

@jonmmease jonmmease marked this pull request as draft February 24, 2026 22:20
@jonmmease jonmmease changed the title from "Add Fontsource auto-install and HTML font loading" to "feat: Loading remote fonts using Fontsource and optionally auto-install fonts during rendering" Feb 24, 2026
@jonmmease jonmmease force-pushed the jonmmease/fontsource-install branch from 8a3d069 to 060fda9 February 25, 2026 13:10
@jonmmease jonmmease changed the title from "feat: Loading remote fonts using Fontsource and optionally auto-install fonts during rendering" to "Add vl-convert-fontsource crate and explicit font install" Feb 25, 2026
jonmmease and others added 10 commits February 28, 2026 14:26
New standalone crate for downloading, caching, and resolving font files
from the Fontsource catalog (which includes Google Fonts). Provides
disk-based LRU cache, variant filtering, and batch font loading into
fontdb::Database via a source-batch registration API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add from_parts(), clone_fontdb(), and hinting_enabled() to
ResolvedFontConfig so workers can construct resolved configs from
existing fontdb databases without rebuilding from FontConfig.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce FontBaselineSnapshot (RwLock) so workers clone a pre-resolved
fontdb instead of each re-resolving FontConfig on every version bump.
Add install_font() for Fontsource downloads, configure_font_cache() for
disk cache sizing, and re-export public types from vl-convert-rs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add WorkerFontState and FontRequest types. Workers now clone a shared
FontBaselineSnapshot at startup and apply per-request font overlays
for request-scoped fonts. SVG/PNG/PDF render commands carry font
sources through the command channel so the worker can install them
before rendering.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Global CLI flags to download Fontsource fonts before conversion.
--install-font accepts a family name (repeatable), --install-font-variants
accepts comma-separated "weight:style" pairs to filter variants.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Expose install_font(family, variants) in both sync and asyncio APIs.
Add font_cache_size_mb parameter to configure_converter(). Update
type stubs to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace path-based font cache with blob cache (metadata + blob bytes),
rename public API to use explicit "fontsource" naming throughout
(register_fontsource_font, FontsourceFontRequest, --fontsource-font),
simplify CLI to single --fontsource-font flag, and wire fontsource_fonts
field through VgOpts/VlOpts for per-request font loading.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ories

Replace separate metadata_cache_dir and blob_cache_dir fields with a
single cache_dir: Option<PathBuf> and metadata_dir()/blob_dir() helpers
that derive metadata/ and blobs/ subdirectories. Simplifies config and
adds VL_CONVERT_FONT_CACHE_DIR env var support (path override or "none"
to disable persistent caching).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jonmmease jonmmease force-pushed the jonmmease/fontsource-install branch from 903179c to 116d812 February 28, 2026 23:47
jonmmease and others added 14 commits March 1, 2026 16:02
…n fix

- Add with_font_overlay! macro in converter.rs to eliminate 12 copy-pasted
  apply/clear overlay blocks in handle_command match arms
- Replace hand-rolled retry loops with backon crate's ExponentialBuilder
  for proper exponential backoff on retryable HTTP errors
- Extract is_ttf_file() helper in cache.rs and apply to evict_blob_lru_until_size
  so eviction only considers .ttf files (matching calculate_blob_cache_size_bytes)
- Fix pre-existing CI failures: duplicate ..Default::default(), missing
  fontsource_fonts field in test VlOpts, and formatting in lib.rs/tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The clone is required because the async closure passed to
buffer_unordered needs owned ResolvedTtfFile values. Without it,
Rust 1.93 raises "implementation of FnOnce is not general enough"
due to higher-ranked lifetime requirements. Suppress the clippy
redundant_iter_cloned lint for this call.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ycle

- Blob cache keys are now SHA-256 hashes of the download URL instead of
  human-readable {font_id}--{subset}-{weight}-{style}.ttf names, making
  collisions impossible and removing the filename field from ResolvedTtfFile.
- Blob files use .blob extension instead of .ttf.
- cache_dir is now Option<PathBuf>; set VL_CONVERT_FONT_CACHE_DIR=none to
  disable persistent caching entirely. Fall back to no caching (rather than
  temp_dir) when dirs::cache_dir() is unavailable.
- Self-heal corrupt metadata (bad JSON) and blob entries (directory at blob
  path) instead of silently returning stale data.
- DownloadGate tracks active_users so gates are pruned from the DashMap when
  the last consumer releases, preventing unbounded map growth.
- Async ensure_blobs preserves insertion order via index sorting.
- New tests for gate lifecycle, corrupt entry self-healing, and hash-agnostic
  eviction verification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…omments

- Extract triple-nested HashMap into pub type VariantMap.
- VariantsNotAvailable error now holds Vec<VariantRequest> instead of Vec<String>.
- Add Display impl for VariantRequest (formats as "400-normal").
- Remove banner-style section comments in cache.rs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
read_blob now validates TTF/OTF/TTC magic bytes before returning cached
data. Corrupt or truncated files are deleted and treated as cache misses,
triggering a re-download.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…eature flag

Isolate fontdb dependency behind an optional `fontdb` feature flag so the
core Fontsource client (API, caching, variant resolution) compiles without
fontdb. The fontdb integration (RegisteredFontBatch, FontsourceDatabaseExt)
lives in a feature-gated fontdb_ext module. LoadedFontBatch now holds raw
Vec<Arc<Vec<u8>>> bytes; consumers construct fontdb::Source at the call site.

Also renames FontsourceFontdbError to FontsourceError.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
jonmmease and others added 10 commits March 2, 2026 11:42
…sion trait

Replace Vec<Arc<Vec<u8>>> with Vec<LoadedFontBatch> in VlConvertCommand
variants so the overlay uses register_fontsource_batch/unregister_fontsource_batch
directly. This eliminates all manual fontdb::Source::Binary construction from
converter.rs — fontdb_ext.rs is now the single fontdb integration seam.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract shared helpers (validate_load_request, try_read_cached_metadata,
cache_metadata, parse_metadata_response, try_read_cached_blob, cache_blob)
to eliminate duplication between async and blocking code paths. Inline
prepare_load_async/prepare_load_blocking into load/load_blocking and
delete the redundant PreparedLoad struct. Add brief rustdoc comments to
all methods and remove block separator comments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…scope

Replace manual work-stealing thread pool (shared VecDeque queue,
Arc<Mutex> results, AtomicUsize counter, first-error propagation) with
files.chunks() partitioning. Threads run concurrently on pre-assigned
chunks, results stay in order naturally, errors propagate via ?.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
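
For readers unfamiliar with the chunk-partitioning approach named in the commit above, a minimal std-only sketch might look like this. `fetch_all_chunked` is a hypothetical name and the generic `fetch` closure stands in for the blocking blob download; the crate's actual function and error types are not shown in this PR description.

```rust
use std::thread;

/// Splits `items` into at most `max_threads` contiguous chunks, processes
/// each chunk on its own scoped thread, and returns results in input order.
pub fn fetch_all_chunked<T, R, E, F>(
    items: &[T],
    max_threads: usize,
    fetch: F,
) -> Result<Vec<R>, E>
where
    T: Sync,
    R: Send,
    E: Send,
    F: Fn(&T) -> Result<R, E> + Sync,
{
    if items.is_empty() {
        return Ok(Vec::new());
    }
    let threads = max_threads.max(1);
    // Ceiling division so every item lands in some chunk.
    let chunk_len = (items.len() + threads - 1) / threads;
    let fetch = &fetch;

    let per_chunk: Vec<Result<Vec<R>, E>> = thread::scope(|scope| {
        // Spawn every chunk first so the threads run concurrently.
        let handles: Vec<_> = items
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(fetch).collect::<Result<Vec<R>, E>>())
            })
            .collect();
        // Join in spawn order, which is also input order.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    });

    let mut out = Vec::with_capacity(items.len());
    for chunk_result in per_chunk {
        // The first failing chunk (in input order) aborts the whole call.
        out.extend(chunk_result?);
    }
    Ok(out)
}
```

The trade-off against a work-stealing queue is that a slow chunk cannot hand work to an idle thread, in exchange for much simpler ordering and error handling.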
Run cargo fmt and bundle-licenses to fix CI failures after the
fontsource-fontdb -> fontsource crate rename.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When blob_dir is None, the gate serializes downloads without benefit
since waiters would just re-download anyway. Skip directly to the
download; max_parallel_downloads still applies at the caller level.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add FontsourceError::RelativeCacheDir and return it from
FontsourceClient::new when cache_dir is not absolute, matching
the convention of XDG_CACHE_HOME, CARGO_HOME, etc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document async vs blocking client initialization, dedupe_variants
intent, Drop impl rationale, and ClientConfig field descriptions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verifies that with cache_dir: None, every load re-fetches both
metadata and blobs from the network.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove unused ext.rs (duplicate of fontdb_ext.rs), add comment
explaining tokio::sync::Mutex in blocking context, and apply fmt.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>