typed-arrow provides a strongly typed, fully compile-time way to declare Arrow schemas in Rust. It maps Rust types directly to arrow-rs typed builders/arrays and arrow_schema::DataType — without any runtime DataType switching — enabling zero runtime cost, monomorphized column construction and ergonomic ORM-like APIs.
- Performance: monomorphized builders/arrays with zero dynamic dispatch; avoids runtime
DataTypematching. - Safety: column types, names, and nullability live in the type system; mismatches fail at compile time.
- Interop: uses
arrow-array/arrow-schematypes directly; no bespoke runtime layer to learn.
use typed_arrow::{prelude::*, schema::SchemaMeta};
use typed_arrow::{Dictionary, TimestampTz, Millisecond, Utc, List};
#[derive(typed_arrow::Record)]
struct Address { city: String, zip: Option<i32> }
#[derive(typed_arrow::Record)]
struct Person {
id: i64,
address: Option<Address>,
tags: Option<List<Option<i32>>>, // List column with nullable items
code: Option<Dictionary<i32, String>>, // Dictionary<i32, Utf8>
joined: TimestampTz<Millisecond, Utc>, // Timestamp(ms) with timezone (UTC)
}
fn main() {
// Build from owned rows
let rows = vec![
Person {
id: 1,
address: Some(Address { city: "NYC".into(), zip: None }),
tags: Some(List::new(vec![Some(1), None, Some(3)])),
code: Some(Dictionary::new("gold".into())),
joined: TimestampTz::<Millisecond, Utc>::new(1_700_000_000_000),
},
Person {
id: 2,
address: None,
tags: None,
code: None,
joined: TimestampTz::<Millisecond, Utc>::new(1_700_000_100_000),
},
];
let mut b = <Person as BuildRows>::new_builders(rows.len());
b.append_rows(rows);
let arrays = b.finish();
// Compile-time schema + RecordBatch
let batch = arrays.into_record_batch();
assert_eq!(batch.schema().fields().len(), <Person as Record>::LEN);
println!("rows={}, field0={}", batch.num_rows(), batch.schema().field(0).name());
}Add to your Cargo.toml (derives enabled by default):
[dependencies]
typed-arrow = { version = "0.x" }
# Enable zero-copy views for reading RecordBatch data
typed-arrow = { version = "0.x", features = ["views"] }When working in this repository/workspace:
[dependencies]
typed-arrow = { path = "." }
# With views feature
typed-arrow = { path = ".", features = ["views"] }Run the included examples to see end-to-end usage:
01_primitives— deriveRecord, inspectDataType, build primitives02_lists—List<T>andList<Option<T>>03_dictionary—Dictionary<K, String>04_timestamps—Timestamp<U>units04b_timestamps_tz—TimestampTz<U, Z>withUtcand custom markers05_structs— nested structs →StructArray06_rows_flat— row-based building for flat records07_rows_nested— row-based building with nested struct fields08_record_batch— compile-time schema +RecordBatch09_duration_interval— Duration and Interval types10_union— Dense Union as a Record column (with attributes)11_map— Map (incl.Option<V>values) + as a Record column12_ext_hooks— Extend#[derive(Record)]with visitor injection and macro callbacks13_record_batch_views— Zero-copy views overRecordBatchrows (requiresviewsfeature)
Run:
cargo run --example 08_record_batchRecord: implemented by the derive macro for structs with named fields.ColAt<I>: per-column associated itemsRust,ColumnBuilder,ColumnArray,NULLABLE,NAME, anddata_type().ArrowBinding: compile-time mapping from a Rust value type to its Arrow builder, array, andDataType.BuildRows: derive generates<Type>Buildersand<Type>Arrayswithappend_row(s)andfinish.SchemaMeta: derive providesfields()andschema(); arrays structs provideinto_record_batch().AppendStructandStructMeta: enable nested struct fields andStructArraybuilding.
When the views feature is enabled, typed-arrow automatically generates zero-copy view types for reading RecordBatch data without cloning or allocation. For each #[derive(Record)] struct, the macro generates:
{Name}View<'a>— A struct with borrowed references to row data{Name}Views<'a>— An iterator yieldingResult<{Name}View<'a>, ViewAccessError>impl TryFrom<{Name}View<'_>> for {Name}for each record type withError = ViewAccessError, making conversion composable and allowing proper error propagation when accessing nested structures.
use typed_arrow::prelude::*;
#[derive(typed_arrow::Record)]
struct Product {
id: i64,
name: String,
price: f64,
}
// Build a RecordBatch
let rows = vec![
Product { id: 1, name: "Widget".into(), price: 9.99 },
Product { id: 2, name: "Gadget".into(), price: 19.99 },
];
let mut b = <Product as BuildRows>::new_builders(rows.len());
b.append_rows(rows);
let batch = b.finish().into_record_batch();
// Read with zero-copy views
let views = batch.iter_views::<Product>()?;
for view in views.try_flatten()? {
// view.name is &str, view.id and view.price are copied primitives
println!("{}: ${}", view.name, view.price);
}Views provide zero-copy access to RecordBatch data, but sometimes you need to store data beyond the batch's lifetime. Use .try_into() to convert views into owned records:
let views = batch.iter_views::<Product>()?;
let mut owned_products = Vec::new();
for view in views.try_flatten()? {
// view.name is &str (borrowed)
// view.id and view.price are i64/f64 (copied)
if view.price > 100.0 {
// Convert to owned using .try_into()?
let owned: Product = view.try_into()?;
owned_products.push(owned); // Can store beyond batch lifetime
}
}- Schema-level: annotate with
#[schema_metadata(k = "owner", v = "data")]. - Field-level: annotate with
#[metadata(k = "pii", v = "email")]. - You can repeat attributes to add multiple pairs; later duplicates win.
- Struct fields: struct-typed fields map to Arrow
Structcolumns by default. Make the parent field nullable withOption<Nested>; child nullability is independent. - Lists:
List<T>(items non-null) andList<Option<T>>(items nullable). UseOption<List<_>>for list-level nulls. - LargeList:
LargeList<T>andLargeList<Option<T>>for 64-bit offsets; wrap withOption<_>for column nulls. - FixedSizeList:
FixedSizeList<T, N>(items non-null) andFixedSizeListNullable<T, N>(items nullable). Wrap withOption<_>for list-level nulls. - Map:
Map<K, V, const SORTED: bool = false>where keys are non-null; useMap<K, Option<V>>to allow nullable values. Column nullability viaOption<Map<...>>.SORTEDsetskeys_sortedin the ArrowDataType. - OrderedMap:
OrderedMap<K, V>usesBTreeMap<K, V>and declareskeys_sorted = true. - Dictionary:
Dictionary<K, V>with integral keysK ∈ { i8, i16, i32, i64, u8, u16, u32, u64 }and values:String/LargeUtf8(Utf8/LargeUtf8)Vec<u8>/LargeBinary(Binary/LargeBinary)[u8; N](FixedSizeBinary)- primitives
i*,u*,f32,f64Column nullability viaOption<Dictionary<..>>.
- Timestamps:
Timestamp<U>(unit-only) andTimestampTz<U, Z>(unit + timezone). Units:Second,Millisecond,Microsecond,Nanosecond. UseUtcor define your ownZ: TimeZoneSpec. - Decimals:
Decimal128<P, S>andDecimal256<P, S>(precisionP, scaleSas const generics). - Unions:
#[derive(Union)]for enums with#[union(mode = "dense"|"sparse")], per-variant#[union(tag = N)],#[union(field = "name")], and optional null carrier#[union(null)]or container-levelnull_variant = "Var".
Supported (arrow-rs v56):
- Primitives: Int8/16/32/64, UInt8/16/32/64, Float16/32/64, Boolean
- Strings/Binary: Utf8, LargeUtf8, Binary, LargeBinary, FixedSizeBinary (via
[u8; N]) - Temporal: Timestamp (with/without TZ; s/ms/us/ns), Date32/64, Time32(s/ms), Time64(us/ns), Duration(s/ms/us/ns), Interval(YearMonth/DayTime/MonthDayNano)
- Decimal: Decimal128, Decimal256 (const generic precision/scale)
- Nested:
- List (including nullable items), LargeList, FixedSizeList (nullable/non-null items)
- Struct,
- Map (Vec<(K,V)>; use
Option<V>for nullable values), OrderedMap (BTreeMap<K,V>) withkeys_sorted = true - Union: Dense and Sparse (via
#[derive(Union)]on enums) - Dictionary: keys = all integral types; values = Utf8 (String), LargeUtf8, Binary (Vec), LargeBinary, FixedSizeBinary (
[u8; N]), primitives (i*, u*, f32, f64)
Missing:
- BinaryView, Utf8View
- Utf8View
- ListView, LargeListView
- RunEndEncoded
- Derive extension hooks allow user-level customization without changing the core derive:
- Inject compile-time visitors:
#[record(visit(MyVisitor))] - Call your macros per field/record:
#[record(field_macro = my_ext::per_field, record_macro = my_ext::per_record)] - Tag fields/records with free-form markers:
#[record(ext(key))]
- Inject compile-time visitors:
- See
docs/extensibility.mdand the runnable exampleexamples/12_ext_hooks.rs.