Write and execute superfast rust inside your Python code! Here's how...
Write a type-annotated function or method definition in python, add the `rust` decorator and put the rust
implementation in the docstring:
```python
from xenoform_rs import rust


@rust(py=False)
def vector_sum(v: list[int]) -> int:  # ty: ignore[empty-body]
    """
    Ok(v.iter().sum())
    """
```

When Python loads this file, all functions using this decorator have their function signatures translated to rust and the source for an extension module is generated. The first time any function is called, the module is built and the attribute corresponding to the (empty) Python function is replaced with the rust implementation in the module.
Subsequent calls to the function incur minimal overhead, as the attribute corresponding to the (dummy) python function now points to the rust implementation.
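
The replace-on-first-call behaviour can be illustrated in pure Python. This is only a minimal sketch, not how xenoform_rs is actually implemented: `lazy_native` and `build_impl` are hypothetical names, and the "build" step is simulated with an ordinary Python function instead of a compiled extension module.

```python
from typing import Any, Callable


def build_impl(name: str) -> Callable[..., Any]:
    """Stand-in for building and importing the extension module (hypothetical)."""
    implementations = {"vector_sum": lambda v: sum(v)}
    return implementations[name]


def lazy_native(func: Callable[..., Any]) -> Callable[..., Any]:
    """Swap the decorated dummy function for the built one on first call."""

    def stub(*args: Any, **kwargs: Any) -> Any:
        impl = build_impl(func.__name__)
        # Rebind the module-level name so later calls bypass this stub entirely
        func.__globals__[func.__name__] = impl
        return impl(*args, **kwargs)

    return stub


@lazy_native
def vector_sum(v: list[int]) -> int:  # the dummy body is never executed
    ...
```

In this sketch the first call to `vector_sum` goes through `stub`, which triggers the simulated build; every later call resolves the module-level name straight to the built implementation, so the stub adds no further overhead.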
Each module stores a hash of its source code (and Cargo.toml). Modules are checked on load and automatically rebuilt when any changes are detected.
By default, the binaries, source code and build logs for the compiled modules can be found in the ext subfolder (this location can be changed).
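
The change detection described above could look something like the following. This is a sketch under stated assumptions: the choice of SHA-256 and the names `source_digest` and `needs_rebuild` are illustrative, and the source does not say which hash or storage format xenoform_rs uses.

```python
import hashlib
from typing import Optional


def source_digest(rust_source: str, cargo_toml: str) -> str:
    """Hash the generated rust source together with the Cargo.toml contents."""
    h = hashlib.sha256()  # assumption: any stable hash would serve here
    h.update(rust_source.encode())
    h.update(b"\0")  # separator so the two inputs cannot run together
    h.update(cargo_toml.encode())
    return h.hexdigest()


def needs_rebuild(stored_digest: Optional[str], rust_source: str, cargo_toml: str) -> bool:
    """True when there is no previous build or either input has changed."""
    return stored_digest != source_digest(rust_source, cargo_toml)
```

Hashing the manifest alongside the source means that changing a dependency version or the edition triggers a rebuild even when the function bodies are untouched.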
It's a work-in-progress and will likely never be as functionally complete as its C++ sister, xenoform:

- `numpy` array support
- positional and keyword arguments and markers
- `*args`/`**kwargs`
- type overrides via `Annotated`
- callable types (partial). See below.
- free-threaded execution
- call in-crate modules
- link to external libs
- auto-vectorisation (not supported)
- compound types (rust doesn't support this, use `PyAny`)

Notes:

- callable types:
  - typed functions/closures are not supported.
  - default type mapping (`Callable` -> `Bound<'py, PyCFunction>`) works for return values but doesn't allow for python functions/lambdas to be passed into rust. In this case override to `Bound<'py, PyAny>` (the `PyAnyMethods` trait provides the `call...` methods).
- complex: 128 bit support only (i.e. not `np.complex64`)
- if additional modules are specified, the files are copied into the crate

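
To show how an `Annotated` override can take precedence over a default type mapping, here is a small pure-Python sketch. It is not the xenoform_rs implementation: `DEFAULT_RUST_TYPES`, `rust_type` and `rust_signature` are hypothetical names, and the default mapping shown (e.g. `int` to `i64`) is an assumption made for illustration only.

```python
from collections.abc import Callable
from typing import Annotated, get_args, get_origin, get_type_hints

# Illustrative defaults only; the real mapping in xenoform_rs may differ
DEFAULT_RUST_TYPES = {
    int: "i64",
    float: "f64",
    bool: "bool",
    str: "String",
    Callable: "Bound<'py, PyCFunction>",
}


def rust_type(annotation: object) -> str:
    """Return the rust type for an annotation, preferring an Annotated override."""
    if get_origin(annotation) is Annotated:
        base, *metadata = get_args(annotation)
        for item in metadata:
            if isinstance(item, str):
                return item  # an explicit string override wins
        annotation = base
    return DEFAULT_RUST_TYPES[annotation]


def rust_signature(func) -> dict[str, str]:
    """Map each parameter (and 'return') of func to a rust type string."""
    hints = get_type_hints(func, include_extras=True)
    return {name: rust_type(t) for name, t in hints.items()}


# A python callable passed into rust: override the default to Bound<'py, PyAny>
def apply(f: Annotated[Callable, "Bound<'py, PyAny>"], x: int) -> int: ...
```

Here `rust_signature(apply)` maps `f` to the overridden `Bound<'py, PyAny>`, while `x` and the return value fall back to the default table.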
Simply decorate your rust-implemented functions with the rust decorator factory - it handles all the configuration and compilation. It can be customised with these optional parameters:
| kwarg | type(=default) | description |
|---|---|---|
| `py` | `bool = True` | Pass the python context as the first argument. Necessary when (e.g.) creating python objects. |
| `dependencies` | `list[str] \| None = None` | Rust package dependencies. The `rust_dependency` convenience function can be used to specify dependency parameters, e.g. `dependencies=[rust_dependency("numpy", version="0.28")]`. |
| `imports` | `list[str] \| None = None` | Additional imports, e.g. `imports=["numpy::{PyArray2, PyArrayMethods, PyReadonlyArray2}"]` |
| `modules` | `list[Path \| str] \| None = None` | Sources for additional modules |
| `edition` | `str = "2024"` | The rust edition. |
| `profile` | `dict[str, str] \| None = None` | Overrides to (release mode) profile, e.g. optimisation level, strip symbols, etc. |
| `help` | `str \| None = None` | Docstring for the function |
| `verbose` | `bool = False` | Enable debug logging |

See the (C++) xenoform version for context.
Requires the "examples" optional dependency (and rust, of course):

```sh
uv sync --extra examples
```

Rust vs python comparison of a non-vectorisable operation on a `pd.Series`:

```python
def calc_balances_py(data: pd.Series, rate: float) -> pd.Series:
    """Cannot vectorise, since each value is dependent on the previous value"""
    result = pd.Series(index=data.index)
    result_a = result.to_numpy()
    current_value = 0.0
    for i, value in data.items():
        current_value = (current_value + value) * (1 - rate)
        result_a[i] = current_value  # ty:ignore[invalid-assignment]
    return result


@rust(
    dependencies=[rust_dependency("numpy", version="0.28")],
    imports=[
        "numpy::{PyArray1, PyArrayMethods}",
        "pyo3::types::{PyDict, PyAnyMethods}",
    ],
    module_name="loop_rs",  # override as "loop" is a rust keyword
)
def calc_balances_rust(
    data: Annotated[pd.Series, "Bound<'py, PyAny>"], rate: float
) -> Annotated[pd.Series, "Bound<'py, PyAny>"]:  # ty: ignore[empty-body]
    """
    // extract numpy arrays from the series. Note input is i64, output is f64
    let data_obj = data.call_method0("to_numpy")?;
    let data_np: &Bound<'py, PyArray1<i64>> = data_obj.cast()?;
    let n = data_np.len()? as usize;

    // use the pattern from the numpy documentation
    let result_np = unsafe {
        let r = PyArray1::<f64>::zeros(py, [n], false);
        let mut current_value = 0.0;
        for i in 0..n {
            current_value = (current_value + *data_np.uget([i]) as f64) * (1.0 - rate);
            *r.uget_mut([i]) = current_value;
        }
        r
    };

    // Construct a pd.Series with the same index as the input
    let pd = py.import("pandas")?;
    let kwargs = PyDict::new(py);
    kwargs.set_item("index", data.getattr("index")?)?;
    let result = pd.getattr("Series")?.call((result_np,), Some(&kwargs))?;
    Ok(result)
    """
```

| N | py (ms) | rust (ms) | speedup |
|---|---|---|---|
| 1000 | 0.6 | 1.9 | -68% |
| 10000 | 1.5 | 0.1 | 1410% |
| 100000 | 28.8 | 1.0 | 2775% |
| 1000000 | 136.4 | 3.0 | 4496% |
| 10000000 | 1248.0 | 25.5 | 4791% |
For reference, this is at least as fast as the equivalent xenoform implementation.
Full code is in examples/loop.py.
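
The speedup columns in the benchmark tables appear consistent with `(py / rust - 1) * 100`, so a negative value means the rust version was slower (for small N the call overhead dominates). This formula is inferred from the published numbers, not stated by the project, and because the timings shown are rounded, recomputing from them only approximately reproduces the table.

```python
def speedup_percent(py_ms: float, rust_ms: float) -> float:
    """Relative speedup of rust over python: 0 means equal, negative means slower."""
    return (py_ms / rust_ms - 1.0) * 100.0


# Two rows taken from the pd.Series benchmark table (rounded timings)
for n, py_ms, rust_ms in [(1000, 0.6, 1.9), (10_000_000, 1248.0, 25.5)]:
    print(f"N={n}: {speedup_percent(py_ms, rust_ms):.0f}%")
# prints:
# N=1000: -68%
# N=10000000: 4794%
```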
Rust vs python comparison of a vectorised operation on a np.array:
```python
def calc_dist_matrix_py(p: npt.NDArray[np.float64]) -> npt.NDArray[np.float64]:
    "Compute distance matrix from points, using numpy"
    return np.sqrt(((p[:, np.newaxis, :] - p[np.newaxis, :, :]) ** 2).sum(axis=2))


@rust(
    dependencies=[rust_dependency("numpy", version="0.28")],
    imports=["numpy::{PyArray2, PyArrayMethods, PyReadonlyArray2}"],
)
def calc_dist_matrix_rust(
    points: Annotated[npt.NDArray[np.float64], "PyReadonlyArray2<f64>"],
) -> Annotated[npt.NDArray[np.float64], "Bound<'py, PyArray2<f64>>"]:  # ty: ignore[empty-body]
    """
    let points = points.as_array();
    let shape = points.shape();
    let (n, d) = (shape[0], shape[1]);
    let result = PyArray2::zeros(py, [n, n], false);
    let mut r = unsafe { result.as_array_mut() };
    for i in 0..n {
        for j in i + 1..n {
            let mut sum = 0.0;
            for k in 0..d {
                let diff = points.get([i, k]).unwrap() - points.get([j, k]).unwrap();
                sum += diff * diff;
            }
            let dist = sum.sqrt();
            if let Some(x) = r.get_mut([i, j]) {
                *x = dist;
            }
            if let Some(x) = r.get_mut([j, i]) {
                *x = dist;
            }
        }
    }
    Ok(result)
    """
```

| N | py (ms) | rust (ms) | speedup |
|---|---|---|---|
| 100 | 0.7 | 1.5 | -54% |
| 300 | 3.4 | 0.1 | 2838% |
| 1000 | 30.2 | 1.3 | 2246% |
| 3000 | 204.8 | 19.1 | 972% |
| 10000 | 2794.9 | 209.8 | 1232% |
For reference, this is five times faster than the xenoform implementation, which has openmp optimisations!
Full code is in examples/distance_matrix.py.