Rindle is a C++20 library for turning raw, per-ticker CSV files into training-ready datasets. It discovers input files, learns feature scalers, generates sliding windows, and produces both a manifest and contiguous tensors that can be consumed from C++ or Python.
- Configure – call `rindle::create_config` to validate paths, select feature columns, and choose window geometry and scaling options.
- Build – pass the configuration to `rindle::build_dataset`; the driver discovers tickers, fits scalers, writes window manifests, and emits `manifest.json` summarizing the build.
- Load – use `rindle::get_dataset` with the manifest to materialize feature and target tensors in memory for model training or analysis.
The C++ API is mirrored in the optional Python bindings, enabling the same flow from notebooks or scripts.
Rindle’s Python package is published on PyPI as `rindle`. Install with:

```bash
pip install rindle
```

Input data must follow these conventions:

- Each ticker lives in its own CSV file inside the configured input directory.
- Files must include a header whose first column is `Date`; the remaining columns are treated as numeric features. Missing numeric values are parsed as `NaN`, and timestamps may be provided as ISO-8601 strings or as integer epochs in units from seconds through nanoseconds.
- Ticker symbols are derived from filenames (without the extension) and normalized to uppercase with whitespace removed.
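The ticker-naming rule above is easy to get subtly wrong when filenames contain spaces or mixed case. The helper below is a minimal sketch of that rule; `ticker_from_path` is a hypothetical name, not part of Rindle's API, and it assumes "extension" means the final suffix as reported by `std::filesystem::path::stem`.

```cpp
#include <cctype>
#include <filesystem>
#include <string>

// Hypothetical helper (not Rindle's API): derive a ticker symbol from a CSV
// path by taking the filename without its extension, dropping all whitespace,
// and uppercasing what remains.
std::string ticker_from_path(const std::filesystem::path& csv_path) {
    const std::string stem = csv_path.stem().string();  // "data/raw/aapl.csv" -> "aapl"
    std::string ticker;
    ticker.reserve(stem.size());
    for (unsigned char c : stem) {
        if (std::isspace(c)) continue;  // normalize away whitespace
        ticker.push_back(static_cast<char>(std::toupper(c)));
    }
    return ticker;
}
```

Under this rule, `data/raw/brk b.csv` maps to the ticker `BRKB`.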
Running `build_dataset` creates the following outputs:

- Per-ticker window manifests – each ticker produces a binary manifest file (currently named `*_windows.parquet`) that records every window's index range and optional target span.
- `manifest.json` – captures dataset-level metadata such as feature lists, scaler choices, window counts, and per-ticker statistics. It also stores the build timestamp, the input/output directories, and a lookup table for ticker statistics.
The manifest content can be reused later to reload tensors without repeating the entire pipeline.
Datasets are represented by lightweight tensor wrappers that store contiguous feature (X) and target (Y) data along with window metadata:

- `Tensor3D` models a `[window, sequence, feature]` cube in row-major order and exposes helpers for indexing within a flat buffer.
- `Dataset` holds the feature/target tensors plus a `WindowMeta` vector that tracks the source ticker and row ranges for every window.

```cpp
// 3. Get dataset for training (loads tensors into memory).
// `manifest` is the manifest produced by the build step.
// Optional: the second argument is the fraction of data to load
// (default 1.0); values below 1.0 load a random subset.
auto dataset_result = rindle::get_dataset(manifest, 1.0);
/*
dataset.X: Tensor3D [windows, seq_length, n_features]
dataset.Y: Tensor3D [windows, horizon, 1]
*/
```
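To make the row-major layout concrete, here is a minimal, hypothetical stand-in for `Tensor3D`. The name `Tensor3DSketch`, its fields, and the use of `float` storage are illustrative assumptions, not Rindle's actual definition. Element `(w, s, f)` lives at flat offset `(w * seq + s) * features + f`, so the feature index varies fastest and each window occupies one contiguous block.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for rindle's Tensor3D (names and float storage are
// assumptions), illustrating row-major [window, sequence, feature] indexing.
struct Tensor3DSketch {
    std::size_t windows = 0, seq = 0, features = 0;
    std::vector<float> data;  // size == windows * seq * features

    Tensor3DSketch(std::size_t w, std::size_t s, std::size_t f)
        : windows(w), seq(s), features(f), data(w * s * f, 0.0f) {}

    // Row-major: feature varies fastest, then sequence, then window.
    std::size_t offset(std::size_t w, std::size_t s, std::size_t f) const {
        if (w >= windows || s >= seq || f >= features)
            throw std::out_of_range("Tensor3DSketch index out of range");
        return (w * seq + s) * features + f;
    }

    float& at(std::size_t w, std::size_t s, std::size_t f) { return data[offset(w, s, f)]; }
    float at(std::size_t w, std::size_t s, std::size_t f) const { return data[offset(w, s, f)]; }
};
```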
Rindle offers several built-in scaling strategies and records the fitted parameters alongside the manifest:
- `ScalerKind` enumerates the available scalers (standard, min-max, robust, etc.) and is stored in the dataset configuration and manifest.
- `ScalerStore` serializes the per-feature statistics to JSON for reuse, and CSV helpers exist to persist or reload artifact bundles if needed.
- `get_feature_scaler` returns a `FittedScaler` for a ticker/feature pair using either an in-memory manifest or a saved `manifest.json`. The scaler exposes `transform`/`inverse_transform` helpers, and the convenience function `inverse_transform_value` can recover the original numeric value from the scaled tensors returned by `get_dataset`.
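The scaler kinds listed above usually reduce to a simple `(x - center) / scale` rule. The sketch below assumes the textbook definitions (mean and population standard deviation, minimum and range, median and interquartile range) and that `NaN` values are skipped during fitting. `Kind`, `FittedSketch`, and `fit` are hypothetical names; Rindle's `FittedScaler` may store different statistics or handle edge cases differently. Because every rule is affine, `inverse_transform` exactly undoes `transform`, which is what makes recovering raw values from scaled tensors possible.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not Rindle's API: three common scaling rules.
enum class Kind { Standard, MinMax, Robust };

struct FittedSketch {
    Kind kind;
    double center;  // mean, min, or median depending on kind
    double scale;   // std dev, (max - min), or IQR depending on kind

    double transform(double x) const { return (x - center) / scale; }
    double inverse_transform(double z) const { return z * scale + center; }
};

// Linearly interpolated quantile of an already sorted, non-empty vector.
double quantile_sorted(const std::vector<double>& v, double q) {
    const double pos = q * static_cast<double>(v.size() - 1);
    const auto lo = static_cast<std::size_t>(std::floor(pos));
    const auto hi = static_cast<std::size_t>(std::ceil(pos));
    return v[lo] + (pos - static_cast<double>(lo)) * (v[hi] - v[lo]);
}

// Fit one feature column. Assumption: NaN (missing) values are skipped and
// at least one non-NaN value remains.
FittedSketch fit(Kind kind, const std::vector<double>& column) {
    std::vector<double> xs;
    for (double x : column)
        if (!std::isnan(x)) xs.push_back(x);

    double center = 0.0, scale = 1.0;
    if (kind == Kind::Standard) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        center = sum / static_cast<double>(xs.size());
        double sq = 0.0;
        for (double x : xs) sq += (x - center) * (x - center);
        scale = std::sqrt(sq / static_cast<double>(xs.size()));  // population std dev
    } else if (kind == Kind::MinMax) {
        const auto [mn, mx] = std::minmax_element(xs.begin(), xs.end());
        center = *mn;
        scale = *mx - *mn;
    } else {  // Robust
        std::sort(xs.begin(), xs.end());
        center = quantile_sorted(xs, 0.5);
        scale = quantile_sorted(xs, 0.75) - quantile_sorted(xs, 0.25);
    }
    if (scale == 0.0) scale = 1.0;  // avoid dividing by zero on constant features
    return {kind, center, scale};
}
```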
Sliding windows are produced using ticker-level statistics exposed by the manifest. The window maker can stream results to a sink (for writing manifests) or return them as in-memory vectors for smaller workloads.
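A sliding-window generator needs only a ticker's row count plus the window geometry. The sketch below assumes a sequence length, a forecast horizon, and a stride; the names `make_windows` and `WindowSpan`, and the stride parameter itself, are assumptions rather than Rindle's API. It shows both output modes described above: a streaming overload that hands each window to a sink callback, and a convenience overload that collects the windows into a vector.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical window record (not Rindle's WindowMeta): half-open row ranges.
struct WindowSpan {
    std::size_t start;         // first input row
    std::size_t end;           // one past the last input row
    std::size_t target_start;  // first target row (immediately after inputs)
    std::size_t target_end;    // one past the last target row
};

// Streaming form: hand each window to `sink` as soon as it is produced, so a
// large ticker never needs its full window list in memory.
void make_windows(std::size_t n_rows, std::size_t seq_len, std::size_t horizon,
                  std::size_t stride,
                  const std::function<void(const WindowSpan&)>& sink) {
    if (stride == 0) return;  // a zero stride would never advance
    for (std::size_t s = 0; s + seq_len + horizon <= n_rows; s += stride) {
        sink(WindowSpan{s, s + seq_len, s + seq_len, s + seq_len + horizon});
    }
}

// In-memory form for smaller workloads: collect every streamed window.
std::vector<WindowSpan> make_windows(std::size_t n_rows, std::size_t seq_len,
                                     std::size_t horizon, std::size_t stride) {
    std::vector<WindowSpan> out;
    make_windows(n_rows, seq_len, horizon, stride,
                 [&out](const WindowSpan& w) { out.push_back(w); });
    return out;
}
```

With 10 rows, `seq_len = 4`, and `horizon = 2`, a stride of 1 yields 5 windows whose inputs start at rows 0 through 4.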
The repository is organized as follows:

```text
include/      # Public headers (API, types, scalers)
src/          # Library implementation and internal headers
src/python/   # pybind11 bindings for the public API
examples/     # End-to-end usage demonstrations (C++ and Python)
data/         # Sample raw/processed directories for experimentation
tests/        # Catch2 test harness (placeholder)
```
To build from source with tests, examples, and Python bindings enabled:

```bash
cmake -S . -B build \
  -DRINDLE_BUILD_TESTS=ON \
  -DRINDLE_BUILD_EXAMPLES=ON \
  -DRINDLE_BUILD_PYTHON=ON
cmake --build build
```

The project targets C++20, fetches nlohmann_json, and optionally brings in Catch2 and pybind11 for tests and bindings. Once the test suite is implemented, run it with `cmake --build build --target rindle_tests` followed by `ctest --test-dir build`.
Enable `RINDLE_BUILD_PYTHON` to build the `rindle` Python module. The bindings expose tensor views as NumPy arrays while reusing the same configuration and loading APIs as C++. The generated extension module is placed in the build tree (e.g., `build/src/python/rindle.*`).
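A zero-copy NumPy view of a contiguous tensor is described by its shape plus per-axis byte strides. This hypothetical helper computes both for a row-major `[windows, seq, features]` buffer, assuming `float32` elements. pybind11's `py::buffer_info` and `py::array_t` constructors accept shape and stride lists of this form, though Rindle's bindings may construct their views differently.

```cpp
#include <array>
#include <cstddef>

// Hypothetical sketch: shape and byte strides for a NumPy view over a
// row-major [windows, seq, features] buffer. Assumes float32 elements by
// default; Rindle's bindings may use another dtype or mechanism.
struct ViewLayout {
    std::array<std::size_t, 3> shape;
    std::array<std::size_t, 3> strides;  // in bytes, as NumPy expects
};

ViewLayout row_major_layout(std::size_t windows, std::size_t seq, std::size_t features,
                            std::size_t item_size = sizeof(float)) {
    // Moving one window skips seq * features items; one step skips a feature row.
    return {{windows, seq, features},
            {seq * features * item_size, features * item_size, item_size}};
}
```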
The repository ships a `pyproject.toml` configured with scikit-build-core so the C++ extension can be packaged like a standard Python project. The Python package re-exports the compiled module, exposes a `__version__` sourced from package metadata, and keeps the import path as `import rindle` for existing scripts.
To install from a source checkout:

- (Optional) Create and activate a virtual environment to keep dependencies isolated. Any virtual environment manager works; for the built-in `venv` module:

  ```bash
  python -m venv .venv
  source .venv/bin/activate  # On Windows use `.venv\Scripts\activate`
  ```

- Install the build prerequisites if you have not already. `pip` can compile the extension as long as CMake and a C++20 compiler are available on your `PATH`. Installing `build` and `wheel` provides helpful tooling:

  ```bash
  pip install --upgrade pip
  pip install build wheel
  ```

- Install Rindle into the environment. From the repository root, run:

  ```bash
  pip install .
  ```

  This command builds the extension with scikit-build-core, installs the resulting wheel into the active environment, and makes `import rindle` available.

- (Optional) For iterative development on the bindings, install in editable mode so Python resolves the module from your working tree while still compiling the extension as needed:

  ```bash
  pip install --editable .
  ```

- Verify the install by importing the package and printing its version:

  ```bash
  python -c "import rindle; print(rindle.__version__)"
  ```
The `examples` directory contains runnable demonstrations for both languages:

- `examples/example_usage.cpp` walks through the full C++ workflow, from configuration to printing summary statistics and inspecting windows.
- `examples/example_usage.py` mirrors the process using the Python bindings and NumPy for inspection.

Build the C++ example with the `RINDLE_BUILD_EXAMPLES` option, and run the Python script after building the bindings.