sukesan7/auro

Auro — Crypto Market Data Capture, Binary Logging, and Deterministic Replay

Auro is a C++20 market-data systems project for capturing crypto exchange market-data messages, storing them in a compact append-only binary format, and replaying them deterministically for offline analysis, parser benchmarking, and systems experimentation.

The core of the project is a small, focused piece of market-data infrastructure built to demonstrate:

  • WebSocket ingestion
  • binary recording and integrity checks
  • indexing for faster seeks
  • deterministic replay
  • reproducible analysis artifacts

Auro also includes experimental scan utilities built on top of replay, but the primary value of the repo is the capture → record → index → replay pipeline.


What Auro currently does

  • Capture market-data messages from supported WebSocket venues into .auro recordings
  • Import packet captures into the same .auro container format
  • Replay recordings deterministically for repeatable parsing and analysis runs
  • Build indexes to support faster seek/start positions in replay workflows
  • Emit artifacts such as summary JSON and latency CSVs for benchmarking and inspection

Current scope

Auro is best understood as:

  • a market data capture and replay tool
  • a binary logging / replay infrastructure project
  • an offline analysis and benchmarking toolchain

Auro is not positioned as:

  • a live trading engine
  • a production arbitrage detector
  • a full market-simulation platform
  • a complete multi-venue research stack

Core design highlights

  • Compact append-only recording format for captured packets and metadata
  • Deterministic replay path from recorded input, with optional fast-forward mode
  • SPSC queue used to decouple I/O from downstream work in live capture paths
  • Venue parsers for normalized replay/analysis flows
  • Index support for quicker replay starts by offset, sequence, or time
  • Artifact generation for run summaries and latency measurements
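
The SPSC decoupling mentioned above can be illustrated with a bounded lock-free ring buffer. This is a minimal sketch, not Auro's actual queue: the `SpscRing` name, the power-of-two capacity requirement, and the non-blocking `try_push` / `try_pop` interface are all assumptions made for illustration. It also assumes `T` is default-constructible and movable.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical single-producer/single-consumer ring buffer (not Auro's real type).
// Exactly one thread may call try_push and exactly one thread may call try_pop.
template <typename T>
class SpscRing {
public:
    // Capacity must be a power of two so index wrapping is a cheap bit mask.
    explicit SpscRing(std::size_t capacity_pow2)
        : buf_(capacity_pow2), mask_(capacity_pow2 - 1) {
        assert(capacity_pow2 >= 2 && (capacity_pow2 & mask_) == 0);
    }

    // Producer side: returns false when the ring is full instead of blocking.
    bool try_push(T value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == buf_.size()) return false;  // full
        buf_[head & mask_] = std::move(value);
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side: returns std::nullopt when the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;  // empty
        T value = std::move(buf_[tail & mask_]);
        tail_.store(tail + 1, std::memory_order_release);  // release the slot
        return value;
    }

private:
    std::vector<T> buf_;
    std::size_t mask_;
    // Separate cache lines to reduce false sharing between the two threads.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
};
```

In a capture path, the network thread would typically call `try_push` with each received frame while a writer thread drains the ring with `try_pop`. Returning `false` on a full ring leaves the drop-or-retry policy to the caller.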

Main tools

auro_ws_capture

Captures market-data WebSocket frames into .auro files.

auro_pcap_import

Imports packets from a PCAP into the .auro recording format.

auro_index_build

Builds .aidx indexes for faster seek/start operations during replay.
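
To show why an index speeds up seeks, here is a hedged sketch of a time-based lookup over a sparse, time-sorted index. The `IndexEntry` fields and the `seek_offset_for_time` helper are hypothetical. The real .aidx layout is described in the docs and may differ.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical index entry; the real .aidx layout may differ.
struct IndexEntry {
    std::uint64_t timestamp_ns;  // capture time of the indexed record
    std::uint64_t sequence;      // sequence number of the indexed record
    std::uint64_t file_offset;   // byte offset of that record in the .auro file
};

// Returns the file offset of the last indexed record at or before target_ns.
// Assumes `index` is sorted by timestamp_ns. If the target precedes every
// entry, the first entry's offset is returned; an empty index yields 0,
// meaning "scan from the start of the file".
std::uint64_t seek_offset_for_time(const std::vector<IndexEntry>& index,
                                   std::uint64_t target_ns) {
    if (index.empty()) return 0;
    // First entry whose timestamp is strictly greater than the target.
    auto it = std::upper_bound(
        index.begin(), index.end(), target_ns,
        [](std::uint64_t t, const IndexEntry& e) { return t < e.timestamp_ns; });
    if (it == index.begin()) return index.front().file_offset;
    return std::prev(it)->file_offset;
}
```

The lookup is a binary search, so replay can jump near the requested start point and scan forward only a short distance instead of reading the recording from the beginning.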

auro_replay_quotes

Replays .auro recordings and parses frames into lightweight quote data for benchmarking and inspection.

auro_replay_scan

Runs replay-driven validation and analysis utilities on recorded streams.

The scan utilities are currently best treated as offline analysis helpers / experimental demos, not as trading signals or production opportunity detectors.


Quickstart

Auro supports two build configurations:

  1. Core tools only (default): build the replay, indexing, import, and analysis tools
  2. Live WebSocket capture (optional): also build auro_ws_capture

1) Build the core tools

This is the recommended default path for most users.

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DAURO_ENABLE_WS_CAPTURE=OFF
cmake --build build -j
ctest --test-dir build --output-on-failure

This builds the core toolchain without requiring the optional WebSocket capture dependencies.

2) Optionally enable live WebSocket capture

If you want to build auro_ws_capture, reconfigure with capture enabled:

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DAURO_ENABLE_WS_CAPTURE=ON
cmake --build build -j

auro_ws_capture requires the additional networking and TLS dependencies described below.

3) First workflow

If you already have an .auro recording, you can skip straight to replay and indexing.

Option A: Start from an existing .auro file

./build/auro_replay_quotes --in sample.auro --out quotes.csv
./build/auro_index_build --in sample.auro --out sample.aidx
./build/auro_replay_scan --in sample.auro

Option B: Capture data live, then replay it

Build with -DAURO_ENABLE_WS_CAPTURE=ON first, then:

./build/auro_ws_capture \
  --venue binance \
  --symbol BTCUSDT \
  --depth 20 \
  --interval-ms 100 \
  --duration 10 \
  --out btcusdt.auro

Then replay and index the capture:

./build/auro_replay_quotes --in btcusdt.auro --out quotes.csv
./build/auro_index_build --in btcusdt.auro --out btcusdt.aidx
./build/auro_replay_scan --in btcusdt.auro

Notes

  • Core third-party dependencies are fetched automatically with CMake FetchContent.
  • ccache is supported if available, but it is optional.
  • auro_ws_capture is optional and may require additional system packages such as Boost and OpenSSL depending on your environment.
  • Multi-stream / triangular replay analysis is experimental and intended for offline use only.

Build requirements

  • C++20 compiler (GCC/Clang)
  • CMake ≥ 3.24
  • Ninja recommended, but not required
  • Dependencies are fetched automatically with FetchContent unless you configure them another way:
    • CLI11
    • Catch2
    • simdjson
  • Optional: ccache for faster rebuilds
  • For auro_ws_capture only:
    • Boost headers (Asio/Beast)
    • OpenSSL

Build core/replay tools only (default)

cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DAURO_ENABLE_WS_CAPTURE=OFF
cmake --build build -j
ctest --test-dir build --output-on-failure

If you do not have Ninja installed, omit -G Ninja.

Configure with WebSocket capture enabled

cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DAURO_ENABLE_WS_CAPTURE=ON
cmake --build build -j
ctest --test-dir build --output-on-failure

Determinism and integrity

Given the same .auro input and the same flags, replay is intended to produce the same record stream and follow the same decision path on every run. Determinism here refers to data flow and replay results, not wall-clock runtime, so measured latencies can still differ between runs.
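
One way to check this property is to fold every replayed record into a running digest and compare the digest across runs. The sketch below uses 64-bit FNV-1a. The `ReplayedRecord` and `StreamDigest` names are hypothetical, and nothing here is claimed to be how Auro itself verifies determinism.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical view of one replayed record.
struct ReplayedRecord {
    std::uint64_t sequence;
    std::string payload;
};

// Order-sensitive running digest based on 64-bit FNV-1a.
class StreamDigest {
public:
    void update(const ReplayedRecord& rec) {
        // Mix the sequence number byte by byte, least significant byte first.
        for (int shift = 0; shift < 64; shift += 8) {
            mix(static_cast<std::uint8_t>(rec.sequence >> shift));
        }
        for (unsigned char c : rec.payload) mix(c);
    }
    std::uint64_t value() const { return hash_; }

private:
    void mix(std::uint8_t byte) {
        hash_ ^= byte;
        hash_ *= 0x100000001b3ULL;  // FNV-1a 64-bit prime
    }
    std::uint64_t hash_ = 0xcbf29ce484222325ULL;  // FNV-1a 64-bit offset basis
};

// Digest of a whole replayed stream; two runs over the same input with the
// same flags are expected to produce the same value.
std::uint64_t digest_of(const std::vector<ReplayedRecord>& stream) {
    StreamDigest d;
    for (const auto& rec : stream) d.update(rec);
    return d.value();
}
```

Comparing two such digests is a cheap regression check. A mismatch shows that the streams diverged but not where, so a failing run would still need a record-by-record diff.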

Depending on configuration and flags, the toolchain can also validate integrity using CRCs and report stream-level issues such as gaps or non-monotonic timing.
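
The checks described above can be sketched as follows. This assumes the standard CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320) and a sequence number that increases by exactly one per record. Which CRC variant and which gap rule Auro actually uses is not stated here, and `RecordMeta`, `StreamReport`, and `check_stream` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise CRC-32 (IEEE 802.3). Slow but table-free; assumed variant.
std::uint32_t crc32_ieee(const std::uint8_t* data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Apply the polynomial only when the low bit is set.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Hypothetical per-record metadata used for stream-level checks.
struct RecordMeta {
    std::uint64_t sequence;
    std::uint64_t timestamp_ns;
};

struct StreamReport {
    std::size_t gaps = 0;           // sequence did not advance by exactly 1
    std::size_t non_monotonic = 0;  // timestamp went backwards
};

// Compares each record with its predecessor and counts anomalies.
StreamReport check_stream(const std::vector<RecordMeta>& records) {
    StreamReport report;
    for (std::size_t i = 1; i < records.size(); ++i) {
        if (records[i].sequence != records[i - 1].sequence + 1) ++report.gaps;
        if (records[i].timestamp_ns < records[i - 1].timestamp_ns) ++report.non_monotonic;
    }
    return report;
}
```

A reader would recompute `crc32_ieee` over each payload and compare it with the stored value, treating a mismatch as corruption. The stream report is kept separate because gaps and timing anomalies describe the feed, not the file.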


Artifacts

Replay tools write analysis artifacts under the selected output directory, including:

  • summary.json
  • latency_parse_ns.csv
  • latency_parse_to_decision_ns.csv (scanner path, when enabled)
  • opportunities.csv (scanner path, when produced)

These are intended as reproducible outputs for benchmarking and offline inspection.


Demo artifacts

Live single-stream parse latency

[Figure: Live single-stream parse latency histogram]

Triangular replay parse latency

[Figure: Triangular replay parse-to-decision ECDF]


Repository layout

.
├── CMakeLists.txt
├── apps/                  # CLI front-ends
├── cmake/                 # warnings and build helpers
├── docs/                  # format, build, architecture, tooling docs
├── include/auro/          # public headers
├── src/                   # implementation
├── tests/                 # unit and integration-style tests
└── outputs/               # ignored local artifacts and captures

Limitations

  • Venue support is intentionally limited and parser coverage is not yet broad.
  • auro_replay_quotes is currently best understood as a Binance-focused replay / quote parsing tool.
  • Some replay-based scan utilities are still experimental and should be interpreted carefully.
  • The project is designed for offline capture/replay workflows, not live order execution.
  • The build currently relies on CMake fetching third-party dependencies during configure time.
  • auro_ws_capture is optional and depends on additional networking and TLS libraries that are not required for the core replay toolchain.

Documentation

See the docs index in docs/README.md for:

  • recording format details
  • build instructions
  • tool usage
  • determinism and integrity notes
  • architecture overview
  • troubleshooting

License

MIT. See LICENSE.

Disclaimer

Auro is provided for educational and research purposes only. It does not provide investment, trading, or financial advice, and it is not intended for use in live trading or as a production trading system. You are solely responsible for how you use this software and for complying with any applicable laws, regulations, and exchange terms of service.

This project is unaffiliated with any exchange, and all trademarks and names belong to their respective owners.