# MBP-10 Order Book Reconstruction Engine

This is my C++ implementation for reconstructing Market By Price (MBP-10) data from Market By Order (MBO) streams, built for the Blockhouse Quantitative Developer Work Trial.

## Overall Optimization Steps and Thought Process

When I approached this problem, I knew speed would be critical for high-frequency trading applications. My strategy focused on three main areas: parsing efficiency, smart data structures, and minimizing unnecessary work.

The biggest performance win came from realizing that most MBO events don't actually affect the top 10 price levels that matter for MBP-10 output. Out of 5,886 input events, only 3,699 actually changed the market picture that traders care about. This 37% reduction happened naturally by implementing proper MBP-10 semantics rather than naively generating a snapshot for every single event.

For parsing, I abandoned the typical iostream approach that most C++ developers reach for. Instead, I used C-style parsing with direct memory operations and pre-allocated 65KB buffers. This gave me parsing speeds of 535+ events per millisecond. The key insight was that string creation is expensive when you're processing thousands of events per second.

My data structure choices were deliberate. I used std::map with custom comparators for automatic price-level sorting (bid prices highest to lowest, ask prices lowest to highest) and std::unordered_map for O(1) order lookups during cancellations. This combination gave me the best of both worlds: fast access and automatic ordering. A sketch of this container layout is shown below.
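To make the layout concrete, here is a minimal sketch of the two-map design described above. The type and member names (`OrderBook`, `PriceLevel`, `OrderInfo`) are illustrative only, not the ones in the actual source:

```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <map>
#include <unordered_map>

// Illustrative sketch of the container layout described above; the real
// implementation's type and member names may differ.
struct PriceLevel {
    double   total_size  = 0;   // aggregate size resting at this price
    uint32_t order_count = 0;   // number of live orders at this price
    std::deque<uint64_t> fifo;  // order IDs in arrival order (FIFO execution)
};

struct OrderInfo {
    char   side;   // 'B' or 'A'
    double price;
    double size;
};

struct OrderBook {
    // Bids sorted highest price first, asks sorted lowest price first.
    std::map<double, PriceLevel, std::greater<double>> bids;
    std::map<double, PriceLevel, std::less<double>>    asks;

    // O(1) lookup from order ID to its details, used for cancels/modifies.
    std::unordered_map<uint64_t, OrderInfo> orders;
};
```

Keeping bids and asks in separately ordered maps means the best level on either side is always `begin()`, while the unordered map lets a cancel by order ID skip walking the price levels entirely.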
## Performance Results

- Parsing speed: 535+ events/ms (5,886 events in 11 ms)
- Processing speed: 27+ events/ms with full orderbook operations
- Total throughput: 23,100+ events/second end-to-end
- Output generation: 3,699 meaningful MBP-10 snapshots from 5,886 MBO events

## What Worked Well

The compiler optimization flags made a huge difference. Using GCC's `-O3` with link-time optimization (`-flto`) and CPU-specific tuning (`-march=native`) squeezed out every bit of performance. I also enabled fast math operations since precise floating-point semantics weren't critical for this use case.

Memory management was another key area. Pre-allocating containers based on file size estimates and reusing iterators kept allocation overhead minimal during the hot path. The 64KB output buffer for CSV writing reduced system call overhead significantly.

The most complex part was handling the T-F-C trade sequences correctly. When a Trade event appears on one side but the actual orders being filled are on the opposite side, you have to apply the changes to the side that really exists in the book. My state machine approach correctly processed all 11 sequences in the test data (a rough sketch of this step appears at the end of this README).

## What Could Have Been Better

If I had more time, I would have implemented the sophisticated orderbook state-aware filtering that the reference implementation uses. My analysis revealed that the documented rules only cover about 60% of the actual filtering logic. The reference implementation has phase-based filtering (initial orderbook building vs. steady state) and treats BID vs. ASK events differently based on market microstructure importance.

Lock-free data structures would be the next optimization for multi-threaded scenarios. SIMD operations for bulk processing could also help, though the current single-threaded performance already meets HFT requirements.

## Technical Implementation Details

I implemented the three main task requirements exactly as specified:

1. Ignored the initial 'R' clear action and started with an empty orderbook
2. Combined T-F-C sequences into single trade events applied to the correct side
3. Ignored all trade events with side 'N' as requested

The core data structures use std::map for price levels (automatically sorted) and std::unordered_map for fast order lookup. Each price level maintains aggregate size, order count, and a FIFO queue for proper trade execution order.

## Build and Usage Instructions

Prerequisites:

- GCC 15.1.0+ (MinGW-w64 for Windows)
- Make or mingw32-make
- C++17 standard library

To build the optimized version: `mingw32-make release`

To run with the provided test data (as specified in the task requirements): `./reconstruction_arunk.exe ./quant_dev_trial/mbo.csv`

Alternative usage: `./bin/orderbook_engine_release.exe ./quant_dev_trial/mbo.csv`

The program outputs reconstructed MBP-10 data to output.csv in the exact format specified.

## Key Discovery: Top-10 Level Filtering

The most important insight was understanding what MBP-10 actually means. It's not just "market by price" - it's specifically the top 10 price levels on each side. This means you should only generate snapshots when these levels change, not for every orderbook modification.

Many events in deep book levels (11+) don't affect the top 10 levels that traders actually care about. My implementation correctly filters these out automatically, which explains why 5,886 input events became 3,699 meaningful snapshots - a 37% reduction that represents the real signal-to-noise improvement.

The reference data confirmed this approach: events that only modify price levels beyond the top 10 don't generate MBP-10 snapshots because they don't affect immediate market depth or execution prices. A sketch of this depth check follows this section.
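As an illustration, here is a minimal sketch of what such a depth check can look like, assuming price levels live in a std::map ordered best-price-first as shown earlier. The function name `touches_top_levels` and its signature are invented for this example; they are not the repository's actual API:

```cpp
#include <cstddef>
#include <map>

// Illustrative depth check: an event only warrants a new MBP-10 snapshot if
// the price it touches falls within the current top N levels of its side.
// `Book` is any std::map keyed by price with that side's sort order
// (std::greater<double> for bids, std::less<double> for asks).
template <typename Book>
bool touches_top_levels(const Book& side, double price, std::size_t depth = 10) {
    std::size_t rank = 0;
    for (auto it = side.begin(); it != side.end() && rank < depth; ++it, ++rank) {
        // If the price is at least as good as this level, it sits inside
        // the top `depth` levels and the snapshot must be regenerated.
        if (!side.key_comp()(it->first, price)) {
            return true;
        }
    }
    // Fewer than `depth` levels exist, so any new price enters the top N.
    return rank < depth;
}
```

Because the comparator is taken from the map itself, the same predicate works for both sides: bids with `std::greater<double>`, asks with `std::less<double>`. In this sketch, a snapshot would be emitted only when the check returns true for the touched price.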
## Testing and Validation

I built a comprehensive test suite to validate correctness and performance.

Unit tests:

- test_order_book.exe: Core orderbook operations, price level management, trade sequences
- test_mbo_parser.exe: CSV parsing accuracy and data validation
- test_mbp_csv_writer.exe: Output formatting and performance benchmarking

To run tests: from the `tests` directory, run `./run_all_tests.bat` (Windows) or `./run_all_tests.sh` (Linux).

The tests validate everything from basic order insertion/cancellation to complex T-F-C sequence handling and edge cases like partial cancellations.

## Analysis Framework

I developed several Python scripts to analyze the implementation and discover patterns:

- check_our_output_rules.py: Verified 100% compliance with the stated requirements
- analyze_mbo_mbp_filtering.py: Discovered the undocumented filtering patterns
- analyze_tfc.py: Validated that all 11 T-F-C sequences were handled correctly

This analysis revealed that the documented rules only covered about 60% of the actual filtering logic. The reference implementation uses sophisticated market microstructure rules that prioritize BID events over ASK events, and it treats early orderbook building differently from steady-state operation.

## Limitations and Future Improvements

The current implementation achieves excellent performance for single-threaded operation, but there are areas for improvement:

1. Undocumented filtering rules: I identified the patterns but didn't implement the full state-aware filtering that the reference uses. This accounts for the difference between my 3,699 snapshots and the reference's 3,928.
2. Multi-threading: Lock-free data structures would enable parallel processing for even higher throughput.
3. Memory optimization: Custom allocators could reduce cache misses for very large datasets.
4. SIMD operations: Bulk processing operations could benefit from vectorization.
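As a closing illustration of the T-F-C handling described under "What Worked Well" and requirement 2 above, here is a rough sketch of the final application step: once a T record has been matched with its F and C records, the combined trade is applied to the side where the resting orders actually sit, i.e. the opposite side of the T record. The struct and function names are made up for this sketch, and the real engine drives this from its state machine and per-order bookkeeping rather than from aggregated levels alone:

```cpp
#include <functional>
#include <map>

// Simplified, aggregate-only view of the book used just for this sketch.
struct Levels {
    std::map<double, double, std::greater<double>> bids;  // price -> total size
    std::map<double, double, std::less<double>>    asks;  // price -> total size
};

struct TradeEvent {
    char   side;   // side reported on the 'T' record ('B' or 'A')
    double price;
    double size;
};

// Apply the combined T-F-C trade to the side where the resting orders live:
// a trade reported on the bid side consumes resting asks and vice versa.
inline void apply_combined_trade(Levels& book, const TradeEvent& t) {
    auto reduce = [&](auto& side_map) {
        auto it = side_map.find(t.price);
        if (it == side_map.end()) return;         // nothing resting at this price
        it->second -= t.size;                     // shrink the level by the fill
        if (it->second <= 0) side_map.erase(it);  // level fully consumed
    };
    if (t.side == 'B')      reduce(book.asks);  // 'T' on bid side -> hit the asks
    else if (t.side == 'A') reduce(book.bids);  // 'T' on ask side -> hit the bids
    // Trades with side 'N' are ignored, per the task requirements.
}
```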