StockDataDump is a high-throughput historical stock data dumper designed for large-scale quantitative research pipelines.
It combines a Rust-powered concurrent fetcher with a Python orchestration and conversion layer, enabling rapid data acquisition, compact storage, and seamless transformation into analytics-ready formats.
StockDataDump provides:
- Rust hot-path fetcher: Concurrent HTTP downloads using Tokio + Reqwest, streaming responses directly through zstd compression into `.zst` dump files.
- Python orchestration layer: Manifest generation, job scheduling, error handling, and conversion to columnar formats such as Parquet and Feather using pandas/pyarrow/numpy.
- Optimized storage formats: Raw dumps use zstd compression; converted Parquet outputs support zstd or snappy for fast reads and reduced disk usage.
This architecture allows capturing large universes of symbols quickly, while producing small, efficient datasets ideal for backtesting, machine learning, and long-horizon research.
rust-core/ # Rust fetcher (`dump-core`), built with Tokio + Reqwest + zstd
python/ # Python CLI (`stockdatadump`), manifest builder + converter
scripts/ # Helper scripts for install/clean/update workflows
dumps/ # Default output location for manifests, raw `.zst`, and Parquet files
cd rust-core
cargo build --release

From the repository root:
pip install -e python

./scripts/interface.sh # interactive menu: build, install, clean, update
./scripts/install.sh # installs Rust core + Python tools
./scripts/clean.sh # removes generated artifacts
./scripts/update.sh # rebuilds Rust and reinstalls Python package

Yahoo Finance requires both a crumb and a cookie for authentication.
These may be passed directly or exported as environment variables (YAHOO_CRUMB, YAHOO_COOKIE).
To obtain valid credentials:
- Open your browser and navigate to https://finance.yahoo.com/quote/AAPL/history
- Open Developer Tools (F12) and go to the Network tab
- Download historical data (click Download button or adjust date range)
- Find the download request in the Network tab and click on it
- Extract credentials:
  - Crumb: Look in the request URL for the `crumb=` parameter (e.g., `crumb=abc123xyz`)
  - Cookie: Copy the entire `Cookie` header value from the request headers
Note: These credentials may expire after some time. If you get 401 errors, regenerate them.
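The lookup described above (credentials passed directly or read from `YAHOO_CRUMB` and `YAHOO_COOKIE`) could be sketched in Python roughly as follows. This is a minimal illustration, not the CLI's actual code: the function name `resolve_credentials` is hypothetical, and the precedence (explicit values win over environment variables) is an assumption.

```python
import os


def resolve_credentials(crumb=None, cookie=None, env=None):
    """Return (crumb, cookie), preferring explicit values over the environment.

    Hypothetical helper: the precedence order is an assumption, not
    documented behaviour of the stockdatadump CLI.
    """
    env = os.environ if env is None else env
    crumb = crumb or env.get("YAHOO_CRUMB")
    cookie = cookie or env.get("YAHOO_COOKIE")

    # Report every missing credential at once so the user can fix both.
    missing = [
        name
        for name, value in (("YAHOO_CRUMB", crumb), ("YAHOO_COOKIE", cookie))
        if not value
    ]
    if missing:
        raise ValueError("missing Yahoo credentials: " + ", ".join(missing))
    return crumb, cookie
```

Failing early when either value is absent gives a clearer error than a 401 response later in the fetch step.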
stockdatadump manifest AAPL MSFT SPY \
-o dumps/manifests/yahoo.jsonl \
--start 2023-01-01 \
--crumb "your-actual-crumb-value" \
--cookie "B=actual-cookie-value; other-cookies=values"

Or use environment variables:
export YAHOO_CRUMB="your-actual-crumb-value"
export YAHOO_COOKIE="B=actual-cookie-value; other-cookies=values"
stockdatadump manifest AAPL MSFT SPY \
-o dumps/manifests/yahoo.jsonl \
--start 2023-01-01

stockdatadump fetch \
--manifest dumps/manifests/yahoo.jsonl \
--output-dir dumps/raw \
--concurrency 12

This writes compressed .zst files into dumps/raw/.
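To illustrate what a concurrency limit such as `--concurrency 12` means, here is a small Python analogue of bounded concurrent fetching. The real fetcher is written in Rust with Tokio + Reqwest, so this is only a sketch of the general pattern; `fetch_all` and the `fetch_one` callback are hypothetical names, and no network access is performed by the sketch itself.

```python
import asyncio


async def fetch_all(entries, fetch_one, concurrency=12):
    """Run fetch_one(entry) for every entry, at most `concurrency` at a time.

    `fetch_one` is a caller-supplied coroutine function (hypothetical here);
    in a real pipeline it would download one manifest entry.
    """
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(entry):
        # Each task waits for a free slot before starting its download.
        async with semaphore:
            return await fetch_one(entry)

    # gather returns results in the same order as the input entries.
    return await asyncio.gather(*(bounded(entry) for entry in entries))
```

A semaphore keeps the number of in-flight requests fixed regardless of manifest size, which is the usual way to avoid overwhelming the remote server or the local network stack.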
stockdatadump convert \
--dumps-dir dumps/raw \
--output dumps/arrow/dump.parquet \
--format parquet \
--compression zstd

To quickly preview a compressed dump:
stockdatadump head dumps/raw/AAPL.zst

This decompresses the stream and prints the first few records.
The Rust dump-core fetcher expects NDJSON, where each line contains a symbol and a url:
{"symbol": "AAPL", "url": "https://query1.finance.yahoo.com/v7/finance/download/AAPL?..."}
{"symbol": "MSFT", "url": "https://query1.finance.yahoo.com/v7/finance/download/MSFT?..."}

Manifests are fully generated by the CLI but can be manually constructed for custom data sources.
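For a custom data source, a manifest in this NDJSON shape can be produced with a few lines of Python. The sketch below assumes only the two fields described above (`symbol` and `url`); the helper names `write_manifest` and `read_manifest` are hypothetical, and any URLs you pass in are your own.

```python
import json


def write_manifest(entries, path):
    """Write (symbol, url) pairs as NDJSON: one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for symbol, url in entries:
            handle.write(json.dumps({"symbol": symbol, "url": url}) + "\n")


def read_manifest(path):
    """Read an NDJSON manifest back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

For example, `write_manifest([("ACME", "https://example.com/data/ACME.csv")], "dumps/manifests/custom.jsonl")` would write a one-line manifest; the `example.com` URL is a placeholder, not a real data source.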
This project is licensed under OpenNET LLC.