Bitcoin price forecasting with LSTM: a deep time series predictor for crypto markets

This project covers the full pipeline for forecasting Bitcoin prices using a deep learning model built on Long Short-Term Memory (LSTM) networks. It focuses on data preprocessing, model training, evaluation, and producing future price predictions with properly aligned timestamps. It is designed to be approachable for data scientists, researchers, and developers who want a solid baseline and a clear workflow for time series forecasting in finance.

Key ideas

  • Use historical price data and derived features to predict future prices.
  • Align timestamps to ensure that input sequences line up with the target horizons.
  • Train an LSTM-based model that can handle sequential dependencies in price data.
  • Provide a reproducible workflow from data collection to prediction generation.

Why this project exists

  • Bitcoin price movements are driven by complex factors. Simple models can miss patterns across time.
  • Deep learning models like LSTMs can capture longer dependencies in sequences.
  • A clear data pipeline helps you reproduce results and adapt the model to other assets.

Project scope

  • Data preprocessing: gather data, clean it, and create features suitable for model training.
  • Model training: build an LSTM network, train it with well-defined sequences, and evaluate it with standard metrics.
  • Future prediction: generate price forecasts for a future horizon with aligned timestamps for easy visualization and comparison.
  • Reproducibility: provide scripts and notebooks to reproduce results and experiments.

Preview and scope notes

  • This repository emphasizes clarity over clever tricks. The aim is a robust baseline that you can extend.
  • It uses common Python libraries for data science and machine learning.
  • The focus is on Bitcoin price data, but the structure supports other assets with minimal changes.

Table of contents

  • What you’ll find here
  • Quick start
  • Data pipeline
  • Model design
  • Training and evaluation
  • Making predictions
  • Reproducibility and experiments
  • Project structure
  • Datasets and data sources
  • Visualization and interpretation
  • Troubleshooting
  • Roadmap
  • Contributing
  • Licensing

What you’ll find here

Overview

This project provides an end-to-end workflow for forecasting Bitcoin prices with an LSTM model. It includes data ingestion from public sources, preprocessing steps to align and normalize data, a modular LSTM model you can extend, and utilities to generate predictions that align with real-time timestamps.

Audience

  • Data scientists who want a clear baseline for time series forecasting in crypto markets.
  • Researchers who study price dynamics and want an accessible benchmark.
  • Developers who want to build dashboards or alerts around forecasted price trends.

How this reads in practice

  • You pull historical price data, clean it, and transform it into input features and target values.
  • You train an LSTM model to learn from past sequences and forecast the next price step.
  • You export the forecast for a horizon and align it with calendar timestamps to plot against actual prices.

Quick start

Prerequisites

  • Python 3.8 or newer
  • A Linux, macOS, or Windows development environment
  • Basic familiarity with Python, notebooks, and command-line tools

Install and set up

  • Create a virtual environment
  • Install dependencies from the requirements file
  • Start the data pipeline or the notebook that runs the experiments

Example steps

  • Create and activate a virtual environment
    • python -m venv venv
    • For Windows: venv\Scripts\activate
    • For macOS/Linux: source venv/bin/activate
  • Install dependencies
    • pip install -r requirements.txt
  • Run a notebook or a script
    • jupyter notebook
    • python scripts/train_model.py

What to expect after setup

  • A data folder with preprocessed datasets
  • A trained model checkpoint
  • A notebook to reproduce experiments and explore predictions

Data pipeline

Data sources

  • Historical Bitcoin price data from public sources such as crypto exchange APIs and public price indices.
  • Optional macro-market indicators that can be added to features, such as trading volume, volatility proxies, and market sentiment indicators.

Preprocessing steps

  • Deduplicate and normalize timestamps to a common cadence (e.g., daily or hourly).
  • Handle missing values with simple imputation strategies to avoid gaps that derail sequence learning.
  • Normalize features to stable ranges to improve model training dynamics.
  • Create lag features and rolling statistics to help the model capture short-term and medium-term trends.
  • Build input sequences: for each target, construct a window of past time steps (look-back) and a corresponding horizon anchor for the predicted price, as in the sketch below.
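
A minimal windowing sketch, assuming the scaled series is already a NumPy array; `make_sequences`, `look_back`, and `horizon` are illustrative names, not part of the repository's API:

```python
import numpy as np

def make_sequences(values: np.ndarray, look_back: int = 60, horizon: int = 1):
    """Slice a series into overlapping look-back windows and the value
    `horizon` steps after each window (the prediction target)."""
    X, y = [], []
    for start in range(len(values) - look_back - horizon + 1):
        X.append(values[start : start + look_back])
        y.append(values[start + look_back + horizon - 1])
    return np.asarray(X), np.asarray(y)

# e.g. 60-step windows predicting one step ahead:
# X, y = make_sequences(scaled_close, look_back=60, horizon=1)
```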

Feature engineering ideas

  • Price features: open, high, low, close, volume
  • Returns and volatility: log returns, rolling standard deviation
  • Time features: day of week, hour of day, seasonality proxies
  • Custom indicators: moving averages, RSI, MACD, rate of change (several of the features above appear in the sketch below)
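
A small feature-engineering sketch, assuming a DataFrame with a DatetimeIndex and a `close` column; the window lengths and column names are illustrative:

```python
import numpy as np
import pandas as pd

def add_basic_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive return, volatility, moving-average, and calendar features."""
    out = df.copy()
    out["log_return"] = np.log(out["close"]).diff()
    out["volatility_7"] = out["log_return"].rolling(7).std()
    out["sma_7"] = out["close"].rolling(7).mean()
    out["sma_30"] = out["close"].rolling(30).mean()
    out["day_of_week"] = out.index.dayofweek
    return out.dropna()  # rolling windows leave NaNs at the start
```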

Data validation and integrity

  • Validate timestamp alignment between input sequences and targets.
  • Check for outliers and remove or cap extreme values when justified.
  • Keep a log of preprocessing steps to ensure reproducibility.

Model design

Architectural choices

  • Sequence-to-value or sequence-to-sequence style: the model takes a sequence of past prices and returns a forecast for the next step (sequence-to-value) or for several future steps (sequence-to-sequence).
  • LSTM layers to capture temporal dependencies, with dropout to improve generalization.
  • Optional dense layers after LSTM blocks to map learned representations to the target price.

Hyperparameters (illustrative; a model sketch using these ranges follows the list)

  • Look-back window: 30 to 60 time steps (adjust to data cadence)
  • LSTM layers: 1 to 2 stacked layers
  • Hidden units: 64 to 256 per layer
  • Dropout: 0.2 to 0.5
  • Activation: tanh in LSTM cells; ReLU or linear in dense heads
  • Optimizer: Adam
  • Learning rate: 0.001 to 0.0001 with a scheduler
  • Loss: Mean Squared Error (MSE)
  • Batch size: 32 to 128
  • Epochs: early stopping based on validation loss
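
An illustrative Keras model drawn from these ranges; the layer sizes, dropout rate, and learning rate are assumptions, not the repository's fixed configuration:

```python
import tensorflow as tf

def build_lstm(look_back: int, n_features: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(look_back, n_features)),
        tf.keras.layers.LSTM(128, return_sequences=True),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1),  # linear head for the next-step price
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="mse")
    return model
```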

Why LSTM

  • LSTMs handle long-range dependencies better than simple feedforward nets in time series.
  • They can learn patterns across days or hours that influence future prices.
  • A well-structured LSTM with proper regularization can generalize to unseen periods.

Regularization and stability

  • Use dropout between layers to prevent overfitting.
  • Apply L2 weight regularization if necessary.
  • Normalize input features to stabilize learning.
  • Early stopping based on a validation set prevents overtraining.

Training and evaluation

Data split

  • Training set: historical data up to a chosen cutoff.
  • Validation set: a recent slice to monitor generalization.
  • Test set: the final portion used for reported metrics and robust evaluation (a chronological split is sketched below).
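
A minimal chronological split sketch (the 80/10/10 proportions are illustrative); slicing by position, never by shuffling, keeps future data out of training:

```python
# X, y are the windowed sequences and targets, ordered by time
n = len(X)
train_end, val_end = int(n * 0.8), int(n * 0.9)

X_train, y_train = X[:train_end], y[:train_end]
X_val, y_val = X[train_end:val_end], y[train_end:val_end]
X_test, y_test = X[val_end:], y[val_end:]
```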

Evaluation metrics

  • RMSE (Root Mean Squared Error)
  • MAE (Mean Absolute Error)
  • MAPE (Mean Absolute Percentage Error)
  • R-squared as a supplementary measure of fit (all four metrics are computed in the sketch below)
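
A small helper computing all four metrics with NumPy and scikit-learn; the function name is illustrative:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def report_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> None:
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    mae = mean_absolute_error(y_true, y_pred)
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100  # prices are nonzero
    r2 = r2_score(y_true, y_pred)
    print(f"RMSE={rmse:.2f}  MAE={mae:.2f}  MAPE={mape:.2f}%  R2={r2:.3f}")
```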

Training workflow

  • Load preprocessed sequences and targets
  • Train the LSTM model on the training set
  • Monitor validation loss and stop early if no improvement
  • Save the best model checkpoint and the associated scaler/normalizer (see the sketch below)
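
A training sketch with Keras callbacks for early stopping and checkpointing; the file paths, patience, and batch size are assumptions:

```python
import joblib
import tensorflow as tf

callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True),
    tf.keras.callbacks.ModelCheckpoint("models/best_lstm.keras",
                                       monitor="val_loss", save_best_only=True),
]
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=200, batch_size=64, callbacks=callbacks)

joblib.dump(scaler, "models/scaler.joblib")  # keep the scaler with the checkpoint
```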

Model interpretation

  • Visualize training curves to assess convergence.
  • Analyze residuals to identify systematic errors.
  • Plot predicted vs actual prices for the validation and test periods.

Prediction and aligned timestamps

Forecast horizon

  • Define a forecast horizon, such as the next 7 days or 24 hours, depending on the data cadence.
  • The model outputs a forecast for each step in the horizon.

Timestamp alignment

  • Ensure each forecast timestamp aligns with the corresponding calendar time.
  • Use the same cadence as the training data to keep comparisons meaningful.
  • Create a forecast dataframe with columns for timestamp, predicted price, and confidence proxies if available, as in the sketch below.
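
A minimal alignment sketch with pandas, assuming a daily cadence (`freq="D"`); `last_ts`, `preds`, and `horizon` are placeholders for values produced earlier in the pipeline:

```python
import pandas as pd

# last_ts: timestamp of the newest observation; preds: model output, length `horizon`
future_index = pd.date_range(start=last_ts, periods=horizon + 1, freq="D")[1:]
forecast = pd.DataFrame({"timestamp": future_index, "predicted_price": preds})
```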

Prediction workflow

  • Load the latest trained model and the data scaler
  • Prepare the most recent time window as input
  • Generate predictions for the desired horizon
  • Denormalize predictions if features were scaled
  • Save results to a CSV or display in a notebook for quick inspection (the flow is sketched below)
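
A condensed inference sketch for a single-step forecast; the checkpoint and scaler paths and the `latest_df` input are assumptions that mirror the training sketch above (multi-step horizons would loop, feeding each prediction back into the window):

```python
import joblib
import numpy as np
import pandas as pd
import tensorflow as tf

model = tf.keras.models.load_model("models/best_lstm.keras")
scaler = joblib.load("models/scaler.joblib")

window = scaler.transform(latest_df[["close"]].tail(60))  # most recent look-back window
pred_scaled = model.predict(window[np.newaxis, ...])      # shape (1, 1)
pred = scaler.inverse_transform(pred_scaled).ravel()      # back to price units

next_ts = latest_df.index[-1] + pd.Timedelta(days=1)      # assumes daily cadence
pd.DataFrame({"timestamp": [next_ts], "predicted_price": pred}).to_csv(
    "forecast.csv", index=False)
```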

Expected outputs

  • A table of predicted prices aligned to the forecast timestamps
  • Associated uncertainties or confidence intervals if implemented
  • Plots showing forecast trajectories compared to real prices

Reproducibility and experiments

Environment and dependencies

  • A dedicated virtual environment is recommended
  • Pin versions in a requirements file to ensure reproducibility
  • Document any optional libraries used for visualization or experimentation

Experiment tracking

  • Record hyperparameters, data splits, random seeds, and results
  • Store model artifacts, plots, and evaluation metrics in a structured folder
  • Use a simple naming convention for experiments, such as exp-YYYYMMDD-HHMM (see the one-liner below)
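
For instance, a timestamped experiment name can be generated with the standard library:

```python
from datetime import datetime

exp_name = datetime.now().strftime("exp-%Y%m%d-%H%M")  # e.g. exp-20250101-0930
```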

Notebooks and scripts

  • Notebooks provide a narrative for the preprocessing, training, evaluation, and prediction steps
  • Scripts enable automated runs for batch experiments
  • Keep notebooks clean and well-commented for easy reuse

Performance and benchmarking

  • Compare against simple baselines like moving averages or ARIMA to quantify gains (a naive-baseline comparison is sketched after this list)
  • Report improvements in RMSE and MAE relative to the baselines
  • Show how the model handles different market regimes (bull, bear, sideways)
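
A sketch of a naive last-value baseline for such a comparison; `test_prices` and `model_rmse` are placeholders for arrays and metrics produced by your own evaluation:

```python
import numpy as np

# Naive baseline: tomorrow's price equals today's price
naive_pred, naive_true = test_prices[:-1], test_prices[1:]
naive_rmse = np.sqrt(np.mean((naive_true - naive_pred) ** 2))
print(f"naive RMSE={naive_rmse:.2f} vs model RMSE={model_rmse:.2f}")
```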

Production considerations

  • If you plan to deploy, package the model and a small runtime to generate forecasts
  • Ensure the deployment can fetch fresh data, preprocess it, and produce up-to-date predictions
  • Include a minimal API or a command-line interface for integration with dashboards

Project structure

Directory overview

  • data/: raw and preprocessed data; manifests and sample datasets
  • notebooks/: Jupyter notebooks for exploration and experiments
  • scripts/: training, evaluation, and prediction scripts
  • models/: saved model checkpoints and configuration files
  • src/: source code modules for data handling, model architecture, and utilities
  • docs/: documentation and usage guides
  • tests/: unit tests and data integrity checks
  • examples/: small runnable examples and demos
  • artifacts/: saved artifacts from runs, such as scalers and encoders

Key files

  • requirements.txt: pinned dependencies for consistent environments
  • README.md: this document
  • config.yaml: experiment-level configuration for reproducibility
  • data_preprocessing.py: module for feature engineering and sequence creation
  • model.py: LSTM model definition
  • train.py: training loop with evaluation logic
  • predict.py: inference pipeline for generating forecasts
  • visualize.py: plotting utilities for results and diagnostics
  • utils.py: common helper functions

Data sources and licensing

Public data sources

  • Historical Bitcoin price data are pulled from public APIs and data providers.
  • If you extend to other assets, ensure you follow the respective data terms and licensing.

Licensing

  • The project uses an open source license that permits usage, modification, and redistribution.
  • Always respect data provider licenses and attribution requirements when using external data.

Usage patterns and examples

Training a baseline model

  • Start by preparing the data with the preprocessing module.
  • Run the training script to fit the LSTM on the prepared sequences.
  • Evaluate the model on the validation and test sets.
  • Save the best-performing model and its preprocessing state.

Generating predictions

  • Use the prediction script to feed the latest data into the trained model.
  • Retrieve forecasted prices for the desired horizon.
  • Visualize the forecast with actual prices to assess alignment and timing.

Example commands

  • Prepare data and train
    • python scripts/train.py --config config.yaml --mode train
  • Evaluate model
    • python scripts/train.py --config config.yaml --mode evaluate
  • Run prediction
    • python scripts/predict.py --config config.yaml --horizon 168

Experimentation and tuning

  • Adjust the look-back window to examine how the model performs with longer or shorter histories.
  • Test different numbers of LSTM units and layers to balance capacity with computational cost.
  • Try alternative optimizers or learning rate schedules to improve convergence.
  • Add more features such as sentiment indicators if you have access to reliable sources.

Visualization and interpretation

Plotting forecasts

  • Compare forecast curves with actual price trajectories.
  • Display error metrics for each horizon to understand where the model struggles.
  • Create residual plots to spot systematic errors.

Interpreting model behavior

  • Use attention-like mechanisms or feature importance approximations to understand which inputs drive forecasts.
  • Examine which time steps in the look-back window contribute most to the prediction.

Troubleshooting

Common issues and fixes

  • Data alignment issues: ensure timestamps are consistent across features and target values.
  • Training instability: try smaller learning rates, adjust batch size, or enforce gradient clipping (see the snippet after this list).
  • Overfitting: increase dropout, reduce model complexity, add more data.
  • Memory constraints: reduce look-back window, decrease batch size, use gradient accumulation.
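
For gradient clipping with Keras, passing `clipnorm` to the optimizer is one option (the values here are illustrative):

```python
import tensorflow as tf

# Clip the global gradient norm to tame exploding gradients
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4, clipnorm=1.0)
model.compile(optimizer=optimizer, loss="mse")
```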

Validation tips

  • Always verify the data splits and ensure no leakage from the future into the training set.
  • Plot training and validation losses to detect overfitting early.
  • Confirm that the evaluation metrics reflect real predictive performance, not just numerical accuracy.

Release assets and how to obtain them

Release process overview

  • The project maintains release assets on the official GitHub Releases page.
  • You can download prebuilt artifacts, such as packaged models or runnable notebooks, from this page.

Note on release assets

  • From the Releases page, download the latest release asset and run it; the packaged form includes the files you need to run the project.

Where to find releases

  • If you want to reproduce experiments quickly, use the prebuilt artifact if available.
  • If you prefer to run from source, clone the repository and run the notebooks or scripts directly.

Roadmap

Upcoming enhancements

  • Extend the model to multi-asset forecasting by adding parallel streams for different cryptocurrencies.
  • Experiment with hybrid models that combine LSTM with attention mechanisms for improved focus on recent events.
  • Add more robust evaluation with backtesting over multiple rolling horizons.
  • Integrate with a dashboard to display live forecasts and historical comparisons.

Possible improvements

  • Include governance features for parameter choices and experiment tracking.
  • Add automated data ingestion pipelines that refresh datasets daily.
  • Improve plotting with interactive charts to inspect predictions across multiple time frames.

Community and contributions

Contribution guidelines

  • Open an issue to discuss major changes before implementing them.
  • Start with small pull requests that fix bugs or add tests.
  • Keep a clear, descriptive commit message for every change.

Code style and quality

  • Follow the existing code style and add type hints where helpful.
  • Write tests for new modules and ensure existing tests pass.
  • Document new features with examples in the README and notebook comments.

Code of conduct

  • Be respectful and constructive in all interactions.
  • Share credit for ideas and contributions fairly.
  • Avoid personal attacks and focus on the work.

Security and safety

Data handling

  • Treat data responsibly. Do not misuse API keys or private data.
  • Keep sensitive credentials out of the repository; use environment variables or secret managers.

Model safety

  • Do not deploy the model without proper validation and monitoring.
  • Consider potential misuses of forecast data and implement appropriate safeguards.

Related projects and inspiration

Similar work

  • Time series forecasting with LSTMs in finance
  • Crypto price prediction using deep learning
  • Sequence modeling for financial data

Inspiration sources

  • Public datasets and benchmarks for financial time series
  • Open research on LSTM architectures and sequence modeling
  • Best practices for reproducible data science

Data sources and attribution

Public data provenance

  • Price data is sourced from credible public APIs and exchanges.
  • Any third-party indicators included in the feature set are described with their sources in the notebooks.

Attribution

  • Acknowledge data providers in the documentation and in any published results.
  • Cite relevant papers or resources if you publish research based on this project.

Screenshots and visuals

Charts and graphs

  • Include plots comparing predicted and actual prices over time.
  • Show error metrics across horizons to illustrate performance.
  • Provide residual plots to help readers understand where the model struggles.

Illustrative images

  • Bitcoin logos and crypto-inspired visuals help convey the theme.
  • Consider including a simple chart image generated from the notebook for quick reference.

Frequently asked questions (FAQ)

What is this project for?

  • To forecast Bitcoin prices using an LSTM model with a clear, reproducible workflow.

What data is used?

  • Historical price data and derived features created in the preprocessing step.

What should I expect from the model?

  • A forecast for future prices with timestamps aligned to the forecast horizon.

Do I need to train the model myself?

  • You can train it from scratch or use the release artifact if available. See the Releases for details.

How do I reproduce the results?

  • Follow the quick start steps, then run the provided notebooks and scripts to reproduce the experiments.

What if I want to adapt it to another asset?

  • Add the asset data and adjust the preprocessing steps to handle new price streams and features. The model architecture can stay similar.

Changelog (sample)

v0.x.y — Initial public release

  • Baseline LSTM model for Bitcoin price forecasting
  • Data preprocessing pipeline with alignment and normalization
  • Training and evaluation workflow
  • Prediction generation with aligned timestamps

v0.x.z — Minor improvements

  • Added more features (rolling statistics, day-of-week)
  • Improved data validation and logging
  • Documentation enhancements and examples

v0.x.z+1 — Experimental tweaks

  • Tuning of hyperparameters for stability
  • Added early stopping and model checkpointing
  • Visualization improvements and better error metrics

Licensing

This project is open source and designed for sharing knowledge and enabling reproducible research. You can use, modify, and distribute the code under the terms of the project’s license. Always respect the licenses of any included third-party data sources.

Appendix: quick references

  • Data preprocessing flow

    • Load data
    • Align timestamps
    • Create features
    • Normalize features
    • Build input sequences and targets
    • Save preprocessed data for training
  • Model workflow

    • Define LSTM architecture
    • Compile with optimizer and loss
    • Train with early stopping
    • Save best model and scaler
    • Run predictions on new data

Appendix: code snippets

  • Example: training a model

    • import necessary modules
    • prepare data
    • instantiate model
    • compile and fit
    • save the best model
  • Example: predicting future prices

    • load model and scaler
    • prepare the latest data window
    • generate forecast for horizon
    • save results to CSV (both outlines condense into the sketch below)
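
Condensed into one hedged sketch, reusing the illustrative names (`build_lstm`, `callbacks`, `scaler`, `latest_df`) from the sections above:

```python
# Train: fit on windowed sequences, keep the best checkpoint
model = build_lstm(look_back=60, n_features=1)
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=200, batch_size=64, callbacks=callbacks)

# Predict: scale the newest window, forecast, and denormalize
window = scaler.transform(latest_df[["close"]].tail(60))
pred = scaler.inverse_transform(model.predict(window[np.newaxis, ...]))
```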

Appendix: project governance

  • Decision making is documented in issues and pull requests.
  • Major changes require discussion and a clear consensus.
  • Changes are versioned through releases with notes.

Appendix: visual assets and branding

  • Logo and brand colors are used consistently across notebooks and dashboards.
  • Visuals use simple color palettes to keep focus on data.

Appendix: deployment notes

  • For local runs, a lightweight setup suffices.
  • For production, consider containerization for reproducible environments.
  • Ensure that data pipelines run on a schedule and that logs are preserved.

Appendix: data and model lifecycle

  • Data ingestion: fetches new data periodically, validates integrity.
  • Preprocessing: standardizes data for modeling.
  • Training: updates models with fresh data to stay relevant.
  • Evaluation: tracks performance over time and across regimes.
  • Prediction: generates forward-looking forecasts for dashboards or alerts.

Appendix: additional usage notes

  • You can customize the forecast horizon as needed.
  • You can switch to a different cadence (hourly, daily) with appropriate feature engineering.
  • You can swap out the model with a different architecture while preserving the pipeline.

Appendix: acknowledgments

  • Thank contributors who made code, tests, and documentation possible.
  • Acknowledge data providers and any open source libraries used.
