Bitcoin price forecasting with LSTM: a deep time series predictor for crypto markets

This project covers the full pipeline for forecasting Bitcoin prices using a deep learning model built on Long Short-Term Memory (LSTM) networks. It focuses on data preprocessing, model training, evaluation, and producing future price predictions with properly aligned timestamps. It is designed to be approachable for data scientists, researchers, and developers who want a solid baseline and a clear workflow for time series forecasting in finance.

Key ideas

  • Use historical price data and derived features to predict future prices.
  • Align timestamps to ensure that input sequences line up with the target horizons.
  • Train an LSTM-based model that can handle sequential dependencies in price data.
  • Provide a reproducible workflow from data collection to prediction generation.

Why this project exists

  • Bitcoin price movements are driven by complex factors. Simple models can miss patterns across time.
  • Deep learning models like LSTMs can capture longer dependencies in sequences.
  • A clear data pipeline helps you reproduce results and adapt the model to other assets.

Project scope

  • Data preprocessing: gather data, clean it, and create features suitable for model training.
  • Model training: build an LSTM network, train it with well-defined sequences, and evaluate it with standard metrics.
  • Future prediction: generate price forecasts for a future horizon with aligned timestamps for easy visualization and comparison.
  • Reproducibility: provide scripts and notebooks to reproduce results and experiments.

Preview and scope notes

  • This repository emphasizes clarity over clever tricks. The aim is a robust baseline that you can extend.
  • It uses common Python libraries for data science and machine learning.
  • The focus is on Bitcoin price data, but the structure supports other assets with minimal changes.

Table of contents

  • What you’ll find here
  • Quick start
  • Data pipeline
  • Model design
  • Training and evaluation
  • Making predictions
  • Reproducibility and experiments
  • Project structure
  • Datasets and data sources
  • Visualization and interpretation
  • Troubleshooting
  • Roadmap
  • Contributing
  • Licensing

What you’ll find here

Overview

This project provides an end-to-end workflow for forecasting Bitcoin prices with an LSTM model. It includes data ingestion from public sources, preprocessing steps to align and normalize data, a modular LSTM model you can extend, and utilities to generate predictions that align with real-time timestamps.

Audience

  • Data scientists who want a clear baseline for time series forecasting in crypto markets.
  • Researchers who study price dynamics and want an accessible benchmark.
  • Developers who want to build dashboards or alerts around forecasted price trends.

How this reads in practice

  • You pull historical price data, clean it, and transform it into input features and target values.
  • You train an LSTM model to learn from past sequences and forecast the next price step.
  • You export the forecast for a horizon and align it with calendar timestamps to plot against actual prices.

Quick start

Prerequisites

  • Python 3.8 or newer
  • A Linux, macOS, or Windows development environment
  • Basic familiarity with Python, notebooks, and command-line tools

Install and set up

  • Create a virtual environment
  • Install dependencies from the requirements file
  • Start the data pipeline or the notebook that runs the experiments

Example steps

  • Create and activate a virtual environment
    • python -m venv venv
    • For Windows: venv\Scripts\activate
    • For macOS/Linux: source venv/bin/activate
  • Install dependencies
    • pip install -r requirements.txt
  • Run a notebook or a script
    • jupyter notebook
    • python scripts/train_model.py

What to expect after setup

  • A data folder with preprocessed datasets
  • A trained model checkpoint
  • A notebook to reproduce experiments and explore predictions

Data pipeline

Data sources

  • Historical Bitcoin price data from public sources such as crypto exchange APIs and public price indices.
  • Optional macro-market indicators that can be added to features, such as trading volume, volatility proxies, and market sentiment indicators.

Preprocessing steps

  • Deduplicate and normalize timestamps to a common cadence (e.g., daily or hourly).
  • Handle missing values with simple imputation strategies to avoid gaps that derail sequence learning.
  • Normalize features to stable ranges to improve model training dynamics.
  • Create lag features and rolling statistics to help the model capture short-term and medium-term trends.
  • Build input sequences: for each target, construct a window of past time steps (look-back) and a corresponding horizon anchor for the predicted price, as in the sketch below.
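
A minimal windowing sketch, assuming the scaled series is already a NumPy array; `make_sequences`, `look_back`, and `horizon` are illustrative names, not part of the repository's API:

```python
import numpy as np

def make_sequences(values: np.ndarray, look_back: int = 60, horizon: int = 1):
    """Slice a series into overlapping look-back windows and the value
    `horizon` steps after each window (the prediction target)."""
    X, y = [], []
    for start in range(len(values) - look_back - horizon + 1):
        X.append(values[start : start + look_back])
        y.append(values[start + look_back + horizon - 1])
    return np.asarray(X), np.asarray(y)

# e.g. 60-step windows predicting one step ahead:
# X, y = make_sequences(scaled_close, look_back=60, horizon=1)
```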

Feature engineering ideas

  • Price features: open, high, low, close, volume
  • Returns and volatility: log returns, rolling standard deviation
  • Time features: day of week, hour of day, seasonality proxies
  • Custom indicators: moving averages, RSI, MACD, rate of change (several of the features above appear in the sketch below)
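
A small feature-engineering sketch, assuming a DataFrame with a DatetimeIndex and a `close` column; the window lengths and column names are illustrative:

```python
import numpy as np
import pandas as pd

def add_basic_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive return, volatility, moving-average, and calendar features."""
    out = df.copy()
    out["log_return"] = np.log(out["close"]).diff()
    out["volatility_7"] = out["log_return"].rolling(7).std()
    out["sma_7"] = out["close"].rolling(7).mean()
    out["sma_30"] = out["close"].rolling(30).mean()
    out["day_of_week"] = out.index.dayofweek
    return out.dropna()  # rolling windows leave NaNs at the start
```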

Data validation and integrity

  • Validate timestamp alignment between input sequences and targets.
  • Check for outliers and remove or cap extreme values when justified.
  • Keep a log of preprocessing steps to ensure reproducibility.

Model design

Architectural choices

  • Sequence-to-value or sequence-to-sequence style: the model takes a sequence of past prices and returns a forecast for the next step (sequence-to-value) or for several future steps (sequence-to-sequence).
  • LSTM layers to capture temporal dependencies, with dropout to improve generalization.
  • Optional dense layers after LSTM blocks to map learned representations to the target price.

Hyperparameters (illustrative; a model sketch using these ranges follows the list)

  • Look-back window: 30 to 60 time steps (adjust to data cadence)
  • LSTM layers: 1 to 2 stacked layers
  • Hidden units: 64 to 256 per layer
  • Dropout: 0.2 to 0.5
  • Activation: tanh in LSTM cells; ReLU or linear in dense heads
  • Optimizer: Adam
  • Learning rate: 0.001 to 0.0001 with a scheduler
  • Loss: Mean Squared Error (MSE)
  • Batch size: 32 to 128
  • Epochs: early stopping based on validation loss
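
An illustrative Keras model drawn from these ranges; the layer sizes, dropout rate, and learning rate are assumptions, not the repository's fixed configuration:

```python
import tensorflow as tf

def build_lstm(look_back: int, n_features: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(look_back, n_features)),
        tf.keras.layers.LSTM(128, return_sequences=True),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1),  # linear head for the next-step price
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="mse")
    return model
```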

Why LSTM

  • LSTMs handle long-range dependencies better than simple feedforward nets in time series.
  • They can learn patterns across days or hours that influence future prices.
  • A well-structured LSTM with proper regularization can generalize to unseen periods.

Regularization and stability

  • Use dropout between layers to prevent overfitting.
  • Apply L2 weight regularization if necessary.
  • Normalize input features to stabilize learning.
  • Early stopping based on a validation set prevents overtraining.

Training and evaluation

Data split

  • Training set: historical data up to a chosen cutoff.
  • Validation set: a recent slice to monitor generalization.
  • Test set: the final portion used for reported metrics and robust evaluation (a chronological split is sketched below).
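
A minimal chronological split sketch (the 80/10/10 proportions are illustrative); slicing by position, never by shuffling, keeps future data out of training:

```python
# X, y are the windowed sequences and targets, ordered by time
n = len(X)
train_end, val_end = int(n * 0.8), int(n * 0.9)

X_train, y_train = X[:train_end], y[:train_end]
X_val, y_val = X[train_end:val_end], y[train_end:val_end]
X_test, y_test = X[val_end:], y[val_end:]
```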

Evaluation metrics

  • RMSE (Root Mean Squared Error)
  • MAE (Mean Absolute Error)
  • MAPE (Mean Absolute Percentage Error)
  • R-squared as a supplementary measure of fit (all four metrics are computed in the sketch below)
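
A small helper computing all four metrics with NumPy and scikit-learn; the function name is illustrative:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def report_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> None:
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    mae = mean_absolute_error(y_true, y_pred)
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100  # prices are nonzero
    r2 = r2_score(y_true, y_pred)
    print(f"RMSE={rmse:.2f}  MAE={mae:.2f}  MAPE={mape:.2f}%  R2={r2:.3f}")
```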

Training workflow

  • Load preprocessed sequences and targets
  • Train the LSTM model on the training set
  • Monitor validation loss and stop early if no improvement
  • Save the best model checkpoint and the associated scaler/normalizer (see the sketch below)
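
A training sketch with Keras callbacks for early stopping and checkpointing; the file paths, patience, and batch size are assumptions:

```python
import joblib
import tensorflow as tf

callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True),
    tf.keras.callbacks.ModelCheckpoint("models/best_lstm.keras",
                                       monitor="val_loss", save_best_only=True),
]
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=200, batch_size=64, callbacks=callbacks)

joblib.dump(scaler, "models/scaler.joblib")  # keep the scaler with the checkpoint
```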

Model interpretation

  • Visualize training curves to assess convergence.
  • Analyze residuals to identify systematic errors.
  • Plot predicted vs actual prices for the validation and test periods.

Prediction and aligned timestamps

Forecast horizon

  • Define a forecast horizon, such as the next 7 days or 24 hours, depending on the data cadence.
  • The model outputs a forecast for each step in the horizon.

Timestamp alignment

  • Ensure each forecast timestamp aligns with the corresponding calendar time.
  • Use the same cadence as the training data to keep comparisons meaningful.
  • Create a forecast dataframe with columns for timestamp, predicted price, and confidence proxies if available, as in the sketch below.
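
A minimal alignment sketch with pandas, assuming a daily cadence (`freq="D"`); `last_ts`, `preds`, and `horizon` are placeholders for values produced earlier in the pipeline:

```python
import pandas as pd

# last_ts: timestamp of the newest observation; preds: model output, length `horizon`
future_index = pd.date_range(start=last_ts, periods=horizon + 1, freq="D")[1:]
forecast = pd.DataFrame({"timestamp": future_index, "predicted_price": preds})
```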

Prediction workflow

  • Load the latest trained model and the data scaler
  • Prepare the most recent time window as input
  • Generate predictions for the desired horizon
  • Denormalize predictions if features were scaled
  • Save results to a CSV or display in a notebook for quick inspection (the flow is sketched below)
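
A condensed inference sketch for a single-step forecast; the checkpoint and scaler paths and the `latest_df` input are assumptions that mirror the training sketch above (multi-step horizons would loop, feeding each prediction back into the window):

```python
import joblib
import numpy as np
import pandas as pd
import tensorflow as tf

model = tf.keras.models.load_model("models/best_lstm.keras")
scaler = joblib.load("models/scaler.joblib")

window = scaler.transform(latest_df[["close"]].tail(60))  # most recent look-back window
pred_scaled = model.predict(window[np.newaxis, ...])      # shape (1, 1)
pred = scaler.inverse_transform(pred_scaled).ravel()      # back to price units

next_ts = latest_df.index[-1] + pd.Timedelta(days=1)      # assumes daily cadence
pd.DataFrame({"timestamp": [next_ts], "predicted_price": pred}).to_csv(
    "forecast.csv", index=False)
```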

Expected outputs

  • A table of predicted prices aligned to the forecast timestamps
  • Associated uncertainties or confidence intervals if implemented
  • Plots showing forecast trajectories compared to real prices

Reproducibility and experiments

Environment and dependencies

  • A dedicated virtual environment is recommended
  • Pin versions in a requirements file to ensure reproducibility
  • Document any optional libraries used for visualization or experimentation

Experiment tracking

  • Record hyperparameters, data splits, random seeds, and results
  • Store model artifacts, plots, and evaluation metrics in a structured folder
  • Use a simple naming convention for experiments, such as exp-YYYYMMDD-HHMM (see the one-liner below)
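
For instance, a timestamped experiment name can be generated with the standard library:

```python
from datetime import datetime

exp_name = datetime.now().strftime("exp-%Y%m%d-%H%M")  # e.g. exp-20250101-0930
```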

Notebooks and scripts

  • Notebooks provide a narrative for the preprocessing, training, evaluation, and prediction steps
  • Scripts enable automated runs for batch experiments
  • Keep notebooks clean and well-commented for easy reuse

Performance and benchmarking

  • Compare against simple baselines like moving averages or ARIMA to quantify gains (a naive-baseline comparison is sketched after this list)
  • Report improvements in RMSE and MAE relative to the baselines
  • Show how the model handles different market regimes (bull, bear, sideways)
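
A sketch of a naive last-value baseline for such a comparison; `test_prices` and `model_rmse` are placeholders for arrays and metrics produced by your own evaluation:

```python
import numpy as np

# Naive baseline: tomorrow's price equals today's price
naive_pred, naive_true = test_prices[:-1], test_prices[1:]
naive_rmse = np.sqrt(np.mean((naive_true - naive_pred) ** 2))
print(f"naive RMSE={naive_rmse:.2f} vs model RMSE={model_rmse:.2f}")
```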

Production considerations

  • If you plan to deploy, package the model and a small runtime to generate forecasts
  • Ensure the deployment can fetch fresh data, preprocess it, and produce up-to-date predictions
  • Include a minimal API or a command-line interface for integration with dashboards

Project structure

Directory overview

  • data/: raw and preprocessed data; manifests and sample datasets
  • notebooks/: Jupyter notebooks for exploration and experiments
  • scripts/: training, evaluation, and prediction scripts
  • models/: saved model checkpoints and configuration files
  • src/: source code modules for data handling, model architecture, and utilities
  • docs/: documentation and usage guides
  • tests/: unit tests and data integrity checks
  • examples/: small runnable examples and demos
  • artifacts/: saved artifacts from runs, such as scalers and encoders

Key files

  • requirements.txt: pinned dependencies for consistent environments
  • README.md: this document
  • config.yaml: experiment-level configuration for reproducibility
  • data_preprocessing.py: module for feature engineering and sequence creation
  • model.py: LSTM model definition
  • train.py: training loop with evaluation logic
  • predict.py: inference pipeline for generating forecasts
  • visualize.py: plotting utilities for results and diagnostics
  • utils.py: common helper functions

Data sources and licensing

Public data sources

  • Historical Bitcoin price data are pulled from public APIs and data providers.
  • If you extend to other assets, ensure you follow the respective data terms and licensing.

Licensing

  • The project uses an open source license that permits usage, modification, and redistribution.
  • Always respect data provider licenses and attribution requirements when using external data.

Usage patterns and examples

Training a baseline model

  • Start by preparing the data with the preprocessing module.
  • Run the training script to fit the LSTM on the prepared sequences.
  • Evaluate the model on the validation and test sets.
  • Save the best-performing model and its preprocessing state.

Generating predictions

  • Use the prediction script to feed the latest data into the trained model.
  • Retrieve forecasted prices for the desired horizon.
  • Visualize the forecast with actual prices to assess alignment and timing.

Example commands

  • Prepare data and train
    • python scripts/train.py --config config.yaml --mode train
  • Evaluate model
    • python scripts/train.py --config config.yaml --mode evaluate
  • Run prediction
    • python scripts/predict.py --config config.yaml --horizon 168

Experimentation and tuning

  • Adjust the look-back window to examine how the model performs with longer or shorter histories.
  • Test different numbers of LSTM units and layers to balance capacity with computational cost.
  • Try alternative optimizers or learning rate schedules to improve convergence.
  • Add more features such as sentiment indicators if you have access to reliable sources.

Visualization and interpretation

Plotting forecasts

  • Compare forecast curves with actual price trajectories.
  • Display error metrics for each horizon to understand where the model struggles.
  • Create residual plots to spot systematic errors.

Interpreting model behavior

  • Use attention-like mechanisms or feature importance approximations to understand which inputs drive forecasts.
  • Examine which time steps in the look-back window contribute most to the prediction.

Troubleshooting

Common issues and fixes

  • Data alignment issues: ensure timestamps are consistent across features and target values.
  • Training instability: try smaller learning rates, adjust batch size, or enforce gradient clipping (see the snippet after this list).
  • Overfitting: increase dropout, reduce model complexity, add more data.
  • Memory constraints: reduce look-back window, decrease batch size, use gradient accumulation.
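
For gradient clipping with Keras, passing `clipnorm` to the optimizer is one option (the values here are illustrative):

```python
import tensorflow as tf

# Clip the global gradient norm to tame exploding gradients
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4, clipnorm=1.0)
model.compile(optimizer=optimizer, loss="mse")
```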

Validation tips

  • Always verify the data splits and ensure no leakage from the future into the training set.
  • Plot training and validation losses to detect overfitting early.
  • Confirm that the evaluation metrics reflect real predictive performance, not just numerical accuracy.

Release assets and how to obtain them

Release process overview

  • The project maintains release assets on the official GitHub Releases page.
  • You can download prebuilt artifacts, such as packaged models or runnable notebooks, from this page.

Note on release assets

  • From the Releases page, download the latest release asset and run it; the packaged form includes the files you need to run the project.

Where to find releases

  • If you want to reproduce experiments quickly, use the prebuilt artifact if available.
  • If you prefer to run from source, clone the repository and run the notebooks or scripts directly.

Roadmap

Upcoming enhancements

  • Extend the model to multi-asset forecasting by adding parallel streams for different cryptocurrencies.
  • Experiment with hybrid models that combine LSTM with attention mechanisms for improved focus on recent events.
  • Add more robust evaluation with backtesting over multiple rolling horizons.
  • Integrate with a dashboard to display live forecasts and historical comparisons.

Possible improvements

  • Include governance features for parameter choices and experiment tracking.
  • Add automated data ingestion pipelines that refresh datasets daily.
  • Improve plotting with interactive charts to inspect predictions across multiple time frames.

Community and contributions

Contribution guidelines

  • Open an issue to discuss major changes before implementing them.
  • Start with small pull requests that fix bugs or add tests.
  • Keep a clear, descriptive commit message for every change.

Code style and quality

  • Follow the existing code style and add type hints where helpful.
  • Write tests for new modules and ensure existing tests pass.
  • Document new features with examples in the README and notebook comments.

Code of conduct

  • Be respectful and constructive in all interactions.
  • Share credit for ideas and contributions fairly.
  • Avoid personal attacks and focus on the work.

Security and safety

Data handling

  • Treat data responsibly. Do not misuse API keys or private data.
  • Keep sensitive credentials out of the repository; use environment variables or secret managers.

Model safety

  • Do not deploy the model without proper validation and monitoring.
  • Consider potential misuses of forecast data and implement appropriate safeguards.

Related projects and inspiration

Similar work

  • Time series forecasting with LSTMs in finance
  • Crypto price prediction using deep learning
  • Sequence modeling for financial data

Inspiration sources

  • Public datasets and benchmarks for financial time series
  • Open research on LSTM architectures and sequence modeling
  • Best practices for reproducible data science

Data sources and attribution

Public data provenance

  • Price data is sourced from credible public APIs and exchanges.
  • Any third-party indicators included in the feature set are described with their sources in the notebooks.

Attribution

  • Acknowledge data providers in the documentation and in any published results.
  • Cite relevant papers or resources if you publish research based on this project.

Screenshots and visuals

Charts and graphs

  • Include plots comparing predicted and actual prices over time.
  • Show error metrics across horizons to illustrate performance.
  • Provide residual plots to help readers understand where the model struggles.

Illustrative images

  • Bitcoin logos and crypto-inspired visuals help convey the theme.
  • Consider including a simple chart image generated from the notebook for quick reference.

Frequently asked questions (FAQ)

What is this project for?

  • To forecast Bitcoin prices using an LSTM model with a clear, reproducible workflow.

What data is used?

  • Historical price data and derived features created in the preprocessing step.

What should I expect from the model?

  • A forecast for future prices with timestamps aligned to the forecast horizon.

Do I need to train the model myself?

  • You can train it from scratch or use the release artifact if available. See the Releases for details.

How do I reproduce the results?

  • Follow the quick start steps, then run the provided notebooks and scripts to reproduce the experiments.

What if I want to adapt it to another asset?

  • Add the asset data and adjust the preprocessing steps to handle new price streams and features. The model architecture can stay similar.

Changelog (sample)

v0.x.y — Initial public release

  • Baseline LSTM model for Bitcoin price forecasting
  • Data preprocessing pipeline with alignment and normalization
  • Training and evaluation workflow
  • Prediction generation with aligned timestamps

v0.x.z — Minor improvements

  • Added more features (rolling statistics, day-of-week)
  • Improved data validation and logging
  • Documentation enhancements and examples

v0.x.z+1 — Experimental tweaks

  • Tuning of hyperparameters for stability
  • Added early stopping and model checkpointing
  • Visualization improvements and better error metrics

Licensing

This project is open source and designed for sharing knowledge and enabling reproducible research. You can use, modify, and distribute the code under the terms of the project’s license. Always respect the licenses of any included third-party data sources.

Appendix: quick references

  • Data preprocessing flow

    • Load data
    • Align timestamps
    • Create features
    • Normalize features
    • Build input sequences and targets
    • Save preprocessed data for training
  • Model workflow

    • Define LSTM architecture
    • Compile with optimizer and loss
    • Train with early stopping
    • Save best model and scaler
    • Run predictions on new data

Appendix: code snippets

  • Example: training a model

    • import necessary modules
    • prepare data
    • instantiate model
    • compile and fit
    • save the best model
  • Example: predicting future prices

    • load model and scaler
    • prepare the latest data window
    • generate forecast for horizon
    • save results to CSV (both outlines condense into the sketch below)
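
Condensed into one hedged sketch, reusing the illustrative names (`build_lstm`, `callbacks`, `scaler`, `latest_df`) from the sections above:

```python
# Train: fit on windowed sequences, keep the best checkpoint
model = build_lstm(look_back=60, n_features=1)
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=200, batch_size=64, callbacks=callbacks)

# Predict: scale the newest window, forecast, and denormalize
window = scaler.transform(latest_df[["close"]].tail(60))
pred = scaler.inverse_transform(model.predict(window[np.newaxis, ...]))
```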

Appendix: project governance

  • Decision making is documented in issues and pull requests.
  • Major changes require discussion and a clear consensus.
  • Changes are versioned through releases with notes.

Appendix: visual assets and branding

  • Logo and brand colors are used consistently across notebooks and dashboards.
  • Visuals use simple color palettes to keep focus on data.

Appendix: deployment notes

  • For local runs, a lightweight setup suffices.
  • For production, consider containerization for reproducible environments.
  • Ensure that data pipelines run on a schedule and that logs are preserved.

Appendix: data and model lifecycle

  • Data ingestion: fetches new data periodically, validates integrity.
  • Preprocessing: standardizes data for modeling.
  • Training: updates models with fresh data to stay relevant.
  • Evaluation: tracks performance over time and across regimes.
  • Prediction: generates forward-looking forecasts for dashboards or alerts.

Appendix: additional usage notes

  • You can customize the forecast horizon as needed.
  • You can switch to a different cadence (hourly, daily) with appropriate feature engineering.
  • You can swap out the model with a different architecture while preserving the pipeline.

Appendix: acknowledgments

  • Thank contributors who made code, tests, and documentation possible.
  • Acknowledge data providers and any open source libraries used.
