Note: This template uses Poetry. If you prefer pip, use the pip branch instead.

This repository is a template for data science projects. It follows the project structure I frequently use for my own data science projects.

Tools used in this project:
- Poetry: Dependency management - article
- Hydra: Manage configuration files - article
- pre-commit plugins: Automate code review and formatting - article
- DVC: Data version control - article
- pdoc: Automatically create API documentation for your project
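Hydra treats each subdirectory of `config/` (here `model/` and `process/`) as a config group and each YAML file inside it as one selectable option. A run can then swap options from the command line, e.g. `python src/process.py process=process2`, assuming the script is decorated with `@hydra.main` and points at `config/`. The sketch below is a hypothetical, stdlib-only helper (`list_config_options` is not part of Hydra or of this template) that lists the options implied by that directory convention, just to make it concrete.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def list_config_options(config_dir: str | Path) -> dict[str, list[str]]:
    """Map each config group (subdirectory) to its option names (YAML file stems).

    Hypothetical helper: it only mirrors Hydra's directory convention
    and does not import or call Hydra itself.
    """
    options: dict[str, list[str]] = {}
    for group in sorted(Path(config_dir).iterdir()):
        if group.is_dir():
            # Each .yaml file inside a group directory is one selectable option
            options[group.name] = sorted(p.stem for p in group.glob("*.yaml"))
    return options


def build_demo_config(root: Path) -> Path:
    """Recreate the config/ layout of this template (empty files) under root."""
    config = root / "config"
    groups = {"model": ["model1", "model2"], "process": ["process1", "process2"]}
    for group, names in groups.items():
        (config / group).mkdir(parents=True)
        for name in names:
            (config / group / f"{name}.yaml").write_text("")
    # main.yaml sits at the top level, so it is not treated as a group
    (config / "main.yaml").write_text("")
    return config


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config = build_demo_config(Path(tmp))
        print(list_config_options(config))
        # {'model': ['model1', 'model2'], 'process': ['process1', 'process2']}
```

In a project generated from the template you would point the helper at the real `config/` directory instead of the scratch copy.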
```
.
├── config
│   ├── main.yaml               # Main configuration file
│   ├── model                   # Configurations for training model
│   │   ├── model1.yaml         # First variation of parameters to train model
│   │   └── model2.yaml         # Second variation of parameters to train model
│   └── process                 # Configurations for processing data
│       ├── process1.yaml       # First variation of parameters to process data
│       └── process2.yaml       # Second variation of parameters to process data
├── data
│   ├── final                   # data after training the model
│   ├── processed               # data after processing
│   ├── raw                     # raw data
│   └── raw.dvc                 # DVC file of data/raw
├── docs                        # documentation for your project
├── dvc.yaml                    # DVC pipeline
├── .flake8                     # configuration for flake8 - a Python linter
├── .gitignore                  # files that Git should not track
├── Makefile                    # store useful commands to set up the environment
├── models                      # store models
├── notebooks                   # store notebooks
├── .pre-commit-config.yaml     # configurations for pre-commit
├── pyproject.toml              # dependencies for Poetry
├── README.md                   # describe your project
├── src                         # store source code
│   ├── __init__.py             # make src a Python package
│   ├── process.py              # process data before training model
│   └── train_model.py          # train model
└── tests                       # store tests
    ├── __init__.py             # make tests a Python package
    ├── test_process.py         # test functions for process.py
    └── test_train_model.py     # test functions for train_model.py
```

Install Cookiecutter:
```bash
pip install cookiecutter
```

Create a project based on the template:
```bash
cookiecutter https://github.com/khuyentran1401/data-science-template
```

Find a detailed explanation of this template here.
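In this layout each module in `src/` has a matching test file in `tests/` (for example `process.py` and `test_process.py`). As a minimal sketch of that pairing, assuming a hypothetical `drop_missing_rows` function that is not part of the template, both halves are shown in one block so it runs on its own. In a generated project the test file would typically import the function with `from src.process import drop_missing_rows` and be run with `pytest tests/` from the project root.

```python
from __future__ import annotations

import math


# --- src/process.py (hypothetical example, not the template's actual code) ---
def _is_missing(value) -> bool:
    # Treat None and float NaN as missing values
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing_rows(rows: list[dict], columns: list[str]) -> list[dict]:
    """Keep only rows where every listed column is present and not missing."""
    return [row for row in rows if not any(_is_missing(row.get(col)) for col in columns)]


# --- tests/test_process.py (pytest collects functions named test_*) ---
def test_drop_missing_rows_removes_none_and_nan():
    rows = [
        {"age": 30, "income": 1000.0},
        {"age": None, "income": 2000.0},
        {"age": 40, "income": float("nan")},
    ]
    assert drop_missing_rows(rows, ["age", "income"]) == [{"age": 30, "income": 1000.0}]


def test_drop_missing_rows_ignores_unlisted_columns():
    rows = [{"age": 30, "note": None}]
    assert drop_missing_rows(rows, ["age"]) == rows


if __name__ == "__main__":
    # Call the tests directly so the sketch runs without pytest installed
    test_drop_missing_rows_removes_none_and_nan()
    test_drop_missing_rows_ignores_unlisted_columns()
    print("all tests passed")
```

Keeping one test file per source module, as the tree does, makes it obvious where to add a test when a function in `src/` changes.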