
owini/data-science-template

Data Science Cookie Cutter

Note: This template uses Poetry. If you prefer pip, use the pip branch of this repository instead.

What is this?

This repository is a template for data science projects. It provides the project structure I frequently use for my own data science projects.

Tools used in this project

Project Structure

.
├── config
│   ├── main.yaml                   # Main configuration file
│   ├── model                       # Configurations for training the model
│   │   ├── model1.yaml             # First variation of parameters to train the model
│   │   └── model2.yaml             # Second variation of parameters to train the model
│   └── process                     # Configurations for processing data
│       ├── process1.yaml           # First variation of parameters to process data
│       └── process2.yaml           # Second variation of parameters to process data
├── data
│   ├── final                       # data after training the model
│   ├── processed                   # data after processing
│   ├── raw                         # raw data
│   └── raw.dvc                     # DVC file of data/raw
├── docs                            # documentation for your project
├── dvc.yaml                        # DVC pipeline
├── .flake8                         # configuration for flake8, a Python linter
├── .gitignore                      # files that Git should not track
├── Makefile                        # store useful commands to set up the environment
├── models                          # store models
├── notebooks                       # store notebooks
├── .pre-commit-config.yaml         # configurations for pre-commit
├── pyproject.toml                  # dependencies for Poetry
├── README.md                       # describe your project
├── src                             # store source code
│   ├── __init__.py                 # make src a Python package
│   ├── process.py                  # process data before training the model
│   └── train_model.py              # train the model
└── tests                           # store tests
    ├── __init__.py                 # make tests a Python package
    ├── test_process.py             # test functions for process.py
    └── test_train_model.py         # test functions for train_model.py
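The config folder splits parameters into groups (model and process), each holding interchangeable variation files, while main.yaml chooses which variation to use; the layout resembles Hydra's config groups. The sketch below is a hypothetical, dependency-free illustration of how the pieces could fit together: plain dictionaries stand in for the YAML files (no YAML parsing, no Hydra), process_data stands in for a step in src/process.py, and test_process_data shows the kind of function tests/test_process.py might hold. Every key, value and function name here is invented for illustration.

```python
"""Hypothetical sketch: picking config variations and testing a processing step."""
from typing import Any, Optional

# Stand-ins for config/process/*.yaml and config/model/*.yaml (all keys and values are invented).
CONFIG_GROUPS: dict[str, dict[str, dict[str, Any]]] = {
    "process": {
        "process1": {"drop_columns": ["id"], "fill_value": 0},
        "process2": {"drop_columns": [], "fill_value": -1},
    },
    "model": {
        "model1": {"learning_rate": 0.1},
        "model2": {"learning_rate": 0.01},
    },
}

# Stand-in for config/main.yaml: the default variation chosen for each group.
MAIN_DEFAULTS: dict[str, str] = {"process": "process1", "model": "model1"}


def compose(overrides: Optional[dict[str, str]] = None) -> dict[str, dict[str, Any]]:
    """Build the final config by picking one variation per group.

    overrides replaces a group's default choice, e.g. {"model": "model2"}.
    """
    choices = {**MAIN_DEFAULTS, **(overrides or {})}
    return {group: dict(CONFIG_GROUPS[group][name]) for group, name in choices.items()}


def process_data(rows: list[dict[str, Any]], cfg: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical src/process.py step: drop configured columns, then fill missing values."""
    processed = []
    for row in rows:
        kept = {key: value for key, value in row.items() if key not in cfg["drop_columns"]}
        processed.append({key: cfg["fill_value"] if value is None else value for key, value in kept.items()})
    return processed


def test_process_data() -> None:
    """What a test in tests/test_process.py might look like."""
    cfg = compose()["process"]
    rows = [{"id": 1, "x": None}, {"id": 2, "x": 3}]
    assert process_data(rows, cfg) == [{"x": 0}, {"x": 3}]
```

Because the variation is chosen in one place, calling compose({"model": "model2"}) instead of compose() changes the training parameters without touching the code in src. pytest collects functions whose names start with test_ from files named test_*.py, which is why the tests folder mirrors src.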

How to use this project

Install Cookiecutter:

pip install cookiecutter

Create a project based on the template:

cookiecutter https://github.com/khuyentran1401/data-science-template
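When you run the command above, Cookiecutter prompts for the variables declared in the template's cookiecutter.json and renders the answers, using Jinja2, into directory names, file names and file contents. The standard-library sketch below imitates only that substitution step with a regular expression. It is a simplification (no Jinja2 filters, conditionals or hooks), and the variable names project_name and author_name are hypothetical, not necessarily the ones this template declares.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical answers to the prompts; the real variable names are declared in the
# template's cookiecutter.json.
CONTEXT: dict[str, str] = {"project_name": "my_project", "author_name": "Jane Doe"}

# Matches placeholders such as {{cookiecutter.project_name}} or {{ cookiecutter.author_name }}.
PLACEHOLDER = re.compile(r"\{\{\s*cookiecutter\.(\w+)\s*\}\}")


def render(text: str, context: dict[str, str]) -> str:
    """Replace each cookiecutter placeholder in text with its value from context."""
    return PLACEHOLDER.sub(lambda match: context[match.group(1)], text)


def render_tree(template_dir: Path, output_dir: Path, context: dict[str, str]) -> None:
    """Copy template_dir into output_dir, rendering directory names, file names and contents."""
    # Sorting guarantees each directory is visited before the files inside it.
    for source in sorted(template_dir.rglob("*")):
        relative = Path(render(source.relative_to(template_dir).as_posix(), context))
        target = output_dir / relative
        if source.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(render(source.read_text(), context))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        template = Path(tmp, "template", "{{cookiecutter.project_name}}")
        template.mkdir(parents=True)
        (template / "README.md").write_text("# {{ cookiecutter.project_name }} by {{ cookiecutter.author_name }}\n")
        render_tree(template.parent, Path(tmp, "out"), CONTEXT)
        print(Path(tmp, "out", "my_project", "README.md").read_text())
```

This is why a Cookiecutter template's top-level folder is typically named with a placeholder: the generated project directory takes whatever name you enter at the prompt.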

A detailed explanation of this template is available in the author's accompanying article on Medium.
