
Commit 09cc69e

Release v0.2.0
1 parent cd7542a commit 09cc69e


51 files changed: 1168 additions and 1220 deletions.

.bumpversion.cfg

Lines changed: 0 additions & 6 deletions
This file was deleted.

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,8 @@ poetry.lock
 
 # Project
 /docs/*
+/models/*
+/outputs/*
 !**/.gitkeep
 
 # Python

.pre-commit-config.yaml

Lines changed: 2 additions & 2 deletions
@@ -20,8 +20,8 @@ repos:
 - repo: local
 hooks:
 - id: invoke-check
-name: invoke check
+name: invoke checks
 language: system
 pass_filenames: false
 verbose: true
-entry: invoke check
+entry: invoke checks

README.md

Lines changed: 73 additions & 87 deletions
@@ -21,7 +21,6 @@ You can use this package as part of your MLOps toolkit or platform (e.g., Model
 - [Tools](#tools)
 - [Automation](#automation-1)
 - [Commit: Pre-Commit](#commit-pre-commit)
-- [Release: Bump2version](#release-bump2version)
 - [Tasks: PyInvoke](#tasks-pyinvoke)
 - [CLI](#cli)
 - [Parser: Argparse!](#parser-argparse)
@@ -95,7 +94,7 @@ This section details the requirements, actions, and next steps to kickstart your
 ## Prerequisites
 
 - [Python>=3.12](https://www.python.org/downloads/) (to benefit from [the latest features and performance improvements](https://docs.python.org/3/whatsnew/3.12.html))
-- [Poetry>=1.5.1](https://python-poetry.org/) (to initialize the project [virtual environment](https://docs.python.org/3/library/venv.html) and its dependencies)
+- [Poetry>=1.7.1](https://python-poetry.org/) (to initialize the project [virtual environment](https://docs.python.org/3/library/venv.html) and its dependencies)
 
 ## Installation
 
@@ -134,18 +133,20 @@ You can add or edit config files in the `confs/` folder to change the program be
 job:
 KIND: TrainingJob
 inputs:
-KIND: ParquetDataset
+KIND: ParquetReader
 path: data/inputs.parquet
-target:
-KIND: ParquetDataset
-path: data/target.parquet
-output_model: outputs/model.joblib
+targets:
+KIND: ParquetReader
+path: data/targets.parquet
+serializer:
+KIND: JoblibModelSerializer
+path: models/model.joblib
 ```
 
 This config file instructs the program to start a `TrainingJob` with 3 parameters:
 - `inputs`: dataset that contains the model inputs
-- `target`: dataset that contains the model target
-- `output_model`: output path to the model artifact
+- `targets`: dataset that contains the model target
+- `serializer`: output path to the model artifact
 
 You can find all the parameters of your program in the `src/[package]/jobs.py`.
 
@@ -156,7 +157,6 @@ The project code can be executed with poetry during your development:
 ```bash
 $ poetry run [package] confs/tuning.yaml
 $ poetry run [package] confs/training.yaml
-$ poetry run [package] confs/transition.yaml
 $ poetry run [package] confs/inference.yaml
 ```
 
@@ -166,7 +166,7 @@ In production, you can build, ship, and run the project as a Python package:
 poetry build
 poetry publish # optional
 python -m pip install [package]
-[package] confs/transition.yaml
+[package] confs/inference.yaml
 ```
 
 You can also install and use this package as a library for another AI/ML project:
@@ -187,49 +187,47 @@ You can invoke the actions from the [command-line](https://www.pyinvoke.org/) or
 
 ```bash
 # execute the project DAG
-$ inv dag
+$ inv dags
 # create a code archive
-$ inv package
+$ inv packages
 # list other actions
 $ inv --list
 ```
 
 **Available tasks**:
-- `bump.release (bump)`: Bump a release: major, minor, patch.
-- `bump.version`: Bump to the new version.
-- `check.all (check)`: Run all check tasks.
-- `check.code`: Check the codes with pylint.
-- `check.coverage`: Check the coverage with coverage.
-- `check.format`: Check the formats with isort and black.
-- `check.poetry`: Check poetry config files.
-- `check.test`: Check the tests with pytest.
-- `check.type`: Check the types with mypy.
-- `clean.all (clean)`: Run all clean tasks.
-- `clean.coverage`: Clean coverage files.
-- `clean.dist`: Clean the dist folder.
-- `clean.docs`: Clean the docs folder.
-- `clean.install`: Clean the install.
-- `clean.mypy`: Clean the mypy folder.
-- `clean.outputs`: Clean the outputs folder.
-- `clean.pytest`: Clean the pytest folder.
-- `clean.python`: Clean python files and folders.
-- `clean.reset`: Reset the project state.
-- `dag.all (dag)`: Run all DAG tasks.
-- `dag.job`: Run the project for the given job name.
-- `docker.all (docker)`: Run all docker tasks.
-- `docker.build`: Build the docker image.
-- `docker.run`: Run the docker image.
+- `checks.all (checks)`: Run all check tasks.
+- `checks.code`: Check the codes with pylint.
+- `checks.coverage`: Check the coverage with coverage.
+- `checks.format`: Check the formats with isort and black.
+- `checks.poetry`: Check poetry config files.
+- `checks.test`: Check the tests with pytest.
+- `checks.type`: Check the types with mypy.
+- `cleans.all (cleans)`: Run all clean tasks.
+- `cleans.coverage`: Clean coverage files.
+- `cleans.dist`: Clean the dist folder.
+- `cleans.docs`: Clean the docs folder.
+- `cleans.install`: Clean the install.
+- `cleans.mypy`: Clean the mypy folder.
+- `cleans.outputs`: Clean the outputs folder.
+- `cleans.pytest`: Clean the pytest folder.
+- `cleans.python`: Clean python files and folders.
+- `cleans.reset`: Reset the project state.
+- `containers.all (containers)`: Run all container tasks.
+- `containers.build`: Build the container image.
+- `containers.run`: Run the container image.
+- `dags.all (dags)`: Run all DAG tasks.
+- `dags.job`: Run the project for the given job name.
 - `docs.all (docs)`: Run all docs tasks.
 - `docs.api`: Document the API with pdoc.
 - `docs.serve`: Document the API with pdoc.
-- `format.all (format)`: Run all format tasks.
-- `format.imports`: Format code imports with isort.
-- `format.sources`: Format code sources with black.
-- `install.all (install)`: Run all install tasks.
-- `install.poetry`: Run poetry install.
-- `install.pre-commit`: Run pre-commit install.
-- `package.all (package)`: Run all package tasks.
-- `package.build`: Build a wheel package.
+- `formats.all (formats)`: Run all format tasks.
+- `formats.imports`: Format code imports with isort.
+- `formats.sources`: Format code sources with black.
+- `installs.all (installs)`: Run all install tasks.
+- `installs.poetry`: Run poetry install.
+- `installs.pre-commit`: Run pre-commit install.
+- `packages.all (packages)`: Run all package tasks.
+- `packages.build`: Build a wheel package.
 
 # Tools
 
@@ -252,17 +250,6 @@ Pre-defined actions to automate your project development.
 - **Alternatives**:
 - [Git Hooks](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks): less convenient to use
 
-### Release: [Bump2version](https://github.com/c4urself/bump2version)
-
-- **Motivations**:
-- Easily change the package version
-- Can modify multiple files at once
-- Suited for [SemVer versioning](https://semver.org/)
-- **Limitations**:
-- https://xkcd.com/1319/
-- **Alternatives**:
-- Manual edits: less convenient, risk of forgetting a file
-
 ### Tasks: [PyInvoke](https://www.pyinvoke.org/)
 
 - **Motivations**:
@@ -636,17 +623,17 @@ This sections gives some tips and tricks to enrich the develop experience.
 
 **You should decouple the pointer to your data from how to access it.**
 
-In your code, you can refer to your dataset with a tag (e.g., `inputs`, `target`).
+In your code, you can refer to your dataset with a tag (e.g., `inputs`, `targets`).
 
 This tag can then be associated to an reader/writer implementation in a configuration file:
 
 ```yaml
-inputs:
-KIND: ParquetDataset
-path: data/inputs.parquet
-target:
-KIND: ParquetDataset
-path: data/target.parquet
+inputs:
+KIND: ParquetReader
+path: data/inputs.parquet
+targets:
+KIND: ParquetReader
+path: data/targets.parquet
 ```
 
 In this package, the implementation are described in `src/[package]/datasets.py` and selected by `KIND`.
@@ -681,7 +668,7 @@ This package provides a simple deterministic strategy implemented in `src/[packa
 
 A DAG can express the dependencies between steps while keeping the individual step independent.
 
-This package provides a simple DAG example in `tasks/dag.py`. This approach is based on [PyInvoke](https://www.pyinvoke.org/).
+This package provides a simple DAG example in `tasks/dags.py`. This approach is based on [PyInvoke](https://www.pyinvoke.org/).
 
 In production, we recommend to use a scalable system such as [Airflow](https://airflow.apache.org/), [Dagster](https://dagster.io/), [Prefect](https://www.prefect.io/), [Metaflow](https://metaflow.org/), or [ZenML](https://zenml.io/).
 
@@ -749,7 +736,7 @@ To build a Python package with Poetry, you simply have to type in a terminal:
 # for all poetry project
 poetry build
 # for this project only
-inv package
+inv packages
 ```
 
 ## [Software Engineering](https://en.wikipedia.org/wiki/Software_engineering)
@@ -763,11 +750,11 @@ Python provides the [typing module](https://docs.python.org/3/library/typing.htm
 ```python
 # in src/[package]/models.py
 @abc.abstractmethod
-def fit(self, inputs: schemas.Inputs, target: schemas.Target) -> "Model":
+def fit(self, inputs: schemas.Inputs, targets: schemas.Targets) -> "Model":
 """Fit the model on the given inputs and target."""
 
 @abc.abstractmethod
-def predict(self, inputs: schemas.Inputs) -> schemas.Output:
+def predict(self, inputs: schemas.Inputs) -> schemas.Outputs:
 """Generate an output with the model for the given inputs."""
 ```
 
@@ -784,8 +771,8 @@ Pydantic allows to define classes that can validate your configs during the prog
 ```python
 # in src/[package]/splitters.py
 class TrainTestSplitter(Splitter):
-ratio: float = 0.8
-shuffle: bool = True
+shuffle: bool = False # required (time sensitive)
+test_size: int | float = 24 * 30 * 2 # 2 months
 random_state: int = 42
 ```
 
@@ -802,19 +789,22 @@ Pandera supports dataframe typing for Pandas and other library like PySpark:
 ```python
 # in src/package/schemas.py
 class InputsSchema(Schema):
-alcohol: papd.Series[float] = pa.Field(gt=0, lt=100)
-malic_acid: papd.Series[float] = pa.Field(gt=0, lt=10)
-ash: papd.Series[float] = pa.Field(gt=0, lt=10)
-alcalinity_of_ash: papd.Series[float] = pa.Field(gt=0, lt=100)
-magnesium: papd.Series[float] = pa.Field(gt=0, lt=1000)
-total_phenols: papd.Series[float] = pa.Field(gt=0, lt=10)
-flavanoids: papd.Series[float] = pa.Field(gt=0, lt=10)
-nonflavanoid_phenols: papd.Series[float] = pa.Field(gt=0, lt=10)
-proanthocyanins: papd.Series[float] = pa.Field(gt=0, lt=10)
-color_intensity: papd.Series[float] = pa.Field(gt=0, lt=100)
-hue: papd.Series[float] = pa.Field(gt=0, lt=10)
-od280_od315_of_diluted_bikes: papd.Series[float] = pa.Field(gt=0, lt=10)
-proline: papd.Series[float] = pa.Field(gt=0, lt=10000)
+instant: papd.Index[papd.UInt32] = pa.Field(ge=0, check_name=True)
+dteday: papd.Series[papd.DateTime] = pa.Field()
+season: papd.Series[papd.UInt8] = pa.Field(isin=[1, 2, 3, 4])
+yr: papd.Series[papd.UInt8] = pa.Field(ge=0, le=1)
+mnth: papd.Series[papd.UInt8] = pa.Field(ge=1, le=12)
+hr: papd.Series[papd.UInt8] = pa.Field(ge=0, le=23)
+holiday: papd.Series[papd.Bool] = pa.Field()
+weekday: papd.Series[papd.UInt8] = pa.Field(ge=0, le=6)
+workingday: papd.Series[papd.Bool] = pa.Field()
+weathersit: papd.Series[papd.UInt8] = pa.Field(ge=1, le=4)
+temp: papd.Series[papd.Float16] = pa.Field(ge=0, le=1)
+atemp: papd.Series[papd.Float16] = pa.Field(ge=0, le=1)
+hum: papd.Series[papd.Float16] = pa.Field(ge=0, le=1)
+windspeed: papd.Series[papd.Float16] = pa.Field(ge=0, le=1)
+casual: papd.Series[papd.UInt32] = pa.Field(ge=0)
+registered: papd.Series[papd.UInt32] = pa.Field(ge=0)
 ```
 
 This code snippet defines the fields of the dataframe and some of its constraint.
@@ -828,15 +818,11 @@ The package encourages to type every dataframe used in `src/[package]/schemas.py
 Polymorphism combined with SOLID Principles allows to easily swap your code components.
 
 ```python
-class Dataset(abc.ABC, pdt.BaseModel):
+class Reader(abc.ABC, pdt.BaseModel):
 
 @abc.abstractmethod
 def read(self) -> pd.DataFrame:
 """Read a dataframe from a dataset."""
-
-@abc.abstractmethod
-def write(self, data: pd.DataFrame) -> None:
-"""Write a dataframe to a dataset."""
 ```
 
 This code snippet uses the [abc module](https://docs.python.org/3/library/abc.html) to define code interfaces for a dataset with a read/write method.
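
Reviewer note: the `Reader` interface in this hunk, together with the `KIND` field used in the config files, suggests a dispatch-by-name pattern. The following is a minimal standard-library sketch of that idea, not the package's actual code. `InMemoryCSVReader`, `READERS`, and `build_reader` are hypothetical names introduced for illustration; the project itself uses Pydantic models and pandas dataframes rather than dataclasses and lists of dicts.

```python
import abc
import csv
import io
from dataclasses import dataclass


class Reader(abc.ABC):
    """Stand-in for the Reader interface shown in the diff (assumption: simplified)."""

    @abc.abstractmethod
    def read(self) -> list[dict[str, str]]:
        """Read rows from a dataset."""


@dataclass
class InMemoryCSVReader(Reader):
    """Hypothetical reader that parses CSV text held in memory."""

    text: str

    def read(self) -> list[dict[str, str]]:
        return list(csv.DictReader(io.StringIO(self.text)))


# Hypothetical registry mapping a KIND value to a concrete reader class.
READERS: dict[str, type[Reader]] = {"InMemoryCSVReader": InMemoryCSVReader}


def build_reader(config: dict) -> Reader:
    """Select the reader class named by KIND and pass it the remaining fields."""
    params = dict(config)  # copy so the caller's config is not mutated
    kind = params.pop("KIND")
    return READERS[kind](**params)


reader = build_reader({"KIND": "InMemoryCSVReader", "text": "a,b\n1,2\n"})
rows = reader.read()
```

Swapping the implementation then only requires changing `KIND` in the config, which is the decoupling the README text describes.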

confs/inference.yaml

Lines changed: 4 additions & 4 deletions
@@ -5,7 +5,7 @@ job:
 path: data/inputs.parquet
 outputs:
 KIND: ParquetWriter
-path: outputs/outputs.parquet
-loader:
-KIND: JoblibLoader
-model_path: outputs/model.joblib
+path: outputs/predictions.parquet
+deserializer:
+KIND: JoblibModelDeserializer
+path: models/model.joblib

confs/training.yaml

Lines changed: 6 additions & 3 deletions
@@ -3,6 +3,9 @@ job:
 inputs:
 KIND: ParquetReader
 path: data/inputs.parquet
-saver:
-KIND: JoblibSaver
-path: outputs/model.joblib
+targets:
+KIND: ParquetReader
+path: data/targets.parquet
+serializer:
+KIND: JoblibModelSerializer
+path: models/model.joblib
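
Reviewer note: the inference and training hunks rename config keys (`loader` to `deserializer`, `saver` to `serializer`), `KIND` values (`JoblibLoader` to `JoblibModelDeserializer`, `JoblibSaver` to `JoblibModelSerializer`), and one field (`model_path` to `path`). For anyone holding configs written against the previous release, a migration could be sketched as below. `migrate_job` and the three mapping tables are hypothetical helpers, not part of this commit, and the sketch deliberately leaves path values untouched even though the commit also moves the model artifact from `outputs/` to `models/`.

```python
# Renames taken from the confs/inference.yaml and confs/training.yaml hunks.
KEY_RENAMES = {"loader": "deserializer", "saver": "serializer"}
KIND_RENAMES = {
    "JoblibLoader": "JoblibModelDeserializer",
    "JoblibSaver": "JoblibModelSerializer",
}
FIELD_RENAMES = {"model_path": "path"}


def migrate_job(job: dict) -> dict:
    """Return a copy of an old-style job config using the new names (hypothetical helper)."""
    migrated = {}
    for key, value in job.items():
        new_key = KEY_RENAMES.get(key, key)
        if isinstance(value, dict):
            # Rename nested fields, and map the KIND value when it is a known old name.
            value = {
                FIELD_RENAMES.get(field, field): (
                    KIND_RENAMES.get(item, item) if field == "KIND" else item
                )
                for field, item in value.items()
            }
        migrated[new_key] = value
    return migrated


old_training = {
    "KIND": "TrainingJob",
    "saver": {"KIND": "JoblibSaver", "path": "outputs/model.joblib"},
}
new_training = migrate_job(old_training)
```

The helper does not add the `targets` section that this release introduces; that still has to be written by hand.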

confs/tuning.yaml

Lines changed: 6 additions & 3 deletions
@@ -3,6 +3,9 @@ job:
 inputs:
 KIND: ParquetReader
 path: data/inputs.parquet
-outputs:
-KIND: CSVWriter
-path: outputs/results.csv
+targets:
+KIND: ParquetReader
+path: data/targets.parquet
+results:
+KIND: ParquetWriter
+path: outputs/results.parquet

data/outputs.parquet

-126 KB
Binary file not shown.

invoke.yaml

Lines changed: 0 additions & 2 deletions
@@ -1,8 +1,6 @@
 # https://docs.pyinvoke.org/en/latest/index.html
 
-# invoke
 run:
 echo: true
-# project
 project:
 name: bikes

models/.gitkeep

Whitespace-only changes.
