README.md: 73 additions & 87 deletions
@@ -21,7 +21,6 @@ You can use this package as part of your MLOps toolkit or platform (e.g., Model
 - [Tools](#tools)
 - [Automation](#automation-1)
 - [Commit: Pre-Commit](#commit-pre-commit)
-- [Release: Bump2version](#release-bump2version)
 - [Tasks: PyInvoke](#tasks-pyinvoke)
 - [CLI](#cli)
 - [Parser: Argparse!](#parser-argparse)
@@ -95,7 +94,7 @@ This section details the requirements, actions, and next steps to kickstart your
 ## Prerequisites
 
 - [Python>=3.12](https://www.python.org/downloads/) (to benefit from [the latest features and performance improvements](https://docs.python.org/3/whatsnew/3.12.html))
-- [Poetry>=1.5.1](https://python-poetry.org/) (to initialize the project [virtual environment](https://docs.python.org/3/library/venv.html) and its dependencies)
+- [Poetry>=1.7.1](https://python-poetry.org/) (to initialize the project [virtual environment](https://docs.python.org/3/library/venv.html) and its dependencies)
 
 ## Installation
 
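With the minimums now at Python>=3.12 and Poetry>=1.7.1, a local setup can be checked before installing. A minimal sketch, assuming the version string looks like the output of `python --version` or `poetry --version` (e.g. `Poetry (version 1.7.1)`); `parse_version`, `meets_minimum`, `MIN_PYTHON`, and `MIN_POETRY` are hypothetical helpers, not part of the package:

```python
import re
import sys
from typing import Tuple

# Minimum versions listed under Prerequisites.
MIN_PYTHON: Tuple[int, ...] = (3, 12)
MIN_POETRY: Tuple[int, ...] = (1, 7, 1)


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first dotted version (e.g. '1.7.1') from a tool's output."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(version: Tuple[int, ...], minimum: Tuple[int, ...]) -> bool:
    """Tuples compare element by element, so (1, 10, 0) >= (1, 7, 1) is True."""
    return tuple(version) >= tuple(minimum)


if __name__ == "__main__":
    # Poetry can be checked the same way, e.g.
    # meets_minimum(parse_version("Poetry (version 1.7.1)"), MIN_POETRY)
    print("python ok:", meets_minimum(sys.version_info[:2], MIN_PYTHON))
```

Comparing integer tuples rather than raw strings avoids the trap where `"1.10"` sorts before `"1.7"`.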
@@ -134,18 +133,20 @@ You can add or edit config files in the `confs/` folder to change the program be
 job:
   KIND: TrainingJob
   inputs:
-    KIND: ParquetDataset
+    KIND: ParquetReader
     path: data/inputs.parquet
-  target:
-    KIND: ParquetDataset
-    path: data/target.parquet
-  output_model: outputs/model.joblib
+  targets:
+    KIND: ParquetReader
+    path: data/targets.parquet
+  serializer:
+    KIND: JoblibModelSerializer
+    path: models/model.joblib
 ```
 
 This config file instructs the program to start a `TrainingJob` with 3 parameters:
 - `inputs`: dataset that contains the model inputs
-- `target`: dataset that contains the model target
-- `output_model`: output path to the model artifact
+- `targets`: dataset that contains the model target
+- `serializer`: output path to the model artifact
 
 You can find all the parameters of your program in the `src/[package]/jobs.py`.
 
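The updated config describes a `TrainingJob` composed of two readers and a serializer. The sketch below illustrates that composition under stated assumptions: `MemoryReader`, `JsonModelSerializer`, and `MeanModel` are hypothetical stand-ins (for `ParquetReader`, `JoblibModelSerializer`, and a real estimator) chosen so the example runs with the standard library only. The project's actual job is defined in `src/[package]/jobs.py` and may look quite different.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Any, List, Protocol


class Reader(Protocol):
    def read(self) -> List[float]: ...


class Serializer(Protocol):
    def save(self, model: Any) -> None: ...


@dataclass
class MemoryReader:
    """Hypothetical stand-in for ParquetReader: serves fixed values."""

    values: List[float]

    def read(self) -> List[float]:
        return list(self.values)


@dataclass
class JsonModelSerializer:
    """Hypothetical stand-in for JoblibModelSerializer: writes JSON to `path`."""

    path: str

    def save(self, model: Any) -> None:
        target = Path(self.path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(asdict(model)))


@dataclass
class MeanModel:
    """Toy model that always predicts the mean of the training targets."""

    mean: float = 0.0

    def fit(self, inputs: List[float], targets: List[float]) -> "MeanModel":
        if len(inputs) != len(targets):
            raise ValueError("inputs and targets must have the same length")
        self.mean = sum(targets) / len(targets)
        return self


@dataclass
class TrainingJob:
    """Knows only its three parameters, not where the data or model live."""

    inputs: Reader
    targets: Reader
    serializer: Serializer

    def run(self) -> MeanModel:
        model = MeanModel().fit(self.inputs.read(), self.targets.read())
        self.serializer.save(model)
        return model


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        job = TrainingJob(
            inputs=MemoryReader([1.0, 2.0, 3.0]),
            targets=MemoryReader([10.0, 20.0, 30.0]),
            serializer=JsonModelSerializer(f"{tmp}/models/model.json"),
        )
        print(job.run())  # MeanModel(mean=20.0)
```

In this sketch the job only calls `read()` and `save()`, so changing the storage format or the serializer is a config change, not a code change in `TrainingJob`.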
@@ -156,7 +157,6 @@ The project code can be executed with poetry during your development:
 ```bash
 $ poetry run [package] confs/tuning.yaml
 $ poetry run [package] confs/training.yaml
-$ poetry run [package] confs/transition.yaml
 $ poetry run [package] confs/inference.yaml
 ```
 
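Each command above passes one YAML file as a positional argument. A hypothetical entry point with that shape, using `argparse` (the parser named in the README's table of contents); `build_parser` and `main` are illustrative names, and loading and running the YAML is left as a placeholder because the real loader is not part of this diff:

```python
import argparse
import sys
from pathlib import Path
from typing import List, Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Accept one or more config files, as in `[package] confs/training.yaml`."""
    parser = argparse.ArgumentParser(description="Run jobs described by config files.")
    parser.add_argument("files", nargs="+", help="config files, e.g. confs/training.yaml")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Validate the config paths and return a process exit code."""
    args = build_parser().parse_args(argv)
    missing: List[str] = [path for path in args.files if not Path(path).is_file()]
    if missing:
        print(f"config file(s) not found: {', '.join(missing)}", file=sys.stderr)
        return 1
    for path in args.files:
        # Placeholder: a real entry point would load the YAML here
        # (e.g. with a YAML library) and run the job it describes.
        print(f"would run the job described in {path}")
    return 0


# Hypothetical wiring: the package's entry point would call sys.exit(main()).
```

Returning an exit code instead of calling `sys.exit` inside `main` keeps the function easy to test.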
@@ -166,7 +166,7 @@ In production, you can build, ship, and run the project as a Python package:
 poetry build
 poetry publish # optional
 python -m pip install [package]
-[package] confs/transition.yaml
+[package] confs/inference.yaml
 ```
 
 You can also install and use this package as a library for another AI/ML project:
@@ -187,49 +187,47 @@ You can invoke the actions from the [command-line](https://www.pyinvoke.org/) or
 
 ```bash
 # execute the project DAG
-$ inv dag
+$ inv dags
 # create a code archive
-$ inv package
+$ inv packages
 # list other actions
 $ inv --list
 ```
 
 **Available tasks**:
-- `bump.release (bump)`: Bump a release: major, minor, patch.
-- `bump.version`: Bump to the new version.
-- `check.all (check)`: Run all check tasks.
-- `check.code`: Check the codes with pylint.
-- `check.coverage`: Check the coverage with coverage.
-- `check.format`: Check the formats with isort and black.
-- `check.poetry`: Check poetry config files.
-- `check.test`: Check the tests with pytest.
-- `check.type`: Check the types with mypy.
-- `clean.all (clean)`: Run all clean tasks.
-- `clean.coverage`: Clean coverage files.
-- `clean.dist`: Clean the dist folder.
-- `clean.docs`: Clean the docs folder.
-- `clean.install`: Clean the install.
-- `clean.mypy`: Clean the mypy folder.
-- `clean.outputs`: Clean the outputs folder.
-- `clean.pytest`: Clean the pytest folder.
-- `clean.python`: Clean python files and folders.
-- `clean.reset`: Reset the project state.
-- `dag.all (dag)`: Run all DAG tasks.
-- `dag.job`: Run the project for the given job name.
-- `docker.all (docker)`: Run all docker tasks.
-- `docker.build`: Build the docker image.
-- `docker.run`: Run the docker image.
+- `checks.all (checks)`: Run all check tasks.
+- `checks.code`: Check the codes with pylint.
+- `checks.coverage`: Check the coverage with coverage.
+- `checks.format`: Check the formats with isort and black.
+- `checks.poetry`: Check poetry config files.
+- `checks.test`: Check the tests with pytest.
+- `checks.type`: Check the types with mypy.
+- `cleans.all (cleans)`: Run all clean tasks.
+- `cleans.coverage`: Clean coverage files.
+- `cleans.dist`: Clean the dist folder.
+- `cleans.docs`: Clean the docs folder.
+- `cleans.install`: Clean the install.
+- `cleans.mypy`: Clean the mypy folder.
+- `cleans.outputs`: Clean the outputs folder.
+- `cleans.pytest`: Clean the pytest folder.
+- `cleans.python`: Clean python files and folders.
+- `cleans.reset`: Reset the project state.
+- `containers.all (containers)`: Run all container tasks.
+- `containers.build`: Build the container image.
+- `containers.run`: Run the container image.
+- `dags.all (dags)`: Run all DAG tasks.
+- `dags.job`: Run the project for the given job name.
 - `docs.all (docs)`: Run all docs tasks.
 - `docs.api`: Document the API with pdoc.
 - `docs.serve`: Document the API with pdoc.
-- `format.all (format)`: Run all format tasks.
-- `format.imports`: Format code imports with isort.
-- `format.sources`: Format code sources with black.
-- `install.all (install)`: Run all install tasks.
-- `install.poetry`: Run poetry install.
-- `install.pre-commit`: Run pre-commit install.
-- `package.all (package)`: Run all package tasks.
-- `package.build`: Build a wheel package.
+- `formats.all (formats)`: Run all format tasks.
+- `formats.imports`: Format code imports with isort.
+- `formats.sources`: Format code sources with black.
+- `installs.all (installs)`: Run all install tasks.
+- `installs.poetry`: Run poetry install.
+- `installs.pre-commit`: Run pre-commit install.
+- `packages.all (packages)`: Run all package tasks.
+- `packages.build`: Build a wheel package.
 
 # Tools
 
@@ -252,17 +250,6 @@ Pre-defined actions to automate your project development.
 - **Alternatives**:
   - [Git Hooks](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks): less convenient to use
-  - Suited for [SemVer versioning](https://semver.org/)
-- **Limitations**:
-  - https://xkcd.com/1319/
-- **Alternatives**:
-  - Manual edits: less convenient, risk of forgetting a file
-
 ### Tasks: [PyInvoke](https://www.pyinvoke.org/)
 
 - **Motivations**:
@@ -636,17 +623,17 @@ This sections gives some tips and tricks to enrich the develop experience.
 
 **You should decouple the pointer to your data from how to access it.**
 
-In your code, you can refer to your dataset with a tag (e.g., `inputs`, `target`).
+In your code, you can refer to your dataset with a tag (e.g., `inputs`, `targets`).
 
 This tag can then be associated to an reader/writer implementation in a configuration file:
 
 ```yaml
-inputs:
-  KIND: ParquetDataset
-  path: data/inputs.parquet
-target:
-  KIND: ParquetDataset
-  path: data/target.parquet
+inputs:
+  KIND: ParquetReader
+  path: data/inputs.parquet
+targets:
+  KIND: ParquetReader
+  path: data/targets.parquet
 ```
 
 In this package, the implementation are described in `src/[package]/datasets.py` and selected by `KIND`.
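The renamed `ParquetReader` entries are resolved through their `KIND` value. A minimal sketch of that lookup, assuming a plain class registry keyed by class name; `READERS`, `register`, `build_reader`, and `CsvReader` are hypothetical, the project's actual mechanism in `src/[package]/datasets.py` may differ, and pandas is only needed if `ParquetReader.read` is actually called:

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Registry: KIND value -> implementation class.
READERS: Dict[str, type] = {}


def register(cls: type) -> type:
    """Record a reader class under its class name, the value used for KIND."""
    READERS[cls.__name__] = cls
    return cls


@register
@dataclass
class ParquetReader:
    """Knows how to load a Parquet file; calling code only ever sees the tag."""

    path: str

    def read(self) -> Any:
        import pandas as pd  # lazy import: only needed when data is loaded

        return pd.read_parquet(self.path)


@register
@dataclass
class CsvReader:
    """Hypothetical alternative: swapping it in only changes KIND in the config."""

    path: str

    def read(self) -> List[Dict[str, str]]:
        import csv

        with open(self.path, newline="") as file:
            return list(csv.DictReader(file))


def build_reader(config: Dict[str, Any]) -> Any:
    """Turn {'KIND': name, **params} into a reader instance."""
    params = dict(config)  # copy so the caller's config is left untouched
    kind = params.pop("KIND")
    if kind not in READERS:
        raise ValueError(f"unknown reader KIND: {kind!r}")
    return READERS[kind](**params)


# Tags mapped to reader configs, mirroring the YAML in this hunk.
CONFIG = {
    "inputs": {"KIND": "ParquetReader", "path": "data/inputs.parquet"},
    "targets": {"KIND": "ParquetReader", "path": "data/targets.parquet"},
}
readers = {tag: build_reader(conf) for tag, conf in CONFIG.items()}
# readers["inputs"].read() would call pandas.read_parquet("data/inputs.parquet")
```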
@@ -681,7 +668,7 @@ This package provides a simple deterministic strategy implemented in `src/[packa
 
 A DAG can express the dependencies between steps while keeping the individual step independent.
 
-This package provides a simple DAG example in `tasks/dag.py`. This approach is based on [PyInvoke](https://www.pyinvoke.org/).
+This package provides a simple DAG example in `tasks/dags.py`. This approach is based on [PyInvoke](https://www.pyinvoke.org/).
 
 In production, we recommend to use a scalable system such as [Airflow](https://airflow.apache.org/), [Dagster](https://dagster.io/), [Prefect](https://www.prefect.io/), [Metaflow](https://metaflow.org/), or [ZenML](https://zenml.io/).
 
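The DAG idea in this hunk can be illustrated without PyInvoke. The sketch below swaps in the standard library's `graphlib.TopologicalSorter` to order steps by their dependencies; `run_dag`, `make_step`, the step names, and the chain tuning, then training, then inference are assumptions for illustration, not the contents of `tasks/dags.py`:

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_dag(steps: Dict[str, Callable[[], None]], deps: Dict[str, Set[str]]) -> List[str]:
    """Run every step once, each after its dependencies; return the order used."""
    unknown = set().union(*deps.values()) - set(steps)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    graph = {name: deps.get(name, set()) for name in steps}
    order = list(TopologicalSorter(graph).static_order())  # raises CycleError on cycles
    for name in order:
        steps[name]()
    return order


log: List[str] = []


def make_step(name: str) -> Callable[[], None]:
    """Each step is independent: it knows nothing about the other steps."""

    def step() -> None:
        # Placeholder for running e.g. `poetry run [package] confs/<name>.yaml`.
        log.append(name)

    return step


STEPS = {name: make_step(name) for name in ("tuning", "training", "inference")}
DEPS = {"training": {"tuning"}, "inference": {"training"}}  # assumed dependencies
order = run_dag(STEPS, DEPS)
print(order)  # ['tuning', 'training', 'inference']
```

`TopologicalSorter` raises `graphlib.CycleError` if the dependencies contain a cycle; the orchestrators listed above add scheduling, retries, and distribution on top of this basic ordering.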
@@ -749,7 +736,7 @@ To build a Python package with Poetry, you simply have to type in a terminal: