This repository is an exploration of Dask for UNICAMP's MO837 class. It is organized into branches, where each branch represents one part of the exploration.
This application executes a Dask workflow on a distributed cluster. In this section you'll learn how to set up the dependencies, install the application requirements, launch the cluster, and execute the application.
To execute the application you need to install the following dependencies:
After installing all dependencies, you can run the following command (at the root folder of this repository):
poetry install

This will install all Python requirements for this application.
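As a quick sanity check after poetry install, a small script like the one below can confirm that the core packages are visible from the Poetry environment. The module names dask and distributed are an assumption about what this repository's requirements contain; adjust the list to match pyproject.toml. Run it with poetry run python so it uses the project's virtual environment.

```python
"""Hypothetical post-install check; the module list is an assumption."""
import importlib.util
from typing import Iterable, List

# Assumed top-level requirements; edit to match this repository's pyproject.toml.
REQUIRED_MODULES = ["dask", "distributed"]


def missing_modules(names: Iterable[str]) -> List[str]:
    # find_spec() returns None when a top-level module cannot be found,
    # without actually importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

For example, saving it as check_deps.py (a hypothetical file name) and running poetry run python check_deps.py prints either the missing modules or a confirmation.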
To launch the cluster, you must first generate its Dockerfile from the HPCCM recipe with the following command:
poetry run hpccm --recipe cluster/recipe.py --format docker > cluster/Dockerfile

Now that the Dockerfile is generated, you can launch your cluster with the following command:
docker-compose up -d --scale worker=<num_workers>

Note
Replace <num_workers> with the number of workers you want. docker-compose uses this value to scale the worker service to that many containers.
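Because docker-compose starts the worker containers in the background, it can help to confirm that all <num_workers> workers have registered with the scheduler before running the application. The sketch below assumes the dask.distributed package is installed and that the scheduler is reachable at localhost:9000, the address used by the application command in this README; wait_for_cluster and summarize are hypothetical helper names, not part of this repository.

```python
"""Hypothetical helper to confirm the scaled workers joined the scheduler."""
from typing import Mapping


def summarize(nthreads: Mapping[str, int]) -> str:
    # nthreads maps each worker address to its number of threads,
    # the shape returned by dask.distributed's Client.nthreads().
    return f"{len(nthreads)} workers, {sum(nthreads.values())} threads"


def wait_for_cluster(scheduler_url: str, num_workers: int, timeout: float = 60.0) -> str:
    # Imported lazily so summarize() can be used without Dask installed.
    from dask.distributed import Client

    with Client(scheduler_url) as client:
        # Blocks until at least num_workers workers are connected and
        # raises an error if that does not happen within `timeout` seconds.
        client.wait_for_workers(n_workers=num_workers, timeout=timeout)
        return summarize(client.nthreads())
```

For instance, after docker-compose up -d --scale worker=4, calling wait_for_cluster("localhost:9000", 4) would return a summary starting with "4 workers"; the thread count depends on how the worker image is configured.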
With the cluster up and running, you can run the application with the following command:
poetry run python src/app.py --scheduler_url localhost:9000

Note
If you don't want to execute it on the cluster, simply omit the --scheduler_url argument.
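The --scheduler_url switch suggests an entry point that connects to the remote scheduler when an address is given and runs locally otherwise. The following is a minimal sketch of that pattern, not the actual contents of src/app.py: it assumes dask.distributed is installed, uses a placeholder workflow (summing squares with dask.delayed), and starts an in-process local cluster via Client() when no URL is passed, which may differ from what the real application does.

```python
"""Minimal sketch of an entry point like src/app.py (hypothetical, not the real file)."""
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run a Dask workflow.")
    # Optional flag: when omitted, the workflow runs locally instead of on the cluster.
    parser.add_argument(
        "--scheduler_url",
        default=None,
        help="Dask scheduler address, e.g. localhost:9000",
    )
    return parser.parse_args(argv)


def main(argv: Optional[Sequence[str]] = None) -> None:
    args = parse_args(argv)

    # Imported here so parse_args() works even without Dask installed.
    import dask
    from dask.distributed import Client

    # Connect to the docker-compose scheduler if an address was given,
    # otherwise start an in-process local cluster.
    client = Client(args.scheduler_url) if args.scheduler_url else Client()
    with client:
        # Placeholder workflow: square 0..9 lazily, then sum the results.
        squares = [dask.delayed(pow)(i, 2) for i in range(10)]
        total = dask.delayed(sum)(squares)
        # compute() runs on the active client's cluster; prints "Result: 285".
        print("Result:", total.compute())
```

To run it as a script, call main() under an if __name__ == "__main__": guard. Because parse_args() accepts an explicit argument list, the command-line handling can be checked without starting a cluster.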
Congrats! You were able to execute a Dask workflow using a distributed approach 🥳