Merged
66 changes: 66 additions & 0 deletions APIs/openEO/EOAP-CWL.qmd
@@ -0,0 +1,66 @@
---
title: "EOAP CWL"
execute:
echo: false
jupyter: python3
aliases:
- /EOAP-CWL.html
---

With openEO, it is now possible to run CWL ([Common Workflow Language](https://www.commonwl.org/)) in the `run_udf` process.
Workflows are executed using [Calrissian](https://github.com/Duke-GCB/calrissian) on Kubernetes.
First, read the general CWL documentation from `openeo-geopyspark-driver`: [udf-eoap-cwl.md](https://github.com/Open-EO/openeo-geopyspark-driver/blob/master/docs/udf-eoap-cwl.md).
The CDSE backend offers some extra features on top of that, which are described on this page.


## S3 access

CWL workflows running on this backend will receive short-lived S3 credentials with read-only access to the `eodata` bucket on CDSE.
Those credentials will be available in the following environment variables:

- `AWS_ENDPOINT_URL_S3`
- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`

These credentials are only valid temporarily and only work inside the cluster environment.
Use them instead of your own credentials; that way you do not have to bake secrets into your Docker images, and the images can remain public.
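
A containerized step can pick these credentials up from the environment and hand them to an S3 client. A minimal sketch (the helper name is hypothetical, and the `boto3` usage mentioned in the docstring is an assumption about what is installed in your image):

```python
import os

def s3_client_kwargs(env=os.environ):
    """Collect the injected short-lived credentials for an S3 client,
    e.g. boto3.client("s3", **s3_client_kwargs())."""
    return {
        "endpoint_url": env["AWS_ENDPOINT_URL_S3"],
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }

# Placeholder values for illustration; on the cluster these are pre-set.
fake_env = {
    "AWS_ENDPOINT_URL_S3": "https://eodata.example",
    "AWS_ACCESS_KEY_ID": "ACCESS",
    "AWS_SECRET_ACCESS_KEY": "SECRET",
}
print(s3_client_kwargs(fake_env)["endpoint_url"])  # https://eodata.example
```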

## Docker images

CWL allows code to run inside Docker images. For example:

```yaml
requirements:
- class: DockerRequirement
dockerPull: ghcr.io/cloudinsar/openeo_insar:20260219T1446
```

Only whitelisted Docker images can be used in the cluster. Contact us
through [support](https://helpcenter.dataspace.copernicus.eu/hc/en-gb/requests/new) if you have custom images that need
to be whitelisted (you might need to create an account first).
As of February 2026, only Docker images that can be pulled without credentials are supported.
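
For context, a `DockerRequirement` like the one above is typically attached to a complete `CommandLineTool`; a minimal sketch (the `baseCommand` is just a placeholder):

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [echo, "hello from the container"]
requirements:
  - class: DockerRequirement
    dockerPull: ghcr.io/cloudinsar/openeo_insar:20260219T1446
inputs: []
outputs: []
```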

## Memory limits

Increasing requested memory increases credit usage.
The maximum amount of memory available is deployment-specific, but is typically around 20 GB.
If your job gets stuck without being processed, consider lowering the requested memory.
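
Memory for a step is requested through CWL's standard `ResourceRequirement` (values are in MiB); a minimal sketch:

```yaml
requirements:
  - class: ResourceRequirement
    ramMin: 4096  # request at least 4 GiB for this step
```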

## Debugging locally

To test your CWL workflow locally before running it on the cluster,
you can use [cwltool](https://pypi.org/project/cwltool/).
You might need to provide your own S3 credentials; you can request them
as described in the [S3 documentation](https://documentation.dataspace.copernicus.eu/APIs/S3.html).
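
Before invoking `cwltool`, export those credentials so they can be passed through to the container. The endpoint URL below is an assumption; check the S3 documentation for the current value and substitute your own keys:

```bash
# Hypothetical values: replace with the credentials you requested.
export AWS_ENDPOINT_URL_S3="https://eodata.dataspace.copernicus.eu"
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
```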

```bash
cwltool \
--tmpdir-prefix=$HOME/tmp/ \
--force-docker-pull \
--no-read-only \
--parallel \
--preserve-environment=AWS_ENDPOINT_URL_S3 \
--preserve-environment=AWS_ACCESS_KEY_ID \
--preserve-environment=AWS_SECRET_ACCESS_KEY \
example_workflow.cwl example_parameters.json
```

> **Collaborator:** It seems a lot of the mentioned options are not essential, or even risky to use unless you know what you are doing. I would keep just the essentials here and suggest the more advanced options separately.

> **Contributor (author):** Removed 2 unneeded parameters. Annoyingly, `--tmpdir-prefix` and `--no-read-only` are needed to be able to run for me.
2 changes: 2 additions & 0 deletions _quarto.yml
@@ -179,6 +179,8 @@ website:
target: "_blank"
- href: "APIs/openEO/fair.qmd"
text: FAIR & open science
- href: "APIs/openEO/EOAP-CWL.qmd"
text: EOAP CWL
- section: "Sentinel Hub"
href: "APIs/SentinelHub.qmd"
contents: