From da9cc912808ea9074a4930e399fb76692bf62893 Mon Sep 17 00:00:00 2001 From: Peter Vos Date: Mon, 15 Dec 2025 13:52:08 +0100 Subject: [PATCH 1/8] add python and analysis guide --- manuals/yoda/_sidebar.yml | 2 + .../yoda/data_access/yoda_using_python.qmd | 14 +++ manuals/yoda/using_yoda/analysing_data.qmd | 101 ++++++++++++++++++ 3 files changed, 117 insertions(+) create mode 100644 manuals/yoda/data_access/yoda_using_python.qmd create mode 100644 manuals/yoda/using_yoda/analysing_data.qmd diff --git a/manuals/yoda/_sidebar.yml b/manuals/yoda/_sidebar.yml index 257e698bf..a2a06c8aa 100644 --- a/manuals/yoda/_sidebar.yml +++ b/manuals/yoda/_sidebar.yml @@ -20,6 +20,7 @@ website: - manuals/yoda/using_yoda/workflow_metadata.qmd - manuals/yoda/using_yoda/properties_and_explanation.qmd - manuals/yoda/using_yoda/workflow_metadata_license.qmd + - manuals/yoda/using_yoda/analysing_data.qmd - section: Securing and Distributing Data contents: - manuals/yoda/securing_distribution/vault_archive.qmd @@ -34,6 +35,7 @@ website: - manuals/yoda/data_access/yoda_using_cyberduck.qmd - manuals/yoda/data_access/yoda_using_cyberduck_cryptometer.qmd - manuals/yoda/data_access/yoda_using_icommands.qmd + - manuals/yoda/data_access/yoda_using_python.qmd - manuals/yoda/data_access/yoda_using_rclone.qmd - manuals/yoda/data_access/yoda_using_webdrive.qmd - manuals/yoda/data_access/yoda_using_windowsexplorer.qmd diff --git a/manuals/yoda/data_access/yoda_using_python.qmd b/manuals/yoda/data_access/yoda_using_python.qmd new file mode 100644 index 000000000..ba5a5dfd5 --- /dev/null +++ b/manuals/yoda/data_access/yoda_using_python.qmd @@ -0,0 +1,14 @@ +--- +title: Using Python +categories: [] +description: "This page explains how to transfer data using Python scripting." +--- +Data in Yoda is not directly accessible, you have to download data to the machine that contains your analysis software first. 
If you do your analysis with Python scripts anyway, for example on [Snellius](/topics/snellius.qmd) or [Ada](/topics/ada.qmd), it can be useful to script the data transfer as well. + +## Python iRODS Client (PRC) + + +## iBridges + + +## Advanced: iRODS metadata and rules \ No newline at end of file diff --git a/manuals/yoda/using_yoda/analysing_data.qmd b/manuals/yoda/using_yoda/analysing_data.qmd new file mode 100644 index 000000000..e4ea1ca95 --- /dev/null +++ b/manuals/yoda/using_yoda/analysing_data.qmd @@ -0,0 +1,101 @@ +--- +title: Analysing Data +categories: [] +description: "This page explains how to run analysis software on your data in Yoda." +--- +Yoda is a data management solution and not explicitly meant for analysing data. However, this does not mean that you cannot analyse data that is stored in Yoda. On this page, we highlight example workflows for analysing data that is stored in Yoda. + +## What type of analysis can you do in Yoda, and where to run it? +Before you decide on the best workflow for your use case, you should ask yourself: + +- Which type of analysis will I run? + +- Is this task suitable to run on a personal computer (PC)? + +If your analysis cannot be run on your PC, for example because your dataset is too large and you do not have enough storage, or your computing requirements are too heavy and the processing capacity of your machine is not big enough, you should think about using other analysis platforms: a VRE (Virtual Research Environment) such as [SciCloud](/topics/scicloud.qmd) or [Research Cloud](/topics/researchcloud.qmd); or a high-performance computing facility (HPC), such as [ADA](/topics/ada.qmd) or [Snellius](/topics/snellius.qmd). + +Below we discuss three possible workflows to work with data stored in Yoda: + +- Downloading files from Yoda, performing the analysis, and uploading the results to Yoda again. + +- Mounting the Network Drive and performing the analysis on the device on which the Network Drive is mounted. 
+ +- Streaming data in memory, without having to download the data from Yoda. + +## Workflow: downloading files and folders + +Suitable for: +- Analysis system: PC, VRE, HPC +- Data: All file and folder sizes, assuming there is enough storage on the analysis system + +In this workflow, you download the files and folders that you want to analyse from Yoda to the system where you plan to run the analysis, i.e. you create a working copy of your data. You run the analysis on the system, and afterwards upload the data and/or results back to Yoda. You can also safely remove your working copy again, since the source data stays untouched in Yoda. In this way you can save storage space on the analysis system. + +The main reason for choosing this method is that it is relatively straightforward, and it will give you good performance when reading your file in your analysis script. + +There are several ways in which you can download and upload the files: + +- Via the Yoda web portal + This can be done if you have an internet browser available (e.g., your PC and some VREs). You could choose this option when you are already familiar with the web portal, or when you do not want to install additional tools on your system. However, this method is not very reliable when transferring large files. Also, the web portal will not give you clear feedback on whether a download was completed correctly. + +- Via a WebDAV client + This method is more suitable for a larger amount of files (up to several thounands files and a 100GB of data). The [data access introduction page](../data_access/introduction.qmd) lists the options on different operating systems. + +- Using [iCommands or GoCommands](../data_access/yoda_using_icommands.qmd) + These command line tools provide slightly better performance for data transfer compared to iBridges, and also offer many features for working with metadata. It is also possible to check the integrity of uploaded and downloaded files. 
+ +- Using iBridges or the Python iRODS Client + If you use Python for your analysis, you could include transfer of the source data and results in your scripts. This way you can automate data management and avoid duplicates or temporary data. For some workflows it is also possible to access a file directly by streaming, [see below](#workflow-streaming). + +::: {.callout-tip} +- Make sure you have a good internet connection when you download (large) files to your PC and when you upload your results to Yoda. Regardless of the method you choose, this will be the biggest determinant of download speed. On HPC and VRE systems, the connections should be very good. + +- Treat the downloaded files as a temporary working copy and make sure to remove them whenever they are not needed anymore. In this way, you make sure the version of the file on Yoda is the ‘ground truth’ version of your data and prevents the creation of copies of copies that might go out of sync. Automate the downloading of files, removal of temporary copies, and uploading of output as much as possible. This improves the reproducibility of results and reduces the potential of human error. + +- Use iBridges or iCommands to (automatically) add file-level metadata to your files on Yoda when you upload them (e.g. file version, experimental condition, etc.). This way, you can keep your project organised. Note that metadata to describe the to-be-archived data package as a whole should be added via the web portal. +::: + +## Workflow: mount with Network Disk + +Suitable for: + +- Analysis system: PC, VRE and some HPC systems +- Data: small operations on small files only + +Yoda can be mounted as a Network Disk on your system via the WebDAV protocol. The main advantage is that this method allows you to see the files in your file explorer as if they are on your computer. You can then perform your analysis on the analysis system as if the files were stored locally. 
+ +- On Windows using [Windows Explorer](../data_access/yoda_using_windowsexplorer.qmd) or [WebDrive](../data_access/yoda_using_webdrive.qmd) +- On MacOS using [Finder](../data_access/data_access_macos.qmd#mounting-the-yoda-webdav-in-finder) +- On Linux using [Gnome Files](../data_access/data_access_linux.qmd#gnome-files) or similar. + +However, we only recommend working with this method if you are working with a small number of small files (few MBs), or if you just want to browse files and folders. This is because when working with larger files, performance of operations like reading and writing files will be slow and can greatly increase the runtime of your analysis. In certain cases, you might run into errors because of this. When you make changes to a file or create a new file on Yoda, this method does not provide clear feedback about the ‘upload’ of those changes. If you interrupt the upload (e.g. by shutting down your PC), the changes might be lost. Since the files can be easily opened by an editor you also risk that you might change files on Yoda by accident. + +::: {.callout-tip} +- Only use this method for small file sizes and small folders. + +- Be careful when you create new files or make changes to files: wait long enough and double-check the integrity of the files and whether the data has been stored properly on Yoda (e.g. via the Yoda portal). + +- Make sure only one person at a time is working on the data to prevent conflicts. +::: + +## Workflow: streaming + +Suitable for: + +- Analysis system: PC, VRE, HPC +- Data analysis: When you use Python for your analysis + +Streaming is a more advanced method to analyse data in Yoda. Using iBridges in Python or the Python iRODS client, it is possible to directly load data into memory without having to download it to the analysis system. The main advantage of this method is that you do not create new copies of the data that you later have to remove, and your workflow becomes a lot cleaner. 
Streaming is especially useful when your data is organised in larger files and you only need extracts, i.e. you do not need all the content. Another use case for streaming is when you need to combine/append the content of many small files for your analysis. + +Output of your scripts can also be streamed directly to Yoda along with metadata. That means you do not need to first create a local file which contains the output, but you can directly create a file on Yoda and “stream” the output into that file. + +::: {.callout-tip} +- This workflow is mainly intended for researchers who work programmatically with their data. + +- Make sure you have a stable internet connection when streaming data in or out of Yoda. The amount of data that can be streamed depends on the working memory of the system you are streaming into/from. + +- Use iBridges or iCommands to (automatically) add file-level metadata (e.g., file version, experimental condition) to your files on Yoda after you created and streamed the content into the new files. This way, you can keep your project organised. Note that metadata to describe the to-be-archived data package as a whole should be added via the web portal. + +- The streaming option in iBridges or the iCommands does not verify that the content of the data is correct. Inspect the received or sent data by checking its size or content. + +- For certain data like audio, video or spreadsheets, specific python libraries exist with which you can navigate to the part of the data stream you want to analyse. 
+::: \ No newline at end of file From f2311d895e20b88c39feb7b7fce7742ca5eda00f Mon Sep 17 00:00:00 2001 From: Peter Vos Date: Mon, 15 Dec 2025 16:46:17 +0100 Subject: [PATCH 2/8] Improve readability --- .../yoda/data_access/yoda_using_python.qmd | 3 +- manuals/yoda/using_yoda/analysing_data.qmd | 115 ++++++++++-------- 2 files changed, 64 insertions(+), 54 deletions(-) diff --git a/manuals/yoda/data_access/yoda_using_python.qmd b/manuals/yoda/data_access/yoda_using_python.qmd index ba5a5dfd5..f1d846e8f 100644 --- a/manuals/yoda/data_access/yoda_using_python.qmd +++ b/manuals/yoda/data_access/yoda_using_python.qmd @@ -9,6 +9,7 @@ Data in Yoda is not directly accessible, you have to download data to the machin ## iBridges +https://ibridges.readthedocs.io/en/stable/quickstart.html -## Advanced: iRODS metadata and rules \ No newline at end of file +## Advanced: iRODS metadata and rules diff --git a/manuals/yoda/using_yoda/analysing_data.qmd b/manuals/yoda/using_yoda/analysing_data.qmd index e4ea1ca95..3849b3fbb 100644 --- a/manuals/yoda/using_yoda/analysing_data.qmd +++ b/manuals/yoda/using_yoda/analysing_data.qmd @@ -3,99 +3,108 @@ title: Analysing Data categories: [] description: "This page explains how to run analysis software on your data in Yoda." --- -Yoda is a data management solution and not explicitly meant for analysing data. However, this does not mean that you cannot analyse data that is stored in Yoda. On this page, we highlight example workflows for analysing data that is stored in Yoda. -## What type of analysis can you do in Yoda, and where to run it? +Yoda is a data management solution and not explicitly meant for analysing data. However, this does not mean that you cannot analyse data that is stored in Yoda. On this page, we highlight example workflows for analysing data that is stored in Yoda. + +## Where to run your analysis + Before you decide on the best workflow for your use case, you should ask yourself: - -- Which type of analysis will I run? 
-- Is this task suitable to run on a personal computer (PC)? +- Which type of analysis will I run? -If your analysis cannot be run on your PC, for example because your dataset is too large and you do not have enough storage, or your computing requirements are too heavy and the processing capacity of your machine is not big enough, you should think about using other analysis platforms: a VRE (Virtual Research Environment) such as [SciCloud](/topics/scicloud.qmd) or [Research Cloud](/topics/researchcloud.qmd); or a high-performance computing facility (HPC), such as [ADA](/topics/ada.qmd) or [Snellius](/topics/snellius.qmd). +- Is this task suitable to run on a personal computer (PC)? + +If your analysis cannot be run on your PC, for example because your dataset is too large and you do not have enough storage, or your computing requirements are too heavy and the processing capacity of your machine is not big enough, you should think about using other analysis platforms: a VRE (Virtual Research Environment) such as [SciCloud](/topics/scicloud.qmd) or [Research Cloud](/topics/researchcloud.qmd); or a high-performance computing facility (HPC), such as [ADA](/topics/ada.qmd) or [Snellius](/topics/snellius.qmd). Below we discuss three possible workflows to work with data stored in Yoda: -- Downloading files from Yoda, performing the analysis, and uploading the results to Yoda again. +1. Downloading files from Yoda, performing the analysis, and uploading the results to Yoda again. -- Mounting the Network Drive and performing the analysis on the device on which the Network Drive is mounted. +2. Mounting the Network Drive and performing the analysis on the device on which the Network Drive is mounted. -- Streaming data in memory, without having to download the data from Yoda. +3. Streaming data in memory, without having to download the data from Yoda. 
-## Workflow: downloading files and folders +## Workflow: downloading files and folders -Suitable for: -- Analysis system: PC, VRE, HPC -- Data: All file and folder sizes, assuming there is enough storage on the analysis system +> Suitable for: +> +> - Analysis system: PC, VRE, HPC +> +> - Data: All file and folder sizes, assuming there is enough storage on the analysis system -In this workflow, you download the files and folders that you want to analyse from Yoda to the system where you plan to run the analysis, i.e. you create a working copy of your data. You run the analysis on the system, and afterwards upload the data and/or results back to Yoda. You can also safely remove your working copy again, since the source data stays untouched in Yoda. In this way you can save storage space on the analysis system. +In this workflow, you download the files and folders that you want to analyse from Yoda to the system where you plan to run the analysis, i.e. you create a working copy of your data. You run the analysis on the system, and afterwards upload the data and/or results back to Yoda. You can also safely remove your working copy again, since the source data stays untouched in Yoda. In this way you can save storage space on the analysis system. -The main reason for choosing this method is that it is relatively straightforward, and it will give you good performance when reading your file in your analysis script. +The main reason for choosing this method is that it is relatively straightforward, and it will give you good performance when reading your file in your analysis script. -There are several ways in which you can download and upload the files: +There are several ways in which you can download and upload the files: -- Via the Yoda web portal - This can be done if you have an internet browser available (e.g., your PC and some VREs). 
You could choose this option when you are already familiar with the web portal, or when you do not want to install additional tools on your system. However, this method is not very reliable when transferring large files. Also, the web portal will not give you clear feedback on whether a download was completed correctly. +- Via the Yoda web portal This can be done if you have an internet browser available (e.g., your PC and some VREs). You could choose this option when you are already familiar with the web portal, or when you do not want to install additional tools on your system. However, this method is not very reliable when transferring large files. Also, the web portal will not give you clear feedback on whether a download was completed correctly. -- Via a WebDAV client - This method is more suitable for a larger amount of files (up to several thounands files and a 100GB of data). The [data access introduction page](../data_access/introduction.qmd) lists the options on different operating systems. +- Via a WebDAV client This method is more suitable for a larger amount of files (up to several thounands files and a 100GB of data). The [data access introduction page](../data_access/introduction.qmd) lists the options on different operating systems. -- Using [iCommands or GoCommands](../data_access/yoda_using_icommands.qmd) - These command line tools provide slightly better performance for data transfer compared to iBridges, and also offer many features for working with metadata. It is also possible to check the integrity of uploaded and downloaded files. +- Using [iCommands or GoCommands](../data_access/yoda_using_icommands.qmd). + These command line tools provide slightly better performance for data transfer compared to iBridges, and also offer many features for working with metadata. It is also possible to check the integrity of uploaded and downloaded files, see the [ichksum command](https://docs.irods.org/4.3.4/icommands/user/). 
-- Using iBridges or the Python iRODS Client - If you use Python for your analysis, you could include transfer of the source data and results in your scripts. This way you can automate data management and avoid duplicates or temporary data. For some workflows it is also possible to access a file directly by streaming, [see below](#workflow-streaming). +- Using [iBridges or the Python iRODS Client](../data_access/yoda_using_python.qmd). If you use Python for your analysis, you could include transfer of the source data and results in your scripts. This way you can automate data management and avoid duplicates or temporary data. For some workflows it is also possible to access a file directly by streaming, [see below](#workflow-streaming). -::: {.callout-tip} -- Make sure you have a good internet connection when you download (large) files to your PC and when you upload your results to Yoda. Regardless of the method you choose, this will be the biggest determinant of download speed. On HPC and VRE systems, the connections should be very good. +::: callout-tip +## Tips -- Treat the downloaded files as a temporary working copy and make sure to remove them whenever they are not needed anymore. In this way, you make sure the version of the file on Yoda is the ‘ground truth’ version of your data and prevents the creation of copies of copies that might go out of sync. Automate the downloading of files, removal of temporary copies, and uploading of output as much as possible. This improves the reproducibility of results and reduces the potential of human error. +- Make sure you have a good internet connection when you download (large) files to your PC and when you upload your results to Yoda. Regardless of the method you choose, this will be the biggest determinant of download speed. On HPC and VRE systems, the connections should be very good. -- Use iBridges or iCommands to (automatically) add file-level metadata to your files on Yoda when you upload them (e.g. 
file version, experimental condition, etc.). This way, you can keep your project organised. Note that metadata to describe the to-be-archived data package as a whole should be added via the web portal. -::: +- Treat the downloaded files as a temporary working copy and make sure to remove them whenever they are not needed anymore. In this way, you make sure the version of the file on Yoda is the ‘ground truth’ version of your data and prevents the creation of copies of copies that might go out of sync. Automate the downloading of files, removal of temporary copies, and uploading of output as much as possible. This improves the reproducibility of results and reduces the potential of human error. -## Workflow: mount with Network Disk +- Use iBridges or iCommands to (automatically) add file-level metadata to your files on Yoda when you upload them (e.g. file version, experimental condition, etc.). This way, you can keep your project organised. Note that metadata to describe the to-be-archived data package as a whole should be added via the web portal. +::: -Suitable for: +## Workflow: mount with Network Disk -- Analysis system: PC, VRE and some HPC systems -- Data: small operations on small files only +> Suitable for: +> +> - Analysis system: PC, VRE and some HPC systems +> +> - Data: small operations on small files only -Yoda can be mounted as a Network Disk on your system via the WebDAV protocol. The main advantage is that this method allows you to see the files in your file explorer as if they are on your computer. You can then perform your analysis on the analysis system as if the files were stored locally. +Yoda can be mounted as a Network Disk on your system via the WebDAV protocol. The main advantage is that this method allows you to see the files in your file explorer as if they are on your computer. You can then perform your analysis on the analysis system as if the files were stored locally. 
-- On Windows using [Windows Explorer](../data_access/yoda_using_windowsexplorer.qmd) or [WebDrive](../data_access/yoda_using_webdrive.qmd) -- On MacOS using [Finder](../data_access/data_access_macos.qmd#mounting-the-yoda-webdav-in-finder) -- On Linux using [Gnome Files](../data_access/data_access_linux.qmd#gnome-files) or similar. +- On Windows using [Windows Explorer](../data_access/yoda_using_windowsexplorer.qmd) or [WebDrive](../data_access/yoda_using_webdrive.qmd) +- On MacOS using [Finder](../data_access/data_access_macos.qmd#mounting-the-yoda-webdav-in-finder) +- On Linux using [Gnome Files](../data_access/data_access_linux.qmd#gnome-files) or similar. However, we only recommend working with this method if you are working with a small number of small files (few MBs), or if you just want to browse files and folders. This is because when working with larger files, performance of operations like reading and writing files will be slow and can greatly increase the runtime of your analysis. In certain cases, you might run into errors because of this. When you make changes to a file or create a new file on Yoda, this method does not provide clear feedback about the ‘upload’ of those changes. If you interrupt the upload (e.g. by shutting down your PC), the changes might be lost. Since the files can be easily opened by an editor you also risk that you might change files on Yoda by accident. -::: {.callout-tip} -- Only use this method for small file sizes and small folders. +::: callout-tip +## Tips -- Be careful when you create new files or make changes to files: wait long enough and double-check the integrity of the files and whether the data has been stored properly on Yoda (e.g. via the Yoda portal). +- Only use this method for small file sizes and small folders. -- Make sure only one person at a time is working on the data to prevent conflicts. 
-::: +- Be careful when you create new files or make changes to files: wait long enough and double-check the integrity of the files and whether the data has been stored properly on Yoda (e.g. via the Yoda portal). -## Workflow: streaming +- Make sure only one person at a time is working on the data to prevent conflicts. +::: -Suitable for: +## Workflow: streaming {#workflow-streaming} -- Analysis system: PC, VRE, HPC -- Data analysis: When you use Python for your analysis +> Suitable for: +> +> - Analysis system: PC, VRE, HPC +> +> - Data analysis: When you use Python for your analysis -Streaming is a more advanced method to analyse data in Yoda. Using iBridges in Python or the Python iRODS client, it is possible to directly load data into memory without having to download it to the analysis system. The main advantage of this method is that you do not create new copies of the data that you later have to remove, and your workflow becomes a lot cleaner. Streaming is especially useful when your data is organised in larger files and you only need extracts, i.e. you do not need all the content. Another use case for streaming is when you need to combine/append the content of many small files for your analysis. +Streaming is a more advanced method to analyse data in Yoda. Using iBridges in Python or the Python iRODS client, it is possible to directly load data into memory without having to download it to the analysis system. The main advantage of this method is that you do not create new copies of the data that you later have to remove, and your workflow becomes a lot cleaner. Streaming is especially useful when your data is organised in larger files and you only need extracts, i.e. you do not need all the content. Another use case for streaming is when you need to combine/append the content of many small files for your analysis. Output of your scripts can also be streamed directly to Yoda along with metadata. 
That means you do not need to first create a local file which contains the output, but you can directly create a file on Yoda and “stream” the output into that file. -::: {.callout-tip} -- This workflow is mainly intended for researchers who work programmatically with their data. +::: callout-tip +## Tips + +- This workflow is mainly intended for researchers who work programmatically with their data. -- Make sure you have a stable internet connection when streaming data in or out of Yoda. The amount of data that can be streamed depends on the working memory of the system you are streaming into/from. +- Make sure you have a stable internet connection when streaming data in or out of Yoda. The amount of data that can be streamed depends on the working memory of the system you are streaming into/from. -- Use iBridges or iCommands to (automatically) add file-level metadata (e.g., file version, experimental condition) to your files on Yoda after you created and streamed the content into the new files. This way, you can keep your project organised. Note that metadata to describe the to-be-archived data package as a whole should be added via the web portal. +- Add file- and folder-level [iRODS metadata](https://docs.irods.org/4.3.4/icommands/metadata/) (e.g., file version, experimental condition) to your files on Yoda after you created and streamed the content into the new files. This way, you can keep your project organised. Note that metadata to describe the to-be-archived data package as a whole should be added via the web portal. -- The streaming option in iBridges or the iCommands does not verify that the content of the data is correct. Inspect the received or sent data by checking its size or content. +- The streaming option in iBridges or the iCommands does not verify that the content of the data is correct. Inspect the received or sent data by checking its size or content. 
-- For certain data like audio, video or spreadsheets, specific python libraries exist with which you can navigate to the part of the data stream you want to analyse. +- For certain data like audio, video or spreadsheets, specific python libraries exist with which you can navigate to the part of the data stream you want to analyse. ::: \ No newline at end of file From 924261af8f4f5ecd1c05bf9e5127a7138dda31b6 Mon Sep 17 00:00:00 2001 From: Peter Vos Date: Mon, 15 Dec 2025 17:30:31 +0100 Subject: [PATCH 3/8] improve --- manuals/yoda/using_yoda/analysing_data.qmd | 64 +++++++++++----------- 1 file changed, 31 insertions(+), 33 deletions(-) diff --git a/manuals/yoda/using_yoda/analysing_data.qmd b/manuals/yoda/using_yoda/analysing_data.qmd index 3849b3fbb..39d8600f7 100644 --- a/manuals/yoda/using_yoda/analysing_data.qmd +++ b/manuals/yoda/using_yoda/analysing_data.qmd @@ -8,13 +8,13 @@ Yoda is a data management solution and not explicitly meant for analysing data. ## Where to run your analysis -Before you decide on the best workflow for your use case, you should ask yourself: +Before you decide on the best workflow for your use case, you should ask yourself: -- Which type of analysis will I run? +- Which type of analysis will I run? Will I use a desktop application or scripting? - Is this task suitable to run on a personal computer (PC)? -If your analysis cannot be run on your PC, for example because your dataset is too large and you do not have enough storage, or your computing requirements are too heavy and the processing capacity of your machine is not big enough, you should think about using other analysis platforms: a VRE (Virtual Research Environment) such as [SciCloud](/topics/scicloud.qmd) or [Research Cloud](/topics/researchcloud.qmd); or a high-performance computing facility (HPC), such as [ADA](/topics/ada.qmd) or [Snellius](/topics/snellius.qmd). 
+If your analysis cannot be run on your PC, for example because your dataset is too large and you do not have enough storage, or your computing requirements are too heavy and the processing capacity of your machine is not big enough, you should think about using other analysis platforms: a VRE (Virtual Research Environment) such as [SciCloud](/topics/scicloud.qmd) or [Research Cloud](/topics/researchcloud.qmd); the [VU compute hub](/topics/compute-hub.qmd); or a high-performance computing facility (HPC), such as [ADA](/topics/ada.qmd) or [Snellius](/topics/snellius.qmd). Below we discuss three possible workflows to work with data stored in Yoda: @@ -24,66 +24,64 @@ Below we discuss three possible workflows to work with data stored in Yoda: 3. Streaming data in memory, without having to download the data from Yoda. -## Workflow: downloading files and folders +## Workflow: mount with Network Disk > Suitable for: > -> - Analysis system: PC, VRE, HPC +> - Analysis system: PC, VRE with graphical interface > -> - Data: All file and folder sizes, assuming there is enough storage on the analysis system - -In this workflow, you download the files and folders that you want to analyse from Yoda to the system where you plan to run the analysis, i.e. you create a working copy of your data. You run the analysis on the system, and afterwards upload the data and/or results back to Yoda. You can also safely remove your working copy again, since the source data stays untouched in Yoda. In this way you can save storage space on the analysis system. - -The main reason for choosing this method is that it is relatively straightforward, and it will give you good performance when reading your file in your analysis script. - -There are several ways in which you can download and upload the files: - -- Via the Yoda web portal This can be done if you have an internet browser available (e.g., your PC and some VREs). 
You could choose this option when you are already familiar with the web portal, or when you do not want to install additional tools on your system. However, this method is not very reliable when transferring large files. Also, the web portal will not give you clear feedback on whether a download was completed correctly. +> - Data: small operations on small files only -- Via a WebDAV client This method is more suitable for a larger amount of files (up to several thounands files and a 100GB of data). The [data access introduction page](../data_access/introduction.qmd) lists the options on different operating systems. +Yoda can be mounted as a Network Disk on your system via the WebDAV protocol. The main advantage is that this method allows you to see the files in your file explorer as if they are on your computer. You can then perform your analysis on the analysis system as if the files were stored locally. -- Using [iCommands or GoCommands](../data_access/yoda_using_icommands.qmd). - These command line tools provide slightly better performance for data transfer compared to iBridges, and also offer many features for working with metadata. It is also possible to check the integrity of uploaded and downloaded files, see the [ichksum command](https://docs.irods.org/4.3.4/icommands/user/). +- On Windows using [Windows Explorer](../data_access/yoda_using_windowsexplorer.qmd) or [WebDrive](../data_access/yoda_using_webdrive.qmd) +- On MacOS using [Finder](../data_access/data_access_macos.qmd#mounting-the-yoda-webdav-in-finder) +- On Linux using [Gnome Files](../data_access/data_access_linux.qmd#gnome-files) or similar. -- Using [iBridges or the Python iRODS Client](../data_access/yoda_using_python.qmd). If you use Python for your analysis, you could include transfer of the source data and results in your scripts. This way you can automate data management and avoid duplicates or temporary data. 
For some workflows it is also possible to access a file directly by streaming, [see below](#workflow-streaming). +However, we only recommend working with this method if you are working with a small number of small files (few MBs), or if you just want to browse files and folders. This is because when working with larger files, performance of operations like reading and writing files will be slow and can greatly increase the runtime of your analysis. In certain cases, you might run into errors because of this. When you make changes to a file or create a new file on Yoda, this method does not provide clear feedback about the ‘upload’ of those changes. If you interrupt the upload (e.g. by shutting down your PC), the changes might be lost. Since the files can be easily opened by an editor you also risk that you might change files on Yoda by accident. ::: callout-tip ## Tips -- Make sure you have a good internet connection when you download (large) files to your PC and when you upload your results to Yoda. Regardless of the method you choose, this will be the biggest determinant of download speed. On HPC and VRE systems, the connections should be very good. +- Only use this method for small file sizes and small folders. -- Treat the downloaded files as a temporary working copy and make sure to remove them whenever they are not needed anymore. In this way, you make sure the version of the file on Yoda is the ‘ground truth’ version of your data and prevents the creation of copies of copies that might go out of sync. Automate the downloading of files, removal of temporary copies, and uploading of output as much as possible. This improves the reproducibility of results and reduces the potential of human error. +- Be careful when you create new files or make changes to files: wait long enough and double-check the integrity of the files and whether the data has been stored properly on Yoda (e.g. via the Yoda portal). 
-- Use iBridges or iCommands to (automatically) add file-level metadata to your files on Yoda when you upload them (e.g. file version, experimental condition, etc.). This way, you can keep your project organised. Note that metadata to describe the to-be-archived data package as a whole should be added via the web portal. +- Make sure only one person at a time is working on the data to prevent conflicts. ::: -## Workflow: mount with Network Disk +## Workflow: downloading files and folders > Suitable for: > -> - Analysis system: PC, VRE and some HPC systems +> - Analysis system: PC, VRE, HPC > -> - Data: small operations on small files only +> - Data: All file and folder sizes, assuming there is enough storage on the analysis system -Yoda can be mounted as a Network Disk on your system via the WebDAV protocol. The main advantage is that this method allows you to see the files in your file explorer as if they are on your computer. You can then perform your analysis on the analysis system as if the files were stored locally. +In this workflow, you download the files and folders that you want to analyse from Yoda to the system where you plan to run the analysis, i.e. you create a working copy of your data. You run the analysis on the system, and afterwards upload the data and/or results back to Yoda. You can also safely remove your working copy again, since the source data stays untouched in Yoda. In this way you can save storage space on the analysis system. -- On Windows using [Windows Explorer](../data_access/yoda_using_windowsexplorer.qmd) or [WebDrive](../data_access/yoda_using_webdrive.qmd) -- On MacOS using [Finder](../data_access/data_access_macos.qmd#mounting-the-yoda-webdav-in-finder) -- On Linux using [Gnome Files](../data_access/data_access_linux.qmd#gnome-files) or similar. +The main reason for choosing this method is that it is relatively straightforward, and it will give you good performance when reading your file in your analysis script. 
-However, we only recommend working with this method if you are working with a small number of small files (few MBs), or if you just want to browse files and folders. This is because when working with larger files, performance of operations like reading and writing files will be slow and can greatly increase the runtime of your analysis. In certain cases, you might run into errors because of this. When you make changes to a file or create a new file on Yoda, this method does not provide clear feedback about the ‘upload’ of those changes. If you interrupt the upload (e.g. by shutting down your PC), the changes might be lost. Since the files can be easily opened by an editor you also risk that you might change files on Yoda by accident. +There are several ways in which you can download and upload the files: + +| Tool | Typical dataset | Platform | Explanation | +| --- | --- | --- | --- | +| **Yoda web portal** | 10GB, 100 files or less | PC, VRE | This can be done if you have an internet browser available (e.g., your PC and some VREs). You could choose this option when you do not want to install additional tools on your system. However, this method is not very reliable when transferring large files. Also, the web portal will not give you clear feedback on whether a download was completed correctly. | +| **WebDAV client**
[manual](../data_access/introduction.qmd) | 100GB, 1000 files or less | PC, VRE | This method is more suitable for a larger amount of files (up to several thounands files and a 100GB of data). WebDAV can be slow when transferring a large amount of small files. It is possible to automate the transfer files using WebDAV with Python, but it would be better to use the iRODS interface, see below. | +| **iCommands or GoCommands**
[manual](../data_access/yoda_using_icommands.qmd) | Small to very large | PC, VRE, HPC | These command line tools can handle very large datasets and also offer many features for working with file-level metadata. It is also possible to check the integrity of uploaded and downloaded files, see the [ichksum command](https://docs.irods.org/4.3.4/icommands/user/). | +| **iBridges or the Python iRODS Client**
[manual](../data_access/yoda_using_python.qmd) | Small to very large | PC, VRE, HPC | If you use Python for your analysis, you could include transfer of the source data and results in your scripts. This way you can automate data management and avoid duplicates or temporary data. For some workflows it is also possible to access a file directly by streaming, [see below](#workflow-streaming). | ::: callout-tip ## Tips -- Only use this method for small file sizes and small folders. +- Make sure you have a good internet connection when you download (large) files to your PC and when you upload your results to Yoda. Regardless of the method you choose, this will be the biggest determinant of download speed. On HPC and VRE systems, the connections should be very good. -- Be careful when you create new files or make changes to files: wait long enough and double-check the integrity of the files and whether the data has been stored properly on Yoda (e.g. via the Yoda portal). +- Treat the downloaded files as a temporary working copy and make sure to remove them whenever they are not needed anymore. In this way, you make sure the version of the file on Yoda is the ‘ground truth’ version of your data and prevents the creation of copies of copies that might go out of sync. Automate the downloading of files, removal of temporary copies, and uploading of output as much as possible. This improves the reproducibility of results and reduces the potential of human error. -- Make sure only one person at a time is working on the data to prevent conflicts. +- Use iBridges or iCommands to (automatically) add file-level metadata to your files on Yoda when you upload them (e.g. file version, experimental condition, etc.). This way, you can keep your project organised. Note that metadata to describe the to-be-archived data package as a whole should be added via the web portal. 
::: -## Workflow: streaming {#workflow-streaming} +## Workflow: streaming > Suitable for: > From 5c84b0c69e96a12833a6632ef4ba34787130d824 Mon Sep 17 00:00:00 2001 From: Peter Vos Date: Tue, 16 Dec 2025 09:50:37 +0100 Subject: [PATCH 4/8] redundant --- manuals/yoda/using_yoda/analysing_data.qmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/manuals/yoda/using_yoda/analysing_data.qmd b/manuals/yoda/using_yoda/analysing_data.qmd index 39d8600f7..0e18ba8d4 100644 --- a/manuals/yoda/using_yoda/analysing_data.qmd +++ b/manuals/yoda/using_yoda/analysing_data.qmd @@ -66,8 +66,8 @@ There are several ways in which you can download and upload the files: | Tool | Typical dataset | Platform | Explanation | | --- | --- | --- | --- | -| **Yoda web portal** | 10GB, 100 files or less | PC, VRE | This can be done if you have an internet browser available (e.g., your PC and some VREs). You could choose this option when you do not want to install additional tools on your system. However, this method is not very reliable when transferring large files. Also, the web portal will not give you clear feedback on whether a download was completed correctly. | -| **WebDAV client**
[manual](../data_access/introduction.qmd) | 100GB, 1000 files or less | PC, VRE | This method is more suitable for a larger amount of files (up to several thounands files and a 100GB of data). WebDAV can be slow when transferring a large amount of small files. It is possible to automate the transfer files using WebDAV with Python, but it would be better to use the iRODS interface, see below. | +| **Yoda web portal** | 10GB, 100 files or less | PC, some VRE | This can be done if you have an internet browser available (e.g., your PC and some VREs). You could choose this option when you do not want to install additional tools on your system. However, this method is not very reliable when transferring large files. Also, the web portal will not give you clear feedback on whether a download was completed correctly. | +| **WebDAV client**
[manual](../data_access/introduction.qmd) | 100GB, 1000 files or less | PC, VRE | WebDAV can be slow when transferring a large amount of small files. It is possible to automate the transfer files using WebDAV with Python, but it would be better to use the iRODS interface, see below. | | **iCommands or GoCommands**
[manual](../data_access/yoda_using_icommands.qmd) | Small to very large | PC, VRE, HPC | These command line tools can handle very large datasets and also offer many features for working with file-level metadata. It is also possible to check the integrity of uploaded and downloaded files, see the [ichksum command](https://docs.irods.org/4.3.4/icommands/user/). | | **iBridges or the Python iRODS Client**
[manual](../data_access/yoda_using_python.qmd) | Small to very large | PC, VRE, HPC | If you use Python for your analysis, you could include transfer of the source data and results in your scripts. This way you can automate data management and avoid duplicates or temporary data. For some workflows it is also possible to access a file directly by streaming, [see below](#workflow-streaming). | From 46ec7bc4e3c5db23f81b10a4bdda406774cf75cf Mon Sep 17 00:00:00 2001 From: Peter Vos Date: Tue, 16 Dec 2025 12:03:12 +0100 Subject: [PATCH 5/8] improvements --- .../yoda/data_access/yoda_using_python.qmd | 158 +++++++++++++++++- manuals/yoda/using_yoda/analysing_data.qmd | 12 +- 2 files changed, 161 insertions(+), 9 deletions(-) diff --git a/manuals/yoda/data_access/yoda_using_python.qmd b/manuals/yoda/data_access/yoda_using_python.qmd index f1d846e8f..e57b8b0a5 100644 --- a/manuals/yoda/data_access/yoda_using_python.qmd +++ b/manuals/yoda/data_access/yoda_using_python.qmd @@ -3,13 +3,163 @@ title: Using Python categories: [] description: "This page explains how to transfer data using Python scripting." --- -Data in Yoda is not directly accessible, you have to download data to the machine that contains your analysis software first. If you do your analysis with Python scripts anyway, for example on [Snellius](/topics/snellius.qmd) or [Ada](/topics/ada.qmd), it can be useful to script the data transfer as well. +Data in Yoda is not directly accessible, you have to download data to the machine that contains your analysis software first. If you do your analysis with Python scripts anyway, for example on [Snellius](/topics/snellius.qmd) or [Ada](/topics/ada.qmd), it can be useful to script the data access and transfer as well. -## Python iRODS Client (PRC) +## Python iRODS Client +The Python iRODS Client (PRC) is the default way to access data in iRODS programatically. 
+### Install
+```sh
+pip install python-irodsclient
+```
+
+### Setting up a session to access Yoda
+The easiest way to set up a session to Yoda is by using the information in the [irods environment file](./yoda_using_icommands.qmd#environment-file).
+
+The code below sets up a session using all the correct settings for Yoda:
+```python
+import json
+from irods.session import iRODSSession
+from pathlib import Path
+from getpass import getpass
+import ssl
+
+def get_irods_environment(irods_environment_file):
+    """Reads the irods_environment.json file, which contains the environment configuration."""
+
+    print(
+        f"Trying to retrieve connection settings from: {irods_environment_file}"
+    )
+
+    try:
+        with open(irods_environment_file, "r") as f:
+            return json.load(f)
+    except (OSError, json.JSONDecodeError):
+        print(f'Could not read {irods_environment_file}')
+        raise SystemExit(1)
+
+def setup_session(ca_file='/etc/ssl/certs/ca-certificates.crt'):
+    """Use the irods environment file to configure an iRODSSession. The user is prompted for the password."""
+
+    irods_env = get_irods_environment(f"{Path.home()}/.irods/irods_environment.json")
+
+    password = getpass(f"Enter valid DAP for user {irods_env['irods_user_name']}: ")
+
+    ssl_context = ssl.create_default_context(
+        purpose=ssl.Purpose.SERVER_AUTH, cafile=ca_file, capath=None, cadata=None
+    )
+
+    ssl_settings = {
+        "client_server_negotiation": "request_server_negotiation",
+        "client_server_policy": "CS_NEG_REQUIRE",
+        "encryption_algorithm": "AES-256-CBC",
+        "encryption_key_size": 32,
+        "encryption_num_hash_rounds": 16,
+        "encryption_salt_size": 8,
+        "ssl_context": ssl_context,
+    }
+
+    session = iRODSSession(
+        host=irods_env["irods_host"],
+        port=irods_env["irods_port"],
+        user=irods_env["irods_user_name"],
+        password=password,
+        zone=irods_env["irods_zone_name"],
+        authentication_scheme="pam_password",
+        **ssl_settings,
+    )
+
+    return session
+
+session = setup_session()
+
+# workload
+coll = session.collections.get(f"/{session.zone}/home")
+for col in coll.subcollections:
+    print(col.name)
+```
+
+### More information
+You can find more information on using the iRODS client in the [README on github](https://github.com/irods/python-irodsclient/blob/main/README.md).
 ## iBridges
-https://ibridges.readthedocs.io/en/stable/quickstart.html
+The PRC can be hard to use because it requires some prior knowledge on the structure and terminology used in iRODS. For this reason developers at Utrecht University create [iBridges](https://github.com/iBridges-for-iRODS/iBridges), which makes it easier to do basic file and metadata manipulation in iRODS.
+
+### Installation
+Installation is again as simple as:
+```sh
+pip install ibridges
+```
+
+### Connecting
+To connect you will need the [irods environment file](./yoda_using_icommands.qmd#environment-file). iBridges expects the file to be in `~/.irods/irods_environment.json` but you can point it to a different location.
+```python
+from ibridges import Session
+from pathlib import Path
+from getpass import getpass
+
+password = getpass("Enter valid DAP: ")
+session = Session(irods_env_path=Path.home() / ".irods" / "irods_environment.json", password=password)
+```
+
+### Upload data
+You can easily upload your data with the previously created session:
+
+```python
+from ibridges import upload
+
+upload(session, "/your/local/path", "/irods/path")
+```
+This upload function can upload both directories (collections in iRODS) and files (data objects in iRODS).
+
+### Add iRODS metadata
+One of the powerful features of iRODS is its ability to store metadata with your data in a consistent manner. Let’s add some metadata to a collection or data object:
+
+```python
+from ibridges import IrodsPath
+
+ipath = IrodsPath(session, "/irods/path")
+ipath.meta.add("some_key", "some_value", "some_units")
+```
+We have used the IrodsPath class here, which is another central class of the iBridges API. It gives access to the metadata as shown above, and to many more convenient features such as getting the size of a collection or data object. A detailed description of these features can be found in the iBridges documentation.
+
+### Download data
+Naturally, we also want to download the data back to our local machine. This is done with the download function:
+```python
+from ibridges import download
+
+download(session, "/irods/path", "/other/local/path")
+```
+
+### Closing the session
+When you are done with your session, you should generally close it:
+```python
+session.close()
+```
+### More information
+More information on using iBridges can be found in the [online documentation](https://ibridges.readthedocs.io/en/stable/ibridges_python.html).
+
+
+## Streaming
+With the python-irodsclient, which iBridges is built on, we can open a data object as a stream and process its content without downloading the data first. This is especially useful if you need to access data stored in large files, and it works well for textual data.
+
+```python
+from ibridges import IrodsPath
+
+obj_path = IrodsPath(session, "path", "to", "object")
+with obj_path.open('r') as stream:
+    content = stream.read().decode()
+```
+
+Some Python libraries can read directly from such a stream. This is supported by, for example, [pandas](https://pandas.pydata.org/) and [polars](https://pola.rs/) for data files, or [whisper](https://github.com/openai/whisper) for transcription and translation of audio files.
+ +```python +import pandas as pd + +with obj_path.open('r') as stream: + df = pd.read_csv(stream) + +print(df) +``` -## Advanced: iRODS metadata and rules diff --git a/manuals/yoda/using_yoda/analysing_data.qmd b/manuals/yoda/using_yoda/analysing_data.qmd index 0e18ba8d4..c4c0dba83 100644 --- a/manuals/yoda/using_yoda/analysing_data.qmd +++ b/manuals/yoda/using_yoda/analysing_data.qmd @@ -18,9 +18,9 @@ If your analysis cannot be run on your PC, for example because your dataset is t Below we discuss three possible workflows to work with data stored in Yoda: -1. Downloading files from Yoda, performing the analysis, and uploading the results to Yoda again. +1. Mounting the Network Drive and performing the analysis on the device on which the Network Drive is mounted. -2. Mounting the Network Drive and performing the analysis on the device on which the Network Drive is mounted. +2. Downloading files from Yoda, performing the analysis, and uploading the results to Yoda again. 3. Streaming data in memory, without having to download the data from Yoda. @@ -38,7 +38,7 @@ Yoda can be mounted as a Network Disk on your system via the WebDAV protocol. Th - On MacOS using [Finder](../data_access/data_access_macos.qmd#mounting-the-yoda-webdav-in-finder) - On Linux using [Gnome Files](../data_access/data_access_linux.qmd#gnome-files) or similar. -However, we only recommend working with this method if you are working with a small number of small files (few MBs), or if you just want to browse files and folders. This is because when working with larger files, performance of operations like reading and writing files will be slow and can greatly increase the runtime of your analysis. In certain cases, you might run into errors because of this. When you make changes to a file or create a new file on Yoda, this method does not provide clear feedback about the ‘upload’ of those changes. If you interrupt the upload (e.g. by shutting down your PC), the changes might be lost. 
Since the files can be easily opened by an editor you also risk that you might change files on Yoda by accident. +We only recommend working with this method if you are working with a small number of small files (few MBs), or if you just want to browse files and folders. This is because when working with larger files, performance of operations like reading and writing files will be slow and can greatly increase the runtime of your analysis. In certain cases, you might run into errors because of this. When you make changes to a file or create a new file on Yoda, this method does not provide clear feedback about the ‘upload’ of those changes. If you interrupt the upload (e.g. by shutting down your PC), the changes might be lost. Since the files can be easily opened by an editor you also risk that you might change files on Yoda by accident. ::: callout-tip ## Tips @@ -74,11 +74,13 @@ There are several ways in which you can download and upload the files: ::: callout-tip ## Tips -- Make sure you have a good internet connection when you download (large) files to your PC and when you upload your results to Yoda. Regardless of the method you choose, this will be the biggest determinant of download speed. On HPC and VRE systems, the connections should be very good. +- Make sure you have a good internet connection when you download (large) files to your PC and when you upload your results to Yoda. Regardless of the method you choose, this will be the biggest determinant of download speed. On HPC and VRE systems, the connections should be ok. - Treat the downloaded files as a temporary working copy and make sure to remove them whenever they are not needed anymore. In this way, you make sure the version of the file on Yoda is the ‘ground truth’ version of your data and prevents the creation of copies of copies that might go out of sync. Automate the downloading of files, removal of temporary copies, and uploading of output as much as possible. 
This improves the reproducibility of results and reduces the potential of human error. - Use iBridges or iCommands to (automatically) add file-level metadata to your files on Yoda when you upload them (e.g. file version, experimental condition, etc.). This way, you can keep your project organised. Note that metadata to describe the to-be-archived data package as a whole should be added via the web portal. + +- If you consistently work with large datasets on campus, e.g. on your PC, SciCloud or Ada, consider storing data you are actively working with on [SciStor](/topics/scistor.qmd). You can store the bulk of your source data in Yoda to keep costs down and upload your results to Yoda to organize, share with external collaborators, archive and publish. ::: ## Workflow: streaming @@ -89,7 +91,7 @@ There are several ways in which you can download and upload the files: > > - Data analysis: When you use Python for your analysis -Streaming is a more advanced method to analyse data in Yoda. Using iBridges in Python or the Python iRODS client, it is possible to directly load data into memory without having to download it to the analysis system. The main advantage of this method is that you do not create new copies of the data that you later have to remove, and your workflow becomes a lot cleaner. Streaming is especially useful when your data is organised in larger files and you only need extracts, i.e. you do not need all the content. Another use case for streaming is when you need to combine/append the content of many small files for your analysis. +Streaming is a more advanced method to analyse data in Yoda. Using iBridges in Python or the Python iRODS client, it is possible to directly load data into memory without having to download it to the analysis system ([manual](../data_access/yoda_using_python.qmd#advanced-streaming)). 
The main advantage of this method is that you do not create new copies of the data that you later have to remove, and your workflow becomes a lot cleaner. Streaming is especially useful when your data is organised in larger files and you only need extracts, i.e. you do not need all the content. Another use case for streaming is when you need to combine/append the content of many small files for your analysis. Output of your scripts can also be streamed directly to Yoda along with metadata. That means you do not need to first create a local file which contains the output, but you can directly create a file on Yoda and “stream” the output into that file. From 9ecd8b3c5983737d19344b8caba16ef2aeb746e6 Mon Sep 17 00:00:00 2001 From: Jolien-S <142608800+Jolien-S@users.noreply.github.com> Date: Thu, 8 Jan 2026 10:26:01 +0100 Subject: [PATCH 6/8] Added some interpunction --- manuals/yoda/data_access/yoda_using_python.qmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/manuals/yoda/data_access/yoda_using_python.qmd b/manuals/yoda/data_access/yoda_using_python.qmd index e57b8b0a5..00ea11077 100644 --- a/manuals/yoda/data_access/yoda_using_python.qmd +++ b/manuals/yoda/data_access/yoda_using_python.qmd @@ -83,7 +83,7 @@ for col in coll.subcollections: You can find more information on using the iRODS client in the [README on github](https://github.com/irods/python-irodsclient/blob/main/README.md). ## iBridges -The PRC can be hard to use because it requires some prior knowledge on the structure and terminology used in iRODS. For this reason developers at Utrecht University create [iBridges](https://github.com/iBridges-for-iRODS/iBridges), which makes it easier to do basic file and metadata manipulation in iRODS. +The PRC can be hard to use, because it requires some prior knowledge on the structure and terminology used in iRODS. 
For this reason, developers at Utrecht University created [iBridges](https://github.com/iBridges-for-iRODS/iBridges), which makes it easier to do basic file and metadata manipulation in iRODS. ### Installation Installation is again as simple as: From 827a412f352bf5659be9cca17642be1bcba7d949 Mon Sep 17 00:00:00 2001 From: peer35 Date: Wed, 21 Jan 2026 17:24:50 +0100 Subject: [PATCH 7/8] clarify --- manuals/yoda/using_yoda/analysing_data.qmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/manuals/yoda/using_yoda/analysing_data.qmd b/manuals/yoda/using_yoda/analysing_data.qmd index c4c0dba83..fb38cdf77 100644 --- a/manuals/yoda/using_yoda/analysing_data.qmd +++ b/manuals/yoda/using_yoda/analysing_data.qmd @@ -66,8 +66,8 @@ There are several ways in which you can download and upload the files: | Tool | Typical dataset | Platform | Explanation | | --- | --- | --- | --- | -| **Yoda web portal** | 10GB, 100 files or less | PC, some VRE | This can be done if you have an internet browser available (e.g., your PC and some VREs). You could choose this option when you do not want to install additional tools on your system. However, this method is not very reliable when transferring large files. Also, the web portal will not give you clear feedback on whether a download was completed correctly. | -| **WebDAV client**
[manual](../data_access/introduction.qmd) | 100GB, 1000 files or less | PC, VRE | WebDAV can be slow when transferring a large amount of small files. It is possible to automate the transfer files using WebDAV with Python, but it would be better to use the iRODS interface, see below. | +| **Yoda web portal** | up to 10GB, up to 100 files | PC, some VRE | This can be done if you have an internet browser available (e.g., your PC and some VREs). You could choose this option when you do not want to install additional tools on your system. However, this method is not very reliable when transferring large files. Also, the web portal will not give you clear feedback on whether a download was completed correctly. | +| **WebDAV client**
[manual](../data_access/introduction.qmd) | up to 100GB, up to 1000 files | PC, VRE | WebDAV can be slow when transferring a large number of small files. It is possible to automate the transfer of files using WebDAV with Python, but it is better to use the iRODS interface (see the iCommands and iBridges rows below). | | **iCommands or GoCommands**
[manual](../data_access/yoda_using_icommands.qmd) | Small to very large | PC, VRE, HPC | These command line tools can handle very large datasets and also offer many features for working with file-level metadata. It is also possible to check the integrity of uploaded and downloaded files, see the [ichksum command](https://docs.irods.org/4.3.4/icommands/user/). | | **iBridges or the Python iRODS Client**
[manual](../data_access/yoda_using_python.qmd) | Small to very large | PC, VRE, HPC | If you use Python for your analysis, you could include transfer of the source data and results in your scripts. This way you can automate data management and avoid duplicates or temporary data. For some workflows it is also possible to access a file directly by streaming, [see below](#workflow-streaming). | From 9ac0d03470c9f7c5b769e594ba409478f3e92374 Mon Sep 17 00:00:00 2001 From: peer35 Date: Wed, 21 Jan 2026 17:25:16 +0100 Subject: [PATCH 8/8] fix link --- manuals/yoda/using_yoda/analysing_data.qmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/manuals/yoda/using_yoda/analysing_data.qmd b/manuals/yoda/using_yoda/analysing_data.qmd index fb38cdf77..9b5da96db 100644 --- a/manuals/yoda/using_yoda/analysing_data.qmd +++ b/manuals/yoda/using_yoda/analysing_data.qmd @@ -91,7 +91,7 @@ There are several ways in which you can download and upload the files: > > - Data analysis: When you use Python for your analysis -Streaming is a more advanced method to analyse data in Yoda. Using iBridges in Python or the Python iRODS client, it is possible to directly load data into memory without having to download it to the analysis system ([manual](../data_access/yoda_using_python.qmd#advanced-streaming)). The main advantage of this method is that you do not create new copies of the data that you later have to remove, and your workflow becomes a lot cleaner. Streaming is especially useful when your data is organised in larger files and you only need extracts, i.e. you do not need all the content. Another use case for streaming is when you need to combine/append the content of many small files for your analysis. +Streaming is a more advanced method to analyse data in Yoda. 
Using iBridges in Python or the Python iRODS client, it is possible to directly load data into memory without having to download it to the analysis system ([manual](../data_access/yoda_using_python.qmd#streaming)). The main advantage of this method is that you do not create new copies of the data that you later have to remove, and your workflow becomes a lot cleaner. Streaming is especially useful when your data is organised in larger files and you only need extracts, i.e. you do not need all the content. Another use case for streaming is when you need to combine/append the content of many small files for your analysis. Output of your scripts can also be streamed directly to Yoda along with metadata. That means you do not need to first create a local file which contains the output, but you can directly create a file on Yoda and “stream” the output into that file.
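The idea of streaming output without a local file can be sketched as follows. This is a minimal illustration, not part of the manual itself: the helper names `results_to_csv_bytes` and `stream_results` are hypothetical, and an in-memory `io.BytesIO` stands in for the Yoda stream so the example runs without a connection. With iBridges the stream would presumably come from `IrodsPath(...).open('w')`; that write mode is an assumption here, since the manual only shows `open('r')`.

```python
import csv
import io


def results_to_csv_bytes(header, rows):
    """Serialise tabular results to UTF-8 encoded CSV, entirely in memory."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def stream_results(stream, header, rows):
    """Write the CSV bytes to any writable binary stream; return the byte count."""
    payload = results_to_csv_bytes(header, rows)
    stream.write(payload)
    return len(payload)


# Stand-in for a stream opened on Yoda (hypothetical usage, not verified here):
#     with IrodsPath(session, "path", "to", "results.csv").open("w") as stream:
#         stream_results(stream, header, rows)
demo_stream = io.BytesIO()
written = stream_results(
    demo_stream,
    ["subject", "score"],
    [["s01", 0.91], ["s02", 0.78]],
)
print(written, "bytes written")
```

Because `stream_results` only relies on a `write` method that accepts bytes, the same function can be pointed at a local file for testing and at a remote stream in production, which keeps the analysis code independent of where the output ends up.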