docs: Update Workflows page #6648
I see the logic in having all of this stuff together. My instinct is that the page would benefit from a stronger overview at the top, i.e., a list of the major parts (entry, parameters, outputs, named workflows, dataflow) with one-line descriptions, an explanation that all of these are related under the concept of a workflow, and a reasonably simple example showing these parts. This probably softens the need for the order to be perfect, since everything is framed up front rather than introduced progressively. I'll vibe a PR to see how it looks in this scenario.
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
682c69d to fc7bcae
@christopher-hakkaart I tried my hand at an overview for the Workflows page. Let me know what you think.
christopher-hakkaart
left a comment
Overall, I like the direction and changes. I would like to take a pass at this, but I'm tied up with other projects for the next couple of weeks and won't have time to give it the attention it deserves. This is a significant improvement, and I would suggest mostly language changes rather than rearranging sections. I don't want to hold this up.
My main suggestions are to add a list of the main sections with minimal, scannable descriptions rather than full sentences, and to add a bit more context at the start of each section to help frame it.
Take or leave what you like/dislike.
I started some consistency suggestions, but they probably aren't worth it until the whole page is standardized, which I can follow up with in a second PR.
| ## Outputs | ||
|
|
||
| :::{versionadded} 25.10.0 | ||
| This feature is available as a preview in Nextflow {ref}`24.04 <workflow-outputs-first-preview>`, {ref}`24.10 <workflow-outputs-second-preview>`, and {ref}`25.04 <workflow-outputs-third-preview>`. |
| This feature is available as a preview in Nextflow {ref}`24.04 <workflow-outputs-first-preview>`, {ref}`24.10 <workflow-outputs-second-preview>`, and {ref}`25.04 <workflow-outputs-third-preview>`. | |
| Workflow outputs are available as a preview in Nextflow {ref}`24.04 <workflow-outputs-first-preview>`, {ref}`24.10 <workflow-outputs-second-preview>`, and {ref}`25.04 <workflow-outputs-third-preview>`. |
docs/workflow.md
Outdated
| # Workflows | ||
|
|
||
| In Nextflow, a **workflow** is a function that is specialized for composing processes and dataflow logic (i.e. channels and operators). | ||
| In Nextflow, a **workflow** is a function that is specialized for composing {ref}`processes <process-page>` and dataflow logic. |
| In Nextflow, a **workflow** is a function that is specialized for composing {ref}`processes <process-page>` and dataflow logic. | |
| In Nextflow, a **workflow** is a specialized function for composing {ref}`processes <process-page>` and dataflow logic. |
| Workflow outputs are intended to replace the {ref}`publishDir <process-publishdir>` directive. See {ref}`migrating-workflow-outputs` for guidance on migrating from `publishDir` to workflow outputs. | ||
| ::: | ||
|
|
||
| A script can define an *output block* which declares the top-level outputs of the workflow. Each output should be assigned in the `publish` section of the entry workflow. Any channel in the workflow can be assigned to an output, including process and subworkflow outputs. |
| A script can define an *output block* which declares the top-level outputs of the workflow. Each output should be assigned in the `publish` section of the entry workflow. Any channel in the workflow can be assigned to an output, including process and subworkflow outputs. | |
| A script can define an *output block* to declare the top-level workflow outputs. Each output should be assigned in the `publish` section of the entry workflow. Any channel in the workflow can be assigned to an output, including process and subworkflow outputs. |
| In Nextflow, a **workflow** is a function that is specialized for composing {ref}`processes <process-page>` and dataflow logic: | ||
|
|
||
| See {ref}`syntax-workflow` for a full description of the workflow syntax. | ||
| - An [entry workflow](#entry-workflow) is the entrypoint of a pipeline. It can take [parameters](#parameters) as inputs using the `params` block, and it can publish [outputs](#outputs) using the `output` block. | ||
|
|
||
| :::{note} | ||
| Workflows were introduced in DSL2. If you are still using DSL1, see {ref}`dsl1-page` for more information about how to migrate your Nextflow pipelines to DSL2. | ||
| ::: | ||
| - A [named workflow](#named-workflows) is a workflow that can be called by other workflows. It can define its own inputs and outputs, which are called *takes* and *emits*. | ||
|
|
||
| - Both entry workflows and named workflows can contain [dataflow logic](#dataflow) such as calling processes, workflows, and channel operators. |
| In Nextflow, a **workflow** is a function that is specialized for composing {ref}`processes <process-page>` and dataflow logic: | |
| See {ref}`syntax-workflow` for a full description of the workflow syntax. | |
| - An [entry workflow](#entry-workflow) is the entrypoint of a pipeline. It can take [parameters](#parameters) as inputs using the `params` block, and it can publish [outputs](#outputs) using the `output` block. | |
| :::{note} | |
| Workflows were introduced in DSL2. If you are still using DSL1, see {ref}`dsl1-page` for more information about how to migrate your Nextflow pipelines to DSL2. | |
| ::: | |
| - A [named workflow](#named-workflows) is a workflow that can be called by other workflows. It can define its own inputs and outputs, which are called *takes* and *emits*. | |
| - Both entry workflows and named workflows can contain [dataflow logic](#dataflow) such as calling processes, workflows, and channel operators. | |
| A **workflow** composes {ref}`processes <process-page>` and dataflow logic to define how data flows through your pipeline. A Nextflow script typically includes: | |
| - **[Entry workflow](#entry-workflow)**: A main entrypoint that orchestrates the pipeline | |
| - **[Parameters](#parameters)**: Configurable inputs | |
| - **[Outputs](#outputs)**: Published results | |
| - **[Named workflows](#named-workflows)**: Reusable workflow components that can be called by other workflows | |
| - **[Dataflow](#dataflow)**: Channels and operators connecting processes | |
| For detailed syntax and usage instructions, see {ref}`syntax-workflow`. |
IMO, these are relatively novel concepts for a new user. A scannable, high-level list of what each part is helps frame the page.
|
|
||
| ## Parameters | ||
|
|
||
| Parameters can be declared in a Nextflow script with the `params` block or with *legacy* parameter declarations. |
| Parameters are configurable variables that control pipeline behavior. You can declare parameters with [typed parameters](#typed-parameters) in the `params` block or with [legacy parameters](#legacy-parameters) to customize pipeline behavior at runtime. |
| The default output directory is `results` in the launch directory. | ||
|
|
||
| By default, all output files are published to the output directory. Each output in the output block can define where files are published using the `path` directive. For example: |
| The default output directory is `results` in the launch directory. | |
| By default, all output files are published to the output directory. Each output in the output block can define where files are published using the `path` directive. For example: | |
| The default output directory is `results` in the launch directory. | |
| By default, Nextflow publishes all output files to the output directory. Each output in the output block can define where Nextflow publishes files using the `path` directive: |
| └── ... | ||
| ``` | ||
|
|
||
| All files received by an output are published into the specified directory. Lists, maps, and tuples are recursively scanned for nested files. For example: |
| All files received by an output are published into the specified directory. Lists, maps, and tuples are recursively scanned for nested files. For example: | |
| Nextflow publishes all files received by an output into the specified directory. Nextflow recursively scans lists, maps, and tuples for nested files: |
|
|
||
| The above example publishes each channel value to a different subdirectory. In this case, each pair of FASTQ files is published into a subdirectory based on the sample ID. | ||
|
|
||
| The closure can even define a different path for each individual file using the `>>` operator: |
| The closure can even define a different path for each individual file using the `>>` operator: | |
| You can define a different path for each individual file using the `>>` operator: |
| } | ||
| ``` | ||
|
|
||
| Each `>>` specifies a *source file* and *publish target*. The source file should be a file or collection of files, and the publish target should be a directory or file name. If the publish target ends with a slash, it is treated as the directory in which source files are published. Otherwise, it is treated as the target filename of a source file. Only files that are published with the `>>` operator are saved to the output directory. |
| Each `>>` specifies a *source file* and *publish target*. The source file should be a file or collection of files, and the publish target should be a directory or file name. If the publish target ends with a slash, it is treated as the directory in which source files are published. Otherwise, it is treated as the target filename of a source file. Only files that are published with the `>>` operator are saved to the output directory. | |
| Each `>>` specifies a *source file* and *publish target*. The source file should be a file or collection of files, and the publish target should be a directory or file name. If the publish target ends with a slash, Nextflow treats it as the directory in which to publish source files. |
|
|
||
| ### Index files | ||
|
|
||
| Each output can create an index file of the values that were published. An index file preserves the structure of channel values, including metadata, which is simpler than encoding this information with directories and file names. The index file can be a CSV (`.csv`), JSON (`.json`), or YAML (`.yml`, `.yaml`) file. The channel values should be files, lists, maps, or tuples. |
| Each output can create an index file of the values that were published. An index file preserves the structure of channel values, including metadata, which is simpler than encoding this information with directories and file names. The index file can be a CSV (`.csv`), JSON (`.json`), or YAML (`.yml`, `.yaml`) file. The channel values should be files, lists, maps, or tuples. | |
| Index files are structured metadata files that catalog published outputs and their associated metadata. An index file preserves the structure of channel values, including metadata, which is simpler than encoding this information with directories and file names. The index file can be a CSV (`.csv`), JSON (`.json`), or YAML (`.yml`, `.yaml`) file. The channel values should be files, lists, maps, or tuples. |
|
I started working on a PR here before the holidays: https://github.com/christopher-hakkaart/nextflow/tree/chris-docs-workflow-page
This PR reorganizes the documentation on workflows and dataflow logic to match the current Nextflow programming model:

- Move "Outputs" to the top after "Entry workflow" and "Parameters" to emphasize that these three things go together
- Move the "Dataflow" page into the "Workflows" page as a section after "Entry workflow" and "Named workflows", since dataflow logic exists primarily in the context of a workflow
- Move auxiliary sections (calling processes, special operators, recursion) under "Dataflow", since all of these concepts fall under the umbrella of dataflow logic

I'm open to different ordering / structuring. I just think all of this content goes together.