From 08ddd8a07a575c217d94fa134e995ec427248820 Mon Sep 17 00:00:00 2001 From: Chase Naples Date: Sun, 14 Sep 2025 14:48:01 -0400 Subject: [PATCH] docs: expand resume guidance beyond failures\n\n- Add examples for iterating on downstream steps\n- Show combining resume with compute backends (--with batch/kubernetes)\n- Note safe local reproduction of production runs\n- Include snippet to detect resumed runs via current.origin_run_id\n- Link to programmatic Runner.resume\n\nFixes Netflix/metaflow-docs#38 --- docs/metaflow/debugging.md | 37 +++++++++++++++++++++++++++++++++++++ 1 file changed, 37 insertions(+) diff --git a/docs/metaflow/debugging.md b/docs/metaflow/debugging.md index 1c7d3b2e..6a54ec54 100644 --- a/docs/metaflow/debugging.md +++ b/docs/metaflow/debugging.md @@ -159,6 +159,43 @@ This would resume execution from the step `start`. If you specify a step that co after the step that failed, execution resumes from the failed step - you can't skip over steps. +### Using `resume` beyond failures + +While `resume` is most often used to recover from failures, it is equally useful for +fast iteration when nothing has failed. Typical patterns include: + +- Iterate on downstream logic while reusing upstream results. For example, after editing + `join`, you can re-run only downstream work by resuming from an earlier step: + + ```bash + python debug.py resume start + ``` + +- Try a different compute backend without re-running upstream steps. You can combine + `resume` with an execution decorator: + + ```bash + python debug.py resume --origin-run-id --with batch + # or + python debug.py resume --origin-run-id --with kubernetes + ``` + +- Reproduce and extend a production run locally. Resuming a production run with + `--origin-run-id` executes in your personal namespace, so it’s safe to experiment + without affecting production. + +- Detect resumed runs inside your code when you need conditional behavior: + + ```python + from metaflow import current + + if current.origin_run_id: + print(f"Resumed from run {current.origin_run_id}") + ``` + +For programmatic control, see [Runner.resume](/metaflow/managing-flows/runner#programmatic-resume), +which offers synchronous and async variants for notebooks and scripts. + ### Resume and parameters and configs If your flow has [`Parameters`](basics#how-to-define-parameters-for-flows), you can't