Add file_extension, enable_legacy_filename fields to BlobType#7009
Add file_extension, enable_legacy_filename fields to BlobType#7009ddl-rliu wants to merge 2 commits intoflyteorg:masterfrom
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7009 +/- ##
==========================================
- Coverage 58.55% 56.91% -1.65%
==========================================
Files 701 931 +230
Lines 41100 58220 +17120
==========================================
+ Hits 24068 33138 +9070
- Misses 14911 22039 +7128
- Partials 2121 3043 +922
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
flytecopilot/data/download.go
Outdated
| var resultLit *core.Literal | ||
| var resultV interface{} | ||
| var err error | ||
| // TODO: Refactor handleLiteral to accept a list of file paths |
There was a problem hiding this comment.
Assuming this PR's general approach is acceptable, I can do this refactor in the same PR, or in a follow-up PR (Also, there can be an improvement here to use symlinks, if the output will be written 2+ times).
The refactor isn't actually needed for correctness, it's more for clarity. The returned resultV, resultLit is identical for calls to handleLiteral that only differ by the value of varPath.
Add a new `file_extension` string field to the BlobType protobuf message, allowing FlyteFile to optionally specify a file extension (e.g. "csv") that flytecopilot appends when writing blobs to local disk during the download phase (e.g. "data.csv"). When empty (the default), behavior is unchanged. Add a new `enable_legacy_filename` bool field to the BlobType protobuf message, allowing FlyteFile to optionally specify whether to preserve backward compatibility for tasks that read from the extensionless path. Regenerated all protobuf bindings (Go, Python, JS, Rust, ES, Swagger). Signed-off-by: ddl-rliu <richard.liu@dominodatalab.com>
592e227 to
22ff989
Compare
Signed-off-by: ddl-rliu <richard.liu@dominodatalab.com>
Tracking issue
Closes #7024
Why are the changes needed?
We'll assume the above behavior is not a bug (it is long-standing, and a "bugfix" will likely break existing workflows). Instead, this PR proposes enhancements to FlyteFile/Blob to support writing workflow inputs with the file extension. This PR includes several changes across flyteidl and flytecopilot.
The enhancements allow workflows to be flexible when writing blobs. Specifically, the existing behavior where flytecopilot writes blobs during the download phase without the file extension (e.g. "inputs/data") can now be enhanced so that the file extension is included (e.g. "inputs/data.csv").
What changes were proposed in this pull request?
[flyteidl]
Add a new
file_extensionstring field to the BlobType protobuf message, allowing FlyteFile to optionally specify a file extension (e.g. "csv") that flytecopilot appends when writing blobs during the download phase (e.g. "data.csv"). When empty (the default), behavior is unchanged.Add a new
enable_legacy_filenamebool field to the BlobType protobuf message, allowing FlyteFile to optionally specify whether to preserve backward compatibility for tasks that read from the extensionless path.Regenerated all protobuf bindings (Go, Python, JS, Rust, ES, Swagger).
[flytecopilot]
The copilot download phase infers the desired download behavior(s) from the input interface.
[flytekit] PR: flyteorg/flytekit#3406
Alternatives considered
The PR's approach configures the file download behaviors at the BlobType level (e.g. per FlyteFile). This has several pros (granularity, explicitness).
But, one con is that unlike
BlobType.format(which can be inferred from the output filename e.g. "data.csv" ->format: "csv"), the new fieldsBlobType.file_extension, BlobType.enable_legacy_filenamecould not be inferred from the output filename (does "data.csv" match tofile_extension: "csv"?). Ultimately, this seems like it is introducing a minor inconsistency, but acceptable given the benefits. (It also seems like this is all hypothetical - copilot upload does not actually infer the format from the filename, so this concern may be somewhat moot?)Here are the other approaches I considered:
1. New flyte-copilot CLI flags
--file_extension_config ENUM=(disabled, enabled, legacy)file_extension_config = disabled (by default) - Same behavior as today (
data: FlyteFile[csv]is written tooutputs/data)file_extension_config = enabled - New behavior as today (
data: FlyteFile[csv]is written tooutputs/data.csv)file_extension_config = legacy - New behavior but backwards compatible, (
data: FlyteFile[csv]is written tooutputs/dataandoutputs/data.csv)Cons:
2. No changes, user code should be modified to read from the existing path e.g.
dataand add the extension itselfCons:
How was this patch tested?
Labels
Please add one or more of the following labels to categorize your PR:
This is important to improve the readability of release notes.
Setup process
Screenshots
Check all the applicable boxes
Related PRs
Docs link