Update eager: make sure client secret can be specified as env var#2720
Update eager: make sure client secret can be specified as env var#2720cosmicBboy merged 14 commits intomasterfrom
Conversation
Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com>
Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com>
Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com>
Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com>
Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com>
Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com>
Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2720 +/- ##
===========================================
+ Coverage 45.47% 71.82% +26.34%
===========================================
Files 193 193
Lines 19516 19560 +44
Branches 2818 3845 +1027
===========================================
+ Hits 8875 14049 +5174
+ Misses 10208 4832 -5376
- Partials 433 679 +246 ☔ View full report in Codecov by Sentry. |
flytekit/bin/entrypoint.py
Outdated
| for k, v in output_file_dict.items(): | ||
| utils.write_proto_to_file(v.to_flyte_idl(), os.path.join(ctx.execution_state.engine_dir, k)) | ||
|
|
||
| # make sure an event loop exists for data persistence step |
There was a problem hiding this comment.
this issue comes up not just with eager. Frameworks like Langchain that have async functionality will cause the event loop to end before the flyte task can run the async data persistence operation.
There was a problem hiding this comment.
Could we use a separate event loop just for flytekit?
There was a problem hiding this comment.
sounds right, I think someone more well-versed in async can help with that
Future-Outlier
left a comment
There was a problem hiding this comment.
thank you for this!
can we have
- register the task to remote cluster (sandbox or dogfood gcp) to prove it works?
- add workflow in unit tests.
Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com>
Tests for this exist here and here. The changes made to the PR (adding the new The code snippet in the PR description should work on flyte sandbox |
| remote_cls = type(remote) | ||
| return remote_cls() |
There was a problem hiding this comment.
This assumes that the other FlyteRemote object can be created without a Config and uses an environment variable. Is this okay for OSS?
Also, should this carry over the default_domain and default_project from the original remote obj? Or should the other remote object figure out the domain and project from the execution environment?
There was a problem hiding this comment.
ooo yeah it should probably carry over those args.
admittedly this code path is basically: "tell me you only run on serverless without saying you run on serverless" 🙃
There was a problem hiding this comment.
Basically client_secret_env_var only really applies to serverless (this will fail with Flyte), unless at some point down the road Flyte provides an single API key UX.
Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com>
wild-endeavor
left a comment
There was a problem hiding this comment.
some questions, but this event loop thing... can you explain why it's needed in entrypoint.py but there doesn't seem to be another one for local execution? Won't both need it?
| try: | ||
| secret_requests.append(Secret(group=client_secret_group, key=client_secret_key)) | ||
| except ValueError: | ||
| pass |
There was a problem hiding this comment.
what if there's nothing in the client_secret_env_var environment variable?
There was a problem hiding this comment.
client_secret_env_var doesn't figure into this try except block, or am I missing something?
There was a problem hiding this comment.
no yeah i know, i mean, what happens if all three (client_secret_env_var, client_secret_group and client_secret_key) are None?
| :param local_entrypoint: If True, the eager workflow will can be executed locally but use the provided | ||
| :py:func:`~flytekit.remote.FlyteRemote` object to create task/workflow executions. This is useful for local | ||
| testing against a remote Flyte cluster. | ||
| :param client_secret_env_var: if specified, binds the client secret to the specified environment variable for |
There was a problem hiding this comment.
okay with this for now, but I think we should see if we can't move this to some higher level in the future. the secret here we're talking about is the secret for hitting admin itself right?
There was a problem hiding this comment.
this should be implemented in Secret but we haven't moved on that for a while: #1726
|
|
||
| if client_secret_env_var is not None: | ||
| # this creates a remote client where the env var client secret is sufficient for authentication | ||
| os.environ[client_secret_env_var] = client_secret |
There was a problem hiding this comment.
confused what's happening here. if we're removing the asserts, then that means the group/key might be None. If they're none, won't the secrets_manager.get fail? So then what is the client_secret? Or is there some default in the secrets manager that somehow gets returned?
I don't get why we're setting the environment variable here to the value of client secret.
There was a problem hiding this comment.
this is to make it easy to authenticate with UnionRemote on serverless (via UNION_SERVERLESS_API_KEY env var). Secrets on serverless only needs a secret key (no group).
There was a problem hiding this comment.
just wondering, in the following scenario:
client_secret_keyisNoneclient_secret_groupisNone
(and this scenario is now valid because the asserts are now removed)
after line 626client_secret = secrets_manager.get(client_secret_group, client_secret_key)
what is the value ofclient_secret?
There was a problem hiding this comment.
going through the secretsmanager, it looks like it raises an IndexError if you call get(group=None, key=None), so I don't think it ever gets past line 626. Could we move the assertion that at least one of those has be be non-None into the get function?
wild-endeavor
left a comment
There was a problem hiding this comment.
Had a few more comments and went over this with @eapolinario. Could we
- Add an assert to secretsmanager.get that at least one of group/key should be non-None
- Could we move the async loop stuff to another PR? I think it'd be helpful to discuss it separately. Also do you remember how you triggered the issue? I wanna repro locally to understand more what langchain is doing. Ideally we handle in such a way that it's completely independent of any other library.
Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com>
Updated PR with client secret key/group assertion.
Deleted this. For some reason I'm unable to repro the async issue I was seeing earlier on eager or with langchain async document loaders https://github.com/unionai-oss/union-rag/blob/main/union_rag/langchain.py#L138-L141 |


Why are the changes needed?
Currently, the
@eagerworkflows are too strict with its handling of the client secret values. Both secret group and key need to be specified for the secret request to be passed down to the underlying task. There is also an issue with the async event loop not being available for the downstream data persistence step in the task execution.What changes were proposed in this pull request?
This PR loosens this requirement to be compatible with Flyte backends that do not require secret groups. This PR also:
FlyteRemoteobject can authenticate with just the client secretHow was this patch tested?
The following workflow was tested on flyte sandbox