fix: exclude JWT token from workload repr to prevent log exposure #62964
kaxil merged 14 commits into apache:main
Prevents JWT tokens from leaking into task logs by setting repr=False on the token field in BaseWorkloadSchema. When workload objects are logged (e.g. in execute_workload.py), Pydantic's auto-generated __repr__ previously included the raw JWT token string. This is a security concern as tokens should never appear in log output. The fix uses Pydantic's Field(repr=False) to exclude the token from string representations while keeping it fully accessible as an attribute. Fixes: apache#62428
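As a rough illustration of the mechanism described above, the following minimal sketch assumes Pydantic v2. `DemoWorkload` is a hypothetical stand-in, not Airflow's actual `BaseWorkloadSchema`, and the token value is a placeholder.

```python
from pydantic import BaseModel, Field


class DemoWorkload(BaseModel):
    """Hypothetical stand-in for BaseWorkloadSchema; not the Airflow class."""

    # repr=False only affects __repr__/__str__; the field itself is unchanged.
    token: str = Field(repr=False)
    dag_rel_path: str


workload = DemoWorkload(token="fake.jwt.value", dag_rel_path="dags/example.py")

print(repr(workload))         # the token is omitted from the representation
print(workload.token)         # attribute access still returns the token
print(workload.model_dump())  # serialization still includes the token
```

Note that `repr=False` does not affect serialization: in this sketch `model_dump()` still returns the token, so anything that logs a dumped dict instead of the object would need separate handling.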
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Pull request overview
This PR fixes a security issue where JWT tokens were being exposed in task logs via Pydantic's auto-generated __repr__ output. When workload objects are logged (e.g., log.info("Executing workload", workload=workload)), the full JWT token was visible in structured log output. The fix uses Pydantic's Field(repr=False) on the token field in BaseWorkloadSchema, the base class for all workload DTOs.
Changes:
- Set repr=False on the token field in BaseWorkloadSchema to prevent JWT tokens from appearing in repr/str output
- Added a regression test to verify the token is excluded from repr() while remaining accessible as an attribute
- Added a changelog newsfragment for the bugfix
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| airflow-core/src/airflow/executors/workloads/base.py | Added Field import and set repr=False on the token field in BaseWorkloadSchema |
| airflow-core/tests/unit/executors/test_workloads.py | Added regression test test_token_excluded_from_workload_repr verifying token exclusion from repr |
| airflow-core/newsfragments/62428.bugfix.rst | Changelog entry for the bugfix |
…com/SibtainOcn/airflow into fix/redact-token-from-workload-repr
Hey, all the review comments have been addressed and pushed. Just waiting on CI workflow approval whenever someone gets a chance.
I fixed the trailing blank line that kept failing the static checks. Verified locally with pre-commit: the trailing-whitespace and end-of-file checks pass now. Hope this goes through; if it still fails, I could use some help understanding what the hook expects.
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.
Backport failed to create: v3-1-test. View the failure log and run details.
Note: see "Merging PRs targeted for Airflow 3.X". When in doubt, please ask in the #release-management Slack channel.
You can attempt to backport this manually by running: cherry_picker b196cf3 v3-1-test
This should apply the commit to the v3-1-test branch and leave the commit in a conflict state. After you have resolved the conflicts, you can continue the backport process by running: cherry_picker --continue
If you don't have cherry-picker installed, see the installation guide.
What
Prevents JWT tokens from leaking into task logs by setting repr=False on the token field in BaseWorkloadSchema.
Closes: #62428
Closes: #62773
Why
When workload objects are logged (e.g. log.info('Executing workload', workload=workload) in execute_workload.py), Pydantic's auto-generated __repr__ includes all fields, including the raw JWT token. This is a security concern since tokens grant API access and should never appear in log output.
The log output currently looks like:
ExecuteTask(token='eyJhbGciOi...full_token_here', ti=TaskInstance(...), ...)
How
Uses Pydantic's built-in Field(repr=False) on the token field in BaseWorkloadSchema (the base class for all workload DTOs). This:
- Excludes the token from repr()/str() output, so it never appears in logs
- Keeps the token accessible via workload.token, so there is no functional change
After the fix, log output shows:
ExecuteTask(ti=TaskInstance(...), dag_rel_path=..., type='ExecuteTask')
Comparison with #62782
PR #62782 takes a different approach: it modifies the logging call sites to log individual fields and adds a structlog regex redactor. Our approach fixes the root cause at the model level (1 line change vs 4 files), ensuring the token is hidden from repr regardless of where or how the workload object is logged.
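For context, a structlog regex redactor of the kind mentioned above might look roughly like the sketch below. This is not the code from #62782: the function name `redact_jwt`, the regex, and the `***` replacement are all assumptions made for illustration. The function only follows structlog's processor calling convention `(logger, method_name, event_dict)` and does not import structlog itself.

```python
import re
from typing import Any, MutableMapping

# Assumption: a JWT is three dot-separated base64url segments whose header
# starts with "eyJ" (the base64 encoding of '{"'). Real tokens may differ.
JWT_PATTERN = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*")


def redact_jwt(
    logger: Any, method_name: str, event_dict: MutableMapping[str, Any]
) -> MutableMapping[str, Any]:
    """Hypothetical processor: mask JWT-looking substrings in every value."""
    for key, value in list(event_dict.items()):
        # Non-string values are inspected through their repr, which is how
        # they would typically be rendered in the final log line.
        text = value if isinstance(value, str) else repr(value)
        if JWT_PATTERN.search(text):
            event_dict[key] = JWT_PATTERN.sub("***", text)
    return event_dict


# Direct call with a fake token, mimicking what a logger would pass in.
fake_token = "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJkZW1vIn0.c2lnbmF0dXJl"
event = redact_jwt(
    None,
    "info",
    {"event": "Executing workload", "workload": f"ExecuteTask(token='{fake_token}')"},
)
print(event)
```

A processor like this would normally be registered through `structlog.configure(processors=[...])`. Its trade-off is that it depends on the token matching the pattern, whereas hiding the field from repr does not.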
Both approaches are valid and complementary: Field(repr=False) prevents the leak at the source, while a structlog redactor provides defense-in-depth.
Changes
- airflow-core/src/airflow/executors/workloads/base.py: Add Field import; set repr=False on token field
- airflow-core/tests/unit/executors/test_workloads.py: Add regression test verifying token is excluded from repr
- airflow-core/newsfragments/62428.bugfix.rst: Changelog entry
Testing
- Added test_token_excluded_from_workload_repr, which creates an ExecuteTask with a fake JWT and asserts repr() does not contain it
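The shape of such a regression test can be sketched as follows. This is not the test from the PR: `FakeWorkload` is a hypothetical stand-in for ExecuteTask (which needs more fields), the token is a placeholder, and the standard library `logging` module is swapped in for Airflow's structured logger to keep the example self-contained. Pydantic v2 is assumed.

```python
import io
import logging

from pydantic import BaseModel, Field

FAKE_TOKEN = "header.payload.signature"  # placeholder, not a real JWT


class FakeWorkload(BaseModel):
    """Hypothetical stand-in for an Airflow workload such as ExecuteTask."""

    token: str = Field(repr=False)
    dag_rel_path: str


def test_token_excluded_from_repr_and_logs() -> None:
    workload = FakeWorkload(token=FAKE_TOKEN, dag_rel_path="dags/example.py")

    # The string representations must not contain the token...
    assert FAKE_TOKEN not in repr(workload)
    assert FAKE_TOKEN not in str(workload)
    # ...while attribute access keeps working.
    assert workload.token == FAKE_TOKEN

    # Logging the object renders it via repr(), so the token stays out.
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger = logging.getLogger("workload_repr_demo")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    try:
        logger.info("Executing workload %r", workload)
    finally:
        logger.removeHandler(handler)

    assert "dag_rel_path" in stream.getvalue()
    assert FAKE_TOKEN not in stream.getvalue()


if __name__ == "__main__":
    test_token_excluded_from_repr_and_logs()
```

Asserting on the captured log text as well as on `repr()` ties the test to the actual concern, which is log exposure, rather than only to the model's representation.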