
Fix task-level audit logs missing success/running events in Airflow 3.1.x #61932

Open
sjyangkevin wants to merge 7 commits into apache:main from sjyangkevin:issues/58381/success-running-audit-logs-missing


@sjyangkevin
Contributor

@sjyangkevin sjyangkevin commented Feb 15, 2026

closes: #58381

Issue

In Airflow 2, task instance state transitions such as RUNNING, SUCCESS, SKIPPED, and FAILED were logged to the log table through TaskInstance model methods. In Airflow 3, task state updates are handled by the execution API endpoints ti_run and ti_update_state. However, these endpoints were not wired up to create log records, resulting in missing audit logs for task instance state transitions.

Fix

Added session.add(Log(...)) calls to both ti_run and ti_update_state where the task instance state is updated. In both endpoints, the log record is written in the same transaction as the state update.
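A minimal sketch of why writing the audit row in the same transaction matters, using the standard-library sqlite3 module instead of Airflow's SQLAlchemy session. The two-table schema and the helper update_state_with_audit_log are hypothetical; the point is only that the state change and the log row commit or roll back together.

```python
import sqlite3

# Hypothetical, heavily simplified tables; Airflow's real task_instance and log
# tables have many more columns.
SCHEMA = """
CREATE TABLE task_instance (id TEXT PRIMARY KEY, state TEXT);
CREATE TABLE log (id INTEGER PRIMARY KEY AUTOINCREMENT, ti_id TEXT, event TEXT);
"""


def update_state_with_audit_log(conn, ti_id, new_state, fail_before_log=False):
    """Update the TI state and add an audit row in one transaction (hypothetical helper)."""
    # `with conn:` commits if the block succeeds and rolls back if it raises,
    # so the state change and the log row are persisted together or not at all.
    with conn:
        conn.execute("UPDATE task_instance SET state = ? WHERE id = ?", (new_state, ti_id))
        if fail_before_log:
            raise RuntimeError("simulated failure between the update and the log insert")
        conn.execute("INSERT INTO log (ti_id, event) VALUES (?, ?)", (ti_id, new_state))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.execute("INSERT INTO task_instance VALUES ('ti-1', 'queued')")

update_state_with_audit_log(conn, "ti-1", "running")
try:
    update_state_with_audit_log(conn, "ti-1", "success", fail_before_log=True)
except RuntimeError:
    pass  # the failed "success" update is rolled back, so no log row exists for it

print(conn.execute("SELECT state FROM task_instance").fetchone())  # ('running',)
print(conn.execute("SELECT event FROM log").fetchall())  # [('running',)]
```

With a separate transaction for the log write, a failure between the two statements could leave a state change with no audit entry (or the reverse).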

In ti_run, the audit log is added inside the else branch, so duplicate or conflicting requests (e.g. retries caused by a network glitch) are not logged.
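The branching can be sketched in plain Python. Everything here is hypothetical: TI, FakeSession, Conflict and ti_run_sketch stand in for the real model, SQLAlchemy session and HTTP 409 response, and treating a request as a duplicate when the TI is already RUNNING with the same hostname and pid is an assumption about how retries are recognized.

```python
from dataclasses import dataclass, field
from typing import Optional


class Conflict(Exception):
    """Stands in for the HTTP 409 response returned on a conflicting request."""


@dataclass
class TI:
    state: str
    hostname: Optional[str] = None
    pid: Optional[int] = None


@dataclass
class FakeSession:
    added: list = field(default_factory=list)

    def add(self, obj):
        self.added.append(obj)


def ti_run_sketch(ti, hostname, pid, session):
    if ti.state != "queued":
        if ti.state == "running" and (ti.hostname, ti.pid) == (hostname, pid):
            # Duplicate request, e.g. a client retry after a network glitch:
            # the TI already runs on this worker, so no audit row is written.
            return "duplicate"
        # Any other non-queued state is a conflict; nothing is logged either.
        raise Conflict(f"TI is in state {ti.state!r}, expected 'queued'")
    else:
        # Only a genuine queued -> running transition reaches this branch,
        # which is where the audit log entry is added.
        ti.state, ti.hostname, ti.pid = "running", hostname, pid
        session.add({"event": "running", "hostname": hostname})
        return "started"


session = FakeSession()
ti = TI(state="queued")
print(ti_run_sketch(ti, "worker-1", 42, session))  # started
print(ti_run_sketch(ti, "worker-1", 42, session))  # duplicate
print(len(session.added))  # 1
```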

In ti_update_state, the TI select query now also fetches task_id, run_id, and map_index, which are used to populate the log record.
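A sketch of why the extra columns help: with task_id, run_id and map_index in the selected row, ti_update_state can fill the log entry's identity fields without reloading the task instance. TISelectRow, AuditLogRow and build_state_change_log are hypothetical names, and using the new state as the log event is an assumption based on the Airflow 2 behavior described in the Issue section.

```python
from dataclasses import dataclass
from typing import NamedTuple


class TISelectRow(NamedTuple):
    """Hypothetical subset of the columns selected in ti_update_state."""

    previous_state: str
    dag_id: str
    task_id: str  # newly selected by the PR
    run_id: str  # newly selected by the PR
    map_index: int  # newly selected by the PR


@dataclass
class AuditLogRow:
    event: str
    dag_id: str
    task_id: str
    run_id: str
    map_index: int


def build_state_change_log(row: TISelectRow, new_state: str) -> AuditLogRow:
    # Every identity field comes straight from the selected row, so no second
    # query for the TaskInstance object is needed.
    return AuditLogRow(
        event=new_state,
        dag_id=row.dag_id,
        task_id=row.task_id,
        run_id=row.run_id,
        map_index=row.map_index,
    )


row = TISelectRow("running", "example_dag", "extract", "manual__1", -1)
print(build_state_change_log(row, "success"))
```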

Screenshot from 2026-02-14 19-13-44

Caveat

The following fields in the log table are not populated, because filling them requires an extra query/join:

  • logical_date (previously execution_date)
  • owner (the value is airflow in Airflow 2)
  • owner_display_name (this field is also empty for task state transitions)
  • extra (full_command is unavailable because tasks are not run or updated through the CLI; hostname is not available in ti_update_state)

EmptyOperator tasks and tasks skipped by a branch operator do not get audit log entries.

Screenshot from 2026-02-14 22-24-31

DAG Test Samples

Screenshot from 2026-02-14 22-13-16

Edited on Feb 27, 2026

owner and logical_date are now gathered from DagModel and DagRun, and extra is filled with host_name.
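The join can be sketched with sqlite3. The tables are simplified stand-ins for Airflow's dag, dag_run and task_instance tables (the owners column name is assumed to mirror DagModel), fetch_log_context is a hypothetical helper, and the TI row lock taken via with_for_update is left out because SQLite has no SELECT ... FOR UPDATE.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag (dag_id TEXT PRIMARY KEY, owners TEXT);
    CREATE TABLE dag_run (dag_id TEXT, run_id TEXT, logical_date TEXT,
                          PRIMARY KEY (dag_id, run_id));
    CREATE TABLE task_instance (id TEXT PRIMARY KEY, dag_id TEXT, run_id TEXT,
                                task_id TEXT, hostname TEXT);
    INSERT INTO dag VALUES ('example_dag', 'data-team');
    INSERT INTO dag_run VALUES ('example_dag', 'manual__1', '2026-02-27T00:00:00+00:00');
    INSERT INTO task_instance VALUES ('ti-1', 'example_dag', 'manual__1', 'extract', 'worker-1');
    """
)

# One query fetches the TI together with logical_date (from the run) and the
# owner (from the DAG), instead of separate lookups per log entry.
QUERY = """
SELECT ti.task_id, dr.logical_date, d.owners, ti.hostname
FROM task_instance AS ti
JOIN dag_run AS dr ON dr.dag_id = ti.dag_id AND dr.run_id = ti.run_id
JOIN dag AS d ON d.dag_id = ti.dag_id
WHERE ti.id = ?
"""


def fetch_log_context(conn, ti_id):
    task_id, logical_date, owner, hostname = conn.execute(QUERY, (ti_id,)).fetchone()
    return {
        "task_id": task_id,
        "logical_date": logical_date,
        "owner": owner,
        # extra carries only host_name, serialized as a JSON string.
        "extra": json.dumps({"host_name": hostname}),
    }


print(fetch_log_context(conn, "ti-1"))
```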

Screenshot from 2026-02-27 00-34-48 Screenshot from 2026-02-27 00-37-19
Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: [Antigravity] following the guidelines

The tool was used to analyze the existing test cases to understand the expected behavior of both ti_run and ti_update_state, and to generate new test cases.


  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

@boring-cyborg boring-cyborg bot added area:API area:task-sdk labels Feb 15, 2026
Contributor

@amoghrajesh amoghrajesh left a comment

Thanks for taking this on @sjyangkevin, some comments

Contributor Author

@sjyangkevin sjyangkevin left a comment

Hi @amoghrajesh, thanks for the feedback!

I wonder whether I should create dedicated tests for those situations (e.g., duplicate or conflicting requests) instead of folding them into the existing tests. Those seem like good cases to catch.

I will parameterize the payload, and probably the task instance, to reduce duplication in the state-change test case.

I will also look into how to handle the e2e test.

Thanks!

@sjyangkevin sjyangkevin force-pushed the issues/58381/success-running-audit-logs-missing branch from eec8eee to 1d181d4 February 19, 2026 04:38
@sjyangkevin
Contributor Author

sjyangkevin commented Feb 19, 2026

I've cleaned up the test cases and merged them into a single parameterized one. We don't actually need to set up time_machine, so the test structure can be unified by parameterizing the payload and state.

I'm still trying to understand the end-to-end test failure, as I am not very familiar with the context. At a high level, it seems the test expects the User field to be non-None, but I will verify that. I'd appreciate any insights from someone familiar with this.

Thanks!

@sjyangkevin sjyangkevin force-pushed the issues/58381/success-running-audit-logs-missing branch 3 times, most recently from 02bf1a9 to 2fc81f4 February 25, 2026 17:50
@sjyangkevin sjyangkevin force-pushed the issues/58381/success-running-audit-logs-missing branch 2 times, most recently from 7c74399 to 02f06df February 25, 2026 23:30
Contributor Author

I updated the query to join DagModel and DagRun to fetch logical_date and owner, and updated with_for_update so that only the TI is locked in the join query. extra is now a JSON string containing only host_name, which earlier versions of Airflow also recorded.

assert ti.state == expected_state
assert ti.end_date == end_date

@pytest.mark.parametrize(
Contributor Author

Consolidated all the test cases into a single parameterized one.

Comment on lines +1189 to +1216
mock.Mock(
one=lambda: (
"running",
1,
0,
"dag",
"task",
"run",
-1,
"localhost",
timezone.utcnow(),
"test_owner",
)
), # First call returns "running"
mock.Mock(
one=lambda: (
"running",
1,
0,
"dag",
"task",
"run",
-1,
"localhost",
timezone.utcnow(),
"test_owner",
)
), # Second call returns "running"
Contributor Author

Because the TI query now returns more values, I updated the mock accordingly.
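A minimal, standard-library illustration of why the mock had to change: each session.execute(...) call consumes the next item of side_effect, and the code under test unpacks whatever .one() returns, so every mocked row must carry as many values as the new query selects. The tuple shape is copied from the diff; the names used to unpack it (try_number, max_tries, and so on) are guesses for illustration only.

```python
from datetime import datetime, timezone
from unittest import mock


def make_result(state):
    # Same 10-value shape as the mocked rows in the diff.
    row = (state, 1, 0, "dag", "task", "run", -1, "localhost", datetime.now(timezone.utc), "test_owner")
    return mock.Mock(one=lambda: row)


session = mock.Mock()
# Each call to session.execute() returns the next item in this list.
session.execute.side_effect = [make_result("running"), make_result("running")]

# Hypothetical unpacking: a 10-name target raises ValueError if a mocked row
# still has the old, shorter shape.
(state, try_number, max_tries, dag_id, task_id,
 run_id, map_index, hostname, logical_date, owner) = session.execute("first query").one()
second_state = session.execute("second query").one()[0]

print(state, second_state, owner)  # running running test_owner
```

A third session.execute call would raise StopIteration, because the side_effect list is exhausted.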

@sjyangkevin sjyangkevin force-pushed the issues/58381/success-running-audit-logs-missing branch 2 times, most recently from e453054 to a77b2d1 February 27, 2026 05:29
@sjyangkevin sjyangkevin force-pushed the issues/58381/success-running-audit-logs-missing branch from a77b2d1 to 9d89ebe February 27, 2026 15:48
@eladkal eladkal added the backport-to-v3-1-test label Feb 27, 2026
@eladkal eladkal added this to the Airflow 3.1.8 milestone Feb 27, 2026
@potiuk
Member

potiuk commented Mar 8, 2026

LGTM, I would need more maintainers to confirm though

Development

Successfully merging this pull request may close these issues.

Task-level audit logs missing SUCCESS/RUNNING events in Airflow 3.1.x (only FAILED and state mismatch recorded)