Skip to content

feat: implement group().on_done() barrier for multi-parent dependencies#1

Merged
jeremi merged 2 commits into19.0from
feat/group-on-done-barrier
Feb 18, 2026
Merged

feat: implement group().on_done() barrier for multi-parent dependencies#1
jeremi merged 2 commits into19.0from
feat/group-on-done-barrier

Conversation

@jeremi
Copy link
Member

@jeremi jeremi commented Feb 18, 2026

Add dependency_job_ids (Json) and pending_dependency_count (Integer) fields to queue.job to support wait-for-all barriers. Group callbacks now start in waiting state and transition to pending only after every group member completes. Failure of any member cascades to the callback.

The worker path uses atomic SQL (UPDATE SET count = count - 1) for concurrency safety. The ORM path (_release_dependents/_fail_dependents) serves run_now() and button actions.

Add dependency_job_ids (Json) and pending_dependency_count (Integer)
fields to queue.job to support wait-for-all barriers. Group callbacks
now start in waiting state and transition to pending only after every
group member completes. Failure of any member cascades to the callback.

The worker path uses atomic SQL (UPDATE SET count = count - 1) for
concurrency safety. The ORM path (_release_dependents/_fail_dependents)
serves run_now() and button actions.
@gemini-code-assist
Copy link

Summary of Changes

Hello @jeremi, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the job queuing system by introducing robust support for multi-parent job dependencies, specifically for group().on_done() callbacks. It enables a "wait-for-all" barrier mechanism, ensuring that a callback job only proceeds after all its preceding group members have successfully completed. This change improves the reliability and flexibility of complex job orchestrations by correctly managing dependencies and propagating failures across job graphs.

Highlights

  • Wait-for-all Barrier for group().on_done(): Implemented group().on_done() to act as a wait-for-all barrier, ensuring callback jobs only proceed after all group members have completed.
  • New Job Dependency Fields: Introduced dependency_job_ids (JSON list of parent job IDs) and pending_dependency_count (integer) fields to the queue.job model to manage multi-parent dependencies.
  • State Management for Callbacks: Configured group().on_done() callback jobs to start in a waiting state and transition to pending only when their pending_dependency_count reaches zero.
  • Failure Cascading: Added logic to automatically cascade failure to the group().on_done() callback if any of its member jobs fail permanently.
  • Concurrency Safety: Utilized atomic SQL updates for pending_dependency_count in the worker path to ensure concurrency safety when releasing multi-parent dependents.
  • ORM Integration: Provided ORM-based methods (_release_dependents, _fail_dependents) to handle multi-parent dependencies for run_now() and manual button actions.
Changelog
  • MIGRATING_FROM_QUEUE_JOB.md
    • Updated documentation to reflect the new wait-for-all barrier behavior of group().on_done().
    • Clarified limitations of chain() when used with group().
  • docs/job-graphs.md
    • Added a new section detailing group callbacks, including a code example and a Mermaid diagram.
    • Explained the use of dependency_job_ids and pending_dependency_count for the barrier.
    • Updated the "How It Works" section to mention group barriers.
  • docs/migration.md
    • Updated migration notes to describe the new group().on_done() wait-for-all barrier.
    • Reiterated chain() limitations.
  • job_worker/cli/worker.py
    • Modified job timeout handling to include graph_uuid in queries.
    • Added logic to cascade failures to multi-parent dependents using dependency_job_ids.
    • Updated finalize_job to decrement pending_dependency_count and transition jobs to pending using atomic SQL.
    • Integrated logic into handle_exception to cascade failures to multi-parent dependents.
  • job_worker/delay.py
    • Extended Delayable class with a _dependency_job_ids attribute.
    • Modified delay method to pass dependency_job_ids to the enqueue function.
    • Updated DelayableGroup.delay to collect member job IDs and set _dependency_job_ids for on_done callbacks.
  • job_worker/models/queue_job.py
    • Added dependency_job_ids (JSON) and pending_dependency_count (Integer) fields to the queue.job model.
    • Created a GIN index for the dependency_job_ids field.
    • Implemented _release_dependents and _fail_dependents methods to manage multi-parent dependencies via ORM.
    • Integrated _release_dependents into button_set_to_done and run_now.
    • Integrated _fail_dependents into button_set_to_failed and run_now.
    • Modified enqueue to accept dependency_job_ids, initialize pending_dependency_count, and set the job state to waiting if dependencies are present.
  • job_worker/tests/test_delayable_semantics.py
    • Renamed and updated test_group_on_done_does_not_set_parent_id to test_group_on_done_creates_barrier.
    • Added assertions for dependency_job_ids and pending_dependency_count in test_group_on_done_creates_barrier.
    • Added test_group_on_done_barrier_released_when_all_done to verify callback transition after all members complete.
    • Added test_group_on_done_cascade_failure to verify failure propagation from group members to the callback.
    • Added test_group_on_done_button_set_to_done to verify correct handling of manual job completion for multi-parent dependents.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant and valuable feature: a wait-for-all barrier for group().on_done() callbacks. The implementation correctly adds the necessary fields to the queue.job model and leverages atomic SQL updates in the hot path of the worker for good performance. The documentation and tests are also well-updated.

My review focuses on a few key areas for improvement:

  1. Performance: The ORM-based paths for releasing and failing dependent jobs (_release_dependents, _fail_dependents) are inefficient and can lead to N+1 query problems. I've suggested replacing them with raw SQL queries for better performance, mirroring the efficient approach already used in the worker.
  2. Code Duplication: There's an opportunity to reduce code duplication in the worker's exception handling by reusing the new _fail_dependents method.

Overall, this is a solid implementation of a complex feature. Addressing these points will make it more robust and performant.

Comment on lines +530 to +546
def _release_dependents(self):
"""Release multi-parent dependents (group barriers) when this job completes."""
for job in self:
if not job.graph_uuid:
continue
waiting_deps = self.search(
[
("graph_uuid", "=", job.graph_uuid),
("state", "=", "waiting"),
("dependency_job_ids", "!=", False),
]
)
for dep in waiting_deps:
if job.id in (dep.dependency_job_ids or []):
dep.pending_dependency_count -= 1
if dep.pending_dependency_count <= 0:
dep.state = "pending"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation of _release_dependents is inefficient. It fetches all potential dependent jobs within the same graph and then filters them in Python, leading to an N+1 update problem as each dependent job's pending_dependency_count is decremented individually. This can cause significant performance issues for groups with many members.

A much more performant approach would be to use a single, atomic SQL UPDATE query, similar to the logic in the worker's finalize_job function. This avoids fetching unnecessary data and looping in Python.

    def _release_dependents(self):
        """Release multi-parent dependents (group barriers) when this job completes."""
        for job in self:
            if not job.graph_uuid:
                continue
            # Use a single atomic SQL UPDATE for performance, mirroring the worker path.
            # This avoids fetching all dependents and looping in Python (N+1 problem).
            self.env.cr.execute(
                """
                UPDATE queue_job
                SET pending_dependency_count = pending_dependency_count - 1,
                    state = CASE
                        WHEN pending_dependency_count - 1 <= 0
                            THEN 'pending'
                        ELSE state
                    END,
                    write_date = NOW()
                WHERE graph_uuid = %s
                  AND state = 'waiting'
                  AND dependency_job_ids IS NOT NULL
                  AND dependency_job_ids @> (%s)::jsonb
                """,
                (job.graph_uuid, json.dumps([job.id])),
            )

Comment on lines +548 to +563
def _fail_dependents(self):
"""Cascade failure to multi-parent dependents."""
for job in self:
if not job.graph_uuid:
continue
waiting_deps = self.search(
[
("graph_uuid", "=", job.graph_uuid),
("state", "=", "waiting"),
("dependency_job_ids", "!=", False),
]
)
for dep in waiting_deps:
if job.id in (dep.dependency_job_ids or []):
dep.state = "failed"
dep.exc_info = f"Parent job {job.id} failed"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Similar to _release_dependents, this method is inefficient. It fetches all potential dependents and filters in Python, then updates them one by one, causing an N+1 update problem. This can be replaced by a single UPDATE query for better performance.

    def _fail_dependents(self):
        """Cascade failure to multi-parent dependents."""
        for job in self:
            if not job.graph_uuid:
                continue
            # Use a single atomic SQL UPDATE for performance, mirroring the worker path.
            self.env.cr.execute(
                """
                UPDATE queue_job
                SET state = 'failed',
                    exc_info = %s,
                    write_date = NOW()
                WHERE graph_uuid = %s
                  AND state = 'waiting'
                  AND dependency_job_ids IS NOT NULL
                  AND dependency_job_ids @> (%s)::jsonb
                """,
                (
                    f"Parent job {job.id} failed",
                    job.graph_uuid,
                    json.dumps([job.id]),
                ),
            )

Comment on lines +734 to +746
# Cascade failure to multi-parent dependents (group barriers)
if job.graph_uuid:
dep_jobs = job.env["queue.job"].search(
[
("graph_uuid", "=", job.graph_uuid),
("state", "=", "waiting"),
("dependency_job_ids", "!=", False),
]
)
for dep in dep_jobs:
if job.id in (dep.dependency_job_ids or []):
dep.state = "failed"
dep.exc_info = f"Parent job {job.id} failed"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This block of code for cascading failures to multi-parent dependents duplicates the logic from the newly added _fail_dependents method in the queue.job model. To improve maintainability and avoid code duplication, you should call job._fail_dependents() instead. This also has the benefit that if _fail_dependents is optimized (as suggested in another comment), this part of the code will automatically benefit from it.

            # Cascade failure to multi-parent dependents (group barriers)
            job._fail_dependents()

The __getattr__ guard was rejecting all method names starting with _,
but Odoo convention uses single-underscore prefixed methods for private
methods that are commonly delayed (e.g., _import_data, _validate).

Relax the check to only reject dunder methods (__) which are Python
internal attributes that should not be intercepted.
@jeremi jeremi merged commit d74da61 into 19.0 Feb 18, 2026
3 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant