
Conversation

@jupierce
Contributor

@jupierce jupierce commented Dec 9, 2025

Enable a new pipeline to run build-sync in multi-model mode
The multi-arch teams want to be able to validate nightlies
which are EC/RC candidates BEFORE they are promoted. These
candidates are selected based on amd64 nightlies.

However, the release controllers are independent and create payloads
in an unsynchronized way, so there is no multi-arch payload that
completely matches the amd64 payload.

This approach provides a compromise: a manifest-listed multi-arch
payload can be formulated to include the exact same source
commits / digests as the amd64 nightly. This is achieved by
providing gen-payload a "model" on which to build the
multi-arch payload.

To keep the multi-arch payloads in sync with amd64, we will
need to run this periodically with --multi-model=amd64:0 which
will construct a multi nightly based on the most recent
amd64 nightly. If a multi nightly already exists with the
same name as the amd64 nightly, the task will exit early. In
other words, you should be able to run the task frequently
without creating too many multi payloads.
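To make the `--multi-model=amd64:0` spec concrete, here is a hypothetical sketch of how an `arch:offset` value might be resolved; the names `parse_multi_model_spec` and the exact semantics are illustrative assumptions, not the actual `_resolve_multi_model_spec` helper from this PR.

```python
def parse_multi_model_spec(spec: str):
    """Split a spec like 'amd64:0' into (arch, offset).

    Assumed semantics: offset 0 means "most recent accepted nightly" for
    that arch; a spec without ':' is treated as a literal nightly name.
    """
    if ':' not in spec:
        return None, spec  # literal nightly name, e.g. 4.21.0-0.nightly-...
    arch, _, offset_str = spec.partition(':')
    offset = int(offset_str)  # raises ValueError on non-numeric input
    if offset < 0:
        raise ValueError(f"Offset must be non-negative: {spec}")
    return arch, offset


print(parse_multi_model_spec("amd64:0"))  # ('amd64', 0)
```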

Once this new method is running regularly, the normal
build-sync can stop building and applying multi payload
changes.

@openshift-ci
Contributor

openshift-ci bot commented Dec 9, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign fbladilo for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Contributor

coderabbitai bot commented Dec 9, 2025

Walkthrough

Adds multi-model payload generation: new async release_inspector utilities; refactors release_gen_assembly to use them; extends release_gen_payload with a --multi-model workflow and helpers; introduces BuildSyncMultiPipeline in pyartcd to orchestrate multi-model sync, notifications, and CLI integration.

Changes

Cohort / File(s) Summary
Release Introspection Module
doozer/doozerlib/release_inspector.py
New async utilities: fetch_release_pullspec_from_stream_api, introspect_release, extract_nvr_from_pullspec, get_build_inspector_from_nvr, get_build_inspector_from_pullspec. Provides centralized release/pullspec/image label parsing and Konflux/Brew inspector resolution with consistent error handling.
Assembly Generation Refactoring
doozer/doozerlib/cli/release_gen_assembly.py
Replaced ad‑hoc per‑tag parsing with release_inspector calls; conditional Konflux DB binding only when konflux_db exists; uses introspect_release, extract_nvr_from_pullspec, and get_build_inspector_from_nvr for payload processing.
Payload Generation — Multi‑Model
doozer/doozerlib/cli/release_gen_payload.py
Adds --multi-model option and multi_model parameter; new helpers _resolve_multi_model_spec, _query_release_controller_for_nightly, _create_assembly_from_model, _apply_multi_model_assembly, _check_multi_payload_exists; flow changes to build/apply assemblies, skip single‑arch imagestream updates when multi_model active, derive deterministic timestamps, and add validation/error handling; new imports include aiohttp, RHCOS_RELEASES_STREAM_URL, KonfluxBuildRecord, GenAssemblyCli.
Doozer Utility Additions
doozer/doozerlib/util.py
Adds rc_api_url(tag, arch, private_nightly) and async check_nightly_exists(nightly_name, private_nightly=False) with HTTP/token handling and JSON parsing for release-controller interactions.
Get Nightlies Cleanup
doozer/doozerlib/cli/get_nightlies.py
Removed local rc_api_url implementation and now imports isolate_nightly_name_components and rc_api_url from doozerlib.util.
PyArtCD Multi‑Model Pipeline
pyartcd/pyartcd/pipelines/build_sync_multi.py
New BuildSyncMultiPipeline with async factory, run(), PR commenting, Slack/Redis failure tracking, _generate_multi_model_imagestream() (invokes doozer gen-payload --multi-model), success/failure handlers, and build_sync_multi CLI command. Adds GEN_PAYLOAD_ARTIFACTS_OUT_DIR.
PyArtCD CLI Entry
pyartcd/pyartcd/__main__.py
Imports new pipeline entry point build_sync_multi.
PyArtCD Locks
pyartcd/pyartcd/locks.py
Adds BUILD_SYNC_MULTI lock enum member and corresponding LOCK_POLICY entry (retry_count=36000, retry_delay_min=0.1, lock_timeout=DEFAULT_LOCK_TIMEOUT).
Elliott Utility Update
elliott/elliottlib/cli/tarball_sources_cli.py
Replaced pipes.quote with shlex.quote for tarball path quoting.

Sequence Diagram

sequenceDiagram
    participant CLI as build_sync_multi CLI
    participant Pipeline as BuildSyncMultiPipeline
    participant Konflux as Konflux API
    participant Doozer as Doozer gen-payload
    participant ReleaseAPI as Release Stream API
    participant GitHub as GitHub API
    participant Slack as Slack API

    CLI->>Pipeline: run()
    activate Pipeline

    Pipeline->>Konflux: query supported arches
    Konflux-->>Pipeline: arch list

    alt multi_model specified
        Pipeline->>Doozer: _generate_multi_model_imagestream() (exec gen-payload --multi-model)
        activate Doozer

        Doozer->>ReleaseAPI: resolve nightly / fetch pullspec
        ReleaseAPI-->>Doozer: nightly name / pullspec

        Doozer->>Doozer: release_inspector.introspect_release(pullspec)
        Doozer->>Doozer: release_inspector.extract_nvr_from_pullspec(...)
        Doozer->>Pipeline: assembly/artifacts applied
        deactivate Doozer
    end

    alt success
        Pipeline->>GitHub: comment_on_assembly_pr()
        GitHub-->>Pipeline: ok
        Pipeline->>Slack: post success
        Slack-->>Pipeline: ok
        Pipeline->>Pipeline: handle_success()
    else failure
        Pipeline->>GitHub: comment_on_assembly_pr()
        GitHub-->>Pipeline: ok
        Pipeline->>Slack: notify_failures()
        Slack-->>Pipeline: ok
        Pipeline->>Pipeline: handle_failure()
    end

    deactivate Pipeline

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (13)
doozer/doozerlib/release_inspector.py (3)

69-96: Use explicit Optional type annotation.

The arch parameter uses implicit Optional which is prohibited by PEP 484. Also, consider handling the edge case where arch is an empty string.

-async def extract_nvr_from_pullspec(pullspec: str, arch: str = None) -> Tuple[str, str, str]:
+async def extract_nvr_from_pullspec(pullspec: str, arch: Optional[str] = None) -> Tuple[str, str, str]:

And update the import at line 7:

-from typing import Dict, Optional, Tuple
+from typing import Optional, Tuple

Note: Dict import appears unused and can be removed.


115-128: Use explicit Optional type annotation.

Same issue as extract_nvr_from_pullspec - the arch parameter should use explicit Optional[str].

-async def get_build_inspector_from_pullspec(
-    runtime: Runtime, pullspec: str, arch: str = None
-) -> BuildRecordInspector:
+async def get_build_inspector_from_pullspec(
+    runtime: Runtime, pullspec: str, arch: Optional[str] = None
+) -> BuildRecordInspector:

19-44: Hardcoded API URL could be extracted to a constant.

The release stream API URL is hardcoded. Consider extracting it to artcommonlib.constants for maintainability and consistency with other API URLs in the codebase.

pyartcd/pyartcd/pipelines/build_sync_multi.py (3)

40-72: Unused is_base_name attribute.

The is_base_name attribute is defined at line 72 but doesn't appear to be used anywhere in this file.

If this is intended for future use, consider adding a TODO comment. Otherwise, it can be removed to reduce dead code.


434-436: Use logging.exception for better stack trace visibility.

When logging an exception, logging.exception automatically includes the stack trace, which aids debugging.

     except RedisError as e:
-        runtime.logger.error('Encountered error when updating the fail counter: %s', e)
+        runtime.logger.exception('Encountered error when updating the fail counter: %s', e)
         raise
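To illustrate the difference the reviewer is pointing at, a minimal self-contained example (not project code) showing that `Logger.exception` captures the traceback while `Logger.error` records only the message:

```python
import io
import logging

# Route log output to a string so we can inspect it.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
logger = logging.getLogger("demo")
logger.addHandler(handler)
logger.setLevel(logging.ERROR)

try:
    1 / 0
except ZeroDivisionError as e:
    # logging.error records only the formatted message...
    logger.error("error only: %s", e)
    # ...while logging.exception appends the full traceback.
    logger.exception("with traceback: %s", e)

out = stream.getvalue()
assert "Traceback (most recent call last)" in out
assert "ZeroDivisionError" in out
```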

196-196: Redundant jenkins.init_jenkins() call.

jenkins.init_jenkins() is called here and also at line 382 in the CLI command. The duplicate call is harmless but unnecessary.

Consider removing this call since it's already invoked before pipeline.run() is called.

doozer/doozerlib/cli/release_gen_payload.py (7)

410-410: Use explicit Optional type hint.

PEP 484 prohibits implicit Optional. Change to explicit type annotation.

-        multi_model: str = None,
+        multi_model: Optional[str] = None,

479-482: Consider using sys.exit(0) for consistency.

Using exit(0) works but sys.exit(0) is preferred in production code for clarity and consistency with other exit points in this file (e.g., lines 517, 522).

             if await self._check_multi_payload_exists(resolved_nightly):
                 self.logger.info(f"Multi-arch payload for {resolved_nightly} already exists. Exiting successfully.")
-                exit(0)
+                sys.exit(0)

671-671: Use next(iter()) for single element extraction.

More idiomatic and efficient than creating a list first.

-        multi_model_assembly_name = list(assembly_def['releases'].keys())[0]
+        multi_model_assembly_name = next(iter(assembly_def['releases'].keys()))

718-724: Use exception chaining for better debugging.

When re-raising exceptions, use from e to preserve the original traceback.

             try:
                 offset = int(offset_str)
                 if offset < 0:
                     raise ValueError("Offset must be non-negative")
             except ValueError as e:
-                raise DoozerFatalError(f"Invalid offset '{offset_str}'. Must be a non-negative integer: {e}")
+                raise DoozerFatalError(f"Invalid offset '{offset_str}'. Must be a non-negative integer: {e}") from e
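A standalone demonstration of what `from e` buys you (using `RuntimeError` here in place of `DoozerFatalError`): the original error is preserved on `__cause__`, so the traceback reads "The above exception was the direct cause of ..." rather than the misleading "During handling of the above exception, another exception occurred".

```python
def parse_offset(offset_str: str) -> int:
    try:
        return int(offset_str)
    except ValueError as e:
        # "from e" stores the original error on __cause__,
        # preserving the full failure chain in tracebacks.
        raise RuntimeError(f"Invalid offset '{offset_str}'") from e


try:
    parse_offset("abc")
except RuntimeError as err:
    assert isinstance(err.__cause__, ValueError)
```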

743-772: Consider adding request timeouts.

The aiohttp requests don't specify timeouts, which could cause the process to hang indefinitely if the release controller is unresponsive. Consider adding a ClientTimeout to prevent this.

+        timeout = aiohttp.ClientTimeout(total=60)
         try:
-            async with aiohttp.ClientSession() as session:
+            async with aiohttp.ClientSession(timeout=timeout) as session:
                 if offset == 0:

798-805: Use exception chaining for clearer error provenance.

Add from e to preserve the original exception context.

         except aiohttp.ClientError as e:
-            raise DoozerFatalError(f"HTTP client error querying release controller for {arch}:{offset}: {e}")
+            raise DoozerFatalError(f"HTTP client error querying release controller for {arch}:{offset}: {e}") from e
         except json.JSONDecodeError as e:
-            raise DoozerFatalError(f"Failed to parse JSON response from release controller: {e}")
+            raise DoozerFatalError(f"Failed to parse JSON response from release controller: {e}") from e
         except DoozerFatalError:
             raise
         except Exception as e:
-            raise DoozerFatalError(f"Error querying release controller for {arch}:{offset}: {e}")
+            raise DoozerFatalError(f"Error querying release controller for {arch}:{offset}: {e}") from e

550-556: Consider adding request timeout here as well.

Similar to _query_release_controller_for_nightly, this HTTP request lacks a timeout. For consistency and reliability, add a ClientTimeout.

+        timeout = aiohttp.ClientTimeout(total=60)
         try:
-            async with aiohttp.ClientSession() as session:
+            async with aiohttp.ClientSession(timeout=timeout) as session:
                 async with session.get(all_streams_url) as resp:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 2b76eb6 and 7e65c59.

📒 Files selected for processing (7)
  • doozer/doozerlib/cli/release_gen_assembly.py (4 hunks)
  • doozer/doozerlib/cli/release_gen_payload.py (14 hunks)
  • doozer/doozerlib/release_inspector.py (1 hunks)
  • elliott/elliottlib/cli/tarball_sources_cli.py (2 hunks)
  • pyartcd/pyartcd/__main__.py (1 hunks)
  • pyartcd/pyartcd/locks.py (2 hunks)
  • pyartcd/pyartcd/pipelines/build_sync_multi.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
doozer/doozerlib/release_inspector.py (6)
artcommon/artcommonlib/model.py (1)
  • Model (116-155)
doozer/doozerlib/build_info.py (2)
  • BuildRecordInspector (261-362)
  • KonfluxBuildRecordInspector (664-778)
doozer/doozerlib/util.py (1)
  • isolate_nightly_name_components (276-297)
artcommon/artcommonlib/exectools.py (1)
  • cmd_gather_async (522-594)
doozer/doozerlib/repodata.py (1)
  • nvr (46-47)
artcommon/artcommonlib/konflux/konflux_db.py (1)
  • get_build_record_by_nvr (1002-1020)
pyartcd/pyartcd/__main__.py (1)
pyartcd/pyartcd/pipelines/build_sync_multi.py (1)
  • build_sync_multi (371-436)
doozer/doozerlib/cli/release_gen_assembly.py (1)
doozer/doozerlib/release_inspector.py (3)
  • introspect_release (47-66)
  • extract_nvr_from_pullspec (69-96)
  • get_build_inspector_from_nvr (99-112)
pyartcd/pyartcd/pipelines/build_sync_multi.py (9)
artcommon/artcommonlib/redis.py (1)
  • RedisError (16-17)
artcommon/artcommonlib/util.py (1)
  • split_git_url (134-143)
pyartcd/pyartcd/jenkins.py (1)
  • get_build_url (79-83)
pyartcd/pyartcd/oc.py (1)
  • registry_login (56-70)
pyartcd/pyartcd/util.py (1)
  • branch_arches (211-238)
pyartcd/pyartcd/runtime.py (1)
  • new_slack_client (42-53)
pyartcd/pyartcd/slack.py (1)
  • bind_channel (34-50)
artcommon/artcommonlib/exectools.py (1)
  • cmd_assert_async (637-705)
pyartcd/pyartcd/locks.py (2)
  • lock (184-192)
  • Lock (13-31)
🪛 Ruff (0.14.8)
doozer/doozerlib/release_inspector.py

33-33: Avoid specifying long messages outside the exception class

(TRY003)


44-44: Avoid specifying long messages outside the exception class

(TRY003)


59-59: Avoid specifying long messages outside the exception class

(TRY003)


64-64: Avoid specifying long messages outside the exception class

(TRY003)


69-69: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


84-84: Avoid specifying long messages outside the exception class

(TRY003)


94-94: Avoid specifying long messages outside the exception class

(TRY003)


116-116: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

pyartcd/pyartcd/pipelines/build_sync_multi.py

128-128: Do not catch blind exception: Exception

(BLE001)


435-435: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

doozer/doozerlib/cli/release_gen_payload.py

410-410: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


591-591: Consider moving this statement to an else block

(TRY300)


599-599: Do not catch blind exception: Exception

(BLE001)


671-671: Prefer next(iter(assembly_def['releases'].keys())) over single element slice

Replace with next(iter(assembly_def['releases'].keys()))

(RUF015)


709-709: Avoid specifying long messages outside the exception class

(TRY003)


716-716: Avoid specifying long messages outside the exception class

(TRY003)


722-722: Abstract raise to an inner function

(TRY301)


722-722: Avoid specifying long messages outside the exception class

(TRY003)


724-724: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


724-724: Avoid specifying long messages outside the exception class

(TRY003)


752-754: Abstract raise to an inner function

(TRY301)


752-754: Avoid specifying long messages outside the exception class

(TRY003)


758-758: Abstract raise to an inner function

(TRY301)


758-758: Avoid specifying long messages outside the exception class

(TRY003)


769-771: Abstract raise to an inner function

(TRY301)


769-771: Avoid specifying long messages outside the exception class

(TRY003)


782-782: Abstract raise to an inner function

(TRY301)


782-782: Avoid specifying long messages outside the exception class

(TRY003)


787-787: Abstract raise to an inner function

(TRY301)


787-787: Avoid specifying long messages outside the exception class

(TRY003)


790-792: Abstract raise to an inner function

(TRY301)


790-792: Avoid specifying long messages outside the exception class

(TRY003)


799-799: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


799-799: Avoid specifying long messages outside the exception class

(TRY003)


801-801: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


801-801: Avoid specifying long messages outside the exception class

(TRY003)


804-804: Do not catch blind exception: Exception

(BLE001)


805-805: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


805-805: Avoid specifying long messages outside the exception class

(TRY003)


817-817: Avoid specifying long messages outside the exception class

(TRY003)


819-819: Avoid specifying long messages outside the exception class

(TRY003)


1849-1849: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: tests
🔇 Additional comments (21)
elliott/elliottlib/cli/tarball_sources_cli.py (1)

3-3: Switch to shlex.quote is correct; just confirm minimum Python version

Using shlex.quote for the rsync path is a solid improvement over pipes.quote and keeps the constructed command safe for shell execution. This looks good as long as the runtime is guaranteed to be Python ≥ 3.3 (since shlex.quote is not available in Python 2). Please confirm that Python 2 is no longer a supported target; if it is, this change would break at runtime and you’d need a compatibility shim or alternative for older interpreters.

Also applies to: 170-170
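For reference, a quick demonstration of why `shlex.quote` keeps the constructed rsync command safe (illustrative paths only):

```python
import shlex

# shlex.quote escapes shell metacharacters, so a path containing
# spaces, quotes, or command substitution stays one safe argument.
path = "release 4.x/$(rm -rf ~)/tarball's.tar.gz"
cmd = f"rsync -av {shlex.quote(path)} dest/"
print(cmd)

# Paths made of safe characters pass through unchanged.
assert shlex.quote("safe-name.tar.gz") == "safe-name.tar.gz"
```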

pyartcd/pyartcd/locks.py (1)

25-25: LGTM!

The new BUILD_SYNC_MULTI lock enum and its policy entry follow the established patterns for other build-sync locks, with consistent retry and timeout configurations.

Also applies to: 105-109

pyartcd/pyartcd/__main__.py (1)

16-16: LGTM!

The import follows the established alphabetical ordering and pattern for pipeline modules. The CLI command registration occurs via the @cli.command decorator in the imported module.

doozer/doozerlib/cli/release_gen_assembly.py (3)

274-275: LGTM!

Good defensive programming to guard the KonfluxBuildRecord binding with a konflux_db existence check.


366-367: LGTM on shared utility usage.

Using release_inspector.introspect_release promotes code reuse and consistency across gen-payload and gen-assembly commands.


397-403: Verify Konflux DB availability for all build systems.

get_build_inspector_from_nvr unconditionally queries the Konflux DB, but lines 452-461 show that the code handles both brew and konflux build systems. If runtime.build_system == 'brew', the Konflux DB may not be available or appropriate to query.

doozer/doozerlib/release_inspector.py (3)

1-17: LGTM on module structure and imports.

Clean module organization with appropriate docstring explaining the shared utility purpose.


47-66: LGTM on release introspection.

Correct use of oc adm release info with proper error handling and validation of the returned tags.


99-112: Function only supports Konflux DB builds.

This function assumes runtime.konflux_db exists and only returns KonfluxBuildRecordInspector. If the runtime uses Brew (runtime.build_system == 'brew'), this will fail. Consider either:

  1. Documenting this limitation clearly, or
  2. Adding a check and raising a descriptive error if Konflux DB is unavailable
pyartcd/pyartcd/pipelines/build_sync_multi.py (5)

128-129: Broad exception catch is acceptable here.

The broad except Exception is intentional to prevent PR commenting failures from breaking the pipeline. The warning log provides visibility. This is a reasonable pattern for non-critical side-effect operations.


74-129: LGTM on PR commenting logic.

The method correctly handles edge cases (no PRs found, multiple PRs), supports dry-run mode, and includes proper error handling to prevent failures from breaking the main pipeline.


131-159: LGTM on run method.

The orchestration flow is clear: determine arches, update Jenkins title, conditionally comment on PR, login to registry, and generate imagestream.


317-332: LGTM on notification method.

Clear structure for sending failure reports to Slack channels with threaded messages for the detailed report.


370-436: LGTM on CLI command structure.

The CLI command follows established patterns with proper locking for stream assemblies, telemetry integration, and appropriate error handling for different exception types.

doozer/doozerlib/cli/release_gen_payload.py (7)

148-156: LGTM! Well-documented CLI option.

The --multi-model option has clear help text explaining both supported formats (direct nightly name and arch:offset specification).


524-601: Robust error handling for payload existence check.

The method handles various edge cases (malformed responses, missing fields, network errors) gracefully by returning False and logging warnings, which is appropriate for an existence check that should be resilient to transient failures.


614-621: Verify intentional permanent build_system override.

The code modifies rt.build_system to 'konflux' without restoring the original value. Per the comment at line 658, this is intentional. Ensure this is the expected behavior across all downstream code paths.


813-822: Clear validation for multi-model mode restrictions.

The validation appropriately restricts multi-model to stream assembly only and prevents incompatible options like image filters.


1386-1410: Correct handling of single-arch vs multi-arch in multi-model mode.

The logic properly skips single-arch imagestream updates while still populating multi_specs for the multi-arch payload creation, which aligns with the PR objective of only creating multi-arch payloads in multi-model mode.


1838-1851: Deterministic multi-release naming from source nightly.

The timestamp is correctly extracted from the resolved multi-model nightly name, ensuring the multi-arch payload has a consistent name derived from its source nightly rather than the current time.


2151-2152: Intentional rate-limit bypass for on-demand generation.

The bypass for multi-model mode is appropriate since these payloads are generated on-demand for validation, not as part of regular automated nightly builds.

Comment on lines +285 to +287
# Less than 2 failures, assembly != stream: just break the build
if fail_count < 2 or self.assembly != 'stream':
raise
Contributor


⚠️ Potential issue | 🔴 Critical

Bare raise without an active exception causes RuntimeError.

At line 287, raise is used without an active exception in the context. This will raise RuntimeError: No active exception to re-raise. The intention seems to be to exit early for non-stream assemblies or first failures, but this needs to raise a proper exception.

         # Less than 2 failures, assembly != stream: just break the build
         if fail_count < 2 or self.assembly != 'stream':
-            raise
+            raise RuntimeError(f"Multi-model build-sync failed for {self.version} (assembly: {self.assembly})")

Alternatively, if you want to re-raise the original exception, handle_failure should accept and store the exception context.

🤖 Prompt for AI Agents
In pyartcd/pyartcd/pipelines/build_sync_multi.py around lines 285 to 287,
replace the bare "raise" (which causes RuntimeError when no active exception)
with an explicit exception; for example raise RuntimeError(f"Build failed:
fail_count={fail_count}, assembly={self.assembly}") so the code exits with a
clear error message (alternatively modify handle_failure to accept/store the
caught exception and re-raise that stored exception here if you prefer
preserving original exception context).
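A self-contained reproduction of the failure mode (simplified stand-in for the pipeline method, not the actual code): a bare `raise` re-raises the exception currently being handled, and when no exception is active in the frame, Python itself raises `RuntimeError`.

```python
def handle_failure(fail_count: int, assembly: str) -> None:
    if fail_count < 2 or assembly != 'stream':
        # Bare `raise` only works while an exception is being handled;
        # called from a fresh frame/coroutine, Python errors out instead.
        raise


try:
    handle_failure(1, 'stream')
except RuntimeError as e:
    msg = str(e)

assert "No active exception" in msg
```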

@jupierce jupierce force-pushed the sync_multi_with_amd64 branch from 7e65c59 to 8ad6d6e Compare December 9, 2025 21:59
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
pyartcd/pyartcd/pipelines/build_sync_multi.py (2)

229-249: Always return a dict from get_unpermissable_assembly_issues and handle empty YAML safely

As written:

  • If yaml.safe_load returns None, if 'assembly_issues' in data: will raise TypeError.
  • If 'assembly_issues' is absent, the function implicitly returns None, but handle_failure expects a dict and immediately does unpermissable_issues['assembly_issues'], which will then fail.

Recommend treating a missing/empty report as “no issues” and always returning the initialized structure:

-        # Open and load the YAML file
-        with open(file_path, 'r') as file:
-            data = yaml.safe_load(file)
-
-        filtered_issues = {'assembly_issues': {}}
-
-        if 'assembly_issues' in data:
+        # Open and load the YAML file
+        with open(file_path, 'r') as file:
+            data = yaml.safe_load(file) or {}
+
+        filtered_issues = {'assembly_issues': {}}
+
+        if 'assembly_issues' in data:
             for component, issues in data['assembly_issues'].items():
                 filtered_data = [issue for issue in issues if not issue.get('permitted', False)]
                 if filtered_data:  # Add component only if there are valid issues
                     filtered_issues['assembly_issues'][component] = filtered_data
-
-            return filtered_issues
+        return filtered_issues

This way handle_failure can safely assume it always gets a dict with an assembly_issues key.
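A minimal reproduction of the `None` hazard (assumes PyYAML is available): `yaml.safe_load` on an empty document returns `None`, not `{}`, so a membership test raises `TypeError` unless guarded.

```python
import yaml

# safe_load of an empty document yields None, not an empty dict.
assert yaml.safe_load("") is None

# Membership tests on None raise TypeError...
try:
    'assembly_issues' in yaml.safe_load("")
    raise AssertionError("expected TypeError")
except TypeError:
    pass

# ...while the `or {}` guard makes the lookup safe.
data = yaml.safe_load("") or {}
assert 'assembly_issues' not in data
```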


252-308: Replace bare raise in handle_failure with an explicit exception

handle_failure is called from another coroutine (after the original exception has already been handled), so the bare raise at Line [283] will trigger RuntimeError: No active exception to reraise instead of propagating the original error, and it runs before any of the notification logic below for repeat stream failures.

Recommend raising an explicit exception, including version/assembly/fail_count for context:

-        # Less than 2 failures, assembly != stream: just break the build
-        if fail_count < 2 or self.assembly != 'stream':
-            raise
+        # Less than 2 failures, or non-stream assemblies: just break the build without extra notifications
+        if fail_count < 2 or self.assembly != 'stream':
+            raise RuntimeError(
+                f"Multi-model build-sync failed for {self.version} "
+                f"(assembly={self.assembly}, failures={fail_count})"
+            )

This preserves the intended early-exit behavior without relying on exception context that doesn’t exist in this coroutine.

🧹 Nitpick comments (5)
pyartcd/pyartcd/pipelines/build_sync_multi.py (1)

124-129: Exception handling: keep broad catch for PR comments, but log Redis errors with stack traces

Two small exception-handling nits:

  • Line [128]: catching Exception around GitHub commenting is reasonable to avoid failing the whole job on notification issues; current warning log is fine.
  • Line [431]: for RedisError, using logger.exception would preserve the stack trace and aid debugging without changing control flow.

A minimal tweak for the Redis path:

-    except RedisError as e:
-        runtime.logger.error('Encountered error when updating the fail counter: %s', e)
-        raise
+    except RedisError:
+        runtime.logger.exception('Encountered error when updating the fail counter')
+        raise

[scope is minor and optional.]

Also applies to: 430-431

doozer/doozerlib/cli/release_gen_assembly.py (1)

273-276: Confirm get_build_inspector_from_nvr usage when build_system == 'brew'

The refactor to use release_inspector looks good and the guarded konflux_db.bind(...) (Line [274]) is a nice safety improvement. However, _process_release now unconditionally does:

name, version, release_ver = await release_inspector.extract_nvr_from_pullspec(payload_tag_pullspec)
build_nvr = f"{name}-{version}-{release_ver}"
build_inspector = await release_inspector.get_build_inspector_from_nvr(self.runtime, build_nvr)

get_build_inspector_from_nvr always queries runtime.konflux_db and returns a KonfluxBuildRecordInspector. If GenAssemblyCli is still expected to run in build_system == 'brew' mode (where konflux_db may be unset or incomplete), this will either:

  • Raise an AttributeError when runtime.konflux_db is None, or
  • Depend on having every brew build mirrored into Konflux DB.

Please confirm which invariant you rely on:

  • If GenAssembly is now only supported for Konflux workflows, consider guarding _process_release with if self.runtime.build_system == 'konflux' and failing fast (or keeping a brew path).
  • If brew flows must still work, either ensure runtime.konflux_db is always configured for brew groups as well, or branch here to use the existing BrewBuildRecordInspector path when build_system == 'brew'.

Also applies to: 366-404

doozer/doozerlib/cli/release_gen_payload.py (3)

526-604: _check_multi_payload_exists: behavior is safe; consider timeout and structured logging

Functionally this helper is defensive: any error (HTTP status, unexpected JSON shape, client error, parse error) logs and returns False, which only means “go ahead and build a payload again”, matching the intent to avoid false positives.

Two optional improvements:

  • Add an explicit client timeout so a stuck multi release controller doesn’t hang the entire job.
  • Use logger.exception in the broad except Exception block to retain stack traces.

For example:

timeout = aiohttp.ClientTimeout(total=60)
async with aiohttp.ClientSession(timeout=timeout) as session:
    ...

except aiohttp.ClientError:
    self.logger.exception("HTTP client error checking for existing multi payload")
    return False
except json.JSONDecodeError:
    self.logger.exception("Failed to parse JSON response from multi release controller")
    return False
except Exception:
    self.logger.exception("Unexpected error checking for existing multi payload")
    return False

Purely optional; current logic is functionally correct.


605-693: Ensure releases_config.releases[...] remains a Model, not a plain dict, when applying multi-model assembly

_apply_multi_model_assembly currently does:

multi_model_assembly_name = list(assembly_def['releases'].keys())[0]
...
rt.releases_config.releases[multi_model_assembly_name] = assembly_def['releases'][multi_model_assembly_name]
rt.assembly = multi_model_assembly_name
rt.assembly_type = AssemblyTypes.STREAM

If rt.releases_config.releases is a Model (as elsewhere in doozer), assigning a raw dict may result in releases[...] returning a plain dict for this key, while other callers typically expect a Model with attribute access (.assembly.group... etc.). That could subtly break any code that later inspects this assembly via releases_config.

Consider wrapping the inserted value in Model (mirroring how releases.yml is originally loaded), and using a more direct key extraction:

from artcommonlib.model import Model  # if not already imported

multi_model_assembly_name = next(iter(assembly_def['releases'].keys()))
assembly_config = assembly_def['releases'][multi_model_assembly_name]
...
rt.releases_config.releases[multi_model_assembly_name] = Model(dict_to_model=assembly_config)

Please verify how Model is used in your runtime and whether callers of runtime.get_releases_config() expect Model instances here; if so, the wrap above will keep the types consistent.


1848-1862: Multi-model timestamp derivation and rate-limit override are reasonable but assume standard nightly naming

Two behaviors here:

  • get_multi_release_names derives multi_ts from the resolved nightly name (e.g. 4.21.0-0.nightly-YYYY-MM-DD-HHMMSS) by joining the last four `-`-separated fields. This will fail with DoozerFatalError if the nightly name format ever changes to fewer than 6 segments.
  • In apply_multi_imagestream_update, the rate-limiting check for how soon a new nightly may be published is skipped when self.multi_model is true:
if not self.multi_model and current_time < last_nightly_time + next_nightly_delay:
    ...
    return

The second behavior is exactly what you want for on-demand multi-model payloads. For the first, if you expect the release-controller nightly naming convention to remain stable, current logic is fine; otherwise you might consider using a helper (like isolate_nightly_name_components) or parsing from basis.reference_releases to make it more robust.

No functional blockers here.

Also applies to: 2161-2163
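
The timestamp derivation described above can be sketched as follows (a minimal illustration, assuming the standard 6-segment nightly naming; the helper name is hypothetical and not from the PR):

```python
def derive_multi_ts(nightly_name: str) -> str:
    """Join the last four '-'-separated fields of a nightly name.

    e.g. '4.21.0-0.nightly-2026-01-07-123456' -> '2026-01-07-123456'
    """
    parts = nightly_name.split('-')
    if len(parts) < 6:
        # Mirrors the DoozerFatalError behavior noted above
        raise ValueError(f'unexpected nightly name format: {nightly_name}')
    return '-'.join(parts[-4:])

print(derive_multi_ts('4.21.0-0.nightly-2026-01-07-123456'))  # -> 2026-01-07-123456
```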

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 7e65c59 and 8ad6d6e.

📒 Files selected for processing (6)
  • doozer/doozerlib/cli/release_gen_assembly.py (4 hunks)
  • doozer/doozerlib/cli/release_gen_payload.py (14 hunks)
  • doozer/doozerlib/release_inspector.py (1 hunks)
  • pyartcd/pyartcd/__main__.py (1 hunks)
  • pyartcd/pyartcd/locks.py (2 hunks)
  • pyartcd/pyartcd/pipelines/build_sync_multi.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pyartcd/pyartcd/locks.py
🧰 Additional context used
🧬 Code graph analysis (3)
pyartcd/pyartcd/__main__.py (1)
pyartcd/pyartcd/pipelines/build_sync_multi.py (1)
  • build_sync_multi (367-432)
doozer/doozerlib/cli/release_gen_assembly.py (2)
artcommon/artcommonlib/konflux/konflux_db.py (1)
  • bind (340-349)
doozer/doozerlib/release_inspector.py (3)
  • introspect_release (49-66)
  • extract_nvr_from_pullspec (69-93)
  • get_build_inspector_from_nvr (96-109)
pyartcd/pyartcd/pipelines/build_sync_multi.py (9)
artcommon/artcommonlib/redis.py (2)
  • RedisError (16-17)
  • set_value (89-97)
artcommon/artcommonlib/util.py (1)
  • split_git_url (134-143)
pyartcd/pyartcd/jenkins.py (1)
  • get_build_url (79-83)
pyartcd/pyartcd/oc.py (1)
  • registry_login (56-70)
pyartcd/pyartcd/util.py (1)
  • branch_arches (211-238)
pyartcd/pyartcd/slack.py (2)
  • bind_channel (34-50)
  • say (60-97)
doozer/doozerlib/cli/release_gen_payload.py (1)
  • run (454-524)
artcommon/artcommonlib/exectools.py (1)
  • cmd_assert_async (637-705)
pyartcd/pyartcd/locks.py (2)
  • lock (184-192)
  • Lock (13-31)
🪛 Ruff (0.14.8)
doozer/doozerlib/cli/release_gen_payload.py

412-412: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


593-593: Consider moving this statement to an else block

(TRY300)


601-601: Do not catch blind exception: Exception

(BLE001)


673-673: Prefer next(iter(assembly_def['releases'].keys())) over single element slice

Replace with next(iter(assembly_def['releases'].keys()))

(RUF015)


711-713: Avoid specifying long messages outside the exception class

(TRY003)


720-720: Avoid specifying long messages outside the exception class

(TRY003)


726-726: Abstract raise to an inner function

(TRY301)


726-726: Avoid specifying long messages outside the exception class

(TRY003)


728-728: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


728-728: Avoid specifying long messages outside the exception class

(TRY003)


758-760: Abstract raise to an inner function

(TRY301)


758-760: Avoid specifying long messages outside the exception class

(TRY003)


764-764: Abstract raise to an inner function

(TRY301)


764-764: Avoid specifying long messages outside the exception class

(TRY003)


775-777: Abstract raise to an inner function

(TRY301)


775-777: Avoid specifying long messages outside the exception class

(TRY003)


788-790: Abstract raise to an inner function

(TRY301)


788-790: Avoid specifying long messages outside the exception class

(TRY003)


795-795: Abstract raise to an inner function

(TRY301)


795-795: Avoid specifying long messages outside the exception class

(TRY003)


798-800: Abstract raise to an inner function

(TRY301)


798-800: Avoid specifying long messages outside the exception class

(TRY003)


807-807: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


807-807: Avoid specifying long messages outside the exception class

(TRY003)


809-809: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


809-809: Avoid specifying long messages outside the exception class

(TRY003)


812-812: Do not catch blind exception: Exception

(BLE001)


813-813: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


813-813: Avoid specifying long messages outside the exception class

(TRY003)


825-825: Avoid specifying long messages outside the exception class

(TRY003)


827-827: Avoid specifying long messages outside the exception class

(TRY003)


1859-1859: Avoid specifying long messages outside the exception class

(TRY003)

doozer/doozerlib/release_inspector.py

35-35: Avoid specifying long messages outside the exception class

(TRY003)


46-46: Avoid specifying long messages outside the exception class

(TRY003)


59-59: Avoid specifying long messages outside the exception class

(TRY003)


64-64: Avoid specifying long messages outside the exception class

(TRY003)


69-69: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


81-81: Avoid specifying long messages outside the exception class

(TRY003)


91-91: Avoid specifying long messages outside the exception class

(TRY003)


112-112: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

pyartcd/pyartcd/pipelines/build_sync_multi.py

128-128: Do not catch blind exception: Exception

(BLE001)


431-431: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: tests
🔇 Additional comments (6)
pyartcd/pyartcd/__main__.py (1)

4-47: New build_sync_multi import is consistent with existing CLI wiring

Importing build_sync_multi alongside other pipeline entrypoints matches the existing pattern of registering CLI commands via module import side effects; no issues here.

doozer/doozerlib/release_inspector.py (1)

19-47: Shared release introspection utilities look consistent and cohesive

Consolidating oc adm release info, oc image info label parsing, and Konflux DB lookups into release_inspector simplifies callers and keeps error handling in one place. The new helpers (introspect_release, extract_nvr_from_pullspec, and get_build_inspector_from_nvr / _from_pullspec) align with existing exectools/Model patterns and make the Konflux dependency explicit.

No changes needed here from a correctness standpoint.

Also applies to: 49-123

doozer/doozerlib/cli/release_gen_payload.py (4)

229-237: Multi-model initialization path and Konflux enforcement look coherent

The multi-model entry path:

  • Forces a Konflux runtime (runtime.initialize(..., build_system='konflux') when multi_model is set).
  • In GenPayloadCli.run, resolves the spec, optionally short-circuits if a multi payload already exists, generates an assembly via GenAssemblyCli, applies that to runtime, and then falls through into the standard assembly-inspection/payload generation flow.
  • Sets self.emergency_ignore_issues = True in multi-model mode so assembly viability doesn’t block building a modeled payload.

This is consistent with the PR goal of “modeling” an existing tested nightly and using the same payload machinery. No changes needed here.

Also applies to: 470-499


694-735: Multi-model spec resolution and release-controller lookups are well-structured

The _resolve_multi_model_spec / _query_release_controller_for_nightly pair:

  • Cleanly distinguishes “nightly name” vs 'arch:offset' specs.
  • Validates arch against a fixed set (['amd64', 's390x', 'ppc64le', 'arm64']).
  • Validates offset as a non-negative integer and surfaces issues as DoozerFatalError.
  • For offset 0, uses the /latest endpoint; for others, walks /releasestreams/all and checks that the requested offset is in range.

This matches the described semantics for multi-model selection. The exception messages are clear and already wrapped in DoozerFatalError, which is appropriate for CLI failures; nothing here blocks merging.

Also applies to: 736-814


820-853: Multi-model validation logic correctly constrains usage

The extended validate_parameters logic around multi-model:

  • Enforces --multi-model only when assembly == "stream".
  • Rejects use with -i/-x image filters (so the modeled nightly is not partially overridden).
  • Forces apply_multi_arch = True in multi-model mode, independent of --arches.

Combined with the existing guard:

if self.apply_multi_arch and (rt.images or rt.exclude or self.exclude_arch) and not self.multi_model:
    ...

this prevents incompatible parameter combinations while keeping previous behavior intact for non-multi-model runs. Looks good as-is.
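
The constraints above can be condensed into a small sketch (a hypothetical standalone function; the real checks live in validate_parameters and raise DoozerFatalError):

```python
def check_multi_model_params(assembly: str, images, exclude, multi_model: bool) -> bool:
    """Return whether apply_multi_arch should be forced on (sketch only)."""
    if not multi_model:
        return False
    if assembly != 'stream':
        raise ValueError('--multi-model requires assembly == "stream"')
    if images or exclude:
        raise ValueError('--multi-model cannot be combined with -i/-x image filters')
    # In multi-model mode, apply_multi_arch is forced True regardless of --arches
    return True
```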


1393-1421: Skipping single-arch imagestream updates in multi-model mode aligns with the PR intent

In sync_payloads, the new branch:

  • Still mirrors all per-arch payload content.
  • Avoids generating/applying per-arch imagestreams when self.multi_model is set.
  • Populates multi_specs directly from payload_entries_for_arch / private_payload_entries_for_arch so that sync_heterogeneous_payloads has the same input structure as in the normal path.

This cleanly implements “multi-arch payload only, no single-arch IS updates” for multi-model runs and matches the described pipeline behavior.

@jupierce jupierce force-pushed the sync_multi_with_amd64 branch from 8ad6d6e to ec692af Compare January 7, 2026 13:51
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @doozer/doozerlib/util.py:
- Around line 22-23: Remove the dead branch that forces arch to 'multi' after
calling isolate_nightly_name_components: the helper
isolate_nightly_name_components(...) always yields a valid brew arch (defaults
to "x86_64"), so the conditional "if not arch: arch = 'multi'" is unreachable;
delete that check from the code path in check_nightly_exists (and any identical
redundant check nearby) so arch flows through to go_arch_for_brew_arch and
go_suffix_for_arch unchanged.

In @pyartcd/pyartcd/pipelines/build_sync_multi.py:
- Around line 405-415: The code assumes get_build_url() returns a string and
calls .replace, which raises AttributeError when BUILD_URL is missing; update
the block that computes lock_identifier (used in locks.run_with_lock) to handle
None: call get_build_url(), check if it's truthy before doing .replace with
constants.JENKINS_UI_URL, and if falsy use a safe fallback (e.g.,
runtime.run_id, pipeline.id, or a fixed string plus timestamp) so
lock_identifier is always a string and locks.run_with_lock(pipeline.run(), lock,
lock_name, lock_id=lock_identifier) can proceed in non‑Jenkins environments.
- Around line 229-249: get_unpermissable_assembly_issues currently assumes
yaml.safe_load(file) returns a dict and only returns filtered_issues inside the
'assembly_issues' branch, which can yield TypeError or None; fix by treating the
loaded data as data = yaml.safe_load(file) or {} (i.e. default to {} if None),
ensure you always return filtered_issues at the end of the function (remove the
early return inside the if), and only iterate when data.get('assembly_issues')
is truthy so the function always returns a dict like filtered_issues even when
the YAML is empty or lacks the key; reference symbols:
get_unpermissable_assembly_issues, file_path, filtered_issues, and the
'assembly_issues' key.
- Around line 263-303: The bare "raise" in handle_failure (around the fail_count
/ self.assembly check) runs with no active exception and will raise
RuntimeError; replace that bare raise with a simple return to stop further
notifications and allow the caller to propagate the original exception (i.e.,
change the block "if fail_count < 2 or self.assembly != 'stream': raise" to "if
fail_count < 2 or self.assembly != 'stream': return"), keeping the rest of the
logic (fail_count, get_unpermissable_assembly_issues, notification frequency)
unchanged.
🧹 Nitpick comments (3)
doozer/doozerlib/cli/release_gen_payload.py (3)

65-90: check_multi_nightly_exists_for_model implements the intended “skip if multi already exists” behavior

The helper cleanly derives the target multi nightly name by:

  • Reusing isolate_nightly_name_components to get major_minor.
  • Parsing the timestamp from the model nightly suffix and reformatting as major_minor.0-0.nightly-multi-<timestamp>.
  • Delegating to check_nightly_exists for the actual RC query.

One minor thing to consider later: if model nightlies can be private, you may want to thread through the private_nightly flag from the source nightly instead of always checking the public controller.
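
The naming derivation described above can be sketched roughly as (hypothetical helper name; the real code reuses isolate_nightly_name_components and check_nightly_exists):

```python
def multi_nightly_name_for_model(model_nightly: str) -> str:
    """e.g. '4.21.0-0.nightly-2026-01-07-123456'
        -> '4.21.0-0.nightly-multi-2026-01-07-123456'
    """
    major_minor = '.'.join(model_nightly.split('.')[:2])  # '4.21'
    timestamp = '-'.join(model_nightly.split('-')[-4:])   # '2026-01-07-123456'
    return f'{major_minor}.0-0.nightly-multi-{timestamp}'
```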


261-288: Multi-model run flow is coherent; be aware of the intentional runtime mutations

The multi-model path in release_gen_payload and GenPayloadCli.run is internally consistent:

  • You re‑initialize the runtime in Konflux mode when multi_model is set, avoiding canonical‑builders‑from‑upstream logic.
  • GenPayloadCli.run resolves the spec (arch:offset or direct nightly), checks for an existing multi payload via check_multi_nightly_exists_for_model, generates an assembly via _create_assembly_from_model, then applies it to the runtime using _apply_multi_model_assembly.
  • _apply_multi_model_assembly mutates rt.releases_config, rt.assembly, and forces rt.assembly_type = AssemblyTypes.STREAM, which keeps downstream logic (issue permits, privacy modes, multi‑arch handling) aligned with stream semantics, while self.emergency_ignore_issues = True ensures the run exits successfully even with unresolved issues.

The side‑effect of permanently flipping rt.build_system to 'konflux' and rt.assembly to the synthetic multi‑model name is intentional here; just keep in mind that if you ever reuse the same Runtime instance for other operations after GenPayloadCli.run, they will see the mutated state.

Also applies to: 485-527, 558-646


647-767: Release-controller resolution for arch:offset specs is reasonable; watch RC ordering assumptions

_resolve_multi_model_spec/_query_release_controller_for_nightly give a clear contract:

  • Accept either a full nightly name or arch:offset (amd64:0, etc.), with good validation and explicit DoozerFatalError messages on bad inputs.
  • For offset 0, hit the .../releasestream/<stream>/latest endpoint.
  • For higher offsets, query .../releasestreams/all, locate the desired stream by name, and index into its releases list.

Two small considerations:

  • This assumes releases from /releasestreams/all are ordered newest‑first; if RC ever changes that contract, the “Nth” semantics would break. If not already documented, it might be worth a brief comment noting that assumption.
  • The heuristic ':' in spec and not spec.startswith('4.') is fine today, but will misclassify if we ever support other major versions or non‑OCP tags that also start with a digit. A stricter “looks like a nightly name” regex would make this more future‑proof.

These don’t block the current implementation, but are worth keeping in mind.
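
The spec-classification contract described above can be sketched as (illustrative only; the real implementation raises DoozerFatalError and then queries the release controller):

```python
VALID_ARCHES = ('amd64', 's390x', 'ppc64le', 'arm64')

def parse_multi_model_spec(spec: str):
    """Return ('nightly', name) for a full nightly name,
    or ('offset', arch, n) for an 'arch:offset' spec."""
    if ':' in spec and not spec.startswith('4.'):
        arch, _, offset_str = spec.partition(':')
        if arch not in VALID_ARCHES:
            raise ValueError(f'unknown arch: {arch}')
        try:
            offset = int(offset_str)
        except ValueError:
            raise ValueError(f'offset must be an integer: {offset_str}') from None
        if offset < 0:
            raise ValueError('offset must be non-negative')
        return ('offset', arch, offset)
    return ('nightly', spec)
```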

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 8ad6d6e and ec692af.

📒 Files selected for processing (9)
  • doozer/doozerlib/cli/get_nightlies.py
  • doozer/doozerlib/cli/release_gen_assembly.py
  • doozer/doozerlib/cli/release_gen_payload.py
  • doozer/doozerlib/release_inspector.py
  • doozer/doozerlib/util.py
  • elliott/elliottlib/cli/tarball_sources_cli.py
  • pyartcd/pyartcd/__main__.py
  • pyartcd/pyartcd/locks.py
  • pyartcd/pyartcd/pipelines/build_sync_multi.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • doozer/doozerlib/cli/release_gen_assembly.py
  • elliott/elliottlib/cli/tarball_sources_cli.py
  • pyartcd/pyartcd/__main__.py
🧰 Additional context used
🧬 Code graph analysis (4)
doozer/doozerlib/util.py (1)
artcommon/artcommonlib/exectools.py (1)
  • cmd_gather (165-311)
doozer/doozerlib/release_inspector.py (9)
artcommon/artcommonlib/konflux/konflux_build_record.py (1)
  • KonfluxBuildOutcome (22-44)
artcommon/artcommonlib/model.py (1)
  • Model (116-155)
doozer/doozerlib/build_info.py (2)
  • BrewBuildRecordInspector (365-661)
  • BuildRecordInspector (261-362)
doozer/doozerlib/util.py (1)
  • isolate_nightly_name_components (277-298)
ocp-build-data-validator/validator/schema/__init__.py (1)
  • err (21-26)
artcommon/artcommonlib/exectools.py (1)
  • cmd_gather_async (522-594)
doozer/doozerlib/backend/pipelinerun_utils.py (1)
  • labels (296-298)
doozer/doozerlib/repodata.py (1)
  • nvr (46-47)
artcommon/artcommonlib/konflux/konflux_db.py (1)
  • get_build_record_by_nvr (1220-1244)
doozer/doozerlib/cli/get_nightlies.py (1)
doozer/doozerlib/util.py (2)
  • isolate_nightly_name_components (277-298)
  • rc_api_url (851-868)
pyartcd/pyartcd/pipelines/build_sync_multi.py (9)
artcommon/artcommonlib/redis.py (1)
  • RedisError (16-17)
artcommon/artcommonlib/util.py (1)
  • split_git_url (135-144)
pyartcd/pyartcd/jenkins.py (1)
  • get_build_url (80-84)
pyartcd/pyartcd/oc.py (1)
  • registry_login (56-70)
pyartcd/pyartcd/util.py (1)
  • branch_arches (211-238)
pyartcd/pyartcd/slack.py (2)
  • bind_channel (34-50)
  • say (60-97)
doozer/doozerlib/cli/release_gen_payload.py (1)
  • run (486-556)
artcommon/artcommonlib/exectools.py (1)
  • cmd_assert_async (637-705)
pyartcd/pyartcd/locks.py (2)
  • lock (184-192)
  • Lock (13-31)
🪛 Ruff (0.14.10)
doozer/doozerlib/util.py

931-931: Consider moving this statement to an else block

(TRY300)


939-939: Do not catch blind exception: Exception

(BLE001)

doozer/doozerlib/release_inspector.py

35-35: Avoid specifying long messages outside the exception class

(TRY003)


46-46: Avoid specifying long messages outside the exception class

(TRY003)


59-59: Avoid specifying long messages outside the exception class

(TRY003)


64-64: Avoid specifying long messages outside the exception class

(TRY003)


69-69: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


81-81: Avoid specifying long messages outside the exception class

(TRY003)


91-91: Avoid specifying long messages outside the exception class

(TRY003)


96-96: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


108-108: Avoid specifying long messages outside the exception class

(TRY003)


119-119: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

pyartcd/pyartcd/pipelines/build_sync_multi.py

128-128: Do not catch blind exception: Exception

(BLE001)


431-431: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

doozer/doozerlib/cli/release_gen_payload.py

444-444: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


626-626: Prefer next(iter(assembly_def['releases'].keys())) over single element slice

Replace with next(iter(assembly_def['releases'].keys()))

(RUF015)


664-666: Avoid specifying long messages outside the exception class

(TRY003)


673-673: Avoid specifying long messages outside the exception class

(TRY003)


679-679: Abstract raise to an inner function

(TRY301)


679-679: Avoid specifying long messages outside the exception class

(TRY003)


681-681: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


681-681: Avoid specifying long messages outside the exception class

(TRY003)


711-713: Abstract raise to an inner function

(TRY301)


711-713: Avoid specifying long messages outside the exception class

(TRY003)


717-717: Abstract raise to an inner function

(TRY301)


717-717: Avoid specifying long messages outside the exception class

(TRY003)


728-730: Abstract raise to an inner function

(TRY301)


728-730: Avoid specifying long messages outside the exception class

(TRY003)


741-743: Abstract raise to an inner function

(TRY301)


741-743: Avoid specifying long messages outside the exception class

(TRY003)


748-748: Abstract raise to an inner function

(TRY301)


748-748: Avoid specifying long messages outside the exception class

(TRY003)


751-753: Abstract raise to an inner function

(TRY301)


751-753: Avoid specifying long messages outside the exception class

(TRY003)


760-760: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


760-760: Avoid specifying long messages outside the exception class

(TRY003)


762-762: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


762-762: Avoid specifying long messages outside the exception class

(TRY003)


765-765: Do not catch blind exception: Exception

(BLE001)


766-766: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


766-766: Avoid specifying long messages outside the exception class

(TRY003)


778-778: Avoid specifying long messages outside the exception class

(TRY003)


780-780: Avoid specifying long messages outside the exception class

(TRY003)


1812-1812: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: tests
🔇 Additional comments (4)
pyartcd/pyartcd/locks.py (1)

23-27: BUILD_SYNC_MULTI lock is wired consistently with existing build-sync locks

The new BUILD_SYNC_MULTI enum value and its LOCK_POLICY entry mirror the existing build-sync locks (retry count, delay, timeout), so the multi-model pipeline gets its own concurrency guard without changing existing behavior.

Also applies to: 95-109

doozer/doozerlib/cli/get_nightlies.py (1)

3-3: Centralizing rc_api_url and isolate_nightly_name_components usage looks good

Importing rc_api_url and isolate_nightly_name_components from doozerlib.util keeps nightly/RC URL logic in one place and this module’s existing calls (rc_api_url(tag_base, _arch, private_nightly)) remain compatible.

Also applies to: 18-18

doozer/doozerlib/cli/release_gen_payload.py (1)

1316-1374: Multi-model behavior in sync_payloads and get_multi_release_names aligns with the design

  • In sync_payloads, skipping single‑arch imagestream updates when self.multi_model is set and only populating multi_specs for multi‑arch creation avoids touching per‑arch imagestreams in the modeling workflow, which matches the PR intent.
  • get_multi_release_names now derives the multi nightly timestamp from self.resolved_multi_model_nightly when in multi‑model mode. That yields deterministic 4.x.0-0.nightly-multi-<ts> tags that line up with the source nightly and prevents “drifting” timestamps across retries.

This looks correct and should make the multi nightly naming predictable and idempotent relative to the model nightly.

Also applies to: 1795-1815

doozer/doozerlib/release_inspector.py (1)

19-129: Release inspector helpers look correct

The new helpers correctly:

  • Locate a nightly's pullspec from the stream API and fail fast when the tag is missing.
  • Wrap oc adm release info into a reusable introspect_release.
  • Extract N/V/R labels via oc image info and enforce that all three are present.
  • Route Brew vs Konflux builds through get_build_inspector_from_nvr/get_build_inspector_from_pullspec.

Comment on lines +22 to 23
from artcommonlib.arch_util import GO_ARCHES, brew_arch_for_go_arch, go_arch_for_brew_arch, go_suffix_for_arch
from artcommonlib.assembly import AssemblyTypes

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's see the structure and find the mentioned line ranges
ls -la doozer/doozerlib/util.py 2>/dev/null || echo "File may not exist yet"
wc -l doozer/doozerlib/util.py 2>/dev/null || echo "Cannot count lines"

Repository: openshift-eng/art-tools

Length of output: 170


🏁 Script executed:

# Get the imports and main code sections
cat -n doozer/doozerlib/util.py | head -50

Repository: openshift-eng/art-tools

Length of output: 1780


🏁 Script executed:

# Check the check_nightly_exists function around lines 277-299
cat -n doozer/doozerlib/util.py | sed -n '270,310p'

Repository: openshift-eng/art-tools

Length of output: 2128


🏁 Script executed:

# Check lines 851-941
cat -n doozer/doozerlib/util.py | sed -n '845,945p'

Repository: openshift-eng/art-tools

Length of output: 4727


🏁 Script executed:

# Search for isolate_nightly_name_components definition
rg -n "def isolate_nightly_name_components" doozer

Repository: openshift-eng/art-tools

Length of output: 171


🏁 Script executed:

# Check artcommonlib for GO_ARCHES and go_arch_for_brew_arch
find . -name "arch_util.py" -o -name "*.py" | xargs grep -l "GO_ARCHES\|go_arch_for_brew_arch" 2>/dev/null | head -5

Repository: openshift-eng/art-tools

Length of output: 259


🏁 Script executed:

# Search for check_nightly_exists calls
rg -n "check_nightly_exists" doozer -B2 -A2

Repository: openshift-eng/art-tools

Length of output: 1140


🏁 Script executed:

# Read the arch_util.py to understand GO_ARCHES and relevant functions
cat -n artcommon/artcommonlib/arch_util.py | head -150

Repository: openshift-eng/art-tools

Length of output: 2418


🏁 Script executed:

# Search for go_arch_for_brew_arch definition
rg -n "def go_arch_for_brew_arch" artcommon

Repository: openshift-eng/art-tools

Length of output: 154


🏁 Script executed:

# Search for go_suffix_for_arch definition
rg -n "def go_suffix_for_arch" artcommon

Repository: openshift-eng/art-tools

Length of output: 173


🏁 Script executed:

# Check what GO_ARCHES contains
rg -n "GO_ARCHES\s*=" artcommon -A 3

Repository: openshift-eng/art-tools

Length of output: 429


Nightly RC helpers: dead code to remove; multi-arch handling is sound; error semantics confirmed

The new rc_api_url centralization and check_nightly_exists implementation are mostly correct, but one issue needs fixing:

  • Dead code branch (lines 882-884): isolate_nightly_name_components always returns a brew architecture—it defaults to x86_64 when no recognized arch is found (line 294). The check if not arch: arch = 'multi' will never execute and should be removed.

  • Multi-arch handling: Verified that 'multi' is properly supported in both GO_ARCHES and BREW_ARCHES, and the routing through go_arch_for_brew_arch and go_suffix_for_arch correctly handles the multi-arch case with no issues.

  • Error handling semantics: The function treats all failures (HTTP errors, JSON parse errors, auth failures) identically as "nightly does not exist" (returns False). This is fine for the current caller in release_gen_payload.py, but if future code needs to distinguish "RC is down / auth failed" from "no such nightly", consider refactoring to a tri-state or custom exception.
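
If that distinction is ever needed, the tri-state suggested above could look like this sketch (names hypothetical, not from the PR):

```python
from enum import Enum
from typing import Optional

class NightlyCheckResult(Enum):
    EXISTS = 'exists'
    ABSENT = 'absent'
    UNKNOWN = 'unknown'  # RC unreachable / auth failure; caller decides policy

def classify_rc_response(http_status: Optional[int]) -> NightlyCheckResult:
    """Map a release-controller response to the tri-state (sketch only)."""
    if http_status is None:  # connection error / timeout
        return NightlyCheckResult.UNKNOWN
    if http_status == 200:
        return NightlyCheckResult.EXISTS
    if http_status == 404:
        return NightlyCheckResult.ABSENT
    return NightlyCheckResult.UNKNOWN  # 401/403/5xx etc.
```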

🤖 Prompt for AI Agents
In @doozer/doozerlib/util.py around lines 22 - 23, Remove the dead branch that
forces arch to 'multi' after calling isolate_nightly_name_components: the helper
isolate_nightly_name_components(...) always yields a valid brew arch (defaults
to "x86_64"), so the conditional "if not arch: arch = 'multi'" is unreachable;
delete that check from the code path in check_nightly_exists (and any identical
redundant check nearby) so arch flows through to go_arch_for_brew_arch and
go_suffix_for_arch unchanged.

Comment on lines +229 to +249
def get_unpermissable_assembly_issues(self) -> dict[str, dict[any, any]]:
    """
    Filters assembly issues with 'permitted: false' from assembly_report.yml.
    """

    # path to local assembly_report.yml
    file_path = f'{GEN_PAYLOAD_ARTIFACTS_OUT_DIR}/assembly-report.yaml'

    # Open and load the YAML file
    with open(file_path, 'r') as file:
        data = yaml.safe_load(file)

    filtered_issues = {'assembly_issues': {}}

    if 'assembly_issues' in data:
        for component, issues in data['assembly_issues'].items():
            filtered_data = [issue for issue in issues if not issue.get('permitted', False)]
            if filtered_data:  # Add component only if there are valid issues
                filtered_issues['assembly_issues'][component] = filtered_data

        return filtered_issues

⚠️ Potential issue | 🟠 Major

get_unpermissable_assembly_issues can return None or crash on empty/missing data

get_unpermissable_assembly_issues currently:

  • Assumes yaml.safe_load(file) returns a dict; if the file is empty it returns None, so if 'assembly_issues' in data: will raise TypeError.
  • Returns filtered_issues only when 'assembly_issues' exists; otherwise it implicitly returns None.

handle_failure then does unpermissable_issues['assembly_issues'], which will raise when data is None or when the key is absent, masking the original build failure.

Consider:

  • Treating an empty/None YAML as {}.
  • Always returning the filtered_issues dict (even when there are no assembly_issues).

For example:

Suggested fix
     file_path = f'{GEN_PAYLOAD_ARTIFACTS_OUT_DIR}/assembly-report.yaml'
 
-    # Open and load the YAML file
-    with open(file_path, 'r') as file:
-        data = yaml.safe_load(file)
-
-    filtered_issues = {'assembly_issues': {}}
-
-    if 'assembly_issues' in data:
+    filtered_issues = {'assembly_issues': {}}
+
+    try:
+        with open(file_path, 'r') as file:
+            data = yaml.safe_load(file) or {}
+    except FileNotFoundError:
+        # No report; treat as no assembly issues
+        return filtered_issues
+
+    if 'assembly_issues' in data:
         for component, issues in data['assembly_issues'].items():
             filtered_data = [issue for issue in issues if not issue.get('permitted', False)]
             if filtered_data:  # Add component only if there are valid issues
                 filtered_issues['assembly_issues'][component] = filtered_data
 
-        return filtered_issues
+    return filtered_issues
🤖 Prompt for AI Agents
In @pyartcd/pyartcd/pipelines/build_sync_multi.py around lines 229 - 249,
get_unpermissable_assembly_issues currently assumes yaml.safe_load(file) returns
a dict and only returns filtered_issues inside the 'assembly_issues' branch,
which can yield TypeError or None; fix by treating the loaded data as data =
yaml.safe_load(file) or {} (i.e. default to {} if None), ensure you always
return filtered_issues at the end of the function (remove the early return
inside the if), and only iterate when data.get('assembly_issues') is truthy so
the function always returns a dict like filtered_issues even when the YAML is
empty or lacks the key; reference symbols: get_unpermissable_assembly_issues,
file_path, filtered_issues, and the 'assembly_issues' key.

Comment on lines +263 to +303
        # Increment failure count
        current_count = await redis.get_value(self.fail_count_name)
        if current_count is None:  # does not yet exist in Redis
            current_count = 0
        fail_count = int(current_count) + 1
        self.runtime.logger.info('Failure count for multi-model %s: %s', self.version, fail_count)

        # Update fail counter on Redis
        await redis.set_value(self.fail_count_name, fail_count)

        unpermissable_issues = self.get_unpermissable_assembly_issues()
        if unpermissable_issues['assembly_issues']:
            report = yaml.safe_dump(unpermissable_issues)
            jenkins.update_title(' [UNVIABLE]')
        else:
            report = "Unknown Failure. Please investigate"
            jenkins.update_title(' [FAILURE]')

        # Less than 2 failures, assembly != stream: just break the build
        if fail_count < 2 or self.assembly != 'stream':
            raise

        # More than 2 failures: we need to notify ART and #forum-release before breaking the build
        if 10 <= fail_count <= 50:
            art_notify_frequency = 5
            forum_release_notify_frequency = 10

        elif 50 <= fail_count <= 200:
            art_notify_frequency = 10
            forum_release_notify_frequency = 50

        elif fail_count > 200:
            art_notify_frequency = 100
            forum_release_notify_frequency = 100

        else:
            # Default notify frequency
            art_notify_frequency = 2
            forum_release_notify_frequency = 5

        # Spam ourselves a little more often than forum-ocp-release
⚠️ Potential issue | 🟠 Major

Bare raise in handle_failure will always raise RuntimeError instead of propagating the original failure

In handle_failure:

# Less than 2 failures, assembly != stream: just break the build
if fail_count < 2 or self.assembly != 'stream':
    raise

This raise runs in a fresh context (no active exception), so it raises RuntimeError: No active exception to re-raise, masking the real cause of the failure. The outer build_sync_multi catch is already re‑raising the original RuntimeError/ChildProcessError after calling handle_failure, so this extra raise is not needed.

You can simply return here:

-        if fail_count < 2 or self.assembly != 'stream':
-            raise
+        if fail_count < 2 or self.assembly != 'stream':
+            # For non-stream or early failures, just record the failure and let the caller re-raise.
+            return
🤖 Prompt for AI Agents
In @pyartcd/pyartcd/pipelines/build_sync_multi.py around lines 263 - 303, The
bare "raise" in handle_failure (around the fail_count / self.assembly check)
runs with no active exception and will raise RuntimeError; replace that bare
raise with a simple return to stop further notifications and allow the caller to
propagate the original exception (i.e., change the block "if fail_count < 2 or
self.assembly != 'stream': raise" to "if fail_count < 2 or self.assembly !=
'stream': return"), keeping the rest of the logic (fail_count,
get_unpermissable_assembly_issues, notification frequency) unchanged.
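The bare-raise behavior is easy to demonstrate in isolation. This standalone sketch only mimics the shape of the branch; it is not the pyartcd code:

```python
def failing_branch(fail_count: int, assembly: str) -> None:
    # Mimics the problematic branch from handle_failure: a bare `raise`
    # executed while no exception is being handled cannot re-raise
    # anything, so Python raises a fresh RuntimeError instead.
    if fail_count < 2 or assembly != 'stream':
        raise

def observed_error() -> str:
    try:
        failing_branch(1, 'stream')
    except RuntimeError as exc:
        # The original failure is gone; all we see is the interpreter's
        # complaint about the bare raise.
        return str(exc)
    return ''

print(observed_error())  # → No active exception to re-raise
```

Inside an `except` block (or a function called from one while the exception is active), a bare `raise` does re-raise the in-flight exception, which is why the bug only shows up when `handle_failure` runs in a fresh context.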

Comment on lines +405 to +415
        # Only for stream assembly, lock the build to avoid parallel runs
        if assembly == 'stream':
            lock_identifier = get_build_url().replace(f'{constants.JENKINS_UI_URL}/', '')
            runtime.logger.info('Lock identifier: %s', lock_identifier)

            await locks.run_with_lock(
                coro=pipeline.run(),
                lock=lock,
                lock_name=lock_name,
                lock_id=lock_identifier,
            )
🛠️ Refactor suggestion | 🟠 Major

Guard lock_identifier against missing BUILD_URL when acquiring BUILD_SYNC_MULTI lock

lock_identifier = get_build_url().replace(...) assumes BUILD_URL is set. When running outside Jenkins, get_build_url() returns None, so .replace will raise AttributeError and prevent the lock wrapper (and the pipeline) from running at all.

A small guard keeps locking robust:

Suggested fix
-        if assembly == 'stream':
-            lock_identifier = get_build_url().replace(f'{constants.JENKINS_UI_URL}/', '')
-            runtime.logger.info('Lock identifier: %s', lock_identifier)
+        if assembly == 'stream':
+            build_url = get_build_url()
+            if build_url:
+                lock_identifier = build_url.replace(f'{constants.JENKINS_UI_URL}/', '')
+            else:
+                runtime.logger.warning('BUILD_URL is not set; using generic lock identifier')
+                lock_identifier = f'build-sync-multi-{version}'
+            runtime.logger.info('Lock identifier: %s', lock_identifier)

This avoids crashes in non‑Jenkins environments while still keeping a reasonably unique lock id.
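As a standalone sketch of that guard, factored into a helper for testability (the helper and its parameters are illustrative, not part of pyartcd):

```python
def derive_lock_identifier(build_url, jenkins_ui_url: str, version: str) -> str:
    """Derive a lock id from BUILD_URL, with a fallback when it is unset.

    `build_url` may be None when running outside Jenkins; in that case we
    fall back to a generic but still reasonably unique identifier.
    """
    if build_url:
        return build_url.replace(f'{jenkins_ui_url}/', '')
    return f'build-sync-multi-{version}'
```

This keeps the common Jenkins path identical to the current behavior while making the non-Jenkins path a deliberate choice rather than an AttributeError.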

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In @pyartcd/pyartcd/pipelines/build_sync_multi.py around lines 405 - 415, The
code assumes get_build_url() returns a string and calls .replace, which raises
AttributeError when BUILD_URL is missing; update the block that computes
lock_identifier (used in locks.run_with_lock) to handle None: call
get_build_url(), check if it's truthy before doing .replace with
constants.JENKINS_UI_URL, and if falsy use a safe fallback (e.g.,
runtime.run_id, pipeline.id, or a fixed string plus timestamp) so
lock_identifier is always a string and locks.run_with_lock(pipeline.run(), lock,
lock_name, lock_id=lock_identifier) can proceed in non‑Jenkins environments.

@openshift-ci

openshift-ci bot commented Jan 7, 2026

@jupierce: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name         Commit   Details  Required  Rerun command
ci/prow/security  ec692af  link     false     /test security

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
