
feat(gradio): surface live generation progress across LM, diffusion, and decode#954

Open
1larity wants to merge 12 commits into ace-step:main from 1larity:pr808/progress-visibility-generation-ux

Conversation

@1larity
Contributor

@1larity 1larity commented Mar 28, 2026

Summary

This PR extracts the standalone generation-progress-visibility slice from the original #808 umbrella work. It gives the progress bar finer-grained tracking of generation: long-running tasks no longer stall the bar until they complete, and stages such as token generation and diffusion now advance it incrementally as they run. I needed this to correctly integrate external LM calls into progress reporting for an upcoming feature (planning generation during music generation), and it turned out to be a nice QOL improvement for users.

It improves live progress across active non-vllm 5Hz LM generation, service-side DiT diffusion, and decode-phase reporting, while preserving the coarse estimator only as a fallback when runtime progress is unavailable.

This PR is intentionally independent of the external-LM work. It is somewhat wider than ideal because useful generation progress crosses runtime, service, diffusion, and decode layers. To keep it reviewable, broader profile/progress follow-up work from the original PR808 stream is explicitly deferred to later PRs.

What Changed

  • Forwarded service-side diffusion progress callbacks through generation execution:
    • acestep/core/generation/handler/service_generate.py
    • acestep/core/generation/handler/service_generate_execute.py
    • acestep/core/generation/handler/diffusion.py
  • Relayed runtime diffusion updates safely from the service thread back to the main Gradio progress callback:
    • acestep/core/generation/handler/generate_music_execute.py
  • Surfaced live LM progress for the active non-vllm LM paths:
    • acestep/llm_inference.py
  • Restored terminal DiT tqdm visibility while preserving UI progress updates.
  • Improved decode-phase reporting and monotonicity:
    • acestep/core/generation/handler/generate_music_decode.py
    • acestep/core/generation/handler/vae_decode.py
    • acestep/core/generation/handler/vae_decode_chunks.py
    • acestep/core/generation/handler/mlx_vae_decode_native.py
  • Wired diffusion progress seams consistently across backends:
    • acestep/models/base/modeling_acestep_v15_base.py
    • acestep/models/sft/modeling_acestep_v15_base.py
    • acestep/models/turbo/modeling_acestep_v15_turbo.py
    • acestep/models/mlx/dit_generate.py
  • Added focused regression coverage for relay behavior, monotonicity, decode visibility, and backend callback parity:
    • acestep/core/generation/handler/service_generate_test.py
    • acestep/core/generation/handler/service_generate_execute_test.py
    • acestep/core/generation/handler/generate_music_execute_test.py
    • acestep/core/generation/handler/generate_music_decode_test.py
    • acestep/core/generation/handler/vae_decode_mixin_test.py
    • acestep/llm_inference_cfg_fixes_test.py
    • acestep/models/mlx/dit_generate_test.py
    • acestep/models/turbo/modeling_acestep_v15_turbo_progress_test.py

Behavioral Parity

  • Preserves generation outputs and model-selection behavior; this PR only changes progress reporting paths.
  • Prefers real runtime progress throughout generation; the legacy coarse estimator remains only as a temporary fallback for paths that still lack explicit runtime progress reporting. It is kept mainly for development purposes and will be deprecated and removed at some point.
  • Keeps progress monotonic across estimator handoff, runtime diffusion updates, and decode phases.
  • Restores terminal DiT diffusion tqdm visibility instead of hiding it when UI progress forwarding is active.
  • Keeps this PR independent of external-LM/provider plumbing so it can be reviewed on its own merits.
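The monotonic clamp and exception-swallowing behavior described above can be sketched roughly as follows. This is an illustrative Python sketch of the idea only, not the PR's actual implementation; the name make_monotonic_progress is invented for this example.

```python
# Hedged sketch of the monotonic clamp: the wrapper never lets the reported
# fraction move backwards, and never lets an exception from the UI callback
# abort the generation thread. Invented names; not the PR's actual code.
import threading
from typing import Callable


def make_monotonic_progress(
    emit: Callable[[float, str], None],
) -> Callable[[float, str], None]:
    """Wrap a raw progress emitter so reported values never decrease."""
    state = {"value": 0.0}
    lock = threading.Lock()

    def progress(value: float, desc: str = "") -> None:
        with lock:  # serialize updates so concurrent callers stay ordered
            clamped = max(state["value"], min(value, 1.0))
            state["value"] = clamped
            try:
                emit(clamped, desc)
            except Exception:
                # A flaky UI callback must not crash the generation thread.
                pass

    return progress
```

Feeding this wrapper 0.2, then 0.1, then 0.5 would surface 0.2, 0.2, 0.5 to the UI, which is the handoff behavior the estimator/runtime transition relies on.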

Validation

Executed:

  • Focused unittest suite:
    • acestep.core.generation.handler.service_generate_test
    • acestep.core.generation.handler.service_generate_execute_test
    • acestep.core.generation.handler.generate_music_execute_test
    • acestep.core.generation.handler.generate_music_decode_test
    • acestep.core.generation.handler.vae_decode_mixin_test
    • acestep.llm_inference_cfg_fixes_test
    • acestep.models.mlx.dit_generate_test
    • acestep.models.turbo.modeling_acestep_v15_turbo_progress_test
    • Result: Ran 31 tests ... OK
  • python3 -m py_compile on touched progress/test modules
    • Result: success

Scope / Out of Scope

In scope:

  • service-side diffusion progress forwarding
  • live progress for active non-vllm LM generation paths
  • visible DiT diffusion progress in UI and terminal
  • decode-phase visibility and decode-progress monotonicity
  • focused regression tests for progress relay and backend parity

Out of scope:

  • external-LM/provider setup or runtime work
  • vllm token-level progress
  • non-CFG HuggingFace model.generate() token progress
  • chunk-granular progress for the direct non-tiled PyTorch VAE decode path
  • CPU decode fallback progress granularity
  • prompt-quality / CER / provider-integration follow-ups

Reviewer Focus

  1. Service-thread to UI-progress relay correctness in generate_music_execute.py.
  2. Monotonic progress handoff across estimator, runtime diffusion, and decode phases.
  3. Backend parity for base/SFT/turbo/MLX diffusion callback forwarding.
  4. Decode-phase visibility without changing non-progress generation behavior.
  5. Scope discipline: this PR should stay independent of the external-LM stream.

CodeRabbit Scope Guard

Please treat findings as in-scope only when introduced by this progress-visibility patch set.

  • Pre-existing progress limitations outside the touched relay/decode paths should be marked out-of-scope/non-blocking unless they indicate behavior drift introduced by this PR.
  • Broader profile/progress follow-ups from the original PR808 stream are intentionally deferred, not omitted by accident.

Summary by CodeRabbit

  • New Features

    • Generation APIs now accept optional runtime progress callbacks; step-level progress is emitted for diffusion, VAE decoding chunks, and LLM/token phases with mapped phase ranges for consistent UI.
  • Bug Fixes

    • Progress is enforced to be monotonic, runtime progress events are queued/drained and forwarded across fallbacks/OOM, and estimator shutdown/waiting is more robust.
  • Tests

    • Added/updated tests for progress emission, mapping, monotonicity, forwarding, estimator/thread lifecycle, and fallback behavior.

@coderabbitai
Contributor

coderabbitai bot commented Mar 28, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Threaded a step/chunk-level progress_callback(current_step, total_steps, desc) through generation and decoding: LLM, MLX diffusion, VAE tiled/MLX decode; added runtime progress queueing/draining, monotonic UI emission, phase-range mapping, and estimator stop/join coordination.

Changes

  • Diffusion Core & Run (acestep/models/mlx/dit_generate.py, acestep/core/generation/handler/diffusion.py): Added a progress_callback parameter to MLX diffusion entrypoints and invoked it each diffusion step with (step, total, "DiT diffusion..."); forwarded disable_tqdm.
  • Service Wiring & Execute (acestep/core/generation/handler/service_generate.py, acestep/core/generation/handler/service_generate_execute.py, acestep/core/generation/handler/service_generate_execute_test.py, acestep/core/generation/handler/service_generate_test.py): Plumbed progress_callback through service_generate → build kwargs → execution; introduced use_progress_bar and passed disable_tqdm into MLX calls; tests updated to assert forwarding.
  • Generate Music Execute (runtime) (acestep/core/generation/handler/generate_music_execute.py, acestep/core/generation/handler/generate_music_execute_test.py, acestep/core/generation/handler/generate_music_execute_thread_test.py): Added a runtime progress queue, non-blocking drain, mapping of runtime events into UI bands, a monotonic clamping wrapper that swallows callback exceptions, and deadline-based polling/join logic to coordinate estimator stop/join; tests for lifecycle and progress behavior added/updated.
  • VAE Decode Paths (acestep/core/generation/handler/vae_decode.py, acestep/core/generation/handler/vae_decode_chunks.py, acestep/core/generation/handler/mlx_vae_decode_native.py): Extended tiled and MLX VAE decode APIs with progress_callback; emit per-chunk progress, support batch-offset mapping, and preserve monotonic progress across GPU→CPU OOM fallbacks.
  • Decode Orchestration (acestep/core/generation/handler/generate_music_decode.py, acestep/core/generation/handler/generate_music_decode_test.py): Mapped chunk progress into the UI interval [0.82, 0.98], added helper callbacks to remap/emit decode progress, and wired callbacks into tiled/MLX decode call sites; updated tests to validate monotonicity and expected emissions.
  • VAE Decode Tests / Helpers (acestep/core/generation/handler/vae_decode_chunks_test.py, acestep/core/generation/handler/vae_decode_mixin_test.py, acestep/core/generation/handler/vae_decode_test_helpers.py): Updated test stubs to accept/record progress_callback and added tests verifying monotonic progress through GPU→CPU fallback and matching signatures.
  • LLM Generation & Batch Mapping (acestep/llm_inference.py, acestep/llm_inference_cfg_fixes_test.py, acestep/llm_inference_batch_progress_test.py): Threaded progress_callback into vLLM/PyTorch/MLX generation paths; emit per-token/per-step progress; added _make_phase_progress_callback for phase-range mapping and tests for batch/phase progress propagation.
  • Model-level APIs & Tests (acestep/models/base/...modeling_acestep_v15_base.py, acestep/models/sft/...modeling_acestep_v15_base.py, acestep/models/turbo/...modeling_acestep_v15_turbo.py, acestep/models/turbo/..._progress_test.py): Exposed progress_callback (and use_progress_bar in turbo) on generate_audio; invoke per-step diffusion progress and added tests asserting per-step emissions.
  • MLX Diffusion Tests (acestep/models/mlx/dit_generate_test.py): Added a unit test that fakes MLX/decoder and asserts mlx_generate_diffusion invokes the supplied progress_callback once per diffusion timestep with correct totals and description.
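The walkthrough above mentions a _make_phase_progress_callback helper in acestep/llm_inference.py for phase-range mapping. A minimal sketch of that idea follows, assuming the (current_step, total_steps, desc) callback contract stated in the walkthrough and an emitter that takes an overall UI fraction; the helper name and exact signature here are assumptions, not the PR's code.

```python
# Hedged sketch of phase-range mapping: in-phase step progress is rescaled
# into a fixed [phase_start, phase_end] band of the overall UI progress.
from typing import Callable


def make_phase_progress_callback(
    emit: Callable[[float, str], None], phase_start: float, phase_end: float
) -> Callable[[int, int, str], None]:
    """Map (current_step, total_steps) into [phase_start, phase_end]."""

    def callback(current_step: int, total_steps: int, desc: str) -> None:
        frac = current_step / max(total_steps, 1)
        emit(phase_start + (phase_end - phase_start) * frac, desc)

    return callback
```

With phase_start=0.1 and phase_end=0.3, step 5 of 10 would surface as 0.2 overall, so successive phases occupy disjoint bands of the single UI bar.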

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant UI as "UI / progress callback"
    participant Orch as "Orchestrator (generate_execute)"
    participant Service as "service_generate"
    participant Diff as "Diffusion (mlx_generate_diffusion/_mlx_run_diffusion)"
    participant VAE as "VAE decode (tiled/_mlx_vae_decode)"

    Orch->>Service: call service_generate(..., progress_callback=enqueue)
    Service->>Diff: start diffusion (progress_callback=enqueue)
    Diff->>Service: enqueue(step, total, "DiT diffusion...")
    Service->>Orch: drain & emit mapped progress (monotonic clamp)
    Diff->>VAE: on completion call decode(progress_callback=enqueue)
    VAE->>Service: enqueue(chunk, total, "Decoding audio chunks...")
    Service->>Orch: drain & emit mapped decode progress
```
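The decode-phase remapping into the UI interval [0.82, 0.98] mentioned under Decode Orchestration, including batch-offset mapping, can be illustrated like this. The constants come from the PR description, but the helper itself is invented for this example and assumes a uniform chunk count per batch item.

```python
# Illustrative sketch of mapping per-chunk VAE decode progress, across a
# batch, into the UI band [0.82, 0.98]. decode_chunk_progress is an invented
# helper name; it is not the PR's actual implementation.
DECODE_UI_START, DECODE_UI_END = 0.82, 0.98


def decode_chunk_progress(
    batch_index: int, n_batches: int, chunks_done: int, chunks_per_item: int
) -> float:
    """Return the overall UI fraction for a decoded chunk, with batch offset."""
    done = batch_index * chunks_per_item + chunks_done
    total = max(n_batches * chunks_per_item, 1)
    return DECODE_UI_START + (DECODE_UI_END - DECODE_UI_START) * (done / total)
```

Because each batch item's chunks are offset by batch_index * chunks_per_item, progress stays monotonic across items rather than resetting to the start of the band for every item.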

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • ChuxiJ
  • ElWalki

"🐰 I hop through loops and steps with cheer,
Each diffusion pulse I nudge and hear,
Chunks decode and tokens softly hum,
Progress maps, monotonic and drum,
Hooray — callbacks hop steady and clear!"

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 60.81%, which is below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately reflects the main objective: adding live generation progress visibility across LM, diffusion, and decode components.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@1larity 1larity marked this pull request as ready for review March 28, 2026 10:57
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
acestep/models/sft/modeling_acestep_v15_base.py (1)

1831-1867: 🛠️ Refactor suggestion | 🟠 Major

Add a docstring for the modified generate_audio public API.

Line 1857 introduces a new callback contract, but the function still lacks a docstring that documents key args/behavior.

As per coding guidelines, "Docstrings are mandatory for all modules, classes, and public functions."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/models/sft/modeling_acestep_v15_base.py` around lines 1831 - 1867,
Add a comprehensive docstring to the public method generate_audio documenting
its purpose, key parameters and types (e.g., text_hidden_states,
lyric_hidden_states, src_latents, seed, infer_method, infer_steps,
diffusion_guidance_sale, audio_cover_strength, precomputed_lm_hints_25Hz,
audio_codes, use_progress_bar, and the new progress_callback callable), return
value(s), side effects, default behaviors (cfg_interval_start/end, use_cache,
use_adg, shift), and the new callback contract signature Callable[[int, int,
str], None] including when it is invoked and what the three args represent;
place the docstring immediately under the def generate_audio(...) line and
follow the project's docstring style (short summary, parameter list with types
and defaults, examples/notes if needed).
acestep/models/base/modeling_acestep_v15_base.py (1)

1831-1867: 🛠️ Refactor suggestion | 🟠 Major

Document the updated generate_audio API contract.

Line 1857 adds progress_callback, but the modified public method still lacks a docstring for parameters and behavior.

As per coding guidelines, "Docstrings are mandatory for all modules, classes, and public functions."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/models/base/modeling_acestep_v15_base.py` around lines 1831 - 1867,
The public method generate_audio was extended with new parameters (notably
progress_callback) but lacks a docstring; add a clear docstring to
generate_audio describing purpose, all parameters (types, default values,
optionality) including progress_callback: Callable[[int,int,str],None] and
use_progress_bar/use_adg/seed/infer_method/infer_steps and returned value(s),
side effects (e.g., uses seed, may modify src_latents), expected tensor
shapes/dtypes for text_hidden_states/lyric_hidden_states/src_latents/etc., error
cases/exceptions, and any threading/process-safety notes; ensure the docstring
follows the project's style (numpy or Google) and is placed immediately under
the generate_audio def so callers and auto-doc tools pick up the updated API
contract.
🧹 Nitpick comments (4)
acestep/core/generation/handler/diffusion.py (1)

37-37: Type the new progress_callback parameter explicitly.

Line 37 adds a new API seam; annotating it helps static checking and call-site clarity.

♻️ Proposed change
-from typing import Any, Dict, Optional
+from typing import Any, Callable, Dict, Optional
...
-        progress_callback=None,
+        progress_callback: Optional[Callable[[int, int, str], None]] = None,
As per coding guidelines, "Type hints: Add type annotations for new/modified functions when practical."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/core/generation/handler/diffusion.py` at line 37, The new parameter
progress_callback added to the function in diffusion.py should be explicitly
typed; update the function signature that declares progress_callback (e.g., def
...(..., progress_callback=None)) to use a typing annotation such as
progress_callback: Optional[Callable[..., Any]] = None (or a more specific
Callable signature if you know the callback args), and add the necessary imports
(Optional, Callable, Any) from typing at the top of the file; ensure the
annotation matches how progress_callback is invoked within the function.
acestep/models/turbo/modeling_acestep_v15_turbo.py (1)

1847-1848: Docstring should document the new parameters.

The use_progress_bar and progress_callback parameters were added to generate_audio but the docstring (not shown in the changes) may not document them. This is a minor documentation gap for public API clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/models/turbo/modeling_acestep_v15_turbo.py` around lines 1847 - 1848,
Update the generate_audio docstring to document the two new parameters: describe
use_progress_bar: bool (controls whether a progress bar is shown) and
progress_callback: Optional[Callable[[int, int, str], None]] (called with
current_step, total_steps, and a status message for each progress update). Add
default values and expected behavior (e.g., progress_callback overrides or
complements the progress bar) and include any thread-safety or concurrency notes
if relevant so callers of generate_audio understand how to supply and use these
parameters.
acestep/core/generation/handler/vae_decode.py (1)

22-22: Consider adding type annotation for consistency.

The progress_callback parameter lacks a type annotation, unlike other files in this PR (e.g., service_generate.py, dit_generate.py). Adding the annotation would improve IDE support and maintain consistency.

♻️ Suggested type annotation
+from typing import Callable, Optional
+
 class VaeDecodeMixin:
     ...
     def tiled_decode(
         self,
         latents,
         chunk_size: Optional[int] = None,
         overlap: int = 64,
         offload_wav_to_cpu: Optional[bool] = None,
-        progress_callback=None,
+        progress_callback: Optional[Callable[[int, int, str], None]] = None,
     ):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/core/generation/handler/vae_decode.py` at line 22, The
progress_callback parameter in the vae_decode function is missing a type
annotation; add a consistent annotation such as progress_callback:
Optional[Callable[[float], None]] (and import Optional, Callable from typing at
the top if not present) to match other modules like service_generate.py and
dit_generate.py; update the function signature in vae_decode.py (the symbol
progress_callback) and ensure any references/calls still accept an
Optional[Callable[[float], None]].
acestep/models/mlx/dit_generate_test.py (1)

8-8: Consider adding a skip decorator for non-MLX environments.

The direct import of mlx_generate_diffusion at line 8 will fail in environments where mlx is not installed (e.g., non-Apple Silicon machines). Consider wrapping with a conditional skip.

♻️ Suggested skip decorator
 import sys
 import types
 import unittest
 from unittest.mock import patch

 import numpy as np

-from acestep.models.mlx.dit_generate import mlx_generate_diffusion
+try:
+    from acestep.models.mlx.dit_generate import mlx_generate_diffusion
+    _MLX_AVAILABLE = True
+except ImportError:
+    _MLX_AVAILABLE = False
+    mlx_generate_diffusion = None


+@unittest.skipUnless(_MLX_AVAILABLE, "mlx not available")
 class MlxGenerateDiffusionProgressTests(unittest.TestCase):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/models/mlx/dit_generate_test.py` at line 8, The test imports
mlx_generate_diffusion directly which will fail when the MLX package is absent;
update the test to skip in non-MLX environments by using pytest.importorskip or
a pytest.mark.skip/skipif decorator: either call pytest.importorskip("mlx")
before importing or wrap the import/use of mlx_generate_diffusion (symbol:
mlx_generate_diffusion in acestep.models.mlx.dit_generate) in a conditional skip
so tests are skipped rather than erroring when mlx is not installed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@acestep/core/generation/handler/generate_music_execute.py`:
- Around line 77-86: The progress UI callback can be invoked out-of-order
because the code releases progress_lock before calling progress(); modify
_emit_progress so the progress(clamped, desc=...) call happens while holding
progress_lock (i.e., serialize the UI callback inside the same lock that updates
progress_state["value"]) and still catch/log exceptions from the callback; keep
the clamping logic (max(progress_state["value"], value)), update
progress_state["value"] before calling progress, and ensure the lock is released
only after the callback/exception handling completes to preserve monotonic UI
updates.

In `@acestep/core/generation/handler/service_generate_execute_test.py`:
- Line 82: Replace the lambda assigned to callback with a named function
definition to satisfy Ruff E731: create a function (e.g., progress_callback or
callback) that accepts (current, total, desc) and returns (current, total,
desc), then assign callback to that function (or use it directly) so the inline
lambda on the variable named callback is removed and a proper def is used.

In `@acestep/llm_inference.py`:
- Around line 1413-1418: The LM-phase progress bands created via
progress_callback=_make_phase_progress_callback(...) can remain unclosed if
PT/MLX loops exit early; after each successful metadata-generation and
code-generation phase (the calls using _make_phase_progress_callback at the
shown sites), explicitly invoke the phase progress callback with the terminal
fraction to close the band (e.g., call progress(0.3, "LLM metadata generation
complete") after the metadata phase and progress(0.5, "LLM code generation
complete") after the code generation phase). Locate the places that build
progress_callback via _make_phase_progress_callback and, immediately after the
corresponding PT/MLX/ML loop completes successfully, call the progress function
with the final fraction and a short completion message to guarantee the band
closes.

---

Outside diff comments:
In `@acestep/models/base/modeling_acestep_v15_base.py`:
- Around line 1831-1867: The public method generate_audio was extended with new
parameters (notably progress_callback) but lacks a docstring; add a clear
docstring to generate_audio describing purpose, all parameters (types, default
values, optionality) including progress_callback: Callable[[int,int,str],None]
and use_progress_bar/use_adg/seed/infer_method/infer_steps and returned
value(s), side effects (e.g., uses seed, may modify src_latents), expected
tensor shapes/dtypes for
text_hidden_states/lyric_hidden_states/src_latents/etc., error cases/exceptions,
and any threading/process-safety notes; ensure the docstring follows the
project's style (numpy or Google) and is placed immediately under the
generate_audio def so callers and auto-doc tools pick up the updated API
contract.

In `@acestep/models/sft/modeling_acestep_v15_base.py`:
- Around line 1831-1867: Add a comprehensive docstring to the public method
generate_audio documenting its purpose, key parameters and types (e.g.,
text_hidden_states, lyric_hidden_states, src_latents, seed, infer_method,
infer_steps, diffusion_guidance_sale, audio_cover_strength,
precomputed_lm_hints_25Hz, audio_codes, use_progress_bar, and the new
progress_callback callable), return value(s), side effects, default behaviors
(cfg_interval_start/end, use_cache, use_adg, shift), and the new callback
contract signature Callable[[int, int, str], None] including when it is invoked
and what the three args represent; place the docstring immediately under the def
generate_audio(...) line and follow the project's docstring style (short
summary, parameter list with types and defaults, examples/notes if needed).

---

Nitpick comments:
In `@acestep/core/generation/handler/diffusion.py`:
- Line 37: The new parameter progress_callback added to the function in
diffusion.py should be explicitly typed; update the function signature that
declares progress_callback (e.g., def ...(..., progress_callback=None)) to use a
typing annotation such as progress_callback: Optional[Callable[..., Any]] = None
(or a more specific Callable signature if you know the callback args), and add
the necessary imports (Optional, Callable, Any) from typing at the top of the
file; ensure the annotation matches how progress_callback is invoked within the
function.

In `@acestep/core/generation/handler/vae_decode.py`:
- Line 22: The progress_callback parameter in the vae_decode function is missing
a type annotation; add a consistent annotation such as progress_callback:
Optional[Callable[[float], None]] (and import Optional, Callable from typing at
the top if not present) to match other modules like service_generate.py and
dit_generate.py; update the function signature in vae_decode.py (the symbol
progress_callback) and ensure any references/calls still accept an
Optional[Callable[[float], None]].

In `@acestep/models/mlx/dit_generate_test.py`:
- Line 8: The test imports mlx_generate_diffusion directly which will fail when
the MLX package is absent; update the test to skip in non-MLX environments by
using pytest.importorskip or a pytest.mark.skip/skipif decorator: either call
pytest.importorskip("mlx") before importing or wrap the import/use of
mlx_generate_diffusion (symbol: mlx_generate_diffusion in
acestep.models.mlx.dit_generate) in a conditional skip so tests are skipped
rather than erroring when mlx is not installed.

In `@acestep/models/turbo/modeling_acestep_v15_turbo.py`:
- Around line 1847-1848: Update the generate_audio docstring to document the two
new parameters: describe use_progress_bar: bool (controls whether a progress bar
is shown) and progress_callback: Optional[Callable[[int, int, str], None]]
(called with current_step, total_steps, and a status message for each progress
update). Add default values and expected behavior (e.g., progress_callback
overrides or complements the progress bar) and include any thread-safety or
concurrency notes if relevant so callers of generate_audio understand how to
supply and use these parameters.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3fd82acf-ef83-4b87-8176-c13b518b9440

📥 Commits

Reviewing files that changed from the base of the PR and between 946ef47 and 6216131.

📒 Files selected for processing (22)
  • acestep/core/generation/handler/diffusion.py
  • acestep/core/generation/handler/generate_music_decode.py
  • acestep/core/generation/handler/generate_music_decode_test.py
  • acestep/core/generation/handler/generate_music_execute.py
  • acestep/core/generation/handler/generate_music_execute_test.py
  • acestep/core/generation/handler/mlx_vae_decode_native.py
  • acestep/core/generation/handler/service_generate.py
  • acestep/core/generation/handler/service_generate_execute.py
  • acestep/core/generation/handler/service_generate_execute_test.py
  • acestep/core/generation/handler/service_generate_test.py
  • acestep/core/generation/handler/vae_decode.py
  • acestep/core/generation/handler/vae_decode_chunks.py
  • acestep/core/generation/handler/vae_decode_mixin_test.py
  • acestep/core/generation/handler/vae_decode_test_helpers.py
  • acestep/llm_inference.py
  • acestep/llm_inference_cfg_fixes_test.py
  • acestep/models/base/modeling_acestep_v15_base.py
  • acestep/models/mlx/dit_generate.py
  • acestep/models/mlx/dit_generate_test.py
  • acestep/models/sft/modeling_acestep_v15_base.py
  • acestep/models/turbo/modeling_acestep_v15_turbo.py
  • acestep/models/turbo/modeling_acestep_v15_turbo_progress_test.py

@1larity
Contributor Author

1larity commented Mar 28, 2026

@CodeRabbit I left the outside-diff docstring comments alone: they are not severe enough to justify expanding the PR needlessly.

modeling_acestep_v15_base.py

@coderabbitai
Contributor

coderabbitai bot commented Mar 28, 2026

Tip

For best results, initiate chat on the files or code changes.

@1larity Understood — keeping the PR focused is the right call. I'll note this preference for future reviews.

(^ᴗ^ )ฅ


✏️ Learnings added
Learnt from: 1larity
URL: https://github.com/ace-step/ACE-Step-1.5/pull/954

Timestamp: 2026-03-28T11:27:22.400Z
Learning: In ace-step/ACE-Step-1.5, for `acestep/models/base/modeling_acestep_v15_base.py` and `acestep/models/sft/modeling_acestep_v15_base.py`, do not flag outside-diff docstring gaps as blocking or significant issues during PR review. The maintainer (1larity) considers them insufficient severity to justify expanding PR scope and prefers to leave them unaddressed rather than widen a focused PR.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: 1larity
Repo: ace-step/ACE-Step-1.5 PR: 423
File: CONTRIBUTING.md:1-1
Timestamp: 2026-02-10T23:22:35.192Z
Learning: In the ace-step/ACE-Step-1.5 repository, avoid flagging minor hyphenation issues in documentation files (like compound modifiers such as "easy to understand"). The maintainer prefers not to be overly pedantic about such minor grammar points.

Learnt from: 1larity
Repo: ace-step/ACE-Step-1.5 PR: 769
File: acestep/api/train_api_dataset_auto_label_async_route.py:158-169
Timestamp: 2026-03-04T22:40:44.930Z
Learning: In ace-step/ACE-Step-1.5, the `builder.label_all_samples(...)` call in both `train_api_dataset_auto_label_async_route.py` and `train_api_dataset_auto_label_sync_route.py` passes `chunk_size`, `batch_size`, and `sample_labeled_callback` kwargs that are not declared in the current `label_all_samples` signature in `acestep/training/dataset_builder_modules/label_all.py`. This is a pre-existing issue carried forward unchanged from the pre-decomposition `train_api_dataset_service.py` and is deferred to a follow-up PR.

Learnt from: 1larity
Repo: ace-step/ACE-Step-1.5 PR: 811
File: docs/en/GRADIO_GUIDE.md:0-0
Timestamp: 2026-03-11T12:03:00.308Z
Learning: In ace-step/ACE-Step-1.5 GRADIO_GUIDE.md, the "How It Integrates With 5Hz LM" section explicitly notes that when external mode is active, some LM-assisted workflows (notably Create Sample) still check `llm_handler.llm_initialized` in both the Gradio handler path and `create_sample()`, and therefore may still require local 5Hz LM initialization until the remaining routing is completed. The doc describes the external-mode paths as "intended behavior" rather than currently fully functional. Do not flag this remaining initialization requirement as a new bug in documentation reviews.

Learnt from: 1larity
Repo: ace-step/ACE-Step-1.5 PR: 774
File: acestep/ui/gradio/interfaces/training_dataset_tab_label_preview.py:70-95
Timestamp: 2026-03-05T23:12:28.095Z
Learning: In ace-step/ACE-Step-1.5, the hardcoded English placeholder strings in `acestep/ui/gradio/interfaces/training_dataset_tab_label_preview.py` (e.g., `"Music description..."` for `edit_caption`, `"pop, electronic, dance..."` for `edit_genre`, `Dropdown` choices `["Use Global Ratio", "Caption", "Genre"]` and its default value, and the lyrics placeholder `"[Verse 1]\nLyrics here...\n\n[Chorus]\n..."`) are pre-existing localization gaps carried forward from before the decomposition refactor. They are deferred to a follow-up PR and should not be flagged as new issues introduced by decomposition PRs.

Learnt from: 1larity
Repo: ace-step/ACE-Step-1.5 PR: 423
File: CONTRIBUTING.md:9-9
Timestamp: 2026-02-10T23:25:46.957Z
Learning: In the ace-step/ACE-Step-1.5 repository, for documentation files like CONTRIBUTING.md, only flag significant issues like spelling errors (e.g., "amature" → "amateur"). Avoid flagging minor hyphenation (e.g., "semi professional" → "semi-professional"), style suggestions (e.g., "kind of" → more formal alternatives), or minor grammar adjustments (e.g., "none technical" → "non-technical"). The maintainer prefers minimal, non-pedantic corrections in documentation.

Learnt from: tonyjohnvan
Repo: ace-step/ACE-Step-1.5 PR: 858
File: acestep/customized_vllm/__init__.py:1-406
Timestamp: 2026-03-17T21:14:13.573Z
Learning: In `acestep/customized_vllm/__init__.py`, the author (tonyjohnvan) intentionally keeps all inference engine internals (ForwardState, thread-local context, GenerationSlot, CachePool, _EngineConfig, LLM, generation loop) in a single ~400 LOC file with clear section comments. They consider splitting into sub-modules to add import complexity for no functional benefit. Do not raise module-size refactor suggestions for this file.

Learnt from: 1larity
Repo: ace-step/ACE-Step-1.5 PR: 0
File: :0-0
Timestamp: 2026-02-27T16:18:08.317Z
Learning: During refactoring PRs in the ACE-Step API, preserve existing API behavior and response contracts to avoid breaking existing clients, even when pre-existing bugs or design issues are identified. Technical debt should be documented for future work rather than fixed in-place during decomposition.

Learnt from: 1larity
Repo: ace-step/ACE-Step-1.5 PR: 774
File: acestep/ui/gradio/interfaces/training.py:40-47
Timestamp: 2026-03-05T23:13:27.540Z
Learning: In `acestep/ui/gradio/interfaces/training.py`, the hardcoded English strings in the top-level `gr.HTML(...)` header block (i.e., `<h2>🎵 LoRA Training for ACE-Step</h2>` and its `<p>` subtitle) are a pre-existing localization gap carried forward unchanged from before the decomposition refactor. They are out of scope for decomposition PRs and should not be flagged as new i18n violations during code review.

Learnt from: CR
Repo: ace-step/ACE-Step-1.5 PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-20T01:31:25.142Z
Learning: Applies to acestep/gpu_config.py : Changes to CUDA code should not affect MPS/XPU/CPU code paths

Learnt from: tonyjohnvan
Repo: ace-step/ACE-Step-1.5 PR: 858
File: acestep/customized_vllm/pipeline.py:1-423
Timestamp: 2026-03-17T21:17:04.611Z
Learning: In `acestep/customized_vllm/pipeline.py`, the author (tonyjohnvan) intentionally keeps all inference pipeline internals (sampling helpers _filter_by_top_k/_filter_by_nucleus/sample_tokens, model init, KV cache provisioning _provision_kv_storage, transfer buffers _init_transfer_buffers, prefill/decode execution _execute_prefill/_execute_autoregressive/_forward_pass, and CUDA graph capture _compile_execution_graphs) in a single ~420 LOC file with clear section comments. They consider splitting into sub-modules to add import complexity for no functional benefit. Do not raise module-size refactor suggestions for this file.

Learnt from: 1larity
Repo: ace-step/ACE-Step-1.5 PR: 774
File: acestep/ui/gradio/interfaces/training_lora_tab_run_export.py:91-95
Timestamp: 2026-03-05T23:13:00.508Z
Learning: In `acestep/ui/gradio/interfaces/training_lora_tab_run_export.py`, the hardcoded English strings for `resume_checkpoint_dir` (`label="Resume Checkpoint"` and `info="Directory of a saved LoRA checkpoint to resume from"`) are a pre-existing localization gap carried forward and are out of scope for decomposition PRs. Do not flag these as new i18n violations during code review.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
acestep/llm_inference.py (2)

1438-1445: ⚠️ Potential issue | 🟠 Major

Handle the skipped-CoT branch without assuming user_metadata exists.

This branch also runs when use_cot_metas=False. With the default user_metadata=None, Line 1444 raises before generation starts, and if this path is taken for infer_type == "dit" the caller returns with progress still stuck at 0.1.

🐛 Proposed fix
         else:
             # Use user-provided metadata
             if is_batch:
                 logger.info("Batch Phase 1: Using user-provided metadata (skipping generation)")
             else:
                 logger.info("Phase 1: Using user-provided metadata (skipping generation)")
-            metadata = {k: v for k, v in user_metadata.items() if v is not None}
+            metadata = {k: v for k, v in (user_metadata or {}).items() if v is not None}
+            progress(0.3, "LLM metadata generation complete")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/llm_inference.py` around lines 1438 - 1445, The skipped-CoT branch
assumes user_metadata exists which can be None when use_cot_metas is False;
update the else branch that logs "Using user-provided metadata (skipping
generation)" to safely handle a None user_metadata by treating it as an empty
dict (e.g., set metadata = {} when user_metadata is None) so metadata = {k: v
for k, v in (user_metadata or {}).items() if v is not None}; ensure this change
is applied where the variables is_batch, use_cot_metas, and user_metadata are
used so downstream callers (including infer_type == "dit") don't error or leave
progress stuck.
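The None-guard suggested above can be sketched in isolation. `filter_user_metadata` is a hypothetical standalone helper, not a function from the repository; it only demonstrates the `(user_metadata or {})` pattern the fix proposes.

```python
def filter_user_metadata(user_metadata):
    """Drop None values, treating a missing metadata dict as empty."""
    # `or {}` makes the comprehension safe when the caller passes None.
    return {k: v for k, v in (user_metadata or {}).items() if v is not None}

print(filter_user_metadata(None))                       # {}
print(filter_user_metadata({"bpm": 120, "key": None}))  # {'bpm': 120}
```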

1179-1212: ⚠️ Potential issue | 🟠 Major

Close each sequential batch item's progress before starting the next one.

_run_pt_single() and _run_mlx_single() can stop on EOS long before current == total. When that happens, the first callback from the next item jumps straight to index * total + 1, so the batch bar can skip a large chunk of the previous item and look much farther along than it is.

🧭 Suggested pattern
-            with self._load_model_context():
+            with self._load_model_context():
+                batch_total = len(formatted_prompt_list)
                 for i, formatted_prompt in enumerate(formatted_prompt_list):
                     # Set seed for this item if provided
                     ...

-                    item_progress_callback = None
+                    item_progress_callback = None
+                    item_progress_state = {"current": 0, "total": 0, "desc": ""}
                     if progress_callback is not None:
-                        def _item_progress(current, total, desc, index=i, batch_total=len(formatted_prompt_list)):
+                        def _item_progress(current, total, desc, index=i):
+                            item_progress_state.update(
+                                current=current,
+                                total=total,
+                                desc=desc,
+                            )
                             progress_callback(index * total + current, batch_total * total, desc)

                         item_progress_callback = _item_progress

                     output_text = self._run_pt_single(
                         ...
                         progress_callback=item_progress_callback,
                     )

                     output_texts.append(output_text)
+                    if (
+                        progress_callback is not None
+                        and item_progress_state["total"] > 0
+                        and item_progress_state["current"] < item_progress_state["total"]
+                    ):
+                        progress_callback(
+                            (i + 1) * item_progress_state["total"],
+                            batch_total * item_progress_state["total"],
+                            item_progress_state["desc"],
+                        )

Apply the same stateful wrapper in _run_mlx()'s sequential fallback.

Also applies to: 4070-4100

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/llm_inference.py` around lines 1179 - 1212, The per-item progress
callback can leave the batch bar mid-item if a previous
_run_pt_single/_run_mlx_single ended early; before starting each new sequential
item call progress_callback(index * total, batch_total * total, desc) to "close"
the previous item's progress and then pass a stateful wrapper as
item_progress_callback that increments from that base (use the same pattern used
elsewhere for _run_mlx()). Update the code around formatted_prompt_list loop
where item_progress_callback is created (referencing progress_callback, index=i,
batch_total=len(formatted_prompt_list), _run_pt_single, and _run_mlx_single) to
first advance the global progress to the start-of-item, then wrap the incoming
per-item callbacks so they add the base offset when invoked; apply the same
change to the other sequential fallback block (the region referenced around
lines 4070-4100).
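The stateful-wrapper pattern proposed above can be exercised end to end with a toy runner. Everything here (`run_batch_with_progress`, `run_single`, the `(current, total, desc)` tuple) is an illustrative sketch of the suggested fix, not the project's actual API: each item records the last progress it reported, and the loop "closes" the band when an item stopped short of `total`.

```python
def run_batch_with_progress(items, run_single, progress_callback):
    """Run items sequentially, closing each item's progress band on early stop."""
    batch_total = len(items)
    outputs = []
    for i, item in enumerate(items):
        state = {"current": 0, "total": 0, "desc": ""}

        # Bind index and state as defaults so each closure is self-contained.
        def _item_progress(current, total, desc, index=i, state=state):
            state.update(current=current, total=total, desc=desc)
            progress_callback(index * total + current, batch_total * total, desc)

        outputs.append(run_single(item, progress_callback=_item_progress))
        # If the item ended early (e.g. on EOS), advance the bar to its end.
        if state["total"] > 0 and state["current"] < state["total"]:
            progress_callback(
                (i + 1) * state["total"],
                batch_total * state["total"],
                state["desc"],
            )
    return outputs
```

With a fake `run_single` that reports only 1 of 4 steps, the wrapper emits a closing `(4, 8)` update before the second item begins, so the global bar never jumps.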
acestep/core/generation/handler/generate_music_execute_test.py (1)

67-255: 🛠️ Refactor suggestion | 🟠 Major

Extract the repeated service invocation into a shared test helper.

Lines 67-255 repeat the same long service_inputs payload and _run_generate_music_service_with_progress(...) call across four cases. The file is already over the 200-LOC cap, so every signature tweak now has several copies to keep in sync. Pull the common setup into a helper/fixture, or move the async estimator case into a sibling test module.

As per coding guidelines, "Target module size: optimal <= 150 LOC, hard cap 200 LOC." and "Keep functions focused and short; extract helpers instead of nesting complexity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/core/generation/handler/generate_music_execute_test.py` around lines
67 - 255, Extract the repeated service_inputs dict and the repeated
_run_generate_music_service_with_progress(...) invocation into a shared test
helper (e.g., create a factory function make_service_inputs() that returns the
dict and a helper run_generate_with_defaults(progress, host_overrides=None,
infer_method="ode") that calls host._run_generate_music_service_with_progress
with the common parameters), update the four tests to call these helpers, and
move the _AsyncEstimatorHost-based test to a separate test module if needed to
keep the file under the 200-LOC cap; reference
_run_generate_music_service_with_progress and
_start_diffusion_progress_estimator/_AsyncEstimatorHost when locating code to
change.
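The factory-with-overrides shape the comment suggests is a common way to deduplicate long test payloads. The keys below are placeholders chosen for illustration, not the real `service_inputs` fields:

```python
def make_service_inputs(**overrides):
    """Factory for a shared test payload; per-test tweaks come in as overrides."""
    inputs = {
        "prompt": "test prompt",   # placeholder field names, not the real payload
        "duration": 10,
        "infer_method": "ode",
    }
    inputs.update(overrides)
    return inputs

# Each test states only what differs from the common case.
fast_case = make_service_inputs(duration=5)
```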
🧹 Nitpick comments (1)
acestep/llm_inference.py (1)

1277-1323: Document the two progress callback contracts.

generate_with_stop_condition() now takes a Gradio-style progress(fraction, desc) callback, while generate_from_formatted_prompt() forwards a (current, total, desc) callback. Those are part of the callable contract now, but the modified docstrings still omit both parameters.

📝 Minimal docstring/type-hint update
     def generate_with_stop_condition(
         self,
         ...
-        progress=None,
+        progress: Optional[Callable[[float, str], None]] = None,
     ) -> Dict[str, Any]:
         """Two-phase LM generation: CoT generation followed by audio codes generation.
         ...
             seeds: Optional list of seeds for batch generation (for reproducibility).
                   Only used when batch_size > 1. TODO: not used yet
+            progress: Optional callback invoked as `progress(fraction, description)`.

     def generate_from_formatted_prompt(
         self,
         ...
         stop_at_reasoning: bool = False,
         progress_callback: Optional[Callable[[int, int, str], None]] = None,
     ) -> Tuple[str, str]:
         """
         Generate raw LM text output from a pre-built formatted prompt.
         ...
             constrained_decoding_debug: Whether to enable debug logging for constrained decoding
             stop_at_reasoning: If True, stop generation immediately after </think> tag (no audio codes)
+            progress_callback: Optional callback invoked as
+                `(current_step, total_steps, description)` during generation.

As per coding guidelines, "Docstrings are mandatory for all new/modified Python modules, classes, and functions. Docstrings must be concise and include purpose plus key inputs/outputs and raised exceptions when relevant."

Also applies to: 2364-2395

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/llm_inference.py` around lines 1277 - 1323, Update the docstrings for
generate_with_stop_condition and generate_from_formatted_prompt to document the
two different progress callback contracts: (1) the Gradio-style
progress(fraction: float, desc: str) used by generate_with_stop_condition and
(2) the (current: int, total: int, desc: str) progress callback forwarded by
generate_from_formatted_prompt; include types, meaning of parameters, when each
is called, and how callers should implement them (e.g., fraction in [0,1] vs
current/total integers), and add these parameters to the function signature docs
(inputs) and returned/side-effect behavior in the Returns section for both
functions.
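Since the two callback contracts coexist, an adapter between them is a natural glue piece. `make_step_to_fraction` below is a hypothetical sketch: it converts `(current, total, desc)` step updates into Gradio-style `progress(fraction, desc)` calls, mapping steps into a fixed fraction band (the `0.1`–`0.3` defaults mirror the Phase 1 band mentioned in this review, but are assumptions).

```python
def make_step_to_fraction(progress, lo=0.1, hi=0.3):
    """Adapt a (current, total, desc) callback to progress(fraction, desc).

    Steps are mapped linearly into the [lo, hi] fraction band.
    """
    def step_callback(current, total, desc):
        if total > 0:  # skip updates when the total is unknown
            fraction = lo + (hi - lo) * min(current / total, 1.0)
            progress(fraction, desc)
    return step_callback
```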
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@acestep/core/generation/handler/generate_music_execute.py`:
- Around line 163-169: The code clears progress_thread immediately after a timed
join, losing the handle if the thread didn’t actually stop; modify the blocks
that set progress_thread = None (the branch guarded by
drained_runtime_progress/saw_runtime_progress and the later analogous branch) to
only set progress_thread to None after confirming the thread has terminated
(e.g., call progress_thread.join(timeout=1.0) then check if not
progress_thread.is_alive() before clearing it, and if it is still alive keep the
reference so final cleanup can join again), ensure stop_event is still set as
before and preserve the progress_thread variable until you can guarantee the
thread is finished.

---

Outside diff comments:
In `@acestep/core/generation/handler/generate_music_execute_test.py`:
- Around line 67-255: Extract the repeated service_inputs dict and the repeated
_run_generate_music_service_with_progress(...) invocation into a shared test
helper (e.g., create a factory function make_service_inputs() that returns the
dict and a helper run_generate_with_defaults(progress, host_overrides=None,
infer_method="ode") that calls host._run_generate_music_service_with_progress
with the common parameters), update the four tests to call these helpers, and
move the _AsyncEstimatorHost-based test to a separate test module if needed to
keep the file under the 200-LOC cap; reference
_run_generate_music_service_with_progress and
_start_diffusion_progress_estimator/_AsyncEstimatorHost when locating code to
change.

In `@acestep/llm_inference.py`:
- Around line 1438-1445: The skipped-CoT branch assumes user_metadata exists
which can be None when use_cot_metas is False; update the else branch that logs
"Using user-provided metadata (skipping generation)" to safely handle a None
user_metadata by treating it as an empty dict (e.g., set metadata = {} when
user_metadata is None) so metadata = {k: v for k, v in (user_metadata or
{}).items() if v is not None}; ensure this change is applied where the variables
is_batch, use_cot_metas, and user_metadata are used so downstream callers
(including infer_type == "dit") don't error or leave progress stuck.
- Around line 1179-1212: The per-item progress callback can leave the batch bar
mid-item if a previous _run_pt_single/_run_mlx_single ended early; before
starting each new sequential item call progress_callback(index * total,
batch_total * total, desc) to "close" the previous item's progress and then pass
a stateful wrapper as item_progress_callback that increments from that base (use
the same pattern used elsewhere for _run_mlx()). Update the code around
formatted_prompt_list loop where item_progress_callback is created (referencing
progress_callback, index=i, batch_total=len(formatted_prompt_list),
_run_pt_single, and _run_mlx_single) to first advance the global progress to the
start-of-item, then wrap the incoming per-item callbacks so they add the base
offset when invoked; apply the same change to the other sequential fallback
block (the region referenced around lines 4070-4100).

---

Nitpick comments:
In `@acestep/llm_inference.py`:
- Around line 1277-1323: Update the docstrings for generate_with_stop_condition
and generate_from_formatted_prompt to document the two different progress
callback contracts: (1) the Gradio-style progress(fraction: float, desc: str)
used by generate_with_stop_condition and (2) the (current: int, total: int,
desc: str) progress callback forwarded by generate_from_formatted_prompt;
include types, meaning of parameters, when each is called, and how callers
should implement them (e.g., fraction in [0,1] vs current/total integers), and
add these parameters to the function signature docs (inputs) and
returned/side-effect behavior in the Returns section for both functions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0fe5cfb8-7bd4-4759-ac6c-809202c128f6

📥 Commits

Reviewing files that changed from the base of the PR and between 6216131 and 0bb0095.

📒 Files selected for processing (7)
  • acestep/core/generation/handler/generate_music_execute.py
  • acestep/core/generation/handler/generate_music_execute_test.py
  • acestep/core/generation/handler/service_generate_execute_test.py
  • acestep/core/generation/handler/vae_decode_chunks.py
  • acestep/core/generation/handler/vae_decode_chunks_test.py
  • acestep/llm_inference.py
  • acestep/llm_inference_cfg_fixes_test.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • acestep/llm_inference_cfg_fixes_test.py
  • acestep/core/generation/handler/vae_decode_chunks.py

@1larity
Contributor Author

1larity commented Mar 28, 2026

Addressed the sequential batch-item progress concern in commit 014e99e. _run_pt() and the MLX sequential fallback now explicitly close an item's progress band when generation stops early on EOS, so the next item no longer inherits a partially open bar. I added regression coverage in llm_inference_batch_progress_test.py for both paths.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
acestep/core/generation/handler/generate_music_execute.py (1)

42-215: Module length is over the repository hard cap; split or document a follow-up split plan.

This file now extends past 200 lines. Please either extract one responsibility slice (for example, runtime progress relay helpers) or add a short PR-note justification with a concrete follow-up split plan.

As per coding guidelines, "PR Readiness: Module LOC policy is met (<=150 target, <=200 hard cap or justified exception)." Based on learnings: "only raise module-size concerns when a file exceeds 200 lines of code (LOC)."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/core/generation/handler/generate_music_execute.py` around lines 42 -
215, The file exceeds the repository hard cap; either split out the runtime
progress relay logic or add a PR note with a concrete split plan. Extract the
progress-estimation/relay responsibilities (the helper functions and state used
by _run_generate_music_service_with_progress) into a new module/class — e.g.
move logic around _start_diffusion_progress_estimator,
_drain_runtime_progress_events, progress_events queue and
_stop_progress_estimator_if_finished into a new helper (ProgressEstimator or
progress_relay.py) and have _run_generate_music_service_with_progress call that
API; or, if you prefer not to split now, add a short PR description in this PR
that documents which slices will be extracted (progress estimator, service
wrapper, and timeout handling) and when, and justify the exception for this file
size.
acestep/llm_inference.py (2)

1183-1185: Bind the per-item state into these loop callbacks.

Ruff B023 is right to flag both closures: they capture the loop-local item_progress_state by reference. The callbacks are synchronous today, but binding state=item_progress_state makes each callback self-contained and avoids the late-binding footgun.

♻️ Suggested change
-                    def _item_progress(current, total, desc, index=i):
-                        item_progress_state.update(current=current, total=total, desc=desc)
+                    def _item_progress(current, total, desc, index=i, state=item_progress_state):
+                        state.update(current=current, total=total, desc=desc)
                         progress_callback(index * total + current, batch_total * total, desc)
-                    def _item_progress(current, total, desc, index=i, batch_total=batch_size):
-                        item_progress_state.update(current=current, total=total, desc=desc)
+                    def _item_progress(
+                        current, total, desc, index=i, batch_total=batch_size, state=item_progress_state
+                    ):
+                        state.update(current=current, total=total, desc=desc)
                         progress_callback(index * total + current, batch_total * total, desc)

Also applies to: 4086-4088

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/llm_inference.py` around lines 1183 - 1185, The closure
_item_progress captures the loop-local item_progress_state by reference causing
late-binding; update the callback signatures to bind the state into the closure
(e.g., def _item_progress(current, total, desc, index=i,
state=item_progress_state):) and replace references to item_progress_state
inside with state, and do the same binding change for the other callbacks noted
(lines ~4086-4088) so each callback closes over its own state rather than the
loop variable.
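The late-binding pitfall Ruff B023 flags is easy to reproduce outside the repository. This minimal demonstration (hypothetical names) shows why binding the loop variable as a default argument matters: the buggy version's closures all see the last loop value, while the fixed version freezes each value at definition time.

```python
def make_callbacks_buggy(states):
    callbacks = []
    for state in states:
        # Captures `state` by reference; all closures share the final value.
        callbacks.append(lambda: state["name"])
    return callbacks

def make_callbacks_fixed(states):
    callbacks = []
    for state in states:
        # Default argument binds the current value at definition time.
        callbacks.append(lambda state=state: state["name"])
    return callbacks

states = [{"name": "a"}, {"name": "b"}]
print([cb() for cb in make_callbacks_buggy(states)])  # ['b', 'b']
print([cb() for cb in make_callbacks_fixed(states)])  # ['a', 'b']
```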

2377-2408: Document the new progress_callback contract on the public entrypoint.

LLMHandler.generate_from_formatted_prompt() now exposes a new callback, but its docstring still describes the old argument set. Please add the callback shape and how callers should interpret current/total across backends/batch modes. As per coding guidelines, "Docstrings are mandatory for all new or modified Python modules, classes, and functions. Docstrings must be concise and include purpose plus key inputs/outputs and raised exceptions when relevant."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/llm_inference.py` around lines 2377 - 2408, The docstring for
generate_from_formatted_prompt is missing documentation for the new
progress_callback parameter; update the function docstring to describe the
callback signature (Callable[[int, int, str], None]), explain what each argument
means (current: number of items processed so far, total: total items to process
or -1 if unknown, message: short status string), and clarify semantic guarantees
across backends/batch modes (callbacks are called with per-request progress
where total is the batch size for batched/backends that report totals, use -1
when total is not available, and current increments monotonically from 0 to
total), also note when it will be invoked (e.g., at start/end and on each token
or item depending on backend) and update the Args section to include this new
contract under progress_callback for callers to interpret current/total
consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@acestep/llm_inference.py`:
- Around line 1447-1450: The Phase 1 progress band isn't closed when metadata
generation is skipped (e.g., when has_all_metas is true or use_cot_metas is
false), so ensure progress(0.3, "LLM metadata generation complete") is called
for both branches; update the block around the logger.info calls (the else
branch that runs when metadata is skipped) to also call progress(0.3, ...) or
move the progress call outside the if/else so
LLMHandler.generate_with_stop_condition() always emits the Phase 1 completion
update regardless of has_all_metas/use_cot_metas.

---

Nitpick comments:
In `@acestep/core/generation/handler/generate_music_execute.py`:
- Around line 42-215: The file exceeds the repository hard cap; either split out
the runtime progress relay logic or add a PR note with a concrete split plan.
Extract the progress-estimation/relay responsibilities (the helper functions and
state used by _run_generate_music_service_with_progress) into a new module/class
— e.g. move logic around _start_diffusion_progress_estimator,
_drain_runtime_progress_events, progress_events queue and
_stop_progress_estimator_if_finished into a new helper (ProgressEstimator or
progress_relay.py) and have _run_generate_music_service_with_progress call that
API; or, if you prefer not to split now, add a short PR description in this PR
that documents which slices will be extracted (progress estimator, service
wrapper, and timeout handling) and when, and justify the exception for this file
size.

In `@acestep/llm_inference.py`:
- Around line 1183-1185: The closure _item_progress captures the loop-local
item_progress_state by reference causing late-binding; update the callback
signatures to bind the state into the closure (e.g., def _item_progress(current,
total, desc, index=i, state=item_progress_state):) and replace references to
item_progress_state inside with state, and do the same binding change for the
other callbacks noted (lines ~4086-4088) so each callback closes over its own
state rather than the loop variable.
- Around line 2377-2408: The docstring for generate_from_formatted_prompt is
missing documentation for the new progress_callback parameter; update the
function docstring to describe the callback signature (Callable[[int, int, str],
None]), explain what each argument means (current: number of items processed so
far, total: total items to process or -1 if unknown, message: short status
string), and clarify semantic guarantees across backends/batch modes (callbacks
are called with per-request progress where total is the batch size for
batched/backends that report totals, use -1 when total is not available, and
current increments monotonically from 0 to total), also note when it will be
invoked (e.g., at start/end and on each token or item depending on backend) and
update the Args section to include this new contract under progress_callback for
callers to interpret current/total consistently.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d0e67b11-76b5-4fa6-b586-c875f044b508

📥 Commits

Reviewing files that changed from the base of the PR and between 0bb0095 and 014e99e.

📒 Files selected for processing (4)
  • acestep/core/generation/handler/generate_music_execute.py
  • acestep/core/generation/handler/generate_music_execute_thread_test.py
  • acestep/llm_inference.py
  • acestep/llm_inference_batch_progress_test.py

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
acestep/core/generation/handler/generate_music_execute.py (1)

20-221: This module is now past the 200 LOC cap.

The added queue/drain/emit orchestration pushes this file to 221 lines. Please either extract the runtime-progress relay into a small helper module or add a concrete follow-up split plan in the PR notes.

As per coding guidelines, "If a module would exceed 200 LOC, split by responsibility before merging, or add a short justification in PR notes with a concrete follow-up split plan."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/core/generation/handler/generate_music_execute.py` around lines 20 -
221, The file exceeds the 200 LOC cap because the runtime progress relay logic
was inlined; extract that responsibility into a small helper (or add a follow-up
split plan in PR notes). Move the queue/drain/emit orchestration (the
_drain_runtime_progress_events function, the progress_events queue, and the
progress emitting/locking logic used by
_run_generate_music_service_with_progress) into a new helper module/class (e.g.,
RuntimeProgressRelay with methods drain(), emit_progress(),
start_estimator_stop()/stop()). Update _run_generate_music_service_with_progress
to instantiate and use this helper (replace direct uses of
_drain_runtime_progress_events, progress_events, and _emit_progress with the
helper's API) so the handler file stays under 200 LOC; alternatively, if you
choose not to split now, add a concise PR note describing the exact refactor
steps and target module/class names for a follow-up split.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@acestep/core/generation/handler/generate_music_execute.py`:
- Around line 76-77: The wrapper can exit while gen_thread is still running and
continues enqueuing into progress_events, so in the wrapper's cleanup/finally
you must disable the runtime relay and prevent further enqueues: when exiting
the function that creates progress_events and spawns gen_thread (references:
progress_events, progress_state, gen_thread, service_generate()), call the
runtime relay shutdown/disable API (or set the relay flag) and push a
sentinel/close signal to progress_events, then refrain from draining further
items (or join/stop gen_thread if possible) so the background generation cannot
keep writing into an orphaned queue; ensure this disable/cleanup is executed in
every exit path (including the timeout branch).

---

Nitpick comments:
In `@acestep/core/generation/handler/generate_music_execute.py`:
- Around line 20-221: The file exceeds the 200 LOC cap because the runtime
progress relay logic was inlined; extract that responsibility into a small
helper (or add a follow-up split plan in PR notes). Move the queue/drain/emit
orchestration (the _drain_runtime_progress_events function, the progress_events
queue, and the progress emitting/locking logic used by
_run_generate_music_service_with_progress) into a new helper module/class (e.g.,
RuntimeProgressRelay with methods drain(), emit_progress(),
start_estimator_stop()/stop()). Update _run_generate_music_service_with_progress
to instantiate and use this helper (replace direct uses of
_drain_runtime_progress_events, progress_events, and _emit_progress with the
helper's API) so the handler file stays under 200 LOC; alternatively, if you
choose not to split now, add a concise PR note describing the exact refactor
steps and target module/class names for a follow-up split.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 278fd4d2-d73b-4c5d-8bdc-323b60b2a664

📥 Commits

Reviewing files that changed from the base of the PR and between 41c318e and e474cf8.

📒 Files selected for processing (5)
  • acestep/core/generation/handler/diffusion.py
  • acestep/core/generation/handler/generate_music_execute.py
  • acestep/core/generation/handler/service_generate.py
  • acestep/core/generation/handler/service_generate_execute.py
  • acestep/models/mlx/dit_generate.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • acestep/models/mlx/dit_generate.py
  • acestep/core/generation/handler/service_generate.py
  • acestep/core/generation/handler/diffusion.py
  • acestep/core/generation/handler/service_generate_execute.py
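The queue-drain-emit relay with a sentinel shutdown discussed in this review round can be sketched in miniature. `relay_progress` and `worker` are illustrative stand-ins, not the repository's functions: the worker always pushes a sentinel in its `finally` block so every exit path signals completion, and the relay stops draining on the sentinel or on timeout, so it never blocks on an orphaned queue.

```python
import queue
import threading

_SENTINEL = object()

def relay_progress(events, emit, timeout=60.0):
    """Drain progress events until the worker signals completion or the timeout elapses."""
    while True:
        try:
            item = events.get(timeout=timeout)
        except queue.Empty:
            break  # timeout branch: stop draining rather than wait forever
        if item is _SENTINEL:
            break  # worker finished; relay shuts down cleanly
        emit(*item)

def worker(events):
    try:
        for step in range(3):
            events.put((step, 3, "diffusion"))
    finally:
        events.put(_SENTINEL)  # signal completion on every exit path

events = queue.Queue()
t = threading.Thread(target=worker, args=(events,))
t.start()
seen = []
relay_progress(events, lambda current, total, desc: seen.append(current))
t.join()
print(seen)  # [0, 1, 2]
```

Pushing the sentinel from a `finally` block is the key point: even if the worker raises mid-generation, the relay thread still unblocks instead of spinning until timeout.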
