📝 Walkthrough
Adds DSE reporting: new DSE report generator module, DSEReporter and template, unified system-metadata loader, reporter changes to include node info and emit per-iteration best-config TOML, registration/tests updated, and a minor pyproject import-linter tweak.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes
🚥 Pre-merge checks: ✅ Passed checks (2 passed)
Actionable comments posted: 13
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cloudai/registration.py`:
- Around line 296-299: The registration places "dse" after "status" but
Registry.ordered_scenario_reports() doesn't include "dse" in its ordering map so
it falls into the default bucket and runs earlier; update the ordering map used
by Registry.ordered_scenario_reports() to include the "dse" key in the correct
position (between "status" and "tarball") so the DSEReporter runs in the
intended slot, and add a regression test that asserts the order (e.g., verify
Registry.ordered_scenario_reports() yields ["per_test","status","dse","tarball"]
or that DSEReporter runs after StatusReporter and before TarballReporter) to
prevent regressions; locate changes around the
Registry.ordered_scenario_reports() implementation and the registration call
using Registry().add_scenario_report("dse", DSEReporter,
ReportConfig(enable=True)).
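The ordering rule this comment asks for can be sketched in isolation; the map contents and the default bucket value below are assumptions inferred from the comment, not the actual `Registry` code:

```python
# Hypothetical ordering map; the real Registry.ordered_scenario_reports() may differ.
REPORT_ORDER = {"per_test": 0, "status": 1, "dse": 2, "tarball": 3}

def ordered_scenario_reports(names: list[str]) -> list[str]:
    # Unknown keys fall into the default bucket (-1) and therefore run first,
    # which is why "dse" must appear in the map explicitly.
    return sorted(names, key=lambda name: REPORT_ORDER.get(name, -1))
```

With `"dse"` absent from the map it would sort with key -1 and run before every mapped reporter, which is the misplacement the comment describes.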
In `@src/cloudai/report_generator/dse_report.py`:
- Around line 188-198: The current winner selection in _build_reward_chart_data
picks the step with highest reward regardless of TrajectoryStep.is_successful,
which can mark a failed run as "best"; update _build_reward_chart_data to first
filter steps to only those with is_successful True (e.g., successful_steps = [s
for s in steps if s.is_successful]) and then compute best_index using max over
successful_steps (and map back to the original index if you need the index into
the full steps list); if there are no successful steps, return None or set
best_index to None. Apply the same filtering and selection rule to the analogous
selection logic referenced in reporter.py so both outputs remain aligned.
- Around line 147-154: Wrap TOML parsing and pydantic validation calls in
try/except and return None on failure so malformed artifacts are treated as
skipped; specifically modify _step_elapsed_time to catch exceptions around
toml.load(f) and SlurmJobMetadata.model_validate(...) and return None on any
exception, and similarly wrap the code that loads best_step_dump using
toml.load(f) and TestRunDetails.model_validate(...) to catch errors and set the
result to None (or skip) instead of propagating the exception.
In `@src/cloudai/report_generator/util.py`:
- Around line 198-205: The code currently picks node_files[0] after
metadata_path.glob("node-*.toml"), which is nondeterministic; make selection
deterministic by sorting the glob result (e.g., sorted(node_files) or sorting by
.name) before picking the first element so the file chosen for
SlurmSystemMetadata.model_validate is stable across runs; update the logic
around node_files and the with node_files[0].open() usage to use the
sorted/ordered list.
- Around line 191-207: Narrow the exception handling and make node file
selection deterministic: replace the broad `except Exception as exc:` that
follows `toml.load(f)` and `SlurmSystemMetadata.model_validate(...)` with a
specific except clause catching `toml.TomlDecodeError` and the pydantic
validation error you use elsewhere (e.g., `ValidationError`) so the error line
becomes `except (toml.TomlDecodeError, ValidationError) as exc:`; also ensure
`node_files` is deterministic by using `node_files =
sorted(metadata_path.glob("node-*.toml"))` before selecting `node_files[0]`.
In `@src/cloudai/reporter.py`:
- Line 164: Add a module-level logger by creating logger =
logging.getLogger(__name__) after imports, then replace root logging calls that
use f-strings with the module logger and %-style lazy formatting (e.g., change
logging.info(f"Generated scenario report at {report_path}") to
logger.info("Generated scenario report at %s", report_path)); apply the same
change to the other occurrences referenced (the logging calls around the report
generation and related messages).
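The logger pattern being requested looks like this in isolation (`announce_report` and the path are illustrative):

```python
import logging

logger = logging.getLogger(__name__)  # module-level logger instead of the root logger

def announce_report(report_path: str) -> None:
    # %-style arguments defer string formatting until a handler actually emits
    # the record (this is what satisfies Ruff G004/LOG015, unlike logging.info(f"...")).
    logger.info("Generated scenario report at %s", report_path)
```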
- Around line 156-157: The Jinja2 Environment is created without autoescaping
which can lead to XSS when rendering user-provided parameters; update the
jinja_env creation in reporter.py (the jinja_env = jinja2.Environment(...) line
that uses self.templates_dir and template =
jinja_env.get_template("dse-report.jinja2")) to enable autoescaping (e.g., pass
autoescape=jinja2.select_autoescape(['html','xml']) or autoescape=True) and
import jinja2.select_autoescape if needed so HTML/XML templates are escaped by
default before rendering.
- Around line 166-175: report_best_dse_config currently uses
tr.current_iteration (always 0) so only iteration 0 is processed; change the
loop to iterate all iterations for each DSE test run by enumerating iterations
instead of using tr.current_iteration: for each tr in
self.test_scenario.test_runs, iterate over either a known iteration count (e.g.,
for iteration in range(tr.num_iterations) or range(tr.total_iterations)) or
discover existing iteration directories under self.results_root / tr.name (e.g.,
list subfolders matching numeric iteration names) and use that iteration
variable when building tr_root and reading trajectory.csv; update references to
tr.current_iteration accordingly inside report_best_dse_config and ensure it
still skips non-dse jobs.
- Around line 177-181: The current block that reads trajectory.csv and loads the
best step details (using df, trajectory_file, best_step =
df.loc[df["reward"].idxmax()]["step"], best_step_details, tr_root and
CommandGenStrategy.TEST_RUN_DUMP_FILE_NAME then calling
TestRunDetails.model_validate) needs defensive error handling: check that df is
non-empty and that the "reward" column contains at least one non-NaN value
before calling idxmax(), and wrap the file access in a try/except or pre-check
best_step_details.exists() to handle missing files (FileNotFoundError) and log
an informative message via your logger before skipping or re-raising; ensure any
exception around TestRunDetails.model_validate is caught and logged with context
so failures in parsing the dump are visible.
In `@src/cloudai/util/dse-report.jinja2`:
- Around line 410-429: The copy button inside the <summary> is triggering the
parent <details> toggle because the click event bubbles; update the
".js-copy-config" click handler (the anonymous async function attached in
document.querySelectorAll(...).forEach) to accept the event parameter and call
event.preventDefault() and event.stopPropagation() at the top of the handler (or
alternatively move the button outside the <summary>), then keep the existing
clipboard logic so clicking copies without toggling the disclosure; the same
change should be applied to the other handler at the 642-647 region.
- Around line 432-440: Update the click handler for the tab widgets (the code
iterating over ".js-dse-tabs" and handling ".js-dse-tab-button" clicks) to
update ARIA states alongside CSS classes: set aria-selected="true" on the
clicked button and aria-selected="false" on all other buttons, ensure the tab
container has role="tablist" and each button has role="tab" (and a stable id),
and for panels (".js-dse-tab-panel") set role="tabpanel", aria-hidden="false"
for the active panel and aria-hidden="true" for inactive panels and ensure each
panel has aria-labelledby pointing to its tab button id (use dataset.tabTarget /
dataset.tabPanel to match). Also update tabindex on tabs (0 for active, -1 for
inactive) so keyboard focus flows correctly.
- Around line 557-665: Enable Jinja2 autoescaping when creating template
environments in DSEReporter.generate and
StatusReporter.generate_scenario_report: update the Environment construction to
pass autoescape=jinja2.select_autoescape(['html', 'xml']) (e.g.,
Environment(loader=..., autoescape=select_autoescape(['html','xml']))) so all
injected fields like summary.name, summary.gpu_label, summary.best_config_toml,
parameter row names/values, and any other user-controlled strings are
HTML-escaped by default; also remove/avoid any forced |safe usage in templates
(dse-report.jinja2 and general-report.jinja2) so escaped output is used.
In `@tests/test_reporter.py`:
- Around line 632-636: The test currently only checks that iteration "0"
best-config TOMLs exist after calling reporter.generate(); since dse_case_a has
iterations=2, add an assertion to also verify the iteration "1" best-config file
is generated (similar to the existing checks), i.e. assert that
(slurm_system.output_path / dse_case_a.name / "1" /
f"{dse_case_a.name}.toml").exists(); keep the same pattern used for dse_case_b
and the mixed report to ensure multi-iteration DSE cases are covered.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b12d97b5-a0dc-4aa7-a2ea-f11942cd5fbe
📒 Files selected for processing (11)
- pyproject.toml
- src/cloudai/registration.py
- src/cloudai/report_generator/dse_report.py
- src/cloudai/report_generator/util.py
- src/cloudai/reporter.py
- src/cloudai/util/base-report.jinja2
- src/cloudai/util/dse-report.jinja2
- src/cloudai/util/general-report.jinja2
- src/cloudai/util/general-slurm-report.jinja2
- tests/test_init.py
- tests/test_reporter.py
💤 Files with no reviewable changes (1)
- src/cloudai/util/general-slurm-report.jinja2
Actionable comments posted: 3
♻️ Duplicate comments (5)
src/cloudai/reporter.py (3)
170-179: ⚠️ Potential issue | 🟠 Major
Write best-config outputs for every DSE iteration.
`self.test_scenario.test_runs` holds the case definitions, so `tr.current_iteration` is still the default here. Multi-iteration DSE jobs will only inspect `<case>/0/trajectory.csv` and never emit configs for later iterations.
♻️ Minimal fix
```diff
 def report_best_dse_config(self):
     for tr in self.test_scenario.test_runs:
         if not tr.test.is_dse_job:
             continue
-        tr_root = self.results_root / tr.name / f"{tr.current_iteration}"
-        trajectory_file = tr_root / "trajectory.csv"
-        if not trajectory_file.is_file():
-            logging.warning(f"No trajectory file found for {tr.name} at {trajectory_file}")
-            continue
+        for iteration in range(tr.iterations):
+            tr_root = self.results_root / tr.name / str(iteration)
+            trajectory_file = tr_root / "trajectory.csv"
+            if not trajectory_file.is_file():
+                logging.warning(f"No trajectory file found for {tr.name} at {trajectory_file}")
+                continue

-        df = lazy.pd.read_csv(trajectory_file)
-        best_step = df.loc[df["reward"].idxmax()]["step"]
-        best_step_details = tr_root / f"{best_step}" / CommandGenStrategy.TEST_RUN_DUMP_FILE_NAME
-        with best_step_details.open() as f:
-            trd = TestRunDetails.model_validate(toml.load(f))
+            df = lazy.pd.read_csv(trajectory_file)
+            best_step = df.loc[df["reward"].idxmax()]["step"]
+            best_step_details = tr_root / f"{best_step}" / CommandGenStrategy.TEST_RUN_DUMP_FILE_NAME
+            with best_step_details.open() as f:
+                trd = TestRunDetails.model_validate(toml.load(f))

-        best_config_path = tr_root / f"{tr.name}.toml"
-        logging.info(f"Writing best config for {tr.name} to {best_config_path}")
-        with best_config_path.open("w") as f:
-            toml.dump(trd.test_definition.model_dump(), f)
+            best_config_path = tr_root / f"{tr.name}.toml"
+            logging.info(f"Writing best config for {tr.name} to {best_config_path}")
+            with best_config_path.open("w") as f:
+                toml.dump(trd.test_definition.model_dump(), f)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cloudai/reporter.py` around lines 170 - 179, report_best_dse_config only checks tr.current_iteration (default) so multi-iteration DSE jobs inspect only the 0 iteration; change the loop to iterate over all DSE iterations for each test run: for each tr in self.test_scenario.test_runs where tr.test.is_dse_job, enumerate the available iteration IDs (e.g. from tr.iterations if present or by listing directories under self.results_root / tr.name) and for each iteration set tr_root = self.results_root / tr.name / str(iteration) then check trajectory_file = tr_root / "trajectory.csv" and emit the best-config output; update references in report_best_dse_config from tr.current_iteration to the loop variable so configs are written for every iteration.
181-189: ⚠️ Potential issue | 🟠 Major
Skip incomplete trajectories instead of aborting the whole DSE reporter.
`idxmax()` on an empty/all-NaN reward column and `best_step_details.open()` on a missing dump both raise here. A single incomplete iteration will stop `generate()` before the HTML report is written.
🛡️ Minimal hardening
```diff
 df = lazy.pd.read_csv(trajectory_file)
-best_step = df.loc[df["reward"].idxmax()]["step"]
-best_step_details = tr_root / f"{best_step}" / CommandGenStrategy.TEST_RUN_DUMP_FILE_NAME
+if df.empty or not {"step", "reward"} <= set(df.columns) or df["reward"].isna().all():
+    logging.warning("Empty or invalid trajectory for %s at %s", tr.name, trajectory_file)
+    continue
+best_step = int(df.loc[df["reward"].idxmax(), "step"])
+best_step_details = tr_root / str(best_step) / CommandGenStrategy.TEST_RUN_DUMP_FILE_NAME
+if not best_step_details.is_file():
+    logging.warning("Best-step dump not found for %s at %s", tr.name, best_step_details)
+    continue
 with best_step_details.open() as f:
     trd = TestRunDetails.model_validate(toml.load(f))
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cloudai/reporter.py` around lines 181 - 189, The code assumes trajectory_file contains valid rewards and the best-step dump exists, causing generate() to abort on empty/NaN reward columns or missing dump; modify the block that reads df, computes best_step, and opens best_step_details so it first checks that df is non-empty and that df["reward"] has any finite values (skip and log if not), then compute best_step safely (avoid idxmax on empty/NaN), and before calling best_step_details.open() and TestRunDetails.model_validate(toml.load(f)) verify best_step_details.exists() (log and skip this tr iteration if missing) so the reporter continues and still writes the HTML; update related logs (e.g., the logging.info for best_config_path) to reflect skipped iterations.
160-161: ⚠️ Potential issue | 🟠 Major
Enable autoescaping for the DSE HTML environment.
`jinja2.Environment(...)` defaults to `autoescape=False`, so descriptions, parameter values, and TOML content are rendered raw into the page. Because the template filename ends in `.jinja2`, the fix needs `default=True` (or an explicit `.jinja2` enablement), not just `["html", "xml"]`.
🔒 Minimal fix
```diff
-jinja_env = jinja2.Environment(loader=jinja2.FileSystemLoader(self.templates_dir))
+jinja_env = jinja2.Environment(
+    loader=jinja2.FileSystemLoader(self.templates_dir),
+    autoescape=jinja2.select_autoescape(default_for_string=True, default=True),
+)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cloudai/reporter.py` around lines 160 - 161, The Jinja environment is created without autoescaping (jinja2.Environment(loader=self.templates_dir)), causing raw HTML/TOML to be injected; update the jinja2.Environment call that creates jinja_env (and/or its use of jinja2.select_autoescape) to enable autoescaping for .jinja2 templates—either pass autoescape=jinja2.select_autoescape(default=True) or include "jinja2" in the select_autoescape list so templates loaded via templates_dir and referenced by template = jinja_env.get_template("dse-report.jinja2") are autoescaped.
src/cloudai/report_generator/dse_report.py (2)
147-154: ⚠️ Potential issue | 🟠 Major
Treat malformed TOML artifacts as skipped data, not a hard failure.
Missing files are already handled as best-effort, but TOML parse/validation errors in these two loads still bubble up and abort the whole DSE report. Return `None` for the step/iteration instead so one bad artifact does not break every summary.
Also applies to: 282-283
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cloudai/report_generator/dse_report.py` around lines 147 - 154, The TOML parse/validation in _step_elapsed_time currently can raise and abort the report; change it to treat malformed TOML as skipped data by wrapping the toml.load(...) and SlurmJobMetadata.model_validate(...) in a try/except that catches parsing/validation errors (e.g., toml.TomlDecodeError, ValidationError, ValueError, or a broad Exception) and returns None on failure (optionally logging a debug/warn); apply the same defensive try/except-and-return-None pattern to the other similar TOML load/validate call referenced around lines 282-283 so bad artifacts don't crash the DSE report.
188-198: ⚠️ Potential issue | 🟠 Major
Do not mark failed steps as the best configuration.
`TrajectoryStep.is_successful` is computed but ignored in both winner selections, so a failed high-reward run can be highlighted in the chart and serialized as `best_config_toml`. Filter to successful steps first, and keep `DSEReporter.report_best_dse_config()` on the same rule so the HTML and emitted TOMLs stay aligned.
🐛 Minimal fix
```diff
 def _build_reward_chart_data(steps: list[TrajectoryStep]) -> dict[str, Any] | None:
     if not steps:
         return None
-    best_index = max(range(len(steps)), key=lambda idx: steps[idx].reward)
+    successful_indexes = [idx for idx, step in enumerate(steps) if step.is_successful]
+    if not successful_indexes:
+        return None
+    best_index = max(successful_indexes, key=lambda idx: steps[idx].reward)
     return {
         "labels": [step.step for step in steps],
         "rewards": [step.reward for step in steps],
         "observations": [step.observation_text for step in steps],
         "best_index": best_index,
@@
-    best_step = max(trajectory_steps, key=lambda step: step.reward)
+    successful_steps = [step for step in trajectory_steps if step.is_successful]
+    if not successful_steps:
+        return None
+    best_step = max(successful_steps, key=lambda step: step.reward)
```
Also applies to: 276-318
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cloudai/report_generator/dse_report.py` around lines 188 - 198, The reward chart builder currently picks the max reward regardless of success; change _build_reward_chart_data to first filter steps by TrajectoryStep.is_successful and operate on that filtered list (return None if no successful steps), and make the same change in DSEReporter.report_best_dse_config (use the same successful-only selection logic when choosing best_index and best_config_toml) so the HTML chart and emitted TOML remain aligned; reference the _build_reward_chart_data function and the DSEReporter.report_best_dse_config method when making the edits.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cloudai/report_generator/dse_report.py`:
- Line 21: Replace the direct logging.warning(f"...") calls with a module-level
logger using lazy %-style formatting: add a module logger (e.g., logger =
logging.getLogger(__name__)) near the top (after the existing import logging)
and update every logging.warning(f"...") occurrence in this file (the calls
around lines 21, ~227-232, and ~279) to use logger.warning("... %s ...", value)
or logger.warning("literal %d", int_val) so formatting is deferred and Ruff
G004/LOG015 is satisfied.
- Around line 137-144: The _normalize_gpu_family function incorrectly matches
short family names inside longer ones (e.g., "B200" in "GB200"), so change the
matching to prefer more specific (longer) family names first: iterate
GPU_HOURLY_COST_USD's keys sorted by descending length (or otherwise order by
specificity) and check if each family is in upper, returning the first match;
keep the early None handling and return None if nothing matches. Ensure you
reference _normalize_gpu_family and GPU_HOURLY_COST_USD when making the change.
In `@src/cloudai/report_generator/util.py`:
- Around line 188-207: The function load_system_metadata currently stops if
run_dir/"metadata" exists but has no node-*.toml or the first node file fails
validation; change the logic so when metadata_path (from run_dir) yields no
node_files or when SlurmSystemMetadata.model_validate raises, the function then
tries results_root/"metadata" (fallback_metadata_path) before returning None.
Concretely: when node_files is empty, set metadata_path to
fallback_metadata_path (if it exists) and re-evaluate node_files; and if
model_validate fails on the first node file from run_dir, catch the exception
and attempt to open/validate the first node-*.toml under fallback_metadata_path
(if present) before returning None. Ensure references are to
load_system_metadata, run_dir, results_root, metadata_path,
fallback_metadata_path, node_files, and SlurmSystemMetadata.model_validate.
---
Duplicate comments:
In `@src/cloudai/report_generator/dse_report.py`:
- Around line 147-154: The TOML parse/validation in _step_elapsed_time currently
can raise and abort the report; change it to treat malformed TOML as skipped
data by wrapping the toml.load(...) and SlurmJobMetadata.model_validate(...) in
a try/except that catches parsing/validation errors (e.g., toml.TomlDecodeError,
ValidationError, ValueError, or a broad Exception) and returns None on failure
(optionally logging a debug/warn); apply the same defensive
try/except-and-return-None pattern to the other similar TOML load/validate call
referenced around lines 282-283 so bad artifacts don't crash the DSE report.
- Around line 188-198: The reward chart builder currently picks the max reward
regardless of success; change _build_reward_chart_data to first filter steps by
TrajectoryStep.is_successful and operate on that filtered list (return None if
no successful steps), and make the same change in
DSEReporter.report_best_dse_config (use the same successful-only selection logic
when choosing best_index and best_config_toml) so the HTML chart and emitted
TOML remain aligned; reference the _build_reward_chart_data function and the
DSEReporter.report_best_dse_config method when making the edits.
In `@src/cloudai/reporter.py`:
- Around line 170-179: report_best_dse_config only checks tr.current_iteration
(default) so multi-iteration DSE jobs inspect only the 0 iteration; change the
loop to iterate over all DSE iterations for each test run: for each tr in
self.test_scenario.test_runs where tr.test.is_dse_job, enumerate the available
iteration IDs (e.g. from tr.iterations if present or by listing directories
under self.results_root / tr.name) and for each iteration set tr_root =
self.results_root / tr.name / str(iteration) then check trajectory_file =
tr_root / "trajectory.csv" and emit the best-config output; update references in
report_best_dse_config from tr.current_iteration to the loop variable so configs
are written for every iteration.
- Around line 181-189: The code assumes trajectory_file contains valid rewards
and the best-step dump exists, causing generate() to abort on empty/NaN reward
columns or missing dump; modify the block that reads df, computes best_step, and
opens best_step_details so it first checks that df is non-empty and that
df["reward"] has any finite values (skip and log if not), then compute best_step
safely (avoid idxmax on empty/NaN), and before calling best_step_details.open()
and TestRunDetails.model_validate(toml.load(f)) verify
best_step_details.exists() (log and skip this tr iteration if missing) so the
reporter continues and still writes the HTML; update related logs (e.g., the
logging.info for best_config_path) to reflect skipped iterations.
- Around line 160-161: The Jinja environment is created without autoescaping
(jinja2.Environment(loader=self.templates_dir)), causing raw HTML/TOML to be
injected; update the jinja2.Environment call that creates jinja_env (and/or its
use of jinja2.select_autoescape) to enable autoescaping for .jinja2
templates—either pass autoescape=jinja2.select_autoescape(default=True) or
include "jinja2" in the select_autoescape list so templates loaded via
templates_dir and referenced by template =
jinja_env.get_template("dse-report.jinja2") are autoescaped.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f2487e56-c4d4-4388-bb81-d67812611b63
📒 Files selected for processing (3)
- src/cloudai/report_generator/dse_report.py
- src/cloudai/report_generator/util.py
- src/cloudai/reporter.py
Actionable comments posted: 3
♻️ Duplicate comments (2)
src/cloudai/util/dse-report.jinja2 (1)
435-445: ⚠️ Potential issue | 🟡 Minor
Tab ARIA state not synchronized on click.
The click handler toggles CSS classes but doesn't update `aria-selected` attributes. While the initial HTML correctly sets `aria-selected="true"` on the first tab (line 553), switching tabs via JavaScript leaves screen readers with stale state.
♿ Proposed fix to sync ARIA state
```diff
 buttons.forEach((button) => {
   button.addEventListener("click", function () {
     const target = button.dataset.tabTarget;
-    buttons.forEach((candidate) => candidate.classList.toggle("is-active", candidate === button));
+    buttons.forEach((candidate) => {
+      const isActive = candidate === button;
+      candidate.classList.toggle("is-active", isActive);
+      candidate.setAttribute("aria-selected", isActive ? "true" : "false");
+    });
     panels.forEach((panel) => panel.classList.toggle("is-active", panel.dataset.tabPanel === target));
   });
 });
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cloudai/util/dse-report.jinja2` around lines 435 - 445, The click handler for .js-dse-tab-button currently only toggles CSS classes and leaves ARIA state stale; update the event listener inside the document.querySelectorAll(".js-dse-tabs") loop so that when a button is clicked it also sets aria-selected="true" on the active button and aria-selected="false" on all other buttons (use the buttons NodeList) and likewise update panels' accessibility state (e.g., set aria-hidden appropriately on .js-dse-tab-panel elements based on panel.dataset.tabPanel === target) so screen readers reflect the active tab.
src/cloudai/report_generator/dse_report.py (1)
302-303: ⚠️ Potential issue | 🟠 Major
Missing error handling for TOML parsing in `_build_iteration_summary`.
While `_step_elapsed_time` now has error handling (lines 155-159), the `best_step_dump` loading at lines 302-303 can still raise exceptions from `toml.load()` or `TestRunDetails.model_validate()`, aborting the entire summary build.
🛡️ Proposed fix
```diff
 with best_step_dump.open() as f:
-    test_run_details = TestRunDetails.model_validate(toml.load(f))
+    try:
+        test_run_details = TestRunDetails.model_validate(toml.load(f))
+    except Exception as exc:
+        logging.warning(f"Failed to load test run details for {test_case.name} at {best_step_dump}: {exc}")
+        return None
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cloudai/report_generator/dse_report.py` around lines 302 - 303, The call that loads and validates the best step TOML in _build_iteration_summary can raise exceptions; wrap the block that opens best_step_dump, toml.load(f) and TestRunDetails.model_validate(...) in a try/except, catch broad parsing/validation errors (e.g., Exception), log the error with context (including best_step_dump and the exception) and skip or return a safe default summary for that iteration instead of letting the exception propagate; reference the best_step_dump variable and TestRunDetails.model_validate to find the exact lines to modify.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cloudai/report_generator/dse_report.py`:
- Around line 210-215: The returned "best_index" currently uses best_step.step
(the trajectory step number) which is treated as an array index by the JS chart;
change it to the positional index within the steps list. Replace best_step.step
with the index of best_step in steps (e.g., find the first i where steps[i] is
best_step or where steps[i].step == best_step.step) so "best_index" refers to
the array position used by the chart.
In `@src/cloudai/reporter.py`:
- Around line 184-192: The loop prematurely returns when best_step_details is
missing or TestRunDetails.model_validate fails, which stops processing remaining
test runs; change the early exits to continue so the loop moves on to the next
tr: replace the two return statements in the block that checks
best_step_details.is_file() and the except handling around
TestRunDetails.model_validate(toml.load(f)) with continue, keeping the
logging.warning calls (refer to best_step_details, tr,
TestRunDetails.model_validate and the existing logging.warning lines) so other
DSE test runs are still processed.
In `@src/cloudai/util/dse-report.jinja2`:
- Around line 461-469: The template compares array index (idx) to
chartData.best_index but _build_reward_chart_data in dse_report.py currently
sets best_index to best_step.step (a step number), causing mismatches for
non-contiguous steps; update _build_reward_chart_data so chartData.best_index
holds the array index of the best step (e.g., compute the index where steps[i]
== best_step.step or set best_index to the loop index when you identify the
best) or alternatively include best_step.step as best_step and change the
template comparison to compare step values—modify the function that builds
reward chart data (_build_reward_chart_data / any variables assembling
chartData) so chartData.best_index and the template use the same coordinate
system (index vs step) consistently.
---
Duplicate comments:
In `@src/cloudai/report_generator/dse_report.py`:
- Around line 302-303: The call that loads and validates the best step TOML in
_build_iteration_summary can raise exceptions; wrap the block that opens
best_step_dump, toml.load(f) and TestRunDetails.model_validate(...) in a
try/except, catch broad parsing/validation errors (e.g., Exception), log the
error with context (including best_step_dump and the exception) and skip or
return a safe default summary for that iteration instead of letting the
exception propagate; reference the best_step_dump variable and
TestRunDetails.model_validate to find the exact lines to modify.
In `@src/cloudai/util/dse-report.jinja2`:
- Around line 435-445: The click handler for .js-dse-tab-button currently only
toggles CSS classes and leaves ARIA state stale; update the event listener
inside the document.querySelectorAll(".js-dse-tabs") loop so that when a button
is clicked it also sets aria-selected="true" on the active button and
aria-selected="false" on all other buttons (use the buttons NodeList) and
likewise update panels' accessibility state (e.g., set aria-hidden appropriately
on .js-dse-tab-panel elements based on panel.dataset.tabPanel === target) so
screen readers reflect the active tab.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: e2d81fb0-0478-4397-84c8-41bf7010f53d
📒 Files selected for processing (5)
- src/cloudai/_core/registry.py
- src/cloudai/report_generator/dse_report.py
- src/cloudai/reporter.py
- src/cloudai/util/dse-report.jinja2
- tests/test_reporter.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_reporter.py`:
- Around line 385-388: Remove the dead helper function _create_non_dse_iteration
from tests/test_reporter.py: locate the standalone function definition
_create_non_dse_iteration(case: TestRun, iteration: int, results_root: Path) and
delete it (and any immediately adjacent blank lines) to keep the test
scaffolding minimal. No replacement code is required unless the helper is
referenced elsewhere; if it is, update those call sites instead.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4ca1d988-6f4e-45a0-bf45-175a9d6465bb
📒 Files selected for processing (2)
src/cloudai/util/dse-report.jinja2
tests/test_reporter.py
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cloudai/report_generator/dse_report.py`:
- Around line 210-215: The current best_index calculation ("best_index":
best_step.step - 1) is wrong for 0-indexed arrays; replace it with an index
derived from the steps list itself (e.g., use steps.index(best_step) if
best_step is the same object in the list, or use next(i for i, s in
enumerate(steps) if s.step == best_step.step)) so best_index correctly points to
the array position; update the return dict construction that builds
"labels"/"rewards"/"observations" to use this corrected best_index value.
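The fix can be sketched as a small helper; `steps` and `best_step` mirror the TrajectoryStep objects from dse_report.py, and the namedtuple stand-in below is purely illustrative:

```python
from collections import namedtuple


def best_step_index(steps, best_step):
    """Return the array position of the winning step.

    Deriving the index from the list itself keeps it correct regardless of
    whether the step numbering is 0- or 1-based.
    """
    return next(i for i, s in enumerate(steps) if s.step == best_step.step)


# illustrative stand-in for TrajectoryStep
Step = namedtuple("Step", "step reward")
steps = [Step(1, -10.0), Step(2, -5.0), Step(3, -7.0)]
best = max(steps, key=lambda s: s.reward)
best_step_index(steps, best)  # 1: the array slot of the -5.0 step
```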
- Line 336: Guard against test_run_details.test_definition being None before
calling model_dump: in the assignment that builds best_config_toml (currently
calling toml.dumps(test_run_details.test_definition.model_dump())), check
whether test_run_details.test_definition is not None and only call model_dump()
when present; otherwise set best_config_toml to a safe default (e.g., an empty
string or None). Update the code path that constructs best_config_toml in
dse_report.py to reference test_run_details.test_definition and use the
conditional to avoid AttributeError from model_dump().
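A hedged sketch of the guard, with `dumps` standing in for `toml.dumps` and the attribute names taken from the review comment; `dump_best_config` itself is an illustrative name:

```python
def dump_best_config(test_run_details, dumps):
    """Serialize the winning config, tolerating a missing test_definition."""
    definition = getattr(test_run_details, "test_definition", None)
    if definition is None:
        return ""  # safe default instead of AttributeError on None.model_dump()
    return dumps(definition.model_dump())
```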
In `@tests/test_reporter.py`:
- Around line 519-524: The test and implementation disagree on best_index
because the code in src/cloudai/report_generator/dse_report.py computes
best_index as best_step.step - 1 (off-by-one); update the implementation to set
best_index to best_step.step (no subtraction) when building the
reward_chart_data and also update the test in tests/test_reporter.py to expect
"best_index": 1 for the sample rewards [ -10.0, -5.0, -7.0 ] so the chart
highlights step 1 as the best step.
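The expected value is easy to verify independently of the reporter code; with rewards [-10.0, -5.0, -7.0], the highest reward sits at array index 1:

```python
rewards = [-10.0, -5.0, -7.0]
# index of the maximum reward: -5.0 lives at position 1
best_index = max(range(len(rewards)), key=rewards.__getitem__)
assert best_index == 1  # the step the chart should highlight
```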
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: d8e86ed9-968b-4819-b4b9-607bd03deeaf
📒 Files selected for processing (4)
src/cloudai/report_generator/dse_report.py
src/cloudai/reporter.py
src/cloudai/util/dse-report.jinja2
tests/test_reporter.py
♻️ Duplicate comments (1)
src/cloudai/reporter.py (1)
176-176: 🧹 Nitpick | 🔵 Trivial — Use %-style formatting for consistency.
Line 110 uses `logging.info("Generated scenario report at %s", report_path)` with %-style lazy formatting, but line 176 uses an f-string. For consistency within the same file and for lazy-evaluation benefits, consider using %-style here as well.
♻️ Suggested fix
- logging.info(f"Generated scenario report at {report_path}")
+ logging.info("Generated scenario report at %s", report_path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cloudai/reporter.py` at line 176, replace the f-string logging call with %-style lazy formatting for consistency and performance: locate the logging.info call that prints "Generated scenario report at {report_path}" and change it to logging.info("Generated scenario report at %s", report_path).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/cloudai/reporter.py`:
- Line 176: Replace the f-string logging call so it uses %-style lazy formatting
for consistency and performance; locate the logging.info call that prints
"Generated scenario report at {report_path}" in src/cloudai/reporter.py and
change it to use logging.info("Generated scenario report at %s", report_path)
instead of the f-string.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 23f83393-1e47-481e-85a8-baa1b31e9754
📒 Files selected for processing (3)
README.md
src/cloudai/reporter.py
tests/test_reporter.py
Summary
Make DSE reporting rich
Test Plan
Additional Notes
nixl_bench_ucx_dse_h100-dse-report.html