feature of perf jedi: Add performance summary command with calendar-based weekly change #89

Open
ArthurChenCoding wants to merge 13 commits into redhat-performance:main from ArthurChenCoding:option2-config-alias

@ArthurChenCoding ArthurChenCoding commented Feb 5, 2026

Summary

Implements the performance summary command that allows users to request performance metric summaries via Slack. The command fetches telemetry data from Orion and displays formatted tables with Min/Max/Avg statistics and weekly change percentages.

Key features:

  • Calendar-based weekly change calculation comparing the last N days vs the prior N days
  • Ranking the top 15 most influential metrics
  • Automatic message splitting to handle Slack's ~4000 character limit
  • Direction- and threshold-aware visual hint (emoji) for metrics with outstanding influence
  • Auto-truncation of long metric names for better formatting
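As an illustration of the message-splitting feature, here is a minimal sketch that breaks a long response on line boundaries so each chunk stays under the limit. The function name `split_message` and the exact limit value are assumptions for illustration, not the code in this PR, and the sketch does not re-wrap table formatting around each chunk.

```python
from typing import List

SLACK_LIMIT = 4000  # approximate limit quoted in the description above


def split_message(text: str, limit: int = SLACK_LIMIT) -> List[str]:
    """Split text into chunks of at most `limit` characters, preferring line breaks."""
    chunks: List[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit has to be hard-split.
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        # Start a new chunk when adding this line would exceed the limit.
        if len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

Joining the returned chunks reproduces the original text, so no table rows are lost or reordered.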

Usage:
@bot performance summary [ALL|config1.yaml,config2.yaml,config3.yaml...] [version(s)] [number of days]d [verbose]

All arguments are optional and can be given in any order. The default lookback is 14 days and the default version is 4.19.
Default configs are:
"metal-perfscale-cpt-virt-udn-density.yaml",
"trt-external-payload-cluster-density.yaml",
"trt-external-payload-node-density.yaml",
"trt-external-payload-node-density-cni.yaml",
"trt-external-payload-crd-scale.yaml",
"small-scale-udn-l3.yaml",
"med-scale-udn-l3.yaml"

An "ALL" option is also available; it covers the 41 predefined config files in https://github.com/cloud-bulldozer/orion/tree/main/examples.
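For readers who want to see how order-independent arguments might be handled, here is a rough sketch that classifies each token by its shape. The function `parse_summary_args` and the returned dictionary keys are hypothetical and are not taken from this PR; only the defaults (14 days, 4.19) come from the description.

```python
import re
from typing import Dict, List

DEFAULT_VERSION = "4.19"      # default stated in the description
DEFAULT_LOOKBACK_DAYS = 14    # default stated in the description


def parse_summary_args(text: str) -> Dict[str, object]:
    """Classify whitespace-separated tokens by shape, so order does not matter."""
    configs: List[str] = []
    versions: List[str] = []
    days = DEFAULT_LOOKBACK_DAYS
    verbose = False
    use_all = False
    for token in text.split():
        if token.lower() == "verbose":
            verbose = True
        elif token.upper() == "ALL":
            use_all = True
        elif re.fullmatch(r"\d+d", token):        # e.g. "30d"
            days = int(token[:-1])
        elif re.fullmatch(r"\d+\.\d+", token):    # e.g. "4.19"
            versions.append(token)
        elif ".yaml" in token:                    # comma-separated config list
            configs.extend(c for c in token.split(",") if c)
        # Unrecognized tokens are ignored in this sketch.
    return {
        "configs": configs,
        "versions": versions or [DEFAULT_VERSION],
        "days": days,
        "verbose": verbose,
        "all": use_all,
    }
```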

See more in the demo section.

Changes

  • bugzooka/analysis/perf_summary_analyzer.py: New module for fetching and formatting performance data
  • bugzooka/integrations/slack_socket_listener.py: Handle performance summary command and multi-message responses

Test plan

  • Trigger @BugZooka performance summary in Slack
  • Verify table formatting renders correctly
  • Test with specific config files, multiple versions, with and without ALL and verbose option
  • Test default behavior (nothing specified)
  • Verify change shows percentage when sufficient data exists
  • Verify the metrics are correctly ranked by influence
  • Test super long Slack message (test ability to split message to keep formatting)
  • Test the emoji (🆘🟢) is correctly presented based on the direction of the metric and threshold
  • Test that socket mode is required and polling mode is deprecated
  • Test that non-verbose mode correctly skips metrics with n/a data
  • Test the length limitation of a single Slack message and investigate its influence on formatting
  • Test super long query (365 days)
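The emoji check in the test plan could be read as the rule sketched below. The semantics are assumptions made for illustration: 🆘 marks a change beyond the threshold in the bad direction, 🟢 marks one in the good direction, and no hint is shown otherwise. The function name and parameters are hypothetical.

```python
from typing import Optional


def change_hint(pct_change: Optional[float], lower_is_better: bool, threshold_pct: float) -> str:
    """Return an emoji hint for a percentage change, or "" when no hint applies."""
    # No data, or a change smaller than the threshold: no hint.
    if pct_change is None or abs(pct_change) < threshold_pct:
        return ""
    # For lower-is-better metrics (e.g. latency) an increase is a regression;
    # for higher-is-better metrics (e.g. throughput) a decrease is.
    regressed = pct_change > 0 if lower_is_better else pct_change < 0
    return "🆘" if regressed else "🟢"
```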

Known limitations

  • Weekly change shows "n/a" if either of the two compared periods has no data
  • Sparse telemetry data (e.g., OKD) may show "n/a" more frequently
  • Future: Consider timestamp-based queries anchored to last available data point
  • Configs with a large number of metrics might break formatting
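To make the "n/a" limitation concrete, here is a minimal sketch of a calendar-based change calculation. It assumes the change compares the average of the last N days against the average of the prior N days; the function name, the sample format, and the use of averages are assumptions, not the implementation in this PR.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple


def period_change_pct(
    samples: Sequence[Tuple[datetime, float]], now: datetime, days: int = 7
) -> Optional[float]:
    """Percentage change of the last `days` days versus the prior `days` days."""
    recent_start = now - timedelta(days=days)
    prior_start = now - timedelta(days=2 * days)
    recent = [v for ts, v in samples if recent_start <= ts <= now]
    prior = [v for ts, v in samples if prior_start <= ts < recent_start]
    # Either window being empty means there is nothing to compare: report n/a.
    if not recent or not prior:
        return None
    prior_avg = sum(prior) / len(prior)
    if prior_avg == 0:
        return None  # avoid division by zero
    recent_avg = sum(recent) / len(recent)
    return (recent_avg - prior_avg) / abs(prior_avg) * 100.0
```

Because the windows are anchored to the current date rather than to the last available data point, sparse telemetry can leave one window empty, which is what the timestamp-based idea above would address.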

Jira ticket

https://issues.redhat.com/browse/PERFSCALE-4422

Demo

  • greeting message
image
  • single config query
image
  • short lookback and multiple versions
image
  • multiple configs
image
  • multiple configs with verbose option
    @PerfScale Padawan performance summary trt-external-payload-node-density-cni.yaml,trt-external-payload-node-density.yaml 30d verbose
image

logger = logging.getLogger(__name__)

# Default control plane configs used when user doesn't specify a config
_DEFAULT_CONTROL_PLANE_CONFIGS = [
Collaborator

These are not the real control plane configs. They are the ones that we often prefer looking at.

Author

addressed

}


def _calculate_period_change(
Collaborator

Did you mean percentage change here?

Author

yes, let me rename it

        return text
    if len(text) <= max_len:
        return text
    if max_len <= 3:
Collaborator

What do you mean by this case here?

Author

To avoid Slack display issues when a message exceeds the max width, metric names longer than max_len are truncated and "..." is appended to indicate the truncation.

Adding "..." would be redundant when a column is required to be 3 characters wide or less, so in that case I decided to just return the truncated name without "...".
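The behavior described in this reply can be sketched as follows. The function name `truncate_name` is hypothetical and the sketch assumes `max_len` is non-negative; the two early checks mirror the lines shown in the diff context above.

```python
def truncate_name(text: str, max_len: int) -> str:
    """Shorten text to at most max_len characters, marking truncation with "..."."""
    if len(text) <= max_len:
        return text
    if max_len <= 3:
        # No room for an ellipsis: return the plain truncated name.
        return text[:max_len]
    # Reserve three characters for the ellipsis.
    return text[: max_len - 3] + "..."
```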

"get_orion_metrics_with_meta unavailable, falling back to get_orion_metrics: %s",
e,
)
result = await _call_mcp_tool("get_orion_metrics", {"config_name": config})
Collaborator

What OCP version is going to be used for this call? Looking at the source code, it looks like it's going to use 4.20 as a hard-coded value: https://github.com/jtaleric/orion-mcp/blob/main/utils/utils.py#L266

Author

@ArthurChenCoding ArthurChenCoding Feb 16, 2026

Okay, in the latest commit I extended the MCP tool get_orion_metrics to accept a version parameter and pass it from get_metrics when falling back.
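A rough sketch of the fallback with the version passed through is shown below. The `_call_mcp_tool` body here is a hypothetical stub standing in for the real MCP client, and the `"version"` argument key is an assumption; only the tool names and `config_name` appear in the diff context above.

```python
import asyncio
from typing import Any, Dict


async def _call_mcp_tool(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stub: pretends the newer tool is missing on the server."""
    if name == "get_orion_metrics_with_meta":
        raise RuntimeError("tool not available")
    return {"tool": name, "args": args}


async def get_metrics(config: str, version: str) -> Dict[str, Any]:
    # The "version" key is an assumed parameter name for illustration.
    args = {"config_name": config, "version": version}
    try:
        return await _call_mcp_tool("get_orion_metrics_with_meta", args)
    except Exception:
        # Fall back to the older tool, still passing the requested version
        # so the server does not silently use its hard-coded default.
        return await _call_mcp_tool("get_orion_metrics", args)
```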


try:
result = await _call_mcp_tool(
"get_orion_performance_data",
Collaborator

Do we even need this tool when we have a fallback available? The same applies in the other places.

Author

Good question. I have described the rationale for adding the new tools in my Orion-MCP PR: cloud-bulldozer/orion-mcp#2

@vishnuchalla
Collaborator

@avasilevskii as per your question, I think it's fine to make direct MCP tool calls and let them skip the agentic workflow if it is not required. Honestly, I am fine with either. Except for the pr_analyze feature, the other commands are making direct calls. As long as the output is deterministic and makes sense for the user request, I am good. I don't want to enforce anything in that area IMHO.

@mohit-sheth
Collaborator

needs a readme section

- Introduced a new `PerformanceData` dataclass to encapsulate performance metrics.
- Refactored `_format_metrics_table` and `_format_config_table` functions for improved clarity and functionality.
- Updated `get_performance_data` to return `PerformanceData` instances instead of dictionaries.
- Enhanced table formatting to conditionally include configuration details based on parameters.
- Added a new section detailing the usage of the Performance Summary feature in BugZooka.
- Included examples and notes on configuration and behavior.
- Updated mandatory and optional fields to include `JEDI_BOT_SLACK_USER_ID` and performance summary settings.
README.md Outdated
**Notes:**
- If no config is provided, defaults to a curated control-plane config list.
- `ALL` uses all available Orion configs (fallback list is used if MCP is unavailable).
- Multiple versions can be provided (e.g. `4.19 4.20`).
Collaborator

What do you mean by this? Can you provide an example for performance summary with this scenario?

Author

ok will add to the usage example
