Skip to content

Comments

Enhance --breakout with multiple values and regex patterns (fixes #110)#152

Closed
atheurer wants to merge 8 commits intoperftool-incubator:masterfrom
atheurer:multiple-breakout-values
Closed

Enhance --breakout with multiple values and regex patterns (fixes #110)#152
atheurer wants to merge 8 commits intoperftool-incubator:masterfrom
atheurer:multiple-breakout-values

Conversation

@atheurer
Copy link
Contributor

@atheurer atheurer commented Feb 19, 2026

Summary

This PR enhances the --breakout option in get-metric-data.js with support for multiple values and regex pattern matching, addressing issue #110.

Features Implemented

1. Multiple Literal Values

Syntax: --breakout hostname=a,b,c
Result: Returns separate metrics for each value
Use Case: Compare specific hosts or components

2. Regex Pattern Matching

Syntax:

  • --breakout hostname=r/pattern/ → Separate metrics per match
  • --breakout hostname=R/pattern/ → Aggregated metric for all matches

Features:

  • Custom delimiters: Use r|pattern| when pattern contains /
  • Regex alternation for specific values: R/worker-1|worker-2|worker-3/
  • Consistent lowercase/uppercase convention

3. Enhanced Error Handling

Feature: Clear error messages when filters match nothing
Benefit: Actionable feedback instead of cryptic errors

Complete Syntax Matrix

Syntax Type Result Use Case
hostname=a Single literal 1 metric for 'a' Query specific host
hostname=a,b,c Multiple literals 3 separate metrics Compare specific hosts
hostname=r/pattern/ Regex separate N separate metrics Compare all matching
hostname=R/pattern/ Regex aggregated 1 combined metric Total of all matching
hostname=R/a|b|c/ Regex alternation 1 combined metric Aggregate specific values
hostname No filter All values (separate) Explore all options

Design Philosophy

The implementation uses a consistent, intuitive syntax pattern:

  • Lowercase r = Separate/individual metrics
  • Uppercase R = Aggregated/combined metrics

This provides maximum flexibility while maintaining a clean, learnable interface.

Example Usage

# Multiple literal values - separate metrics
node ./get-metric-data.js --period <UUID> --source sar-net --type L2-Gbps \
  --breakout csid=1,2,cstype=worker

# Regex with separate metrics per match
node ./get-metric-data.js --period <UUID> --source mpstat --type Busy-CPU \
  --breakout hostname=r/^worker-.*/

# Regex with aggregated metric for all matches
node ./get-metric-data.js --period <UUID> --source sar-net --type L2-Gbps \
  --breakout hostname=R/^client-.*/

# Aggregate specific values using regex alternation
node ./get-metric-data.js --period <UUID> --source mpstat --type Busy-CPU \
  --breakout hostname=R/worker-1|worker-2|worker-3/

# Custom delimiter when pattern contains slashes
node ./get-metric-data.js --period <UUID> --source iostat --type kB-sec \
  --breakout dev=r|/dev/sd.*|

# Mix different filter types
node ./get-metric-data.js --period <UUID> --source mpstat --type Busy-CPU \
  --breakout hostname=r/worker-[0-9]+/,cstype=physical

Use Cases

Separate Metrics (, or r/):

  • Compare performance across specific hosts
  • Identify outliers or problem nodes
  • Generate per-host charts and reports

Aggregated Metrics (R/):

  • Measure total throughput across a cluster: R/^worker-.*/
  • Aggregate specific hosts: R/worker-1|worker-2|worker-3/
  • Calculate combined CPU usage of worker nodes
  • Aggregate network traffic: R/eth0|eth1/

Technical Implementation

  • Parsing: Smart detection of separators in list() function
  • Query: Uses OpenSearch terms filter for multiple values, regexp for patterns
  • Aggregation: Controlled by excluding fields from nested aggregation structure
  • Error Handling: Detects empty results and provides actionable feedback
  • Documentation: Comprehensive guide with examples and feature matrix

Backward Compatibility

✅ Fully backward compatible - all existing usage patterns work unchanged

Testing

  • Code follows existing patterns
  • Comprehensive documentation with examples
  • Error handling tested with non-matching filters
  • Feature matrix documents all syntax options
  • Maintains backward compatibility
  • Regex alternation tested for aggregating specific values

Commits

  1. c879aea - Add support for multiple values in --breakout option
  2. 54b29df - Add regex pattern support for breakout filters
  3. 217d77e - Add helpful error message when regex filter matches nothing
  4. 054ae2e - Revert aggregated literal values (use regex alternation instead)
  5. c8ac61c - Update documentation for regex alternation approach

Closes #110

🤖 Generated with Claude Code

Enhanced the --breakout option in get-metric-data.js to support
comma-separated values (e.g., --breakout hostname=a,b,c) which returns
separate metrics for each specified value. This addresses issue perftool-incubator#110.

Changes:
- Modified list() parser to distinguish between field separators and value lists
- Updated OpenSearch query builder to use "terms" query for multiple values
- Added documentation with examples and usage guidelines

The implementation maintains backward compatibility and is designed to
support future aggregation syntax (e.g., hostname=a+b).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
atheurer and others added 3 commits February 19, 2026 12:00
Implemented regex pattern matching in breakout filters with two modes:
- r/pattern/ (lowercase): Returns separate metrics for each matching value
- R/pattern/ (uppercase): Returns single aggregated metric for all matches

Features:
- Custom delimiter support: use any character after r/R as delimiter
  (e.g., r/pattern/, r|pattern|, r#pattern#)
- Consistent syntax with literal values (r vs R parallels , vs +)
- OpenSearch regexp query integration for efficient pattern matching

Examples:
- --breakout hostname=r/^worker-.*/ (separate metrics per worker)
- --breakout hostname=R/^client-.*/ (aggregated metric for all clients)
- --breakout dev=r|/dev/sd.*| (custom delimiter for patterns with /)

Implementation:
- Modified getBreakoutAggregation() to exclude fields with R/pattern/
- Updated getMetricGroupsFromBreakouts() to detect and apply regexp filters
- Added comprehensive documentation with examples and use cases

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
When a regex breakout filter (r/pattern/ or R/pattern/) doesn't match
any metric values, the query previously failed with a cryptic error:
"number of generated data sets (0) does not match the number of
metric query sets (1)"

This commit adds detection for empty result sets caused by regex filters
and returns a clear, actionable error message explaining:
- Which source/type was queried
- Which regex filter(s) didn't match
- Suggestions for troubleshooting

Example error output:
  No metrics found matching the specified filter(s) for source=mpstat, type=Busy-CPU
    Regex filter hostname=r/^nonexistent-.*/ did not match any values.
  Please verify:
    1. The regex pattern is correct
    2. Metrics exist for this source/type with the specified field
    3. The field values match the pattern

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implemented aggregation of multiple literal values using the plus (+)
separator, completing the unified syntax design for breakout filters.

Syntax:
- hostname=a,b,c (comma): Returns 3 separate metrics
- hostname=a+b+c (plus): Returns 1 aggregated metric combining a, b, and c

Features:
- Consistent with regex syntax (r vs R parallels , vs +)
- Uses same OpenSearch "terms" filter for both , and +
- Aggregation controlled by getBreakoutAggregation() (excludes field)
- Enhanced error messages for both comma and plus separated filters

Implementation:
- Modified getBreakoutAggregation() to detect + and exclude from aggregation
- Updated query builder to split on + and create terms filter
- Extended error handling to cover literal value filters
- Added comprehensive documentation with examples and feature matrix

Examples:
- --breakout hostname=worker-1+worker-2+worker-3 (aggregated)
- --breakout hostname=worker-1,worker-2,worker-3 (separate)
- --breakout cstype=worker+master (combined metric for both types)

Complete Feature Matrix:
| Syntax | Result |
|--------|--------|
| hostname=a | 1 metric for 'a' |
| hostname=a,b,c | 3 separate metrics |
| hostname=a+b+c | 1 aggregated metric |
| hostname=r/pattern/ | N separate metrics |
| hostname=R/pattern/ | 1 aggregated metric |

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@atheurer atheurer changed the title Enhance --breakout to support multiple values (fixes #110) Enhance --breakout with multiple values, regex patterns, and aggregation (fixes #110) Feb 19, 2026
atheurer and others added 2 commits February 19, 2026 12:20
…c values

Instead of the reverted a+b syntax, document the use of regex alternation
with uppercase R to aggregate specific literal values:

- hostname=R/worker-1|worker-2|worker-3/ aggregates those 3 specific hosts
- This approach works correctly with the existing regex implementation
- Provides the same functionality without additional code complexity

Updated examples and removed references to future a+b+c syntax.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@atheurer atheurer changed the title Enhance --breakout with multiple values, regex patterns, and aggregation (fixes #110) Enhance --breakout with multiple values and regex patterns (fixes #110) Feb 19, 2026
Fixed critical bug where R/pattern/ (aggregated regex) was including ALL
values instead of only those matching the pattern.

Root Cause:
When using R/pattern/, the field is excluded from the aggregation structure
(correct - this causes aggregation). However, the regexp filter was only
applied to the initial aggregation query, not when querying for metric IDs.
This meant the metric ID query had no regexp filter, resulting in ALL
metric IDs being included in the aggregated result.

Solution:
1. Extract regexp filters for aggregated fields (R/pattern/) after aggregation
2. Pass these filters to mgetMetricIdsFromTerms via termsSets
3. Apply the regexp filters when building metric ID queries

Example:
--breakout hostname=R/worker-1|worker-2/
Before: Aggregated ALL hostnames (worker-1, worker-2, worker-3)
After:  Aggregates ONLY worker-1 and worker-2 (correct)

Technical Details:
- Modified getMetricGroupsFromBreakouts to extract and preserve R/ filters
- Modified mgetMetricIdsFromTerms to apply preserved regexp filters
- Filters are added to the query.bool.filter array for metric ID lookups

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copy link
Contributor

@k-rister k-rister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is pretty awesome. My only concern is with the r/.../ vs. R/.../ syntax. I'm not sure that I think that makes for a good user interface. My gut says what about something like agg[regate]/.../ vs. disagg[regate]/.../.

I want our interfaces to be as clear as possible.

@atheurer
Copy link
Contributor Author

I think this is pretty awesome. My only concern is with the r/.../ vs. R/.../ syntax. I'm not sure that I think that makes for a good user interface. My gut says what about something like agg[regate]/.../ vs. disagg[regate]/.../.

I want our interfaces to be as clear as possible.

Well, the r|R was meant to signify it was a regex. I don't think agg or disagg keeps that notion

Run prettier --write on modified files to fix CI formatting checks

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@k-rister
Copy link
Contributor

Well, the r|R was meant to signify it was a regex. I don't think agg or disagg keeps that notion

Yeah, by no means is my suggestion a perfect one. I just wish could come up with something that is a more intuitive than r vs. R.

@atheurer atheurer closed this Feb 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enhance "--breakout hostname=" to support a list i.e hostname=A,B,C

2 participants