Enhance --breakout with multiple values and regex patterns (fixes #110) #152
atheurer wants to merge 8 commits into perftool-incubator:master from
Enhanced the --breakout option in get-metric-data.js to support comma-separated values (e.g., --breakout hostname=a,b,c) which returns separate metrics for each specified value. This addresses issue perftool-incubator#110.

Changes:
- Modified list() parser to distinguish between field separators and value lists
- Updated OpenSearch query builder to use "terms" query for multiple values
- Added documentation with examples and usage guidelines

The implementation maintains backward compatibility and is designed to support future aggregation syntax (e.g., hostname=a+b).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
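A minimal sketch of how a comma-separated value list could become an OpenSearch filter clause. The helper name `buildValueFilter` and the document path `metric_desc.names.<field>` are assumptions for illustration, not code from get-metric-data.js. One value becomes a `term` filter; several values become a single `terms` filter.

```javascript
// Hypothetical helper: turn "a,b,c" into an OpenSearch filter clause.
function buildValueFilter(field, valueList) {
  const values = valueList
    .split(",")
    .map((v) => v.trim())
    .filter((v) => v.length > 0);
  if (values.length === 0) {
    throw new Error(`No values given for breakout field "${field}"`);
  }
  const path = `metric_desc.names.${field}`; // assumed index layout
  if (values.length === 1) {
    // One value: exact-match term filter.
    return { term: { [path]: values[0] } };
  }
  // Several values: one terms filter matches any of them. Splitting the
  // result into one metric per value is left to the breakout aggregation.
  return { terms: { [path]: values } };
}

console.log(JSON.stringify(buildValueFilter("hostname", "a,b,c")));
```

The filter only restricts which documents are considered; whether the matches come back as separate metrics depends on the aggregation structure. A later commit in this PR uses the same `terms` filter for both the separate and the aggregated cases.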
Implemented regex pattern matching in breakout filters with two modes:
- r/pattern/ (lowercase): Returns separate metrics for each matching value
- R/pattern/ (uppercase): Returns single aggregated metric for all matches

Features:
- Custom delimiter support: use any character after r/R as delimiter (e.g., r/pattern/, r|pattern|, r#pattern#)
- Consistent syntax with literal values (r vs R parallels , vs +)
- OpenSearch regexp query integration for efficient pattern matching

Examples:
- --breakout hostname=r/^worker-.*/ (separate metrics per worker)
- --breakout hostname=R/^client-.*/ (aggregated metric for all clients)
- --breakout dev=r|/dev/sd.*| (custom delimiter for patterns with /)

Implementation:
- Modified getBreakoutAggregation() to exclude fields with R/pattern/
- Updated getMetricGroupsFromBreakouts() to detect and apply regexp filters
- Added comprehensive documentation with examples and use cases

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
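The delimiter rule described in this commit can be sketched as a small parser. This is a hypothetical illustration: `parseRegexValue`, `buildRegexFilter`, and the `metric_desc.names.<field>` path are invented here. The sketch treats a value as a regex when it starts with r or R and ends with the character immediately after that letter.

```javascript
// Hypothetical parser for values written as r<d>pattern<d> or R<d>pattern<d>,
// where <d> is whatever character follows the leading r/R ("/", "|", "#", ...).
function parseRegexValue(value) {
  // Need at least: r/R, opening delimiter, one pattern char, closing delimiter.
  if (value.length < 4 || (value[0] !== "r" && value[0] !== "R")) {
    return null; // not a regex value; caller treats it as literal(s)
  }
  const delim = value[1];
  if (value[value.length - 1] !== delim) {
    return null;
  }
  return {
    pattern: value.slice(2, -1),
    // lowercase r: one metric per matching value; uppercase R: one combined metric
    aggregate: value[0] === "R",
  };
}

// OpenSearch regexp filter clause; the field path is an assumed layout.
function buildRegexFilter(field, parsed) {
  return { regexp: { [`metric_desc.names.${field}`]: parsed.pattern } };
}

const parsed = parseRegexValue("r|/dev/sd.*|");
console.log(parsed, JSON.stringify(buildRegexFilter("dev", parsed)));
```

Because any character may serve as the delimiter, a literal value that starts with r and ends with its own second character (for example `r-abc-`) would also be read as a regex under this rule.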
When a regex breakout filter (r/pattern/ or R/pattern/) doesn't match
any metric values, the query previously failed with a cryptic error:
"number of generated data sets (0) does not match the number of
metric query sets (1)"
This commit adds detection for empty result sets caused by regex filters
and returns a clear, actionable error message explaining:
- Which source/type was queried
- Which regex filter(s) didn't match
- Suggestions for troubleshooting
Example error output:
No metrics found matching the specified filter(s) for source=mpstat, type=Busy-CPU
Regex filter hostname=r/^nonexistent-.*/ did not match any values.
Please verify:
1. The regex pattern is correct
2. Metrics exist for this source/type with the specified field
3. The field values match the pattern
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
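The check described in this commit can be sketched as a guard that runs before the generic count comparison. `formatNoMatchError` and `checkDataSets` are hypothetical names. The message text reproduces the example output above, and the fallback error reproduces the previous message.

```javascript
// Hypothetical: build the message shown in the example output.
function formatNoMatchError(source, type, regexFilters) {
  const lines = [
    `No metrics found matching the specified filter(s) for source=${source}, type=${type}`,
  ];
  for (const filter of regexFilters) {
    lines.push(`Regex filter ${filter} did not match any values.`);
  }
  lines.push(
    "Please verify:",
    "1. The regex pattern is correct",
    "2. Metrics exist for this source/type with the specified field",
    "3. The field values match the pattern"
  );
  return lines.join("\n");
}

// Hypothetical guard: report an empty result caused by regex filters before
// falling through to the generic (and less helpful) count mismatch error.
function checkDataSets(dataSets, querySets, source, type, regexFilters) {
  if (dataSets.length === 0 && regexFilters.length > 0) {
    throw new Error(formatNoMatchError(source, type, regexFilters));
  }
  if (dataSets.length !== querySets.length) {
    throw new Error(
      `number of generated data sets (${dataSets.length}) does not match ` +
        `the number of metric query sets (${querySets.length})`
    );
  }
}

try {
  checkDataSets([], [{}], "mpstat", "Busy-CPU", ["hostname=r/^nonexistent-.*/"]);
} catch (e) {
  console.log(e.message);
}
```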
Implemented aggregation of multiple literal values using the plus (+) separator, completing the unified syntax design for breakout filters.

Syntax:
- hostname=a,b,c (comma): Returns 3 separate metrics
- hostname=a+b+c (plus): Returns 1 aggregated metric combining a, b, and c

Features:
- Consistent with regex syntax (r vs R parallels , vs +)
- Uses same OpenSearch "terms" filter for both , and +
- Aggregation controlled by getBreakoutAggregation() (excludes field)
- Enhanced error messages for both comma and plus separated filters

Implementation:
- Modified getBreakoutAggregation() to detect + and exclude from aggregation
- Updated query builder to split on + and create terms filter
- Extended error handling to cover literal value filters
- Added comprehensive documentation with examples and feature matrix

Examples:
- --breakout hostname=worker-1+worker-2+worker-3 (aggregated)
- --breakout hostname=worker-1,worker-2,worker-3 (separate)
- --breakout cstype=worker+master (combined metric for both types)

Complete Feature Matrix:

| Syntax | Result |
|--------|--------|
| hostname=a | 1 metric for 'a' |
| hostname=a,b,c | 3 separate metrics |
| hostname=a+b+c | 1 aggregated metric |
| hostname=r/pattern/ | N separate metrics |
| hostname=R/pattern/ | 1 aggregated metric |

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
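Although this commit is reverted, its mechanism (keep a field in the aggregation to split, drop it to combine) is the same one R/pattern/ relies on. The following is a hypothetical sketch: `parseLiteralValues`, `buildBreakoutAggregation`, the `metric_desc.names.<field>` path, and the bucket `size` are assumptions, not the PR's code.

```javascript
// Hypothetical sketch of the "+" syntax. "," keeps values separate and "+"
// combines them; both yield the same value list for the terms filter, and
// only the aggregate flag differs.
function parseLiteralValues(field, valueList) {
  if (valueList.includes(",") && valueList.includes("+")) {
    throw new Error(`mixing "," and "+" in ${field}=${valueList} is not handled here`);
  }
  const aggregate = valueList.includes("+");
  const values = valueList.split(aggregate ? "+" : ",").filter((v) => v.length > 0);
  return { field, values, aggregate };
}

// Nest one terms aggregation per non-aggregated field (outermost first).
// Aggregated fields are left out, so their values are not split into
// separate buckets. Path layout and size are assumptions.
function buildBreakoutAggregation(specs, size = 1000) {
  const splitFields = specs.filter((s) => !s.aggregate).map((s) => s.field);
  let aggs = null;
  for (const field of [...splitFields].reverse()) {
    const node = { terms: { field: `metric_desc.names.${field}`, size } };
    if (aggs !== null) node.aggs = aggs;
    aggs = { [field]: node };
  }
  return aggs; // null when every field is aggregated
}

const specs = [
  parseLiteralValues("hostname", "worker-1+worker-2"),
  { field: "cstype", aggregate: false },
];
console.log(JSON.stringify(buildBreakoutAggregation(specs)));
```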
This reverts commit 297baa5.
…c values

Instead of the reverted a+b syntax, document the use of regex alternation with uppercase R to aggregate specific literal values:
- hostname=R/worker-1|worker-2|worker-3/ aggregates those 3 specific hosts
- This approach works correctly with the existing regex implementation
- Provides the same functionality without additional code complexity

Updated examples and removed references to future a+b+c syntax.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
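With alternation replacing a+b+c, literal values that contain regex metacharacters (such as the dots in a fully qualified hostname) should be escaped. The following hypothetical helper, `toAggregatedRegexValue`, builds an R/.../ value from a list of literals. It assumes Lucene regexp syntax, which OpenSearch's regexp query uses, and escapes both the always-reserved characters and the optional-operator characters.

```javascript
// Characters with special meaning in Lucene/OpenSearch regexp syntax,
// including the optional-operator characters # @ & < > ~.
const RESERVED = /[.?+*|{}[\]()"\\#@&<>~]/g;

// Hypothetical helper: ["a", "b"] -> "R/a|b/" with each literal escaped.
function toAggregatedRegexValue(values, delim = "/") {
  const alternation = values.map((v) => v.replace(RESERVED, "\\$&")).join("|");
  // Reject any occurrence of the delimiter (escaped or not) to keep the
  // sketch simple; pick another delimiter, as with r|pattern|.
  if (alternation.includes(delim)) {
    throw new Error(`delimiter "${delim}" appears in the pattern; choose another`);
  }
  return `R${delim}${alternation}${delim}`;
}

console.log(toAggregatedRegexValue(["worker-1", "worker-2", "worker-3"]));
console.log(toAggregatedRegexValue(["/dev/sda", "/dev/sdb"], "#"));
```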
Fixed critical bug where R/pattern/ (aggregated regex) was including ALL values instead of only those matching the pattern.

Root Cause:
When using R/pattern/, the field is excluded from the aggregation structure (correct - this causes aggregation). However, the regexp filter was only applied to the initial aggregation query, not when querying for metric IDs. This meant the metric ID query had no regexp filter, resulting in ALL metric IDs being included in the aggregated result.

Solution:
1. Extract regexp filters for aggregated fields (R/pattern/) after aggregation
2. Pass these filters to mgetMetricIdsFromTerms via termsSets
3. Apply the regexp filters when building metric ID queries

Example: --breakout hostname=R/worker-1|worker-2/
Before: Aggregated ALL hostnames (worker-1, worker-2, worker-3)
After: Aggregates ONLY worker-1 and worker-2 (correct)

Technical Details:
- Modified getMetricGroupsFromBreakouts to extract and preserve R/ filters
- Modified mgetMetricIdsFromTerms to apply preserved regexp filters
- Filters are added to the query.bool.filter array for metric ID lookups

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
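The three-step fix can be illustrated by building the metric-ID query for one aggregation bucket. This is a hypothetical sketch: `extractAggregatedRegexFilters`, `buildMetricIdQuery`, the breakout object shape, and the `metric_desc.names.<field>` path are assumptions, not the signatures of getMetricGroupsFromBreakouts or mgetMetricIdsFromTerms.

```javascript
// Assumed document path layout.
const fieldPath = (field) => `metric_desc.names.${field}`;

// Step 1: keep a regexp filter for every R/pattern/ breakout. These fields are
// excluded from the aggregation, so they never appear in a bucket's terms.
function extractAggregatedRegexFilters(breakouts) {
  return breakouts
    .filter((b) => b.regex && b.regex.aggregate)
    .map((b) => ({ regexp: { [fieldPath(b.field)]: b.regex.pattern } }));
}

// Steps 2-3: build the metric-ID query for one bucket. Without the preserved
// regexp filters, an R/ field would be unconstrained here and every one of its
// values would be folded into the aggregated metric (the bug being fixed).
function buildMetricIdQuery(baseFilters, bucketTerms, aggregatedRegexFilters) {
  const filter = [...baseFilters];
  for (const [field, value] of Object.entries(bucketTerms)) {
    filter.push({ term: { [fieldPath(field)]: value } });
  }
  filter.push(...aggregatedRegexFilters);
  return { query: { bool: { filter } } };
}

const breakouts = [
  { field: "hostname", regex: { pattern: "worker-1|worker-2", aggregate: true } },
  { field: "cstype", regex: null },
];
const regexFilters = extractAggregatedRegexFilters(breakouts);
console.log(JSON.stringify(buildMetricIdQuery([], { cstype: "worker" }, regexFilters)));
```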
k-rister left a comment:
I think this is pretty awesome. My only concern is with the r/.../ vs. R/.../ syntax. I'm not sure that makes for a good user interface. My gut says: what about something like agg[regate]/.../ vs. disagg[regate]/.../?
I want our interfaces to be as clear as possible.
Well, the r|R was meant to signify it was a regex. I don't think agg or disagg keeps that notion |
Yeah, by no means is my suggestion a perfect one. I just wish I could come up with something more intuitive than |
Summary
This PR enhances the --breakout option in get-metric-data.js with support for multiple values and regex pattern matching, addressing issue #110.

Features Implemented
1. Multiple Literal Values
Syntax: --breakout hostname=a,b,c
Result: Returns separate metrics for each value
Use Case: Compare specific hosts or components
2. Regex Pattern Matching
Syntax:
- --breakout hostname=r/pattern/ → Separate metrics per match
- --breakout hostname=R/pattern/ → Aggregated metric for all matches

Features:
- Custom delimiters: use r|pattern| when the pattern contains /
- Regex alternation to aggregate specific values: R/worker-1|worker-2|worker-3/

3. Enhanced Error Handling
Feature: Clear error messages when filters match nothing
Benefit: Actionable feedback instead of cryptic errors
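Complete Syntax Matrix

| Syntax | Result |
|--------|--------|
| hostname=a | 1 metric for 'a' |
| hostname=a,b,c | 3 separate metrics |
| hostname=r/pattern/ | N separate metrics (one per matching value) |
| hostname=R/pattern/ | 1 aggregated metric for all matches |
| hostname=R/a\|b\|c/ | 1 aggregated metric for a, b, and c |
| hostname | Separate metric for every hostname value (existing behavior) |

Design Philosophy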
The implementation uses a consistent, intuitive syntax pattern:
- r = Separate/individual metrics
- R = Aggregated/combined metrics

This provides maximum flexibility while maintaining a clean, learnable interface.
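The syntax matrix and the r/R rule can be summarized as a single dispatch over one --breakout entry. This is a hypothetical sketch: `classifyBreakout` and its `mode` names are invented here. It assumes that a bare field name keeps the existing one-metric-per-value breakout, and that a value counts as a regex only when it starts with r or R and ends with the character following that letter.

```javascript
// Hypothetical dispatcher mapping one --breakout entry to a result mode.
function classifyBreakout(entry) {
  const eq = entry.indexOf("=");
  if (eq === -1) {
    // Bare field name: assumed to keep the existing one-metric-per-value breakout.
    return { field: entry, mode: "split-all" };
  }
  const field = entry.slice(0, eq);
  const value = entry.slice(eq + 1);
  const isRegex =
    value.length >= 4 &&
    (value[0] === "r" || value[0] === "R") &&
    value[value.length - 1] === value[1];
  if (isRegex) {
    return {
      field,
      mode: value[0] === "R" ? "regex-aggregate" : "regex-split",
      pattern: value.slice(2, -1),
    };
  }
  // Literal value(s): a comma list yields one metric per listed value.
  const values = value.split(",");
  return { field, mode: values.length > 1 ? "list-split" : "single", values };
}

for (const e of ["hostname", "hostname=a", "hostname=a,b,c", "hostname=r/^worker-.*/", "hostname=R/a|b|c/"]) {
  console.log(e, "→", JSON.stringify(classifyBreakout(e)));
}
```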
Example Usage
Use Cases
Separate Metrics (, or r/):

Aggregated Metrics (R/):
- R/^worker-.*/
- R/worker-1|worker-2|worker-3/
- R/eth0|eth1/

Technical Implementation
- list() function: distinguishes field separators from value lists
- OpenSearch query: terms filter for multiple values, regexp for patterns

Backward Compatibility
✅ Fully backward compatible - all existing usage patterns work unchanged
Testing
Commits
- c879aea - Add support for multiple values in --breakout option
- 54b29df - Add regex pattern support for breakout filters
- 217d77e - Add helpful error message when regex filter matches nothing
- 054ae2e - Revert aggregated literal values (use regex alternation instead)
- c8ac61c - Update documentation for regex alternation approach

Closes #110
🤖 Generated with Claude Code