Skip to content

Conversation

@rioloc
Copy link
Member

@rioloc rioloc commented Dec 10, 2025

This PR introduces the following changes:

ALERTS expression simplified

  • The previous expression used as query param in the query_range endpoint is simplified as
    ALERTS{} . The second half of the query, + on () group_left (component, layer) (absent(meta{layer="${query.layer}", component="${query.component}"}))), was removed because component and layer labels are already present in the incident entity.

Split of query_range request into multiple ones for long queries

  • When the query expression, composed by several ALERTS{} joined via or condition, exceeds a MAX_URL_LENGTH limit, then several queries will be created instead.
  • The fetchDataForIncidentsAndAlerts was updated to support both single and multiple queries. In the case of multiple queries then multiple requests will be triggered to the Prometheus API.

skip alert if no incident is matched

  • Added logic to skip the alert if no matchinIncident is found. This should be an edge case that could happen if an alert is triggered and a new incident metric was not produced yet.

Assisted-By: Claude Code

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 10, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Dec 10, 2025

@rioloc: This pull request references OU-632 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.21.0" version, but no target version was set.

Details

In response to this:

This PR introduces the following changes:

ALERTS expression simplified

  • The previous expression used as query param in the query_range endpoint is simplified as
    ALERTS{} . The second half of the query, + on () group_left (component, layer) (absent(meta{layer="${query.layer}", component="${query.component}"}))), was removed because component and layer labels are already present in the incident entity.

Split of query_range request into multiple ones for long queries

  • When the query expression, composed by several ALERTS{} joined via or condition, exceeds a MAX_URL_LENGTH limit, then several queries will be created instead.
  • The fetchDataForIncidentsAndAlerts was updated to support both single and multiple queries. In the case of multiple queries then multiple requests will be triggered to the Prometheus API.

Assisted-By: Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@rioloc rioloc force-pushed the feat/split_query_range_requests branch from 1b594e8 to 7ca9eb0 Compare December 10, 2025 16:30
@rioloc
Copy link
Member Author

rioloc commented Dec 11, 2025

/retest

@rioloc rioloc force-pushed the feat/split_query_range_requests branch from 756ba1e to a1bef57 Compare December 12, 2025 09:32
@openshift-ci-robot
Copy link

openshift-ci-robot commented Dec 12, 2025

@rioloc: This pull request references OU-632 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.

Details

In response to this:

This PR introduces the following changes:

ALERTS expression simplified

  • The previous expression used as query param in the query_range endpoint is simplified as
    ALERTS{} . The second half of the query, + on () group_left (component, layer) (absent(meta{layer="${query.layer}", component="${query.component}"}))), was removed because component and layer labels are already present in the incident entity.

Split of query_range request into multiple ones for long queries

  • When the query expression, composed by several ALERTS{} joined via or condition, exceeds a MAX_URL_LENGTH limit, then several queries will be created instead.
  • The fetchDataForIncidentsAndAlerts was updated to support both single and multiple queries. In the case of multiple queries then multiple requests will be triggered to the Prometheus API.

skip alert if no incident is matched

Added logic to skip the alert if no matchinIncident is found. This should be an edge case that could happen if an alert is triggered and a new incident metric was not produced yet.

Assisted-By: Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Dec 12, 2025

@rioloc: This pull request references OU-632 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.

Details

In response to this:

This PR introduces the following changes:

ALERTS expression simplified

  • The previous expression used as query param in the query_range endpoint is simplified as
    ALERTS{} . The second half of the query, + on () group_left (component, layer) (absent(meta{layer="${query.layer}", component="${query.component}"}))), was removed because component and layer labels are already present in the incident entity.

Split of query_range request into multiple ones for long queries

  • When the query expression, composed by several ALERTS{} joined via or condition, exceeds a MAX_URL_LENGTH limit, then several queries will be created instead.
  • The fetchDataForIncidentsAndAlerts was updated to support both single and multiple queries. In the case of multiple queries then multiple requests will be triggered to the Prometheus API.

skip alert if no incident is matched

  • Added logic to skip the alert if no matchinIncident is found. This should be an edge case that could happen if an alert is triggered and a new incident metric was not produced yet.

Assisted-By: Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Contributor

@falox falox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple of minor points

/lgtm

@DavidRajnoha @rioloc make sure to test what happens when one of the subqueries fails (expected: the parent call fails). Also, make sure that the splitting + merging doesn't affect the ordering of the data points (probably covered in the unit tests, but I comment anyway).

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. and removed lgtm Indicates that a PR is ready to be merged. labels Dec 12, 2025
@rioloc
Copy link
Member Author

rioloc commented Dec 12, 2025

make sure to test what happens when one of the subqueries fails (expected: the parent call fails).

If I well understood the point, that is guaranteed by the Promise.all which fails if any of the promises return an error.

@rioloc
Copy link
Member Author

rioloc commented Dec 12, 2025

make sure that the splitting + merging doesn't affect the ordering of the data points (probably covered in the unit tests, but I comment anyway).

Also for this the order should be guaranteed to be deterministic, so the one of the results of customQuery | string[].
In 6a724e5 I've added another unit to check result ordering is correct

@DavidRajnoha
Copy link
Contributor

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Dec 15, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Dec 15, 2025

@rioloc: This pull request references OU-632 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.

Details

In response to this:

This PR introduces the following changes:

ALERTS expression simplified

  • The previous expression used as query param in the query_range endpoint is simplified as
    ALERTS{} . The second half of the query, + on () group_left (component, layer) (absent(meta{layer="${query.layer}", component="${query.component}"}))), was removed because component and layer labels are already present in the incident entity.

Split of query_range request into multiple ones for long queries

  • When the query expression, composed by several ALERTS{} joined via or condition, exceeds a MAX_URL_LENGTH limit, then several queries will be created instead.
  • The fetchDataForIncidentsAndAlerts was updated to support both single and multiple queries. In the case of multiple queries then multiple requests will be triggered to the Prometheus API.

skip alert if no incident is matched

  • Added logic to skip the alert if no matchinIncident is found. This should be an edge case that could happen if an alert is triggered and a new incident metric was not produced yet.

Assisted-By: Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@rioloc rioloc force-pushed the feat/split_query_range_requests branch from 6a724e5 to b9ce5f8 Compare December 15, 2025 15:14
Assisted-By: Claude Code
@rioloc rioloc force-pushed the feat/split_query_range_requests branch from 6dc32da to d0a451c Compare December 17, 2025 16:22
@PeterYurkovich
Copy link
Contributor

/lgtm

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 17, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: falox, PeterYurkovich, rioloc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

1 similar comment
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 17, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: falox, PeterYurkovich, rioloc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 17, 2025
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 17, 2025
@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD 4b9417f and 2 for PR HEAD d0a451c in total

@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD d693b90 and 1 for PR HEAD d0a451c in total

@PeterYurkovich
Copy link
Contributor

/override ci/prow/okd-scos-images

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 17, 2025

@PeterYurkovich: Overrode contexts on behalf of PeterYurkovich: ci/prow/okd-scos-images

Details

In response to this:

/override ci/prow/okd-scos-images

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@PeterYurkovich
Copy link
Contributor

Currently okd image tests are down across the whole openshift org, overriding the test

@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD a243f06 and 0 for PR HEAD d0a451c in total

@openshift-ci-robot
Copy link

/hold

Revision d0a451c was retested 3 times: holding

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 17, 2025
@jgbernalp
Copy link
Contributor

/test e2e-monitoring

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 18, 2025

@jgbernalp: /override requires failed status contexts, check run or a prowjob name to operate on.
The following unknown contexts/checkruns were given:

  • okd-scos-images

Only the following failed contexts/checkruns were expected:

  • ci/prow/e2e-aws-ovn
  • ci/prow/e2e-monitoring
  • ci/prow/images
  • ci/prow/lint
  • ci/prow/okd-scos-images
  • ci/prow/periodics-images
  • ci/prow/translations
  • ci/prow/verify-deps
  • pull-ci-openshift-monitoring-plugin-main-e2e-aws-ovn
  • pull-ci-openshift-monitoring-plugin-main-e2e-monitoring
  • pull-ci-openshift-monitoring-plugin-main-images
  • pull-ci-openshift-monitoring-plugin-main-lint
  • pull-ci-openshift-monitoring-plugin-main-okd-scos-images
  • pull-ci-openshift-monitoring-plugin-main-periodics-images
  • pull-ci-openshift-monitoring-plugin-main-translations
  • pull-ci-openshift-monitoring-plugin-main-verify-deps
  • tide

If you are trying to override a checkrun that has a space in it, you must put a double quote on the context.

Details

In response to this:

/override okd-scos-images

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jgbernalp
Copy link
Contributor

/override ci/prow/okd-scos-images

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 18, 2025

@jgbernalp: Overrode contexts on behalf of jgbernalp: ci/prow/okd-scos-images

Details

In response to this:

/override ci/prow/okd-scos-images

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jgbernalp
Copy link
Contributor

/test e2e-monitoring

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 18, 2025

@rioloc: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-monitoring d0a451c link false /test e2e-monitoring

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants