stleerh commented Dec 16, 2025

No description provided.

openshift-ci bot commented Dec 16, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign moadz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jotak commented Dec 16, 2025

/cc

openshift-ci bot requested a review from jotak on December 16, 2025, 16:22

Being able to manage and observe the network in an OpenShift cluster is critical in maintaining the health and integrity of the network. Without it, there’s no way to verify whether your changes are working as expected or whether your network is experiencing issues.

Currently, Network Observability is an optional operator that many customers are not aware of. A majority of customers using OpenShift Networking do not have Network Observability installed. Customers are missing out on features that they should have and have already paid for.

From telemetry, what percentage of clusters have NOO installed today?


This adds the installNetworkObservability field in the Network CRD under the spec section. See Listing 2 above.

### Topology Considerations

Single node OpenShift (SNO) clusters have strict requirements in terms of resource usage: is it ok that NOO gets installed by default in this case?

### Risks and Mitigations

* Network Observability requires CPU, memory, and storage that the customer might not be aware of.
Mitigation: The default setting stores only metrics at a high sampling interval to minimize the use of resources. If this isn’t sufficient, more fine-tuning and filtering can be done in the provided default configuration (e.g. filtering on specific interfaces only).


Can we get an estimate of the overhead running NOO with a "stripped-down" flow collector on a typical CI cluster (3 control plane nodes + 3 workers)?


Summary:

* Sampling at 400

Does it depend on the in-cluster Prometheus stack? If yes, how does it interact with #1880?


### Topology Considerations

All topologies are supported where CNO is supported, so this excludes MicroShift.

Can we describe briefly how it works for HyperShift?

4. Wait for NOO to be ready and the OpenShift web console to be available.
5. Create the "netobserv" namespace if it doesn't exist.
6. Check if a FlowCollector instance exists. If yes, exit.
7. Create a FlowCollector instance.
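
As an illustration only (not code from this PR), steps 5 through 7 above amount to an idempotent "ensure" routine: create the namespace if it is missing, leave any existing FlowCollector untouched, and create one otherwise. The sketch below assumes a hypothetical `Cluster` interface and an in-memory `fakeCluster` in place of real API calls. The `sampling` parameter is also an assumption about how a default such as the proposal's "Sampling at 400" might be passed in.

```go
package main

import "fmt"

// Cluster is a hypothetical stand-in for the API calls a controller would make.
// It is not part of CNO or the Network Observability operator.
type Cluster interface {
	NamespaceExists(name string) bool
	CreateNamespace(name string) error
	FlowCollectorExists() bool
	CreateFlowCollector(sampling int) error
}

// fakeCluster is an in-memory implementation used only to exercise the logic.
type fakeCluster struct {
	namespaces    map[string]bool
	flowCollector bool
	sampling      int
}

func newFakeCluster() *fakeCluster {
	return &fakeCluster{namespaces: map[string]bool{}}
}

func (f *fakeCluster) NamespaceExists(name string) bool { return f.namespaces[name] }

func (f *fakeCluster) CreateNamespace(name string) error {
	f.namespaces[name] = true
	return nil
}

func (f *fakeCluster) FlowCollectorExists() bool { return f.flowCollector }

func (f *fakeCluster) CreateFlowCollector(sampling int) error {
	f.flowCollector = true
	f.sampling = sampling
	return nil
}

// EnsureFlowCollector mirrors steps 5 to 7: it creates the "netobserv"
// namespace if needed, exits early if a FlowCollector already exists, and
// otherwise creates one. It reports whether a FlowCollector was created.
func EnsureFlowCollector(c Cluster, sampling int) (bool, error) {
	const ns = "netobserv"
	if !c.NamespaceExists(ns) {
		if err := c.CreateNamespace(ns); err != nil {
			return false, fmt.Errorf("creating namespace %q: %w", ns, err)
		}
	}
	if c.FlowCollectorExists() {
		// Step 6: never overwrite a user-managed instance.
		return false, nil
	}
	if err := c.CreateFlowCollector(sampling); err != nil {
		return false, fmt.Errorf("creating FlowCollector: %w", err)
	}
	return true, nil
}

func main() {
	c := newFakeCluster()
	created, err := EnsureFlowCollector(c, 400)
	fmt.Println(created, err) // true <nil>
	created, err = EnsureFlowCollector(c, 400)
	fmt.Println(created, err) // false <nil>
}
```

Running the routine a second time is a no-op, which matches the "If yes, exit" behavior in step 6 and means an existing, customized FlowCollector is not replaced.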

Concretely, what gets deployed in terms of pods and services? For pods, what level of customization is offered for resource requests/limits and scheduling (infrastructure nodes)?

openshift-ci bot commented Dec 18, 2025

@stleerh: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/markdownlint | cd3b29b | link | true | `/test markdownlint` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Rather than actually installing NOO and creating the FlowCollector instance, it is less risky and simpler to just display a panel or a button that lets the user install and enable Network Observability. This resolves the awareness issue. However, this approach would result in far fewer installs than enabling it by default. It also goes against the principle that networking and network observability should always go hand in hand and be there from the start.

## Alternatives (Not Implemented)

Was installation via the assisted-installer considered? https://github.com/openshift/assisted-service/blob/master/docs/dev/olm-operator-plugins.md

This seems like a viable option to mitigate the drawbacks around topologies and resource constraints.


## Motivation

Network Observability is an optional OLM operator that collects and stores traffic flow information and provides insights into your network traffic, including troubleshooting features like packet drops, latencies, DNS tracking, and more.

Which of this functionality applies to single-node deployments?
