Skip to content

Conversation

@mkowalski
Copy link
Contributor

I have discovered that at times it happens that I see Machine is in "" in the reported intervals. This is not correct and may be caused by the fact that phase is defined as an optional Phase *string json:"phase,omitempty" field so technically an empty value is correct one there.

In any case, for constructing interval I would prefer to explicitly see "missing" message instead of empty value.

@openshift-ci-robot
Copy link

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 11, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mkowalski
Once this PR has been reviewed and has the lgtm label, please assign sosiouxme for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mkowalski
Copy link
Contributor Author

/test e2e-metal-ipi-ovn-ipv6

@mkowalski
Copy link
Contributor Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 11, 2025
@mkowalski
Copy link
Contributor Author

/test e2e-metal-ipi-ovn-ipv6

I have discovered that at times it happens that I see `Machine is in ""`
in the reported intervals. This is not correct and may be caused by the
fact that phase is defined as an optional `Phase *string
json:"phase,omitempty"` field so technically an empty value is correct
one there.

In any case, for constructing interval I would prefer to explicitly see
"missing" message instead of empty value.
@mkowalski mkowalski force-pushed the monitortest-fix-empty-value branch from 7e4350a to 7c18ec9 Compare December 17, 2025 08:01
@mkowalski
Copy link
Contributor Author

/test e2e-aws-ovn-fips

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 17, 2025

@mkowalski: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-images 7c18ec9 link true /test okd-scos-images

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@mkowalski
Copy link
Contributor Author

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-gcp-ovn-serial

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 17, 2025

@mkowalski: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.22-e2e-gcp-ovn-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/3326e9e0-db45-11f0-8cd1-6fa5d46e738d-0

@dgoodwin
Copy link
Contributor

Things we learned on a debugging call:

  • The point in time observations are created in machine.go, we later interpret them in monitortest.go to create the full interval with start-end times that gets displayed.
  • For core nodes created by installer, our watch is not running until testing starts after install, we cannot observe the provisioning phase as a result.
  • The code attempts to look back when it does start up and first sees the pre-existing machines using the machine resource creation timestamp. This is why we see a running interval looking back well before testing started.
  • Displayed Interval is created using previous state because we must have an end time for our interval, the end time is our trigger, thus we are creating an interval for the prior state.
  • Similarly for core nodes, our watch is NOT running when they are deleted, we never get an event for these machines after testing starts. Somewhere here this is the bug why the core machine running intervals seem to stop immediately when testing starts. Fixing this may require leaving some kind of dangling interval open which will be automatically closed during shutdown, a mechanism which exists already in the code in a couple places.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants