Skip to content

Conversation

@adrianmoisey
Copy link
Member

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

At the moment our e2e coverage for feature flags (default to off) isn't great.

My thinking is to copy k/k and have 2 sets of jobs that execute:

  1. e2e tests that exclude tests that have their feature gates set to disabled by default
  2. e2e tests that include tests that require their featute gates to be enabled

I plan to do this using gingko labels.

WithFeatureGate adds a label Feature:OffByDefault which we can use to ignore or include, depending on which e2e test job is running.

If everyone is in agreement that this is a way forward, I'll go update the remaining e2e tests and add the correct feature gate for them.

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/cc omerap12
/cc kamarabbas99

(Kam, I've seen you've put the CPU Boost PR up, so I figured I'd tag you on this since you've got recently relevant experience with e2e tests and feature gates, let me know what you think of this change)

@k8s-ci-robot k8s-ci-robot added the release-note-none Denotes a PR that doesn't merit a release note. label Oct 23, 2025
@k8s-ci-robot k8s-ci-robot requested a review from omerap12 October 23, 2025 18:44
@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-area labels Oct 23, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adrianmoisey

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. area/vertical-pod-autoscaler size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed do-not-merge/needs-area labels Oct 23, 2025
@omerap12
Copy link
Member

This is fine with me.
While you are doing this can you please add some docs for this change? (e.g. how run locally etc )

@adrianmoisey
Copy link
Member Author

While you are doing this can you please add some docs for this change? (e.g. how run locally etc )

Yeah, definitely.
The current script may need a slight tweak, and some docs.

I think I'll do 3 PRs:

  1. This PR to modify tests to have the labels
  2. A PR into test-infra to run the off-by-default tests
  3. Another PR to update local dev so it's easy to run this locally

vpaClientSet vpa_clientset.Interface
)
ginkgo.BeforeEach(func() {
checkPerVPAConfigTestsEnabled(f)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can remove this function since its not used anywhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, yes, good callout.

@omerap12
Copy link
Member

/retitle WIP: Switch e2e test with feature flag to use framework.WithFeatureGate

@k8s-ci-robot k8s-ci-robot changed the title Switch e2e test with feature flag to use framework.WithFeatureGate WIP: Switch e2e test with feature flag to use framework.WithFeatureGate Oct 23, 2025
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 23, 2025
@omerap12
Copy link
Member

While you are doing this can you please add some docs for this change? (e.g. how run locally etc )

Yeah, definitely. The current script may need a slight tweak, and some docs.

I think I'll do 3 PRs:

  1. This PR to modify tests to have the labels
  2. A PR into test-infra to run the off-by-default tests
  3. Another PR to update local dev so it's easy to run this locally

Sounds good. ping when ready :)

At the moment our e2e coverage for feature flags (default to off) isn't
great.

My thinking is to copy k/k and have 2 sets of jobs that execute:
1. e2e tests that exclude tests that have their feature gates set to disabled by default
2. e2e tests that include tests that require their featute gates to be enabled

I plan to do this using gingko labels.

WithFeatureGate adds a label `Feature:OffByDefault` which we can use to
ignore or include, depending on which e2e test job is running.
@adrianmoisey adrianmoisey force-pushed the e2e-tests-feature-gate branch from b5b4f3e to 06db22e Compare October 24, 2025 11:13
Copy link
Member

@maxcao13 maxcao13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot cleaner than what I cooked up originally on this, thanks @adrianmoisey! Makes sense to me.

webhookName = "vpa.k8s.io"
)

var _ = AdmissionControllerE2eDescribe("Admission-controller", ginkgo.Label("FG:InPlaceOrRecreate"), func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these labels just removed then? Does the WithFeatureGate function add something like this for us?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, it adds these:

[FeatureGate:PerVPAConfig] [Alpha] [Feature:OffByDefault]

For beta it just adds the FeatureGate and Beta, it doesn't add an "[Feature:OnByDefault]"

@kamarabbas99
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 24, 2025
@adrianmoisey
Copy link
Member Author

/hold

I realise I still need to handle the logic of selecting which tests to run in this PR

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 24, 2025
@adrianmoisey adrianmoisey changed the title WIP: Switch e2e test with feature flag to use framework.WithFeatureGate Switch e2e test with feature flag to use framework.WithFeatureGate Oct 25, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 25, 2025
@adrianmoisey
Copy link
Member Author

I've discovered that on a PR create, only the 'full-vpa' e2e test is run.
On a master merge, actuation, admission-controller, full-vpa, recommender and updater are run

Before merging this PR, I'm going to change this to:
On PR a single job will run that runs all of these (actuation, admission-controller, full-vpa, recommender and updater)
That same single job is re-run on master

Then I'll come back to this PR, and make an additional job that also runs on PR and master merge, but will enable all feature gates.

However, if `TEST_WITH_FEATURE_GATES_ENABLED=true` is set, then all
feature gates are enabled and their tests will be run
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 25, 2025
@adrianmoisey
Copy link
Member Author

I've discovered that on a PR create, only the 'full-vpa' e2e test is run. On a master merge, actuation, admission-controller, full-vpa, recommender and updater are run

Before merging this PR, I'm going to change this to: On PR a single job will run that runs all of these (actuation, admission-controller, full-vpa, recommender and updater) That same single job is re-run on master

Then I'll come back to this PR, and make an additional job that also runs on PR and master merge, but will enable all feature gates.

I don't think this is possible at the moment.
Some of the tests seem to require situations that we can't provide easily with all 3 components running.

ie:

ginkgo.It("doesn't evict pods when Admission Controller status unavailable", func() {
podList := setupPodsForUpscalingEviction(f)
ginkgo.By(fmt.Sprintf("Waiting for pods to be evicted, hoping it won't happen, sleep for %s", VpaEvictionTimeout.String()))
CheckNoPodsEvicted(f, MakePodSet(podList))
})

If the admission-controller is running here, the test fails.


Another problem is that the admission-controller e2e tests are currently failing on master, due to some tests not being guarded for a feature gate.

it think I'll continue on this path... the plan is:

  1. Get this PR into a place where all e2e tests are passing locally, along with the necessary WithFeatureGate() guards
  2. Expand each job into a "with feature gate enabled" and "without feature gate enabled" variant. We'll end up with 5 x 2 e2e tests (recommender, updater, admission-controller, actuation, full-vpa * AllGatesEnabled, default.

export FEATURE_GATES=""
export TEST_WITH_FEATURE_GATES_ENABLED=""

if [ "${ENABLE_ALL_FEATURE_GATES:-}" == "yes" ] ; then
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to document this, but you can run with gates enabled like this:
ENABLE_ALL_FEATURE_GATES=yes ./hack/run-e2e-locally.sh updater

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would true be better here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, thanks. I had considered it, but forgot to make that change. Let me push something

@kamarabbas99
Copy link
Contributor

@maxcao13 feel free to lgtm this since you already reviewed it!!

}
}
}`,
expectedErr: "spec.resourcePolicy.containerPolicies[0].oomBumpUpRatio: Invalid value: -1: spec.resourcePolicy.containerPolicies[0].oomBumpUpRatio in body should be greater than or equal to 1",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused on this change of the test. Is this feature deprecated, or am I getting confused by a this git diff, or something else?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's a bit confusing because of the diff.

Basically, there are 3 steps:

  1. The original tests, that didn't require a feature gate
  2. Those tests got added too with some that did require a feature gate
  3. Now my change, where I split out the feature gated tests.

Here are links to each phase, to try demonstrate it:

  1. Here is the before, where none of the tests required a gate to be enabled:
    ginkgo.It("accepts valid and rejects invalid VPA object", func() {
    ginkgo.By("Setting up valid VPA object")
    validVPA := []byte(`{
    "kind": "VerticalPodAutoscaler",
    "apiVersion": "autoscaling.k8s.io/v1",
    "metadata": {"name": "hamster-vpa-valid"},
    "spec": {
    "targetRef": {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "name":"hamster"
    },
    "resourcePolicy": {
    "containerPolicies": [{"containerName": "*", "minAllowed":{"cpu":"50m"}}]
    }
    }
    }`)
    err := InstallRawVPA(f, validVPA)
    gomega.Expect(err).NotTo(gomega.HaveOccurred(), "Valid VPA object rejected")
    ginkgo.By("Setting up invalid VPA object")
    // The invalid object differs by name and minAllowed - there is an invalid "requests" field.
    invalidVPA := []byte(`{
    "kind": "VerticalPodAutoscaler",
    "apiVersion": "autoscaling.k8s.io/v1",
    "metadata": {"name": "hamster-vpa-invalid"},
    "spec": {
    "targetRef": {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "name":"hamster"
    },
    "resourcePolicy": {
    "containerPolicies": [{"containerName": "*", "minAllowed":{"requests":{"cpu":"50m"}}}]
    }
    }
    }`)
    err2 := InstallRawVPA(f, invalidVPA)
    gomega.Expect(err2).To(gomega.HaveOccurred(), "Invalid VPA object accepted")
    gomega.Expect(err2.Error()).To(gomega.MatchRegexp(`.*admission webhook .*vpa.* denied the request: .*`))
    })
  2. Here is the expanded list, with some tests that require the gate to be enabled:
    ginkgo.It("accepts valid and rejects invalid VPA object", func() {
    ginkgo.By("Setting up valid VPA object")
    validVPA := []byte(`{
    "kind": "VerticalPodAutoscaler",
    "apiVersion": "autoscaling.k8s.io/v1",
    "metadata": {"name": "hamster-vpa-valid"},
    "spec": {
    "targetRef": {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "name":"hamster"
    },
    "resourcePolicy": {
    "containerPolicies": [{"containerName": "*", "minAllowed":{"cpu":"50m"}}]
    }
    }
    }`)
    err := InstallRawVPA(f, validVPA)
    gomega.Expect(err).NotTo(gomega.HaveOccurred(), "Valid VPA object rejected")
    ginkgo.By("Setting up invalid VPA objects")
    testCases := []struct {
    name string
    vpaJSON string
    expectedErr string
    }{
    {
    name: "Invalid oomBumpUpRatio (negative value)",
    vpaJSON: `{
    "apiVersion": "autoscaling.k8s.io/v1",
    "kind": "VerticalPodAutoscaler",
    "metadata": {"name": "oom-test-vpa"},
    "spec": {
    "targetRef": {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "name": "oom-test"
    },
    "updatePolicy": {
    "updateMode": "Auto"
    },
    "resourcePolicy": {
    "containerPolicies": [{
    "containerName": "*",
    "oomBumpUpRatio": -1,
    "oomMinBumpUp": 104857600
    }]
    }
    }
    }`,
    expectedErr: "spec.resourcePolicy.containerPolicies[0].oomBumpUpRatio: Invalid value: -1: spec.resourcePolicy.containerPolicies[0].oomBumpUpRatio in body should be greater than or equal to 1",
    },
    {
    name: "Invalid oomBumpUpRatio (string value)",
    vpaJSON: `{
    "apiVersion": "autoscaling.k8s.io/v1",
    "kind": "VerticalPodAutoscaler",
    "metadata": {"name": "oom-test-vpa"},
    "spec": {
    "targetRef": {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "name": "oom-test"
    },
    "updatePolicy": {
    "updateMode": "Auto"
    },
    "resourcePolicy": {
    "containerPolicies": [{
    "containerName": "*",
    "oomBumpUpRatio": "12",
    "oomMinBumpUp": 104857600
    }]
    }
    }
    }`,
    expectedErr: "json: cannot unmarshal string into Go struct field ContainerResourcePolicy.spec.resourcePolicy.containerPolicies.oomBumpUpRatio of type float64",
    },
    {
    name: "Invalid oomBumpUpRatio (less than 1)",
    vpaJSON: `{
    "apiVersion": "autoscaling.k8s.io/v1",
    "kind": "VerticalPodAutoscaler",
    "metadata": {"name": "oom-test-vpa"},
    "spec": {
    "targetRef": {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "name": "oom-test"
    },
    "updatePolicy": {
    "updateMode": "Auto"
    },
    "resourcePolicy": {
    "containerPolicies": [{
    "containerName": "*",
    "oomBumpUpRatio": 0.5,
    "oomMinBumpUp": 104857600
    }]
    }
    }
    }`,
    expectedErr: "spec.resourcePolicy.containerPolicies[0].oomBumpUpRatio: Invalid value: 0.5: spec.resourcePolicy.containerPolicies[0].oomBumpUpRatio in body should be greater than or equal to 1",
    },
    {
    name: "Invalid oomMinBumpUp (negative value)",
    vpaJSON: `{
    "apiVersion": "autoscaling.k8s.io/v1",
    "kind": "VerticalPodAutoscaler",
    "metadata": {"name": "oom-test-vpa"},
    "spec": {
    "targetRef": {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "name": "oom-test"
    },
    "updatePolicy": {
    "updateMode": "Auto"
    },
    "resourcePolicy": {
    "containerPolicies": [{
    "containerName": "*",
    "oomBumpUpRatio": 2,
    "oomMinBumpUp": -1
    }]
    }
    }
    }`,
    expectedErr: "spec.resourcePolicy.containerPolicies[0].oomMinBumpUp: Invalid value: -1: spec.resourcePolicy.containerPolicies[0].oomMinBumpUp in body should be greater than or equal to 0",
    },
    {
    name: "Invalid minAllowed (invalid requests field)",
    vpaJSON: `{
    "apiVersion": "autoscaling.k8s.io/v1",
    "kind": "VerticalPodAutoscaler",
    "metadata": {"name": "hamster-vpa-invalid"},
    "spec": {
    "targetRef": {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "name": "hamster"
    },
    "resourcePolicy": {
    "containerPolicies": [{
    "containerName": "*",
    "minAllowed": {
    "requests": {
    "cpu": "50m"
    }
    }
    }]
    }
    }
    }`,
    expectedErr: "admission webhook .*vpa.* denied the request:",
    },
    }
    for _, tc := range testCases {
    ginkgo.By(fmt.Sprintf("Testing %s", tc.name))
    err := InstallRawVPA(f, []byte(tc.vpaJSON))
    gomega.Expect(err).To(gomega.HaveOccurred(), "Invalid VPA object accepted")
    gomega.Expect(err.Error()).To(gomega.MatchRegexp(tc.expectedErr))
    }
    })
  3. Here's the final stage (ie: this PR) where the tests that require the gate to be enabled are split out: https://github.com/adrianmoisey/autoscaler/blob/9c0332a853d944d87b31f4b1784b4d3d6878a99f/vertical-pod-autoscaler/e2e/v1/admission_controller.go#L864-L1043

This also changes a few expectedErr attributes, since they were incorrect when they were initially added (since we don't run these tests on PR, so that wasn't caught when they were added).

I hope that explains what I was aiming for.

Copy link
Member

@maxcao13 maxcao13 Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see whats going on now. I haven't been keeping up with the changes to these tests that much, but I remember I gated only a subset of the admission-controller tests, and later I'm assuming we started InPlaceOrRecreate gating all the tests because of the beta promotion, so now this is separating what needs to be gated now.

Thanks for the explanation :-)

@maxcao13
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 28, 2025
@adrianmoisey
Copy link
Member Author

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 28, 2025
@k8s-ci-robot k8s-ci-robot merged commit 57e9a05 into kubernetes:master Oct 28, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/vertical-pod-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants