Switch e2e test with feature flag to use framework.WithFeatureGate #8684

adrianmoisey · 2025-10-23T18:44:22Z

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

At the moment our e2e coverage for feature flags that default to off isn't great.

My thinking is to copy k/k and have 2 sets of jobs that execute:

1. e2e tests that exclude tests that have their feature gates set to disabled by default
2. e2e tests that include tests that require their feature gates to be enabled

I plan to do this using Ginkgo labels.

WithFeatureGate adds a label Feature:OffByDefault which we can use to ignore or include, depending on which e2e test job is running.

If everyone is in agreement that this is a way forward, I'll go update the remaining e2e tests and add the correct feature gate for them.

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

/cc omerap12
/cc kamarabbas99

(Kam, I've seen you've put the CPU Boost PR up, so I figured I'd tag you on this since you've got recently relevant experience with e2e tests and feature gates, let me know what you think of this change)

k8s-ci-robot · 2025-10-23T18:44:34Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adrianmoisey

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~vertical-pod-autoscaler/OWNERS~~ [adrianmoisey]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

omerap12 · 2025-10-23T18:48:11Z

This is fine with me.
While you are doing this, can you please add some docs for this change? (e.g. how to run locally, etc.)

adrianmoisey · 2025-10-23T18:51:49Z

> While you are doing this, can you please add some docs for this change? (e.g. how to run locally, etc.)

Yeah, definitely.
The current script may need a slight tweak, and some docs.

I think I'll do 3 PRs:

1. This PR to modify tests to have the labels
2. A PR into test-infra to run the off-by-default tests
3. Another PR to update local dev so it's easy to run this locally

kamarabbas99 · 2025-10-23T19:00:41Z

vertical-pod-autoscaler/e2e/v1/recommender.go

-		vpaClientSet vpa_clientset.Interface
-	)
-	ginkgo.BeforeEach(func() {
-		checkPerVPAConfigTestsEnabled(f)


Maybe we can remove this function, since it's not used anywhere?

Whoops, yes, good callout.

omerap12 · 2025-10-23T19:05:01Z

/retitle WIP: Switch e2e test with feature flag to use framework.WithFeatureGate

omerap12 · 2025-10-23T19:07:50Z

> > While you are doing this, can you please add some docs for this change? (e.g. how to run locally, etc.)
>
> Yeah, definitely. The current script may need a slight tweak, and some docs.
>
> I think I'll do 3 PRs:
>
> 1. This PR to modify tests to have the labels
> 2. A PR into test-infra to run the off-by-default tests
> 3. Another PR to update local dev so it's easy to run this locally

Sounds good. ping when ready :)


maxcao13

A lot cleaner than what I cooked up originally on this, thanks @adrianmoisey! Makes sense to me.

maxcao13 · 2025-10-24T18:45:13Z

vertical-pod-autoscaler/e2e/v1/admission_controller.go

 	webhookName       = "vpa.k8s.io"
 )

-var _ = AdmissionControllerE2eDescribe("Admission-controller", ginkgo.Label("FG:InPlaceOrRecreate"), func() {


Are these labels just removed then? Does the WithFeatureGate function add something like this for us?

Correct, it adds these:

[FeatureGate:PerVPAConfig] [Alpha] [Feature:OffByDefault]

For beta it just adds the FeatureGate and Beta, it doesn't add an "[Feature:OnByDefault]"
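To make the labelling described in this thread concrete, here is a minimal sketch that models it in plain Go. It is not the e2e framework's implementation: `Stage`, `gateLabels`, and the `onByDefault` parameter are hypothetical names, and the rule that the extra label depends on the gate's default is an assumption drawn from the reply above.

```go
package main

import "fmt"

// Stage is a simplified stand-in for a feature gate's maturity level.
type Stage string

const (
	Alpha Stage = "Alpha"
	Beta  Stage = "Beta"
)

// gateLabels models the labels this thread says a gated test ends up with.
// Hypothetical helper: the real labels are attached by the e2e framework.
func gateLabels(gate string, stage Stage, onByDefault bool) []string {
	labels := []string{"FeatureGate:" + gate, string(stage)}
	if !onByDefault {
		// Only off-by-default gates get the extra label; there is no
		// matching "Feature:OnByDefault" label.
		labels = append(labels, "Feature:OffByDefault")
	}
	return labels
}

func main() {
	fmt.Println(gateLabels("PerVPAConfig", Alpha, false))
	fmt.Println(gateLabels("InPlaceOrRecreate", Beta, true))
}
```

A job that wants only the default behaviour can then exclude anything carrying `Feature:OffByDefault`, while the "all gates" job includes it.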

kamarabbas99 · 2025-10-24T19:18:49Z

/lgtm

adrianmoisey · 2025-10-24T19:23:38Z

/hold

I realise I still need to handle the logic of selecting which tests to run in this PR

adrianmoisey · 2025-10-25T13:04:03Z

I've discovered that on a PR create, only the 'full-vpa' e2e test is run.
On a master merge, actuation, admission-controller, full-vpa, recommender and updater are run

Before merging this PR, I'm going to change this to:
On PR a single job will run that runs all of these (actuation, admission-controller, full-vpa, recommender and updater)
That same single job is re-run on master

Then I'll come back to this PR, and make an additional job that also runs on PR and master merge, but will enable all feature gates.


adrianmoisey · 2025-10-25T18:23:03Z

> I've discovered that on a PR create, only the 'full-vpa' e2e test is run. On a master merge, actuation, admission-controller, full-vpa, recommender and updater are run
>
> Before merging this PR, I'm going to change this to: On PR a single job will run that runs all of these (actuation, admission-controller, full-vpa, recommender and updater). That same single job is re-run on master
>
> Then I'll come back to this PR, and make an additional job that also runs on PR and master merge, but will enable all feature gates.

I don't think this is possible at the moment.
Some of the tests seem to require situations that we can't provide easily with all 3 components running.

ie:

autoscaler/vertical-pod-autoscaler/e2e/v1/updater.go

Lines 136 to 141 in a2934f7

    
	ginkgo.It("doesn't evict pods when Admission Controller status unavailable", func() {
		podList := setupPodsForUpscalingEviction(f)
		ginkgo.By(fmt.Sprintf("Waiting for pods to be evicted, hoping it won't happen, sleep for %s", VpaEvictionTimeout.String()))
		CheckNoPodsEvicted(f, MakePodSet(podList))
	})

If the admission-controller is running here, the test fails.

Another problem is that the admission-controller e2e tests are currently failing on master, due to some tests not being guarded for a feature gate.

I think I'll continue on this path. The plan is:

1. Get this PR into a place where all e2e tests are passing locally, along with the necessary WithFeatureGate() guards
2. Expand each job into a "with feature gates enabled" and a "without feature gates enabled" variant. We'll end up with 5 x 2 e2e tests: (recommender, updater, admission-controller, actuation, full-vpa) x (AllGatesEnabled, default).
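The 5 x 2 expansion in this plan can be sketched as a small matrix builder. This is only an illustration: the `job` type, `jobMatrix`, and `vpaSuites` are hypothetical names, and the real jobs are defined in test-infra configuration rather than in Go.

```go
package main

import "fmt"

// job pairs one e2e suite with a feature gate mode (hypothetical type).
type job struct {
	Suite           string
	AllGatesEnabled bool
}

// vpaSuites lists the five suites named in this comment.
var vpaSuites = []string{"recommender", "updater", "admission-controller", "actuation", "full-vpa"}

// jobMatrix expands every suite into a default variant and an
// all-gates-enabled variant.
func jobMatrix(suites []string) []job {
	jobs := make([]job, 0, len(suites)*2)
	for _, s := range suites {
		jobs = append(jobs, job{Suite: s}, job{Suite: s, AllGatesEnabled: true})
	}
	return jobs
}

func main() {
	for _, j := range jobMatrix(vpaSuites) {
		fmt.Printf("%s (all gates enabled: %t)\n", j.Suite, j.AllGatesEnabled)
	}
}
```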

adrianmoisey · 2025-10-26T13:00:38Z

vertical-pod-autoscaler/hack/run-e2e-locally.sh

+export FEATURE_GATES=""
+export TEST_WITH_FEATURE_GATES_ENABLED=""
+
+if [ "${ENABLE_ALL_FEATURE_GATES:-}" == "yes" ] ; then


I still need to document this, but you can run with gates enabled like this:
ENABLE_ALL_FEATURE_GATES=yes ./hack/run-e2e-locally.sh updater

Would true be better here?

Oh yes, thanks. I had considered it, but forgot to make that change. Let me push something
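As a rough sketch of the selection behaviour discussed here, the following models how a runner could decide whether a labelled test runs, based on `TEST_WITH_FEATURE_GATES_ENABLED`. The helpers `gatesEnabled` and `shouldRun` are hypothetical and are not taken from the script or the e2e framework; the sketch also shows one reason `true` is a convenient value, since Go's `strconv.ParseBool` accepts `true` but rejects `yes`.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

const offByDefault = "Feature:OffByDefault"

// gatesEnabled reports whether the raw environment value parses as true.
// Unset, empty, or unparsable values (such as "yes") count as false.
func gatesEnabled(raw string) bool {
	enabled, err := strconv.ParseBool(raw)
	return err == nil && enabled
}

// shouldRun models the selection: tests labelled Feature:OffByDefault are
// skipped unless all feature gates are enabled for this run.
func shouldRun(labels []string, allGates bool) bool {
	if allGates {
		return true
	}
	for _, l := range labels {
		if l == offByDefault {
			return false
		}
	}
	return true
}

func main() {
	allGates := gatesEnabled(os.Getenv("TEST_WITH_FEATURE_GATES_ENABLED"))
	gated := []string{"FeatureGate:PerVPAConfig", "Alpha", offByDefault}
	fmt.Println("run gated test:", shouldRun(gated, allGates))
	fmt.Println("run ungated test:", shouldRun(nil, allGates))
}
```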

kamarabbas99 · 2025-10-27T20:18:14Z

@maxcao13 feel free to lgtm this since you already reviewed it!!

maxcao13 · 2025-10-27T20:37:04Z

vertical-pod-autoscaler/e2e/v1/admission_controller.go

                }
            }
        }`,
-				expectedErr: "spec.resourcePolicy.containerPolicies[0].oomBumpUpRatio: Invalid value: -1: spec.resourcePolicy.containerPolicies[0].oomBumpUpRatio in body should be greater than or equal to 1",


I'm a little confused by this change to the test. Is this feature deprecated, am I getting confused by this git diff, or something else?

Yeah, it's a bit confusing because of the diff.

Basically, there are 3 steps:

1. The original tests, which didn't require a feature gate
2. Those tests were then extended with some cases that did require a feature gate
3. Now my change, where I split out the feature-gated tests

Here are links to each phase, to try to demonstrate it:

Here is the before, where none of the tests required a gate to be enabled:

autoscaler/vertical-pod-autoscaler/e2e/v1/admission_controller.go

Lines 854 to 894 in 2eb783b

	ginkgo.It("accepts valid and rejects invalid VPA object", func() {
		ginkgo.By("Setting up valid VPA object")
		validVPA := []byte(`{
			"kind": "VerticalPodAutoscaler",
			"apiVersion": "autoscaling.k8s.io/v1",
			"metadata": {"name": "hamster-vpa-valid"},
			"spec": {
				"targetRef": {
					"apiVersion": "apps/v1",
					"kind": "Deployment",
					"name":"hamster"
				},
				"resourcePolicy": {
					"containerPolicies": [{"containerName": "*", "minAllowed":{"cpu":"50m"}}]
				}
			}
		}`)
		err := InstallRawVPA(f, validVPA)
		gomega.Expect(err).NotTo(gomega.HaveOccurred(), "Valid VPA object rejected")

		ginkgo.By("Setting up invalid VPA object")
		// The invalid object differs by name and minAllowed - there is an invalid "requests" field.
		invalidVPA := []byte(`{
			"kind": "VerticalPodAutoscaler",
			"apiVersion": "autoscaling.k8s.io/v1",
			"metadata": {"name": "hamster-vpa-invalid"},
			"spec": {
				"targetRef": {
					"apiVersion": "apps/v1",
					"kind": "Deployment",
					"name":"hamster"
				},
				"resourcePolicy": {
					"containerPolicies": [{"containerName": "*", "minAllowed":{"requests":{"cpu":"50m"}}}]
				}
			}
		}`)
		err2 := InstallRawVPA(f, invalidVPA)
		gomega.Expect(err2).To(gomega.HaveOccurred(), "Invalid VPA object accepted")
		gomega.Expect(err2.Error()).To(gomega.MatchRegexp(`.*admission webhook .*vpa.* denied the request: .*`))
	})

Here is the expanded list, with some tests that require the gate to be enabled:

autoscaler/vertical-pod-autoscaler/e2e/v1/admission_controller.go

Lines 854 to 1017 in 48dfe75

	ginkgo.It("accepts valid and rejects invalid VPA object", func() {
		ginkgo.By("Setting up valid VPA object")
		validVPA := []byte(`{
			"kind": "VerticalPodAutoscaler",
			"apiVersion": "autoscaling.k8s.io/v1",
			"metadata": {"name": "hamster-vpa-valid"},
			"spec": {
				"targetRef": {
					"apiVersion": "apps/v1",
					"kind": "Deployment",
					"name":"hamster"
				},
				"resourcePolicy": {
					"containerPolicies": [{"containerName": "*", "minAllowed":{"cpu":"50m"}}]
				}
			}
		}`)
		err := InstallRawVPA(f, validVPA)
		gomega.Expect(err).NotTo(gomega.HaveOccurred(), "Valid VPA object rejected")

		ginkgo.By("Setting up invalid VPA objects")
		testCases := []struct {
			name        string
			vpaJSON     string
			expectedErr string
		}{
			{
				name: "Invalid oomBumpUpRatio (negative value)",
				vpaJSON: `{
					"apiVersion": "autoscaling.k8s.io/v1",
					"kind": "VerticalPodAutoscaler",
					"metadata": {"name": "oom-test-vpa"},
					"spec": {
						"targetRef": {
							"apiVersion": "apps/v1",
							"kind": "Deployment",
							"name": "oom-test"
						},
						"updatePolicy": {
							"updateMode": "Auto"
						},
						"resourcePolicy": {
							"containerPolicies": [{
								"containerName": "*",
								"oomBumpUpRatio": -1,
								"oomMinBumpUp": 104857600
							}]
						}
					}
				}`,
				expectedErr: "spec.resourcePolicy.containerPolicies[0].oomBumpUpRatio: Invalid value: -1: spec.resourcePolicy.containerPolicies[0].oomBumpUpRatio in body should be greater than or equal to 1",
			},
			{
				name: "Invalid oomBumpUpRatio (string value)",
				vpaJSON: `{
					"apiVersion": "autoscaling.k8s.io/v1",
					"kind": "VerticalPodAutoscaler",
					"metadata": {"name": "oom-test-vpa"},
					"spec": {
						"targetRef": {
							"apiVersion": "apps/v1",
							"kind": "Deployment",
							"name": "oom-test"
						},
						"updatePolicy": {
							"updateMode": "Auto"
						},
						"resourcePolicy": {
							"containerPolicies": [{
								"containerName": "*",
								"oomBumpUpRatio": "12",
								"oomMinBumpUp": 104857600
							}]
						}
					}
				}`,
				expectedErr: "json: cannot unmarshal string into Go struct field ContainerResourcePolicy.spec.resourcePolicy.containerPolicies.oomBumpUpRatio of type float64",
			},
			{
				name: "Invalid oomBumpUpRatio (less than 1)",
				vpaJSON: `{
					"apiVersion": "autoscaling.k8s.io/v1",
					"kind": "VerticalPodAutoscaler",
					"metadata": {"name": "oom-test-vpa"},
					"spec": {
						"targetRef": {
							"apiVersion": "apps/v1",
							"kind": "Deployment",
							"name": "oom-test"
						},
						"updatePolicy": {
							"updateMode": "Auto"
						},
						"resourcePolicy": {
							"containerPolicies": [{
								"containerName": "*",
								"oomBumpUpRatio": 0.5,
								"oomMinBumpUp": 104857600
							}]
						}
					}
				}`,
				expectedErr: "spec.resourcePolicy.containerPolicies[0].oomBumpUpRatio: Invalid value: 0.5: spec.resourcePolicy.containerPolicies[0].oomBumpUpRatio in body should be greater than or equal to 1",
			},
			{
				name: "Invalid oomMinBumpUp (negative value)",
				vpaJSON: `{
					"apiVersion": "autoscaling.k8s.io/v1",
					"kind": "VerticalPodAutoscaler",
					"metadata": {"name": "oom-test-vpa"},
					"spec": {
						"targetRef": {
							"apiVersion": "apps/v1",
							"kind": "Deployment",
							"name": "oom-test"
						},
						"updatePolicy": {
							"updateMode": "Auto"
						},
						"resourcePolicy": {
							"containerPolicies": [{
								"containerName": "*",
								"oomBumpUpRatio": 2,
								"oomMinBumpUp": -1
							}]
						}
					}
				}`,
				expectedErr: "spec.resourcePolicy.containerPolicies[0].oomMinBumpUp: Invalid value: -1: spec.resourcePolicy.containerPolicies[0].oomMinBumpUp in body should be greater than or equal to 0",
			},
			{
				name: "Invalid minAllowed (invalid requests field)",
				vpaJSON: `{
					"apiVersion": "autoscaling.k8s.io/v1",
					"kind": "VerticalPodAutoscaler",
					"metadata": {"name": "hamster-vpa-invalid"},
					"spec": {
						"targetRef": {
							"apiVersion": "apps/v1",
							"kind": "Deployment",
							"name": "hamster"
						},
						"resourcePolicy": {
							"containerPolicies": [{
								"containerName": "*",
								"minAllowed": {
									"requests": {
										"cpu": "50m"
									}
								}
							}]
						}
					}
				}`,
				expectedErr: "admission webhook .*vpa.* denied the request:",
			},
		}
		for _, tc := range testCases {
			ginkgo.By(fmt.Sprintf("Testing %s", tc.name))
			err := InstallRawVPA(f, []byte(tc.vpaJSON))
			gomega.Expect(err).To(gomega.HaveOccurred(), "Invalid VPA object accepted")
			gomega.Expect(err.Error()).To(gomega.MatchRegexp(tc.expectedErr))
		}
	})

Here's the final stage (ie: this PR) where the tests that require the gate to be enabled are split out: https://github.com/adrianmoisey/autoscaler/blob/9c0332a853d944d87b31f4b1784b4d3d6878a99f/vertical-pod-autoscaler/e2e/v1/admission_controller.go#L864-L1043

This also changes a few expectedErr attributes, since they were incorrect when they were initially added (since we don't run these tests on PR, so that wasn't caught when they were added).

I hope that explains what I was aiming for.
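The third step can be pictured as partitioning one table of cases by whether each case needs a gate. The sketch below is illustrative only: `vpaCase` and `splitByGate` are hypothetical names, and tagging the oom* cases with `PerVPAConfig` is an assumption based on the commit titled "Split out features.PerVPAConfig tests".

```go
package main

import "fmt"

// vpaCase is a hypothetical, trimmed-down test case. An empty Gate means
// the case runs regardless of feature gate settings.
type vpaCase struct {
	Name string
	Gate string
}

// splitByGate separates cases that always run from cases that belong in a
// feature-gated test.
func splitByGate(cases []vpaCase) (ungated, gated []vpaCase) {
	for _, c := range cases {
		if c.Gate == "" {
			ungated = append(ungated, c)
		} else {
			gated = append(gated, c)
		}
	}
	return ungated, gated
}

func main() {
	cases := []vpaCase{
		{Name: "Invalid minAllowed (invalid requests field)"},
		// Assumption: the oom* fields are guarded by PerVPAConfig.
		{Name: "Invalid oomBumpUpRatio (negative value)", Gate: "PerVPAConfig"},
		{Name: "Invalid oomMinBumpUp (negative value)", Gate: "PerVPAConfig"},
	}
	ungated, gated := splitByGate(cases)
	fmt.Printf("always run: %d, gated: %d\n", len(ungated), len(gated))
}
```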

Ah, I see what's going on now. I haven't been keeping up with the changes to these tests that much, but I remember I gated only a subset of the admission-controller tests, and later I'm assuming we started gating all the tests on InPlaceOrRecreate because of the beta promotion, so now this is separating out what actually needs to be gated.

Thanks for the explanation :-)

maxcao13 · 2025-10-28T17:01:39Z

/lgtm

adrianmoisey · 2025-10-28T18:59:12Z

/unhold

k8s-ci-robot requested a review from kamarabbas99 October 23, 2025 18:44

k8s-ci-robot added the release-note-none Denotes a PR that doesn't merit a release note. label Oct 23, 2025

k8s-ci-robot requested a review from omerap12 October 23, 2025 18:44

k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-area labels Oct 23, 2025

k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. area/vertical-pod-autoscaler size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed do-not-merge/needs-area labels Oct 23, 2025

kamarabbas99 reviewed Oct 23, 2025

k8s-ci-robot changed the title ~~Switch e2e test with feature flag to use framework.WithFeatureGate~~ WIP: Switch e2e test with feature flag to use framework.WithFeatureGate Oct 23, 2025

k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 23, 2025

omerap12 mentioned this pull request Oct 23, 2025

Add PerVPAConfig feature gate kubernetes/test-infra#35749

Closed

adrianmoisey force-pushed the e2e-tests-feature-gate branch from b5b4f3e to 06db22e Compare October 24, 2025 11:13

maxcao13 reviewed Oct 24, 2025

k8s-ci-robot assigned kamarabbas99 Oct 24, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 24, 2025

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 24, 2025

adrianmoisey changed the title ~~WIP: Switch e2e test with feature flag to use framework.WithFeatureGate~~ Switch e2e test with feature flag to use framework.WithFeatureGate Oct 25, 2025

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 25, 2025

Make e2e tests skip all featuregate tests

1ad66f3

However, if `TEST_WITH_FEATURE_GATES_ENABLED=true` is set, then all feature gates are enabled and their tests will be run

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 25, 2025

Split out features.PerVPAConfig tests

ee87745

adrianmoisey commented Oct 26, 2025

Prefer true

9c0332a

maxcao13 reviewed Oct 27, 2025

k8s-ci-robot assigned maxcao13 Oct 28, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 28, 2025

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 28, 2025

adrianmoisey mentioned this pull request Oct 28, 2025

Fix the VPA e2e situation #8705

Open

k8s-ci-robot merged commit 57e9a05 into kubernetes:master Oct 28, 2025
8 checks passed

