OCPBUGS-75869: kubelet: Less aggressive low memory reservation #5716
sdodson wants to merge 1 commit into openshift:main
Conversation
Out of the box a standard OpenShift worker has about 3000 Mi of unevictable workload. Thus when we reserve 2GiB on an 8GiB instance, that node will not autoscale down because it never drops below the 50% usage threshold. Therefore, let's reduce the system reservation on the lowest end. The assumption here is that nodes this small are less likely to run the full 250 pods and actually consume the full set of resources. We should make sure that this aligns with our understanding of the problem we're trying to solve by enabling dynamic resource reservation in the first place, which I believe is the fact that massive nodes were only getting 1GiB of reserved memory despite running hundreds of pods.

Here's the difference in memory reservation (GiB) at common sizes:

| Total | Old Reserved | New Reserved |
| ----- | ------------ | ------------ |
| 8 | 2 | 1 |
| 16 | 3 | 1.48 |
| 32 | 4 | 2.44 |
| 64 | 5 | 4.36 |
| 128 | 9 | 8.2 |
| 256 | 12 | 10.44 |
| 512 | 17 | 15.56 |
| 1024 | 27 | 25.8 |
| 2048 | 48 | 46.28 |
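The new column suggests a tiered formula. The actual tier boundaries live in the node-sizing script and aren't shown in this PR, so the function below is only a plausible reconstruction (a flat 1GiB for the first 8GiB, then an assumed 6% band up to 120GiB and a 2% tail beyond). It reproduces every row of the table above except the 128GiB one, so treat the band boundaries as approximate:

```python
# Hypothetical reconstruction of the new reservation tiers; the band
# boundaries (8GiB and 120GiB) are inferred from the table above, not
# taken from the actual node-sizing script.
def new_reserved_gib(total_gib: float) -> float:
    """System-reserved memory (GiB) for a node with total_gib of memory."""
    reserved = 1.0  # flat 1GiB for the first 8GiB (the change in this PR)
    if total_gib > 8:
        reserved += 0.06 * (min(total_gib, 120) - 8)  # assumed 6% band
    if total_gib > 120:
        reserved += 0.02 * (total_gib - 120)          # assumed 2% tail
    return reserved

for size in (8, 16, 32, 64, 256, 512, 1024, 2048):
    print(size, round(new_reserved_gib(size), 2))
```

Whatever the exact bands are, the key property is that only the first tier changed: nodes at 8GiB keep 1GiB reserved instead of roughly 2GiB, while large nodes see their reservation shrink by well under 2GiB.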
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: sdodson. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files.

@sdodson: This pull request references Jira Issue OCPBUGS-75869, which is invalid. The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/jira refresh

@sdodson: This pull request references Jira Issue OCPBUGS-75869, which is invalid.
/jira refresh

@sdodson: This pull request references Jira Issue OCPBUGS-75869, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug.
/retest-required
@sdodson: The following test failed. Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
The configuration of the auto node sizing will be covered as part of a long-running test.
These are additional tasks that can be taken up after this merge.
/payload-job periodic-ci-openshift-release-master-ci-4.22-e2e-aws-ovn-techpreview-serial-2of3 periodic-ci-openshift-release-master-ci-4.22-e2e-aws-ovn-techpreview-serial-3of3

Running additional tests that were attempted during the auto node sizing work.
@ngopalak-redhat: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

/payload-job periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-techpreview-serial-2of3 periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-techpreview-serial-3of3

@ngopalak-redhat: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a74baa70-1698-11f1-8d41-df7d51334589-0
/payload-aggregate periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-upgrade-fips 10

@ngopalak-redhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/1614df30-1699-11f1-9639-0fa330f5b6dd-0
/test e2e-aws-mco-disruptive

/payload-job periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-1of2

@ngopalak-redhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d3bf6d80-16ac-11f1-86d3-185bd84fe8af-0
/payload-aggregate periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-upgrade-fips 1

@ngopalak-redhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/4b9bfa70-16ae-11f1-9ae2-ecfe0ac2f325-0
Fixes OCPBUGS-75869
Please provide the following information:
- What I did
Amended the dynamic system-reservation scripts to reserve only 1GiB for the first 8GiB of memory. All other memory reservation logic is left in place. See the table above.
- How to verify it
Launch a cluster with an 8GiB node and review the node's allocatable memory; it should be roughly 7GiB rather than 6GiB.
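One way to check this from the CLI (the node name and the `Ki` value below are illustrative, not from a real run) is to query the node's allocatable memory and convert it to GiB:

```shell
# On a real cluster, fetch allocatable memory (node name is hypothetical):
#   oc get node ip-10-0-1-23.ec2.internal -o jsonpath='{.status.allocatable.memory}'
# Using an illustrative sample value for an 8GiB node after this change:
alloc="7339056Ki"
# Strip the Ki suffix and convert KiB -> GiB (1 GiB = 1048576 KiB)
echo "$alloc" | awk '{ sub(/Ki$/, ""); printf "%.2f GiB\n", $0 / 1048576 }'
```

A value near 7GiB indicates the new 1GiB reservation took effect; near 6GiB would mean the old 2GiB reservation is still in place.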
- Description for the changelog
Reduced the dynamic memory reservation (enabled by default for workers in clusters installed on 4.21 or newer) for the first 8GiB of memory to a static 1GiB, which mirrors the old non-dynamic reservation. This slightly reduces all reservations, by less than 2GiB.