
Too many cloud API calls in node-update-controller #442

@yussufsh

Description


/kind bug
/kind enhancement

What happened?
The node-update-controller makes a large number of cloud API calls: it repeatedly creates the PowerVS cloud object, and some of these calls fail.

Within a single minute, the controller created the cloud object about 13 times and also issued GET PVM instance calls to check and set the storage pool affinity policy.

# oc logs ibm-powervs-block-csi-driver-controller-86f4c6459-gxn8f -c node-update-controller --previous | grep 'I0821 05:21' | wc -l
27
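A minimal sketch of one way to avoid rebuilding the PowerVS cloud object on every reconcile: build it lazily once, reuse it afterwards, and retry construction only while it has not yet succeeded. The `CloudClient` type, `clientCache`, and the `build` constructor are hypothetical stand-ins, not the driver's actual code; a real fix would likely also need to refresh the client if its session or credentials expire.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// CloudClient is a hypothetical stand-in for the PowerVS cloud object
// that the controller currently creates on each reconcile.
type CloudClient struct {
	ServiceInstanceID string
}

// clientCache builds one CloudClient lazily and reuses it afterwards.
// A failed build is NOT cached, so the next caller simply retries.
type clientCache struct {
	mu     sync.Mutex
	client *CloudClient
	build  func() (*CloudClient, error) // hypothetical constructor
}

// Get returns the cached client, building it on first successful use.
func (c *clientCache) Get() (*CloudClient, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.client != nil {
		return c.client, nil
	}
	cl, err := c.build()
	if err != nil {
		return nil, fmt.Errorf("creating cloud client: %w", err)
	}
	c.client = cl
	return cl, nil
}

func main() {
	calls := 0
	cache := &clientCache{build: func() (*CloudClient, error) {
		calls++
		if calls == 1 {
			// Simulated transient failure on the first attempt.
			return nil, errors.New("read: connection reset by peer")
		}
		return &CloudClient{ServiceInstanceID: "example-id"}, nil
	}}

	// Simulate 13 reconciles in one minute, as in the report.
	failures := 0
	for i := 0; i < 13; i++ {
		if _, err := cache.Get(); err != nil {
			failures++
		}
	}
	fmt.Println("constructor calls:", calls, "failures:", failures)
}
```

A mutex is used instead of `sync.Once` because `sync.Once` would also remember a failed construction, such as the connection reset seen in the logs below, and never retry it.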

The example below shows several errors while fetching the PVM instance. The last error occurs while getting the PowerVS client object; it is fatal and causes the container to restart (see #441).

Examples:

# oc logs ibm-powervs-block-csi-driver-controller-86f4c6459-gxn8f -c node-update-controller --previous | grep -v 'StoragePoolAffinity' | grep -v 'PROVIDER-ID'
2023-08-19T02:42:36Z    INFO    controller-runtime.metrics      Metrics server is starting to listen    {"addr": ":8081"}
2023-08-19T02:42:36Z    INFO    setup   starting manager
2023-08-19T02:42:36Z    INFO    Starting server {"kind": "health probe", "addr": "[::]:8082"}
2023-08-19T02:42:36Z    INFO    Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8081"}
2023-08-19T02:42:36Z    INFO    Starting EventSource    {"controller": "node", "controllerGroup": "", "controllerKind": "Node", "source": "kind source: *v1.Node"}
2023-08-19T02:42:36Z    INFO    Starting Controller     {"controller": "node", "controllerGroup": "", "controllerKind": "Node"}
2023-08-19T02:42:36Z    INFO    Starting workers        {"controller": "node", "controllerGroup": "", "controllerKind": "Node", "worker count": 1}
I0819 05:54:42.543016       1 nodeupdate_controller.go:81] Unable to fetch Instance Details failed to Get PVM Instance 36776ce2-ef10-400b-be7d-c9511d00f01b :[GET /pcloud/v1/cloud-instances/{cloud_instance_id}/pvm-instances/{pvm_instance_id}][500] pcloudPvminstancesGetInternalServerError  &{Code:0 Description:pvm-instance 36776ce2-ef10-400b-be7d-c9511d00f01b in cloud-instance f4d71e5f9bea49f9a6fdae6f38c4b2cb error: failed to get server and update cache: timed out of retrieving resource for pvmInstanceServer:lon06:f4d71e5f9bea49f9a6fdae6f38c4b2cb:36776ce2-ef10-400b-be7d-c9511d00f01b Error:internal server error Message:}
I0820 06:54:24.914454       1 nodeupdate_controller.go:81] Unable to fetch Instance Details failed to Get PVM Instance 36776ce2-ef10-400b-be7d-c9511d00f01b :[GET /pcloud/v1/cloud-instances/{cloud_instance_id}/pvm-instances/{pvm_instance_id}][500] pcloudPvminstancesGetInternalServerError  &{Code:0 Description:pvm-instance 36776ce2-ef10-400b-be7d-c9511d00f01b in cloud-instance f4d71e5f9bea49f9a6fdae6f38c4b2cb error: failed to get server and update cache: timed out of retrieving resource for pvmInstanceServer:lon06:f4d71e5f9bea49f9a6fdae6f38c4b2cb:36776ce2-ef10-400b-be7d-c9511d00f01b Error:internal server error Message:}
I0820 17:30:31.360402       1 nodeupdate_controller.go:81] Unable to fetch Instance Details failed to Get PVM Instance 36776ce2-ef10-400b-be7d-c9511d00f01b :[GET /pcloud/v1/cloud-instances/{cloud_instance_id}/pvm-instances/{pvm_instance_id}][403] pcloudPvminstancesGetForbidden  &{Code:403 Description: Error: Message:user iam-ServiceId-c27c3ef5-8405-4dc1-9590-4440adaad19f does not have correct permissions to access crn:v1:bluemix:public:power-iaas:lon06:a/bf9f1f230466481b95a99f18739fede9:dbc67d5e-9579-49da-b1d9-fc2ec7ddc680:: with {role:user-unauthorized permissions (read:false write:false manage:false)}}
F0821 05:22:32.216618       1 powervs_node.go:69] Failed to get powervs cloud: errored while getting the Power VS service instance with ID: dbc67d5e-9579-49da-b1d9-fc2ec7ddc680, err: Get "https://resource-controller.cloud.ibm.com/v2/resource_instances/dbc67d5e-9579-49da-b1d9-fc2ec7ddc680": read tcp 192.168.81.10:46226->104.102.54.251:443: read: connection reset by peer

What you expected to happen?
The node-update-controller should make far fewer cloud API calls.

How to reproduce it (as minimally and precisely as possible)?

Anything else we need to know?:

Environment

  • Kubernetes version (use kubectl version):
  • Driver version: latest

Metadata


Assignees

Labels

kind/bug: Categorizes issue or PR as related to a bug.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
