Fix multi-line logging of AWS Activity struct in cluster-autoscaler #8674

Copilot · 2025-10-21T21:13:27Z

Description

Fixes #8127 (replace with actual issue number)

This PR fixes a logging issue where AWS Auto Scaling Group activity failures were being logged in multi-line format, breaking log parsing tools like Datadog.

Problem

When a scaling operation fails, the cluster-autoscaler logs the AWS Activity details at line 551 of auto_scaling_groups.go. The code passed the dereferenced Activity struct directly to a %s verb:

klog.Warningf("ASG %s scaling failed with %s", asgRef.Name, *activity)

The SDK's Activity type provides its own String() method that pretty-prints the struct, so %s rendered it across multiple lines:

W0116 07:10:42.815402 auto_scaling_groups.go:551] ASG eks-batch-spot-xxxx scaling failed with {
  ActivityId: "4b6651b7-...",
  AutoScalingGroupARN: "arn:aws:autoscaling:...",
  AutoScalingGroupName: "eks-batch-spot-...",
  Cause: "At 2025-01-16T07:09:48Z an instance was started...",
  Description: "Launching a new EC2 instance...",
  Details: "{\"Subnet ID\":\"subnet-xxx\"...}",
  EndTime: 2025-01-16 07:09:49 +0000 UTC,
  Progress: 100,
  StartTime: 2025-01-16 07:09:49.096 +0000 UTC,
  StatusCode: "Failed",
  StatusMessage: "Could not launch Spot Instances..."
}

This multi-line format breaks log aggregation services and makes it difficult to search and analyze logs.

Solution

The log statement now extracts and formats the key fields in a single line:

klog.Warningf("ASG %s scaling failed: StatusCode=%s, StatusMessage=%s, Description=%s, Cause=%s", 
    asgRef.Name, 
    aws.StringValue(activity.StatusCode),
    aws.StringValue(activity.StatusMessage),
    aws.StringValue(activity.Description),
    aws.StringValue(activity.Cause))

Output is now single-line and parseable:

W0116 07:10:42.815402 auto_scaling_groups.go:551] ASG eks-batch-spot-xxxx scaling failed: StatusCode=Failed, StatusMessage=Could not launch Spot Instances..., Description=Launching a new EC2 instance..., Cause=At 2025-01-16T07:09:48Z an instance was started...
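
A minimal, self-contained sketch of the same formatting approach follows. The `activity` type, `stringValue`, and `formatFailure` are hypothetical names introduced for illustration; `stringValue` is assumed to behave like aws.StringValue in returning an empty string for a nil pointer.

```go
package main

import "fmt"

// activity is a hypothetical stand-in holding the four fields the PR logs.
type activity struct {
	StatusCode, StatusMessage, Description, Cause *string
}

// stringValue dereferences s, returning "" when s is nil.
func stringValue(s *string) string {
	if s == nil {
		return ""
	}
	return *s
}

// formatFailure builds the single-line message from individual fields
// instead of formatting the whole struct.
func formatFailure(asgName string, a *activity) string {
	return fmt.Sprintf(
		"ASG %s scaling failed: StatusCode=%s, StatusMessage=%s, Description=%s, Cause=%s",
		asgName,
		stringValue(a.StatusCode),
		stringValue(a.StatusMessage),
		stringValue(a.Description),
		stringValue(a.Cause))
}

func main() {
	code := "Failed"
	// Only StatusCode is set; the nil fields render as empty strings.
	fmt.Println(formatFailure("my-asg", &activity{StatusCode: &code}))
}
```

Dereferencing through a nil-safe helper means a partially populated activity does not panic and simply leaves the missing fields blank.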

Changes

Modified logging at line 551 to format Activity fields explicitly
Used aws.StringValue() for safe nil pointer handling
Included the most relevant fields: StatusCode, StatusMessage, Description, and Cause
The fields most useful for debugging are preserved; ActivityId, Details, Progress, StartTime, and EndTime are no longer logged

Testing

All existing AWS cloudprovider tests pass
Verified single-line log output format
No regressions introduced
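
One way a single-line check could look is sketched below. This is not the test from the PR; `formatFailure` and `isSingleLine` are hypothetical helpers, and the sample values are taken from the log excerpts above.

```go
package main

import (
	"fmt"
	"strings"
)

// formatFailure is a hypothetical helper producing a message in the PR's format.
func formatFailure(asgName, statusCode, statusMessage, description, cause string) string {
	return fmt.Sprintf(
		"ASG %s scaling failed: StatusCode=%s, StatusMessage=%s, Description=%s, Cause=%s",
		asgName, statusCode, statusMessage, description, cause)
}

// isSingleLine reports whether s contains no carriage returns or newlines.
func isSingleLine(s string) bool {
	return !strings.ContainsAny(s, "\r\n")
}

func main() {
	msg := formatFailure("my-asg", "Failed",
		"Could not launch Spot Instances",
		"Launching a new EC2 instance",
		"capacity increased from 0 to 1")
	fmt.Println("single line:", isSingleLine(msg))
}
```

The check only holds as long as the field values themselves contain no line breaks, which is true of the sample activity shown in the issue.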

/area cluster-autoscaler
/kind bug

Original prompt

This section details the original issue you should resolve

<issue_title>Logging multi-line klog output on cluster-autoscaler when scaling failed</issue_title>
<issue_description>

Which component are you using?:

/area cluster-autoscaler

What version of the component are you using?:

Component version: v1.31.1

What k8s version are you using (kubectl version)?:
kubectl version Output
$ kubectl version
Client Version: v1.30.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.31.4-eks-2d5f260
What environment is this in?:

AWS EKS

What did you expect to happen?:

cluster-autoscaler always logs single-line klog output.

What happened instead?:

The log entry emitted at auto_scaling_groups.go:551 spans multiple lines in the klog output:
I0116 07:10:42.719737       1 auto_scaling_groups.go:508] Instance group eks-batch-spot202407091400-ap-northeast-1a-72c85d24-0d0c-3317-6d99-xxxxxxxxxxxx has only 0 instances created while requested count is 1. Creating placeholder instances.
W0116 07:10:42.815402       1 auto_scaling_groups.go:551] ASG eks-batch-spot202407091400-ap-northeast-1a-72c85d24-0d0c-3317-6d99-xxxxxxxxxxxx scaling failed with {
  ActivityId: "4b6651b7-c98a-a0c3-5fe1-xxxxxxxxxxxx",
  AutoScalingGroupARN: "arn:aws:autoscaling:ap-northeast-1:xxxxxxxxxxxx:autoScalingGroup:15be2ddc-7564-4bf2-bb35-xxxxxxxxxxxx:autoScalingGroupName/eks-batch-spot202407091400-ap-northeast-1a-72c85d24-0d0c-3317-6d99-xxxxxxxxxxxx",
  AutoScalingGroupName: "eks-batch-spot202407091400-ap-northeast-1a-72c85d24-0d0c-3317-6d99-xxxxxxxxxxxx",
  Cause: "At 2025-01-16T07:09:48Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 0 to 1.",
  Description: "Launching a new EC2 instance.  Status Reason: Could not launch Spot Instances. UnfulfillableCapacity - Unable to fulfill capacity due to your request configuration. Please adjust your request and try again. Launching EC2 instance failed.",
  Details: "{\"Subnet ID\":\"subnet-xxxxxxxxxxxx\",\"Availability Zone\":\"ap-northeast-1a\"}",
  EndTime: 2025-01-16 07:09:49 +0000 UTC,
  Progress: 100,
  StartTime: 2025-01-16 07:09:49.096 +0000 UTC,
  StatusCode: "Failed",
  StatusMessage: "Could not launch Spot Instances. UnfulfillableCapacity - Unable to fulfill capacity due to your request configuration. Please adjust your request and try again. Launching EC2 instance failed."
}
W0116 07:10:42.815520       1 auto_scaling_groups.go:516] Instance group eks-batch-spot202407091400-ap-northeast-1a-72c85d24-0d0c-3317-6d99-xxxxxxxxxxxx cannot provision any more nodes!
I0116 07:10:43.011817       1 auto_scaling_groups.go:161] Updated ASG cache for eks-daemon-spot-202407091400-ap-northeast-1a-56c85d24-0d05-8da9-1e78-xxxxxxxxxxxx. min/max/current is 1/450/2
As a result, some services, such as Datadog, cannot parse this output as valid klog entries.

How to reproduce it (as minimally and precisely as possible):

Trigger the log statement at:

autoscaler/cluster-autoscaler/cloudprovider/aws/auto_scaling_groups.go

Line 551 in 0ddb3d3

klog.Warningf("ASG %s scaling failed with %s", asgRef.Name, *activity)

Anything else we need to know?:

N/A</issue_description>

Comments on the Issue (you are @copilot in this section)
@jackfrancis: /remove-lifecycle stale
@jackfrancis: /assign @copilot
@k8s-ci-robot: @jackfrancis: GitHub didn't allow me to assign the following users: copilot.
Note that only kubernetes members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @copilot

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Fixes #7702

Fix multi-line logging of AWS Activity struct

k8s-ci-robot · 2025-10-21T21:13:37Z

Hi @Copilot. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Co-authored-by: jackfrancis <1494193+jackfrancis@users.noreply.github.com>

k8s-ci-robot · 2025-10-21T21:21:49Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Copilot
Once this PR has been reviewed and has the lgtm label, please ask for approval from jackfrancis. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

cluster-autoscaler/cloudprovider/aws/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jackfrancis · 2025-10-21T21:31:45Z

/ok-to-test

jackfrancis · 2025-10-21T21:34:00Z

/release-note-edit

Fix multi-line logging of AWS Activity struct

jackfrancis · 2025-10-21T21:43:39Z

/label tide/merge-method-squash

Initial plan

6233a52

Copilot AI assigned Copilot and jackfrancis Oct 21, 2025

k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Oct 21, 2025

Copilot started work on behalf of jackfrancis October 21, 2025 21:14 View session

Fix multi-line logging of AWS Activity struct

af33f17

Co-authored-by: jackfrancis <1494193+jackfrancis@users.noreply.github.com>

k8s-ci-robot added area/cluster-autoscaler area/provider/aws Issues or PRs related to aws provider and removed do-not-merge/needs-area labels Oct 21, 2025

Copilot AI changed the title ~~[WIP] Fix logging of multi-line klog output in cluster-autoscaler~~ Fix multi-line logging of AWS Activity struct in cluster-autoscaler Oct 21, 2025

Copilot AI requested a review from jackfrancis October 21, 2025 21:26

Copilot finished work on behalf of jackfrancis October 21, 2025 21:26

k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Oct 21, 2025

jackfrancis marked this pull request as ready for review October 21, 2025 21:31

k8s-ci-robot removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Oct 21, 2025

k8s-ci-robot requested review from drmorr0 and gjtempleton October 21, 2025 21:31

k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 21, 2025

k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Oct 21, 2025
