@github-actions
Contributor

These files are used for picking the starting (pre-upgrade) or ending (post-upgrade) agent versions in upgrade integration tests.

The content is based on responses from https://www.elastic.co/api/product_versions and https://snapshots.elastic.co

The current update is generated based on the following requirements:

`.package-version`

```json
{
  "version": "9.3.0-SNAPSHOT",
  "build_id": "9.3.0-94331b25",
  "manifest_url": "https://snapshots.elastic.co/9.3.0-94331b25/manifest-9.3.0-SNAPSHOT.json",
  "summary_url": "https://snapshots.elastic.co/9.3.0-94331b25/summary-9.3.0-SNAPSHOT.html",
  "core_version": "9.3.0",
  "stack_build_id": "9.3.0-94331b25-SNAPSHOT"
}
```
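
The fields in `.package-version` are redundant with one another, so a quick consistency check can catch a hand-edited or partially updated file. The following is a minimal sketch, not part of the agent's tooling: the function names `snapshot_urls` and `check_package_version` are hypothetical, and the URL pattern is inferred only from the values shown above.

```python
SNAPSHOTS_BASE = "https://snapshots.elastic.co"

# Sample taken verbatim from the .package-version content in this PR.
SAMPLE_PACKAGE_VERSION = {
    "version": "9.3.0-SNAPSHOT",
    "build_id": "9.3.0-94331b25",
    "manifest_url": "https://snapshots.elastic.co/9.3.0-94331b25/manifest-9.3.0-SNAPSHOT.json",
    "summary_url": "https://snapshots.elastic.co/9.3.0-94331b25/summary-9.3.0-SNAPSHOT.html",
    "core_version": "9.3.0",
    "stack_build_id": "9.3.0-94331b25-SNAPSHOT",
}


def snapshot_urls(version: str, build_id: str) -> dict:
    """Rebuild the manifest and summary URLs from a version and a build ID.

    The pattern is an assumption inferred from the sample above.
    """
    return {
        "manifest_url": f"{SNAPSHOTS_BASE}/{build_id}/manifest-{version}.json",
        "summary_url": f"{SNAPSHOTS_BASE}/{build_id}/summary-{version}.html",
    }


def check_package_version(pkg: dict) -> list:
    """Return a list of inconsistencies found in pkg (empty if none)."""
    problems = []

    # The two URLs should be derivable from version and build_id.
    for key, expected in snapshot_urls(pkg["version"], pkg["build_id"]).items():
        if pkg.get(key) != expected:
            problems.append(f"{key}: expected {expected}, got {pkg.get(key)}")

    # The build ID is expected to start with the core version.
    if not pkg["build_id"].startswith(pkg["core_version"] + "-"):
        problems.append("build_id does not start with core_version")

    # The stack build ID is expected to be the build ID plus a -SNAPSHOT suffix.
    if pkg.get("stack_build_id") != pkg["build_id"] + "-SNAPSHOT":
        problems.append("stack_build_id does not match build_id")

    return problems
```

Calling `check_package_version(SAMPLE_PACKAGE_VERSION)` on the values above should report no problems; changing only `build_id` should flag the two URLs and the stack build ID.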

`testing/integration/testdata/.upgrade-test-agent-versions.yml`

```json
{
  "UpgradeToVersion": "9.3.0",
  "CurrentMajors": 1,
  "PreviousMajors": 1,
  "PreviousMinors": 2,
  "SnapshotBranches": [
    "9.2",
    "9.1",
    "8.19",
    "7.17"
  ]
}
```

@github-actions github-actions bot requested a review from a team as a code owner November 21, 2025 00:33
@github-actions github-actions bot added the Team:Elastic-Agent-Control-Plane, Team:Elastic-Agent, backport-skip, skip-changelog, and update-versions labels Nov 21, 2025
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@elasticmachine
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@swiatekm swiatekm force-pushed the update/main-update-agent-versions-19555764160 branch 4 times, most recently from 4b89384 to d85e410 Compare November 22, 2025 15:26
@pchila pchila self-assigned this Nov 24, 2025
@swiatekm swiatekm force-pushed the update/main-update-agent-versions-19555764160 branch from d85e410 to dcba73f Compare November 24, 2025 12:00
@swiatekm
Contributor

There have been consistent issues with the ESS stack over several CI runs, so I suspect this update itself is at fault.

@pchila
Member

pchila commented Nov 24, 2025

@swiatekm please have a look at #11375.
The issue that was present when I reviewed this PR earlier today stems from an incomplete set of env vars passed during retries of FIPS steps.

I am not sure what else you mean by

> There have been consistent issues with the ESS stack over several CI runs, so I suspect this update itself is at fault.

Could you be more specific?

@swiatekm
Contributor

@pchila for example https://buildkite.com/elastic/elastic-agent/builds/30767/steps/canvas?jid=019aa6a8-cd11-4ca9-92e7-4bdb3fde05c8 had the following error:

linux_deb_test.go:308: error getting agent version: error calling get agent API: Get "https://c3538372bd4745b19cca8ff55dadd3fe.us-west2.gcp.elastic-cloud.com:443/api/fleet/agents/fc0dfbc0-bf5b-47ca-b223-256b42b60cf1": context canceled

This is not a FIPS test, and the failure doesn't look retry-related.

@pchila
Member

pchila commented Nov 24, 2025

> @pchila for example https://buildkite.com/elastic/elastic-agent/builds/30767/steps/canvas?jid=019aa6a8-cd11-4ca9-92e7-4bdb3fde05c8 had the following error:
>
> linux_deb_test.go:308: error getting agent version: error calling get agent API: Get "https://c3538372bd4745b19cca8ff55dadd3fe.us-west2.gcp.elastic-cloud.com:443/api/fleet/agents/fc0dfbc0-bf5b-47ca-b223-256b42b60cf1": context canceled
>
> This is not a FIPS test, and the failure doesn't look retry-related.

@swiatekm Are you going to investigate that failure?

@swiatekm
Contributor

> > @pchila for example https://buildkite.com/elastic/elastic-agent/builds/30767/steps/canvas?jid=019aa6a8-cd11-4ca9-92e7-4bdb3fde05c8 had the following error:
> >
> > linux_deb_test.go:308: error getting agent version: error calling get agent API: Get "https://c3538372bd4745b19cca8ff55dadd3fe.us-west2.gcp.elastic-cloud.com:443/api/fleet/agents/fc0dfbc0-bf5b-47ca-b223-256b42b60cf1": context canceled
> >
> > This is not a FIPS test, and the failure doesn't look retry-related.
>
> @swiatekm Are you going to investigate that failure?

Not this week, but I was planning to do it next Monday if a new version doesn't fix it by then.

@jlind23
Contributor

jlind23 commented Nov 24, 2025

@swiatekm next time, please check with the person who self-assigned this before doing anything with the PR. Going forward, this would avoid duplicated effort.

@swiatekm
Contributor

> @swiatekm next time, please check with the person who self-assigned this before doing anything with the PR. Going forward, this would avoid duplicated effort.

We've recently had some flaky tests in our CI that we were progressively addressing, and I'd rebased this PR a few times as this went on, hoping those fixes would address the problems with it. I'd also noted that some of the test failures looked like they were related to the new stack version this PR is updating to, but didn't have time to properly investigate. All of this happened before @pchila assigned it to himself today. When I saw him do it, I pointed out my observations to help point him in the right direction.

What exactly should I have done differently, in your view?

@jlind23
Contributor

jlind23 commented Nov 24, 2025

[Image: Screenshot 2025-11-24 at 20 04 00]

Not force-pushing without reaching out to him after he self-assigned this would have been a better approach, IMO.

@pchila
Member

pchila commented Nov 24, 2025

Buildkite test this

@pchila
Member

pchila commented Nov 25, 2025

On the latest run, using manifest https://snapshots.elastic.co/9.3.0-94331b25/manifest-9.3.0-SNAPSHOT.json, the stack is using commit https://github.com/elastic/fleet-server/commits/f687ef4c11b0bb65c73c8e0b82124a2fef421149 for fleet-server. This commit includes PR elastic/fleet-server#5834, which has since been reverted.

Retry steps are now working correctly (spinning up an ECH deployment in the correct region for both FIPS and non-FIPS tests).

The test failures appear related to the upgraded agent not appearing online with the correct version in the fleet-server output, so there's a good chance that this specific fleet-server version has a bug that causes it.

I will close this PR and re-run the automation to see if the latest stack version behaves differently.
At the time of writing, the latest manifest is https://snapshots.elastic.co/9.3.0-9546ac47/manifest-9.3.0-SNAPSHOT.json, which includes the revert of elastic/fleet-server#5834.
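
For anyone wanting to repeat the check of which fleet-server commit a given snapshot manifest pins, here is a rough sketch. It assumes the manifest JSON has a top-level `projects` object keyed by project name, each with a `commit_hash` field; that layout is an assumption and is not shown anywhere in this thread. The helper names `fetch_manifest` and `project_commit` are hypothetical.

```python
import json
from typing import Optional
from urllib.request import urlopen


def fetch_manifest(url: str, timeout: float = 30.0) -> dict:
    """Download a snapshot manifest and parse it as JSON."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def project_commit(manifest: dict, project: str) -> Optional[str]:
    """Return the commit hash recorded for a project, or None if absent.

    Assumes the layout manifest["projects"][project]["commit_hash"],
    which is a guess at the schema rather than a documented contract.
    """
    return manifest.get("projects", {}).get(project, {}).get("commit_hash")


def same_commit(manifest_a: dict, manifest_b: dict, project: str) -> bool:
    """Tell whether two manifests pin the same known commit for a project."""
    a = project_commit(manifest_a, project)
    b = project_commit(manifest_b, project)
    return a is not None and a == b


# Example usage (performs network requests, so it is left commented out):
# old = fetch_manifest("https://snapshots.elastic.co/9.3.0-94331b25/manifest-9.3.0-SNAPSHOT.json")
# new = fetch_manifest("https://snapshots.elastic.co/9.3.0-9546ac47/manifest-9.3.0-SNAPSHOT.json")
# print(project_commit(old, "fleet-server"), project_commit(new, "fleet-server"))
```

If the schema assumption holds, comparing the two manifests this way would show whether the newer build has moved past the commit mentioned above.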

@pchila pchila closed this Nov 25, 2025
@swiatekm
Contributor

> [Image: Screenshot 2025-11-24 at 20 04 00]
>
> Not force-pushing without reaching out to him after he self-assigned this would have been a better approach, IMO.

You're right, sorry about that @pchila. For the record, this was me rebasing the branch on main through GitHub, but it was still rude to do.
