@jugal-chauhan jugal-chauhan commented Nov 18, 2025

Description

This PR replaces the static workflow definitions with a DSL-generated WorkflowTemplate, testMigrationWithWorkflowCli, that executes migration workflows via the workflow CLI. This aligns the Jenkins integration tests with the production deployment model.

Changes include

Removed:

  • Old workflow definitions (reduced file count and maintenance burden)

Added:

  • testMigrationWithWorkflowCli.ts - DSL-generated WorkflowTemplate that:
    • Configures and submits migration workflows programmatically via CLI
    • Monitors workflow execution, deriving exit codes from the output of workflow status

This workflow has three main templates:

  1. configureAndSubmitWorkflow - Builds migration config JSON and submits via workflow configure + workflow submit
  2. monitorWorkflow - Polls workflow status, auto-approves suspended workflows, exits on success/failure
  3. testRunMigration - entrypoint

Issues Resolved

MIGRATIONS-2750

Testing

None yet

Check List

  • New functionality includes testing
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…to run a migration

Signed-off-by: Jugal Chauhan <jugaldc@amazon.com>

codecov bot commented Nov 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.39%. Comparing base (847e824) to head (ad4709f).

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #1983   +/-   ##
=========================================
  Coverage     78.39%   78.39%           
  Complexity       21       21           
=========================================
  Files           603      603           
  Lines         24180    24180           
  Branches       1844     1844           
=========================================
  Hits          18955    18955           
  Misses         4328     4328           
  Partials        897      897           
Flag Coverage Δ
gradle 76.52% <ø> (ø)
node 90.02% <ø> (ø)
python 78.80% <ø> (ø)

popd
}
apply_workflows "workflows"

I like this approach: it makes it easy to keep a production workflows directory separate from a test one (testWorkflows). That also solves some of the Gradle directory collision issues that I described previously.

import {CreateSnapshot} from "./createSnapshot";
import {DocumentBulkLoad} from "./documentBulkLoad";
import {FullMigration} from "./fullMigration";
import {testMigrationWithWorkflowCli} from "./testMigrationWithWorkflowCli";

Please fix the casing to be consistent with the other imports, which are PascalCase (CreateSnapshot, DocumentBulkLoad, FullMigration).

import {CommonWorkflowParameters} from "./commonUtils/workflowParameters";
import {makeRequiredImageParametersForKeys} from "./commonUtils/imageDefinitions";

const configComponentParameters = {

TODO (can be done later): We'll need to add HTTP auth (see here)

{
"snapshotConfig": {
"snapshotNameConfig": {
"snapshotNamePrefix": "source1snapshot1"

This should probably include the string "test" or "workflowtest" somehow, so that if we ever run this on a cluster that wasn't meant to be a test cluster, one would understand where the snapshot came from.
You can also include the workflow uid or name with expr.getWorkflowValue("uid") (here); see the Argo docs for which variables are supported.
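
As a rough illustration of that naming suggestion, here is a minimal sketch. The helper name testSnapshotPrefix, the marker string, and the sample uid are hypothetical and not part of this PR. It uses plain string handling rather than the DSL's expr.getWorkflowValue, and assumes Argo's {{workflow.uid}} variable is substituted when the template runs.

```typescript
// Hypothetical helper (not part of this PR): builds a snapshot name prefix
// that is recognizably test-generated and tied to a single workflow run.
const TEST_MARKER = "workflowtest"; // assumed marker; any agreed-upon string works

function testSnapshotPrefix(base: string, workflowUid: string): string {
    // Join the marker, the original prefix, and the run identifier with dashes.
    return [TEST_MARKER, base, workflowUid].join("-");
}

// With a concrete uid (made-up value, for illustration only):
const concrete = testSnapshotPrefix("source1snapshot1", "0f3c9a");
console.log(concrete); // workflowtest-source1snapshot1-0f3c9a

// With Argo's {{workflow.uid}} variable left in place for later substitution:
const templated = testSnapshotPrefix("source1snapshot1", "{{workflow.uid}}");
console.log(templated); // workflowtest-source1snapshot1-{{workflow.uid}}
```

Putting the marker first keeps test snapshots grouped together when names are listed alphabetically.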

Comment on lines +118 to +119
WORKFLOW_OUTPUT=$(workflow submit 2>&1)
echo "Workflow submit output: $WORKFLOW_OUTPUT"

nit - would be safer to just dump this directly to stdout rather than to a variable

echo "$STATUS_OUTPUT"
RESULT="ERROR"
EXIT_CODE=2

Rename this to something like MONITOR_SHOULD_CONTINUE (TRUE/FALSE), and then return 0 or non-zero at the end.
You could also just remove the variable and exit with 0 when RESULT is "SUCCESS" or "FAILURE" (and let all other results retry).
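
The second option can be sketched as a small mapping from the monitor result to an exit code. This is only an illustration in TypeScript, not the shell script under review. The function name monitorExitCode and the "RUNNING" value are assumptions; "SUCCESS", "FAILURE" and "ERROR" are the values mentioned in this thread.

```typescript
// Hypothetical sketch of the suggested exit-code rule (the real logic lives
// in the monitor shell script, which is not reproduced here).
function monitorExitCode(result: string): number {
    // Terminal results exit 0 so the retry loop stops; the actual outcome
    // would be carried by the output parameter, not by the exit code.
    if (result === "SUCCESS" || result === "FAILURE") {
        return 0;
    }
    // Anything else (for example "ERROR", or an assumed "RUNNING") exits
    // non-zero so that the step is retried.
    return 1;
}

console.log(monitorExitCode("SUCCESS")); // 0
console.log(monitorExitCode("FAILURE")); // 0
console.log(monitorExitCode("ERROR"));   // 1
```

With this rule the exit code only answers "should monitoring continue?", which is why a separate check on the result is still needed to fail the workflow.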

.addCommand(["/bin/sh", "-c"])
.addResources(DEFAULT_RESOURCES.MIGRATION_CONSOLE_CLI)
.addArgs([WORKFLOW_MONITOR_SCRIPT])
.addPathOutput(

Where/how do you use this output? I don't see any other occurrence of monitorResult. I would expect to see it in testRunMigration, where it would determine whether that template itself should return success or failure.

)
.addRetryParameters({
limit: "864", // ~72 hours (864 * 5min max backoff)
retryPolicy: "Always",

You need to return with a failure and not retry when there's a permanent failure. There are two ways to solve this. One is to add another task to the step here, grabbing the output parameter and testing it (run a failure template conditionally). The other would be to make the retry smarter to retry only transient errors.

Transient errors are defined here, but I wouldn't suggest using that: the definition of a transient error is global, so the test change would affect production workflows. Just throwing it out there though.

It certainly would be cleaner to tweak the retry - though permanent failures shouldn't be common & adding another container task like this isn't bad.
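
To put a number on what retrying a permanent failure could cost, the retry budget from the hunk above (limit 864, 5 minute maximum backoff) can be checked with a little arithmetic. The function name is hypothetical, and the calculation assumes every retry waits the full maximum backoff and ignores the run time of each attempt.

```typescript
// Hypothetical helper: worst-case time spent waiting between retries,
// assuming each retry waits the maximum backoff.
function worstCaseRetryHours(retryLimit: number, maxBackoffMinutes: number): number {
    return (retryLimit * maxBackoffMinutes) / 60;
}

// Values taken from the retry parameters under review.
console.log(worstCaseRetryHours(864, 5)); // 72
```

So with retryPolicy "Always", a failure that can never succeed could in the worst case hold the test open for roughly three days before the limit is reached.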

Here's the failure check and template:

- name: explicit-fail
  container:
    image: busybox
    command: [sh, -c, "echo 'Workflow condition failed' && exit 1"]

... from some steps
  - - name: fail-workflow
      template: explicit-fail
      when: "{{steps.validate.outputs.parameters.monitorResult}} == FAILED"
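
For reference, the conditional step above could be expressed as plain data in TypeScript roughly as follows. This is a sketch only: it builds ordinary objects and does not use the PR's DSL builder API, and the interface and function names are hypothetical. The step name "validate" and the value "FAILED" come from the YAML example and would need to match the real monitor step and the value it actually writes.

```typescript
// Hypothetical shape for a conditional Argo step (a subset of the real schema).
interface ConditionalStep {
    name: string;
    template: string;
    when: string;
}

// Builds the step that runs the explicit-fail template when the named
// monitor step reported the given failure value.
function failStepFor(monitorStepName: string, failureValue: string): ConditionalStep {
    return {
        name: "fail-workflow",
        template: "explicit-fail",
        when: `{{steps.${monitorStepName}.outputs.parameters.monitorResult}} == ${failureValue}`,
    };
}

console.log(failStepFor("validate", "FAILED").when);
// {{steps.validate.outputs.parameters.monitorResult}} == FAILED
```

Passing the step name and failure value as parameters keeps the when expression in one place if either is renamed.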
