Skip to content

Conversation

@diamonwiggins
Copy link
Contributor

What this PR does / why we need it:

Implements file-based state persistence for kURL to Embedded Cluster migration to enable resume capability after interruptions.

Key Changes:

  • File-based store: Migration state persists to /var/lib/embedded-cluster/migration-state.json with atomic writes (temp file + os.Rename pattern)
  • Thread safety: Uses sync.RWMutex for concurrent access protection
  • Data isolation: Deep copy via github.com/tiendc/go-deepcopy prevents mutation
  • Hybrid storage: Handles SetUserConfig() called before InitializeMigration() by temporarily storing config in memory
  • Code quality improvements: Refactored NewStoreWithDataDir() to eliminate 45 lines of duplication by reusing NewMemoryStore()
  • Testing: Table-driven unit tests following project standards, plus E2E dry run test

Technical Implementation:

  • fileStore implements the Store interface as drop-in replacement for memoryStore
  • Atomic writes prevent file corruption during crashes
  • State includes: migrationID, transferMode, config, userConfig, status (state, phase, progress, error)
  • File permissions: 0644 (owner read/write, group/others read)

This foundation enables kURL migration to survive process restarts and provides reliable state tracking for the multi-phase migration orchestration.

Which issue(s) this PR fixes:

Fixes https://app.shortcut.com/replicated-prod/story/130972

Does this PR require a test?

Yes - Comprehensive testing added:

Unit Tests (api/internal/store/kurlmigration/file_store_test.go):

  • Table-driven tests for InitializeMigration (3 scenarios), SetState (2 scenarios), GetUserConfig (3 scenarios)
  • Deep copy verification in GetConfig and GetStatus tests
  • File-specific tests: atomic writes, permissions (0644), JSON formatting, corrupted file handling
  • Concurrency tests: concurrent reads and writes
  • Persistence across store instances

Dry Run Test (tests/dryrun/upgrade_kurl_migration_test.go):

  • Test verifying migration state persists to disk
  • Validates file exists at /var/lib/embedded-cluster/migration-state.json
  • Verifies JSON contains migrationID, state, phase, and error fields
  • Tests actual API flow: start migration → poll status → verify persistence

…github.com:replicatedhq/embedded-cluster into diamonwiggins/130971/kurl-migration-api-foundation
…github.com:replicatedhq/embedded-cluster into diamonwiggins/130971/kurl-migration-api-foundation
…github.com:replicatedhq/embedded-cluster into diamonwiggins/130971/kurl-migration-api-foundation
@github-actions
Copy link

github-actions bot commented Dec 2, 2025

This PR has been released (on staging) and is available for download with a embedded-cluster-smoke-test-staging-app license ID.

Online Installer:

curl "https://staging.replicated.app/embedded/embedded-cluster-smoke-test-staging-app/ci/appver-dev-a77b58b" -H "Authorization: $EC_SMOKE_TEST_LICENSE_ID" -o embedded-cluster-smoke-test-staging-app-ci.tgz

Airgap Installer (may take a few minutes before the airgap bundle is built):

curl "https://staging.replicated.app/embedded/embedded-cluster-smoke-test-staging-app/ci-airgap/appver-dev-a77b58b?airgap=true" -H "Authorization: $EC_SMOKE_TEST_LICENSE_ID" -o embedded-cluster-smoke-test-staging-app-ci.tgz

Happy debugging!

Copy link
Member

@JGAntunes JGAntunes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small comments/questions.

Comment on lines +34 to +41
// persistedState represents the structure saved to migration-state.json
type persistedState struct {
MigrationID string `json:"migrationId"`
TransferMode string `json:"transferMode"`
Config types.LinuxInstallationConfig `json:"config"` // resolved/merged config
UserConfig types.LinuxInstallationConfig `json:"userConfig"` // user-provided config
Status types.KURLMigrationStatusResponse `json:"status"`
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably re-use the same state between file and memory store:

  • // memoryStore is an in-memory implementation of Store
    type memoryStore struct {
    migrationID string
    transferMode string
    config types.LinuxInstallationConfig // resolved/merged config
    userConfig types.LinuxInstallationConfig // user-provided config
    status types.KURLMigrationStatusResponse
    initialized bool
    mu sync.RWMutex
    }

Comment on lines +38 to +39
Config types.LinuxInstallationConfig `json:"config"` // resolved/merged config
UserConfig types.LinuxInstallationConfig `json:"userConfig"` // user-provided config
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still unclear to me how these 2 will defer, I guess that's logic that will land later?

Thoughts on renaming Config to something clearer (e.g. ResolvedConfig or MergedConfig like you have have in the comment)

Comment on lines +606 to +629
func TestFileStore_JSONFormatting(t *testing.T) {
tmpDir := t.TempDir()
store := NewFileStore(tmpDir)

config := types.LinuxInstallationConfig{
DataDirectory: "/var/lib/embedded-cluster",
}
err := store.InitializeMigration("test-id", "copy", config)
require.NoError(t, err)

// Read file and verify it's valid, pretty-printed JSON
statePath := filepath.Join(tmpDir, "migration-state.json")
data, err := os.ReadFile(statePath)
require.NoError(t, err)

// Verify it's valid JSON
var jsonData map[string]interface{}
err = json.Unmarshal(data, &jsonData)
require.NoError(t, err)

// Verify pretty-printing (should contain newlines and indentation)
assert.Contains(t, string(data), "\n")
assert.Contains(t, string(data), " ")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels implementation specific, I wonder if it's worth testing out? Unless we deem the storage format as part of our API/document that. Even if we do the pretty printing validation seems unnecessary IMO.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants