Integrate deployment metadata service for locking and state#4856

Draft
shreyas-goenka wants to merge 11 commits into main from shreyas-goenka/deployment-metadata-service

@shreyas-goenka
Contributor

Summary

  • Add client integration with the deployment metadata service API (/api/2.0/bundle/) for server-side deployment locking and resource state tracking
  • Replace file-based workspace locks with CreateVersion (lock acquire) / CompleteVersion (lock release) when the feature is enabled
  • Report resource operations to the metadata service after each deploy (using INITIAL_REGISTER for first-time tracking)
  • Background heartbeat goroutine keeps the lock alive during long-running deployments

Gated behind DATABRICKS_BUNDLE_DEPLOYMENT_SERVICE=true environment variable. Zero behavior change when the flag is off.

New files

  • bundle/env/deployment_metadata.go — env var definition
  • bundle/deploy/metadata/service/types.go — Go structs matching the proto API
  • bundle/deploy/metadata/service/client.go — HTTP client for all deployment metadata endpoints
  • bundle/deploy/metadata/service/heartbeat.go — background lock renewal
  • bundle/phases/deploy_metadata.go — new deploy flow with metadata service
  • bundle/phases/destroy_metadata.go — new destroy flow with metadata service

Modified files

  • bundle/deploy/state_update.go — export LoadState() function
  • bundle/phases/deploy.go — feature flag check
  • bundle/phases/destroy.go — feature flag check

Test plan

  • Unit tests for metadata service client
  • Acceptance tests with [[Server]] stubs for deploy/destroy flows
  • E2E test against dev workspace with service deployed
  • Verify zero behavior change with flag OFF (existing acceptance tests pass)

This pull request was AI-assisted by Isaac.

Add client integration with the deployment metadata service API for
server-side deployment locking and resource state tracking. Gated behind
DATABRICKS_BUNDLE_DEPLOYMENT_SERVICE=true environment variable.

Co-authored-by: Isaac
@eng-dev-ecosystem-bot
Collaborator

eng-dev-ecosystem-bot commented Mar 26, 2026

Commit: 342fef8

Run: 23764962616

| | Env | ✅ pass | 🙈 skip | Time |
|---|---|---|---|---|
| ✅ | aws linux | 73 | 14 | 0:46 |
| ✅ | aws windows | 73 | 14 | 0:32 |
| ✅ | aws-ucws linux | 76 | 11 | 0:41 |
| ✅ | aws-ucws windows | 76 | 11 | 0:30 |
| ✅ | azure linux | 72 | 14 | 0:49 |
| ✅ | azure windows | 72 | 14 | 0:39 |
| ✅ | azure-ucws linux | 75 | 11 | 0:43 |
| ✅ | azure-ucws windows | 75 | 11 | 0:36 |
| ✅ | gcp linux | 73 | 14 | 0:42 |
| ✅ | gcp windows | 73 | 14 | 0:31 |

- Use background context with timeout for CompleteVersion in defer blocks,
  so the lock is released even if the parent context is cancelled (e.g. Ctrl+C)
- Add nil state.ID guard in destroy to avoid querying with zero UUID
- Fix misleading --force-lock error message to explain lock expiry behavior
- Fix import ordering

Co-authored-by: Isaac
Move the deployment metadata service client from bundle/deploy/metadata/service
to libs/tempdms with SDK-style method signatures (single request struct param).
When the protos land in the Go SDK, migration is just an import path change.

Unify deploy and destroy flows: instead of separate *WithMetadataService
functions that duplicated all mutator calls, the core logic stays in Deploy()
and Destroy() with conditional lock management based on the env var.

Co-authored-by: Isaac
The proto HTTP bindings use `body: "deployment"`, `body: "version"`, and
`body: "operation"` for Create endpoints, which means only the sub-message
goes in the request body. The identifier fields (deployment_id, version_id,
resource_key) must be passed as query parameters.

Previously these fields were incorrectly included in the request body,
which would cause "required field missing" errors against the real service.

Also updates the test server to read these fields from query parameters
instead of the body, so acceptance tests validate the real API contract.

Co-authored-by: Isaac
- Rename VersionCompleteLeaseExpire to VersionCompleteLeaseExpired to
  match proto enum VERSION_COMPLETE_LEASE_EXPIRED.
- Remove redundant "parent" query parameter from ListResources (the
  deployment ID is already in the URL path).
- Add acceptance test for the deployment metadata service integration
  that validates the correct API call sequence during deploy and destroy.

Co-authored-by: Isaac
Use print_requests.py to print all requests to /bundle endpoints at
each stage (deploy and destroy) for clear visibility into the API
call sequence.

Co-authored-by: Isaac
- Rename libs/tempdms package to libs/tmpdms
- Rename env var to DATABRICKS_BUNDLE_MANAGED_STATE
- Use lineage from resources.json as deployment ID
- Write _deployment_id file to state directory
- Remove postApplyHook, add inline OperationReporter
- Set heartbeat interval to 30 seconds

Co-authored-by: Isaac
Fix map[string]string -> map[string]any in tmpdms API client for SDK
v0.126.0 compatibility. Generate golden files for metadata-service
acceptance test showing the full deploy/destroy request flow.

Co-authored-by: Isaac
…tion

- Change all enum types from int to string using proto enum name strings
  (e.g. "OPERATION_ACTION_TYPE_CREATE" instead of 4), matching proto-over-HTTP
  serialization format.
- Report failed operations to the metadata service with error messages,
  not just successful ones.
- Enforce direct deployment engine for managed state (early return).
- Extract acquireMetadataLock helper to deduplicate deploy/destroy lock blocks.
- Add deploy-error acceptance test verifying failed operation reporting.

Co-authored-by: Isaac
When the DATABRICKS_LITESWAP_ID environment variable is set, wrap the
SDK HTTP transport to inject the x-databricks-traffic-id header on all
API requests. This routes traffic to the liteswap service instance for
E2E testing against dev deployments.

Usage: DATABRICKS_LITESWAP_ID=my-env databricks bundle deploy

Co-authored-by: Isaac
The LiteswapID() function was never called; workspace.go reads
DATABRICKS_LITESWAP_ID via os.Getenv directly.

Co-authored-by: Isaac
2 participants