Integrate deployment metadata service for locking and state#4856
Draft
shreyas-goenka wants to merge 58 commits intomainfrom
Draft
Integrate deployment metadata service for locking and state#4856shreyas-goenka wants to merge 58 commits intomainfrom
shreyas-goenka wants to merge 58 commits intomainfrom
Conversation
Add client integration with the deployment metadata service API for server-side deployment locking and resource state tracking. Gated behind DATABRICKS_BUNDLE_DEPLOYMENT_SERVICE=true environment variable. Co-authored-by: Isaac
Collaborator
|
Commit: 342fef8
|
- Use background context with timeout for CompleteVersion in defer blocks, so the lock is released even if the parent context is cancelled (e.g. Ctrl+C) - Add nil state.ID guard in destroy to avoid querying with zero UUID - Fix misleading --force-lock error message to explain lock expiry behavior - Fix import ordering Co-authored-by: Isaac
Move the deployment metadata service client from bundle/deploy/metadata/service to libs/tempdms with SDK-style method signatures (single request struct param). When the protos land in the Go SDK, migration is just an import path change. Unify deploy and destroy flows: instead of separate *WithMetadataService functions that duplicated all mutator calls, the core logic stays in Deploy() and Destroy() with conditional lock management based on the env var. Co-authored-by: Isaac
The proto HTTP bindings use `body: "deployment"`, `body: "version"`, and `body: "operation"` for Create endpoints, which means only the sub-message goes in the request body. The identifier fields (deployment_id, version_id, resource_key) must be passed as query parameters. Previously these fields were incorrectly included in the request body, which would cause "required field missing" errors against the real service. Also updates the test server to read these fields from query parameters instead of the body, so acceptance tests validate the real API contract. Co-authored-by: Isaac
- Rename VersionCompleteLeaseExpire to VersionCompleteLeaseExpired to match proto enum VERSION_COMPLETE_LEASE_EXPIRED. - Remove redundant "parent" query parameter from ListResources (the deployment ID is already in the URL path). - Add acceptance test for the deployment metadata service integration that validates the correct API call sequence during deploy and destroy. Co-authored-by: Isaac
Use print_requests.py to print all requests to /bundle endpoints at each stage (deploy and destroy) for clear visibility into the API call sequence. Co-authored-by: Isaac
- Rename libs/tempdms package to libs/tmpdms - Rename env var to DATABRICKS_BUNDLE_MANAGED_STATE - Use lineage from resources.json as deployment ID - Write _deployment_id file to state directory - Remove postApplyHook, add inline OperationReporter - Set heartbeat interval to 30 seconds Co-authored-by: Isaac
Fix map[string]string -> map[string]any in tmpdms API client for SDK v0.126.0 compatibility. Generate golden files for metadata-service acceptance test showing the full deploy/destroy request flow. Co-authored-by: Isaac
…tion - Change all enum types from int to string using proto enum name strings (e.g. "OPERATION_ACTION_TYPE_CREATE" instead of 4), matching proto-over-HTTP serialization format. - Report failed operations to the metadata service with error messages, not just successful ones. - Enforce direct deployment engine for managed state (early return). - Extract acquireMetadataLock helper to deduplicate deploy/destroy lock blocks. - Add deploy-error acceptance test verifying failed operation reporting. Co-authored-by: Isaac
When the DATABRICKS_LITESWAP_ID environment variable is set, wrap the SDK HTTP transport to inject the x-databricks-traffic-id header on all API requests. This routes traffic to the liteswap service instance for E2E testing against dev deployments. Usage: DATABRICKS_LITESWAP_ID=my-env databricks bundle deploy Co-authored-by: Isaac
The LiteswapID() function was never called; workspace.go reads DATABRICKS_LITESWAP_ID via os.Getenv directly. Co-authored-by: Isaac
… LoadState - Call DeleteDeployment after successful bundle destroy to clean up the deployment record in the metadata service. - Replace os.Getenv with env.Get for DATABRICKS_LITESWAP_ID so it respects context-based env overrides. - Remove unused deploy.LoadState function. Co-authored-by: Isaac
…apping - Move DeleteDeployment call into the cleanup closure in deployMetadataLock so it runs after CompleteVersion (correct ordering). This removes the need to expose svc/deploymentID via a struct. - Add Resize, UpdateWithID mappings to planActionToOperationAction. - Return error for unsupported action types instead of silently ignoring. - Silently skip no-op actions like Skip. Co-authored-by: Isaac
…locking and state management Introduce two interfaces to enable seamless transition between legacy filesystem-based deployments and the new deployment metadata service: - DeploymentLock: Acquire/Release with factory that selects filesystem or metadata service lock based on DATABRICKS_BUNDLE_MANAGED_STATE env var. - ResourceState Manager: Read/Push with direct and terraform implementations, consolidating the engine-type branching into a single factory. Move metadata service lock code from phases/ to deploy/lock/ package, delete old acquire/release mutators, and update all consumers (deploy, destroy, bind, unbind) to use the new interfaces. Co-authored-by: Isaac
…critic Co-authored-by: Isaac
- filesystem.go → workspace_filesystem.go - Merge metadata_service.go, acquire_metadata.go, heartbeat.go → deployment_metadata_service.go Co-authored-by: Isaac
…S deployment When the metadata service is used for the first time (version ID is "1"), resources with Skip actions are reported as INITIAL_REGISTER operations instead of being silently ignored. This registers all pre-existing resources in the metadata service on the initial deployment. Co-authored-by: Isaac
Tests the scenario where a user enables the metadata service for the first time: pre-existing resources that have no changes (Skip action) are reported as OPERATION_ACTION_TYPE_INITIAL_REGISTER. Co-authored-by: Isaac
Co-authored-by: Isaac
When DMS is active, resource state is read from ListResources and written per-operation via CreateOperation with the state field populated. This eliminates the local state file (Finalize) and workspace state push (PushResourcesState) for DMS deployments. - Add LoadStateFromDMS: calls ListResources to populate the state DB - Include resource state in CreateOperation calls (both initial registration and regular operations) - Skip local state file read/write when DMS is active - Skip PushResourcesState when DMS is active - Add DMS migration design doc Co-authored-by: Isaac
…subsequent On the first DMS deploy (DMSSerial == 0), ListResources returns empty since no resources are registered yet. Fall back to local state so InitialRegistrationReporter can discover and register existing resources. On subsequent deploys (DMSSerial > 0), state is read from the server. Also run Finalize on first DMS deploy to preserve local state as a fallback for reverting from DMS to file-based deployment. Co-authored-by: Isaac
The plan command doesn't acquire a lock, so DMSLineage is never set. Use b.DeploymentID (set during state pull) as a fallback in LoadStateFromDMS to enable reading state from the server for plan. Also simplify CalculatePlan: check StateDB.Path instead of DMSLineage to determine if state was already loaded. Co-authored-by: Isaac
When LoadStateFromDMS returns resources with an ID but no state (e.g., registered before state tracking was enabled), the plan now forces an UpdateWithID action to re-sync the resource instead of failing with "unexpected end of JSON input". Co-authored-by: Isaac
…teWithID Jobs don't implement DoUpdateWithID. When a resource has an ID but no saved state (loaded from DMS before state tracking), use a zero-value state so the diff sees everything as changed and computes a regular Update action. Co-authored-by: Isaac
InitForApply (used with --plan flag) now checks StateDB.Path before calling Open, consistent with CalculatePlan and ExportState. Co-authored-by: Isaac
Commands like bundle summary that read state (InitIDs) but don't deploy now call LoadStateFromDMS when a DMS deployment exists. This ensures ExportState returns correct resource IDs from the server rather than potentially stale local state. Co-authored-by: Isaac
Verifies that after a DMS deploy: - bundle plan reads state via ListResources and reports no changes - bundle summary shows Deployment ID and correct resource URLs - Both commands call ListResources to read state from the server Co-authored-by: Isaac
…ntext Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
DMSSerial should be the version we just created, not the previous one. This also removes lastVersionID from acquireLock's return since it's no longer needed. Updated all DMSSerial checks accordingly (first deploy is now DMSSerial == 1, subsequent is DMSSerial > 1). Co-authored-by: Isaac
Remove InitialRegistrationReporter, migration from file-based state, backup logic, DMSSerial, and the OCC lineage/serial override. DMS projects are assumed to be net new — migration from existing direct deployments will be implemented separately. This removes: - resolveFromExistingResourcesJSON and backup to resources.json.backup - InitialRegistrationReporter type, field, and makeInitialRegistrationReporter - DMSSerial field and all first-deploy vs subsequent-deploy branching - Plan lineage/serial override (server handles OCC) - initial-register acceptance test - Zero-value state fallback for resources without state Co-authored-by: Isaac
…revert state_load.go Consolidate DMSLineage and DeploymentID into a single field on Bundle. Remove Deployment ID from summary template (not needed for core flow). Revert unnecessary diff in state_load.go. Co-authored-by: Isaac
WorkspaceClientE no longer needs a context parameter since liteswap uses os.Getenv. Revert the signature change and all caller updates to minimize the PR diff. Also refactor readStates to return early when DMS is enabled, removing the nested conditional for skipping remote state reads. Co-authored-by: Isaac
LoadStateFromDMS is now called in one place (ProcessBundleRet) for all commands that read state (deploy, plan, summary, destroy). Remove duplicate calls from RunPlan and destroy phase. Simplify the condition to just check b.DeploymentID. Co-authored-by: Isaac
Use b.DeploymentID directly instead of storing a copy on the lock struct. Co-authored-by: Isaac
Co-authored-by: Isaac
json.RawMessage implements json.Marshaler, so the unmarshal-remarshal round-trip was unnecessary. Co-authored-by: Isaac
Co-authored-by: Isaac
Verifies that after deleting local .databricks cache between deploys, the second deploy correctly reads existing state from DMS and only creates the new resource. Also verifies plan reads from DMS after cache deletion. Co-authored-by: Isaac
LoadStateFromDMS now opens the local resources.json (which contains the deployment_id pointer) before populating state from the server. This means Open/AssertOpened work normally and no Path/Loaded guards are needed in CalculatePlan, InitForApply, or ExportState. Co-authored-by: Isaac
Extract lock acquire/release with status tracking into a shared helper. This removes duplicate boilerplate from Deploy and Destroy, and removes the DMS-specific log messages and engine checks (the lock factory already handles backend selection). Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
…, omit state on delete - Use bytes.NewReader instead of strings.NewReader(string(...)) - Remove stale reference to makeInitialRegistrationReporter - Copy TargetName from request body in mock CreateVersion - Don't set Operation.State for delete operations (server rejects it) Co-authored-by: Isaac
Co-authored-by: Isaac
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
/api/2.0/bundle/) for server-side deployment locking and resource state trackingINITIAL_REGISTERfor first-time tracking)Gated behind
DATABRICKS_BUNDLE_DEPLOYMENT_SERVICE=trueenvironment variable. Zero behavior change when the flag is off.New files
bundle/env/deployment_metadata.go— env var definitionbundle/deploy/metadata/service/types.go— Go structs matching the proto APIbundle/deploy/metadata/service/client.go— HTTP client for all deployment metadata endpointsbundle/deploy/metadata/service/heartbeat.go— background lock renewalbundle/phases/deploy_metadata.go— new deploy flow with metadata servicebundle/phases/destroy_metadata.go— new destroy flow with metadata serviceModified files
bundle/deploy/state_update.go— exportLoadState()functionbundle/phases/deploy.go— feature flag checkbundle/phases/destroy.go— feature flag checkTest plan
[[Server]]stubs for deploy/destroy flowsThis pull request was AI-assisted by Isaac.