The Databricks CLI for Platform Teams: Automating Workspace Governance

Shannon Lowder

10 Apr 2020

Every Databricks workspace starts with the same problem: no guardrails. Developers can create 20-node clusters, leave them running indefinitely, install whatever libraries they want, and access any data they have network access to. The first cloud bill after a team gets comfortable is usually the motivation for adding structure. The CLI is how you add that structure programmatically rather than clicking through the UI every time something needs to change.

What the CLI Is Good For in a Platform Context

If you're a data engineer or platform team member responsible for a shared Databricks workspace, the CLI is the tool for:

Deploying cluster policies across environments (dev, staging, prod)
Creating and maintaining secret scopes as part of environment setup
Scripting notebook deployments for CI/CD pipelines
Automating user provisioning and group management
Auditing what's running in the workspace without going through the UI

Deploying Cluster Policies Across Environments

# Create a cluster policy from a JSON file
databricks cluster-policies create --json-file ./policies/data-engineering-policy.json

# List existing policies
databricks cluster-policies list

# Update an existing policy (the JSON passed to edit must include the policy_id)
databricks cluster-policies edit --json-file ./policies/data-engineering-policy.json

The pattern: keep your cluster policies in version control as JSON files. Your CI pipeline deploys them on merge. Dev, staging, and prod environments get the same policies, with environment-specific overrides for things like max cluster size.
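
One way to handle the environment-specific overrides is to keep a shared base definition in version control and merge a small per-environment override into it before deploying. The sketch below assumes the Cluster Policies API payload shape (a name plus a definition serialized as a JSON string). The limits, the policy names, and the build_policy helper are hypothetical, not taken from a real workspace.

```python
import json

# Hypothetical shared limits. Attribute paths follow the cluster policy
# definition format, but the values are illustrative only.
BASE_DEFINITION = {
    "autotermination_minutes": {"type": "range", "maxValue": 60, "defaultValue": 30},
    "autoscale.max_workers": {"type": "range", "maxValue": 4},
}

# Hypothetical per-environment overrides, e.g. a higher cluster ceiling in prod.
OVERRIDES = {
    "dev": {},
    "staging": {"autoscale.max_workers": {"type": "range", "maxValue": 8}},
    "prod": {"autoscale.max_workers": {"type": "range", "maxValue": 20}},
}


def build_policy(environment: str) -> dict:
    """Return a create-policy payload for one environment."""
    # Override entries replace base entries that share the same attribute path.
    definition = {**BASE_DEFINITION, **OVERRIDES[environment]}
    return {
        "name": f"data-engineering-{environment}",
        # Assumption: the API takes the definition as a JSON string, not a nested object.
        "definition": json.dumps(definition, sort_keys=True),
    }


if __name__ == "__main__":
    for env in OVERRIDES:
        print(json.dumps(build_policy(env), indent=2))
```

Writing each payload to its own file under ./policies/ would let the CI job hand it to the create command shown above.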

Workspace Audit: What's Running and What Isn't

# List all clusters and their state
# (--output JSON is required; the default output is a plain-text table)
databricks clusters list --output JSON | python3 -c "
import json, sys
# The 'clusters' key may be absent when the workspace has no clusters
clusters = json.load(sys.stdin).get('clusters', [])
running = [c for c in clusters if c.get('state') == 'RUNNING']
print(f'Running clusters: {len(running)}')
for c in running:
    node_type = c.get('node_type_id', 'unknown')
    num_workers = c.get('num_workers', 0)
    print(f'  {c[\"cluster_name\"]}: {num_workers} {node_type} workers')
"

# Find clusters without auto-termination set
# (a missing or zero autotermination_minutes is treated as 'not set')
databricks clusters list --output JSON | python3 -c "
import json, sys
clusters = json.load(sys.stdin).get('clusters', [])
no_autoterminate = [c for c in clusters
                    if not c.get('autotermination_minutes')]
print(f'Clusters without auto-termination: {len(no_autoterminate)}')
for c in no_autoterminate:
    print(f'  {c[\"cluster_name\"]} ({c[\"state\"]})')
"

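The two checks above can also be folded into one function that a scheduled job calls. This is a minimal sketch, assuming the same JSON shape the commands above parse (a top-level clusters list). The audit_clusters helper and its report keys are hypothetical, and it assumes autoscaling clusters carry an autoscale block with max_workers rather than a fixed num_workers.

```python
import json


def audit_clusters(payload: dict) -> dict:
    """Summarize running clusters and clusters with no auto-termination."""
    clusters = payload.get("clusters", [])  # key may be absent when no clusters exist
    running = []
    for c in clusters:
        if c.get("state") != "RUNNING":
            continue
        # Assumption: autoscaling clusters report 'autoscale' instead of 'num_workers',
        # so the upper bound is used as the worst-case size.
        workers = c.get("autoscale", {}).get("max_workers", c.get("num_workers", 0))
        running.append((c.get("cluster_name", "unnamed"), workers))
    # A missing or zero autotermination_minutes is treated as "never shuts down".
    no_autoterminate = [
        c.get("cluster_name", "unnamed")
        for c in clusters
        if not c.get("autotermination_minutes")
    ]
    return {"running": running, "no_autoterminate": no_autoterminate}


if __name__ == "__main__":
    # Hypothetical sample payload standing in for the CLI's JSON output.
    sample = {
        "clusters": [
            {"cluster_name": "etl", "state": "RUNNING", "num_workers": 2,
             "autotermination_minutes": 30},
            {"cluster_name": "adhoc", "state": "RUNNING",
             "autoscale": {"min_workers": 1, "max_workers": 8}},
        ]
    }
    print(json.dumps(audit_clusters(sample), indent=2))
```

To run it against a live workspace, replace the sample with json.load(sys.stdin) and pipe the JSON output of databricks clusters list into the script.
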
Bulk Secret Scope Setup

When setting up a new environment, automate the secret scope creation rather than clicking through the UI:

#!/bin/bash
# setup-environment.sh
# Usage: ./setup-environment.sh <dev|staging|prod>

# Stop on the first failed command or unset variable
set -euo pipefail

# Fail with a usage message if no environment is given
ENVIRONMENT="${1:?usage: $0 <dev|staging|prod>}"

# Create secret scope for this environment
databricks secrets create-scope --scope "myproject-${ENVIRONMENT}"

# Grant read access to the appropriate group
databricks secrets put-acl \
  --scope "myproject-${ENVIRONMENT}" \
  --principal "data-engineering-${ENVIRONMENT}" \
  --permission READ

echo "Secret scope myproject-${ENVIRONMENT} created"
echo "Add secrets with: databricks secrets put --scope myproject-${ENVIRONMENT} --key <key-name>"

The manual step (adding actual secret values) stays manual — you shouldn't store secret values in scripts. But the structural setup (scope creation, ACLs, policies) is scriptable and should be in your infrastructure-as-code repository. As always, I'm here to help.
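
To run the same structural setup for every environment in one pass, the commands from the script above can be generated in a loop. This is a sketch under assumptions: it reuses the myproject and data-engineering naming from the script, the setup_commands helper is hypothetical, and it only prints the commands as a dry run rather than executing them.

```python
import shlex
from typing import List

ENVIRONMENTS = ["dev", "staging", "prod"]


def setup_commands(environment: str) -> List[List[str]]:
    """Build the CLI invocations that create one scope and its read ACL."""
    scope = f"myproject-{environment}"
    group = f"data-engineering-{environment}"
    return [
        ["databricks", "secrets", "create-scope", "--scope", scope],
        ["databricks", "secrets", "put-acl", "--scope", scope,
         "--principal", group, "--permission", "READ"],
    ]


if __name__ == "__main__":
    for env in ENVIRONMENTS:
        for argv in setup_commands(env):
            # Dry run: print the command. To execute it instead,
            # pass argv to subprocess.run(argv, check=True).
            print(shlex.join(argv))
```

Building each command as an argument list rather than a single string avoids shell quoting problems if a scope or group name ever contains unusual characters.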
