dbutils.notebook: Orchestrating Databricks From Databricks

Shannon Lowder

26 Jul 2019

In SQL Server, you call a stored procedure from another stored procedure with EXEC. In Databricks, you call a notebook from another notebook with dbutils.notebook.run(). The concept is the same — modular, reusable units of logic that you orchestrate into a pipeline. The implementation is different enough to matter.

dbutils.notebook.run()

# Call a notebook and pass parameters to it
result = dbutils.notebook.run(
    path="/pipelines/process_region",
    timeout_seconds=3600,
    arguments={
        "region": "West",
        "processing_date": "2019-07-26",
        "environment": "prod"
    }
)

print(f"Notebook returned: {result}")

The called notebook receives the arguments as widget values, accessible via dbutils.widgets.get("region"). Widget values are always strings, so the called notebook has to parse anything that isn't text. The timeout_seconds parameter is a hard deadline: if the notebook doesn't complete within that time, the call throws an exception. A value of 0 means no timeout.
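Because every argument arrives on the other side as a string, it can help to do the conversion explicitly in the calling notebook rather than relying on implicit conversion. A minimal sketch, assuming two hypothetical helpers (build_arguments and parse_processing_date are illustrative names, not part of dbutils): one formats the date as ISO 8601 before the call, the other turns the widget string back into a date inside the called notebook.

```python
from datetime import date
from typing import Dict


def build_arguments(region: str, processing_date: date, environment: str) -> Dict[str, str]:
    """Hypothetical helper: build the arguments map for dbutils.notebook.run().

    Widget values are strings on the receiving side, so every value is
    converted to a string here, with the date in ISO 8601 (YYYY-MM-DD) form.
    """
    return {
        "region": str(region),
        "processing_date": processing_date.isoformat(),
        "environment": str(environment),
    }


def parse_processing_date(value: str) -> date:
    """Hypothetical helper for the called notebook: widget string back to a date."""
    return date.fromisoformat(value)


args = build_arguments("West", date(2019, 7, 26), "prod")
print(args)
# In a Databricks notebook you would then call something like:
# result = dbutils.notebook.run("/pipelines/process_region", 3600, args)
```

Keeping the conversion in one place means the calling and called notebooks agree on a single date format.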

Returning Values

The called notebook can return a value using dbutils.notebook.exit():

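# In the called notebook (process_region)
# Declaring the widgets lets this notebook also run on its own with defaults;
# values passed through dbutils.notebook.run() override them.
dbutils.widgets.text("region", "")
dbutils.widgets.text("processing_date", "")

region = dbutils.widgets.get("region")
processing_date = dbutils.widgets.get("processing_date")

# ... do the work, producing a DataFrame named result_df ...

row_count = result_df.count()

# Exit with a return value (must be a string)
dbutils.notebook.exit(str(row_count))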

The return value lands in the result variable in the calling notebook as a string. Parse it if you need a number. Complex return values can be JSON-encoded strings.
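A minimal sketch of the JSON approach, assuming the called notebook wants to report several values at once. The helper names and the region/row_count/status fields are hypothetical; in Databricks the encoded string would be passed to dbutils.notebook.exit(), and the calling notebook would decode the string returned by dbutils.notebook.run().

```python
import json
from typing import Any, Dict


def encode_exit_value(region: str, row_count: int, status: str) -> str:
    """Called-notebook side: pack several values into one JSON string.

    In Databricks the result would be passed to dbutils.notebook.exit(...).
    """
    return json.dumps({"region": region, "row_count": row_count, "status": status})


def decode_run_result(result: str) -> Dict[str, Any]:
    """Calling-notebook side: parse the string returned by dbutils.notebook.run()."""
    return json.loads(result)


payload = encode_exit_value("West", 12345, "succeeded")
decoded = decode_run_result(payload)
print(decoded["row_count"] + 1)  # JSON numbers come back as ints, no extra parsing
```

JSON keeps the types of numbers and booleans intact, which avoids a separate int() call for each value.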

Running Notebooks in Parallel

The real power of notebook orchestration is running independent notebooks simultaneously:

from concurrent.futures import ThreadPoolExecutor, as_completed

regions = ["West", "East", "Central", "South"]

def process_region(region):
    result = dbutils.notebook.run(
        "/pipelines/process_region",
        timeout_seconds=3600,
        arguments={"region": region, "processing_date": "2019-07-26"}
    )
    return region, int(result)

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = {executor.submit(process_region, r): r for r in regions}
    for future in as_completed(futures):
        region, count = future.result()
        print(f"{region}: {count} rows processed")

Each dbutils.notebook.run() call starts an ephemeral job that runs the target notebook on the same cluster, so four parallel calls process four regions at once while sharing that cluster's resources. The orchestrating notebook collects each result as it completes, then continues.
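One caveat with the loop above: future.result() re-raises any exception from a worker thread, so a single failed or timed-out region stops the loop. The other runs still finish (the with block waits for them), but their results are never printed. A sketch that records failures per region instead, assuming run_one is a hypothetical function you would write around dbutils.notebook.run(); fake_run is a local stand-in so the pattern can be tried outside Databricks.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def fan_out(
    items: Iterable[str],
    run_one: Callable[[str], int],
    max_workers: int = 4,
) -> Tuple[Dict[str, int], Dict[str, Exception]]:
    """Run run_one(item) concurrently and collect successes and failures separately.

    future.result() re-raises any exception from the worker thread, so each
    call is wrapped in try/except instead of letting one failure abort the loop.
    """
    successes: Dict[str, int] = {}
    failures: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(run_one, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                successes[item] = future.result()
            except Exception as exc:  # a failed or timed-out run lands here
                failures[item] = exc
    return successes, failures


# Hypothetical stand-in for a wrapper around dbutils.notebook.run(); South fails on purpose.
def fake_run(region: str) -> int:
    if region == "South":
        raise RuntimeError("notebook run failed")
    return len(region) * 100


ok, failed = fan_out(["West", "East", "Central", "South"], fake_run)
print(sorted(ok), sorted(failed))
```

The orchestrating notebook can then decide whether a partial result is acceptable or raise after reporting every failed region.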

When Notebook Orchestration Is Enough and When It Isn't

For linear pipelines and simple fan-out patterns, notebook orchestration with dbutils.notebook.run() is sufficient. It's fully self-contained within Databricks and requires no additional infrastructure.

Where it falls short: complex DAG dependencies, cross-cluster orchestration, retries with exponential backoff, visibility into long-running pipeline state. For those use cases, Airflow or Databricks Jobs' multi-task configuration is the right move. Start with notebook orchestration for the simple stuff; reach for Airflow when you feel the limits. As always, I'm here to help.
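If a pipeline isn't ready to move to a scheduler yet, simple retries can be hand-rolled around the call. This is a generic sketch, not a dbutils feature: run_with_backoff is a hypothetical helper, and fn would be something like lambda: dbutils.notebook.run(path, 3600, args). It covers retries only and gives you none of the DAG handling or pipeline visibility described above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on exception with exponentially growing delays.

    Delays are base_delay * 2**attempt (1s, 2s, 4s, ... with the defaults).
    After max_retries retries, the last exception is re-raised.
    The sleep parameter is injectable so the behavior can be tested quickly.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt >= max_retries:
                raise
            sleep(base_delay * (2 ** attempt))
            attempt += 1


# Usage with a stand-in that fails twice before succeeding.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "42"


delays = []
result = run_with_backoff(flaky, sleep=delays.append)
print(result, delays)
```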
