Installing Python Libraries in Databricks: Cluster Scope vs Notebook Scope

Shannon Lowder

07 Mar 2019 — 2 min read

At some point after you've been using Databricks for a few months, the question of how to install Python libraries comes up. Not because the built-in libraries aren't comprehensive — they are — but because your work requires something specific. A data quality library, a custom connector, an internal package your team built. The answer changes depending on where and how you need the library.

The Two Scopes

Databricks has two ways to install a Python library, and they're not interchangeable:

Cluster libraries — installed on the cluster itself, available to all notebooks attached to that cluster, persist across cluster restarts (until you remove them)
Notebook-scoped libraries — installed using %pip in a notebook cell, available only in that notebook, reset when the cluster restarts or the notebook is detached

Cluster Libraries

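Cluster libraries are managed through the Databricks UI (Clusters → your cluster → Libraries tab) or via the Libraries API. You can install from PyPI, Maven, CRAN, DBFS, or a direct file upload. For Python, that means PyPI packages or uploaded wheel files; Maven and CRAN cover JVM and R libraries.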

# Via the Databricks CLI (legacy databricks-cli syntax)
databricks libraries install --cluster-id 1234-567890-abc12 --pypi-package great-expectations==0.13.0

If the cluster is already running, the library installs right away, after a short wait. Otherwise it installs when the cluster next starts, and Databricks reinstalls it on every start after that. All notebooks on that cluster can import it without any additional steps.
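If you script cluster setup instead of clicking through the UI, the same install can go through the Libraries API. The sketch below assumes the `POST /api/2.0/libraries/install` endpoint, a workspace URL in `host`, and a personal access token in `token`. The helper names are hypothetical, and it sticks to the standard library so it runs anywhere.

```python
from __future__ import annotations

import json
import urllib.request


def build_install_payload(cluster_id: str, packages: list[str]) -> dict:
    """Build a Libraries API request body that installs PyPI packages on one cluster."""
    return {
        "cluster_id": cluster_id,
        "libraries": [{"pypi": {"package": pkg}} for pkg in packages],
    }


def install_cluster_libraries(host: str, token: str, cluster_id: str, packages: list[str]) -> None:
    """Queue the install; urlopen raises urllib.error.HTTPError on a non-2xx response."""
    request = urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/libraries/install",
        data=json.dumps(build_install_payload(cluster_id, packages)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        response.read()  # body not needed; a 2xx status means the request was accepted


# Example call (not run here), mirroring the CLI command above:
# install_cluster_libraries("https://<your-workspace-host>", "<personal-access-token>",
#                           "1234-567890-abc12", ["great-expectations==0.13.0"])
```

The endpoint queues the install and returns; the library lands in the background, and the cluster's Libraries tab shows when it's ready. The `==` in the package string is what keeps a production job on a known version.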

When to use cluster libraries:

Libraries used by most or all notebooks on a cluster
Production jobs where the library version needs to be locked
Libraries that require native extensions or have long install times

Notebook-Scoped Libraries with %pip

%pip install pandas==1.1.0 scipy numpy==1.19.0

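Run this in a notebook cell. Databricks restarts the Python kernel after %pip install runs, which clears variables and imports from earlier cells, so put your %pip commands at the top of the notebook, before any imports. On newer Databricks runtimes the restart is no longer automatic, and you may need to call dbutils.library.restartPython() after the install to pick up new or upgraded packages. Either way, the library is available for the remainder of the notebook session.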

Always put pip installs before imports. Keep the install in its own cell, first in the notebook:

%pip install great-expectations==0.13.0 pyarrow==1.0.0

Then import in a later cell:

import great_expectations as ge
import pyarrow as pa

When to use notebook-scoped libraries:

Notebooks that need a different version of a library than what's on the cluster
Exploratory work where you want to try a library without touching the cluster config
Notebooks run by different teams that have different dependency requirements

Version Conflicts

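This is where teams get into trouble. A cluster library installs version X of a package, and a notebook installs version Y using %pip. Within that notebook, the %pip version wins, while every other notebook on the cluster keeps seeing version X. But if version X's native extensions were already loaded by the Python runtime, you can end up with unexpected behavior, such as the new version's Python code running against the old version's already-loaded compiled modules.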
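When a notebook misbehaves after a %pip install, start by confirming which version it actually resolves. Here is a minimal sketch using only the standard library; the helper name is hypothetical, and `get_version` is injectable so the check can be tested without touching the real environment.

```python
from __future__ import annotations

from importlib import metadata
from typing import Callable, Optional


def find_version_mismatches(
    expected: dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> dict[str, Optional[str]]:
    """Compare installed distribution versions with the ones you expect.

    Returns {distribution: installed_version} for each mismatch; None means not installed.
    """
    mismatches: dict[str, Optional[str]] = {}
    for dist_name, wanted in expected.items():
        try:
            installed = get_version(dist_name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[dist_name] = installed
    return mismatches


# In a notebook, right after the install cell (an empty dict means everything matches):
# find_version_mismatches({"great-expectations": "0.13.0", "pyarrow": "1.0.0"})
```

`importlib.metadata` reads the installed distribution's metadata on disk, not the module already in memory. If a package was imported before the %pip install, compare this result with the module's own `__version__` too; a disagreement between the two is exactly the mixed state described above.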

The cleaner approach for production: use cluster libraries with pinned versions, and don't override them in individual notebooks. Reserve %pip for development and exploration. When you find a library combination that works, pin it in the cluster configuration and remove the %pip calls from the notebooks.
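Pinning only helps if every entry is actually pinned. Before copying a working %pip line into the cluster configuration, a quick check like this one can flag loose requirements. It's a rough sketch: the function name is hypothetical, and it treats only an exact `==` version (no wildcards) as pinned.

```python
from __future__ import annotations


def find_unpinned(requirements: list[str]) -> list[str]:
    """Return the requirement strings that are not pinned to an exact version with ==."""
    loose = []
    for requirement in requirements:
        spec = requirement.split(";", 1)[0].strip()  # ignore environment markers
        if not spec:
            continue
        _name, sep, version = spec.partition("==")
        # No "==" at all, an empty version, or a wildcard like "1.*" all count as loose.
        if not sep or not version.strip() or "*" in version:
            loose.append(requirement)
    return loose


# The %pip line from earlier in this post, split on whitespace:
print(find_unpinned("pandas==1.1.0 scipy numpy==1.19.0".split()))  # ['scipy']
```

Here scipy is the one package that would float to whatever PyPI serves on the next cluster start.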

One more thing: if you're installing from a private PyPI repository (your org's internal package server), use the cluster's init script to set the pip index URL rather than passing it in each %pip call. Credentials typed into a %pip command live in the notebook source, and from there in its revision history and anywhere the notebook is exported or shared. As always, I'm here to help.
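One way to set that up, sketched under assumptions: the init script writes pip's site-wide config file (/etc/pip.conf on Linux), and the index URL comes from an environment variable set on the cluster, ideally populated from a secret, instead of from notebook source. The variable name `PRIVATE_PIP_INDEX_URL` and both helpers are hypothetical. The Python below only renders the file contents.

```python
from __future__ import annotations

import configparser
import io
import os
from typing import Mapping


def render_pip_conf(index_url: str) -> str:
    """Render pip.conf text that points pip at a single package index."""
    # interpolation=None keeps "%" in percent-encoded URLs from being treated as syntax.
    config = configparser.ConfigParser(interpolation=None)
    config["global"] = {"index-url": index_url}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


def pip_conf_from_env(
    var_name: str = "PRIVATE_PIP_INDEX_URL",  # hypothetical cluster environment variable
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Build pip.conf from an environment variable so the URL never appears in a notebook."""
    index_url = environ.get(var_name)
    if not index_url:
        raise KeyError(f"{var_name} is not set on this cluster")
    return render_pip_conf(index_url)
```

Assuming the node's pip honors /etc/pip.conf, which is pip's standard site-wide location on Linux, later installs on that node pick up the private index without anyone typing the URL into a cell.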
