Delta Lake Goes Open Source: What Changed and What Did Not

Shannon Lowder

11 Oct 2019 — 2 min read

Databricks open-sourced Delta Lake earlier this year. If you've been using it as a Databricks-internal feature (it's been available in the runtime since 2017), most of what you know still applies: the open-source release doesn't change how Delta behaves, it changes who else can use it.

But it does change some things worth knowing about, and it signals something about where Delta is headed.

What Actually Changed

The core Delta Lake format (the _delta_log transaction log, the Parquet data files, and the protocol for ACID commits) is now publicly documented, and the reference implementation is open source on GitHub. Anyone running Apache Spark 2.4.2 or later can use it without Databricks.
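Here is what the commit protocol looks like in practice. Each commit is a newline-delimited JSON file in _delta_log, named by its zero-padded version number. A commit succeeds only if the writer is the first to create the file for that version. The sketch below emulates that put-if-absent step on a local filesystem, assuming a hard link (`os.link`, which fails if the target already exists) as the atomic primitive. The helpers `log_path`, `next_version`, and `commit` are hypothetical names, not Delta's API. The sketch also skips what a real writer does on top of this: checkpoints, conflict checks against concurrent commits, and retries.

```python
import json
import os
import tempfile


def log_path(table_dir: str, version: int) -> str:
    # Commit files are named by version, zero-padded to 20 digits,
    # e.g. _delta_log/00000000000000000000.json for version 0.
    return os.path.join(table_dir, "_delta_log", f"{version:020d}.json")


def next_version(table_dir: str) -> int:
    # Hypothetical helper: one past the highest commit file currently in the log.
    log_dir = os.path.join(table_dir, "_delta_log")
    versions = [
        int(name[:-5])
        for name in os.listdir(log_dir)
        if name.endswith(".json") and name[:-5].isdigit()
    ]
    return max(versions, default=-1) + 1


def commit(table_dir: str, version: int, actions: list) -> bool:
    """Try to publish `actions` as `version`; return False if that version is taken."""
    log_dir = os.path.join(table_dir, "_delta_log")
    # Write the complete commit to a private temp file first, one JSON action per line.
    fd, tmp = tempfile.mkstemp(dir=log_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            for action in actions:
                f.write(json.dumps(action) + "\n")
        # os.link raises FileExistsError if the target exists, so this is an atomic
        # put-if-absent: only one concurrent writer can claim a given version.
        os.link(tmp, log_path(table_dir, version))
        return True
    except FileExistsError:
        return False
    finally:
        os.remove(tmp)


if __name__ == "__main__":
    table = tempfile.mkdtemp()
    os.makedirs(os.path.join(table, "_delta_log"))
    # Simplified add action; a real first commit also carries protocol and metaData actions.
    add = {"add": {"path": "part-00000.parquet", "partitionValues": {},
                   "size": 1024, "modificationTime": 0, "dataChange": True}}
    v = next_version(table)
    print(v, commit(table, v, [add]))  # 0 True: the first writer claims version 0
    print(v, commit(table, v, [add]))  # 0 False: the second writer must re-read and retry
```

The losing writer doesn't corrupt anything. It simply finds the version taken, which is what lets Delta provide ACID commits on top of plain files.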

What this means in practice:

You can run Delta Lake pipelines on EMR, on-premises Spark clusters, or any Spark environment
The spec is documented, so third-party tools can implement Delta readers and writers
The community can contribute to the Delta format itself, not just to Databricks-proprietary extensions
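The log is the source of truth, so a third-party reader can work out which Parquet files make up the current table version without any Spark code. It replays the commits in order, adding paths from `add` actions and dropping paths from `remove` actions. This is a minimal sketch under two assumptions: the log has no checkpoints (real readers start from the latest Parquet checkpoint), and `metaData`, `protocol`, and partition values can be ignored. `active_files` is a hypothetical name.

```python
import json
import os
import tempfile


def active_files(table_dir: str) -> set:
    """Replay the JSON commits in version order; return the data files in the latest snapshot."""
    log_dir = os.path.join(table_dir, "_delta_log")
    # Zero-padded commit names sort lexically in version order.
    commits = sorted(
        name for name in os.listdir(log_dir)
        if name.endswith(".json") and name[:-5].isdigit()
    )
    live = set()
    for name in commits:
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                if not line.strip():
                    continue
                action = json.loads(line)
                if "add" in action:
                    live.add(action["add"]["path"])
                elif "remove" in action:
                    live.discard(action["remove"]["path"])
                # Other actions (metaData, protocol, commitInfo, txn) are ignored here.
    return live


if __name__ == "__main__":
    table = tempfile.mkdtemp()
    log_dir = os.path.join(table, "_delta_log")
    os.makedirs(log_dir)
    # Version 0 writes two files; version 1 rewrites a.parquet into c.parquet.
    commits = [
        [{"add": {"path": "a.parquet"}}, {"add": {"path": "b.parquet"}}],
        [{"remove": {"path": "a.parquet"}}, {"add": {"path": "c.parquet"}}],
    ]
    for version, actions in enumerate(commits):
        with open(os.path.join(log_dir, f"{version:020d}.json"), "w") as f:
            f.write("".join(json.dumps(a) + "\n" for a in actions))
    print(sorted(active_files(table)))  # ['b.parquet', 'c.parquet']
```

The demo also shows why other engines need the spec rather than a directory listing. After version 1, a.parquet can still sit on disk until the table is vacuumed, so a naive scan of every .parquet file would return stale rows.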

The Databricks-specific optimizations, such as the optimized writer, OPTIMIZE compaction with Z-ordering, and certain monitoring hooks, stay proprietary. What's open is the format itself and the reference implementation.

What Didn't Change for Databricks Users

If you're running Delta on Databricks, nothing about your existing code or pipelines changes; the Delta Lake build in the Databricks Runtime is the one you've been using all along. The open-source release lags the runtime by a version or two, but the on-disk format is compatible.
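"Compatible" has a concrete meaning in the protocol. The log carries a `protocol` action with a `minReaderVersion` (and a `minWriterVersion`), and a client should refuse a table that needs a newer reader than it implements. The sketch below shows that check. It assumes the log lines are already collected in version order and that the most recent `protocol` action is the one in force. `check_reader_compat` and `UnsupportedProtocol` are hypothetical names, not part of Delta's API.

```python
import json


class UnsupportedProtocol(Exception):
    """Raised when a table requires a newer reader than this client implements."""


def check_reader_compat(log_lines: list, supported_reader_version: int) -> int:
    """Return the table's minReaderVersion, or raise if this client is too old to read it.

    `log_lines` are the newline-delimited JSON actions from the table's commits, in
    version order; a later `protocol` action overrides an earlier one.
    """
    required = None
    for line in log_lines:
        if not line.strip():
            continue
        action = json.loads(line)
        if "protocol" in action:
            required = action["protocol"]["minReaderVersion"]
    if required is None:
        # Assumption: a valid table declares its protocol in its first commit,
        # so a log without one is treated as malformed.
        raise ValueError("no protocol action found in the log")
    if required > supported_reader_version:
        raise UnsupportedProtocol(
            f"table needs reader version {required}; this client supports {supported_reader_version}"
        )
    return required


if __name__ == "__main__":
    log = [
        '{"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}}',
        '{"add": {"path": "part-00000.parquet"}}',
    ]
    print(check_reader_compat(log, supported_reader_version=1))  # 1
```

As long as a table's required reader version stays within what the open-source release supports, the version lag doesn't matter for reads.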

# If you're running on Databricks, you don't need to install Delta separately:
# it's built into the Databricks Runtime.

# If you're running on vanilla Apache Spark 2.4.2+ (EMR, on-prem, etc.), pull the package
# from Maven Central, matching the _2.11/_2.12 suffix to your Spark build's Scala version:
pyspark --packages io.delta:delta-core_2.11:0.4.0
spark-shell --packages io.delta:delta-core_2.11:0.4.0

Why This Matters for Architecture Decisions

Before the open-source release, adopting Delta Lake meant accepting a dependency on Databricks as a platform — the format wasn't portable. Now it is. If your organization ever decides to move off Databricks onto a different Spark platform, your Delta tables come with you.

This also opens the door to tools like Presto and Hive eventually supporting Delta reads natively (through the spec), which would let non-Spark query engines read your Delta tables directly. That's important for organizations where different teams use different tools. The lake stays open even if you go deep on Delta.

For teams currently evaluating whether to commit to Delta Lake or stick with plain Parquet, the open-source release removes the vendor lock-in argument from the "against" column. The format is now a community asset. As always, I'm here to help.
