ADF vs. Azure Databricks: Orchestrator or Compute, or Both?

This is the question I get most often from clients evaluating their Azure data platform options in 2018: "We're using ADF for data movement. We're using Databricks for transformations. Do we actually need both? Can't Databricks just do everything?"

The short answer: they're different tools for different jobs. The longer answer requires being clear about what each one is actually designed to do.

What ADF Is

ADF is an orchestrator. Its native operations are control flow: trigger, schedule, call another service, wait for the result, branch on the outcome, retry on failure. It excels at connecting things: read from this source, write to that target, call this API, run this stored procedure, trigger this notebook. The Copy Activity is the most commonly used ADF feature, but it's a consequence of the orchestration model, not the core of it.

ADF does not run compute in the traditional sense. When you run a Copy Activity, the compute is the Azure Integration Runtime (IR), a data movement service that Microsoft manages and operates. When you run a Data Flow, the compute is a managed Spark cluster. When you run a Databricks Notebook Activity, the compute is your Databricks cluster. ADF schedules and coordinates; it delegates the actual processing to other services.

What Databricks Is

Databricks is a compute platform. Its native operations are data processing: read from sources, transform with PySpark or SQL or Scala, write to targets. A Databricks notebook can do everything ADF's Copy Activity does — read from Azure SQL, write to ADLS as Parquet — plus complex transformations, ML model training, streaming, and MLflow tracking. Databricks is also developing orchestration capabilities (Databricks Jobs, and the early work that will become Databricks Workflows).

So yes, Databricks can do what ADF does. The question is whether it does it as well for your specific needs.
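To make the overlap concrete, here is a minimal sketch of the notebook-side equivalent of a simple Copy Activity: read one Azure SQL table over JDBC and land it in ADLS as Parquet. The function names, server, table, and path are placeholders I made up for illustration, and the sketch assumes a SQL Server JDBC driver is available on the cluster and that `spark` is the session a notebook already provides.

```python
def azure_sql_jdbc_options(server, database, table, user, password):
    """Build the option map for Spark's built-in JDBC data source."""
    url = (
        f"jdbc:sqlserver://{server}.database.windows.net:1433;"
        f"database={database};encrypt=true"
    )
    return {"url": url, "dbtable": table, "user": user, "password": password}


def copy_table_to_parquet(spark, jdbc_options, output_path):
    """Read one table over JDBC and write it to the lake as Parquet."""
    df = spark.read.format("jdbc").options(**jdbc_options).load()
    # "overwrite" replaces any previous output at this path.
    df.write.mode("overwrite").parquet(output_path)


# Hypothetical usage inside a notebook, where `spark` already exists:
# opts = azure_sql_jdbc_options("myserver", "sales", "dbo.Orders", user, password)
# copy_table_to_parquet(spark, opts, "/mnt/raw/sales/orders")
```

It works, but notice that credentials, connection strings, and overwrite behavior are now your code's problem rather than configuration.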

Where ADF Wins

Heterogeneous compute orchestration. If your pipeline involves Azure SQL stored procedures, SSIS packages running on Azure-SSIS IR, Azure Functions for API calls, AND Databricks notebooks — ADF is the natural coordinator. It has native activities for all of these. A Databricks notebook can call the others via HTTP or SDK, but you'd be writing Python glue code for something ADF does declaratively.

Connector coverage. ADF has 90+ native connectors with no code required. Reading from SAP via RFC, writing to Dynamics 365, pulling from Salesforce via bulk API: these are single-configuration operations in ADF. In Databricks, you'd be writing PySpark against JDBC or REST endpoints, handling authentication yourself, and managing pagination logic manually.
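For a sense of what "managing pagination logic manually" means, here is a sketch of the loop you end up owning when there is no connector. `fetch_page` is a hypothetical callable standing in for whatever API client you'd write; the offset/limit scheme is an assumption, and real APIs often use cursors or continuation tokens instead.

```python
def fetch_all_records(fetch_page, page_size=200, max_pages=1000):
    """Collect every record from an offset/limit style paged API.

    fetch_page(offset, limit) is a hypothetical callable that returns a
    list of records; a page shorter than `limit` marks the last page.
    """
    records = []
    offset = 0
    for _ in range(max_pages):
        page = fetch_page(offset, page_size)
        records.extend(page)
        if len(page) < page_size:
            return records
        offset += len(page)
    # Guard against an API that never returns a short page.
    raise RuntimeError(f"gave up after {max_pages} pages")
```

Multiply this by authentication, throttling, and schema drift for each source, and the value of a maintained connector becomes clearer.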

Visual pipeline authoring for non-developers. ADF's visual editor is accessible to data engineers who aren't Python-native. For teams that are split between Python developers and SQL-native data engineers, ADF's configuration-based model works for everyone. Databricks notebooks work best for teams that are Python-first.

Azure-SSIS IR. If you have existing SSIS packages you're not ready to redesign, ADF runs them. Databricks cannot.

Built-in retry logic and trigger management. ADF has configurable retry policies, tumbling window triggers with backfill support, and event-based triggers without additional infrastructure. Implementing equivalent behavior in Databricks Jobs requires more configuration.
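As a rough illustration of what ADF gives you declaratively, here is a sketch of the two behaviors hand-rolled: a fixed-interval retry policy and the window arithmetic behind a tumbling window backfill. The function names and defaults are mine, not ADF's, and this is a simplification of what the service actually does.

```python
import time
from datetime import datetime, timedelta


def run_with_retry(task, retries=3, interval_seconds=30, sleep=time.sleep):
    """Call task(); on failure, wait and retry up to `retries` more times."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= retries:
                raise  # out of retries: surface the last failure
            attempt += 1
            sleep(interval_seconds)


def tumbling_windows(start, end, size):
    """Yield contiguous, non-overlapping (window_start, window_end) pairs
    that fit entirely inside [start, end)."""
    window_start = start
    while window_start + size <= end:
        yield window_start, window_start + size
        window_start += size


# Backfilling three daily windows:
# for ws, we in tumbling_windows(datetime(2018, 1, 1), datetime(2018, 1, 4),
#                                timedelta(days=1)):
#     run_with_retry(lambda: process(ws, we))  # process() is hypothetical
```

None of this is hard to write, but in ADF it's a few properties on an activity or trigger instead of code you test and maintain.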

Where Databricks Wins

Complex Spark transformations. When your transformation logic is sophisticated — complex joins, window functions, ML feature engineering, graph processing — Databricks' native Spark environment with full Python, Scala, and SQL support is superior to ADF's expression language or even Mapping Data Flows.

Python/Scala-native teams. Teams that live in notebooks find the Databricks authoring experience faster and more flexible. The ADF UI has a learning curve that isn't worth it if your team already knows how to write PySpark.

MLflow integration. If your pipelines include model training, experiment tracking, or model serving, Databricks has native MLflow integration. ADF doesn't play in this space.

Streaming workloads. Databricks Structured Streaming is a first-class capability. ADF's event-based triggers are not a streaming solution.

The Combined Pattern

For most of my clients, the right answer is: use both, with clear responsibilities.

ADF orchestrates the overall pipeline: read data from source systems using native connectors, land raw data in ADLS, trigger Databricks notebooks for transformation, write results to the serving layer, send notifications on completion or failure.

Databricks handles the transformation compute: it receives the raw data that ADF landed in ADLS, applies business logic using PySpark or SQL, and writes the transformed data back to ADLS or to Azure SQL Data Warehouse.

The integration point is the ADF Databricks Notebook Activity:

{
  "name": "TransformSalesData",
  "type": "DatabricksNotebook",
  "linkedServiceName": {
    "referenceName": "AzureDatabricksLinkedService",
    "type": "LinkedServiceReference"
  },
  "typeProperties": {
    "notebookPath": "/ETL/TransformSalesData",
    "baseParameters": {
      "inputPath": {
        "value": "@concat(pipeline().parameters.adlsRoot, '/raw/sales/')",
        "type": "Expression"
      },
      "outputPath": {
        "value": "@concat(pipeline().parameters.adlsRoot, '/processed/sales/')",
        "type": "Expression"
      },
      "runDate": {
        "value": "@formatDateTime(utcNow(), 'yyyy-MM-dd')",
        "type": "Expression"
      }
    }
  }
}

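ADF passes the baseParameters to the Databricks notebook, and the notebook reads each one with dbutils.widgets.get. The transformed data flows back through ADLS, not through the activity's return value. The notebook can hand a small status value back to ADF with dbutils.notebook.exit, but that channel is for signals, not data.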
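On the notebook side, picking up those parameters might look like the sketch below. The parameter names match the activity definition above; the `read_parameters` helper and the date check are my own additions for illustration, and the function takes the widgets object as an argument so it can be exercised outside Databricks.

```python
from datetime import datetime

# Names of the baseParameters defined on the ADF activity.
PARAM_NAMES = ("inputPath", "outputPath", "runDate")


def read_parameters(widgets):
    """Read the ADF-supplied parameters into a dict of strings.

    `widgets` is anything with a get(name) method; inside a Databricks
    notebook, pass dbutils.widgets.
    """
    params = {name: widgets.get(name) for name in PARAM_NAMES}
    # Fail fast on a malformed date rather than partway through the job.
    datetime.strptime(params["runDate"], "%Y-%m-%d")
    return params


# Hypothetical usage in the notebook:
# params = read_parameters(dbutils.widgets)
# print(params["inputPath"], params["outputPath"], params["runDate"])
```

Validating up front means a bad parameter fails the activity immediately, which is when ADF's retry and alerting are most useful.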

The Bottom Line

ADF and Databricks are not competitors for the same job. ADF is the conductor; Databricks is one of the instruments. The question isn't which one to use — it's knowing what each one is good at and assigning responsibilities accordingly.

All-in on Databricks for orchestration only makes sense if: your entire pipeline is Python-native, you have no legacy SSIS, you don't need ADF's native connectors for any source, and you don't have non-Databricks compute to coordinate. For most enterprise environments, that's not the case.
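If it helps to see that test as a checklist, here is the same rule written out as a sketch. The class and field names are mine, and the point is only that all four conditions have to hold at once.

```python
from dataclasses import dataclass


@dataclass
class PipelineProfile:
    """Hypothetical summary of an environment, one flag per condition."""
    python_native: bool
    has_legacy_ssis: bool
    needs_adf_connectors: bool
    has_non_databricks_compute: bool


def databricks_only_is_viable(profile):
    """True only when every condition for going all-in on Databricks holds."""
    return (
        profile.python_native
        and not profile.has_legacy_ssis
        and not profile.needs_adf_connectors
        and not profile.has_non_databricks_compute
    )
```

A single legacy SSIS package or one source that needs a native connector is enough to flip the answer.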

As always, if you're evaluating this split for a specific architecture, I'm here to help think it through.

Shannon Lowder
