Azure Data Factory v2 Preview: Microsoft Rethinks Cloud ETL

Microsoft previewed ADF v2 and it's a ground-up redesign. Not an incremental update. Not a new version of the same architecture with better connectors. A different product that happens to share a name with its predecessor.

I've been spending time with the preview. Here's what's actually changed and what it means for teams running ADF v1 in production.

Integration Runtimes Replace the Data Management Gateway

The DMG — that Windows Service you install on a server inside your network to give ADF connectivity to on-premises systems — is gone in v2. In its place: Integration Runtimes.

There are three IR types.

Azure IR: fully managed Microsoft compute that handles cloud-to-cloud copy and data flow execution. You don't provision anything; you configure the region and core count if you care about performance, and Microsoft handles the rest.

Self-Hosted IR: the DMG's spiritual successor. Install it on a server inside your network, register it with ADF, and it provides connectivity to on-premises sources. The key improvements over DMG: high availability configuration with multiple nodes, better diagnostics, and cleaner upgrade management.

Azure-SSIS IR: the new one. It provisions a managed cluster of Azure VMs running SSIS, so your existing SSIS packages can run in Azure without modification.

I'll write about each IR type in detail separately. The conceptual shift is important: in v2, the question "where does this activity execute?" has an explicit answer. You pick the IR for each linked service and activity. In v1, the DMG was attached at the linked service level and there was no concept of choosing execution location for cloud activities. Explicit is better than implicit here.
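To make the "explicit execution location" point concrete, here is a minimal Python sketch that builds a linked service definition pinned to a Self-Hosted IR through its connectVia reference. The names OnPremSqlServer and SelfHostedIR-01 are placeholders I made up, the helper function is mine, and the connection string is deliberately left as a stub. Treat the shape as my reading of the preview schema, not a deployable definition.

```python
import json


def linked_service_via_ir(name, ls_type, ir_name, type_properties):
    """Build a v2-style linked service body that names its Integration Runtime."""
    return {
        "name": name,
        "properties": {
            "type": ls_type,
            "typeProperties": type_properties,
            # connectVia is where the execution location becomes explicit.
            "connectVia": {
                "referenceName": ir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


# Placeholder names and a stubbed connection string: assumptions for illustration only.
on_prem_sql = linked_service_via_ir(
    name="OnPremSqlServer",
    ls_type="SqlServer",
    ir_name="SelfHostedIR-01",
    type_properties={"connectionString": "<your connection string>"},
)

print(json.dumps(on_prem_sql, indent=2))
```

Swapping the IR is a one-field change on the linked service, which is the practical payoff of making the choice explicit.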

Parameterization: The Feature That Should Have Been in v1

ADF v2 pipelines and datasets can have parameters. I've been asking for this since 2014. Let me show you what it looks like:

{
  "name": "CopyFromSQLToBlob",
  "properties": {
    "parameters": {
      "SourceTable": { "type": "String" },
      "TargetPath": { "type": "String" },
      "WatermarkColumn": { "type": "String" }
    },
    "activities": [
      {
        "name": "CopyData",
        "type": "Copy",
        "typeProperties": {
          "source": {
            "type": "SqlSource",
            "sqlReaderQuery": {
              "value": "@concat('SELECT * FROM ', pipeline().parameters.SourceTable)",
              "type": "Expression"
            }
          },
          "sink": {
            "type": "BlobSink",
            "blobWriterAddHeader": true
          }
        }
      }
    ]
  }
}

One pipeline definition. Any source table. Any target path. This is what a generic ingest pipeline looks like when the platform supports parameterization. In v1, this required separate JSON per table. Sixty tables meant sixty pipeline JSON files that were structurally identical. In v2, it's one pipeline and a config table.
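The "one pipeline and a config table" idea can be sketched in a few lines of Python. The rows below are hypothetical stand-ins for a control table, and the raw/schema/table folder layout is my assumption, not something ADF prescribes. The only point is that each row becomes one set of values for the three parameters the pipeline declares.

```python
# Hypothetical config rows; in practice these would come from a control table.
CONFIG_ROWS = [
    {"schema": "dbo", "table": "Customers", "watermark": "ModifiedDate"},
    {"schema": "dbo", "table": "Orders", "watermark": "OrderDate"},
    {"schema": "sales", "table": "Invoices", "watermark": "UpdatedAt"},
]


def to_pipeline_parameters(row, root="raw"):
    """Map one config row to the SourceTable, TargetPath, and WatermarkColumn parameters."""
    return {
        "SourceTable": f"{row['schema']}.{row['table']}",
        # Assumed folder convention: <root>/<schema>/<table>
        "TargetPath": f"{root}/{row['schema']}/{row['table']}",
        "WatermarkColumn": row["watermark"],
    }


parameter_sets = [to_pipeline_parameters(row) for row in CONFIG_ROWS]

for params in parameter_sets:
    print(params)
```

Adding a sixty-first table means adding a row, not another JSON file.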

The expression language that powers parameterization is worth noting: @pipeline().parameters.SourceTable accesses pipeline parameters, and @activity('LookupActivity').output.firstRow.WatermarkValue accesses the output of a previous activity. It also gives you string concatenation with concat(), date formatting with formatDateTime(), and basic arithmetic. It's not Turing complete, but it covers the patterns you actually need for metadata-driven frameworks.
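To show what those expressions resolve to at run time, here is a plain-Python emulation. It is not ADF code: concat() below is my stand-in for the expression function, resolve_source_query is a hypothetical helper, and the sample parameter values and watermark are invented. It mirrors an incremental-load query that combines a pipeline parameter with a lookup activity's firstRow output.

```python
def concat(*parts):
    """Plain-Python stand-in for the expression language's concat()."""
    return "".join(str(part) for part in parts)


def resolve_source_query(pipeline_parameters, lookup_output):
    """Emulate how a parameterized source query would resolve for one run."""
    return concat(
        "SELECT * FROM ", pipeline_parameters["SourceTable"],
        " WHERE ", pipeline_parameters["WatermarkColumn"],
        " > '", lookup_output["firstRow"]["WatermarkValue"], "'",
    )


# Invented sample values standing in for pipeline parameters and a lookup result.
query = resolve_source_query(
    {"SourceTable": "dbo.Orders", "WatermarkColumn": "OrderDate"},
    {"firstRow": {"WatermarkValue": "2017-09-30T00:00:00Z"}},
)

print(query)
```

Note that the values are spliced into the SQL text as-is, so parameter values should only ever come from a source you control.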

Triggers: A New Scheduling Model

The v1 slice model is gone. In v2, pipelines have no intrinsic schedule — you wire triggers to them. Three trigger types:

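Schedule trigger: run this pipeline on a wall-clock schedule. ADF expresses the schedule as a recurrence (frequency, interval, and optional hours and minutes) instead of a literal cron string, but the effect is the same. Straightforward. Analogous to SQL Agent. The pipeline runs at 6am every day. Done.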
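Here is a rough Python sketch of what a daily 6am schedule trigger definition looks like, built as a dictionary and printed as JSON. The trigger name, start time, and helper function are my own placeholders, and the property layout reflects my understanding of the schema, so check it against the current documentation before relying on it.

```python
import json


def daily_schedule_trigger(name, pipeline_name, hour, start_time, parameters=None):
    """Build a schedule trigger body that runs one pipeline once a day at the given UTC hour."""
    return {
        "name": name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                # A recurrence object plays the role a cron string would elsewhere.
                "recurrence": {
                    "frequency": "Day",
                    "interval": 1,
                    "startTime": start_time,
                    "timeZone": "UTC",
                    "schedule": {"hours": [hour], "minutes": [0]},
                }
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "type": "PipelineReference",
                        "referenceName": pipeline_name,
                    },
                    "parameters": parameters or {},
                }
            ],
        },
    }


# Hypothetical trigger name and start time, wired to the pipeline from earlier in the post.
trigger = daily_schedule_trigger(
    name="DailyAtSix",
    pipeline_name="CopyFromSQLToBlob",
    hour=6,
    start_time="2017-10-01T06:00:00Z",
    parameters={"SourceTable": "dbo.Orders"},
)

print(json.dumps(trigger, indent=2))
```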

Tumbling window trigger: like a schedule trigger, but it carries window start and end times as parameters that the pipeline can reference. Also handles backfill automatically — if the service goes down and misses a window, the tumbling window trigger queues it and runs it when the service recovers. This is the right trigger for time-partitioned loads.
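The windowing and backfill behavior is easier to see in code. The following is a plain-Python sketch of the idea, not the service's implementation: tumbling_windows and missed_windows are hypothetical helpers, and the dates are invented. It generates contiguous, non-overlapping windows and lists the ones that have fully elapsed without a completed run.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, size):
    """Yield contiguous, non-overlapping (window_start, window_end) pairs that fit entirely before end."""
    cursor = start
    while cursor + size <= end:
        yield cursor, cursor + size
        cursor += size


def missed_windows(start, now, size, completed_starts):
    """Return fully elapsed windows whose start time has no completed run recorded."""
    return [w for w in tumbling_windows(start, now, size) if w[0] not in completed_starts]


# Invented example: hourly windows since midnight, only the first two have run.
start = datetime(2017, 10, 1, 0, 0)
now = datetime(2017, 10, 1, 5, 30)
size = timedelta(hours=1)
completed = {datetime(2017, 10, 1, 0, 0), datetime(2017, 10, 1, 1, 0)}

backlog = missed_windows(start, now, size, completed)

for window_start, window_end in backlog:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

In ADF itself the pipeline does not compute any of this; it receives each window's boundaries from the trigger (exposed as trigger().outputs.windowStartTime and trigger().outputs.windowEndTime) and uses them as parameters.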

Event-based trigger: fires when a blob lands in a specified storage container and path. File-arrival-driven processing without polling. This pattern has been a custom-code affair in v1. In v2 it's a first-class trigger type.
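Conceptually, the event trigger replaces a polling loop with a path filter applied to each blob-created event. The sketch below emulates that filter in plain Python. The function name, the container and folder names, and the sample paths are all made up, and the real trigger's filter properties should be checked against the documentation; as I understand it, they are begins-with and ends-with matches on the blob path.

```python
def blob_event_matches(blob_path, begins_with="", ends_with=""):
    """Emulate a begins-with / ends-with path filter on a blob-created event."""
    return blob_path.startswith(begins_with) and blob_path.endswith(ends_with)


# Invented sample arrivals: only finished CSV files under the landing sales folder should fire.
arrivals = [
    "/landing/blobs/sales/2017/10/orders.csv",
    "/landing/blobs/sales/2017/10/orders.csv.tmp",
    "/archive/blobs/sales/2017/09/orders.csv",
]

to_process = [
    path for path in arrivals
    if blob_event_matches(path, begins_with="/landing/blobs/sales/", ends_with=".csv")
]

for path in to_process:
    print("would start a pipeline run for", path)
```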

The v1-to-v2 Migration Reality

This is not a simple migration. The JSON schema for pipelines, datasets, and linked services is different in v2. Your v1 pipeline JSON does not port to v2 with a conversion script. It's a rebuild.

For teams running v1 with 50+ pipelines, that's a real cost. My current posture: run v1 in production for existing workloads, build new workloads in v2. Begin rebuilding v1 workloads in v2 as they need significant changes anyway. Don't migrate the entire estate at once.

What v2 Still Doesn't Have

Git integration. Still not there. The two-year running joke continues into the v2 preview. This is particularly frustrating because v2 is a redesign: there was an opportunity to build Git integration into the architecture from the start. Instead it's "on the roadmap" again.

ForEach activity. Parameterization exists in v2 preview, but the ForEach construct that would let you drive parameter values from a dataset (run this pipeline for each row in my config table) hasn't shipped yet. Parameters without a looping mechanism get you halfway to a metadata-driven framework. The framework I want requires both. ForEach is "coming soon."

My Overall Reaction

This is what v1 should have been. The parameterization feature alone is worth the migration effort — it changes the economics of building and maintaining ADF pipelines dramatically. The IR model is cleaner than the DMG model. The trigger model is more explicit and easier to reason about than slices.

V2 is not done. It's a preview. There are missing features, underdocumented behaviors, and a migration cost that v1 teams will need to plan for. But the direction is right and the core architectural improvements are real. I'm cautiously optimistic for the first time in two years. If you want to start experimenting with the v2 preview, I'm here to help.

Shannon Lowder
