February 2018. Azure Data Factory v2 is officially generally available. After a preview that stretched through 2016 and 2017, the thing is real now. So what does GA actually mean for the people who've been running it in production already?
What GA Actually Delivers
Three things change with GA: SLA commitments, support tier, and pricing finalization. Microsoft now backs v2 with a 99.9% uptime SLA. You get production support. And the pricing is locked — no more preview pricing that disappears without warning.
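For scale, suppose you read the 99.9% figure as plain monthly availability. The published SLA defines exactly what is measured, so treat this as a rough approximation. Over a 30-day month, the permitted shortfall works out to:

```latex
(1 - 0.999) \times 30 \times 24 \times 60\ \text{min} = 0.001 \times 43{,}200\ \text{min} = 43.2\ \text{min}
```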
But the more interesting story is what the v2 object model means for people coming from v1. The fundamental design is stable now:
- Pipelines — the unit of orchestration, not just a copy job wrapper
- Datasets — parameterized schemas, not hardcoded table references
- Linked Services — connection definitions, separately versioned
- Triggers — schedule, tumbling window, and event-based, replacing the v1 Scheduler
- Integration Runtimes — Azure IR, Self-Hosted IR, and Azure-SSIS IR as first-class resources
The v2 design is coherent in a way v1 never was. In v1, scheduling lived in the Datasets: each dataset's availability section set the slice frequency that drove when activities ran, which was conceptually backwards. Datasets weren't just data definitions; they were job configuration blobs. V2 separates concerns properly: the dataset defines the data shape, the activity defines the operation, and the linked service defines the connection.
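To make that split concrete, here is a minimal Python sketch that models a linked service, a parameterized dataset, and an activity's dataset binding as plain dicts. The dicts are shaped like ADF v2 authoring JSON as I understand it. The resource names (SqlServerLS, SqlTableDataset, dbo.Orders) are hypothetical. The `resolve` function is a toy stand-in for ADF's expression engine that handles only `@dataset().<name>`.

```python
import re

# Hypothetical resource names. JSON shapes mirror ADF v2 authoring JSON as I
# understand it; this is a sketch, not a reference schema.

linked_service = {  # the connection: where the data lives
    "name": "SqlServerLS",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {"connectionString": "<connection string placeholder>"},
    },
}

dataset = {  # the data shape: which table, bound at run time
    "name": "SqlTableDataset",
    "properties": {
        "type": "SqlServerTable",
        "linkedServiceName": {"referenceName": "SqlServerLS", "type": "LinkedServiceReference"},
        "parameters": {"tableName": {"type": "String"}},
        "typeProperties": {
            "tableName": {"value": "@dataset().tableName", "type": "Expression"}
        },
    },
}

copy_input = {  # the operation's side: an activity input binding a concrete table
    "referenceName": "SqlTableDataset",
    "type": "DatasetReference",
    "parameters": {"tableName": "dbo.Orders"},
}

DATASET_PARAM = re.compile(r"^@dataset\(\)\.(\w+)$")


def resolve(node, bindings):
    """Toy expression resolver.

    Replaces {"value": "@dataset().<name>", "type": "Expression"} with
    bindings[<name>] and leaves everything else untouched.
    """
    if isinstance(node, dict):
        if node.get("type") == "Expression":
            match = DATASET_PARAM.match(node.get("value", ""))
            return bindings[match.group(1)] if match else node
        return {key: resolve(value, bindings) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(item, bindings) for item in node]
    return node


resolved = resolve(dataset["properties"]["typeProperties"], copy_input["parameters"])
print(resolved)  # {'tableName': 'dbo.Orders'}
```

The dataset carries only a reference to the linked service and a parameter slot. That lets one dataset definition serve every table, while the activity supplies the concrete name.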
What V1 Users Need to Know
There is no automated migration tool. V1 and v2 are different resource types in Azure. They coexist in the same subscription, but you cannot lift-and-shift your v1 pipelines into v2. You rebuild.
I know. I have production v1 pipelines that have been running since 2014. I'm not thrilled about rebuilding them. But I've been running v2 alongside v1 since the preview, and the rebuild investment is worth it. The v2 pattern — parameterized pipelines, ForEach, Lookup-driven config — produces something you can actually maintain.
My plan for 2018: migrate active v1 pipelines to v2 in waves, decommissioning v1 instances as their workloads move over. Run both in parallel. Don't rush it. The v1 service isn't going anywhere immediately.
The Parameterization Story
This is the thing that makes v2 worth the rebuild. In v1, you had to create a separate pipeline for each source table. Fifty tables meant fifty pipelines, all structurally identical, differing only in the source and target names. It was SSIS package proliferation all over again.
V2 pipelines take parameters. A generic ingest pipeline takes sourceTable, targetSchema, and watermarkColumn as parameters. A Lookup activity reads the list of tables from a config table. A ForEach iterates over the returned rows and uses an Execute Pipeline activity to call the inner pipeline with each row's values. One pipeline definition handles all your tables.
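Here is a minimal Python sketch of that Lookup, ForEach, Execute Pipeline pattern. It builds the outer pipeline as a dict shaped like ADF v2 pipeline JSON as I understand it, then dry-runs the ForEach locally against sample config rows. All names (MasterIngest, GenericIngest, ConfigDataset, etl.IngestConfig) are hypothetical. `dry_run` is an illustrative helper, not part of ADF.

```python
# Hypothetical names throughout. Activity types and expressions
# (@activity(...).output.value, @item()) follow ADF v2 syntax as I know it.

PARAMS = ("sourceTable", "targetSchema", "watermarkColumn")

master_pipeline = {
    "name": "MasterIngest",
    "properties": {
        "activities": [
            {
                "name": "LookupConfig",
                "type": "Lookup",
                "typeProperties": {
                    "source": {
                        "type": "SqlSource",
                        "sqlReaderQuery": "SELECT sourceTable, targetSchema, watermarkColumn FROM etl.IngestConfig",
                    },
                    "dataset": {"referenceName": "ConfigDataset", "type": "DatasetReference"},
                    "firstRowOnly": False,  # return all config rows, not just the first
                },
            },
            {
                "name": "ForEachTable",
                "type": "ForEach",
                "dependsOn": [{"activity": "LookupConfig", "dependencyConditions": ["Succeeded"]}],
                "typeProperties": {
                    "items": {"value": "@activity('LookupConfig').output.value", "type": "Expression"},
                    "activities": [
                        {
                            "name": "RunGenericIngest",
                            "type": "ExecutePipeline",
                            "typeProperties": {
                                "pipeline": {"referenceName": "GenericIngest", "type": "PipelineReference"},
                                # One parameter per config column, read from the current row.
                                "parameters": {p: f"@item().{p}" for p in PARAMS},
                                "waitOnCompletion": True,
                            },
                        }
                    ],
                },
            },
        ]
    },
}


def dry_run(pipeline, lookup_rows):
    """Local simulation of the ForEach: resolve @item().<column> for each Lookup row."""
    for_each = next(a for a in pipeline["properties"]["activities"] if a["type"] == "ForEach")
    inner = for_each["typeProperties"]["activities"][0]
    template = inner["typeProperties"]["parameters"]
    # "@item().sourceTable".split(".", 1)[1] -> "sourceTable"
    return [
        {name: row[expr.split(".", 1)[1]] for name, expr in template.items()}
        for row in lookup_rows
    ]


config_rows = [  # stand-in for what the Lookup would return from the config table
    {"sourceTable": "dbo.Orders", "targetSchema": "stg", "watermarkColumn": "ModifiedDate"},
    {"sourceTable": "dbo.Customers", "targetSchema": "stg", "watermarkColumn": "UpdatedAt"},
]

for call in dry_run(master_pipeline, config_rows):
    print(call)
```

Adding a fifty-first table then means inserting a row into the config table, not authoring another pipeline.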
Simple, right? It took until February 2018 to get here.
Git Integration Shipped Alongside GA
I'm going to cover git integration in detail in the next post, but the timing matters: git integration shipped with or immediately before the GA release. The two together change the operating model significantly. GA without git integration would have been a much weaker milestone. With it, you have a deployable artifact model, a change history, and a path toward real CI/CD.
The Trigger Model
V1's scheduling model, where dataset availability slices drove activity runs, is gone. V2 has three trigger types:
- Schedule Trigger — fixed cron-style schedule, fires pipeline on recurring time
- Tumbling Window Trigger — time-windowed, tracks state, retries missed windows, ideal for data partitioning by time
- Event-Based Trigger — fires on Blob Storage events (file arrives, file deleted)
The Tumbling Window Trigger is the most sophisticated and the most underused. If you're doing historical backfill or partition-by-date incremental loads, this is the trigger you want. It passes window start and end as parameters to your pipeline. No custom watermark management required for the scheduling layer — the trigger handles it.
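The following Python sketch shows how tumbling windows tile time with no gaps or overlaps, and how a pipeline might use the window bounds. The trigger dict follows the ADF v2 TumblingWindowTrigger JSON as I understand it. The trigger and pipeline names, the `due_windows` and `window_query` helpers, and the dbo.Orders/ModifiedDate query are hypothetical. The real trigger also persists per-window state and retries failed windows, which this local enumeration does not model.

```python
from datetime import datetime, timedelta

# Hypothetical names. Property names and @trigger().outputs.* expressions
# follow ADF v2 TumblingWindowTrigger JSON as I understand it.
tumbling_trigger = {
    "name": "HourlyOrdersWindow",
    "properties": {
        "type": "TumblingWindowTrigger",
        "typeProperties": {
            "frequency": "Hour",
            "interval": 1,
            "startTime": "2018-02-01T00:00:00Z",  # a past start time means backfill
            "maxConcurrency": 4,
        },
        "pipeline": {
            "pipelineReference": {"referenceName": "WindowedIngest", "type": "PipelineReference"},
            "parameters": {
                "windowStart": "@trigger().outputs.windowStartTime",
                "windowEnd": "@trigger().outputs.windowEndTime",
            },
        },
    },
}

UNITS = {"Minute": "minutes", "Hour": "hours"}  # only the units this sketch handles


def due_windows(trigger, now):
    """Enumerate the contiguous [start, end) windows that have fully elapsed by `now`."""
    props = trigger["properties"]["typeProperties"]
    step = timedelta(**{UNITS[props["frequency"]]: props["interval"]})
    start = datetime.strptime(props["startTime"], "%Y-%m-%dT%H:%M:%SZ")
    windows = []
    while start + step <= now:
        windows.append((start, start + step))
        start += step
    return windows


def window_query(table, column, start, end):
    """Hypothetical source query for one window: half-open, so no row is loaded twice."""
    return (
        f"SELECT * FROM {table} WHERE {column} >= '{start:%Y-%m-%dT%H:%M:%S}' "
        f"AND {column} < '{end:%Y-%m-%dT%H:%M:%S}'"
    )


for window_start, window_end in due_windows(tumbling_trigger, datetime(2018, 2, 1, 3, 30)):
    print(window_query("dbo.Orders", "ModifiedDate", window_start, window_end))
```

The half-open range is a deliberate choice. Each window's end equals the next window's start, so a row stamped exactly on a boundary lands in one window only.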
Integration Runtime Changes
The v2 IR model replaces the v1 Data Management Gateway. Self-Hosted IR is the same conceptually — you install it on your on-premises network — but it's more stable, supports high availability clusters (two or more nodes), and integrates with Azure Key Vault for credential management.
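As I understand it, the Key Vault reference actually sits on the linked service, and the linked service points at the Self-Hosted IR through `connectVia`. Here is a minimal Python sketch of that wiring as a dict shaped like ADF v2 linked-service JSON. The names (OnPremSqlLS, CorpKeyVaultLS, OnPremIR, the secret name, the server and database) are hypothetical. The `has_inline_secret` helper is an illustrative lint check, not an ADF feature.

```python
# Hypothetical names throughout. "connectVia" and the "AzureKeyVaultSecret"
# password type follow ADF v2 linked-service JSON as I understand it.
on_prem_sql = {
    "name": "OnPremSqlLS",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {
            "connectionString": "Data Source=sqlprod01;Initial Catalog=Sales;Integrated Security=False;",
            "userName": "etl_reader",
            # Password is a reference to a Key Vault secret, not a stored value.
            "password": {
                "type": "AzureKeyVaultSecret",
                "store": {"referenceName": "CorpKeyVaultLS", "type": "LinkedServiceReference"},
                "secretName": "onprem-sql-password",
            },
        },
        # Route the connection through the Self-Hosted IR on the corporate network.
        "connectVia": {"referenceName": "OnPremIR", "type": "IntegrationRuntimeReference"},
    },
}


def has_inline_secret(linked_service):
    """True when the password is anything other than a Key Vault reference (e.g. a SecureString)."""
    password = linked_service["properties"]["typeProperties"].get("password")
    if password is None:
        return False
    return not (isinstance(password, dict) and password.get("type") == "AzureKeyVaultSecret")


print(has_inline_secret(on_prem_sql))  # False
```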
Azure-SSIS IR is the lift-and-shift story for SSIS packages. Provision a managed SSIS runtime in Azure, deploy your existing .dtsx packages, run them from ADF as activities. If you're not ready to redesign your SSIS workloads for v2 natively, this is the bridge. I've used it with clients who have fifteen years of SSIS investment they're not abandoning.
Where Things Stand
ADF v2 GA is the version I wanted in 2016. It's parameterized, trigger-based, git-integrated, and has a coherent IR model. The rebuild cost from v1 is real. The operational improvement on the other side is also real.
If you're starting a new ADF project today, start in v2. Don't even look at v1. If you're running v1 in production, make a migration plan and start executing it. The window for running both in parallel is 2018. Don't wait for v1 end-of-life to force the issue — that's a bad time to do a rushed rebuild.
Next up: git integration — what it does, what it doesn't do, and why the adf_publish branch confused my entire team for two weeks. As always, if you've got questions, I'm here to help.