Three years ago I started writing about Azure Data Factory. At that point ADF v1 was a preview with an interesting connector library, a confusing scheduling model, and no git integration. Two years ago I was in production with it and cataloging the gaps. Last year Microsoft previewed v2 and it was, for the first time, genuinely exciting.
Here's where things stand as we enter 2017.
What's Working in v2
Parameterization is real and production-grade. I've been building new workloads with parameterized pipelines since mid-2016 and the pattern holds up. Generic ingest pipeline, config table, parameters flowing through — it's the framework I've been trying to build since 2015. The ForEach activity that would complete the pattern hasn't shipped yet, but even without it, parameterized pipelines are a step-change improvement over the copy-paste-per-table model of v1.
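To make the config-table idea concrete, here is a minimal sketch in Python of how rows in a config table could become parameter sets for one generic ingest pipeline. It is an illustration, not the actual framework: the column names, the parameter names (sourceSchema, sourceTable, sinkPath), and the in-memory CONFIG_ROWS list are all assumptions made for the example. In practice the rows would live in a database table and each parameter set would be passed to a pipeline run.

```python
# Hypothetical config rows; a real framework would read these from a SQL table.
CONFIG_ROWS = [
    {"source_schema": "dbo", "source_table": "Customer",
     "sink_folder": "raw/customer", "enabled": True},
    {"source_schema": "dbo", "source_table": "Orders",
     "sink_folder": "raw/orders", "enabled": True},
    {"source_schema": "dbo", "source_table": "AuditLog",
     "sink_folder": "raw/auditlog", "enabled": False},
]


def build_run_parameters(rows, run_date):
    """Return one parameter dict per enabled config row.

    The keys (sourceSchema, sourceTable, sinkPath) are assumed parameter
    names for a generic ingest pipeline, not names defined by ADF.
    """
    runs = []
    for row in rows:
        if not row["enabled"]:
            continue  # disabled tables are skipped without touching the pipeline
        runs.append({
            "sourceSchema": row["source_schema"],
            "sourceTable": row["source_table"],
            "sinkPath": f"{row['sink_folder']}/{run_date}",
        })
    return runs


if __name__ == "__main__":
    for params in build_run_parameters(CONFIG_ROWS, "2017-01-15"):
        print(params)
```

The point of the shape is that onboarding a new table means adding a row, not copying a pipeline.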
The Self-Hosted IR has been more stable than the DMG it replaced. The high-availability configuration — multiple nodes registered to the same IR — has proven its value twice already when a node needed maintenance. Failover was seamless. In v1, a DMG node going down meant service interruption.
For one client, I've been running the Azure-SSIS IR to execute legacy SSIS packages. After the initial provisioning work and some third-party component installation via custom setup scripts, it's been reliable. The on/off scheduling pattern — start the IR 15 minutes before the batch window, stop it when the batch completes — has controlled costs adequately.
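The timing logic behind the on/off pattern is simple enough to sketch. The Python below only covers the decision of when the IR should be up. It does not call any Azure API, and the function names and the 15-minute default are assumptions taken from the description above. Whatever scheduler actually starts and stops the IR would consume these values.

```python
from datetime import datetime, timedelta

# Lead time before the batch window, per the pattern described above.
WARMUP = timedelta(minutes=15)


def ir_start_time(batch_start, warmup=WARMUP):
    """Time at which the IR start should be issued."""
    return batch_start - warmup


def ir_should_be_running(now, batch_start, batch_finished, warmup=WARMUP):
    """True from the warm-up point until the batch reports completion."""
    return ir_start_time(batch_start, warmup) <= now and not batch_finished


if __name__ == "__main__":
    batch = datetime(2017, 1, 15, 2, 0)
    print(ir_start_time(batch))
    print(ir_should_be_running(datetime(2017, 1, 15, 1, 50), batch, False))
```

Keying the stop on batch completion rather than a fixed clock time is what keeps a long-running batch from being cut off.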
What's Still Missing
Git integration. Three years into ADF's existence, two versions of the product, and there is still no native git integration. I've stopped being surprised. I've started treating it as an architectural constraint rather than a temporary gap. Build your own export-to-git discipline or build your own tooling around the REST API. It works. It's extra work.
ForEach activity. Expected in 2017. The metadata-driven framework pattern requires it. I'm building with stored procedure workarounds in the meantime.
Better monitoring. This is nuanced: v2 monitoring shows activity run input and output JSON (very useful for debugging parameters), which v1 never had. But the pipeline run history view is, in some ways, less mature than v1's slice-based monitoring. There is no row count during active copy runs, and no native alerting built into ADF itself; that still requires Azure Monitor configuration.
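On the git gap specifically, the export-to-git discipline mostly comes down to writing definitions to disk in a stable form so that diffs are meaningful. Here is a minimal sketch in Python, assuming the pipeline definitions have already been fetched (for example via the REST API) into a dict keyed by name. The function name, the directory layout, and the sample definition are hypothetical, and the fetch itself is deliberately left out.

```python
import json
from pathlib import Path


def write_definitions(definitions, target_dir):
    """Write each definition to <target_dir>/<name>.json.

    Keys are sorted and indentation is fixed so that re-exporting an
    unchanged pipeline produces an identical file and an empty git diff.
    Returns the file names that were created or changed.
    """
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in sorted(definitions):
        path = target_dir / f"{name}.json"
        text = json.dumps(definitions[name], indent=2, sort_keys=True) + "\n"
        if not path.exists() or path.read_text(encoding="utf-8") != text:
            path.write_text(text, encoding="utf-8")
            written.append(path.name)
    return written


if __name__ == "__main__":
    # Hypothetical definition, standing in for what an export would return.
    sample = {"IngestGeneric": {"name": "IngestGeneric",
                                "properties": {"activities": []}}}
    print(write_definitions(sample, "adf_export/pipelines"))
```

Running this on a schedule and committing the output directory gives a change history, though not the branch-and-review workflow that native integration would.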
The Databricks Question
I'm starting to get asked this on client engagements: should we use ADF for orchestration or just use Databricks for everything?
Databricks notebooks can orchestrate other notebooks. Databricks has its own job scheduler. Databricks has native git integration (which ADF still doesn't have). For teams that are already running Databricks as their transformation platform, the question of whether to add ADF to the stack for orchestration is legitimate.
My current thinking: ADF and Databricks are complementary, not competing. ADF handles the ingestion layer well — it has 90+ connectors, it handles on-premises connectivity via Self-Hosted IR, it has file-arrival event triggers, it handles retry and monitoring for the extract-and-land phase. Databricks handles the transformation layer — parallel compute, in-memory processing, complex Python/Scala logic. ADF triggers the Databricks notebook when the raw data is ready; Databricks does the work.
The alternative — using Databricks for everything including ingestion — means writing JDBC connectors and custom S3/ADLS copy code in notebooks for sources that ADF handles natively. That's reinventing connectors. Trust me on this one: let ADF do the ingestion, let Databricks do the transformation.
v1 vs. v2 in Production
I'm running v1 and v2 in parallel for different clients. New workloads get v2. v1 workloads stay on v1 until there's a reason to migrate (significant rework, new features needed). The two versions don't interfere with each other — they're separate ADF instances.
The migration cost from v1 to v2 remains real. Pipeline JSON schemas are different, dataset schemas are different, the scheduling model is different. For a shop with 50 v1 pipelines, migration is a multi-month project. Plan it properly and stage it by workload domain, not all at once.
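Staging by workload domain can start as nothing more than an inventory grouped into waves. A minimal sketch in Python follows; the pipeline names, the domain labels, and the plan_waves function are made up for illustration, and the ordering of domains is a judgment call the sketch takes as input.

```python
from collections import defaultdict


def plan_waves(pipelines, domain_order):
    """Group (pipeline_name, domain) pairs into ordered migration waves.

    Domains listed in domain_order come first, in that order. Any domain
    missing from the list is appended alphabetically so no pipeline is
    silently dropped from the plan.
    """
    by_domain = defaultdict(list)
    for name, domain in pipelines:
        by_domain[domain].append(name)
    ordered = [d for d in domain_order if d in by_domain]
    ordered += sorted(d for d in by_domain if d not in domain_order)
    return [(d, sorted(by_domain[d])) for d in ordered]


if __name__ == "__main__":
    # Hypothetical v1 inventory.
    inventory = [
        ("LoadOrders", "sales"),
        ("LoadCustomer", "sales"),
        ("LoadLedger", "finance"),
        ("LoadBadge", "hr"),
    ]
    for domain, names in plan_waves(inventory, ["finance", "sales"]):
        print(domain, names)
```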
What I'm Watching This Year
ForEach activity: when it ships, I'll have the framework done. I've been waiting for this one.
Git integration: I'll believe it when I see it. But if it ships in 2017, it changes the operational story significantly.
Monitoring improvements: the activity run detail JSON in v2 is good. I want alerting built into ADF directly, not as an afterthought via Azure Monitor integration.
Databricks notebook activity: this is in preview in v2 and I'm paying attention. A first-class activity for triggering Databricks notebooks from ADF pipelines, with parameter passing, would cement the ADF-as-orchestrator/Databricks-as-compute pattern.
2017 looks like the year ADF v2 becomes a production-grade platform. We're not there yet. We're close. I'm here to help if you're navigating the v1-to-v2 migration or starting fresh with v2.