ADF in 2020: Six Years In, Here's Where the Platform Actually Stands

I ran my first Azure Data Factory pipeline in 2014. It was a v1 preview, there was no git integration, the connector list was short enough to memorize, and the documentation was aspirational at best. I compared it to SSIS and found it lacking in most dimensions that mattered for production workloads. I kept using it anyway, because the client was already in Azure and the alternative was managing SSIS infrastructure in a cloud-first world.

Six years later, that calculus has completely flipped. Here's an honest assessment of where ADF actually stands heading into 2020 -- not a Microsoft marketing brief, and not a hatchet job either.

What ADF Delivers Today

Managed Infrastructure -- Genuinely

The thing that matters most about ADF in 2020 is that, for workloads on the Azure Integration Runtime, there is no server management at all: no patching, no capacity planning, no 3am calls because a server ran out of disk. That runtime scales automatically and you pay for what you use. For organizations that spent years babysitting SSIS servers or Informatica appliances, this is not a small thing.

I have pipelines running in production today that I deployed two years ago and have never touched the infrastructure for. The data moved, the pipelines ran, the servers were someone else's problem. That's the promise of managed services, and ADF delivers it.

90+ Connectors

The early knock on ADF was the connector gap: you'd inevitably hit a source that wasn't supported and end up writing a Custom Activity anyway. That criticism no longer holds up in 2020. ADF now covers the core Azure data services; all the major relational databases (SQL Server, Azure SQL, MySQL, PostgreSQL, Oracle, Teradata, Netezza, and yes, Greenplum); SaaS platforms (Salesforce, Dynamics, SAP, ServiceNow, Marketo, HubSpot, Zendesk, Concur); the major file formats, including Parquet, ORC, Avro, and Delta; and the major file stores (Azure Data Lake, S3, GCS).

The connector gap is essentially over. The project conversation has moved from "can ADF connect to X?" to "what's the right incremental load strategy for X?"

Parameterized Pipelines and Metadata-Driven Frameworks

ADF v2's pipeline parameters changed what's architecturally possible. Instead of one pipeline per source table (the v1 trap), you build one parameterized pipeline that takes source/sink configuration as inputs. A Lookup activity reads a metadata table that lists the sources, and a ForEach activity calls the parameterized pipeline once per row. With a well-built metadata-driven framework, onboarding a new source table means adding a row to that table, which takes minutes.

This pattern isn't documented as a first-class Microsoft feature, but it's the standard approach for any ADF shop doing serious volume. The community has figured it out and shared the patterns. I'll be writing about this in depth over the coming months.
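Here's a minimal sketch of the master pipeline in that pattern, written as Python that emits the ADF v2 pipeline JSON. It assumes an Azure SQL control table; the table name (`etl.SourceTables`), its columns (including the `IsEnabled` flag), and the dataset and pipeline names (`DS_ControlTable`, `PL_Copy_GenericTable`, `PL_Master_LoadSourceTables`) are hypothetical placeholders, and the child pipeline is assumed to declare matching `SourceSchema`, `SourceTable`, and `SinkFolder` parameters. Treat it as a starting point to compare against your factory's exported JSON, not a finished framework.

```python
import json

# Hypothetical control-table columns; the child pipeline is assumed to
# declare a parameter with each of these names.
METADATA_COLUMNS = ("SourceSchema", "SourceTable", "SinkFolder")


def build_metadata_driven_pipeline(metadata_dataset: str, child_pipeline: str,
                                   batch_count: int = 10) -> dict:
    """Return an ADF v2 pipeline definition (as a dict) that reads a control
    table with a Lookup activity and runs the child pipeline once per row."""
    lookup = {
        "name": "LookupSourceTables",
        "type": "Lookup",
        "typeProperties": {
            "source": {
                "type": "AzureSqlSource",
                "sqlReaderQuery": "SELECT " + ", ".join(METADATA_COLUMNS)
                + " FROM etl.SourceTables WHERE IsEnabled = 1",
            },
            "dataset": {"referenceName": metadata_dataset, "type": "DatasetReference"},
            # Return every row, not just the first, so ForEach can iterate them.
            "firstRowOnly": False,
        },
    }
    for_each = {
        "name": "ForEachSourceTable",
        "type": "ForEach",
        "dependsOn": [
            {"activity": "LookupSourceTables", "dependencyConditions": ["Succeeded"]}
        ],
        "typeProperties": {
            # The Lookup output's 'value' array holds the metadata rows.
            "items": {
                "value": "@activity('LookupSourceTables').output.value",
                "type": "Expression",
            },
            "isSequential": False,
            "batchCount": batch_count,
            "activities": [
                {
                    "name": "CopyOneTable",
                    "type": "ExecutePipeline",
                    "typeProperties": {
                        "pipeline": {"referenceName": child_pipeline,
                                     "type": "PipelineReference"},
                        "waitOnCompletion": True,
                        # @item() is the current metadata row inside the ForEach.
                        "parameters": {
                            col: {"value": f"@item().{col}", "type": "Expression"}
                            for col in METADATA_COLUMNS
                        },
                    },
                }
            ],
        },
    }
    return {
        "name": "PL_Master_LoadSourceTables",
        "properties": {"activities": [lookup, for_each]},
    }


if __name__ == "__main__":
    pipeline = build_metadata_driven_pipeline("DS_ControlTable", "PL_Copy_GenericTable")
    print(json.dumps(pipeline, indent=2))
```

Setting `isSequential` to false with a `batchCount` lets the ForEach run several child pipelines in parallel; lower the batch count if the source database can't absorb the concurrent load.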

Data Flows for Spark Transformation

ADF Mapping Data Flows went GA in 2019 and I've been running them in production. They execute on Apache Spark: ADF provisions a managed Spark cluster, runs your transformation on it, and tears it down afterward. You get Spark-scale transformation with a visual designer and zero cluster management.

The performance is real. In my workloads, a Data Flow that processes 50 million rows finishes in minutes, where the same logic in a SQL Server stored procedure could run for hours. The time-to-live (TTL) setting on the Azure Integration Runtime keeps the Spark cluster warm between pipeline runs, so high-frequency workloads skip the cluster startup wait; in exchange, you pay for the time the cluster sits warm.
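As a sketch, this is roughly how an Azure Integration Runtime with a Data Flow TTL is defined, again expressed as Python that produces the JSON. The IR name, core count, and 10-minute TTL are illustrative assumptions rather than recommendations, and the property names reflect my understanding of the IR schema, so compare them against a definition exported from your own factory.

```python
import json


def build_dataflow_ir(name: str, core_count: int = 8, ttl_minutes: int = 10) -> dict:
    """Return an Azure Integration Runtime definition whose Data Flow Spark
    cluster stays warm for ttl_minutes after each Data Flow run finishes."""
    if ttl_minutes < 0:
        raise ValueError("ttl_minutes must be non-negative")
    return {
        "name": name,
        "properties": {
            "type": "Managed",
            "typeProperties": {
                "computeProperties": {
                    "location": "AutoResolve",
                    "dataFlowProperties": {
                        "computeType": "General",
                        "coreCount": core_count,
                        # Minutes the cluster stays alive after a run; a Data Flow
                        # that starts within this window reuses the warm cluster.
                        "timeToLive": ttl_minutes,
                    },
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical IR name for illustration.
    print(json.dumps(build_dataflow_ir("IR-DataFlow-Warm"), indent=2))
```

The TTL only pays off when the next Data Flow lands inside the window and runs on the same IR; for a once-a-night batch it just adds warm time you pay for.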

Git Integration (Imperfect, But Functional)

ADF added git integration in 2018: Azure Repos or GitHub, a collaboration-branch model, and full resource history. It works. The confusion point is the adf_publish branch. When you click Publish in the ADF authoring UI, it doesn't publish to your collaboration branch. It deploys the collaboration branch to the live factory, generates ARM templates, and writes them to a separate adf_publish branch. That branch is what your CI/CD pipeline deploys from.

Every new team member who joins an ADF project gets confused by this. I still explain it in project onboarding. It's a design decision that made sense at the time and creates ongoing friction in practice. We'll live with it.
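One way to handle environment promotion in CI/CD is to take the `ARMTemplateParametersForFactory.json` file that ADF writes to adf_publish and rewrite it per target environment before deploying `ARMTemplateForFactory.json`. The sketch below is a hypothetical helper for that step; the folder layout, output file name, and any override keys are assumptions, since the generated parameter names depend on your linked services.

```python
import json
from pathlib import Path


def write_env_parameters(publish_folder: Path, target_factory: str,
                         overrides: dict, out_file: Path) -> dict:
    """Read the ARM parameter file ADF generated in adf_publish, point it at
    another factory, apply per-environment overrides, and write the result."""
    params_path = publish_folder / "ARMTemplateParametersForFactory.json"
    doc = json.loads(params_path.read_text(encoding="utf-8"))
    params = doc["parameters"]
    # The generated parameter file carries the factory name; retarget it.
    params["factoryName"] = {"value": target_factory}
    for key, value in overrides.items():
        if key not in params:
            # Fail loudly: a misspelled key would otherwise leave the dev value in place.
            raise KeyError(f"parameter {key!r} not found in generated template")
        params[key] = {"value": value}
    out_file.write_text(json.dumps(doc, indent=2), encoding="utf-8")
    return doc
```

Raising on unknown keys is deliberate: a silently ignored override is exactly how dev connection strings end up in production.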

Where the Comparison to SSIS Breaks Down

I spent a couple of years mentally comparing ADF to SSIS and finding ADF lacking. That was the wrong frame. SSIS is an in-memory, buffer-based transformation engine built for SQL Server integration: it excels at complex row-level transformations, has a mature debugging experience, and runs synchronously in a process you control completely.

ADF is a cloud orchestration platform. It coordinates the execution of tasks across services: Copy Activity for data movement, Data Flows for Spark transformation, Azure Function Activity for serverless logic, Databricks Notebook Activity for complex analytics. ADF doesn't compete with SSIS at the task level -- it operates at the orchestration level above it.

Once I reframed ADF as an orchestrator rather than a replacement for SSIS's transformation capabilities, the design decisions made sense. The right question stopped being "can ADF do what SSIS does?" and became "what does the right orchestration layer look like for a cloud data platform?"
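To make the orchestration point concrete: ADF sequences activities through each activity's `dependsOn` list. The hypothetical helper below wires a list of activity definitions into a success-only chain. The pipeline and activity names are made up, and the `typeProperties`, linked services, and datasets each real activity needs are omitted to keep the focus on the dependencies.

```python
import json


def chain_activities(activities: list) -> list:
    """Return copies of the activity dicts, each set to run only after the
    previous one succeeds (any existing dependsOn is replaced)."""
    chained = []
    previous_name = None
    for activity in activities:
        step = dict(activity)  # shallow copy so the caller's dicts are not mutated
        if previous_name is not None:
            step["dependsOn"] = [
                {"activity": previous_name, "dependencyConditions": ["Succeeded"]}
            ]
        chained.append(step)
        previous_name = step["name"]
    return chained


if __name__ == "__main__":
    # Hypothetical activities: move data, transform it, then call serverless logic.
    pipeline = {
        "name": "PL_Orders_Daily",
        "properties": {
            "activities": chain_activities([
                {"name": "CopyRawOrders", "type": "Copy"},
                {"name": "TransformOrders", "type": "DatabricksNotebook"},
                {"name": "NotifyDownstream", "type": "AzureFunctionActivity"},
            ])
        },
    }
    print(json.dumps(pipeline, indent=2))
```

None of the work happens inside ADF here; the pipeline only decides what runs, where, and in what order, which is the orchestration layer in a nutshell.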

The Gaps That Still Exist

In the spirit of not writing a marketing brief: there are genuine rough edges in ADF heading into 2020.

The adf_publish model is still confusing and the manual Publish step creates friction in CI/CD workflows. There's no way to automatically trigger ARM template generation on a PR merge without external tooling.

Alerting is handled by Azure Monitor, not by ADF itself. Setting up alerts for pipeline failures isn't complicated, but it's an external dependency that new teams consistently miss.
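For reference, a pipeline-failure alert can be expressed as an ARM resource and deployed alongside the factory, so it can't be forgotten. This sketch emits a `Microsoft.Insights/metricAlerts` resource on the factory's `PipelineFailedRuns` metric; the alert name, severity, five-minute window, and the resource IDs passed in are assumptions, so check the schema against the metric alert ARM reference before relying on it.

```python
import json


def build_failed_run_alert(factory_resource_id: str, action_group_id: str) -> dict:
    """Return an ARM resource for an Azure Monitor metric alert that fires
    when any pipeline run in the factory fails within the evaluation window."""
    return {
        "type": "Microsoft.Insights/metricAlerts",
        "apiVersion": "2018-03-01",
        "name": "adf-pipeline-failed-runs",  # hypothetical alert name
        "location": "global",
        "properties": {
            "description": "Any failed pipeline run in the last 5 minutes",
            "severity": 1,
            "enabled": True,
            "scopes": [factory_resource_id],
            "evaluationFrequency": "PT5M",
            "windowSize": "PT5M",
            "criteria": {
                "odata.type": "Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria",
                "allOf": [
                    {
                        "criterionType": "StaticThresholdCriterion",
                        "name": "FailedRuns",
                        "metricNamespace": "Microsoft.DataFactory/factories",
                        "metricName": "PipelineFailedRuns",
                        # Fire as soon as the failed-run count in the window exceeds zero.
                        "operator": "GreaterThan",
                        "threshold": 0,
                        "timeAggregation": "Total",
                    }
                ],
            },
            "actions": [{"actionGroupId": action_group_id}],
        },
    }
```

Deploying it from the same release pipeline as the factory means every environment gets the alert, rather than relying on someone to click through the portal after go-live.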

And then there's Synapse Analytics. Microsoft announced Azure Synapse Analytics in preview in November 2019, and it includes "Synapse Pipelines": a pipeline authoring experience that is, by my observation, nearly identical to ADF. Microsoft says these are different products serving different audiences. I'm skeptical. That's a conversation for a dedicated post.

The Bottom Line After Six Years

ADF in 2020 is a production-grade cloud orchestration platform. It's not perfect. The rough edges are real and I'll document them as I go. But the case for running ADF in production is strong and the case against is getting shorter every year.

The platform has found its footing. The managed infrastructure story is proven. The connector breadth covers almost every realistic use case. The architecture patterns are understood by the community even where Microsoft's documentation is thin.

Six years in, I'm not surprised by ADF anymore. That's the best thing I can say about a platform: it does what it says, consistently, without drama. I'm here to help you get there faster than I did.
