End of year. Time to take stock.
2018 was the most significant year for Azure Data Factory since the service launched in 2014. Let me walk through what landed, what it means, and what's still unfinished.
What Shipped
ADF v2 GA — February
The GA release locked the v2 object model, attached an SLA, and finalized pricing. More importantly, GA settled the "is this stable enough for production?" question. It's stable. The v2 design (parameterized pipelines, ForEach, Lookup, trigger-based scheduling, and Integration Runtime as a first-class resource) is what you build on now.
Moving from v1 to v2 is a rebuild, not a migration. I've been doing it in waves across client environments throughout 2018. It's work, but the operational improvement on the other side is real.
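To make the v2 object model concrete, here is a minimal sketch of the "Lookup feeds ForEach" pattern written as a Python dict that serializes to pipeline JSON. The dataset `ControlDb`, the child pipeline `CopyOneTable`, and the `schemaName` parameter are hypothetical names for illustration, and source type names can differ by connector version, so treat this as a shape to compare against what the authoring UI generates, not a drop-in definition.

```python
import json


def build_table_fanout_pipeline(name: str) -> dict:
    """Return a v2-style pipeline definition in which a Lookup feeds a ForEach.

    "ControlDb" (dataset) and "CopyOneTable" (child pipeline) are hypothetical
    names; replace them with resources that exist in your factory.
    """
    return {
        "name": name,
        "properties": {
            # Pipeline parameters are supplied per run, e.g. by a trigger.
            "parameters": {"schemaName": {"type": "String", "defaultValue": "dbo"}},
            "activities": [
                {
                    # Lookup returns every row (firstRowOnly=False) as output.value.
                    "name": "LookupTables",
                    "type": "Lookup",
                    "typeProperties": {
                        "source": {
                            "type": "SqlSource",
                            "sqlReaderQuery": {
                                # @{...} interpolates the pipeline parameter into the query text.
                                "value": "SELECT TABLE_NAME AS name FROM INFORMATION_SCHEMA.TABLES "
                                         "WHERE TABLE_SCHEMA = '@{pipeline().parameters.schemaName}'",
                                "type": "Expression",
                            },
                        },
                        "dataset": {"referenceName": "ControlDb", "type": "DatasetReference"},
                        "firstRowOnly": False,
                    },
                },
                {
                    # ForEach iterates over the Lookup output, running iterations in parallel.
                    "name": "ForEachTable",
                    "type": "ForEach",
                    "dependsOn": [{"activity": "LookupTables", "dependencyConditions": ["Succeeded"]}],
                    "typeProperties": {
                        "items": {"value": "@activity('LookupTables').output.value", "type": "Expression"},
                        "isSequential": False,
                        "activities": [
                            {
                                # Each iteration calls a child pipeline with the current table name.
                                "name": "RunCopyOneTable",
                                "type": "ExecutePipeline",
                                "typeProperties": {
                                    "pipeline": {"referenceName": "CopyOneTable", "type": "PipelineReference"},
                                    "parameters": {"tableName": {"value": "@item().name", "type": "Expression"}},
                                    "waitOnCompletion": True,
                                },
                            }
                        ],
                    },
                },
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_table_fanout_pipeline("CopyAllTables"), indent=2))
```

The point is the wiring: the parameter flows into the query, the Lookup output becomes the ForEach item list, and `@item()` carries each row into the child pipeline.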
Git Integration — Q1
Four years of asking for this. It shipped. Every pipeline change is now a git commit. PRs are the change control mechanism. The collaboration branch is the source of truth.
The adf_publish two-step model surprised everyone on my team and probably everyone on yours. But once the mental model clicks — Save goes to git, Publish goes to adf_publish, Deploy reads from adf_publish — it works. The operational improvement over portal-drift hell is not subtle.
No more coming in Monday morning to find that someone made a Friday emergency fix in the portal that never got committed. That was a real operational hazard that is now gone.
CI/CD Story — Q2
Git integration enabled a real CI/CD workflow: ARM templates from adf_publish, Azure DevOps pipelines for multi-stage deployment, environment-specific parameter files, and pre/post-deployment trigger management scripts. This is infrastructure-as-code for ADF pipelines, and it exists now. Teams that weren't ready for that kind of investment in 2017 should be building it now.
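The environment-specific parameter file step is the piece teams most often hand-edit. A minimal sketch, assuming you start from the `ARMTemplateParametersForFactory.json` that Publish writes to adf_publish: apply per-environment overrides to a copy of it. The override parameter names and values below are hypothetical; the real names depend on your factory's linked services.

```python
import copy
import json

PARAMS_SCHEMA = "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#"

# Hypothetical per-environment overrides; parameter names depend on your factory.
ENVIRONMENTS = {
    "test": {
        "factoryName": "adf-contoso-test",
        "DataLake_properties_typeProperties_url": "https://contosotest.dfs.core.windows.net/",
    },
    "prod": {
        "factoryName": "adf-contoso-prod",
        "DataLake_properties_typeProperties_url": "https://contosoprod.dfs.core.windows.net/",
    },
}


def render_parameter_file(base: dict, overrides: dict) -> dict:
    """Return a copy of an ARM parameters file with override values applied.

    Refuses overrides for parameters the base file doesn't declare, so a renamed
    linked service fails the build instead of being silently skipped.
    """
    result = copy.deepcopy(base)
    params = result.setdefault("parameters", {})
    unknown = sorted(set(overrides) - set(params))
    if unknown:
        raise KeyError(f"parameters not in base file: {unknown}")
    for name, value in overrides.items():
        params[name] = {"value": value}
    return result


if __name__ == "__main__":
    with open("ARMTemplateParametersForFactory.json", encoding="utf-8") as f:
        base = json.load(f)
    for env, overrides in ENVIRONMENTS.items():
        with open(f"parameters.{env}.json", "w", encoding="utf-8") as f:
            json.dump(render_parameter_file(base, overrides), f, indent=2)
```

Failing loudly on unknown names is a deliberate choice: a silent skip is how dev connection values end up in prod.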
Mapping Data Flows Preview — Mid-Year
The transformation story ADF v1 never had. Code-free drag-and-drop transformations running on Spark. The catalog — Derived Column, Aggregate, Join, Lookup, Conditional Split, Flatten, Pivot, Window — covers the majority of ETL transformation work.
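If the transformation names are new to you, here is what three of them do, sketched as plain Python over a list of rows. This is a conceptual analog only, not Data Flow script and not how Spark executes them; Conditional Split in particular is simplified to one condition plus a default stream. The `orders` data is made up.

```python
from collections import defaultdict


def derived_column(rows, name, expr):
    # Derived Column: add (or overwrite) a computed column on every row.
    return [{**row, name: expr(row)} for row in rows]


def conditional_split(rows, predicate):
    # Conditional Split, simplified to two streams: matching rows and the default stream.
    matched, default = [], []
    for row in rows:
        (matched if predicate(row) else default).append(row)
    return matched, default


def aggregate_sum(rows, group_by, column):
    # Aggregate: group by one key and sum one column.
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_by]] += row[column]
    return dict(totals)


orders = [
    {"region": "EU", "qty": 3, "unit_price": 10.0},
    {"region": "US", "qty": 1, "unit_price": 250.0},
    {"region": "EU", "qty": 2, "unit_price": 400.0},
]
with_total = derived_column(orders, "total", lambda r: r["qty"] * r["unit_price"])
large, regular = conditional_split(with_total, lambda r: r["total"] >= 500)
print(aggregate_sum(regular, "region", "total"))
```

Here the 800.0 EU order goes to the `large` stream, and the aggregate over the remaining rows gives 30.0 for EU and 250.0 for US.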
The cold start latency (3-5 minutes per run on a cold cluster) is a real limitation for high-frequency pipelines. For daily batch workloads, it's manageable. TTL configuration helps. The capability is real even if the execution model has constraints.
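A rough way to reason about whether TTL helps a given schedule is to count how many runs land on a cold cluster. A minimal sketch under a simplified model that I'm assuming, not ADF's actual scheduler: runs don't overlap, a cold run pays `startup_minutes` (4, the midpoint of the 3-5 minute range) before doing work, and the cluster stays warm for `ttl_minutes` after each run ends.

```python
def cold_start_overhead(run_starts, run_minutes, ttl_minutes, startup_minutes=4):
    """Return (cold_starts, minutes_spent_starting) for a schedule of runs.

    Simplified model: runs don't overlap, a cold run pays startup_minutes before
    it begins work, and the cluster stays warm for ttl_minutes after each run ends.
    """
    cold_starts = 0
    warm_until = None  # minute after which the cluster has been torn down
    for start in sorted(run_starts):
        cold = warm_until is None or start > warm_until
        if cold:
            cold_starts += 1
        end = start + (startup_minutes if cold else 0) + run_minutes
        warm_until = end + ttl_minutes
    return cold_starts, cold_starts * startup_minutes


hourly = range(0, 24 * 60, 60)  # 24 runs per day, one per hour
for ttl in (0, 30, 60):
    print(ttl, cold_start_overhead(hourly, run_minutes=10, ttl_minutes=ttl))
```

For a 10-minute job every hour, a 30-minute TTL saves nothing (the cluster expires before the next run, so all 24 runs start cold, 96 minutes of startup a day), while a 60-minute TTL keeps it warm all day and only the first run pays. TTL only helps when it is longer than the gap between runs.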
SAP Connectors — Q3
SAP Table, SAP BW Open Hub, SAP HANA. After four years of "not natively," ADF can read from SAP without a workaround. For a significant portion of enterprise clients, this was the connector gap that blocked ADF adoption for the extraction layer. That blocker is gone.
Snowflake, REST Improvements, File Format Expansion
Snowflake native connector, improved REST pagination and OAuth support, Parquet/ORC/Avro as first-class dataset types. The connector library is now comprehensive for enterprise workloads.
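REST pagination support matters because most APIs return data a page at a time with a link to the next page. In ADF's REST source this is configured declaratively through pagination rules (for example an `AbsoluteUrl` rule pointing at a next-page field in the response). The loop it saves you from writing looks roughly like this sketch. The `value`/`nextLink` field names follow a common Azure-style convention and are assumptions here, as is the in-memory `PAGES` stand-in for a real HTTP API.

```python
from typing import Callable, Iterator, Optional


def paginate(first_url: str, fetch: Callable[[str], dict], next_key: str = "nextLink") -> Iterator[dict]:
    """Yield items from every page, following the next-page link until it is absent."""
    url: Optional[str] = first_url
    seen = set()
    while url:
        # Guard against APIs that keep returning the same next link.
        if url in seen:
            raise RuntimeError(f"pagination loop at {url}")
        seen.add(url)
        page = fetch(url)
        yield from page.get("value", [])
        url = page.get(next_key)


# In-memory stand-in for an HTTP API (hypothetical URLs and payloads).
PAGES = {
    "/items?page=1": {"value": [{"id": 1}, {"id": 2}], "nextLink": "/items?page=2"},
    "/items?page=2": {"value": [{"id": 3}]},
}
print([item["id"] for item in paginate("/items?page=1", PAGES.__getitem__)])
```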
What 2018 Changed Operationally
The combination of git integration and CI/CD transforms how ADF projects operate. Before 2018, ADF deployments were manual — export a JSON, import it in the portal, hope nothing breaks. After 2018, they're automated ARM deployments with environment promotion, trigger management, and audit trails.
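The trigger management part of that workflow is mostly bookkeeping: running triggers get stopped before the ARM deployment so it can update them, triggers removed from the template get deleted afterward (an incremental ARM deployment won't remove them), and the triggers the template marks as started get restarted. A minimal sketch of just the planning logic, with hypothetical trigger names; the actual stop/delete/start calls are left to whatever script or tooling you use.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TriggerPlan:
    stop_before_deploy: List[str]
    delete_after_deploy: List[str]
    start_after_deploy: List[str]


def plan_trigger_changes(deployed: Dict[str, str], in_template: Dict[str, str]) -> TriggerPlan:
    """Plan trigger handling around an ARM deployment.

    deployed:    trigger name -> runtime state ("Started"/"Stopped") in the target factory
    in_template: trigger name -> desired runtime state from the template being deployed
    """
    return TriggerPlan(
        # Stop everything currently running so the deployment can update triggers.
        stop_before_deploy=sorted(n for n, s in deployed.items() if s == "Started"),
        # Triggers removed from the template still exist in the factory until deleted.
        delete_after_deploy=sorted(n for n in deployed if n not in in_template),
        # Restart only the triggers the template says should be running.
        start_after_deploy=sorted(n for n, s in in_template.items() if s == "Started"),
    )


plan = plan_trigger_changes(
    deployed={"DailyLoad": "Started", "OldHourly": "Started", "Adhoc": "Stopped"},
    in_template={"DailyLoad": "Started", "Adhoc": "Stopped", "NewWeekly": "Started"},
)
print(plan)
```

Note that `OldHourly` appears in both the stop and delete lists: it has to be stopped before it can be removed.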
Teams that adopt this pattern are faster, safer, and more collaborative. The PR review workflow for ADF pipeline changes is a genuine improvement over "one person makes all the changes and everyone else hopes they don't break anything."
The monitoring story improved with Azure Monitor and Log Analytics integration, but 2018 delivered only part of it, so I'll cover monitoring in detail in 2019.
What's Still Unfinished
The adf_publish model is confusing. Manually clicking Publish after every PR merge is a footgun waiting to happen. The product needs automated publish-on-merge or a cleaner path from PR merge to deployment artifact. The workaround exists (REST API call in the ADO pipeline), but it shouldn't be a workaround.
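Until publish-on-merge exists, one cheap guard against the forgotten Publish click is a pipeline step that compares commit times on the two branches. This is a hypothetical heuristic of my own, not an ADF feature: it uses plain `git log`, and `origin/master` is an assumed collaboration branch name. Commits that don't change any factory resource (a README edit, say) can trip it, so treat a hit as a warning to check rather than proof.

```python
import subprocess


def last_commit_epoch(ref: str, repo: str = ".") -> int:
    """Committer timestamp (Unix seconds) of the newest commit on ref."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "-1", "--format=%ct", ref],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def publish_is_stale(collab_epoch: int, publish_epoch: int, grace_seconds: int = 0) -> bool:
    """True if the collaboration branch moved after the last adf_publish commit."""
    return collab_epoch > publish_epoch + grace_seconds


if __name__ == "__main__":
    # "origin/master" is an assumed collaboration branch name; adjust to yours.
    collab = last_commit_epoch("origin/master")
    published = last_commit_epoch("origin/adf_publish")
    if publish_is_stale(collab, published):
        raise SystemExit("adf_publish is older than the collaboration branch: was Publish clicked?")
```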
Data Flow cold start latency. 3-5 minutes of Spark cluster spin-up per run makes Data Flows impractical for anything with a short schedule window. TTL helps for warm clusters, but the fundamental tradeoff between Spark startup cost and transformation capability hasn't been resolved.
Monitoring UX. The Azure Monitor integration is good. The native ADF Monitor view is useful for individual run debugging. What's missing is a first-class ADF health dashboard in the portal — fleet-wide error rates, throughput trends, active vs. failed pipelines at a glance. You can build this in Azure Monitor workbooks (and I have), but it shouldn't require a custom build.
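The core of a fleet-wide view is a per-pipeline rollup of run outcomes. A minimal sketch in Python over exported run records, assuming each record carries a pipeline name and a status; the `PipelineName`/`Status` field names are assumptions, so map them to whatever your diagnostic export actually contains. Failure rate here counts only finished runs (Succeeded or Failed), so in-progress runs don't dilute it.

```python
from collections import Counter, defaultdict


def fleet_health(runs):
    """Summarize run records per pipeline.

    runs: iterable of dicts with hypothetical 'PipelineName' and 'Status' keys.
    """
    per_pipeline = defaultdict(Counter)
    for run in runs:
        per_pipeline[run["PipelineName"]][run["Status"]] += 1
    report = {}
    for name, counts in per_pipeline.items():
        finished = counts["Succeeded"] + counts["Failed"]
        report[name] = {
            "runs": sum(counts.values()),
            "failed": counts["Failed"],
            "failure_rate": counts["Failed"] / finished if finished else 0.0,
        }
    return report


sample = [
    {"PipelineName": "CopyAllTables", "Status": "Succeeded"},
    {"PipelineName": "CopyAllTables", "Status": "Failed"},
    {"PipelineName": "CopyAllTables", "Status": "InProgress"},
    {"PipelineName": "NightlyAgg", "Status": "Succeeded"},
]
print(fleet_health(sample))
```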
ARM template verbosity for large factories. A factory with 40+ pipelines generates ARM templates that push toward the 4 MB ARM template size limit. Because the templates are auto-generated, they're hard to manage by hand. This will become a bigger issue as factories grow in 2019.
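It's worth tracking template size in the build before the limit surprises you. A minimal sketch that measures the compact JSON size of `ARMTemplateForFactory.json` and warns past a threshold. Two assumptions to flag: I'm reading "4 MB" as 4 × 1024 × 1024 bytes, and as I understand the ARM documentation the limit applies to the template after expansion (copy loops, variables, parameter values), so the raw file size is only a lower-bound proxy.

```python
import json

# Assumption: "4 MB" read as 4 * 1024 * 1024 bytes.
ARM_TEMPLATE_LIMIT_BYTES = 4 * 1024 * 1024


def template_size_report(template: dict, warn_ratio: float = 0.75):
    """Return (bytes, fraction_of_limit, should_warn) for a template's compact JSON."""
    size = len(json.dumps(template, separators=(",", ":")).encode("utf-8"))
    ratio = size / ARM_TEMPLATE_LIMIT_BYTES
    return size, ratio, ratio >= warn_ratio


if __name__ == "__main__":
    with open("ARMTemplateForFactory.json", encoding="utf-8") as f:
        size, ratio, warn = template_size_report(json.load(f))
    print(f"{size} bytes ({ratio:.0%} of limit){' WARNING' if warn else ''}")
```

Warning at 75% leaves room to split the factory or restructure before a deployment actually fails.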
What 2019 Needs
Data Flows GA with better Spark startup performance (configurable cluster pooling). Improved monitoring UX. A cleaner CI/CD model that doesn't require the team to understand adf_publish branch mechanics. And — something I've been watching — the early signs of ADF converging with the rest of the Microsoft analytics platform (Power BI, the nascent Azure Synapse) deserve attention.
2018 was the year ADF became a real platform. 2019 is when we find out whether it's the platform Microsoft is actually investing in long-term, or whether Synapse is where the roadmap lives.
As always, I'm here to help if you've got questions about what to build now vs. what to wait for. Happy new year.