2014 in Cloud ETL: A Year of Azure Data Factory First Impressions

It's December 2014. ADF has been in preview most of the year. I've built pipelines with it, watched it fail in ways SSIS doesn't, and watched it succeed in ways SSIS can't. Time for an honest year-end assessment before the hype either inflates or deflates what we actually have.

What ADF Delivered in 2014

Core data movement. Copy Activity works reliably. Blob-to-SQL, SQL-to-Blob, on-premises SQL Server to Azure via the Data Management Gateway — these patterns run in production without drama. The retry logic and slice-based state tracking are genuine improvements over SQL Agent scheduling for batch ETL. If you have cloud-native sources and sinks, ADF will move your data dependably.
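To make the slice idea concrete, here is a minimal Python sketch. It is not how ADF implements it internally. It only illustrates the general technique: split an active period into fixed windows, track a status per window, and retry a failed window a bounded number of times. The names build_slices, run_slices, copy_fn, and max_retries are hypothetical, and the status strings are placeholders.

```python
from datetime import datetime, timedelta


def build_slices(start, end, interval):
    """Split [start, end) into consecutive (slice_start, slice_end) windows."""
    slices = []
    cursor = start
    while cursor < end:
        slices.append((cursor, min(cursor + interval, end)))
        cursor += interval
    return slices


def run_slices(slices, copy_fn, max_retries=3):
    """Run copy_fn once per slice, retrying failures; return a status per slice.

    copy_fn is a hypothetical callable standing in for a copy operation.
    """
    state = {}
    for window in slices:
        state[window] = "Failed"
        for _attempt in range(1 + max_retries):
            try:
                copy_fn(*window)
                state[window] = "Ready"
                break
            except Exception:
                continue  # try again until the attempts are exhausted
    return state


def pending(state):
    """Slices that still need a rerun; finished slices are never repeated."""
    return sorted(w for w, status in state.items() if status != "Ready")
```

The point of keeping state per slice is that a rerun only touches the windows returned by pending, which is the property that a plain SQL Agent schedule does not give you for free.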

Managed compute. The promise of no infrastructure held up. I have not had to provision, patch, or debug an ADF server because there isn't one. For workloads I've put on ADF, the operational overhead is genuinely close to zero. The gateway machine for on-premises connectivity is the only infrastructure I manage, and that's a lightweight agent on an existing server.

JSON-deployable pipelines. Everything is a text file. My factory's linked services, datasets, and pipelines live in a git repository. I deploy with PowerShell. This workflow required building a discipline around it — the portal doesn't enforce it — but once the workflow exists, deployments are repeatable and auditable. That matters.

Azure-native integration. Blob, Azure SQL, Azure SQL Data Warehouse, HDInsight — these work without connector middleware. For greenfield Azure workloads, ADF's native integration eliminates an entire class of connector problems I've dealt with in SSIS.

What ADF Didn't Deliver in 2014

Git integration. This is the biggest gap and it's getting more expensive over time. The portal editor has no version control. No history, no diff, no commit on save. I built a manual workaround (git repo + PowerShell deployment), but it requires discipline that the tool doesn't enforce. Teams that don't establish this discipline early end up with factory state that doesn't match what's in source control, or with no source control at all. A tool whose artifacts are all text should integrate with git natively. This is not a nice-to-have; it's table stakes for production ETL development.
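One way to catch that mismatch is a drift check between the definitions in the repository and the definitions exported from the factory. The Python sketch below is a rough illustration under stated assumptions: both sides are already loaded as maps of name to JSON definition (how you export the deployed side is outside the sketch), and find_drift and normalize are hypothetical helper names, not part of any ADF tooling.

```python
import json


def normalize(definition):
    """Serialize a JSON-like dict with sorted keys so formatting differences vanish."""
    return json.dumps(definition, sort_keys=True, separators=(",", ":"))


def find_drift(repo_defs, deployed_defs):
    """Compare two {name: definition} maps and report how they differ.

    repo_defs is what source control holds; deployed_defs is what was
    exported from the factory. Both are assumed to be plain dicts.
    """
    repo_names, deployed_names = set(repo_defs), set(deployed_defs)
    return {
        "missing_from_factory": sorted(repo_names - deployed_names),
        "missing_from_repo": sorted(deployed_names - repo_names),
        "changed": sorted(
            name
            for name in repo_names & deployed_names
            if normalize(repo_defs[name]) != normalize(deployed_defs[name])
        ),
    }
```

Anything that shows up under missing_from_repo or changed is a sign that someone edited the factory in the portal without going through source control.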

Transformation depth. ADF's transformation story in 2014 is: Copy Activity lands data, a stored procedure transforms it. There is no transformation engine. No Derived Column, no Lookup join, no Conditional Split. For pure data movement workloads this is fine. For any workload with non-trivial in-flight data transformation, you're writing T-SQL and running stored procedures — which works, but it means ADF is an orchestrator, not an ETL platform.
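The land-then-transform pattern can be sketched in a few lines. In the Python example below, an in-memory SQLite database stands in for the target database, and a single SQL statement stands in for the stored procedure. That is an assumption made so the example is self-contained; the real pattern would be T-SQL in SQL Server or Azure SQL. All table and column names are hypothetical. The comments mark roughly where each SSIS component's job ends up in set-based SQL.

```python
import sqlite3

# SQLite stands in for the target database; the real pattern would use a
# T-SQL stored procedure. All table and column names are hypothetical.
TRANSFORM_SQL = """
INSERT INTO fact_sales (order_id, region_name, amount_usd, size_band)
SELECT s.order_id,
       r.region_name,                         -- Lookup: join to a reference table
       ROUND(s.amount_cents / 100.0, 2),      -- Derived Column: computed value
       CASE WHEN s.amount_cents >= 100000
            THEN 'large' ELSE 'standard' END  -- Conditional Split: branch as a label
FROM staging_sales AS s
JOIN dim_region AS r ON r.region_code = s.region_code;
"""


def land_then_transform(rows):
    """Land raw rows in staging (the copy step), then transform with set-based SQL."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE staging_sales (order_id INTEGER, region_code TEXT, amount_cents INTEGER);
        CREATE TABLE dim_region (region_code TEXT PRIMARY KEY, region_name TEXT);
        CREATE TABLE fact_sales (order_id INTEGER, region_name TEXT, amount_usd REAL, size_band TEXT);
        INSERT INTO dim_region VALUES ('E', 'East'), ('W', 'West');
    """)
    conn.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", rows)
    conn.execute(TRANSFORM_SQL)
    result = conn.execute("SELECT * FROM fact_sales ORDER BY order_id").fetchall()
    conn.close()
    return result
```

Note that the inner join silently drops rows whose region code has no match, which is the kind of behavior an SSIS Lookup would surface as an explicit no-match output. In the stored procedure version you have to handle that case yourself.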

Connector breadth. Azure Blob, Azure SQL, Azure SQL Data Warehouse, SQL Server via DMG, Oracle via DMG, MySQL via DMG. That is the complete connector list. No FTP, no SFTP, no Salesforce, no SAP, no REST/HTTP sources, no ADLS. Every enterprise client I talk to has at least one source system that isn't on this list. The custom activity workaround exists but carries real development and maintenance overhead.

Monitoring depth. The ADF monitoring view shows slice status — green or red, start time, end time, error message. That's it. SSIS's SSISDB gives you execution reports with row counts per component, execution duration per task, warnings, and a full event log you can query. ADF gives you a status icon and a truncated error message. Diagnosing why a pipeline is slow or what happened during a failed run requires digging into gateway machine logs and using Azure's limited alerting tools.

Parameterization. You cannot pass parameters to an ADF pipeline at trigger time. The pipeline's active period is fixed in the JSON. Building dynamic pipelines — "run this pipeline for this date range on demand" — requires either editing the pipeline JSON and redeploying or using workarounds that involve shared state in SQL tables. This is a significant gap for operational flexibility.
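The edit-and-redeploy workaround can at least be scripted. The Python sketch below produces a copy of a pipeline definition with a new active period, leaving the source-controlled original untouched. It assumes, for illustration only, that the period is stored as ISO-8601 strings under properties.start and properties.end; check that layout against your own pipeline JSON. The function name with_active_period is hypothetical.

```python
import copy
from datetime import datetime


def with_active_period(pipeline, start, end):
    """Return a copy of a pipeline definition with a new active period.

    Assumes (hypothetically) that the period is stored as ISO-8601 strings
    under pipeline["properties"]["start"] and ["end"].
    """
    if end <= start:
        raise ValueError("end must be after start")
    updated = copy.deepcopy(pipeline)  # leave the original definition untouched
    props = updated.setdefault("properties", {})
    props["start"] = start.strftime("%Y-%m-%dT%H:%M:%SZ")
    props["end"] = end.strftime("%Y-%m-%dT%H:%M:%SZ")
    return updated
```

The returned dict would then be written out as JSON and pushed through the same PowerShell deployment used for everything else, which keeps the on-demand rerun inside the existing workflow rather than in the portal.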

The 80% Pattern

Here's what I've noticed about how Microsoft shipped ADF in 2014: they delivered the useful core and left the last 20% for customers to solve. The data movement works. The scheduling works. The JSON-based deployment model works. The gaps — git, parameterization, monitoring, transformation depth — are real but workable with discipline and supplemental tooling.

This is a recognizable pattern. SSIS shipped without a decent deployment model for years (fixed in SQL Server 2012 with the project deployment model and the SSIS catalog). SQL Server Reporting Services shipped without a decent subscription model for years. Microsoft tends to get the hard architectural decisions right and then leave the operational features, the things that matter at scale with teams, to subsequent releases.

I don't say this as a criticism of ADF specifically. I say it because knowing the pattern helps you plan: build the discipline and tooling to fill the 20% gap now, and trust that Microsoft will eventually ship a native solution.

What 2015 Needs to Bring

In priority order: git integration (please), expanded connectors (ADLS, FTP/SFTP, Salesforce), better monitoring (row counts, in-progress visibility, one-click rerun), and parameterization at trigger time. The first and last of those are the ones I feel most acutely in day-to-day work.

ADF is production-worthy for cloud-native Azure workloads with the right operational discipline. It is not a replacement for SSIS in 2014. It is a complementary tool for a specific problem space, and within that space it performs well.

I'll be back in January with a production report — six months of real ADF pipelines, what held up and what didn't. If you're wrapping up your 2014 ADF evaluation, I'm here to help.

Shannon Lowder