I spend a fair amount of time in these posts pointing out what ADF is missing. That's appropriate — gaps matter when you're deciding whether to use a tool in production. But ADF also makes decisions that SSIS was architecturally never going to make, and those decisions matter too. This post is the other side of the ledger.
Infrastructure Elimination
Let me be concrete about what "no server to manage" actually means in practice. With SSIS in production, you have: an Integration Services server (Windows Server, SQL Server license, SSISDB, maintenance plan, backup job, Windows patching schedule), possibly a separate SQL Agent server for scheduling, a monitoring story (custom SSISDB queries or a third-party tool), and a DR plan for what happens when the SSIS server is unavailable.
With ADF, you have none of that. ADF runs in Azure's managed compute fabric. High availability, scaling, patching, and DR are Microsoft's problem. You pay per activity run and per data movement volume. The infrastructure burden is genuinely zero: not "reduced," zero. You still monitor pipelines and deal with failed runs, but there is no server to patch, back up, or fail over. For a consultant building pipelines for a client who doesn't have an operations team, this changes the entire engagement model.
SSIS was never going to make this decision because it's a server product. It requires SQL Server. It requires Windows Server. Infrastructure elimination was structurally impossible in that architecture.
Native Azure Service Integration
When SSIS connects to Azure Blob Storage, it goes through the Azure Storage SDK wrapped in a connector component. The authentication, retry logic, and performance characteristics are only as good as the connector implementation. When something breaks, you're debugging a third-party connector.
When ADF connects to Azure Blob Storage, it is a first-class Azure-to-Azure operation. Authentication flows through the Azure fabric. Retry and throttling handling is built into the service. Cross-service network connectivity is within Azure's backbone, not traversing the public internet. The integration is native, not mediated.
This matters most for Azure Data Lake, Azure SQL Data Warehouse, and HDInsight. ADF's HDInsight Activity provisions an on-demand cluster, runs the Hive or MapReduce job, and tears it down when done — all coordinated through Azure's resource management layer. Doing this from SSIS would require custom scripting to provision, trigger, poll, and deprovision the cluster. ADF makes it a first-class pipeline step.
Infrastructure as Code from Day One
Every ADF object — linked service, dataset, pipeline — is a JSON document. The entire factory is a collection of JSON files. This is not a feature that was bolted on; it is the native representation. When you create a linked service in the portal, you are creating a JSON document. When you deploy via PowerShell, you are deploying JSON documents.
SSIS packages are XML, but the authoring experience treats them as binary — the designer abstracts the XML. The JSON-first approach in ADF means templates, parameterization, and scripted deployment are natural. You can diff two pipeline versions. You can template a linked service for multiple environments. You can generate pipeline JSON programmatically. These are hard in SSIS; they're the default in ADF.
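To make the templating and diffing point concrete, here is a minimal Python sketch. It renders one linked-service document per environment and diffs two of them as text. The environment names, storage account names, and the exact field layout of the JSON are assumptions for illustration, not a verified ADF schema; check the shape against a linked service exported from your own factory before relying on it.

```python
import difflib
import json
from typing import Dict, List


def render_linked_service(env: str, account: str) -> Dict:
    """Build a linked-service document for one environment.

    The field names below are an illustrative shape, not a verified schema.
    """
    return {
        "name": f"StorageLinkedService-{env}",
        "properties": {
            "type": "AzureStorageLinkedService",
            "connectionString": (
                f"DefaultEndpointsProtocol=https;AccountName={account};"
                "AccountKey=<secret>"
            ),
        },
    }


def to_json(doc: Dict) -> str:
    # Sorted keys and fixed indentation give stable text, so a diff
    # shows only real changes and not key-ordering noise.
    return json.dumps(doc, indent=2, sort_keys=True)


def diff_json(old: Dict, new: Dict) -> List[str]:
    """Return a unified diff between two JSON documents."""
    return list(
        difflib.unified_diff(
            to_json(old).splitlines(),
            to_json(new).splitlines(),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )


# Hypothetical environments and storage account names.
ENVIRONMENTS = {"dev": "contosodevstore", "prod": "contosoprodstore"}

linked_services = {
    env: render_linked_service(env, account)
    for env, account in ENVIRONMENTS.items()
}

if __name__ == "__main__":
    print("\n".join(diff_json(linked_services["dev"], linked_services["prod"])))
```

The same pattern extends to datasets and pipelines: because the artifact is plain JSON, ordinary text tooling (source control, diff, a templating function) does the work with no designer involved.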
Automatic Backfill and Slice Tracking
This one took me a while to fully appreciate. When an SSIS job scheduled via SQL Agent misses a run — server is down, SQL Agent service is stopped, the job fails — the missed execution is gone. SQL Agent doesn't know or care that data for that time window wasn't processed. You find out when someone notices stale data in a report.
ADF's slice model changes this fundamentally. Every pipeline has a defined active period divided into slices by the output dataset's frequency. Each slice has state: Waiting, Ready, In Progress, Succeeded, Failed. If a slice fails or is skipped because the pipeline was paused, that slice remains in its non-Succeeded state. When the pipeline resumes, ADF processes outstanding slices automatically — oldest first. Your data is backfilled without manual intervention.
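The difference between stateful slices and a stateless scheduler is easy to see in a toy model. The Python sketch below is not how ADF is implemented; it only mirrors the behavior described above, using the state names from this post. The dates and the choice of which slices failed are made up for the example.

```python
from datetime import datetime, timedelta
from enum import Enum


class SliceState(Enum):
    # State names follow the list in the text above.
    WAITING = "Waiting"
    READY = "Ready"
    IN_PROGRESS = "In Progress"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


def build_slices(start, end, frequency):
    """Divide the active period [start, end) into slice start times."""
    slices = []
    current = start
    while current < end:
        slices.append(current)
        current += frequency
    return slices


def outstanding_slices(states, now, frequency):
    """Slices whose window has closed but which never succeeded, oldest first."""
    due = [
        slice_start
        for slice_start, state in states.items()
        if slice_start + frequency <= now and state is not SliceState.SUCCEEDED
    ]
    return sorted(due)


# Hypothetical one-week active period with daily slices.
DAY = timedelta(days=1)
states = {
    s: SliceState.WAITING
    for s in build_slices(datetime(2014, 12, 1), datetime(2014, 12, 8), DAY)
}
states[datetime(2014, 12, 1)] = SliceState.SUCCEEDED
states[datetime(2014, 12, 2)] = SliceState.FAILED      # "we missed Tuesday"
states[datetime(2014, 12, 3)] = SliceState.SUCCEEDED
# Dec 4 stays WAITING: the pipeline was paused that day.

backlog = outstanding_slices(states, now=datetime(2014, 12, 5), frequency=DAY)

if __name__ == "__main__":
    for slice_start in backlog:
        print("backfill", slice_start.date())
```

Here the backlog is December 2 and December 4, in that order. A SQL Agent schedule has no equivalent of the `states` dictionary, which is exactly why a missed run leaves no trace.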
For data pipelines where gaps are unacceptable — daily financial summaries, compliance reporting, any process where "we missed Tuesday" has downstream consequences — this is a genuinely better model than time-based scheduling with no state tracking.
Pay-Per-Run Economics
Fixed infrastructure cost versus variable consumption cost is a real tradeoff, and ADF wins for certain workload profiles. Consider a pipeline that runs once a day, processes variable data volumes (heavy at month-end, light otherwise), and has periods of low business activity (holidays, fiscal year-end freeze). An SSIS server runs continuously whether the workload is heavy or zero. ADF charges only for what runs.
The math shifts when workloads are high-frequency or continuous — a pipeline running every five minutes all day every day may cost more on ADF's per-run pricing than a fixed server. Do the math for your specific workload. But for the common pattern of scheduled batch pipelines with variable volume and occasional quiet periods, ADF's economics are genuinely better.
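"Do the math" can be as simple as the Python sketch below. Both prices are placeholders I made up so the arithmetic is visible; they are not ADF or SQL Server list prices, and the model ignores data movement charges to keep the comparison to one variable. Substitute your own numbers.

```python
def monthly_adf_cost(runs_per_day, cost_per_run, days=30):
    """Variable cost: you pay only for runs that actually happen."""
    return runs_per_day * days * cost_per_run


def break_even_runs_per_day(server_monthly_cost, cost_per_run, days=30):
    """Runs per day at which per-run pricing equals the fixed server cost."""
    return server_monthly_cost / (cost_per_run * days)


# Hypothetical prices, for illustration only.
SERVER_MONTHLY_COST = 900.0   # VM, licensing, and patching time for an SSIS box
COST_PER_RUN = 0.25           # blended cost of one pipeline run

daily_batch = monthly_adf_cost(runs_per_day=1, cost_per_run=COST_PER_RUN)
every_five_minutes = monthly_adf_cost(runs_per_day=24 * 12, cost_per_run=COST_PER_RUN)
break_even = break_even_runs_per_day(SERVER_MONTHLY_COST, COST_PER_RUN)

if __name__ == "__main__":
    print(f"daily batch:        {daily_batch:.2f} per month")
    print(f"every five minutes: {every_five_minutes:.2f} per month")
    print(f"break-even:         {break_even:.0f} runs per day")
```

With these placeholder figures the daily batch costs 7.50 a month, the five-minute pipeline costs 2,160.00, and the crossover sits at 120 runs per day. The specific numbers mean nothing; the shape of the curve is the point.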
The Honest Summary
ADF isn't better than SSIS across the board. On transformation depth, debugging, local development, and connector maturity, SSIS still wins. But on infrastructure elimination, native Azure integration, infrastructure-as-code, automatic backfill, and pay-per-run economics, ADF makes decisions that SSIS's architecture was never going to make. These aren't features Microsoft simply neglected to add to SSIS; they're decisions that required building a new tool from scratch.
Understanding what each tool genuinely does better is how you make the right choice for each workload. Next post is the 2014 year-end retrospective. If you're still working through the ADF decision for a project, I'm here to help.