Azure Data Factory: When SSIS Moves to the Cloud

Azure Data Factory version 1 launched in preview in 2014 and I started using it almost immediately. The pitch was familiar: a pipeline orchestration service for moving and transforming data, running as a managed Azure service rather than a server you operated. For someone who'd spent two years managing SSIS infrastructure, the appeal was obvious. The reality was more nuanced.

What ADF Was in 2014

ADF v1 was an orchestration and movement service, not an inline transformation engine. The primary use case: move data between supported stores — SQL Server to Azure Blob Storage, SQL Azure to Azure SQL Data Warehouse — on a schedule, with JSON-defined pipelines. The three core concepts:

  • Linked Services — connection definitions to data stores (what SSIS called connection managers)
  • Datasets — descriptions of the data structure within a linked service
  • Pipelines — orchestration logic: copy activities and stored procedure calls, scheduled by time windows

The JSON Pipeline Model

Unlike SSIS packages (binary XML you couldn't meaningfully diff), ADF pipelines were first-class JSON. You could version them, diff them, review them in pull requests — a meaningful improvement in maintainability.

{
  "name": "CopyCustomersPipeline",
  "properties": {
    "activities": [{
      "type": "Copy",
      "name": "CopyCustomersToStaging",
      "inputs": [{ "name": "SourceCustomersDataset" }],
      "outputs": [{ "name": "StagingCustomersDataset" }],
      "typeProperties": {
        "source": {
          "type": "SqlSource",
          "sqlReaderQuery": "SELECT * FROM dbo.Customers WHERE ModifiedAt > $$Text.Format('{0:yyyy-MM-dd}', SliceStart)"
        },
        "sink": {
          "type": "SqlSink",
          "writeBatchSize": 10000
        }
      }
    }],
    "start": "2014-01-01T00:00:00Z",
    "end": "2099-12-31T00:00:00Z"
  }
}

The slice model was ADF v1's time-windowing approach: pipelines processed data in time slices (hourly, daily), and each slice had its own execution status. Failed slices could be rerun independently — natural support for incremental loads.

Where ADF v1 Was Weaker Than SSIS

Monitoring was primitive. The portal showed a timeline of pipeline runs. No row counts, no transformation diagnostics, no detailed execution logs comparable to SSIS execution reports. Debugging failures often meant reading raw logs from Azure Blob Storage.

Complex transformations required external compute. SSIS had dozens of built-in transformation components. ADF v1 had Copy, Stored Procedure, and HDInsight activities. Inline transformations went into stored procedures — which was workable but added database dependencies for logic that SSIS would have handled in the data flow.

On-premises data required a Data Management Gateway. A Windows service you managed. Not quite fully managed, and a bottleneck when it needed patching.

Where ADF v1 Was Better Than SSIS

No infrastructure to operate. No SSIS server, no Integration Services catalog, no SQL Server Agent jobs to schedule packages. ADF pipelines ran in Azure; Azure managed the compute.

JSON pipelines in source control. Diffable, reviewable, branchable — the same improvement SSDT brought to schema management, ADF brought to pipeline definitions.

Built-in retry and dependency management. SSIS retry required custom scripting. ADF built in configurable retry policies per activity, with slice dependencies expressed declaratively.

The Practical Combination

For the first engagement where I used ADF extensively, the right pattern was: ADF for data movement (extract from sources, land in Azure Blob Storage), stored procedures in Azure SQL for transformations, ADF to orchestrate the stored procedure calls. SSIS stayed on-premises for legacy pipelines not being migrated.

It wasn't SSIS replaced by ADF. It was SSIS for complex in-process transformation, ADF for cloud-native movement and orchestration. Both had roles. As always, I'm here to help.

Read more