ADF Copy Activity: Moving Data Without Installing Anything

Copy Activity is ADF's engine. It is the one activity you will use in nearly every pipeline, and it has more knobs than the documentation makes obvious. I've spent enough time with it now to give you the full picture — what it can do, how to tune it, and where it runs out of road.

The Full Copy Activity Structure

Here's a working example moving data from Azure Blob (CSV) to Azure SQL Database, with column mapping:

{
  "name": "CopyBlobToSQL",
  "type": "Copy",
  "inputs": [{ "name": "BlobInputDataset" }],
  "outputs": [{ "name": "SQLOutputDataset" }],
  "typeProperties": {
    "source": {
      "type": "BlobSource",
      "skipHeaderLineCount": 1,
      "treatEmptyAsNull": true,
      "recursive": false
    },
    "sink": {
      "type": "SqlSink",
      "writeBatchSize": 10000,
      "writeBatchTimeout": "01:00:00",
      "preCopyScript": "TRUNCATE TABLE dbo.StagingTable",
      "sqlWriterStoredProcedureName": "usp_UpsertSalesData",
      "sqlWriterTableType": "SalesDataTableType"
    },
    "translator": {
      "type": "TabularTranslator",
      "columnMappings": "SourceCol1:TargetCol1, SourceCol2:TargetCol3, SourceCol4:TargetCol2"
    },
    "parallelCopies": 4,
    "cloudDataMovementUnits": 8
  },
  "policy": {
    "concurrency": 1,
    "retry": 3,
    "timeout": "02:00:00",
    "retryIntervalInSeconds": 30
  }
}

Source Options: BlobSource

skipHeaderLineCount: Skip N lines at the top of each file. Use 1 for CSV files with a header row. If your dataset has firstRowAsHeader: true, you don't need this — the dataset handles it. Mixing both causes problems; pick one.

treatEmptyAsNull: Convert empty strings to NULL on copy. Critical for numeric and date columns where an empty string would fail type conversion to the target column type.

recursive: Whether to process files in subdirectories when the folder path resolves to a directory. Default is false — files only at the specified path level.

What BlobSource cannot do: filter rows. If you only want rows where StatusCode = 'ACTIVE', you cannot express that in BlobSource. The blob lands whole. Row filtering happens downstream in SQL. This is a meaningful gap for large files where you want to push down predicates.

Sink Options: SqlSink

writeBatchSize: Rows per insert batch. Default is 10,000. Increase for large loads with simple schemas; decrease if you're hitting transaction log pressure. Experiment — the optimal value depends on row width and target hardware.

preCopyScript: A SQL statement executed once before data movement starts. Use it to truncate a staging table or delete stale data. This runs exactly once per slice, not once per batch.

sqlWriterStoredProcedureName and sqlWriterTableType: Instead of bulk inserting directly to the target table, ADF calls your stored procedure with a table-valued parameter. Your procedure handles the upsert logic.

CREATE TYPE dbo.SalesDataTableType AS TABLE (
  SaleID INT, CustomerID INT, Amount DECIMAL(18,2), SaleDate DATE
);

CREATE PROCEDURE dbo.usp_UpsertSalesData
  @SalesData dbo.SalesDataTableType READONLY
AS
BEGIN
  MERGE dbo.Sales AS target
  USING @SalesData AS source ON target.SaleID = source.SaleID
  WHEN MATCHED THEN UPDATE SET target.Amount = source.Amount, target.SaleDate = source.SaleDate
  WHEN NOT MATCHED THEN INSERT (SaleID, CustomerID, Amount, SaleDate)
    VALUES (source.SaleID, source.CustomerID, source.Amount, source.SaleDate);
END

Column Mapping

The translator block handles column name mapping when source and sink columns don't match. The mapping syntax is SourceName:TargetName comma-separated. Columns not listed in the mapping are excluded from the copy — use this for filtering source columns you don't need in the sink.

If source and sink columns have identical names and you just want all of them, omit the translator block entirely. ADF maps by name automatically.

Performance Tuning

parallelCopies: Number of parallel threads for data movement. For Blob-to-SQL: 4-8 is a reasonable starting point. For SQL-to-SQL: be careful — parallel reads on a source table with a range split can cause blocking. Test against your source before increasing this in production.

cloudDataMovementUnits: The compute allocated to the data movement service. Higher values mean more throughput for large transfers. Each unit has a cost — don't set this to 32 for a pipeline moving 1 MB/day. Match it to your actual data volume and latency requirements.

The Policy Block

concurrency: How many slices can run simultaneously. Set to 1 unless you've confirmed the source and sink can handle parallel slice execution without conflicts.

retry: Number of retry attempts on failure. 3 is standard. ADF waits retryIntervalInSeconds between attempts.

timeout: Maximum duration for a single activity run. If your copy job hasn't completed in this window, ADF marks it failed and retries (if retry count allows). Set this generously enough to accommodate off-peak runs but tight enough to detect genuine hangs.

What Copy Activity Cannot Do

No row filtering from Blob sources. No joins. No aggregation. No incremental load built in — there is no high-water mark mechanism. No derived columns or data type transformation beyond simple type conversion.

The standard pattern for anything requiring transformation: Copy to a staging table, then a Stored Procedure Activity runs the transform. This works reliably. It means your transformation logic is T-SQL, not a visual data flow — which is fine if your team is SQL-fluent, and a gap if you're used to SSIS's visual transform model.

Copy Activity is genuinely powerful for what it does. Understand its limits upfront and design your pipelines accordingly, and it won't surprise you. I'm here to help if you're hitting a Copy Activity issue that the documentation doesn't cover.

Read more