Trying to Build a Metadata-Driven ADF Framework in v2: Will Parameterization Save Us?

I've been trying to build a proper metadata-driven ingest framework in ADF since 2015. The 2015 attempt with v1 was a partial success: it worked, it was clever, and it had rough edges that required explanation every time someone new touched it. The core problem was the absence of parameterization — without it, the framework had to use stored procedure loops and dynamic SQL to simulate behavior that the platform should have provided directly.

ADF v2 has parameterization. I've spent the last few weeks building the metadata-driven framework I've been waiting two years to build. Here's what I found.

The Framework Design

The concept is straightforward: a config table in Azure SQL Database contains one row per source table. Each row specifies the source server, source database, source table name, destination table name, watermark column name, and last watermark value. A master pipeline reads the config table and executes a generic ingest pipeline once per row, passing the config values as parameters. The generic pipeline handles the actual data movement.

In v1, this required a stored procedure to loop over the config table rows, with each iteration making an ADF REST API call to trigger the generic pipeline. Fragile, underdocumented, and dependent on the pipeline trigger API behaving consistently.

In v2, parameterized pipelines make the generic ingest pipeline straightforward:

{
  "name": "GenericIngestPipeline",
  "parameters": {
    "SourceServer":    { "type": "String" },
    "SourceDatabase":  { "type": "String" },
    "SourceTable":     { "type": "String" },
    "TargetTable":     { "type": "String" },
    "WatermarkColumn": { "type": "String" },
    "LastWatermark":   { "type": "String" }
  },
  "activities": [
    {
      "name": "CopyIncrementalData",
      "type": "Copy",
      "typeProperties": {
        "source": {
          "type": "SqlSource",
          "sqlReaderQuery": {
            "value": "@concat('SELECT * FROM [', pipeline().parameters.SourceDatabase, '].[dbo].[', pipeline().parameters.SourceTable, '] WHERE [', pipeline().parameters.WatermarkColumn, '] > ''', pipeline().parameters.LastWatermark, '''')",
            "type": "Expression"
          }
        }
      }
    },
    {
      "name": "UpdateWatermark",
      "type": "SqlServerStoredProcedure",
      "dependsOn": [{ "activity": "CopyIncrementalData", "dependencyConditions": ["Succeeded"] }],
      "typeProperties": {
        "storedProcedureName": "usp_UpdateWatermark",
        "storedProcedureParameters": {
          "TableName":      { "value": "@pipeline().parameters.SourceTable" },
          "WatermarkValue": { "value": "@activity('CopyIncrementalData').output.dataWritten" }
        }
      }
    }
  ]
}

This is clean. One pipeline definition handles every source table. The SQL query is built dynamically from parameters. The watermark update passes the activity output forward. This is what I wanted to build in 2015.

The Part Where I Hit a Wall

Now I need to drive this generic pipeline from the config table. The flow: read all rows from the config table, iterate over them, execute the generic ingest pipeline for each row with the row's values as parameters.

In v2 preview, this requires the ForEach activity: iterate over an array of items, execute inner activities for each item. The inner activities would be Execute Pipeline calls with the config row values mapped to pipeline parameters.

ForEach is not in the v2 preview yet.

There's a Lookup Activity — it reads from a dataset and returns rows. You can use it to fetch the config table contents. But without ForEach, the Lookup output just sits there. You can reference the first row with @activity('LookupConfig').output.firstRow. You can't iterate over all rows.

The Workaround, Again

So the 2016 metadata-driven ADF v2 framework looks like this: a Lookup Activity reads the config table. A stored procedure activity uses the output to queue work into an Azure SQL table. An Azure Function or a Logic App polls that queue and makes ADF REST API calls to trigger the generic ingest pipeline per row. The pipeline reports completion back to the queue. The master pipeline waits via polling.

It works. It's still fragile. It still requires explanation. The core problem is the same as 2015: the platform doesn't have the looping construct the pattern requires. Parameterization without ForEach is like having a function definition without a way to call it in a loop. You're halfway there.

The Expression Language Is Good

I want to be fair to what v2 does well. The expression language for parameterized pipelines is genuinely useful:

@concat(pipeline().parameters.SourceTable, '_', formatDateTime(utcnow(), 'yyyyMMdd'))

@addDays(trigger().scheduledTime, -1)

@if(equals(pipeline().parameters.LoadType, 'Full'),
    'SELECT * FROM SourceTable',
    concat('SELECT * FROM SourceTable WHERE ModifiedDate > ''', pipeline().parameters.LastWatermark, ''''))

String manipulation, date arithmetic, conditional expressions, activity output references — the expression language covers the patterns you hit in real pipelines. It's well-designed.

Where This Leaves Us

ADF v2 parameterization is real progress. The generic pipeline pattern is now clean and maintainable without workarounds. When ForEach ships — Microsoft says it's coming before GA — the metadata-driven framework becomes fully first-class. Until then, we're still bridging the last mile with stored procedures and API calls.

My honest assessment: v2 in late 2016 is about 70% of what I need. The parameterization half of the framework works. The iteration half doesn't. The trajectory is right. I'll keep building.

If you're experimenting with v2 parameterization and want to compare notes on what works and what still needs ForEach, I'm here to help.

Read more