Five Years of ADF: What the Metadata-Driven Framework Actually Looks Like Now

I started trying to build a metadata-driven ADF framework in 2015, with ADF v1. It was miserable. V1 didn't have ForEach. It didn't have parameters. The closest you could get was a .NET custom activity that read a config table and emitted dynamic pipeline definitions — which worked, sort of, and required maintaining a custom activity alongside your pipeline definitions. I tried it. I don't recommend it.

It's 2019. ADF v2 has parameterized pipelines, ForEach Activity, Lookup Activity, and Tumbling Window Triggers. The metadata-driven framework that wasn't viable in 2015 is now straightforward to build. Here's what it actually looks like.

The Architecture

Config Tables

-- Source table configuration
CREATE TABLE etl.IngestConfig (
    ConfigId         INT IDENTITY(1,1) PRIMARY KEY,
    SourceSystem     NVARCHAR(128) NOT NULL,  -- 'SQLSERVER', 'SAP', 'SALESFORCE'
    SourceSchema     NVARCHAR(128) NOT NULL,
    SourceTable      NVARCHAR(128) NOT NULL,
    TargetSchema     NVARCHAR(128) NOT NULL,
    TargetTable      NVARCHAR(128) NOT NULL,
    WatermarkColumn  NVARCHAR(128) NULL,      -- NULL = full load
    WatermarkType    NVARCHAR(32) NULL,       -- 'DATETIME', 'INTEGER'
    LoadFrequency    NVARCHAR(32) NOT NULL,   -- 'DAILY', 'HOURLY', 'REALTIME'
    IsActive         BIT NOT NULL DEFAULT 1,
    LoadOrder        INT NOT NULL DEFAULT 100,
    LastModifiedBy   NVARCHAR(128) NOT NULL,
    LastModifiedDate DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);

-- Watermark store
CREATE TABLE etl.WatermarkStore (
    SourceTable      NVARCHAR(256) PRIMARY KEY,
    LastLoadDate     DATETIME2 NULL,
    LastRunId        NVARCHAR(36) NULL,
    RowsLoaded       BIGINT NULL,
    LastUpdated      DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);

-- Execution log
CREATE TABLE etl.ExecutionLog (
    LogId         BIGINT IDENTITY(1,1) PRIMARY KEY,
    PipelineName  NVARCHAR(256) NOT NULL,
    RunId         NVARCHAR(36) NOT NULL,
    SourceTable   NVARCHAR(256) NULL,
    StartTime     DATETIME2 NOT NULL,
    EndTime       DATETIME2 NULL,
    RowsRead      BIGINT NULL,
    RowsWritten   BIGINT NULL,
    Status        NVARCHAR(50) NOT NULL,  -- 'Running', 'Succeeded', 'Failed'
    ErrorMessage  NVARCHAR(MAX) NULL
);

The Pipeline Structure

MasterIngest (outer pipeline, no parameters):

  1. Lookup Activity reads IngestConfig (WHERE IsActive = 1, ORDER BY LoadOrder)
  2. ForEach Activity iterates over results (batchCount=5 for parallel execution)
  3. Inside ForEach: Execute Pipeline activity calls IngestTable with item parameters

IngestTable (inner pipeline, parameters: SourceSystem, SourceSchema, SourceTable, TargetSchema, TargetTable, WatermarkColumn, WatermarkType, RunId):

  1. If Condition: WatermarkColumn is not null → incremental path; else → full load path
  2. Incremental path: Stored Procedure Activity calls GetAndLockWatermark, gets current + new watermark
  3. Copy Activity: parameterized source query using watermark boundaries, parameterized sink
  4. Stored Procedure Activity: writes to ExecutionLog
  5. On failure: Stored Procedure Activity writes failure to ExecutionLog

The BIML Comparison

In my SSIS years, I used BIML to generate parameterized packages. BIML is a templating language that generates SSIS XML: you write a BIML template, define your sources in a config table, and BIML generates a distinct SSIS package per source at compile time.

The BIML approach gives you:

  • Isolated artifacts per source — 50 tables means 50 packages, each separately deployable
  • Compile-time errors — BIML catches configuration problems before deployment
  • Package-level logging — each package has its own log, independently debuggable
  • Explicit artifact inspection — you can open any generated package and read what it does

The ADF metadata-driven approach gives you:

  • Single pipeline definition — one pipeline drives all sources, config in the table
  • Runtime parameterization — parameters are resolved at execution, not compile time
  • Central config management — add a new source table by inserting a row, no redeploy
  • Unified monitoring — all source ingestions are activities in the same pipeline run

The Real Tradeoff

BIML's compile-time generation gives you isolated, debuggable, deployable artifacts per source. When table X fails, you open package X and debug it independently of packages Y and Z.

ADF's runtime approach gives you a single pipeline definition that's simpler to maintain — adding a source table is a config insert, not a BIML regeneration and redeploy. But when table X fails inside a ForEach iteration, you're looking at a failed activity in a ForEach iteration, and the activity logs show you what happened for that specific item. It's debuggable; it's just a different model.

The practical difference: BIML generates 50 packages that are independently testable in SSIS. ADF generates one pipeline that you test with different config rows. BIML is more work to set up and regenerate; ADF is more work to debug per-source failures because everything runs through the same pipeline path.

Neither is objectively better. They reflect different execution models. Understand the tradeoff before you build.

What the Framework Looks Like Running

A client environment I set this up for last quarter: 127 source tables across three SQL Server instances and one SAP system. The IngestConfig table has 127 rows. The MasterIngest pipeline runs daily at 1 AM. ForEach with batchCount=8 runs 8 tables in parallel. Runtime is 42 minutes. Adding a new source table: insert a row in IngestConfig, restart the trigger. No pipeline redeploy.

That's the metadata-driven framework in production. The investment is one weekend to build the config tables, stored procedures, and pipeline structure. The return is a source table addition workflow that takes 30 seconds instead of 30 minutes.

Trust me on this one: build it properly once, with proper locking on the watermark store and proper error handling in the ExecutionLog, and it runs itself. As always, if you want to talk through the config table design for a specific environment, I'm here to help.

Read more