I started trying to build a metadata-driven ADF framework in 2015, with ADF v1. It was miserable. V1 didn't have ForEach. It didn't have parameters. The closest you could get was a .NET custom activity that read a config table and emitted dynamic pipeline definitions — which worked, sort of, and required maintaining a custom activity alongside your pipeline definitions. I tried it. I don't recommend it.
It's 2019. ADF v2 has parameterized pipelines, ForEach Activity, Lookup Activity, and Tumbling Window Triggers. The metadata-driven framework that wasn't viable in 2015 is now straightforward to build. Here's what it actually looks like.
The Architecture
Config Tables
-- Source table configuration
CREATE TABLE etl.IngestConfig (
ConfigId INT IDENTITY(1,1) PRIMARY KEY,
SourceSystem NVARCHAR(128) NOT NULL, -- 'SQLSERVER', 'SAP', 'SALESFORCE'
SourceSchema NVARCHAR(128) NOT NULL,
SourceTable NVARCHAR(128) NOT NULL,
TargetSchema NVARCHAR(128) NOT NULL,
TargetTable NVARCHAR(128) NOT NULL,
WatermarkColumn NVARCHAR(128) NULL, -- NULL = full load
WatermarkType NVARCHAR(32) NULL, -- 'DATETIME', 'INTEGER'
LoadFrequency NVARCHAR(32) NOT NULL, -- 'DAILY', 'HOURLY', 'REALTIME'
IsActive BIT NOT NULL DEFAULT 1,
LoadOrder INT NOT NULL DEFAULT 100,
LastModifiedBy NVARCHAR(128) NOT NULL,
LastModifiedDate DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);
-- Watermark store
CREATE TABLE etl.WatermarkStore (
SourceTable NVARCHAR(256) PRIMARY KEY,
LastLoadDate DATETIME2 NULL,
LastRunId NVARCHAR(36) NULL,
RowsLoaded BIGINT NULL,
LastUpdated DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);
-- Execution log
CREATE TABLE etl.ExecutionLog (
LogId BIGINT IDENTITY(1,1) PRIMARY KEY,
PipelineName NVARCHAR(256) NOT NULL,
RunId NVARCHAR(36) NOT NULL,
SourceTable NVARCHAR(256) NULL,
StartTime DATETIME2 NOT NULL,
EndTime DATETIME2 NULL,
RowsRead BIGINT NULL,
RowsWritten BIGINT NULL,
Status NVARCHAR(50) NOT NULL, -- 'Running', 'Succeeded', 'Failed'
ErrorMessage NVARCHAR(MAX) NULL
);
The Pipeline Structure
MasterIngest (outer pipeline, no parameters):
- Lookup Activity reads IngestConfig (WHERE IsActive = 1, ORDER BY LoadOrder)
- ForEach Activity iterates over results (batchCount=5 for parallel execution)
- Inside ForEach: Execute Pipeline activity calls IngestTable with item parameters
IngestTable (inner pipeline, parameters: SourceSystem, SourceSchema, SourceTable, TargetSchema, TargetTable, WatermarkColumn, WatermarkType, RunId):
- If Condition: WatermarkColumn is not null → incremental path; else → full load path
- Incremental path: Stored Procedure Activity calls GetAndLockWatermark, gets current + new watermark
- Copy Activity: parameterized source query using watermark boundaries, parameterized sink
- Stored Procedure Activity: writes to ExecutionLog
- On failure: Stored Procedure Activity writes failure to ExecutionLog
The BIML Comparison
In my SSIS years, I used BIML to generate parameterized packages. BIML is a templating language that generates SSIS XML: you write a BIML template, define your sources in a config table, and BIML generates a distinct SSIS package per source at compile time.
The BIML approach gives you:
- Isolated artifacts per source — 50 tables means 50 packages, each separately deployable
- Compile-time errors — BIML catches configuration problems before deployment
- Package-level logging — each package has its own log, independently debuggable
- Explicit artifact inspection — you can open any generated package and read what it does
The ADF metadata-driven approach gives you:
- Single pipeline definition — one pipeline drives all sources, config in the table
- Runtime parameterization — parameters are resolved at execution, not compile time
- Central config management — add a new source table by inserting a row, no redeploy
- Unified monitoring — all source ingestions are activities in the same pipeline run
The Real Tradeoff
BIML's compile-time generation gives you isolated, debuggable, deployable artifacts per source. When table X fails, you open package X and debug it independently of packages Y and Z.
ADF's runtime approach gives you a single pipeline definition that's simpler to maintain — adding a source table is a config insert, not a BIML regeneration and redeploy. But when table X fails inside a ForEach iteration, you're looking at a failed activity in a ForEach iteration, and the activity logs show you what happened for that specific item. It's debuggable; it's just a different model.
The practical difference: BIML generates 50 packages that are independently testable in SSIS. ADF generates one pipeline that you test with different config rows. BIML is more work to set up and regenerate; ADF is more work to debug per-source failures because everything runs through the same pipeline path.
Neither is objectively better. They reflect different execution models. Understand the tradeoff before you build.
What the Framework Looks Like Running
A client environment I set this up for last quarter: 127 source tables across three SQL Server instances and one SAP system. The IngestConfig table has 127 rows. The MasterIngest pipeline runs daily at 1 AM. ForEach with batchCount=8 runs 8 tables in parallel. Runtime is 42 minutes. Adding a new source table: insert a row in IngestConfig, restart the trigger. No pipeline redeploy.
That's the metadata-driven framework in production. The investment is one weekend to build the config tables, stored procedures, and pipeline structure. The return is a source table addition workflow that takes 30 seconds instead of 30 minutes.
Trust me on this one: build it properly once, with proper locking on the watermark store and proper error handling in the ExecutionLog, and it runs itself. As always, if you want to talk through the config table design for a specific environment, I'm here to help.