I've been building parameterized ADF v2 pipelines since the feature landed in preview. Two years of production workloads, multiple clients, dozens of factory deployments. Some patterns have survived. Some haven't. Here are the ones that hold up.
Pattern 1: Generic Ingest Pipeline
The foundational metadata-driven pattern: one parameterized pipeline drives all your table ingestions. A config table defines the sources; a ForEach iterates over the config and calls the inner pipeline with the appropriate parameters.
The config table:
CREATE TABLE etl.IngestConfig (
ConfigId INT IDENTITY(1,1) PRIMARY KEY,
SourceSchema NVARCHAR(128) NOT NULL,
SourceTable NVARCHAR(128) NOT NULL,
TargetSchema NVARCHAR(128) NOT NULL,
TargetTable NVARCHAR(128) NOT NULL,
WatermarkColumn NVARCHAR(128) NULL, -- NULL = full load
IsActive BIT NOT NULL DEFAULT 1,
LoadOrder INT NOT NULL DEFAULT 100
);
The outer pipeline has one Lookup Activity reading from IngestConfig (filtered to IsActive = 1, ordered by LoadOrder), followed by a ForEach Activity over the result set.
Inside the ForEach, a single Execute Pipeline activity calls the inner pipeline with parameters:
{
"SourceSchema": "@item().SourceSchema",
"SourceTable": "@item().SourceTable",
"TargetSchema": "@item().TargetSchema",
"TargetTable": "@item().TargetTable",
"WatermarkColumn": "@item().WatermarkColumn",
"RunId": "@pipeline().RunId"
}
The inner pipeline has a Copy Activity with parameterized source and sink datasets. The source dataset takes SourceSchema and SourceTable as parameters; the sink takes TargetSchema and TargetTable. The Copy Activity's source query uses an If Condition Activity upstream to choose between a watermark query (if WatermarkColumn is not null) and a full load query.
Pattern 2: Environment-Aware Linked Services
All linked service connection strings are parameterized. No environment-specific values are hardcoded in the ADF resource itself. The ARM parameter file for each environment supplies the actual values.
/* Linked Service JSON in git */
{
"name": "AzureSqlLinkedService",
"type": "AzureSqlDatabase",
"typeProperties": {
"connectionString": "@{linkedService().connectionString}"
},
"parameters": {
"connectionString": {
"type": "secureString"
}
}
}
The ARM parameter file for the test environment:
{
"AzureSqlLinkedService_connectionString": {
"value": "Server=sql-test.database.windows.net;Database=SourceDB;Authentication=Active Directory Managed Identity"
}
}
Production parameter file has the production server. Same pipeline definition, environment-specific connections. This is table stakes for any multi-environment ADF deployment.
Pattern 3: Watermark-Based Incremental Load
The stored procedure that manages watermarks correctly under concurrent execution:
CREATE PROCEDURE etl.GetAndLockWatermark
@SourceTable NVARCHAR(256),
@NewWatermark DATETIME2 OUTPUT
AS
BEGIN
SET NOCOUNT ON;
SET XACT_ABORT ON;
BEGIN TRANSACTION;
-- Get current watermark with update lock
DECLARE @CurrentWatermark DATETIME2;
SELECT @CurrentWatermark = LastLoadDate
FROM etl.WatermarkStore WITH (UPDLOCK, ROWLOCK)
WHERE SourceTable = @SourceTable;
-- Set new watermark to current time (captured before query)
SET @NewWatermark = SYSUTCDATETIME();
-- Update watermark
UPDATE etl.WatermarkStore
SET LastLoadDate = @NewWatermark,
LastRunId = NEWID()
WHERE SourceTable = @SourceTable;
COMMIT TRANSACTION;
-- Return old watermark for use in source query
SELECT @CurrentWatermark AS CurrentWatermark,
@NewWatermark AS NewWatermark;
END
The pipeline calls this stored procedure first (via Stored Procedure Activity), captures CurrentWatermark and NewWatermark from the output, then uses them in the Copy Activity source query:
SELECT * FROM @{pipeline().parameters.SourceSchema}.@{pipeline().parameters.SourceTable}
WHERE @{pipeline().parameters.WatermarkColumn} > '@{activity('GetWatermark').output.firstRow.CurrentWatermark}'
AND @{pipeline().parameters.WatermarkColumn} <= '@{activity('GetWatermark').output.firstRow.NewWatermark}'
Pattern 4: Tumbling Window Trigger for Partitioned Loads
For time-partitioned data (daily partitions in a data lake), the Tumbling Window Trigger passes window boundaries into the pipeline as parameters. No custom watermark management needed for the scheduling layer.
{
"WindowStart": {
"value": "@formatDateTime(trigger().outputs.windowStartTime, 'yyyy-MM-dd')",
"type": "Expression"
},
"WindowEnd": {
"value": "@formatDateTime(trigger().outputs.windowEndTime, 'yyyy-MM-dd')",
"type": "Expression"
}
}
The pipeline partitions output to ADLS using the window dates in the path:
@concat(
'processed/sales/year=',
formatDateTime(trigger().outputs.windowStartTime, 'yyyy'),
'/month=',
formatDateTime(trigger().outputs.windowStartTime, 'MM'),
'/day=',
formatDateTime(trigger().outputs.windowStartTime, 'dd'),
'/'
)
Tumbling Window Triggers handle backfill automatically — if the factory is down for 12 hours, the trigger creates 12 pending windows and processes them sequentially on restart. You don't write backfill logic; the trigger handles it.
Anti-Patterns Learned the Hard Way
Don't put business logic in expressions. If it takes more than three function calls to express, it belongs in a stored procedure. I've seen expressions that span 15 lines of nested function calls. They work until they don't, and when they fail, the error message tells you the expression evaluated to an unexpected type but not which sub-expression caused it. Push complex logic to SQL.
Don't nest ForEach loops. ADF does not support nested ForEach activities. The outer ForEach iterates, the inner Execute Pipeline calls an inner pipeline that has its own ForEach. The indirection is necessary and it adds a layer of complexity to follow. If you're reaching for nested ForEach inside a single pipeline, you've found the boundary — introduce the inner pipeline.
Don't over-parameterize early. Start with the parameters you actually need for environment differences (connection strings, storage account names). Add parameters when you encounter a real need for runtime variation, not in anticipation of hypothetical future needs. Every parameter you add is a thing that can be misconfigured in the ARM parameter file. Minimize surface area until you need to expand it.
Don't use ADF activities for computation that belongs in a database. ADF is an orchestrator. The ForEach is a flow control mechanism, not a data processing loop. If you're iterating over 10,000 rows with a ForEach because you want to call a stored procedure for each row, you're doing it wrong — write a set-based stored procedure that processes all 10,000 rows at once and call it once from ADF.
These patterns have survived two years and multiple client environments. They're not clever; they're the opposite of clever. Boring, maintainable, debuggable. That's what production looks like. As always, if you're designing a metadata-driven framework and want a second opinion, I'm here to help.