Generating Azure Data Factory Assets From a Metadata Store
The metadata-driven generation pattern I built for SSIS packages applied almost directly to ADF. The metadata store already described source tables, column mappings, and destination tables. ADF pipelines were JSON. The generator just needed to target a different output format. Here's what was different about generating for ADF versus generating for SSIS.
What Needed to Be Generated
For each source table in the metadata store, the ADF representation required three JSON artifacts:
- Linked Service (one per data source) — the connection definition
- Dataset (one per source table, one per destination table) — the data structure
- Pipeline (one per source table) — the copy activity and its schedule
Linked services were shared across tables from the same source, so the generator deduplicated — one linked service per unique source, not one per table.
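The cardinality rules above can be sketched as a small planning step that lists which artifacts a set of tables needs. This is an illustration, not the original generator: the `DataSource` and `SourceTable` tuples and the `plan_artifacts` function are hypothetical stand-ins for the metadata store's rows, and the names follow the `ls_`, `ds_`, and `pl_copy_` prefixes used by the generator code in this post. The destination's own linked service is left out for brevity.

```python
from collections import namedtuple

# Hypothetical metadata rows; the real store's schema is not shown in this post.
DataSource = namedtuple("DataSource", "name server database")
SourceTable = namedtuple("SourceTable", "schema name data_source")


def plan_artifacts(source_tables):
    """Return the artifact names to generate, grouped by kind."""
    linked_services = {}  # a dict keeps insertion order and drops repeats
    datasets, pipelines = [], []
    for t in source_tables:
        # One linked service per unique source, not one per table.
        linked_services.setdefault(f"ls_{t.data_source.name}", t.data_source)
        base = f"{t.schema}_{t.name}"
        # Two datasets per table: one for the source, one for the destination.
        datasets += [f"ds_{base}_Src", f"ds_{base}_Dest"]
        # One pipeline per table.
        pipelines.append(f"pl_copy_{base}")
    return {
        "linked_services": list(linked_services),
        "datasets": datasets,
        "pipelines": pipelines,
    }


crm = DataSource("crm", "crm-sql", "CrmDb")
erp = DataSource("erp", "erp-sql", "ErpDb")
tables = [
    SourceTable("dbo", "Customer", crm),
    SourceTable("dbo", "Contact", crm),
    SourceTable("dbo", "Invoice", erp),
]
plan = plan_artifacts(tables)
```

With three tables drawn from two sources, the plan holds two linked services, six datasets, and three pipelines.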
The Generator
Because ADF's format was JSON, the ADF generator was simpler than the SSIS generator — no ManagedDTS dependency, no binary format concerns. The generator read from the same metadata store and emitted JSON files:
    import json


    def generate_linked_service(data_source):
        # One linked service per data source: the connection definition.
        return {
            "name": f"ls_{data_source.name}",
            "properties": {
                "type": "AzureSqlDatabase",
                "typeProperties": {
                    "connectionString": f"Data Source={data_source.server};Initial Catalog={data_source.database};..."
                }
            }
        }


    def generate_pipeline(source_table, column_mappings):
        # ADF v1 takes column mappings as one "src:dest,src:dest" string.
        column_mapping_str = ",".join(
            f"{m.source_column}:{m.dest_column}" for m in column_mappings
        )
        return {
            "name": f"pl_copy_{source_table.schema}_{source_table.name}",
            "properties": {
                "activities": [{
                    "type": "Copy",
                    "name": f"Copy_{source_table.schema}_{source_table.name}",
                    "inputs": [{"name": f"ds_{source_table.schema}_{source_table.name}_Src"}],
                    "outputs": [{"name": f"ds_{source_table.schema}_{source_table.name}_Dest"}],
                    "typeProperties": {
                        "source": {"type": "SqlSource"},
                        "sink": {"type": "SqlSink", "writeBatchSize": 10000},
                        "translator": {
                            "type": "TabularTranslator",
                            "columnMappings": column_mapping_str
                        }
                    }
                }],
                # Active period of the pipeline.
                "start": "2014-01-01T00:00:00Z",
                "end": "2099-12-31T00:00:00Z",
                "isPaused": False
            }
        }

Deploying Generated Assets
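ADF v1 deployment required strict ordering: linked services first, then datasets, then pipelines, because each asset type references the one before it. The deploy script uploaded them in that order using the ADF .NET SDK; the `deploy_adf_assets` function in this section expresses the same ordering in Python, with `adf_client` standing in for the SDK client.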
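The deploy function in this section calls a `load_json_dir` helper that the post does not show, and it expects the generator's output to sit in `linked_services`, `datasets`, and `pipelines` subdirectories. A minimal sketch of both sides of that handoff follows. It assumes one `.json` file per asset, named after the asset, and the `write_asset` function is a hypothetical name rather than part of the original generator.

```python
import json
import tempfile
from pathlib import Path


def write_asset(output_dir, kind, asset):
    """Write one generated asset to <output_dir>/<kind>/<name>.json."""
    target = Path(output_dir) / kind
    target.mkdir(parents=True, exist_ok=True)
    path = target / f"{asset['name']}.json"
    path.write_text(json.dumps(asset, indent=2), encoding="utf-8")
    return path


def load_json_dir(directory):
    """Yield each parsed *.json file in a directory, in filename order."""
    for path in sorted(Path(directory).glob("*.json")):
        yield json.loads(path.read_text(encoding="utf-8"))


# Round trip: write two linked services, then read them back.
with tempfile.TemporaryDirectory() as out:
    write_asset(out, "linked_services", {"name": "ls_erp", "properties": {}})
    write_asset(out, "linked_services", {"name": "ls_crm", "properties": {}})
    names = [a["name"] for a in load_json_dir(f"{out}/linked_services")]
```

Sorting by filename keeps the upload order within each asset type stable between runs, which makes deployment logs easier to compare.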
    def deploy_adf_assets(adf_client, resource_group, factory_name, output_dir):
        # load_json_dir is a helper (not shown) that yields each parsed
        # JSON file in a directory.

        # 1. Linked services (no dependencies)
        for ls in load_json_dir(f"{output_dir}/linked_services"):
            adf_client.linked_services.create_or_update(
                resource_group, factory_name, ls["name"], ls["properties"])

        # 2. Datasets (depend on linked services)
        for ds in load_json_dir(f"{output_dir}/datasets"):
            adf_client.datasets.create_or_update(
                resource_group, factory_name, ds["name"], ds["properties"])

        # 3. Pipelines (depend on datasets)
        for pl in load_json_dir(f"{output_dir}/pipelines"):
            adf_client.pipelines.create_or_update(
                resource_group, factory_name, pl["name"], pl["properties"])

The Meta-Point
The SSIS generator targeted DTSX. The ADF generator targeted JSON. Both read from the same metadata store, which was the single source of truth for what the pipelines should do.
When the company later needed 50 new pipelines, it took an afternoon to populate the metadata tables and run the generator. The alternative (50 hand-written pipelines, each slightly different) would have taken weeks and produced an inconsistent mess. The value of the metadata-driven approach compounds over time.