Generating Azure Data Factory Assets From a Metadata Store

Shannon Lowder

15 Sep 2014 — 2 min read

The metadata-driven generation pattern I built for SSIS packages applied almost directly to ADF. The metadata store already described source tables, column mappings, and destination tables. ADF pipelines were defined in JSON, so the generator just needed to target a different output format. Here's what was different about generating for ADF versus generating for SSIS.

What Needed to Be Generated

For the source tables in the metadata store, the ADF representation required three kinds of JSON artifact:

Linked Service (one per data source) — the connection definition
Dataset (one per source table, one per destination table) — the data structure
Pipeline (one per source table) — the copy activity and its schedule

Linked services were shared across tables from the same source, so the generator deduplicated — one linked service per unique source, not one per table.
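The deduplication step is small but easy to get wrong. A minimal sketch, assuming hypothetical `DataSource` and `SourceTable` records (the post doesn't show the metadata schema) and assuming a source is identified by its server and database names compared case-insensitively:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DataSource:
    # Hypothetical metadata-store row describing one source database.
    name: str
    server: str
    database: str


@dataclass(frozen=True)
class SourceTable:
    # Hypothetical metadata-store row describing one source table.
    schema: str
    name: str
    source: DataSource


def unique_data_sources(tables: List[SourceTable]) -> List[DataSource]:
    """One DataSource per distinct (server, database), in first-seen order."""
    seen: Dict[Tuple[str, str], DataSource] = {}
    for table in tables:
        # Assumption: server and database names compare case-insensitively.
        key = (table.source.server.lower(), table.source.database.lower())
        seen.setdefault(key, table.source)
    return list(seen.values())


def artifact_counts(tables: List[SourceTable]) -> Dict[str, int]:
    """How many JSON artifacts the generator emits for these tables."""
    return {
        "linked_services": len(unique_data_sources(tables)),
        "datasets": 2 * len(tables),  # one source + one destination per table
        "pipelines": len(tables),
    }
```

For three tables spread across two sources, this yields two linked services, six datasets, and three pipelines.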

The Generator

Because ADF's format was JSON, the ADF generator was simpler than the SSIS generator — no ManagedDTS dependency, no binary format concerns. The generator read from the same metadata store and emitted JSON files:

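import json  # used when the generated dicts are written out as .json files

# data_source, source_table, and column_mappings are rows read from the
# metadata store; only the attributes used below are assumed to exist.

def generate_linked_service(data_source):
    # One Azure SQL Database linked service per unique data source.
    return {
        "name": f"ls_{data_source.name}",
        "properties": {
            "type": "AzureSqlDatabase",
            "typeProperties": {
                "connectionString": f"Data Source={data_source.server};Initial Catalog={data_source.database};..."
            }
        }
    }

def generate_pipeline(source_table, column_mappings):
    # ADF v1's TabularTranslator takes the mappings as a single
    # "source:dest,source:dest" string rather than a list.
    column_mapping_str = ",".join(
        f"{m.source_column}:{m.dest_column}" for m in column_mappings
    )
    # Shared naming prefix, so pipeline and dataset names line up.
    prefix = f"{source_table.schema}_{source_table.name}"
    return {
        "name": f"pl_copy_{prefix}",
        "properties": {
            "activities": [{
                "type": "Copy",
                "name": f"Copy_{prefix}",
                # Datasets are referenced by name; they must already exist at deploy time.
                "inputs": [{"name": f"ds_{prefix}_Src"}],
                "outputs": [{"name": f"ds_{prefix}_Dest"}],
                "typeProperties": {
                    "source": {"type": "SqlSource"},
                    # Insert into the destination in batches of 10,000 rows.
                    "sink": {"type": "SqlSink", "writeBatchSize": 10000},
                    "translator": {
                        "type": "TabularTranslator",
                        "columnMappings": column_mapping_str
                    }
                }
            }],
            # Active period for the pipeline: from the start date to an effectively open-ended end.
            "start": "2014-01-01T00:00:00Z",
            "end": "2099-12-31T00:00:00Z",
            "isPaused": False
        }
    }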
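The generator above covers linked services and pipelines, but the pipeline also references datasets by name (`ds_<schema>_<table>_Src` and `_Dest`), and the deploy step reads assets back from per-kind folders. A sketch of both pieces, assuming ADF v1's Azure SQL table dataset shape and an assumed daily availability slice; `generate_dataset` and `write_assets` are hypothetical names, not taken from the original generator:

```python
import json
from pathlib import Path


def generate_dataset(schema, table, linked_service_name, role, external=False):
    """Build an ADF v1-style Azure SQL table dataset.

    role is "Src" or "Dest", matching the names generate_pipeline references.
    """
    properties = {
        "type": "AzureSqlTable",
        "linkedServiceName": linked_service_name,
        "typeProperties": {"tableName": f"{schema}.{table}"},
        # Assumed daily slice; v1 datasets carried an availability block.
        "availability": {"frequency": "Day", "interval": 1},
    }
    if external:
        # Source data is produced outside the factory, so v1 flagged it external.
        properties["external"] = True
    return {"name": f"ds_{schema}_{table}_{role}", "properties": properties}


def write_assets(output_dir, kind, assets):
    """Write each asset to <output_dir>/<kind>/<name>.json and return the folder."""
    folder = Path(output_dir) / kind
    folder.mkdir(parents=True, exist_ok=True)
    for asset in assets:
        (folder / f"{asset['name']}.json").write_text(json.dumps(asset, indent=2))
    return folder
```

Each source table gets two calls: one with `role="Src"` and `external=True` against the source's linked service, and one with `role="Dest"` against the destination's. Writing into `linked_services`, `datasets`, and `pipelines` folders gives the deploy script the layout it expects.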

Deploying Generated Assets

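ADF v1 deployment required strict ordering: linked services first, then datasets, then pipelines. Each asset referenced the layer before it by name, and a reference to an asset that didn't exist yet failed to deploy. The deploy script uploaded them in that order using the ADF .NET SDK.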

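import json
from pathlib import Path

def load_json_dir(path):
    # Yield each generated asset (one .json file per asset), in filename order.
    for file in sorted(Path(path).glob("*.json")):
        yield json.loads(file.read_text())

def deploy_adf_assets(adf_client, resource_group, factory_name, output_dir):
    # Python rendering of the deploy logic. The real script used the ADF .NET SDK,
    # so adf_client and its create_or_update methods are illustrative.

    # 1. Linked services (no dependencies)
    for ls in load_json_dir(f"{output_dir}/linked_services"):
        adf_client.linked_services.create_or_update(
            resource_group, factory_name, ls["name"], ls["properties"])

    # 2. Datasets (depend on linked services)
    for ds in load_json_dir(f"{output_dir}/datasets"):
        adf_client.datasets.create_or_update(
            resource_group, factory_name, ds["name"], ds["properties"])

    # 3. Pipelines (depend on datasets)
    for pl in load_json_dir(f"{output_dir}/pipelines"):
        adf_client.pipelines.create_or_update(
            resource_group, factory_name, pl["name"], pl["properties"])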
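Because a dangling reference only surfaces when a deploy call fails partway through, it can be cheaper to check the generated assets before uploading anything. A minimal pre-deploy check, assuming the asset dicts have the shapes produced by the generator above; `find_missing_references` is a hypothetical helper, not part of any ADF SDK:

```python
def find_missing_references(linked_services, datasets, pipelines):
    """Return readable errors for name references that would fail on deploy."""
    ls_names = {ls["name"] for ls in linked_services}
    ds_names = {ds["name"] for ds in datasets}
    errors = []
    # Datasets must point at a linked service that is being deployed.
    for ds in datasets:
        target = ds["properties"].get("linkedServiceName")
        if target not in ls_names:
            errors.append(f"dataset {ds['name']} -> missing linked service {target}")
    # Every activity input and output must name a dataset that is being deployed.
    for pl in pipelines:
        for activity in pl["properties"].get("activities", []):
            for ref in activity.get("inputs", []) + activity.get("outputs", []):
                if ref["name"] not in ds_names:
                    errors.append(
                        f"pipeline {pl['name']} -> missing dataset {ref['name']}")
    return errors
```

Running it over the three output folders and refusing to deploy on a non-empty result keeps a half-deployed factory from ever happening.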

The Meta-Point

The SSIS generator targeted DTSX. The ADF generator targeted JSON. Both read from the same metadata store, which was the single source of truth for what the pipelines should do.
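One way to keep a single metadata store feeding several output formats is a registry of emitters keyed by target. A minimal sketch; the emitter bodies are stand-ins (the ADF one emits only a pipeline name, and the SSIS one is a placeholder rather than real DTSX), and the metadata rows are assumed to be plain dicts with `schema` and `name` keys:

```python
import json
from typing import Callable, Dict, List

# Registry of output targets; each emitter turns one table's metadata into
# the text of an artifact for that target.
EMITTERS: Dict[str, Callable[[dict], str]] = {}


def emitter(target: str):
    """Decorator that registers a function as the emitter for a target."""
    def register(fn):
        EMITTERS[target] = fn
        return fn
    return register


@emitter("adf")
def emit_adf(table: dict) -> str:
    # Stand-in: only the pipeline name, to keep the sketch short.
    return json.dumps({"name": f"pl_copy_{table['schema']}_{table['name']}"})


@emitter("ssis")
def emit_ssis(table: dict) -> str:
    # Stand-in for the SSIS generator; not a valid DTSX package.
    return f"<!-- package for {table['schema']}.{table['name']} -->"


def generate_all(tables: List[dict]) -> Dict[str, List[str]]:
    """Run every registered emitter over the same metadata rows."""
    return {target: [fn(t) for t in tables] for target, fn in EMITTERS.items()}
```

Adding a third target means registering one more emitter; the metadata and the generation loop stay unchanged.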

When the company later needed 50 new pipelines, it took an afternoon to populate the metadata tables and run the generator. The alternative, 50 hand-written pipelines each slightly different from the last, would have taken weeks and produced an inconsistent mess. The value of the metadata-driven approach compounds over time: every source added through metadata inherits the same naming, column-mapping, and deployment conventions as the ones before it. As always, I'm here to help.
