Metadata-Driven Pipelines: The Pattern That Keeps Paying Off
By late 2014, I had built metadata-driven generators for SSIS packages and Azure Data Factory pipelines, along with the early pieces of a metadata-driven approach to SSDT publish profiles. The same pattern kept emerging in different contexts, so it was worth stepping back to articulate why it worked and where it applied. That way I could recognize the next opportunity to use it instead of waiting to hit the pain first.
The Core Pattern
Metadata-driven generation follows a consistent structure:
- A metadata store — a database or structured file that describes the things you're going to build. Tables to extract. Columns to map. Transformations to apply. The metadata store knows the what, not the how.
- A generator — code that reads the metadata and produces output artifacts. DTSX files, JSON pipelines, SQL scripts. The generator knows the how, applying a consistent pattern to every entry in the metadata.
- Generated artifacts — checked into source control but treated as generated, not hand-authored. When you change how all pipelines work, change the generator. When you change what a specific pipeline does, change the metadata.
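To make the three pieces concrete, here is a minimal sketch in Python. Everything in it is illustrative: the `TableSpec` fields, the in-memory `METADATA` list standing in for a real metadata store, and the shape of the generated JSON are assumptions made for this example, not the schema of SSIS, Azure Data Factory, or any other tool.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSpec:
    """One metadata entry: the 'what' for a single extract-and-load pipeline."""
    source: str
    table: str
    columns: tuple[str, ...]
    destination: str


# The metadata store. A real one would be database rows or a structured file;
# a hard-coded list keeps the sketch self-contained.
METADATA = [
    TableSpec("crm", "customers", ("id", "name", "email"), "staging.customers"),
    TableSpec("erp", "orders", ("id", "customer_id", "total"), "staging.orders"),
]


def generate_pipeline(spec: TableSpec) -> dict:
    """The generator: applies the same 'how' to one metadata entry."""
    column_list = ", ".join(spec.columns)
    return {
        "name": f"load_{spec.source}_{spec.table}",
        "source": {
            "system": spec.source,
            "query": f"SELECT {column_list} FROM {spec.table}",
        },
        "sink": {"table": spec.destination},
    }


def generate_all(metadata: list[TableSpec]) -> dict[str, str]:
    """Map each artifact file name to its JSON text, ready to write and commit."""
    artifacts = {}
    for spec in metadata:
        pipeline = generate_pipeline(spec)
        # sort_keys keeps the output stable, so diffs in source control stay small.
        artifacts[f"{pipeline['name']}.json"] = json.dumps(
            pipeline, indent=2, sort_keys=True
        )
    return artifacts


if __name__ == "__main__":
    for filename, body in generate_all(METADATA).items():
        print(filename)
        print(body)
```

The split mirrors the list above: adding a table means adding a `TableSpec`, while changing how every pipeline is built means editing `generate_pipeline` and regenerating.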
When to Apply It
The metadata-driven pattern is worth the investment when:
- You have more than ~20 instances of the same pattern. Below that, the time spent building the generator is recovered slowly. Above it, the ROI is clear.
- The pattern has a standard structure with per-instance variations. "Extract a table from source A and load it to destination B" is a pattern. The variations are which table, which columns, which source. That's exactly what a metadata store captures.
- The pattern evolves. If you'll never need to change how all pipelines behave, the generator doesn't save you anything. If you'll evolve the pattern over time — add audit logging, change error handling, update transformation logic — the generator multiplies that change across all instances automatically.
- New instances are added regularly. If you're adding 10 new sources per quarter, each new source is a metadata entry rather than a half-day of work.
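The "pattern evolves" and "new instances" points can be sketched together. In this hypothetical example the `with_audit` flag stands in for editing the generator code itself, and the step strings are placeholders rather than real pipeline activities.

```python
from __future__ import annotations

# Hypothetical metadata: one entry per table to move.
METADATA = [
    {"source": "crm", "table": "customers"},
    {"source": "erp", "table": "orders"},
    {"source": "erp", "table": "invoices"},
]


def generate(entry: dict, with_audit: bool = False) -> dict:
    """Build one pipeline definition from one metadata entry."""
    steps = [
        f"extract {entry['source']}.{entry['table']}",
        f"load staging.{entry['table']}",
    ]
    if with_audit:
        # The evolved pattern: audit logging wraps every pipeline.
        steps = ["write audit start row", *steps, "write audit end row"]
    return {"name": f"load_{entry['source']}_{entry['table']}", "steps": steps}


# Before and after the pattern change: same metadata, regenerated output.
before = [generate(entry) for entry in METADATA]
after = [generate(entry, with_audit=True) for entry in METADATA]

# A new source is one more metadata entry, not a hand-built pipeline.
METADATA.append({"source": "hr", "table": "employees"})
new_pipeline = generate(METADATA[-1], with_audit=True)
```

One change to the generator reaches every existing pipeline, and the new `hr.employees` entry picks up audit logging without anyone writing it by hand.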
Where It Falls Short
Not everything is a pattern. One-off pipelines with unique transformation logic don't benefit from generation — you'd spend more time making the generator flexible enough to express the uniqueness than you'd save. The judgment call: is a pipeline "mostly standard with slight variation" (metadata-driven) or "genuinely unique" (hand-written)?
The metadata store also introduces indirection. When something goes wrong with a generated pipeline, you need to look at both the generator code and the metadata. This is manageable if the generator is well-tested and the metadata is clean. It's a nightmare if the generator is tangled and the metadata has undocumented exceptions. The metadata store needs the same care as production code.
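One way to give the metadata that care is to validate it before the generator runs. The sketch below is an assumption about what such a check could look like: the field names in `REQUIRED_FIELDS` and the `skip_audit` key in the usage example are invented for illustration.

```python
from __future__ import annotations

# Hypothetical schema: the only fields a metadata entry may carry.
REQUIRED_FIELDS = {"source", "table", "columns", "destination"}


def validate_entry(entry: dict) -> list[str]:
    """Return human-readable problems; an empty list means the entry is clean."""
    problems = []
    missing = REQUIRED_FIELDS - set(entry)
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    # Unknown keys are how undocumented exceptions creep in, so reject them.
    unknown = set(entry) - REQUIRED_FIELDS
    if unknown:
        problems.append(f"unknown fields: {sorted(unknown)}")
    columns = entry.get("columns", [])
    if "columns" in entry and not columns:
        problems.append("columns must not be empty")
    if len(set(columns)) != len(columns):
        problems.append("duplicate column names")
    return problems


def validate_all(metadata: list[dict]) -> dict[int, list[str]]:
    """Map the index of each bad entry to its problems; clean entries are omitted."""
    return {
        index: problems
        for index, entry in enumerate(metadata)
        if (problems := validate_entry(entry))
    }
```

A quick usage example:

```python
good = {"source": "crm", "table": "customers",
        "columns": ["id", "name"], "destination": "staging.customers"}
bad = {"source": "erp", "table": "orders",
       "columns": ["id", "id"], "skip_audit": True}

for index, problems in validate_all([good, bad]).items():
    print(index, problems)
```

Running a check like this in the build, before generation, means a bad entry fails fast instead of surfacing later as a puzzling generated pipeline.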
The Broader Application
I've applied this pattern to SSIS packages, ADF pipelines, database deployment scripts, SQL Server Agent job configurations, Azure Resource Manager templates for repeating resource patterns, and eventually LangGraph workflow definitions. The tool changed every time. The metadata-store-plus-generator structure stayed the same.
Once you see this pattern working in one context, you start spotting opportunities for it elsewhere. The code generator you build for SSIS packages isn't just solving the SSIS problem. It's building the mental model you'll apply the next time you face 50 instances of the same pattern and have to choose between writing them by hand and generating them from a metadata store.
Generate, don't write. It takes longer the first time and pays off for years after. As always, I'm here to help.