Building a Metadata-Driven SSIS Generator: The Architecture
After inheriting the 200-package SSIS environment, the solution was clear: generate the packages programmatically from a metadata store rather than writing them by hand. The resulting system was one of the highest-leverage things I built during this period — adding a new source to the warehouse went from a half-day manual task to a metadata entry and a generation run. Here's how the architecture worked.
The Metadata Store Design
The metadata store was a SQL Server database (on Azure SQL by this point) with tables describing the data sources, transformations, and destinations. The core tables:
-- Source connection details
CREATE TABLE meta.DataSources (
DataSourceId INT NOT NULL IDENTITY PRIMARY KEY,
DataSourceName NVARCHAR(100) NOT NULL,
ConnectionString NVARCHAR(500) NOT NULL,
DatabaseName NVARCHAR(100) NOT NULL,
IsActive BIT NOT NULL DEFAULT 1
);
-- Tables to extract from each source
CREATE TABLE meta.SourceTables (
SourceTableId INT NOT NULL IDENTITY PRIMARY KEY,
DataSourceId INT NOT NULL REFERENCES meta.DataSources,
SchemaName NVARCHAR(100) NOT NULL,
TableName NVARCHAR(100) NOT NULL,
IncrementalColumn NVARCHAR(100) NULL,
LoadType NVARCHAR(20) NOT NULL -- 'Full' or 'Incremental'
);
-- Column-level mappings and transformations
CREATE TABLE meta.ColumnMappings (
ColumnMappingId INT NOT NULL IDENTITY PRIMARY KEY,
SourceTableId INT NOT NULL REFERENCES meta.SourceTables,
SourceColumnName NVARCHAR(100) NOT NULL,
DestColumnName NVARCHAR(100) NOT NULL,
DataType NVARCHAR(50) NOT NULL,
IsNullable BIT NOT NULL DEFAULT 1,
TransformExpression NVARCHAR(500) NULL
);The Generation Approach
SSIS packages are DTSX files — which are XML. The Microsoft.SqlServer.ManagedDTS library exposes a managed API for creating SSIS packages in code without touching XML directly.
The generator was a C# console application that queried the metadata store for all active source tables, then for each source table instantiated a Package object, applied standard control flow elements from a template, added the data flow task configured from the metadata, and saved the resulting DTSX file.
The Standard Control Flow Template
The key insight was the template object. Rather than writing code to add logging, error handling, and standard parameters to every generated package individually, a method applied a standard control flow template to any package:
private static void ApplyStandardTemplate(Package package, string packageName,
string logConnectionString)
{
// Standard package logging
var logProvider = package.LogProviders.Add("DTS.LogProviderSQLServer.3");
logProvider.ConfigString = logConnectionString;
package.LoggingMode = DTSLoggingMode.Enabled;
// Package-level variables for audit tracking
package.Variables.Add("PackageName", false, "User", packageName);
package.Variables.Add("LoadStartTime", false, "User", DateTime.MinValue);
package.Variables.Add("RowsExtracted", false, "User", 0);
package.Variables.Add("RowsLoaded", false, "User", 0);
// Standard OnError event handler
var errorHandler = package.EventHandlers.Add("OnError");
// ... add email task to errorHandler
}Every generated package got this template applied. Logging: consistent. Error handling: consistent. Audit variables: consistent. When we needed to change the logging schema six months later, we changed the template and regenerated — not 200 individual packages.
What Regeneration Meant
The packages were generated artifacts. They lived in source control but were generated from the metadata store, not authored by hand. When requirements changed — add the audit column, change the merge logic, update the error handler email address — the change was made once (in the metadata store or in the generator's template), and all affected packages were regenerated in a single run.
Adding a new source table: add a row to meta.SourceTables, add the column mappings to meta.ColumnMappings, run the generator. One new package, consistent with the other 200. That's the payoff. As always, I'm here to help.