Building a Metadata-Driven SSIS Generator: The Implementation

The previous post covered the architecture: metadata store design, generator structure, template approach. This one gets into the implementation details — specifically, how the Microsoft.SqlServer.ManagedDTS API works for programmatically building data flow tasks, and the gotchas I hit along the way.

Setting Up the Data Flow Task

// Add a data flow task to the package
var dataFlowTask = package.Executables.Add("STOCK:PipelineTask") as TaskHost;
dataFlowTask.Name = $"DFT Extract {sourceTable.SchemaName}.{sourceTable.TableName}";

// Get the pipeline object to add components
var pipeline = dataFlowTask.InnerObject as MainPipe;

// Add OLE DB Source
var sourceComponent = pipeline.ComponentMetaDataCollection.New();
sourceComponent.ComponentClassID = "DTSAdapter.OleDbSource.5";
sourceComponent.ValidateExternalMetadata = true;

var sourceDesigntime = sourceComponent.Instantiate();
sourceDesigntime.ProvideComponentProperties();

Configuring the Source Component

// Set the connection manager
sourceComponent.RuntimeConnectionCollection[0].ConnectionManagerID =
    sourceConnectionManager.ID;

// Build the source query (supports incremental loads)
var sqlQuery = sourceTable.LoadType == "Incremental"
    ? $"SELECT {columnList} FROM [{sourceTable.SchemaName}].[{sourceTable.TableName}] " +
      $"WHERE [{sourceTable.IncrementalColumn}] > ?"
    : $"SELECT {columnList} FROM [{sourceTable.SchemaName}].[{sourceTable.TableName}]";

sourceDesigntime.SetComponentProperty("AccessMode", 2); // SQL Command mode
sourceDesigntime.SetComponentProperty("SqlCommand", sqlQuery);

// CRITICAL: connect to source to populate column metadata
sourceDesigntime.AcquireConnections(null);
sourceDesigntime.ReinitializeMetaData();
sourceDesigntime.ReleaseConnections();

The ReinitializeMetaData() call is critical. Without it, the component's output column collection is empty and you can't configure downstream mappings. The generator must connect to the actual source database during package generation.

Adding the Destination and Path

// Add OLE DB Destination
var destComponent = pipeline.ComponentMetaDataCollection.New();
destComponent.ComponentClassID = "DTSAdapter.OleDbDestination.5";
var destDesigntime = destComponent.Instantiate();
destDesigntime.ProvideComponentProperties();

destDesigntime.SetComponentProperty("OpenRowset",
    $"[{destTable.SchemaName}].[{destTable.TableName}]");

destDesigntime.AcquireConnections(null);
destDesigntime.ReinitializeMetaData();
destDesigntime.ReleaseConnections();

// Create path connecting source to destination — MUST happen before column mapping
var path = pipeline.PathCollection.New();
path.AttachPathAndPropagateNotifications(
    sourceComponent.OutputCollection[0],
    destComponent.InputCollection[0]);

// Now map columns
foreach (var mapping in columnMappings)
{
    var sourceCol = sourceComponent.OutputCollection[0]
        .OutputColumnCollection[mapping.SourceColumnName];
    var destCol = destComponent.InputCollection[0]
        .InputColumnCollection[mapping.DestColumnName];

    if (destCol != null && sourceCol != null)
        destCol.LineageID = sourceCol.ID;
}

The Gotchas

Column mapping order matters. Map columns after the path is created, not before. Path propagation populates the destination input column collection. Mapping before path creation results in a package that opens but fails validation.

Data type mismatches. The API won't warn you if you map a DECIMAL to an INT at generation time — you'll find out at runtime. Add explicit type checking: if mapped types are incompatible, throw during generation rather than at package execution.

The Application object must be created once. Creating the SSIS Application object per package causes memory leaks. Create it once in Main() and pass it through.

// In Main()
var app = new Application();
var generator = new PackageGenerator(app, metadataConnectionString);
generator.GenerateAll(); // processes all active SourceTables

The complete generator ran in under 10 minutes for 200 packages. Adding it to the CI pipeline meant every metadata change automatically regenerated and checked in the affected packages. The packages were always consistent with the metadata. That's the goal. As always, I'm here to help.

Read more