The Problem With Hand-Written SSIS Packages at Scale
SSIS packages are not bad. I want to be clear about that before what follows, because it might sound like a condemnation. SSIS is a mature, capable ETL tool that handles complex data transformations, handles errors gracefully, and integrates well with the SQL Server ecosystem. The problem isn't SSIS. The problem is what happens when you write SSIS packages by hand, at scale, over time.
In 2013, I inherited a data warehouse environment with 200+ SSIS packages. I want to tell you what that looked like.
The Patterns That Emerge Without Discipline
Each package handled the extraction and loading of one table or one data source. At some point, someone had decided on a standard pattern: extract from source, apply transformations, load to staging, then merge to the warehouse. Sound design. The problem was that this pattern was implemented 200 times, by hand, with slight variations each time.
Logging inconsistency. Some packages logged to a SQL table. Some wrote to a text file. Some had no logging at all. When a package failed, the first 20 minutes of investigation were often spent figuring out whether it had any logging and where that logging went.
Error handling variations. Some packages had onfailure handlers that sent email. Some failed silently. Some had error handling on the data flow tasks but not on the control flow. There was no way to know how a given package would behave on failure without reading it.
Connection manager sprawl. Each package had its own connection managers. When the source database server changed, someone had to open every package, find the connection manager, update it. 200 packages. Every package. By hand.
No version history. SSIS packages in 2013 were binary DTSX files. Diffing them in source control was meaningless — you couldn't tell what changed between commits. The package was effectively un-versioned.
The Maintenance Cost
Adding a new source table to the warehouse took a developer half a day. Create the package from scratch (or copy an existing one and modify it — the copy often brought along subtle bugs from the original). Configure the connection managers. Set up the transformations. Add the logging. Test it. At 200 tables, that's 200 half-days of work that produced 200 slightly different implementations of the same pattern.
More concretely: we needed to add an audit column to every staging table. This required modifying every package that loaded those staging tables. 200 packages. Each one opened individually, the column added to the destination transformation, the mapping updated, the package saved and tested. Two weeks of work for one column.
What This Points At
When you find yourself doing the same thing 200 times with minor variations, that's a signal. Not a signal to stop doing it — those packages needed to exist — but a signal that you're solving the wrong problem. The right problem isn't "how do I write these 200 packages efficiently." It's "how do I maintain a metadata description of these 200 patterns and generate the packages from it."
That's the insight that led to building a metadata-driven SSIS generation framework. I'll cover the architecture and implementation in the posts that follow. The key point for this post: the pain of maintaining hand-written SSIS packages at scale isn't an SSIS problem. It's a code generation problem. You need the right tool for each job. As always, I'm here to help.