Last year I finished a project where we delivered a data warehouse with over forty SSIS packages, a dozen SQL Agent jobs, and about two hundred stored procedures. Each piece worked. The integration testing passed. We signed off and moved on.
Three months later the client called. Someone had added a column to their source CRM system. Four packages broke silently. Data stopped flowing. Nobody noticed for two weeks because the reports were still running — they were just running on stale data.
We fixed it. But the conversation I had with myself on the drive home stuck with me: we had built pipelines, not a pipeline system. The distinction matters more than I'd thought.
What the Client Actually Wanted
The client didn't want SSIS packages. They wanted accurate, timely analytics so their sales team could make better decisions. The pipeline was infrastructure — the plumbing that made the analytics possible. But somewhere in the project, the pipeline became the deliverable. We optimized for delivery rather than for durability.
This is a trap I've seen on almost every data project. The ETL work is measurable and visible: packages, procedures, jobs. The analytics is what the stakeholders actually care about. So you end up building impressive plumbing that nobody upstairs thinks about until it breaks.
Infrastructure Needs to Behave Like Infrastructure
Real infrastructure — power grids, water systems, highway networks — is built for reliability and change absorption, not for one-time delivery. You don't rebuild a road every time someone wants to drive a different vehicle on it.
Data pipelines should work the same way. When a column gets added to the source, the pipeline should adapt or at least fail loudly. When a business rule changes, the impact should be contained and the fix should be straightforward. When something goes wrong, you should know about it before your users do.
None of that is automatic. It requires intentional design decisions early in the project:
- How does the pipeline handle unexpected schema changes?
- What does failure look like, and who finds out first?
- What's the recovery procedure when a load fails halfway through?
- Can you re-run a load without duplicating data?
These aren't questions about SSIS. They're questions about architecture. And the answers have to be baked into the design, not bolted on after the fact.
A Concrete Example: Idempotency
The simplest principle that separates infrastructure from a collection of scripts: idempotency. Run it once, you get the right result. Run it again, you still get the right result. Run it ten times, same answer.
Most ad-hoc ETL scripts aren't idempotent. They append data. Run them twice and you get duplicates. This seems like a small problem until a network hiccup causes your SQL Agent job to retry at 2am and you show up to work with double-counted revenue figures.
-- Non-idempotent (the common first attempt)
INSERT INTO dw.FactSales
SELECT * FROM staging.SalesLoad;
-- Idempotent: clear and reload the target window
DELETE FROM dw.FactSales
WHERE SaleDate = CAST(GETDATE() AS DATE);
INSERT INTO dw.FactSales
SELECT * FROM staging.SalesLoad
WHERE SaleDate = CAST(GETDATE() AS DATE);
-- Or with MERGE for keyed incremental loads
MERGE dw.FactSales AS target
USING staging.SalesLoad AS source
ON target.SaleId = source.SaleId
WHEN MATCHED THEN
UPDATE SET target.Amount = source.Amount
WHEN NOT MATCHED THEN
INSERT (SaleId, SaleDate, Amount)
VALUES (source.SaleId, source.SaleDate, source.Amount);
This is a small design decision. It takes maybe ten extra minutes the first time you think it through. And it will save you hours of debugging in the months after delivery.
The Mindset Shift
If you think of the pipeline as the product, you optimize for completion. You're done when the packages run clean. If you think of the pipeline as infrastructure, you optimize for what happens after delivery. You're not done until it can survive the first six months of production life on its own.
That's a harder standard. It requires more upfront thinking. It requires conversations with stakeholders about failure modes and retry logic that nobody particularly enjoys having.
But it's the difference between delivering a system and delivering a project. Systems are still working three years later. Projects are the things you get called back to fix.
If you've navigated this — figured out how to design data pipelines that behave more like infrastructure — I'd love to compare notes. As always, I'm here to help.