The Pipeline Isn't the Product

Last year I finished a project where we delivered a data warehouse with over forty SSIS packages, a dozen SQL Agent jobs, and about two hundred stored procedures. Each piece worked. The integration testing passed. We signed off and moved on.

Three months later the client called. Someone had added a column to their source CRM system. Four packages broke silently. Data stopped flowing. Nobody noticed for two weeks because the reports were still running — they were just running on stale data.

We fixed it. But the conversation I had with myself on the drive home stuck with me: we had built pipelines, not a pipeline system. The distinction matters more than I'd thought.

What the Client Actually Wanted

The client didn't want SSIS packages. They wanted accurate, timely analytics so their sales team could make better decisions. The pipeline was infrastructure — the plumbing that made the analytics possible. But somewhere in the project, the pipeline became the deliverable. We optimized for delivery rather than for durability.

This is a trap I've seen on almost every data project. The ETL work is measurable and visible: packages, procedures, jobs. The analytics is what the stakeholders actually care about. So you end up building impressive plumbing that nobody upstairs thinks about until it breaks.

Infrastructure Needs to Behave Like Infrastructure

Real infrastructure — power grids, water systems, highway networks — is built for reliability and change absorption, not for one-time delivery. You don't rebuild a road every time someone wants to drive a different vehicle on it.

Data pipelines should work the same way. When a column gets added to the source, the pipeline should adapt or at least fail loudly. When a business rule changes, the impact should be contained and the fix should be straightforward. When something goes wrong, you should know about it before your users do.

None of that is automatic. It requires intentional design decisions early in the project:

How does the pipeline handle unexpected schema changes?
What does failure look like, and who finds out first?
What's the recovery procedure when a load fails halfway through?
Can you re-run a load without duplicating data?

These aren't questions about SSIS. They're questions about architecture. And the answers have to be baked into the design, not bolted on after the fact.

A Concrete Example: Idempotency

The simplest principle that separates infrastructure from a collection of scripts: idempotency. Run it once, you get the right result. Run it again, you still get the right result. Run it ten times, same answer.

Most ad-hoc ETL scripts aren't idempotent. They append data. Run them twice and you get duplicates. This seems like a small problem until a network hiccup causes your SQL Agent job to retry at 2am and you show up to work with double-counted revenue figures.

-- Non-idempotent (the common first attempt)
INSERT INTO dw.FactSales
SELECT * FROM staging.SalesLoad;

-- Idempotent: clear and reload the target window
DELETE FROM dw.FactSales
WHERE  SaleDate = CAST(GETDATE() AS DATE);

INSERT INTO dw.FactSales
SELECT * FROM staging.SalesLoad
WHERE  SaleDate = CAST(GETDATE() AS DATE);

-- Or with MERGE for keyed incremental loads
MERGE dw.FactSales AS target
USING staging.SalesLoad AS source
ON    target.SaleId = source.SaleId
WHEN MATCHED THEN
    UPDATE SET target.Amount = source.Amount
WHEN NOT MATCHED THEN
    INSERT (SaleId, SaleDate, Amount)
    VALUES (source.SaleId, source.SaleDate, source.Amount);

This is a small design decision. It takes maybe ten extra minutes the first time you think it through. And it will save you hours of debugging in the months after delivery.

The Mindset Shift

If you think of the pipeline as the product, you optimize for completion. You're done when the packages run clean. If you think of the pipeline as infrastructure, you optimize for what happens after delivery. You're not done until it can survive the first six months of production life on its own.

That's a harder standard. It requires more upfront thinking. It requires conversations with stakeholders about failure modes and retry logic that nobody particularly enjoys having.

But it's the difference between delivering a system and delivering a project. Systems are still working three years later. Projects are the things you get called back to fix.

If you've navigated this — figured out how to design data pipelines that behave more like infrastructure — I'd love to compare notes. As always, I'm here to help.

The Pipeline Isn't the Product

Shannon Lowder

What the Client Actually Wanted

Infrastructure Needs to Behave Like Infrastructure

A Concrete Example: Idempotency

The Mindset Shift

Read more

The Context Problem Neither Agent Mesh Nor OpenSharing Solves

Unity AI Gateway and What a Governed Model Access Layer Actually Buys You

You Don't Need Fable. You Need a Router.

DAIS 2026: Genie One and the Context Problem Databricks Is Solving