Start With What You Want It To Do

Shannon Lowder

15 Mar 2014 — 3 min read

Every data project I have ever been on started with someone telling me what they needed the system to do. Not how — what. "We need to load customer orders from the source system into the warehouse by 6 a.m. every day." "We need to flag records that failed validation so the ops team can review them." "We need to be able to reprocess any day's data if the source sends a correction."

That is the outcome. The system that delivers it does not exist yet. Your job is to decompose that outcome into pieces small enough to build.

This is the reverse of the last two posts. There, we built up from small, reliable units. Now we start from the top and work down: the same instinct behind breaking an epic into user stories and then into tasks, applied directly to a data pipeline project.

Start With the Outcome Statement

Write it down in plain language. One sentence. No implementation detail.

Load validated customer orders from SourceSystem into Warehouse.Orders daily by 6 a.m., with the ability to reprocess any prior date on demand.

That sentence is your epic. Everything else is decomposition.

Break It Into Stories

Stories answer: what does the system need to be able to do? Not how it does it — what it does.

Extract raw orders from SourceSystem for a given date
Validate that extracted orders meet quality rules
Transform validated orders into the Warehouse schema
Load transformed orders into Warehouse.Orders
Run the full pipeline for a scheduled date
Reprocess any prior date without duplicating records
Surface failures with enough context to diagnose them

Seven stories. Each one is a capability the finished system must have. Notice that none of them say "stored procedure" or "SSIS package" or "scheduled job." The implementation is not the story.

Break Each Story Into Tasks

Tasks answer: what do you need to build to deliver that story? Now the implementation enters.

Take "Extract raw orders from SourceSystem for a given date":

Create Staging.RawOrders table with the source schema plus an ExtractedAt audit column
Write dbo.ExtractRawOrders @BatchDate DATE: inserts that date's rows from SourceSystem into Staging.RawOrders, filtering out deleted records
Test: run for a known date, verify row count matches source

Three tasks. The first two are buildable in an afternoon. The third tells you when you are done.
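Here is a minimal sketch of those three tasks. It uses Python's built-in sqlite3 with an in-memory database standing in for both SourceSystem and the staging area, so it is an illustration of the shape rather than the SQL Server implementation. The column names (OrderId, CustomerId, OrderDate, Amount, IsDeleted) and the flat table names SourceOrders and StagingRawOrders are assumptions made up for the example.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical stand-in for the SourceSystem orders table.
    conn.execute("""
        CREATE TABLE SourceOrders (
            OrderId    INTEGER PRIMARY KEY,
            CustomerId INTEGER NOT NULL,
            OrderDate  TEXT    NOT NULL,
            Amount     REAL    NOT NULL,
            IsDeleted  INTEGER NOT NULL DEFAULT 0
        )""")
    # Task 1: staging table = source columns plus an ExtractedAt audit column.
    conn.execute("""
        CREATE TABLE StagingRawOrders (
            OrderId     INTEGER NOT NULL,
            CustomerId  INTEGER NOT NULL,
            OrderDate   TEXT    NOT NULL,
            Amount      REAL    NOT NULL,
            ExtractedAt TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
        )""")


def extract_raw_orders(conn, batch_date):
    # Task 2: copy one day's orders into staging, skipping deleted records.
    conn.execute("""
        INSERT INTO StagingRawOrders (OrderId, CustomerId, OrderDate, Amount)
        SELECT OrderId, CustomerId, OrderDate, Amount
        FROM SourceOrders
        WHERE OrderDate = ? AND IsDeleted = 0""", (batch_date,))


def row_counts_match(conn, batch_date):
    # Task 3: the "done" check. Staged rows must equal non-deleted source rows.
    source = conn.execute(
        "SELECT COUNT(*) FROM SourceOrders WHERE OrderDate = ? AND IsDeleted = 0",
        (batch_date,)).fetchone()[0]
    staged = conn.execute(
        "SELECT COUNT(*) FROM StagingRawOrders WHERE OrderDate = ?",
        (batch_date,)).fetchone()[0]
    return source == staged


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany("INSERT INTO SourceOrders VALUES (?, ?, ?, ?, ?)", [
    (1, 10, "2014-03-14", 25.0, 0),
    (2, 11, "2014-03-14", 40.0, 1),  # deleted at the source, must be skipped
    (3, 12, "2014-03-15", 15.0, 0),  # different date, must be ignored
])
extract_raw_orders(conn, "2014-03-14")
print(row_counts_match(conn, "2014-03-14"))
```

Each function maps to exactly one task, which is the point: when the check passes, the story is delivered.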

Do this for every story. You now have a complete task list for the pipeline, derived directly from the outcome statement — not from someone's intuition about what the system should look like.
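If it helps to see the hierarchy as data, here is a small sketch of the epic, its stories, and the tasks for the one story decomposed so far. The Epic and Story classes and the undecomposed helper are hypothetical names for illustration; any backlog tool gives you the same structure.

```python
from dataclasses import dataclass, field


@dataclass
class Story:
    capability: str                             # what the system must be able to do
    tasks: list = field(default_factory=list)   # what you build to deliver it


@dataclass
class Epic:
    outcome: str
    stories: list = field(default_factory=list)

    def undecomposed(self):
        # Stories that still have no tasks, i.e. work left in the breakdown.
        return [s.capability for s in self.stories if not s.tasks]


epic = Epic(
    outcome=("Load validated customer orders from SourceSystem into "
             "Warehouse.Orders daily by 6 a.m., with the ability to "
             "reprocess any prior date on demand."),
    stories=[
        Story("Extract raw orders from SourceSystem for a given date", tasks=[
            "Create Staging.RawOrders table",
            "Write dbo.ExtractRawOrders @BatchDate DATE",
            "Test: run for a known date, verify row count matches source",
        ]),
        Story("Validate that extracted orders meet quality rules"),
        Story("Transform validated orders into the Warehouse schema"),
        Story("Load transformed orders into Warehouse.Orders"),
        Story("Run the full pipeline for a scheduled date"),
        Story("Reprocess any prior date without duplicating records"),
        Story("Surface failures with enough context to diagnose them"),
    ],
)

for capability in epic.undecomposed():
    print("needs tasks:", capability)
```

The list of stories without tasks is your remaining planning work. When it is empty, the task list is complete.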

The Shape You End Up With

When you map those tasks back to code, you will notice something: you have naturally arrived at the layered structure from the previous two posts. Extract, validate, transform, load — each is a unit with one job. The pipeline coordinator — the story "Run the full pipeline for a scheduled date" — is the layer that calls them in sequence.

This is not a coincidence. The decomposition from the top and the composition from the bottom converge on the same shape, because both are following the same principle: one thing, done well, with a clear boundary.
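A rough sketch of that shape, in plain Python with lists and dictionaries in place of real tables. The four unit functions are placeholders, the "amount cannot be negative" rule is a made-up quality rule, and PipelineError is a hypothetical wrapper that also covers the "surface failures with enough context" story.

```python
class PipelineError(Exception):
    """A stage failure annotated with the stage name and batch date."""

    def __init__(self, stage, batch_date, cause):
        super().__init__(f"{stage} failed for {batch_date}: {cause!r}")
        self.stage = stage
        self.batch_date = batch_date


# Four units, one job each. The bodies stand in for the real work.
def extract(source, batch_date):
    return list(source.get(batch_date, []))


def validate(rows):
    # Hypothetical quality rule: an order amount cannot be negative.
    return [r for r in rows if r["amount"] >= 0]


def transform(rows):
    return [{"OrderId": r["order_id"], "Amount": r["amount"]} for r in rows]


def load(warehouse, rows):
    warehouse.extend(rows)
    return len(rows)


def run_pipeline(source, warehouse, batch_date):
    """Coordinator: calls the units in sequence, knows nothing of their internals."""
    stage = "extract"
    try:
        raw = extract(source, batch_date)
        stage = "validate"
        valid = validate(raw)
        stage = "transform"
        shaped = transform(valid)
        stage = "load"
        return load(warehouse, shaped)
    except Exception as exc:
        # Report which stage broke and for which date.
        raise PipelineError(stage, batch_date, exc) from exc


source = {"2014-03-14": [{"order_id": 1, "amount": 25.0},
                         {"order_id": 2, "amount": -5.0}]}
warehouse = []
print(run_pipeline(source, warehouse, "2014-03-14"))  # number of rows loaded
```

The coordinator is the only place that knows the order of the stages. Any unit can be rewritten without touching the others, as long as its inputs and outputs keep the same shape. This sketch does not yet handle reruns; that is a separate story.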

Reprocessing Is a Story, Not an Afterthought

Notice that "Reprocess any prior date without duplicating records" is a first-class story, not a feature someone asks for six months after launch. Naming it early forces the design question: how does ExtractRawOrders behave when rows for that date already exist in staging?

Answer it now, in a task:

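Add a DELETE FROM Staging.RawOrders WHERE BatchDate = @BatchDate at the top of ExtractRawOrders so reruns are idempotent. This assumes Staging.RawOrders carries a BatchDate column alongside ExtractedAt; if it does not, add one in the table-creation task.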

One task. Solves the story. Does not change any other unit.
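To make the delete-then-insert pattern concrete, here is a minimal sketch using Python's sqlite3 and an in-memory database. The table layouts, the BatchDate column on the staging table, and the sample rows are assumptions for the example, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SourceOrders (OrderId INTEGER, OrderDate TEXT, Amount REAL);
    CREATE TABLE StagingRawOrders (
        OrderId     INTEGER NOT NULL,
        Amount      REAL    NOT NULL,
        BatchDate   TEXT    NOT NULL,
        ExtractedAt TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    INSERT INTO SourceOrders VALUES (1, '2014-03-14', 25.0);
    INSERT INTO SourceOrders VALUES (2, '2014-03-14', 40.0);
""")


def extract_raw_orders(conn, batch_date):
    # One transaction: commit if both statements succeed, roll back otherwise,
    # so a failed rerun never leaves the date half-deleted.
    with conn:
        # Clear anything already staged for this date, then reload it.
        conn.execute("DELETE FROM StagingRawOrders WHERE BatchDate = ?",
                     (batch_date,))
        conn.execute("""
            INSERT INTO StagingRawOrders (OrderId, Amount, BatchDate)
            SELECT OrderId, Amount, OrderDate
            FROM SourceOrders
            WHERE OrderDate = ?""", (batch_date,))


def staged_rows(conn, batch_date):
    return conn.execute(
        "SELECT OrderId, Amount FROM StagingRawOrders "
        "WHERE BatchDate = ? ORDER BY OrderId", (batch_date,)).fetchall()


extract_raw_orders(conn, "2014-03-14")
extract_raw_orders(conn, "2014-03-14")   # rerun: still two rows, not four

# The source sends a correction, and the same call picks it up.
conn.execute("UPDATE SourceOrders SET Amount = 45.0 WHERE OrderId = 2")
extract_raw_orders(conn, "2014-03-14")
print(staged_rows(conn, "2014-03-14"))
```

Wrapping the delete and the insert in one transaction is a design choice worth keeping in the real procedure too: without it, a failure between the two statements leaves staging empty for that date.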

The Gotcha: Stories That Are Already Tasks

The decomposition breaks down when stories are written at the wrong altitude. "Write a stored procedure to extract orders" is a task masquerading as a story. It describes implementation, not capability.

If you find yourself writing stories that already specify a table name, a procedure name, or a technology, you have skipped a level. Back up. What does the system need to be able to do? Write that first. The procedure is a task that delivers the story.
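If you want a quick, crude check on a backlog, something like the sketch below can flag suspicious stories. The term list is a hypothetical starting point, not a rule set, and a clean result does not prove a story is at the right altitude. It only catches the obvious cases.

```python
import re

# Hypothetical, deliberately short list of implementation words.
# Extend it with the technologies your own team tends to name too early.
IMPLEMENTATION_TERMS = (
    "stored procedure",
    "ssis package",
    "scheduled job",
    "table",
    "trigger",
)


def implementation_terms_in(story):
    """Return the implementation terms a story mentions, in list order."""
    lowered = story.lower()
    return [term for term in IMPLEMENTATION_TERMS
            if re.search(r"\b" + re.escape(term) + r"\b", lowered)]


for story in ("Write a stored procedure to extract orders",
              "Extract raw orders from SourceSystem for a given date"):
    hits = implementation_terms_in(story)
    print(story, "->", hits if hits else "looks like a capability")
```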

Keeping the altitude right is what lets you swap implementation without rewriting requirements — and in data engineering, you will swap implementation more than you expect.

The Series in One Paragraph

Build small, reliable units. Coordinate them with layers. Start from the outcome you need and decompose down to the tasks that deliver it. The top-down decomposition and the bottom-up composition arrive at the same architecture, because they are both expressions of the same idea: one thing, done well, with a clear boundary. That is the whole game.

If you have been using this kind of decomposition on your data projects and found a way to make it stick with your team, I would love to hear how. As always, I am here to help.
