One-Off Scripts Are Technical Debt in Disguise

Every data team I've worked with has a folder somewhere — usually called something like "Utilities" or "Scripts" or "Misc" — full of .sql files with names like fix_customer_ids_2011.sql and reprocess_orders_march.sql and TEMP_DO_NOT_DELETE.sql. Nobody knows which ones are safe to delete. Nobody knows which ones have been run and which haven't. Nobody knows what any of them do without opening them and reading through three hundred lines of uncommented SQL.

This is technical debt. But it's sneaky technical debt, because each individual script seemed completely reasonable when someone wrote it.

How the Folder Gets Created

It starts with a legitimate need. A source system pushed bad data. A vendor sent a corrected file. A business rule changed retroactively. You write a SQL script to fix it, run it, and it works. You save the script "just in case" and move on.

This happens again. And again. Over a year you've accumulated forty scripts. Over three years, two hundred. Most of them are irrelevant — the fix they applied is baked into the data and the scenario will never recur. Some are still relevant, addressing ongoing issues that never got properly resolved. You can't tell which is which without reading all of them.

The cost isn't the disk space. It's the mental overhead every time someone asks "did we already fix this?" or "where's that script from last April?" or "is it safe to delete these?"

The Three Fates of a Good Script

Every one-off script should end up in exactly one of three places.

Deleted. If the fix is done, the scenario is closed, and it will never recur, the script should be deleted. Not archived — deleted. You have source control; if you ever need to recover it, you can. An archive folder is just a graveyard where debt accumulates.

Operationalized. If you're running the same type of script regularly, it should be a stored procedure or a proper ETL job, not a .sql file you run manually. Manual steps are failure points. If the step is worth doing, it's worth automating.

Documented and version-controlled. If it genuinely needs to be preserved — a one-time migration, a significant data correction with regulatory implications — it goes in source control with a clear name, a comment block explaining what it does and why it was run, and a record of when it was executed.

-- Good: self-documenting one-time data correction script
-- Date:        2012-11-14
-- Author:      Shannon Lowder
-- Reviewed by: [colleague name]
-- Reason:      Vendor data feed duplicated order records for 2012-10-01 through 2012-10-15
--              due to a bug in their export process. Confirmed with vendor 2012-11-13.
--              This script removes duplicate records identified by the audit query below.

BEGIN TRANSACTION;

DELETE FROM staging.OrderImport
WHERE  ImportBatchId IN (
    SELECT ImportBatchId
    FROM   staging.DuplicateOrderBatches  -- pre-populated by the audit query
);

-- Verify the count before committing
SELECT COUNT(*) AS RemovedCount FROM staging.OrderImport
WHERE  ImportBatchId IN (SELECT ImportBatchId FROM staging.DuplicateOrderBatches);

-- COMMIT TRANSACTION;  -- uncomment after verifying count
ROLLBACK TRANSACTION;   -- remove this line after verifying

Notice the pattern at the bottom: the script rolls back by default. You have to deliberately change it to commit. This discipline is worth building into every data correction script you write.

The Deeper Problem

One-off scripts are symptoms of something larger: fixes made to the data instead of to the system. When you run a script to reprocess records because the ETL logic had a bug, you're treating the effect rather than the cause. The right fix was to fix the ETL logic and re-run it cleanly.

Not always possible, I know. Sometimes you're under deadline, sometimes the data correction is faster, sometimes the root cause is in a system you don't control. But every time you patch data instead of patching the system, you're accumulating obligation. Future you — or your successor — will inherit data that reflects a history of patches rather than a clean, auditable process.

The folder of scripts is the visible evidence. The discipline to keep it clean — delete what's done, operationalize what recurs, document what must be preserved — is how you pay the debt down instead of letting it compound.

If you've found a better system for managing this — a workflow, a template, a team convention that actually sticks — I genuinely want to know. As always, I'm here to help.

Read more