Delta Lake Performance: Five Table Properties Worth Setting Before You Go to Production

Five table properties, applied consistently, turn a Delta table that's "working" into one that performs well over time. Most teams discover them through performance problems — the table was fine for six months and then something changed. The changes I'm describing here prevent most of those problems, and they take about five minutes to apply.

1. Auto-Optimize Write

ALTER TABLE silver.orders
SET TBLPROPERTIES ('delta.autoOptimize.optimizeWrite' = 'true')

Before writing, Delta automatically coalesces small shuffle partitions into larger output files. Without it, a job with 200 shuffle partitions writes 200 output files, many of which may be tiny. With it, the output is consolidated into fewer, appropriately-sized files without you having to run OPTIMIZE afterward.

There's a slight write overhead (the consolidation takes time), but the read performance benefit outweighs it for tables that are read frequently.

2. Auto-Compact

ALTER TABLE silver.orders
SET TBLPROPERTIES ('delta.autoOptimize.autoCompact' = 'true')

After writes, Delta compacts small files in the background. Less aggressive than a full OPTIMIZE (doesn't do data ordering), but keeps file counts manageable without requiring a scheduled maintenance job. Right choice for tables that get frequent small appends (streaming consumers, hourly incremental loads).

3. Log Retention

ALTER TABLE silver.orders
SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 30 days')

Default is 30 days. For production tables where you might need to investigate data issues going back weeks, this is fine. For compliance or audit requirements that go back months, increase it. For development tables you're iterating quickly on, you can decrease it. Set it explicitly so the behavior is documented in the table properties rather than depending on workspace defaults.

4. Data Skipping Column Stats

-- By default, Delta collects stats on the first 32 columns
-- For tables with many columns where your queries only filter on a few,
-- specify which columns matter for data skipping
ALTER TABLE silver.orders
SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '5')

Delta collects min/max/null statistics per file for the first N columns to enable data skipping. Collecting stats on 32 columns for a 200-column table is wasteful if your queries only filter on order_date and region. Reduce it to the columns you actually filter on to speed up write time and reduce metadata overhead.

5. Target File Size for OPTIMIZE

ALTER TABLE silver.orders
SET TBLPROPERTIES ('delta.targetFileSize' = '134217728')  -- 128MB in bytes

When you run OPTIMIZE, this is the target size for output files. The default (128MB) is usually correct, but if your queries typically process 1–2 files at a time, larger files (256MB or 512MB) reduce the number of files scanned. If your queries are highly selective and read small fractions of the table, smaller target sizes improve predicate pushdown efficiency.

Apply all five to new tables at creation time, not retroactively when you're already troubleshooting. The 30-second setup cost is negligible; the maintenance cost of not doing it accumulates quietly. As always, I'm here to help.

Read more