Delta Lake Concurrent Writes: Optimistic Locking in Practice
Delta Lake's optimistic concurrency model means most concurrent write conflicts resolve automatically. But "most" isn't "all," and the cases that don't resolve automatically tend to happen at the worst possible time — two high-priority jobs running simultaneously during a critical load window.
Understanding when conflicts are avoidable (by design) and when they're unavoidable (by data structure) is what lets you architect around them rather than debugging them in production.
What Triggers a Conflict
A write conflict occurs when two transactions both attempt to commit at the same version. Delta uses optimistic concurrency: both start reading at version N, both do their work, and both try to commit at version N+1. One wins; one gets a conflict error and must retry.
Whether Delta can auto-resolve the conflict depends on whether the two operations touched overlapping data:
- Two appends to different partitions — no overlap, Delta resolves automatically
- Two appends to the same partition — no overlap (both just adding rows), Delta resolves automatically
- A MERGE and a concurrent write to the same rows — conflict, one transaction fails
- Two concurrent MERGEs on the same table with overlapping keys — conflict, one fails
- A DELETE and a concurrent MERGE on the same rows — conflict, one fails
Designing to Avoid Conflicts
The most effective way to avoid write conflicts is to design your pipeline so concurrent jobs write to non-overlapping partitions:
# Jobs partitioned by region process their own partition
# and never conflict with each other
west_df.write.format("delta").mode("append") \
.option("replaceWhere", "region = 'West'") \
.save("/mnt/myproject/silver/orders")
# Concurrent with the above — different partition, no conflict
east_df.write.format("delta").mode("append") \
.option("replaceWhere", "region = 'East'") \
.save("/mnt/myproject/silver/orders")Handling Conflicts With Retry Logic
When conflicts are unavoidable (e.g., multiple jobs all updating the same set of rows), implement retry logic in the job:
import time
from delta.exceptions import ConcurrentModificationException
def merge_with_retry(target, source_df, merge_condition, max_retries=3):
for attempt in range(max_retries):
try:
(target.alias("t").merge(source_df.alias("s"), merge_condition)
.whenMatchedUpdateAll()
.whenNotMatchedInsertAll()
.execute())
return
except ConcurrentModificationException:
if attempt == max_retries - 1:
raise
wait_seconds = 2 ** attempt # exponential backoff: 1s, 2s, 4s
print(f"Conflict on attempt {attempt+1}, retrying in {wait_seconds}s")
time.sleep(wait_seconds)The Isolation Level Choice
If you're running an environment where reads need to be fully serializable (reads that feed into subsequent decisions must see a consistent view of all concurrent writes), change the table's isolation level:
ALTER TABLE silver.orders
SET TBLPROPERTIES ('delta.isolationLevel' = 'Serializable')This is stronger than the default WriteSerializable but increases conflict rates. Use it only when you have a specific correctness requirement that demands it — not as a general "be safe" setting. The default is fine for the vast majority of data pipeline use cases. As always, I'm here to help.