Delta Lake + Structured Streaming: ACID for Your Kafka Consumer
Writing a Spark Structured Streaming job that reads from Kafka and writes to Parquet files sounds straightforward until you watch it crash mid-write and leave your output directory in an ambiguous state. Did it write all the records for that micro-batch? Half of them? None? The checkpoint committed, but the