The Real Cost of Keeping a Streaming Consumer Running 24/7 — And When You Don't Have To

When people build their first streaming pipeline, there is a tacit assumption embedded in the word "streaming": the consumer must be running all the time. Always on. Watching for events. Reacting in real time. The picture is a process that never sleeps.

This assumption costs money. Sometimes a lot of money. And it is often wrong.

The Cost of Always-On

Let us put some numbers on it. A minimal Spark Structured Streaming job on a cloud cluster needs at least a driver and one executor. In 2017, a pair of modest cloud VMs running 24/7 cost roughly $300–600/month, depending on instance type and provider. That is before monitoring, checkpoint storage, and the operational cost of keeping the cluster healthy.
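The monthly figure is simple arithmetic: number of VMs, times hourly rate, times hours in a month. The sketch below makes that explicit. The $0.25/hour rate is a hypothetical placeholder, not a quoted price. The 730-hour month is the usual billing approximation. Plug in your own provider's numbers.

```python
# Hypothetical hourly rate; substitute your provider's actual price.
HOURS_PER_MONTH = 730  # 24 hours * ~30.4 days, a common billing approximation


def always_on_monthly_cost(vm_count: int, hourly_rate_usd: float) -> float:
    """Cost of keeping vm_count VMs running around the clock for one month."""
    return vm_count * hourly_rate_usd * HOURS_PER_MONTH


# Driver + one executor, at an assumed $0.25/hour each.
cost = always_on_monthly_cost(vm_count=2, hourly_rate_usd=0.25)
print(f"${cost:,.2f}/month")  # $365.00/month
```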

For a pipeline that processes 100,000 messages per hour, does that spend make sense? Maybe. If the downstream system absolutely needs those messages processed within seconds of arrival — fraud detection, inventory reservation, live dashboard — yes, you pay for always-on. The latency requirement justifies the cost.

If the requirement is "data should be available within 15 minutes," you are paying cluster-hours for a latency tolerance that a scheduled micro-batch could meet for a fraction of the cost.

The Alternative: Scheduled Structured Streaming

Spark Structured Streaming with trigger(once=True) (available since Spark 2.2) processes all currently available data and then stops. Run it on a cron schedule — every 15 minutes, every hour — and you pay only for the compute time actually used.

# This job runs, processes all available Kafka data, and exits.
# Run it via cron or a job scheduler every 15 minutes.
# raw_landed is assumed to be a streaming DataFrame read from Kafka
# with spark.readStream.format("kafka"); it is not defined in this snippet.

raw_query = (
    raw_landed.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/mnt/checkpoints/sensor-raw")
    .trigger(once=True)        # process current backlog and exit
    .start("/mnt/datalake/raw/sensor_readings")
)

raw_query.awaitTermination()    # block until the one-shot job completes
# job exits here

The checkpoint records exactly which Kafka offsets have been processed. On the next run, Spark resumes from where it left off: no data is reprocessed, and no data is skipped. The semantics are the same as those of an always-on consumer; the cost model is radically different.
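The toy model below, in plain Python, shows why scheduled runs neither reprocess nor skip data. It is not Spark's checkpoint format. The `run_once` function, the JSON offset file, and the list standing in for a Kafka partition are all hypothetical. Each run reads the last recorded offset and processes everything available at that moment. It records the new offset only after writing the output. Real Structured Streaming also keeps a write-ahead log and relies on the sink committing idempotently. That way a crash between the write and the checkpoint update does not produce duplicates. This sketch omits that part.

```python
import json
import os
import tempfile


def run_once(log, checkpoint_path, sink):
    """Process everything from the last checkpointed offset to the end of log, then stop."""
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_offset"]
    end = len(log)                 # snapshot of what is available "now"
    sink.extend(log[start:end])    # write this run's batch to the sink
    with open(checkpoint_path, "w") as f:
        json.dump({"next_offset": end}, f)  # record progress only after the write
    return end - start             # number of events processed this run


log = ["e0", "e1", "e2"]           # stands in for one Kafka partition
sink = []
ckpt = os.path.join(tempfile.mkdtemp(), "offsets.json")

n1 = run_once(log, ckpt, sink)     # first scheduled run: e0..e2
log += ["e3", "e4"]                # new events arrive between runs
n2 = run_once(log, ckpt, sink)     # second run: only e3, e4
n3 = run_once(log, ckpt, sink)     # third run: nothing new
print(n1, n2, n3, sink)            # 3 2 0 ['e0', 'e1', 'e2', 'e3', 'e4']
```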

The Math

Let us say each micro-batch run takes 3 minutes, including startup, processing, and shutdown. Running every 15 minutes gives 3/15 = 20% utilization, so you pay for 20% of the compute an always-on consumer would use. Your data is roughly 15 minutes stale instead of seconds stale. In the worst case it is stale for the interval plus the run time, about 18 minutes. If your SLA tolerates that, you have just cut your streaming infrastructure cost by 80% with zero architectural sacrifice.

Even running every 5 minutes (3 minutes of processing, 2 minutes idle), you are at 60% utilization. That is still a 40% saving, and your data is roughly 5 to 8 minutes stale. For most business intelligence and analytics use cases, users cannot tell 5-minute freshness apart from real time.
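The utilization arithmetic in the two paragraphs above applies to any run time and interval. The sketch below uses a hypothetical `scheduled_profile` helper. It computes utilization, the saving relative to always-on, and worst-case staleness. The staleness bound assumes an event can land just after a run has taken its snapshot of available offsets. That event then waits a full interval plus the next run's processing time.

```python
def scheduled_profile(run_minutes: float, interval_minutes: float) -> dict:
    """Utilization, saving vs. always-on, and worst-case staleness for a run-once schedule."""
    if run_minutes >= interval_minutes:
        raise ValueError("runs would overlap; the interval must exceed the run time")
    utilization = run_minutes / interval_minutes
    return {
        "utilization": utilization,
        "savings_vs_always_on": 1 - utilization,
        # An event arriving just after a run snapshots its offsets waits a full
        # interval for the next run, plus that run's processing time.
        "worst_case_staleness_min": interval_minutes + run_minutes,
    }


for interval in (15, 5):
    p = scheduled_profile(run_minutes=3, interval_minutes=interval)
    print(f"every {interval} min: {p['utilization']:.0%} utilization, "
          f"{p['savings_vs_always_on']:.0%} saved, "
          f"up to {p['worst_case_staleness_min']} min stale")
# every 15 min: 20% utilization, 80% saved, up to 18 min stale
# every 5 min: 60% utilization, 40% saved, up to 8 min stale
```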

When You Actually Need Always-On

Some use cases genuinely require it. If you are running fraud scoring against transaction events and the window to block a fraudulent transaction is measured in seconds, you cannot run a 5-minute micro-batch. If you are powering a real-time operations dashboard that an on-call engineer is watching during an incident, stale data causes bad decisions. If your pipeline feeds a customer-facing feature with an explicit latency SLA, you honor that SLA.

The discipline is: know your actual latency requirement before you design the pipeline. Do not default to always-on because it sounds like the right approach for a "streaming" system. Ask the question: how stale is too stale for this use case? The answer will tell you which trigger interval to use — and whether a cluster needs to run all day or just a few minutes per hour.
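The question "how stale is too stale?" can be turned into a mechanical first pass. The hypothetical `choose_schedule` helper below picks the longest interval from a candidate list whose worst case (interval plus run time) still fits the staleness budget. It returns `None` when nothing fits, which is the signal to consider an always-on consumer. The candidate intervals are illustrative. Treat the result as a starting point, not a substitute for understanding the use case.

```python
def choose_schedule(staleness_budget_min: float, run_minutes: float,
                    candidate_intervals=(60, 30, 15, 10, 5)):
    """Return the longest candidate interval whose worst-case staleness fits the
    budget, or None if no scheduled interval fits (i.e. consider always-on)."""
    for interval in sorted(candidate_intervals, reverse=True):
        fits_budget = interval + run_minutes <= staleness_budget_min
        no_overlap = interval > run_minutes   # next run must not start before this one ends
        if fits_budget and no_overlap:
            return interval
    return None


print(choose_schedule(staleness_budget_min=30, run_minutes=3))   # 15
print(choose_schedule(staleness_budget_min=0.5, run_minutes=3))  # None
```

With a 30-minute budget, a 30-minute interval is rejected because its worst case is 33 minutes. The helper falls back to 15. A sub-minute budget fits no schedule, which points toward always-on.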

This distinction becomes more important as cloud costs compound. I have seen teams spending $2,000/month on always-on streaming infrastructure for a pipeline that feeds a dashboard refreshed every 30 minutes. The math on that is not defensible. Start with the latency requirement. Let the architecture follow.
