If you are doing data engineering in an Azure shop and you have been eyeing Kafka, Microsoft has an answer for you: Azure Event Hubs. It shipped into general availability earlier this year and it is worth understanding both what it is and where it differs from the open-source alternative.
The pitch is familiar: a scalable, durable event ingestion service. You publish events, consumers read them, and the service handles the infrastructure. The difference is that Event Hubs is fully managed: no ZooKeeper to run, no broker configuration to tune, no cluster to resize manually.
The Core Model
Event Hubs uses a partition model that closely mirrors Kafka. Each Event Hub (the Azure term for what Kafka calls a topic) has a configurable number of partitions. Events that carry a partition key are hashed to a partition, so every event with the same key lands on the same partition in order, much as in Kafka; events sent without a key are spread across partitions by the service. Consumers track their position using offsets per partition.
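To make the key-to-partition idea concrete, here is a minimal sketch. It assumes nothing about the hash the service really uses: FNV-1a is a stand-in chosen only because it is short and stable, and `PartitionFor` is a hypothetical helper, not an SDK call. The point is the property, not the numbers: the same key always maps to the same partition.

```csharp
using System;
using System.Text;

// Illustrative only: FNV-1a stands in for the service's internal hash.
static int PartitionFor(string partitionKey, int partitionCount)
{
    uint hash = 2166136261u;
    foreach (byte b in Encoding.UTF8.GetBytes(partitionKey))
    {
        hash ^= b;
        hash = unchecked(hash * 16777619u); // wrap on overflow by design
    }
    return (int)(hash % (uint)partitionCount);
}

// The first and third keys are identical, so they report the same partition.
foreach (var key in new[] { "order-1001", "order-1002", "order-1001" })
{
    Console.WriteLine($"{key} -> partition {PartitionFor(key, 8)}");
}
```

Because the mapping depends on the partition count, a key's partition under this scheme would change if the count changed, which is one reason to pick the partition count deliberately up front.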
Consumer groups need no translation: Event Hubs uses the same name as Kafka, and the concept maps directly. Each consumer group gets an independent view of the event stream and tracks its own offsets.
Retention is configurable from 1 to 7 days. Unlike Kafka, you cannot extend this arbitrarily: 7 days is the maximum on the standard tier. If you need longer retention, capture events to Azure Blob Storage or ADLS with the Event Hubs Capture feature, which writes Avro files on a configurable time or size window.
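Once Capture is writing files, downstream jobs usually need to find them by partition and time. The sketch below builds the blob prefix for one partition and one UTC hour. It assumes the default Capture naming layout as I understand it from Microsoft's documentation ({Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}); the layout is configurable, so check your own setting. `CaptureHourPrefix` and the example date are hypothetical.

```csharp
using System;
using System.Globalization;

// Builds "<namespace>/<hub>/<partition>/<yyyy>/<MM>/<dd>/<HH>/" for listing
// captured Avro blobs. Assumes the default Capture naming layout.
static string CaptureHourPrefix(string ns, string hub, string partitionId, DateTime utc)
{
    // Quoted slashes keep the separator literal regardless of culture.
    string hourPath = utc.ToString("yyyy'/'MM'/'dd'/'HH", CultureInfo.InvariantCulture);
    return $"{ns}/{hub}/{partitionId}/{hourPath}/";
}

string prefix = CaptureHourPrefix(
    "mycompany-events", "order-placed", "0",
    new DateTime(2024, 3, 5, 14, 0, 0, DateTimeKind.Utc));
Console.WriteLine(prefix); // mycompany-events/order-placed/0/2024/03/05/14/
```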
Throughput Units: The Capacity Model
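Kafka capacity is about broker count, partition count, and disk. Event Hubs capacity is about throughput units (TUs). One TU gives you up to 1 MB/s or 1,000 events per second of ingress, whichever limit you hit first, and up to 2 MB/s or 4,096 events per second of egress. You can have up to 20 TUs on standard tier; the premium and dedicated tiers go higher and are sized in their own units (processing units and capacity units).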
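A quick way to size a namespace is to work out which per-TU limit binds first. The sketch below assumes the 1 MB/s ingress and 2 MB/s egress figures above, plus a 1,000 events/s ingress cap per TU as I understand the documented quotas. `RequiredThroughputUnits` and the workload numbers are hypothetical.

```csharp
using System;

// Returns the smallest TU count that satisfies all three per-TU limits.
static int RequiredThroughputUnits(double ingressMBps, double eventsPerSec, double egressMBps)
{
    const double IngressMBPerTu = 1.0;   // MB/s of ingress per TU
    const double EventsPerTu = 1000.0;   // ingress events/s per TU (assumed quota)
    const double EgressMBPerTu = 2.0;    // MB/s of egress per TU

    double byIngress = Math.Ceiling(ingressMBps / IngressMBPerTu);
    double byEvents = Math.Ceiling(eventsPerSec / EventsPerTu);
    double byEgress = Math.Ceiling(egressMBps / EgressMBPerTu);
    return (int)Math.Max(1, Math.Max(byIngress, Math.Max(byEvents, byEgress)));
}

// 3.5 MB/s in, 2,500 events/s, and two consumer groups each reading
// the full stream (7 MB/s out).
int tus = RequiredThroughputUnits(3.5, 2500, 7.0);
Console.WriteLine(tus); // 4
```

Note that egress scales with the number of consumer groups reading the full stream, so adding a consumer group can push you to more TUs even when ingress is flat.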
# Azure CLI: create an Event Hubs namespace and hub
az eventhubs namespace create --resource-group analytics-rg --name mycompany-events --location eastus --sku Standard --capacity 2 # 2 throughput units
az eventhubs eventhub create --resource-group analytics-rg --namespace-name mycompany-events --name order-placed --partition-count 8 --message-retention 7 # days
Throughput units scale up automatically to a configured maximum if you enable Auto-Inflate, which saves you from manually scaling during traffic spikes. Auto-Inflate only scales up; bringing the TU count back down after the spike is still a manual or scripted step. This is one of the legitimate managed-service advantages over self-hosted Kafka, where a spike that exceeds broker capacity requires a human to notice and respond.
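The behavior is easy to model: provisioned TUs ratchet up toward demand, never past the configured maximum, and never back down on their own. The sketch below is a toy model under those assumptions, not the service's actual scaling logic (its triggers and timing are internal). `AutoInflate` and the demand series are hypothetical.

```csharp
using System;

// Toy model: scale up to meet demand, capped at maxTus, never scale down.
static int AutoInflate(int currentTus, int neededTus, int maxTus) =>
    Math.Min(maxTus, Math.Max(currentTus, neededTus));

int provisioned = 2;
foreach (int needed in new[] { 2, 5, 3, 12 })
{
    provisioned = AutoInflate(provisioned, needed, maxTus: 10);
    bool throttled = needed > provisioned;
    Console.WriteLine($"needed={needed} provisioned={provisioned} throttled={throttled}");
}
// After this series the namespace sits at 10 TUs, and the last step is
// throttled because demand (12) exceeds the configured maximum.
```

The last step is the one to plan for: Auto-Inflate buys headroom up to the maximum you set, and anything beyond that is throttled until someone raises the ceiling.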
Sending and Receiving With the .NET SDK
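// Producer (Azure.Messaging.EventHubs SDK)
using System;
using System.Text;
using System.Text.Json;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Producer;

await using var producer = new EventHubProducerClient(connectionString, "order-placed");
using var batch = await producer.CreateBatchAsync(new CreateBatchOptions {
    PartitionKey = orderId.ToString() // route by order ID
});
var eventData = new EventData(Encoding.UTF8.GetBytes(
    JsonSerializer.Serialize(new { order_id = orderId, total = 129.99 })
));
// TryAdd returns false when the event does not fit in the batch
if (!batch.TryAdd(eventData)) {
    throw new InvalidOperationException("Event is too large for the batch.");
}
await producer.SendAsync(batch);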
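// Consumer (EventProcessorClient: the managed consumer with checkpointing)
// Requires the Azure.Messaging.EventHubs.Processor package
var processor = new EventProcessorClient(
    checkpointStore,  // BlobContainerClient: Azure Blob Storage for offset tracking
    "$Default",       // consumer group
    connectionString,
    "order-placed"
);
processor.ProcessEventAsync += async args => {
    var raw = args.Data.Body.ToArray();
    // The partition ID lives on args.Partition, not on the EventData
    await rawZone.WriteAsync(args.Data.Offset, args.Partition.PartitionId, raw);
    await args.UpdateCheckpointAsync(); // commit offset to blob storage
};
// An error handler is required; StartProcessingAsync throws without one
processor.ProcessErrorAsync += args => {
    Console.Error.WriteLine($"Partition {args.PartitionId}: {args.Exception.Message}");
    return Task.CompletedTask;
};
await processor.StartProcessingAsync();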
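Checkpointing after every event, as the consumer above does, means one blob write per event, which gets expensive at volume. A common alternative is to checkpoint every N events per partition. The sketch below shows only the counting logic, with no SDK dependency so it runs on its own. `ShouldCheckpoint` and the interval of 100 are hypothetical choices, not SDK features.

```csharp
using System;
using System.Collections.Concurrent;

const int CheckpointEvery = 100; // hypothetical interval; tune for your workload
var sinceCheckpoint = new ConcurrentDictionary<string, int>();

// Returns true once every CheckpointEvery events for the given partition.
bool ShouldCheckpoint(string partitionId)
{
    int count = sinceCheckpoint.AddOrUpdate(partitionId, 1, (_, c) => c + 1);
    if (count < CheckpointEvery) return false;
    sinceCheckpoint[partitionId] = 0; // start counting toward the next checkpoint
    return true;
}

// Simulate 250 events arriving on partition "0".
int checkpoints = 0;
for (int i = 0; i < 250; i++)
{
    if (ShouldCheckpoint("0")) checkpoints++;
}
Console.WriteLine(checkpoints); // 2
```

In the event handler this would wrap the checkpoint call, along the lines of `if (ShouldCheckpoint(args.Partition.PartitionId)) await args.UpdateCheckpointAsync();`. The tradeoff is that a restart can replay up to N-1 events per partition, so the downstream write needs to be idempotent. Writing keyed by partition and offset, as the raw-zone write does, fits that requirement.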
Event Hubs vs. Kafka: The Decision Points
Event Hubs wins when: you are all-in on Azure and want zero infrastructure management, your throughput fits within the TU model, and you do not need more than 7 days of retention in-stream. The managed service overhead (TU pricing) is worthwhile if it replaces dedicated engineering time for cluster operations.
Kafka wins when: you need longer in-stream retention, you need the ecosystem (Kafka Connect, Kafka Streams, Schema Registry), you are running multi-cloud or on-premises, or your volume is high enough that the per-TU pricing becomes more expensive than running your own cluster. Kafka also gives you more control over exactly how messages are routed, compacted, and retained.
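The pricing side of these decision points comes down to simple arithmetic once you have your own numbers. The sketch below assumes the standard tier bills per TU-hour plus per million ingress events, as I understand its pricing structure, and models self-hosting as broker cost plus operations time. Every input is a placeholder, not a quoted price, and both helper functions are hypothetical.

```csharp
using System;

const decimal HoursPerMonth = 730m; // average hours in a month

// Managed: TU-hours plus a per-million-events ingress charge (assumed structure).
decimal ManagedMonthly(int tus, decimal pricePerTuHour, decimal millionsOfEvents, decimal pricePerMillionEvents) =>
    tus * pricePerTuHour * HoursPerMonth + millionsOfEvents * pricePerMillionEvents;

// Self-hosted: broker infrastructure plus the engineering time to run it.
decimal SelfHostedMonthly(int brokers, decimal costPerBroker, decimal opsHours, decimal opsHourlyRate) =>
    brokers * costPerBroker + opsHours * opsHourlyRate;

// Placeholder inputs only; substitute current prices and your own estimates.
decimal managed = ManagedMonthly(10, 0.05m, 2000m, 0.03m);
decimal selfHosted = SelfHostedMonthly(3, 300m, 20m, 90m);
Console.WriteLine($"managed={managed} selfHosted={selfHosted}");
```

The operations-hours term is usually the one that decides the comparison, and it is also the one teams most often leave out.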
Neither is wrong. They solve the same problem with different operational tradeoffs. I have used both, and the deciding factor is almost always "how much do you want to own?" If your team's strength is application code and not infrastructure, the managed service is worth the premium. If you have solid ops capability and the scale to justify it, self-hosted Kafka gives you more room to optimize. I am here to help if you want to think through the math for your specific situation.