MQTT to Kafka: Bridging IoT Devices Into Your Enterprise Pipeline

Your IoT devices speak MQTT. Your analytics pipeline speaks Kafka. The bridge between them is not complicated, but there are enough configuration choices that it is worth walking through a concrete setup.

The goal: every message published to your MQTT broker gets written as a Kafka message, with the MQTT topic mapped to a Kafka topic. From there, your existing Kafka consumers handle the rest — raw landing, deserialization, transformation — without needing to know or care that the data came from a sensor rather than an application.

Option 1: Mosquitto Bridge Configuration

If you are running Mosquitto as your MQTT broker, it has built-in bridge support. A bridge is a connection from one broker to another: you configure it to forward messages on selected MQTT topics to a remote broker. Mosquitto bridges only speak MQTT, so Kafka cannot be a direct target. You can bridge from Mosquitto to another MQTT broker that has a Kafka connector, or use Mosquitto in combination with a bridge plugin.

For simpler setups, skip the broker-to-broker hop: run a separate MQTT-to-Kafka bridge process that subscribes to the Mosquitto broker and produces to Kafka. Broker-to-broker bridging remains useful in multi-broker environments, where one Mosquitto instance forwards to another Mosquitto instance or to any other MQTT broker.

Option 2: Kafka Connect MQTT Source Connector

The more robust solution for production use is a Kafka Connect source connector that subscribes to MQTT topics and writes messages to Kafka. Either the Confluent MQTT Source Connector (commercial) or the open-source mqtt-kafka-connector handles this pattern. The configuration below targets the Confluent connector:

{
  "name": "mqtt-building-sensors-source",
  "config": {
    "connector.class": "io.confluent.connect.mqtt.MqttSourceConnector",
    "mqtt.server.uri": "tcp://mqtt-broker.local:1883",
    "mqtt.topics": "building/#",
    "kafka.topic": "mqtt_building_sensors_raw",
    "mqtt.qos": "1",
    "mqtt.clean.session.enabled": "true",
    "tasks.max": "1",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter"
  }
}

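The mqtt.topics wildcard building/# subscribes to all topics under the building/ hierarchy. All matching MQTT messages go to the single Kafka topic mqtt_building_sensors_raw. The original MQTT topic name is preserved in the Kafka message key, so downstream consumers can route by source topic if needed. One caveat on the session setting: with mqtt.clean.session.enabled set to true, the broker discards the connector's session when it disconnects, so messages published while the connector is offline are not redelivered, even at QoS 1. Set it to false if you need the broker to queue messages across connector restarts.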

Option 3: A Custom Bridge in Python

For full control over topic mapping, message enrichment, and error handling, a custom bridge is sometimes the right call. It is more code, but it gives you complete visibility into what is happening.

import paho.mqtt.client as mqtt
from kafka import KafkaProducer
import json
import time

producer = KafkaProducer(
    bootstrap_servers=['kafka-broker-01:9092'],
    key_serializer=lambda k: k.encode('utf-8'),
    value_serializer=lambda v: v  # values are already bytes (the encoded JSON envelope)
)

def mqtt_topic_to_kafka_topic(mqtt_topic):
    # building/floor2/hvac/zone3/temp -> kafka topic: mqtt_building_sensors_raw
    # Could also do fine-grained mapping here
    return 'mqtt_building_sensors_raw'

def on_message(client, userdata, msg):
    kafka_topic = mqtt_topic_to_kafka_topic(msg.topic)
    # Wrap raw payload with routing metadata
    envelope = json.dumps({
        'mqtt_topic': msg.topic,
        'arrived_at_ms': int(time.time() * 1000),
        'payload': msg.payload.decode('utf-8', errors='replace'),
        'qos': msg.qos
    }).encode('utf-8')
    producer.send(topic=kafka_topic, key=msg.topic, value=envelope)

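def on_connect(client, userdata, flags, rc):
    # rc 0 means the broker accepted the connection
    if rc != 0:
        print(f'MQTT connect failed, rc={rc}')
        return
    # Subscribing here restores the subscription after every reconnect
    client.subscribe('building/#', qos=1)

client = mqtt.Client(client_id='kafka-bridge-01')
client.on_connect = on_connect
client.on_message = on_message
client.connect('mqtt-broker.local', 1883, keepalive=60)
try:
    client.loop_forever()
finally:
    # Send any records still buffered in the producer before exiting
    producer.flush()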
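
One thing to watch in the on_message handler above: decoding with errors='replace' silently corrupts payloads that are not valid UTF-8, which matters if any sensor publishes binary data. The sketch below is one possible variant of the envelope step, not a required design. The payload_encoding field and the build_envelope and read_payload helper names are assumptions made for illustration; they fall back to base64 when the payload is not text so the original bytes can be recovered downstream.

```python
import base64
import json
import time


def build_envelope(mqtt_topic: str, payload: bytes, qos: int) -> bytes:
    """Wrap an MQTT payload with routing metadata without losing binary data."""
    try:
        body = payload.decode('utf-8')
        encoding = 'utf-8'
    except UnicodeDecodeError:
        # Not valid UTF-8: carry the raw bytes as base64 text instead
        body = base64.b64encode(payload).decode('ascii')
        encoding = 'base64'
    return json.dumps({
        'mqtt_topic': mqtt_topic,
        'arrived_at_ms': int(time.time() * 1000),
        'payload': body,
        'payload_encoding': encoding,  # hypothetical field, not in the original envelope
        'qos': qos,
    }).encode('utf-8')


def read_payload(envelope: bytes) -> bytes:
    """Consumer-side helper: recover the original payload bytes from an envelope."""
    doc = json.loads(envelope)
    if doc['payload_encoding'] == 'base64':
        return base64.b64decode(doc['payload'])
    return doc['payload'].encode('utf-8')
```

In the bridge, on_message would call build_envelope(msg.topic, msg.payload, msg.qos) and pass the result to producer.send as the value.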

Topic Mapping Strategy

One MQTT topic hierarchy can produce a very large number of topics. If you have 200 sensors each publishing to their own MQTT topic (building/floor1/sensor/001/temp, etc.), you need to decide whether to map each to its own Kafka topic (200 topics) or funnel all of them into one Kafka topic with the MQTT topic preserved as the message key.

One Kafka topic per sensor is operationally painful at scale — consumer group management, partition allocation, and monitoring overhead multiply. A single Kafka topic with the MQTT topic as the message key (or embedded in the payload) is almost always simpler. Downstream consumers that need sensor-level routing can filter on the key without needing per-sensor Kafka topics.
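
To make the key-based filtering concrete, here is a minimal sketch of how a consumer might route records by matching the message key against MQTT-style topic filters. It assumes the key holds the MQTT topic as UTF-8 bytes, as in the bridge configurations above. The ROUTES table, the handler names, and both functions are hypothetical, and the matcher covers only the + and # wildcards (it ignores the special handling of topics beginning with $).

```python
from typing import Optional


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic matches a filter using + and # wildcards."""
    filter_levels = topic_filter.split('/')
    topic_levels = topic.split('/')
    for i, level in enumerate(filter_levels):
        if level == '#':
            # Multi-level wildcard: only valid as the last level, matches the rest
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != '+' and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


# Hypothetical routing table: first matching filter wins
ROUTES = [
    ('building/+/hvac/#', 'hvac'),
    ('building/#', 'general'),
]


def route_for_key(key: bytes) -> Optional[str]:
    """Pick a handler name for a Kafka record based on its MQTT-topic key."""
    topic = key.decode('utf-8')
    for topic_filter, handler in ROUTES:
        if topic_matches(topic_filter, topic):
            return handler
    return None
```

A consumer loop would call route_for_key(record.key) for each record and skip or dispatch accordingly, which keeps sensor-level routing in consumer code rather than in the Kafka topic layout.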

The exception: if different sensors have wildly different throughput, retention, or access control requirements, separate topics may be worth it. But start with the simplest mapping and add complexity only when the operational need is clear.
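
If you do reach that point, a coarse split is usually enough. The sketch below is one hypothetical variant of the mqtt_topic_to_kafka_topic function from the custom bridge: it assumes a building/<floor>/<subsystem>/... layout and produces one Kafka topic per subsystem rather than per sensor. The layout, the fallback topic, and the naming scheme are all assumptions. The sanitizing step reflects Kafka's topic naming rules, which allow only letters, digits, '.', '_' and '-' and cap names at 249 characters.

```python
import re

# Characters outside Kafka's legal topic-name set
_UNSAFE = re.compile(r'[^a-zA-Z0-9._-]')
MAX_TOPIC_LENGTH = 249
DEFAULT_TOPIC = 'mqtt_building_sensors_raw'  # catch-all for unexpected layouts


def mqtt_topic_to_kafka_topic(mqtt_topic: str) -> str:
    """Map building/<floor>/<subsystem>/... to one Kafka topic per subsystem."""
    levels = mqtt_topic.split('/')
    # Anything that does not fit the assumed layout goes to the catch-all topic
    if len(levels) < 3 or levels[0] != 'building' or not levels[2]:
        return DEFAULT_TOPIC
    subsystem = _UNSAFE.sub('_', levels[2].lower())
    return f'mqtt_building_{subsystem}_raw'[:MAX_TOPIC_LENGTH]
```

With this mapping, building/floor2/hvac/zone3/temp lands in mqtt_building_hvac_raw, so the topic count grows with the number of subsystems, not the number of sensors.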
