Connecting smart appliances requires a robust and lightweight protocol that facilitates efficient M2M (Machine-to-Machine) communication.
Such protocols are expected to work in conditions of low bandwidth and intermittent connectivity. The protocol implementation should also require a small code footprint to run on devices having limited computational power. See this post for further details about the desired capabilities of M2M protocols.
The MQTT (MQ Telemetry Transport) Protocol was designed nearly 15 years ago to meet such constraints and to facilitate efficient transportation of telemetry data from embedded devices. With the emergence of IoT today, this protocol has risen to prominence – MQTT is an ISO standard and the open source community offers an extensive set of SDKs and Libraries that support MQTT.
If you’re familiar with Enterprise Messaging Systems, you are already familiar with some of the essential concepts in MQTT.
A Pub-Sub Delivery Model
MQTT uses a ‘publish-subscribe’ model for message delivery. All parties that want to communicate with each other connect to a centralized message broker. This broker acts as a mediator, and all messages are routed through it.
Parties which connect to the broker can be smart appliances, sensors, actuators, and application services.
Senders, such as a temperature sensor, publish messages (temperature values) to the broker. To ensure that messages reach only the relevant recipients, MQTT uses the concept of topics. A topic is a string that represents a virtual address. One or more recipients can indicate their interest in a topic by subscribing to it beforehand.
A furnace controller may be interested in receiving messages that contain temperature information, but not messages that contain acceleration information. A logging service, on the other hand, may be interested in receiving all messages. In this case, the topics could be ‘temperature’ and ‘acceleration’.
Each message thus represents an independent package of information and also contains the name of the ‘topic’ that this message pertains to.
When the broker receives a message, it determines the topic to which the message belongs. It then determines which recipients are interested in that topic at that point in time and forwards the message to each of them.
The number of subscribers attached to a topic may vary over time as the system evolves. This hub-and-spoke architecture of MQTT ensures that senders and recipients are de-coupled from each other and we have a 1-to-N communication capability between parties.
Simply put, you could think of four stages here:
Connect: All parties ‘connect’ to a message broker over a TCP connection.
Subscribe: Each party informs the broker about specific topics of its interest. Each topic is represented by a unique string.
Publish: Parties publish messages to the broker from time to time, for example when a temperature sensor has sampled a new reading.
Delivery: The broker inspects the topic specified within each message and routes the message to everyone who has subscribed to that topic.
As we shall see later, there is a bit more to it: if the broker determines that an interested recipient is currently not reachable (disconnected), it may queue the messages internally for future delivery.
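To make the four stages concrete, here is a minimal in-memory sketch in Python. It is not a real MQTT broker: there is no network, the ‘connect’ stage is implied, and the `ToyBroker` class and its method names are invented purely for illustration. It only shows how a broker can route messages by topic so that publishers and subscribers never talk to each other directly.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str, str], None]  # called with (topic, payload)


class ToyBroker:
    """Hypothetical in-memory stand-in for a broker (exact topic matches only)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Subscribe stage: a party registers its interest in a topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: str) -> int:
        # Publish and Delivery stages: route the message to every subscriber
        # of the topic and report how many recipients received it.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)


broker = ToyBroker()
furnace_inbox, logger_inbox = [], []

# The furnace controller only cares about temperature.
broker.subscribe("temperature", lambda t, p: furnace_inbox.append((t, p)))
# The logging service subscribes to both topics.
broker.subscribe("temperature", lambda t, p: logger_inbox.append((t, p)))
broker.subscribe("acceleration", lambda t, p: logger_inbox.append((t, p)))

broker.publish("temperature", "21.5")
broker.publish("acceleration", "0.98")

print(furnace_inbox)  # [('temperature', '21.5')]
print(logger_inbox)   # [('temperature', '21.5'), ('acceleration', '0.98')]
```

Note that the sensor publishing ‘temperature’ never learns who, if anyone, received its message. That is the decoupling the hub-and-spoke model provides.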
MQTT uses TCP/IP as the underlying transport mechanism. MQTT devices maintain a persistent TCP connection with the broker. This acts as a two-way point-to-point messaging channel between the device and the broker itself.
If the TCP connection is broken, the device attempts to reconnect to the broker. While the device is offline, the broker can buffer (queue) messages until that device comes back online, provided the device has asked for a durable subscription (described later).
Note that devices do not have any direct transport layer connectivity with each other. The broker acts as a central hub to which everybody connects.
Every published MQTT message carries a ‘topic’ within the message header. The topic is the primary means of routing messages to intended recipients (subscribers). You can think of a topic as a virtual address to which a message is destined.
- A topic is a simple string such as ‘temperature’. A sensor may be publishing messages (current temperature information) to this topic periodically.
- Topics can be hierarchically organized by specifying ‘/’ as the separator. For example, if a building has multiple sensors, they could be organized as:
- building / floor-1 / temperature
- building / floor-1 / humidity
- building / floor-2 / temperature
- building / floor-2 / humidity
- Topic strings used in subscriptions also support the wildcard characters ‘+’ and ‘#’.
- A ‘+’ matches exactly one step within the hierarchy, and it can occur at any step.
- A ‘#’ matches any number of steps within the hierarchy, and it can occur only at the end.
- For example, a client that subscribes to ‘building/+/temperature’ would receive messages from both ‘building/floor-1/temperature’ and ‘building/floor-2/temperature’.
- For example, a client that subscribes to ‘building/floor-1/#’ would receive messages from all topics under ‘building/floor-1/’.
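The wildcard rules can be sketched as a small matching function. This is a simplified illustration in Python, not code taken from any broker: the name `topic_matches` is made up, and it skips details that real brokers handle, such as validating the filter or treating topics that start with ‘$’ specially.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if a published topic matches a subscription filter."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last level of the filter,
            # where it matches everything that remains.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            # The filter is deeper than the topic.
            return False
        if level != "+" and level != topic_levels[i]:
            # A literal level must match exactly; '+' matches any single level.
            return False

    # Without '#', filter and topic must have the same depth.
    return len(filter_levels) == len(topic_levels)


print(topic_matches("building/+/temperature", "building/floor-1/temperature"))  # True
print(topic_matches("building/+/temperature", "building/floor-1/humidity"))     # False
print(topic_matches("building/floor-1/#", "building/floor-1/humidity"))         # True
print(topic_matches("building/floor-1/#", "building/floor-2/humidity"))         # False
```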
Transient vs Durable Subscriptions
- A client connects to the broker using a TCP connection. Due to network conditions, this connection may get dropped from time to time, and the client can reconnect each time. Each such connection represents a temporary physical session between the two parties.
- However, the logical association between a client and its subscribed topics can outlast these temporary session outages. This concept is called a durable subscription.
- In effect, the client has informed the broker: “Please keep my messages for this topic, while I’m offline. I’ll come back and pick those up later”.
- When a client connects to the broker, it can specify if this connection is Transient (Clean Session Flag = 1) or Persistent (Clean Session Flag = 0).
- If the Clean Session Flag = 1, the broker considers this to be a transient connection. If this client abruptly disconnects later, the broker would not ‘keep’ any messages on behalf of this client for the topics of interest.
- If the Clean Session Flag = 0, the broker considers this to be a durable connection. In this case, if the client abruptly disconnects later, the broker would ‘keep’ the QoS 1 and QoS 2 messages on topics subscribed by this client, with the assumption that the client would come back in the future to pick those up. When the client connects again (with Clean Session Flag = 0), the queued messages are delivered to this client.
Supporting durable connections means that the broker has to track additional ‘state’ information on behalf of the clients.
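The extra ‘state’ a broker has to track can be illustrated with a toy model. The Python sketch below is an assumption-laden simplification: `SessionBroker` and its methods are invented names, there is no networking, and QoS levels are ignored (every message is treated as one the broker is willing to queue).

```python
from collections import defaultdict, deque


class SessionBroker:
    """Hypothetical broker model that tracks per-client session state."""

    def __init__(self):
        self.subscriptions = defaultdict(set)  # topic -> client ids
        self.online = {}                       # client id -> delivered messages
        self.durable = set()                   # clients using Clean Session = 0
        self.pending = defaultdict(deque)      # client id -> messages held offline

    def connect(self, client_id, clean_session):
        if clean_session:
            # Transient connection: throw away any state from earlier sessions.
            self.durable.discard(client_id)
            self.pending.pop(client_id, None)
            for clients in self.subscriptions.values():
                clients.discard(client_id)
        else:
            self.durable.add(client_id)
        # Hand over whatever was queued while the client was away.
        inbox = list(self.pending.pop(client_id, []))
        self.online[client_id] = inbox
        return inbox

    def subscribe(self, client_id, topic):
        self.subscriptions[topic].add(client_id)

    def disconnect(self, client_id):
        self.online.pop(client_id, None)
        if client_id not in self.durable:
            # Transient clients lose their subscriptions with the connection.
            for clients in self.subscriptions.values():
                clients.discard(client_id)

    def publish(self, topic, payload):
        for client_id in self.subscriptions.get(topic, ()):
            if client_id in self.online:
                self.online[client_id].append(payload)
            elif client_id in self.durable:
                self.pending[client_id].append(payload)


broker = SessionBroker()
broker.connect("furnace", clean_session=False)  # durable
broker.subscribe("furnace", "temperature")
broker.connect("display", clean_session=True)   # transient
broker.subscribe("display", "temperature")

broker.disconnect("furnace")
broker.disconnect("display")
broker.publish("temperature", "21.5")           # both clients are offline

furnace_inbox = broker.connect("furnace", clean_session=False)
display_inbox = broker.connect("display", clean_session=True)
print(furnace_inbox)  # ['21.5']
print(display_inbox)  # []
```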
Concept of Retained Messages
- A client can publish a message to a specific topic, and flag this message to be ‘retained’ by the broker.
- The broker then delivers this message to the current subscribers of this topic and also retains this message for future use.
- In the future, when any new clients subscribe to this topic, the broker will automatically deliver this ‘retained message’ to those new subscribers right away.
This model is very useful in scenarios where a subscriber should receive the ‘last known good value’ of something. Say a subscriber is interested in receiving temperature information from a sensor: it can receive the ‘last known temperature’ from the broker right away, instead of waiting for the publisher to publish this information again.
Now, the publisher only has to publish temperature values to the broker when the temperature changes. This approach helps reduce network chatter and conserves the publisher’s energy as well.
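Here is a rough Python sketch of that idea. The `RetainingBroker` class is hypothetical and keeps everything in memory; it only illustrates how a broker might store one retained value per topic and how a publisher can publish on change only.

```python
class RetainingBroker:
    """Hypothetical broker model that keeps one retained message per topic."""

    def __init__(self):
        self.subscribers = {}  # topic -> list of subscriber inboxes
        self.retained = {}     # topic -> last retained payload
        self.publish_count = 0

    def publish(self, topic, payload, retain=False):
        self.publish_count += 1
        if retain:
            # Remember the latest value for future subscribers.
            self.retained[topic] = payload
        for inbox in self.subscribers.get(topic, []):
            inbox.append(payload)

    def subscribe(self, topic):
        inbox = []
        self.subscribers.setdefault(topic, []).append(inbox)
        if topic in self.retained:
            # New subscribers get the last known value right away.
            inbox.append(self.retained[topic])
        return inbox


broker = RetainingBroker()

# The sensor samples four times but only publishes when the value changes.
last_sent = None
for reading in [21.5, 21.5, 21.5, 22.0]:
    if reading != last_sent:
        broker.publish("temperature", str(reading), retain=True)
        last_sent = reading

# A subscriber that arrives late still learns the current temperature.
late_inbox = broker.subscribe("temperature")
print(broker.publish_count)  # 2
print(late_inbox)            # ['22.0']
```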
Will and Testament
Given the intermittent network connectivity, it would be useful if interested recipients can be automatically notified when a particular device goes offline. This capability is achieved using a ‘Last Will & Testament’ (LWT) as follows:
- When a device connects to a broker, it informs the broker about its ‘Last Will’, and the broker remembers this information for the future.
- Later, if that device abruptly goes offline, the broker automatically dispatches this ‘Will’ (message) to any interested parties who have previously subscribed to the topic of the ‘Will’.
LWT is thus a useful way of notifying all interested parties when an IoT appliance goes offline.
For example, when a security camera abruptly disconnects, an interested service may receive the LWT message from the broker, and this service can then send an SMS notification to the home owner.
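The security camera scenario can be sketched as follows. This is an illustrative Python model with invented names (`WillBroker`, `connection_lost`), not a real broker API. The topic name ‘home/camera-1/status’ and the payload ‘offline’ are also just assumptions for the example.

```python
from collections import defaultdict


class WillBroker:
    """Hypothetical broker model that stores one 'will' per connected client."""

    def __init__(self):
        self.inboxes = defaultdict(list)  # topic -> list of subscriber inboxes
        self.wills = {}                   # client id -> (topic, payload)

    def subscribe(self, topic):
        inbox = []
        self.inboxes[topic].append(inbox)
        return inbox

    def publish(self, topic, payload):
        for inbox in self.inboxes.get(topic, []):
            inbox.append(payload)

    def connect(self, client_id, will_topic=None, will_payload=None):
        # The will is registered at connect time and kept for later.
        if will_topic is not None:
            self.wills[client_id] = (will_topic, will_payload)

    def connection_lost(self, client_id):
        # Abrupt disconnection: publish the will on the client's behalf.
        will = self.wills.pop(client_id, None)
        if will is not None:
            self.publish(*will)

    def disconnect(self, client_id):
        # Clean disconnect: the will is discarded without being published.
        self.wills.pop(client_id, None)


broker = WillBroker()
alerts = broker.subscribe("home/camera-1/status")
other_alerts = broker.subscribe("home/camera-2/status")

broker.connect("camera-1", "home/camera-1/status", "offline")
broker.connect("camera-2", "home/camera-2/status", "offline")

broker.disconnect("camera-2")       # goes away cleanly, no will is sent
broker.connection_lost("camera-1")  # drops abruptly, the will is sent

print(alerts)        # ['offline']
print(other_alerts)  # []
```

A notification service would sit behind the `alerts` inbox and turn the ‘offline’ message into an SMS.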
Keep Alive Messages
Sometimes client devices abruptly crash, or have an abrupt network disconnection. This can often result in a half-open TCP connection on the broker: the broker continues to think that it has an active TCP socket with this client.
The ‘keep alive’ is introduced as a timeout mechanism by which parties can determine if the connection is still alive.
- During the establishment of the connection, the client specifies the ‘keep alive’ duration to the broker. The broker remembers this value.
- The ‘keep alive’ interval is the longest duration of time that the client and broker can endure without exchanging any message between themselves.
- The broker maintains a ‘keep alive’ timer for each client:
- If the broker does not receive any message from the client for 1.5 times the ‘keep alive’ duration, it assumes that the client has disconnected.
- Upon receipt of a message from the client within the ‘keep alive’ duration, the broker resets the ‘keep alive’ timer for that client.
- However, if the client does not have any new information to publish within the ‘keep alive’ interval, it can simply send a PINGREQ message to the broker and receive a PINGRESP back. This serves as a heartbeat mechanism between the two parties.
- It is the responsibility of the client to keep sending messages within the ‘keep alive’ interval.
- If the ‘keep alive’ threshold for a client is exceeded, the broker will do the following:
- Forcibly close the TCP connection with this client.
- If the client had specified an LWT (Last Will and Testament), publish it to all interested recipients.
- If the client has a durable subscription, retain all QoS 1 and QoS 2 messages pertaining to the client’s subscriptions until this client connects again.
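The broker-side timer logic might look roughly like the Python sketch below. `KeepAliveMonitor` is a made-up helper, and timestamps are passed in explicitly (instead of reading a clock) so the example runs instantly. A real broker would also close the socket and publish the LWT for each expired client.

```python
class KeepAliveMonitor:
    """Hypothetical per-client 'keep alive' bookkeeping for a broker."""

    GRACE_FACTOR = 1.5  # the broker waits 1.5 x the keep alive duration

    def __init__(self):
        self.keep_alive = {}  # client id -> keep alive duration in seconds
        self.last_seen = {}   # client id -> time of the last message received

    def on_connect(self, client_id, keep_alive_s, now):
        # The client states its keep alive duration when it connects.
        self.keep_alive[client_id] = keep_alive_s
        self.last_seen[client_id] = now

    def on_packet(self, client_id, now):
        # Any message from the client, including PINGREQ, resets the timer.
        self.last_seen[client_id] = now

    def expired_clients(self, now):
        expired = []
        for client_id, seen in self.last_seen.items():
            limit = self.GRACE_FACTOR * self.keep_alive[client_id]
            # A keep alive of 0 means the mechanism is switched off.
            if limit > 0 and now - seen > limit:
                expired.append(client_id)
        return expired


monitor = KeepAliveMonitor()
monitor.on_connect("camera", keep_alive_s=60, now=0.0)
monitor.on_connect("sensor", keep_alive_s=60, now=0.0)

monitor.on_packet("sensor", now=50.0)  # e.g. a PINGREQ heartbeat

# At t = 100 s the camera has been silent for more than 90 s (1.5 x 60 s).
print(monitor.expired_clients(now=100.0))  # ['camera']
```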
Quality of Service
MQTT deals with each message as an individual package of information. What are the delivery guarantees for a message to reach its intended recipients? MQTT brokers offer three QoS levels as explained below:
Level 0: At Most Once Delivery: There is no guarantee of message delivery by the broker. This needs minimal overhead, but applications need to be designed with the assumption that a message may not be delivered as intended.
Level 1: At Least Once Delivery: There is a guarantee that a message will be delivered to each listener at least once (but it could be more than once). In this case, the handling of the message on the recipient needs to be idempotent – since a particular message may be received more than once.
Level 2: Exactly Once Delivery: There is a guarantee that a message will be delivered to each interested listener exactly once. Ensuring this requires additional overhead in the broker.
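Since Level 1 can deliver the same message more than once, the recipient has to cope with duplicates. One common way, sketched below in Python, is to make the handler idempotent by remembering which messages it has already processed. The `reading_id` field is an assumption for this example: it is an identifier the publisher would put into the payload, not something MQTT adds for you.

```python
class IdempotentHandler:
    """Hypothetical subscriber-side handler that tolerates duplicate deliveries."""

    def __init__(self):
        self.seen_ids = set()
        self.readings = []

    def handle(self, message: dict) -> bool:
        """Process a message once; return False if it was a duplicate."""
        reading_id = message["reading_id"]  # assumed application-level id
        if reading_id in self.seen_ids:
            return False
        self.seen_ids.add(reading_id)
        self.readings.append(message["celsius"])
        return True


handler = IdempotentHandler()

# With 'at least once' delivery, message 2 happens to arrive twice.
deliveries = [
    {"reading_id": 1, "celsius": 21.5},
    {"reading_id": 2, "celsius": 22.0},
    {"reading_id": 2, "celsius": 22.0},
]
results = [handler.handle(m) for m in deliveries]

print(results)           # [True, True, False]
print(handler.readings)  # [21.5, 22.0]
```

In a long-running service, the set of seen ids would need to be bounded, for example by expiring old entries.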
Getting Started with MQTT
MQTT Version 3.1.1 is an OASIS specification and is widely supported today; OASIS has since also published MQTT Version 5.0. Below are some additional references to get you started:
MQTT Cloud Brokers: Most IoT Cloud providers, such as Amazon AWS IoT, provide a device gateway which supports a scalable MQTT message broker that devices can connect to.
To summarize, the key characteristics of MQTT are:
- Asynchronous model of communication using discrete messages (events).
- Hub-and-Spoke architecture using a centralized broker.
- Instead of creating new network standards, MQTT uses ubiquitous IP networks with persistent TCP connections as the underlying transport mechanism.
- Publish-Subscribe mechanism to decouple data producers (publishers) and data consumers (subscribers) using message queues (topics).
- Topics represent ‘virtual addresses’ to which messages get delivered.
- Low protocol overhead, with a fixed header as small as 2 bytes, which keeps network bandwidth usage low.
- Supports durable connections which outlast temporary TCP sessions.
- Broker caches messages for each durable connection until the device reconnects.
- Supports multiple Quality of Service levels for message delivery.
- Supports keep alive timeout to detect if a device goes offline.
- Supports Last Will and Testament (LWT) to notify parties if a device goes offline.
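To give a feel for how small the fixed header is, the Python sketch below builds one by hand, following the layout in the MQTT 3.1.1 specification: one byte holding the packet type and flags, followed by a variable-length ‘remaining length’ field of one to four bytes. The function names are my own; in practice an MQTT client library does this for you.

```python
def encode_remaining_length(length: int) -> bytes:
    """Encode the 'remaining length' field: 7 bits per byte, high bit = more."""
    if not 0 <= length <= 268_435_455:
        raise ValueError("remaining length must fit in four encoded bytes")
    encoded = bytearray()
    while True:
        byte = length % 128
        length //= 128
        if length > 0:
            byte |= 0x80  # continuation bit: another length byte follows
        encoded.append(byte)
        if length == 0:
            return bytes(encoded)


def fixed_header(packet_type: int, flags: int, remaining_length: int) -> bytes:
    """Build the fixed header: type in the high nibble, flags in the low one."""
    first_byte = (packet_type << 4) | (flags & 0x0F)
    return bytes([first_byte]) + encode_remaining_length(remaining_length)


PINGREQ = 12  # packet type number from the MQTT 3.1.1 specification

# A PINGREQ has no further content, so the whole packet is just 2 bytes.
print(fixed_header(PINGREQ, 0, 0).hex())   # c000

# Lengths up to 127 fit in one byte; larger ones need more.
print(encode_remaining_length(127).hex())  # 7f
print(encode_remaining_length(321).hex())  # c102
```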