MQTT relies on long-lived TCP connections to push messages efficiently in both directions without polling. That design works well when connections behave the way TCP says they will, and breaks down badly when they do not. On mobile, cellular, and satellite links, a connection can quietly stop carrying data while still looking, to both sides, as if it is open. Without a mechanism to detect this, a broker would keep “delivering” messages to a client that has been off the network for an hour, and a client would keep waiting for messages that are being silently discarded.
The keep alive mechanism is MQTT’s answer to that. It is a simple, lightweight contract between client and broker: stay in regular contact, or be considered gone. This article explains the half-open connection problem keep alive addresses, the PINGREQ and PINGRESP packets that implement it, the timing rule the broker uses to decide a client has vanished, how to tune the keep alive interval, and the related client take-over mechanism that handles the case where a stale connection is still hanging around when the client reconnects. It is a technical reference; the connection handshake and Last Will and Testament have their own dedicated articles in this category.
Keep alive at a glance
| Aspect | Detail |
|---|---|
| Purpose | Detect dead or half-open connections promptly |
| Set by | Client, in the CONNECT packet, in seconds |
| Maximum value | 65,535 seconds (just over 18 hours) |
| Special value | 0 disables the mechanism entirely |
| Packets used | PINGREQ (client → broker), PINGRESP (broker → client) |
| Broker’s detection threshold | Up to 1.5× the keep alive interval without a packet |
| Effect when client is declared gone | Connection closed; will published if registered |
| MQTT 5 addition | Server Keep Alive in CONNACK can override the client’s value |
The half-open connection problem
MQTT is built on TCP, which is meant to be a reliable, ordered, and error-checked byte stream. In theory, TCP notifies you when a socket breaks. In practice, it often does not, especially on the kinds of networks MQTT is designed for.
The pathological case is the half-open connection: a state in which one side of the link has lost the connection without the other side being told. The functioning side keeps sending data and waiting for acknowledgements, while its writes go nowhere and its reads return nothing. The connection looks alive at the socket level; it is just no longer carrying any traffic.
This problem is far worse on mobile and satellite networks, where intermediate equipment frequently terminates TCP sessions at each hop and reassembles them on the other side. As Andy Stanford-Clark, MQTT’s inventor, put it:
Although TCP/IP in theory notifies you when a socket breaks, in practice, particularly on things like mobile and satellite links, which often “fake” TCP over the air and put headers back on at each end, it’s quite possible for a TCP session to “black hole”, i.e. it appears to be open still, but in fact is just dumping anything you write to it onto the floor.
The “black hole” phrase captures the failure mode exactly. The broker thinks the connection is healthy and keeps trying to push messages. The client thinks the connection is healthy and waits for inbound traffic. Neither has any indication that the line between them has gone silent. Without a higher-level mechanism, this state could persist indefinitely. Keep alive is that mechanism.
MQTT keep alive is not TCP keepalive
A point of confusion worth clearing up before going further: MQTT keep alive and TCP keepalive are different mechanisms, despite the similar name, and many engineers conflate them. TCP keepalive is an optional feature of the TCP stack itself, controlled at the operating-system level (often with very long default intervals, on the order of two hours), that sends empty TCP segments to probe whether the connection is still alive at the transport layer. MQTT keep alive is an application-layer mechanism defined by the MQTT protocol, controlled by the client, exchanging actual MQTT control packets (PINGREQ and PINGRESP) to confirm liveness at the protocol layer.
The two are independent and solve overlapping but distinct problems. MQTT keep alive is the one that matters for the half-open scenario described above, because it is the one MQTT clients and brokers actually use and configure, and it operates on the right time scale for IoT use. TCP keepalive can supplement it but should not be confused with it; when an MQTT article talks about “keep alive,” it nearly always means the MQTT mechanism unless it explicitly says otherwise.
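To make the distinction concrete, the sketch below shows where TCP keepalive is configured: on the socket, through operating-system options, with no MQTT packets involved. The helper name `enable_tcp_keepalive` and the timing values are illustrative assumptions, and the fine-grained options are platform-specific, so each one is guarded.

```python
import socket


def enable_tcp_keepalive(sock: socket.socket, idle_s: int = 60,
                         interval_s: int = 10, probes: int = 3) -> None:
    """Turn on transport-level keepalive probes for one socket (hypothetical helper)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These knobs are not available on every platform, so set only the
    # ones this Python build exposes.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_tcp_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

Nothing here touches the MQTT keep alive value, which travels inside the CONNECT packet and is enforced by the broker rather than by the TCP stack.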
How keep alive works
The principle behind keep alive is straightforward: the two sides agree that they must exchange a packet at least every so often, regardless of whether there is application traffic to send. If that schedule is missed, the silent side is presumed gone.
When the client connects, it tells the broker a keep alive interval in seconds, as part of the CONNECT packet. This interval defines the maximum time the broker and client are allowed to go without exchanging a packet. The MQTT 3.1.1 specification puts the obligation squarely on the client:
The Keep Alive is the maximum time interval that is permitted to elapse between the point at which the Client finishes transmitting one Control Packet and the point at which it starts sending the next. It is the responsibility of the Client to ensure that the interval between Control Packets being sent does not exceed the Keep Alive value. In the absence of sending any other Control Packets, the Client MUST send a PINGREQ Packet.
In plain terms: as long as the client is sending packets often enough (publishes, subscribes, anything), keep alive is satisfied implicitly. If the client has nothing else to say, it must send a PINGREQ packet to fill the gap. The broker, for its part, considers a silent client to have potentially gone away once the interval has been exceeded.
This design means keep alive imposes almost no overhead in busy systems, because real traffic already satisfies it, and only kicks in for clients that have nothing else to send. The bare PINGREQ and PINGRESP exchanged when nothing else is happening are exactly two bytes each.
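As an illustration of where the interval lives on the wire, here is a minimal sketch that builds an MQTT 3.1.1 CONNECT packet by hand, assuming no will, username, or password. The function names are hypothetical; the keep alive is the 16-bit big-endian field at the end of the variable header.

```python
import struct


def encode_remaining_length(length: int) -> bytes:
    """MQTT variable-length encoding: 7 bits per byte, high bit means 'more'."""
    out = bytearray()
    while True:
        length, digit = divmod(length, 128)
        if length > 0:
            digit |= 0x80
        out.append(digit)
        if length == 0:
            return bytes(out)


def build_connect(client_id: str, keep_alive: int, clean_session: bool = True) -> bytes:
    """Build a bare MQTT 3.1.1 CONNECT (sketch: no will, no credentials)."""
    if not 0 <= keep_alive <= 0xFFFF:
        raise ValueError("keep alive must fit in 16 bits (0..65535 seconds)")
    cid = client_id.encode("utf-8")
    variable_header = (
        struct.pack("!H", 4) + b"MQTT"            # protocol name
        + bytes([4])                              # protocol level 4 = MQTT 3.1.1
        + bytes([0x02 if clean_session else 0])   # connect flags
        + struct.pack("!H", keep_alive)           # keep alive, in seconds
    )
    payload = struct.pack("!H", len(cid)) + cid
    body = variable_header + payload
    return bytes([0x10]) + encode_remaining_length(len(body)) + body


def keep_alive_of(packet: bytes) -> int:
    """Read the keep alive back; valid only while Remaining Length is one byte."""
    return struct.unpack("!H", packet[10:12])[0]


packet = build_connect("sensor-17", keep_alive=60)
print(packet.hex(" "), "-> keep alive", keep_alive_of(packet))
```

The 16-bit field is the reason the maximum keep alive is 65,535 seconds.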
The PINGREQ and PINGRESP packets
Keep alive uses two of the simplest packets in the protocol, both with no payload at all.
PINGREQ is sent by the client to indicate that it is still alive. If the client has not sent any other packet within the keep alive interval, it must send a PINGREQ. It can also send a PINGREQ at any other moment when it wants to confirm that the network connection is still working, for example after waking from a low-power state. PINGREQ carries no payload; its purpose is purely to be a packet, evidence that the connection is still carrying traffic.
PINGRESP is the broker’s response. When the broker receives a PINGREQ, it must reply with a PINGRESP to show that it is still available. PINGRESP also carries no payload. The round trip costs each side a single packet with two bytes of MQTT overhead (the fixed header), one of the cheapest pieces of network traffic you will see anywhere.
The two packets together let each side confirm the other’s presence. The client knows the broker is alive when it sees the PINGRESP; the broker knows the client is alive when it receives any packet, PINGREQ or otherwise.
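For reference, both packets can be written out in full. The sketch below spells out the two-byte encodings and a toy broker-side responder; `packet_type` and `respond` are hypothetical helper names, not part of any library.

```python
from typing import Optional

# Fixed header only: packet type in the high nibble, Remaining Length of 0.
PINGREQ = bytes([0xC0, 0x00])   # type 12
PINGRESP = bytes([0xD0, 0x00])  # type 13


def packet_type(first_byte: int) -> int:
    """The control packet type is the upper four bits of the first byte."""
    return first_byte >> 4


def respond(packet: bytes) -> Optional[bytes]:
    """Toy broker behavior: answer a PINGREQ with a PINGRESP, ignore the rest."""
    if packet == PINGREQ:
        return PINGRESP
    return None


print(packet_type(PINGREQ[0]), packet_type(PINGRESP[0]), respond(PINGREQ).hex())
```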
The 1.5× detection rule
The number that most often surprises people about keep alive is not the keep alive value itself but the threshold the broker uses to declare a client gone. The specification gives the broker a grace period:
If the Keep Alive value is non-zero and the Server does not receive a Control Packet from the Client within one and a half times the Keep Alive time period, it MUST disconnect the Network Connection to the Client as if the network had failed.
That 1.5× rule is what determines how quickly a silent client is actually detected. With a keep alive of 60 seconds, the broker will keep the connection alive for at most 90 seconds of silence before declaring it dead, not 60. With a keep alive of 30 seconds, the maximum silence is 45 seconds. The 1.5× factor is a maximum the broker is required to enforce; in practice many brokers detect failures sooner when the operating system reports them, but they will not be slower than this.
Likewise, the client is expected to close the connection if it does not receive a PINGRESP (or any other packet) from the broker within a reasonable amount of time after sending its PINGREQ. The specification is less prescriptive on the client side, leaving the exact timeout to the implementation, but the spirit is the same: a side that stops hearing back should not wait forever.
When the broker disconnects a silent client this way, it closes the network connection and, if the client had registered a will message, publishes that will. The keep alive timeout is in fact the most common path by which a will gets published, because most ungraceful disappearances are silent ones detected via keep alive rather than reported via TCP errors. The Last Will and Testament article covers the will side of this in detail.
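The broker-side rule can be sketched as a small per-connection watchdog. This is an illustrative model rather than how any particular broker implements it; the class and method names are assumptions, and time is passed in explicitly so the logic is easy to follow.

```python
from typing import Optional


class KeepAliveWatchdog:
    """Tracks one client connection against the 1.5x keep alive rule (sketch)."""

    def __init__(self, keep_alive_s: int, now: float) -> None:
        self.keep_alive_s = keep_alive_s
        self.last_packet_at = now  # the CONNECT itself counts as a packet

    def on_packet(self, now: float) -> None:
        """Any control packet from the client restarts the timer."""
        self.last_packet_at = now

    def deadline(self) -> Optional[float]:
        """Latest time the broker may wait; None when keep alive is 0 (disabled)."""
        if self.keep_alive_s == 0:
            return None
        return self.last_packet_at + 1.5 * self.keep_alive_s

    def expired(self, now: float) -> bool:
        limit = self.deadline()
        return limit is not None and now > limit


watchdog = KeepAliveWatchdog(keep_alive_s=60, now=0.0)
print(watchdog.deadline(), watchdog.expired(90.0), watchdog.expired(90.1))
```

When `expired` becomes true, a broker following the rule would close the connection and publish the will, if one was registered.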
Choosing a keep alive value
Setting a keep alive value is a trade-off between detection latency and traffic overhead, and the right value depends on the network and the application.
A shorter keep alive means faster detection of dead connections. If you need the system to react quickly when a client drops (for example, because the will message must fire promptly), the keep alive interval is the primary lever. A keep alive of 30 seconds means the broker tolerates at most 45 seconds of silence from the client; a keep alive of 15 seconds, at most 22.5 seconds. The cost is more frequent PINGREQ traffic when the client is otherwise quiet, which is usually negligible in absolute terms but can be measurable on metered or battery-constrained links if there are many clients.
A longer keep alive reduces traffic but slows detection. A keep alive of 10 minutes means the broker can take up to 15 minutes to notice that a client has dropped, which is too long for most status-tracking use cases. On the other hand, a constrained device on a metered cellular plan, sending a reading every hour, may legitimately want a long keep alive to avoid waking the radio just to send a ping.
Two boundaries are worth knowing. The maximum keep alive value is 65,535 seconds, just over 18 hours. The special value 0 disables the keep alive mechanism entirely, with the consequence that the broker will not time out a silent client at all on its own. Setting keep alive to 0 is rarely the right choice; it removes the protocol’s safety net on the assumption that something lower in the stack will eventually report a broken connection, and that is precisely the assumption keep alive exists to avoid relying on.
A client can also tune its keep alive based on conditions. For example, an MQTT client on a cellular link might use a shorter keep alive while its signal is good and a longer one when it weakens, or it might pick a value based on the latency it has been measuring. The flexibility is there; using it well is a matter of knowing the network the client is on.
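The trade-off can be put into numbers with a short calculation. The sketch below assumes a completely idle client that pings exactly once per keep alive interval and counts only the MQTT bytes (two for PINGREQ, two for PINGRESP), ignoring TCP/IP and TLS overhead; real libraries often ping somewhat earlier.

```python
PING_EXCHANGE_BYTES = 4  # PINGREQ (2) + PINGRESP (2), MQTT layer only


def worst_case_detection_s(keep_alive_s: int) -> float:
    """Longest silence the broker tolerates before dropping the client."""
    return 1.5 * keep_alive_s


def idle_pings_per_day(keep_alive_s: int) -> float:
    """Ping exchanges per day for an idle client; 0 when keep alive is disabled."""
    if keep_alive_s == 0:
        return 0.0
    return 86_400 / keep_alive_s


for keep_alive in (15, 30, 60, 600):
    pings = idle_pings_per_day(keep_alive)
    print(f"keep alive {keep_alive:>4}s: detect within "
          f"{worst_case_detection_s(keep_alive):>6.1f}s, "
          f"{pings:>6.0f} pings/day, {pings * PING_EXCHANGE_BYTES:>6.0f} MQTT bytes/day")
```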
What counts as a packet for keep alive
A subtle but useful point: the specification says the broker is satisfied if it receives “a Control Packet” from the client within the keep alive window, which means any control packet, not just a PINGREQ. Anything the client sends (a PUBLISH, a SUBSCRIBE, an UNSUBSCRIBE, a PUBACK or other QoS acknowledgement) resets the broker’s keep alive timer for that client. A busy client almost never sends bare PINGREQs at all, because its normal application traffic already keeps the connection alive implicitly. The PINGREQ exists only to fill silence; it is the fallback for clients with nothing else to say.
This has two practical implications. First, you do not need to artificially keep a client “active” if it is already publishing or acknowledging messages frequently; that traffic already satisfies keep alive. Second, in the direction the specification cares about (client to broker), only packets from the client count. PINGRESPs and application messages flowing from the broker to the client do not reset the broker’s keep alive timer, because the broker is checking whether the client is alive. Conversely, a client checking whether the broker is alive watches for any packet flowing the other way, including PUBLISHes delivered by the broker.
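The client side of this bookkeeping might look like the following sketch. The class name, method names, and the `pingresp_timeout_s` parameter are assumptions; the specification leaves the client's timeout to the implementation, and real libraries usually send the ping a little before the full interval has elapsed.

```python
from typing import Optional


class ClientKeepAlive:
    """Client-side keep alive bookkeeping (illustrative sketch)."""

    def __init__(self, keep_alive_s: int, pingresp_timeout_s: float, now: float) -> None:
        self.keep_alive_s = keep_alive_s
        self.pingresp_timeout_s = pingresp_timeout_s
        self.last_sent_at = now                      # CONNECT was just sent
        self.ping_sent_at: Optional[float] = None    # outstanding PINGREQ, if any

    def on_packet_sent(self, now: float) -> None:
        """Any outbound control packet (PUBLISH, SUBSCRIBE, PUBACK...) counts."""
        self.last_sent_at = now

    def on_packet_received(self, now: float) -> None:
        """Any inbound packet shows the broker is alive, not only PINGRESP."""
        self.ping_sent_at = None

    def ping_due(self, now: float) -> bool:
        return (self.keep_alive_s > 0 and self.ping_sent_at is None
                and now - self.last_sent_at >= self.keep_alive_s)

    def send_ping(self, now: float) -> None:
        self.ping_sent_at = now
        self.last_sent_at = now

    def broker_lost(self, now: float) -> bool:
        return (self.ping_sent_at is not None
                and now - self.ping_sent_at > self.pingresp_timeout_s)


state = ClientKeepAlive(keep_alive_s=60, pingresp_timeout_s=10, now=0.0)
state.on_packet_sent(30.0)  # a PUBLISH at T+30s pushes the next ping out to T+90s
print(state.ping_due(60.0), state.ping_due(90.0))
```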
Keep alive and sleeping or low-power devices
Battery-powered devices often want to sleep between messages, sometimes for minutes or hours, to extend battery life. Keep alive complicates this because a sleeping device cannot send PINGREQ packets, and if the keep alive interval elapses while it is asleep, the broker concludes the device is gone and closes its connection.
There are two basic strategies for handling this within plain MQTT, neither of them perfect:
- Set keep alive comfortably longer than the sleep period. If the device sleeps for 5 minutes between publishes, a keep alive of 10 or 15 minutes ensures the broker will not time it out, at the cost of the broker also being slow to detect a real failure. The device’s regular publishes serve as the keep alive traffic; PINGREQ is not needed in normal operation.
- Disconnect before sleeping and reconnect on wake. The device sends a DISCONNECT, lets the broker close cleanly, sleeps without holding any connection, and reconnects when it has something to publish. This is cheaper on broker resources for long sleeps and avoids the timing trade-off entirely, but it costs a full reconnect (TCP, TLS if used, MQTT CONNECT, plus re-subscribing unless using a persistent session) on every wake. For long sleeps and short bursts of activity, this is usually the right pattern.
Neither approach is ideal for every device. The first pattern keeps the session permanently open but stretches the detection latency for real failures; the second is responsive to failures but costs connection overhead on every cycle. The right choice depends on how often the device wakes, how long it sleeps, how quickly the system needs to react to failures, and how much battery the connection setup costs relative to staying connected. MQTT-SN, a separate variant designed for very constrained networks, addresses sleeping devices more directly, but that is outside the scope of plain MQTT covered here.
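One way to frame the choice between the two strategies is as a small decision helper. Everything in this sketch is an assumption made for illustration: the function name, the safety margin of 2× the sleep period, and the `max_detection_s` budget that expresses how slow failure detection is allowed to be.

```python
import math
from typing import Optional, Tuple

MAX_KEEP_ALIVE_S = 65_535  # largest value the 16-bit CONNECT field can hold


def plan_for_sleep(sleep_s: float, max_detection_s: float,
                   margin: float = 2.0) -> Tuple[str, Optional[int]]:
    """Pick a strategy for a device that sleeps `sleep_s` between publishes."""
    keep_alive = math.ceil(sleep_s * margin)
    too_long_for_protocol = keep_alive > MAX_KEEP_ALIVE_S
    too_slow_to_detect = 1.5 * keep_alive > max_detection_s
    if too_long_for_protocol or too_slow_to_detect:
        return ("disconnect_before_sleep", None)
    return ("stay_connected", keep_alive)


print(plan_for_sleep(300, max_detection_s=1800))    # 5-minute sleep
print(plan_for_sleep(3600, max_detection_s=1800))   # 1-hour sleep
```

A real decision would also weigh the battery cost of a reconnect against the cost of staying connected, which this sketch does not model.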
MQTT 5: Server Keep Alive
MQTT 3.1.1 lets only the client set the keep alive value. The broker honors whatever the client asks for, or rejects the connection. This is sometimes inconvenient for operators, who may want to enforce a maximum keep alive across all clients (to bound detection latency) or a minimum (to limit ping traffic from large fleets of mostly idle clients).
MQTT 5 addresses this with the Server Keep Alive property, which the broker can include in its CONNACK. When present, this value overrides the keep alive the client proposed in CONNECT, for that connection. The client uses the server’s value for the rest of the session. The mechanic is small but useful: it lets the broker enforce a deployment-wide policy on keep alive without forcing every client to be reconfigured to match.
When the broker accepts the client’s proposed value as-is, it simply omits Server Keep Alive from the CONNACK, and the original CONNECT value applies. Clients should always read Server Keep Alive from the CONNACK in MQTT 5 sessions and use the server’s value if one is provided, rather than assuming their proposed value won.
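On the client, the rule reduces to a one-line selection once the CONNACK has been decoded. The sketch assumes the CONNACK properties have already been parsed into a dictionary keyed by property identifier, which is a hypothetical representation; 0x13 is the identifier MQTT 5 assigns to Server Keep Alive.

```python
from typing import Dict

SERVER_KEEP_ALIVE = 0x13  # MQTT 5 property identifier (two-byte integer value)


def effective_keep_alive(requested_s: int, connack_properties: Dict[int, int]) -> int:
    """Use the broker's value when present, else the one sent in CONNECT.

    A lookup with a default (rather than `or`) matters here: a Server Keep
    Alive of 0 is a real value and must not fall back to the request.
    """
    return connack_properties.get(SERVER_KEEP_ALIVE, requested_s)


print(effective_keep_alive(300, {SERVER_KEEP_ALIVE: 60}))  # broker overrides
print(effective_keep_alive(300, {}))                       # request stands
```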
Client take-over
A different but related problem appears when a client tries to reconnect while the broker still holds an apparently-alive but actually-dead connection for the same ClientId. This can happen when a client drops ungracefully but the broker has not yet noticed (for example, mid-way through the keep alive grace period), and the client reconnects from elsewhere or after rebooting. From the broker’s point of view, it now has two connections claiming to be the same ClientId.
MQTT’s response is the client take-over: when a new connection arrives with a ClientId that already has an existing connection on the broker, the broker closes the existing connection and accepts the new one. The new connection effectively takes over the ClientId. This behavior is what guarantees that a returning client can always reconnect, even when the broker has not yet caught up with the fact that the previous connection is dead.
A few aspects of take-over are worth understanding clearly:
- It is keyed on ClientId. The broker decides who is “the same client” by comparing ClientIds. Two clients that share a ClientId will repeatedly take each other over, fighting for the slot indefinitely. This is one of the most common diagnoses for the symptom “my client keeps getting disconnected”: somewhere there is another client using the same ClientId.
- The old connection is closed by the broker. From the perspective of whoever held the previous connection, the broker simply closed the network connection; in MQTT 5 the broker may first send a DISCONNECT with reason code 0x8E (Session taken over). If that previous connection actually was a still-alive client, this is how it learns it has been replaced.
- Take-over interacts with the will. Closing the old connection counts as the broker closing the network connection, which the specification treats like the network failing. In MQTT 3.1.1, this broker-forced closure of the previous connection generally results in the previous session’s will being published, though some brokers suppress the will during deliberate take-over handling. MQTT 5 gives finer control here, including the Will Delay Interval covered in the Last Will and Testament article. The exact behavior is worth verifying on the broker you are running, not assumed.
- The new connection chooses its own session behavior. Whether the new connection inherits the previous session’s subscriptions and queued messages depends on its clean session (MQTT 3) or clean start (MQTT 5) flag, exactly as for any other connect. Take-over does not change the session rules; it just resolves the conflict over the ClientId.
The take-over mechanism is what allows MQTT to recover cleanly from cases where keep alive has not yet timed out a dead connection. Rather than forcing the returning client to wait until the broker eventually realizes its predecessor is gone, the new connection simply asserts the ClientId and the broker yields.
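The core of take-over can be modeled with a dictionary keyed by ClientId. This toy model leaves out sessions, wills, and real sockets; the `Connection` and `Broker` classes are hypothetical and exist only to show the replacement rule.

```python
from typing import Dict, Optional


class Connection:
    """Stand-in for a network connection (no real socket involved)."""

    def __init__(self, client_id: str, label: str) -> None:
        self.client_id = client_id
        self.label = label
        self.open = True

    def close(self) -> None:
        self.open = False


class Broker:
    """Keeps at most one live connection per ClientId."""

    def __init__(self) -> None:
        self.by_client_id: Dict[str, Connection] = {}

    def accept(self, conn: Connection) -> Optional[Connection]:
        """Register a new connection; close and return the one it displaced."""
        previous = self.by_client_id.get(conn.client_id)
        if previous is not None and previous.open:
            previous.close()  # take-over: the newcomer wins the ClientId
        self.by_client_id[conn.client_id] = conn
        return previous


broker = Broker()
stale = Connection("sensor-17", "before reboot")
fresh = Connection("sensor-17", "after reboot")
broker.accept(stale)
displaced = broker.accept(fresh)
print(displaced.label, "open:", displaced.open, "| current:", broker.by_client_id["sensor-17"].label)
```

Two live clients sharing one ClientId would call `accept` in alternation, each closing the other, which is the disconnect loop described in the list above.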
A worked example of keep alive in action
To make the timing concrete, suppose a client connects with a keep alive of 60 seconds and is otherwise quiet (no publishes, no subscribes).
- T+0s: Connection established with keep alive = 60s.
- T+50s: The client is approaching the keep alive interval. Its library sends a PINGREQ.
- T+50s (a moment later): The broker receives the PINGREQ, replies with a PINGRESP. Both sides have confirmed presence.
- T+110s: A second PINGREQ goes out around 60 seconds after the first, and is answered.
- T+170s: A third PINGREQ goes out and is answered. Immediately afterwards the network drops. The client cannot send anything. The broker has not yet noticed.
- T+230s: Around 60 seconds after the last PINGREQ, the client’s library tries to send the next PINGREQ. It fails (or appears to succeed at the socket level but in fact goes into a black hole). The client’s library starts its own timeout for the missing PINGRESP.
- T+260s (90 seconds after the last packet the broker received, the third PINGREQ at T+170s): the broker hits 1.5× keep alive with no packet from the client. It closes the connection and publishes the will if one was registered.
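The timing is approximate (libraries differ in exactly when they send PINGREQ, often slightly earlier than the deadline to allow for RTT), but the shape is real: the broker gives up 1.5× the keep alive interval after the last packet it received. For an idle client that pings once per interval, somewhere between roughly 0.5× and 1.5× the keep alive interval therefore elapses between the network failure and the broker recognizing it, depending on how soon after the last PINGREQ the link failed. Tuning keep alive moves that delay proportionally.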
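The arithmetic of the timeline can be replayed in a few lines. The sketch assumes the last packet to reach the broker left the client at T+170s, a moment before the link failed, and that later pings are lost; the function name and the exact failure time of T+170.5s are illustrative choices.

```python
from typing import List, Tuple


def simulate(keep_alive_s: float, ping_times: List[float],
             fail_at: float) -> Tuple[float, float, float]:
    """Return (last packet seen by broker, broker disconnect time, detection delay)."""
    # The CONNECT at T+0 counts as a packet; pings after the failure never arrive.
    delivered = [0.0] + [t for t in ping_times if t < fail_at]
    last_seen = max(delivered)
    disconnect_at = last_seen + 1.5 * keep_alive_s
    return last_seen, disconnect_at, disconnect_at - fail_at


last_seen, disconnect_at, delay = simulate(
    keep_alive_s=60, ping_times=[50, 110, 170, 230], fail_at=170.5)
print(f"last packet at T+{last_seen:.0f}s, broker disconnects at "
      f"T+{disconnect_at:.0f}s, {delay:.1f}s after the failure")
```

Changing `fail_at` shows how the detection delay depends on where in the ping cycle the link goes down.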
How keep alive connects to the rest of MQTT
Keep alive sits at the foundation of MQTT’s reliability story and interacts with several other features:
- The CONNECT packet carries the keep alive value, as covered in the connection handshake article. In MQTT 5 the CONNACK can override it with Server Keep Alive.
- PINGREQ and PINGRESP are two of MQTT’s control packet types (14 in MQTT 3.1.1, 15 in MQTT 5, which adds AUTH) and the simplest packets in the protocol.
- The Last Will and Testament mechanism relies on keep alive for detection: most ungraceful disconnects are silent ones recognized only when the keep alive interval expires.
- Client take-over handles the case where a stale connection still appears alive to the broker when its owner reconnects, and is closely related to ClientId discipline in the connection handshake article.
- MQTT 5 Server Keep Alive lets the broker enforce a keep alive policy across clients, covered in the MQTT 5 articles.
Used deliberately, keep alive is one of the cheapest mechanisms in MQTT and one of the most important: it is what turns the protocol’s long-lived connections from a liability on unreliable links into the strength they are designed to be.
Frequently asked questions
What is the MQTT keep alive interval? It is the maximum time, in seconds, that the client and broker are allowed to go without exchanging a packet. The client sets it in the CONNECT packet. If no application traffic has been sent in that interval, the client must send a PINGREQ to keep the connection alive.
What are PINGREQ and PINGRESP? They are the keep alive packets. PINGREQ is sent by the client to confirm it is still alive; PINGRESP is the broker’s reply. Both carry no payload and consist of just the MQTT fixed header.
How long before the broker decides a client is gone? Up to 1.5× the keep alive interval without receiving any packet from the client. With a keep alive of 60 seconds, that is up to 90 seconds. Many brokers detect failures faster when the OS reports them, but they will not be slower than this.
What happens when the broker disconnects a silent client? It closes the network connection and publishes the client’s will message if one was registered. This is the most common path by which a will is published, since most ungraceful disconnects are silent.
What value should I set for keep alive? A shorter interval gives faster detection at the cost of more frequent PINGREQ traffic when the client is quiet; a longer interval reduces traffic but delays detection. Match the value to how quickly you need to detect a lost client. Values from 30 seconds to several minutes are common; the maximum is 65,535 seconds (about 18 hours), and 0 disables the mechanism.
What is client take-over? When a new connection arrives with the same ClientId as an existing connection, the broker closes the existing connection and accepts the new one. This lets a returning client reconnect cleanly even if the broker has not yet noticed its previous connection is dead. In MQTT 3.1.1, broker-forced closure of the previous connection generally results in the previous session’s will being published, though some brokers suppress the will during deliberate take-over handling.
What changes in MQTT 5? MQTT 5 adds a Server Keep Alive property in the CONNACK, which lets the broker override the client’s proposed keep alive value for that connection. This enables operators to enforce a deployment-wide keep alive policy without reconfiguring every client.
