MQTT, Sparkplug B, and the Unified Namespace

The panel told the story before anyone said a word. Behind the HMI in the control room sat a managed switch, and out of it ran a fan of cables to a half-dozen converters, each one a private handshake between two boxes that were never designed to talk to each other. The PLC fed the SCADA. The SCADA fed a historian. A separate gateway scraped the same registers a second time for the MES, because nobody trusted the first path. A laptop under the desk ran a Python script that pulled CSVs off a shared drive at 2 a.m. Every one of those links was somebody's afternoon, years ago, and every one of them is a thing that breaks at 3 a.m. and pages a person who has long since left the company.

This is the problem a unified namespace is meant to solve, and it's worth being precise about what that means before the acronyms pile up. We see the same wiring closet in food plants, in waste-to-energy lines, in metals. The data exists. It's just trapped in point-to-point integrations that grow like ivy.

What a unified namespace actually is

A unified namespace (UNS) is one place where the current state of the whole operation lives, structured so a human can read the address and know exactly what they're looking at. It is not a database. It's a message broker holding the latest value of every tag, organized as a hierarchy of topics, with every producer publishing into it and every consumer subscribing to the parts it cares about. The historian, the MES, the dashboard, the anomaly model — they all read from the same place instead of reaching into the PLC directly.

The hierarchy is the part people underestimate. If you name your topics well, the namespace becomes self-documenting. The natural structure to borrow is the equipment hierarchy from ISA-95, the international standard for enterprise-control system integration that ISA describes as an architecture built on the Purdue model and applicable regardless of the technology underneath. ISA-95 gives you a role-based chain — enterprise, site, area, work center, work unit — that maps cleanly onto a topic path like acme/tallinn/pasteurizer/line2/cip-pump. Anyone who knows the plant can read that address. So can a script.
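A minimal sketch of how that path might be assembled and checked in code. The function name, the keyword names, and the segment rule (lowercase letters, digits, and hyphens) are our own assumptions for illustration; ISA-95 supplies the levels, not the spelling.

```python
import re

# ISA-95 role-based equipment levels, ordered from the top of the hierarchy down.
LEVELS = ("enterprise", "site", "area", "work_center", "work_unit")

# Hypothetical naming rule: lowercase letters, digits, and hyphens only.
# This also keeps "/", "+", and "#" out of a segment, which MQTT treats specially.
_SEGMENT = re.compile(r"^[a-z0-9][a-z0-9-]*$")


def build_topic(**levels: str) -> str:
    """Join ISA-95 levels into a topic path, rejecting gaps and unsafe names."""
    segments = []
    for level in LEVELS:
        if level not in levels:
            break  # a shorter path is allowed; it addresses a higher level
        value = levels[level]
        if not _SEGMENT.match(value):
            raise ValueError(f"bad {level} segment: {value!r}")
        segments.append(value)
    # Unknown keywords or a skipped level leave the two counts unequal.
    if not segments or len(segments) != len(levels):
        raise ValueError("levels must be contiguous from enterprise down")
    return "/".join(segments)


topic = build_topic(
    enterprise="acme",
    site="tallinn",
    area="pasteurizer",
    work_center="line2",
    work_unit="cip-pump",
)
```

Building every topic through one function like this is one way to keep the naming contract in a single place instead of in each integrator's head.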

That readability is the whole point. The UNS turns a tangle of bilateral integrations into a single hub-and-spoke pattern. Add a consumer, and it subscribes to what it needs. Add a producer, and it publishes once. The number of connections you maintain stops growing with the square of the number of systems and starts growing in a straight line. Nobody has to rewire the plant to add a new dashboard.
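The arithmetic behind that claim is short enough to write down. This sketch assumes the worst case for point-to-point wiring, where every system ends up linked to every other one; real plants sit somewhere below that ceiling, but the shape of the curve is the same.

```python
def point_to_point_links(systems: int) -> int:
    """Upper bound on bilateral links: one per pair of systems."""
    return systems * (systems - 1) // 2


def hub_links(systems: int) -> int:
    """With a broker in the middle, each system keeps one connection."""
    return systems


for n in (4, 6, 12):
    print(n, point_to_point_links(n), hub_links(n))
```

At six systems the full mesh is 15 links against 6; at twelve it is 66 against 12.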

Why MQTT carries it

The transport almost everyone reaches for is MQTT, and the reasons are structural rather than fashionable. MQTT is a publish/subscribe protocol: clients don't address each other, they publish to topics on a broker and subscribe to topics from it. The producer and consumer never need to know the other exists. That decoupling is exactly what a namespace wants.
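Consumers pick out their part of the tree with topic filters, where MQTT defines "+" as a single-level wildcard and "#" as a multi-level wildcard that must come last. A minimal sketch of the matching rule follows; the function name is ours, and it assumes the filter is already well formed rather than validating it.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic name."""
    f_parts = topic_filter.split("/")
    t_parts = topic.split("/")

    # Topics starting with "$" are not matched by a leading wildcard.
    if topic.startswith("$") and f_parts[0] in ("+", "#"):
        return False

    for i, f in enumerate(f_parts):
        if f == "#":
            # Multi-level wildcard: matches the parent and everything below it.
            return i == len(f_parts) - 1
        if i >= len(t_parts):
            return False  # the filter is longer than the topic
        if f != "+" and f != t_parts[i]:
            return False  # a literal level that does not match
    return len(f_parts) == len(t_parts)


pump = "acme/tallinn/pasteurizer/line2/cip-pump"
site_wide = topic_matches("acme/tallinn/#", pump)
any_site = topic_matches("acme/+/pasteurizer/line2/cip-pump", pump)
other_site = topic_matches("acme/tartu/#", pump)
```

A site dashboard subscribes to the first filter and a cross-site comparison to the second, and neither needs to know which client publishes the pump's data.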

It's also a real standard with a long track record in constrained networks. MQTT v3.1.1 was published as ISO/IEC 20922:2016, and the OASIS technical committee released MQTT version 5.0 in March 2019 with features like shared subscriptions, message expiry, and reason codes on acknowledgements. The protocol was built for links that are slow, expensive, or flaky — cellular backhaul from a remote pump house, a saturated plant Wi-Fi segment, a satellite hop. Header overhead is small and the connection model survives drops.

MQTT defines three delivery guarantees, and the choice matters more than it looks. The OASIS standard lays them out plainly: QoS 0 is "at most once," where "message loss can occur"; QoS 1 is "at least once," where "messages are assured to arrive but duplicates can occur"; QoS 2 is "exactly once," which the spec illustrates with "billing systems where duplicate or lost messages could lead to incorrect charges." For high-rate telemetry where the next sample is along in a second anyway, QoS 0 is usually right. For a command, or for a status flag the whole system keys off, you pay for QoS 1.

Two MQTT mechanisms do quiet, heavy lifting in a namespace. The retained message lets the broker hold the last value published on a topic, so a consumer that connects late gets the current state immediately instead of waiting for the next change. And the Will message — MQTT's last-will mechanism — is a message the client hands the broker at connect time, which the broker publishes automatically if the client drops without saying goodbye. The OASIS spec defines it as an application message "published by the Server after the Network Connection is closed in cases where the Network Connection is not closed normally." Hold that one. It's the seed of the hardest problem in the whole architecture.
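To make the two mechanisms concrete, here is a toy in-memory model of the broker side. It is a sketch for reasoning, not an MQTT implementation: the class and method names are invented, topics are matched exactly with no wildcards, and QoS, sessions, and keep-alive timing are left out.

```python
class ToyBroker:
    """In-memory stand-in for a broker, showing retained and Will messages."""

    def __init__(self):
        self.retained = {}     # topic -> last retained payload
        self.subscribers = {}  # topic -> list of callbacks
        self.wills = {}        # client_id -> (topic, payload)

    def connect(self, client_id, will=None):
        # The client hands over its Will at connect time, before anything fails.
        if will is not None:
            self.wills[client_id] = will

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)
        # A late subscriber gets the retained value immediately.
        if topic in self.retained:
            callback(topic, self.retained[topic])

    def publish(self, topic, payload, retain=False):
        if retain:
            self.retained[topic] = payload
        for callback in self.subscribers.get(topic, []):
            callback(topic, payload)

    def disconnect(self, client_id, clean=True):
        will = self.wills.pop(client_id, None)
        # The Will is published only when the connection did not close normally.
        if will is not None and not clean:
            self.publish(*will)


broker = ToyBroker()
seen = []
broker.connect("pump-gw", will=("plant/pump/status", "offline"))
broker.publish("plant/pump/temp", 71.5, retain=True)

# This consumer arrives after the publish and still sees the current value.
broker.subscribe("plant/pump/temp", lambda t, p: seen.append((t, p)))
broker.subscribe("plant/pump/status", lambda t, p: seen.append((t, p)))

broker.disconnect("pump-gw", clean=False)  # the link drops without a goodbye
```

After the unclean disconnect, the consumer has received both the retained temperature and the "offline" status, and the gateway never sent the second one itself.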

So why not just push JSON over a broker and call it a namespace? Plenty of plants do, and for a single line with one integrator it can be fine. The trouble starts at the second site. Text payloads are verbose, every team invents its own field names, and there's no agreed signal for "this node is alive." You end up rebuilding, badly, the exact machinery a standard already specifies.

What plain MQTT leaves on the table

Raw MQTT is a transport. It says nothing about what your topics should be called, what the payloads look like, or how a consumer is supposed to know whether the data it's seeing is current or stale garbage from a node that died ten minutes ago. Two integrators given the same plant and only MQTT will build two incompatible namespaces. Both will work until you try to join them.

This is the gap Eclipse Sparkplug fills. Sparkplug is an open specification that puts an opinionated layer on top of MQTT: a defined topic structure, a binary payload format, and — the part that earns its keep — a state management model. In November 2023 the Eclipse Foundation announced that Sparkplug 3.0 had been published as an international standard, ISO/IEC 20237:2023, transposed through the Publicly Available Specification process of ISO/IEC's Joint Technical Committee. So both layers of a Sparkplug UNS now sit on top of formally ratified standards.

Sparkplug fixes the topic namespace to a known shape, namespace/group_id/message_type/edge_node_id/[device_id], where the namespace element is spBv1.0 for Sparkplug B payloads and the trailing device_id appears only on device-level messages. It encodes payloads with Google Protocol Buffers rather than text JSON, which keeps each message compact on the wire. It also defines a small, fixed vocabulary of message types: birth and death certificates for nodes and devices (NBIRTH, NDEATH, DBIRTH, DDEATH), data messages (NDATA, DDATA), commands (NCMD, DCMD), and a host STATE message. That vocabulary is what makes the next part possible.
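Because the shape is fixed, a consumer can take a topic apart mechanically. The sketch below parses node and device topics only; the type and function names are ours, the device_id element is treated as present only on device-level (D-prefixed) messages, and host STATE topics are deliberately left out because they follow a different form.

```python
from typing import NamedTuple, Optional

NAMESPACE = "spBv1.0"  # the Sparkplug B namespace element
MESSAGE_TYPES = {
    "NBIRTH", "NDEATH", "NDATA", "NCMD",  # node-level messages
    "DBIRTH", "DDEATH", "DDATA", "DCMD",  # device-level messages
}


class SparkplugTopic(NamedTuple):
    namespace: str
    group_id: str
    message_type: str
    edge_node_id: str
    device_id: Optional[str]


def parse_topic(topic: str) -> SparkplugTopic:
    """Split a node or device topic into its named elements."""
    parts = topic.split("/")
    if len(parts) not in (4, 5):
        raise ValueError(f"unexpected number of levels: {topic!r}")
    namespace, group_id, message_type, edge_node_id = parts[:4]
    device_id = parts[4] if len(parts) == 5 else None
    if namespace != NAMESPACE:
        raise ValueError(f"not a Sparkplug B topic: {topic!r}")
    if message_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {message_type!r}")
    # Device messages carry a device_id; node messages do not.
    if message_type.startswith("D") != (device_id is not None):
        raise ValueError(f"device_id does not fit {message_type}: {topic!r}")
    return SparkplugTopic(namespace, group_id, message_type, edge_node_id, device_id)


parsed = parse_topic("spBv1.0/tallinn/DDATA/line2/cip-pump")
```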

Here's the part that surprised us

The first time you watch a Sparkplug edge node lose its link and recover, the behavior is counterintuitive in a good way. The hard problem in any pub/sub telemetry system isn't moving data. It's knowing, on the consuming side, whether the last value you received is still true. A historian showing 412 °C looks identical whether the sensor reported 412 °C two seconds ago or whether the gateway fell off the network an hour ago and 412 is a corpse. Operators have made bad calls off stale tags. We've seen it.

Sparkplug answers this with report by exception built on top of the Will message. An edge node, on connecting, registers an NDEATH as its MQTT Will. Then it publishes an NBIRTH that declares every metric it owns and their current values. From there it sends only changes — report by exception — so a tag that isn't moving costs nothing. If the node drops, the broker fires the pre-registered NDEATH on its behalf, and every subscriber learns in the same breath that all of that node's data is now suspect. State is explicit. A consumer never has to guess whether silence means "unchanged" or "dead," because the protocol tells it which.
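On the consuming side, that lifecycle reduces to a small state machine. What follows is a simplified sketch with invented names; it works on already-decoded metric dictionaries and leaves out sequence numbers, the birth/death sequence check, and rebirth requests, all of which a real host application would need.

```python
class NodeView:
    """A consumer's picture of one edge node: last values plus liveness."""

    def __init__(self):
        self.online = False
        self.metrics = {}

    def on_message(self, message_type, metrics=None):
        if message_type == "NBIRTH":
            # The birth declares every metric, so it replaces the whole picture.
            self.metrics = dict(metrics or {})
            self.online = True
        elif message_type == "NDATA":
            if not self.online:
                return  # data without a birth cannot be trusted in this sketch
            self.metrics.update(metrics or {})  # only the changed values arrive
        elif message_type == "NDEATH":
            # Values are kept for reference but are no longer known to be true.
            self.online = False

    def read(self, name):
        """Return the last value together with an explicit quality flag."""
        return self.metrics[name], ("GOOD" if self.online else "STALE")


view = NodeView()
view.on_message("NBIRTH", {"temp_c": 412.0, "flow": 3.1})
view.on_message("NDATA", {"flow": 3.4})
before = view.read("temp_c")
view.on_message("NDEATH")
after = view.read("temp_c")
```

The temperature never changed, so the number is 412.0 both times. What changed is the flag beside it, which is exactly the information the historian in the example above was missing.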

Report by exception also does real work on bandwidth and on broker load. A pasteurizer line might carry hundreds of tags that are flat most of the time. Polling them on a timer moves the same numbers over and over; publishing only on change moves almost nothing until something happens. On a metered cellular link, that's a line item. On a busy broker, it's headroom.
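A rough sketch of the publishing side of report by exception. The class name is ours, and the deadband is an assumption we have added for illustration: it suppresses changes smaller than a threshold so sensor noise does not count as an event. Set it to zero to publish on any change.

```python
class ReportByException:
    """Pass through only the samples that moved since they were last sent."""

    def __init__(self, deadband=0.0):
        self.deadband = deadband  # hypothetical noise threshold, in tag units
        self.last_sent = {}       # tag name -> last value actually published

    def filter(self, samples: dict) -> dict:
        changed = {}
        for name, value in samples.items():
            is_new = name not in self.last_sent
            if is_new or abs(value - self.last_sent[name]) > self.deadband:
                changed[name] = value
                self.last_sent[name] = value
        return changed


rbe = ReportByException(deadband=0.5)
first = rbe.filter({"temp_c": 72.0, "flow": 3.0})   # everything is new
second = rbe.filter({"temp_c": 72.2, "flow": 3.0})  # inside the deadband
third = rbe.filter({"temp_c": 72.8, "flow": 3.0})   # temp moved far enough
```

Note that the comparison is against the last value sent, not the last value sampled, so a slow drift still gets reported once it adds up to more than the deadband.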

The quality-of-service rules underneath are specific, and they exist for exactly this reason. Sparkplug's normative statements require that birth, data, and command messages be published at QoS 0 with the retain flag set false, while every STATE message from a primary host must go out at QoS 1 with retain set true. The NDEATH Will is the other exception: it is registered at QoS 1, retain false, so the broker delivers the death notice reliably but never replays it to a client that connects later. The asymmetry is deliberate: ordinary telemetry is cheap and self-correcting, but the signals that tell the system who is alive, the host's STATE and the node's death certificate, must be reliable, and STATE must also be there for any client that connects late.

Message type	Purpose	QoS	Retain
NBIRTH / DBIRTH	Declare a node/device and all its metrics on connect	0	false
NDATA / DDATA	Report changed values (report by exception)	0	false
NDEATH	Signal a node is gone (registered as the MQTT Will)	1	false
DDEATH	Signal a device is gone (published by its edge node)	0	false
NCMD / DCMD	Commands to a node/device	0	false
STATE	Primary host availability	1	true

Sparkplug 3.0 message types and their required QoS and retain settings, per the specification's normative statements.

Where it collides with the Purdue model

Here's the tension nobody mentions in the architecture diagrams. The UNS pattern wants a flat, central broker that everyone reaches. The security model your plant is built on wants the opposite. ISA-95 is layered on the Purdue model, and the discipline that secures operational technology — ISA/IEC 62443 — divides a plant into zones and conduits: groups of systems with shared security requirements, connected only through controlled, defined pathways. The whole point is that a breach in one zone doesn't take the plant down. Defense in depth lives in those boundaries.

A broker that every device on every level publishes to can quietly become a way around those boundaries. Do it carelessly and you've built a single node that touches Level 1 sensors and Level 4 business systems at once — a beautiful target. So the broker is a zone, and the links into it are conduits, and they get treated that way: the broker in a hardened DMZ, edge nodes publishing outward through a defined conduit rather than every consumer reaching down into the cell.

The transport won't save you here, and you shouldn't expect it to. MQTT does not mandate encryption. The OASIS standard only says implementations "SHOULD offer Authentication, Authorization and secure communication options" — a recommendation, not a requirement. Out of the box, MQTT will happily carry your entire plant in clear text to anyone who can reach the broker. TLS, per-client credentials or certificates, and topic-level access control so an edge node can publish its own branch and nothing else, are not optional extras. They're the difference between a UNS and an open microphone in the plant. Sparkplug's clean topic structure actually helps: because publish rights map onto the topic tree, you can write authorization rules that read like the org chart.
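What "publish its own branch and nothing else" means can be stated as a few lines of logic. This is an illustration of the rule only: the table, the client name, and the function are hypothetical, and real brokers each have their own ACL syntax that you would use instead.

```python
# Hypothetical table: each client is granted exactly one branch of the tree.
PUBLISH_BRANCH = {
    "line2-gw": "acme/tallinn/pasteurizer/line2",
}


def may_publish(client_id: str, topic: str, acl: dict) -> bool:
    """Allow a publish only inside the branch assigned to this client."""
    branch = acl.get(client_id)
    if branch is None:
        return False  # unknown clients get nothing by default
    if "+" in topic or "#" in topic:
        return False  # wildcards are not valid in a publish topic
    # Compare on whole levels so "line2" does not also grant "line20".
    return topic == branch or topic.startswith(branch + "/")


own = may_publish("line2-gw", "acme/tallinn/pasteurizer/line2/cip-pump", PUBLISH_BRANCH)
neighbour = may_publish("line2-gw", "acme/tallinn/pasteurizer/line20/cip-pump", PUBLISH_BRANCH)
stranger = may_publish("laptop", "acme/tallinn/pasteurizer/line2/cip-pump", PUBLISH_BRANCH)
```

The detail worth copying is the level-boundary check. A plain prefix match would quietly let the line 2 gateway write into line 20.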

The economics, honestly

The case for a UNS isn't really about any single feature. It's about what stops costing you. Every point-to-point integration in that opening wiring closet is a thing somebody maintains: a credential that expires, a schema that drifts, a script that silently stops. Collapse them into one hub and the per-integration tax falls, because new systems join by subscribing instead of by commissioning another custom link. Report by exception trims the recurring bandwidth bill on metered links. None of that shows up as a dramatic number on day one. It shows up as the 3 a.m. pages that stop happening, and the new analytics project that ships in a sprint instead of a quarter because the data was already addressable.

It is not free. You're introducing a broker that has to be sized, made highly available, secured, and monitored like the production-critical asset it now is. If the broker is the single source of truth, the broker going dark is the plant going blind, so redundancy and clustering move from nice-to-have to baseline. The honest trade is one well-understood dependency you can engineer around, in place of a dozen fragile ones you can't see until they fail.

There's a quieter cost too, and it's organizational. A namespace forces agreement. Two teams that have spent years naming the same pump three different ways now have to pick one address for it, and that conversation is harder than any broker config. We've watched the technical rollout finish in a week and the naming argument run for a month. It's worth the month. The hierarchy is the asset that outlives every piece of software bolted onto it, so it's the one part you cannot afford to get wrong in a hurry.

What we check at commissioning

When we stand a UNS up on a line, the build is less about the broker and more about discipline around it. So what separates a namespace that ages well from one that quietly rots back into spaghetti? A handful of habits, none of them glamorous:

Name the hierarchy before the first tag. The ISA-95 path is the contract every team builds against. Renaming topics after consumers exist is how you recreate the spaghetti you came to remove.
Decide QoS per signal, not per system. High-rate telemetry at QoS 0, the signals state depends on at QoS 1. Sparkplug already encodes this; respect it rather than overriding it everywhere "to be safe," which just floods the broker.
Treat liveness as data. If you're on plain MQTT, you build birth/death and last-known-state yourself. Sparkplug gives it to you. Either way, no consumer should ever display a value it can't prove is current.
Put the broker in a zone and gate the conduits. TLS on, anonymous access off, per-client topic permissions on. The namespace is only as trustworthy as the access control around it.
Make it redundant before you make it the source of truth. The moment the rest of the plant relies on the broker, its availability is the plant's availability.
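The last two habits are the easiest to turn into an automated check. The sketch below audits a settings dictionary whose keys are entirely made up for illustration; they do not correspond to any particular broker's configuration file, and you would map them onto whatever your broker actually exposes.

```python
def audit_broker_config(cfg: dict) -> list:
    """Return a list of findings; an empty list means the basics are covered."""
    findings = []
    # Missing keys are treated as the unsafe setting, so silence never passes.
    if not cfg.get("tls_enabled", False):
        findings.append("TLS is off")
    if cfg.get("allow_anonymous", True):
        findings.append("anonymous access is on")
    if not cfg.get("per_client_acl", False):
        findings.append("no per-client topic permissions")
    if cfg.get("broker_nodes", 1) < 2:
        findings.append("broker is not redundant")
    return findings


hardened = audit_broker_config({
    "tls_enabled": True,
    "allow_anonymous": False,
    "per_client_acl": True,
    "broker_nodes": 3,
})
unset = audit_broker_config({})
```

Defaulting every missing key to the unsafe value is a deliberate choice here: a check that passes because nobody filled in the form is worse than no check.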

And resist the urge to model everything on day one. The temptation, once the broker is live and the hierarchy is drawn, is to publish every tag in the plant because you can. Start with the signals an operator or a model actually consumes, prove the pattern, then grow the namespace branch by branch. A UNS earns trust the same way an instrument does: by being right about the few things that matter before it's asked to be right about everything.

The wiring closet that opened this note didn't need a bigger switch or another gateway. It needed one address space the whole plant could agree on, a transport that survives a bad link, and a state model honest enough to admit when a node has gone quiet. MQTT and Sparkplug give you the second and third. The first is the hierarchy you draw, and that's still engineering, not a download. The work we do at Zoniax sits on exactly this seam: the Zoniax edge telemetry and analytics platform publishes instrumented plant data into a structured namespace, and if you're weighing where sensors and a UNS fit on your own lines, that's the kind of problem our industrial AI deployment services are built around.
