Copilots in the Control Room: What the Demo Skips

The control room sat behind two doors and a pressure differential, the way they often do on a process line that runs hot. Eight screens on the wall. Two operators on shift, one of them new. The mimic for the dryer train flickered amber in the corner, an alarm that had been standing for most of the morning and that everybody had learned to read past. We were there to scope telemetry, not software. But the question came up before we'd put our bags down: could one of those AI copilots just tell the new operator what the amber means?

It's a fair question, and it's the one driving most of the procurement decks crossing operations desks right now. The pitch is clean. Put a large language model next to the operator, feed it the manuals and the historian, and let people ask in plain words what a machine error code means or why a loop is hunting. Siemens ships exactly this as its Industrial Copilot, built with Microsoft, and frames it as natural-language help for engineering and maintenance work across the line. The promise is real enough that it's worth taking apart carefully, because the gap between the demo and the enclosure is where the money and the risk both live.

What a copilot is, once it's on the floor

Strip the branding and an operations copilot is three things bolted together. A language model that turns a question into a written answer. A retrieval layer that pulls in your documents, P&IDs, and tag history so the model answers about your plant and not the internet's. And a set of connectors into the systems where the data lives: the historian, the CMMS, sometimes the control network itself. The model is the part everyone talks about. The connectors are the part that decides whether the thing is useful or dangerous.

That ordering surprises people who come at this from the IT side. Swapping one language model for a better one is a config change. Wiring a copilot safely into a historian, a CMMS, and a control network — deciding what it may read, what it may never touch, and which account it runs under — is plant engineering, and it's where the weeks go. The intelligence is more or less a commodity you rent. The integration is yours to get right, and nobody can sell it to you off the shelf because it's specific to your architecture, your protocols, and your risk tolerance. Keep that straight and the rest of the decisions fall into place.

On the floor the copilot is sold against a specific, measurable pain: the operator is drowning in information and short on time. That pain is not invented. It has a literature and a number attached to it, which is more than you can say for most of what the AI vendors claim.

The problem it's actually sold against

Alarm overload is the honest justification for putting a copilot in the control room. The standards that govern alarm systems — ANSI/ISA-18.2, harmonised internationally as IEC 62682, and the older EEMUA Publication 191 that most plants still benchmark against — put hard numbers on what a human can absorb. The guidance, as documented in Chemical Engineering's review of the standards, is that a long-term average of up to about 12 alarms per hour is the most an operator can reasonably manage, and that a ten-minute window with more than 10 new alarms marks the start of an alarm flood, the condition where the annunciator stops being a tool and becomes noise.
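Those two thresholds are easy to check against an exported alarm journal. The sketch below is a minimal illustration, not a compliance tool: it assumes you already have a list of alarm timestamps, it counts floods in fixed ten-minute windows rather than a sliding window, and the function names and sample shift are ours.

```python
from datetime import datetime, timedelta

# Thresholds quoted in the text above.
MAX_AVG_ALARMS_PER_HOUR = 12
FLOOD_WINDOW = timedelta(minutes=10)
FLOOD_THRESHOLD = 10  # a flood is MORE than this many new alarms per window


def average_alarms_per_hour(timestamps, period_start, period_end):
    """Mean alarm rate over a reporting period, in alarms per hour."""
    hours = (period_end - period_start).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("period_end must be after period_start")
    in_period = [t for t in timestamps if period_start <= t < period_end]
    return len(in_period) / hours


def flood_windows(timestamps, period_start, period_end):
    """Start times of fixed ten-minute windows that exceed the flood threshold."""
    counts = {}
    for t in timestamps:
        if period_start <= t < period_end:
            index = (t - period_start) // FLOOD_WINDOW  # integer window number
            counts[index] = counts.get(index, 0) + 1
    return [
        period_start + i * FLOOD_WINDOW
        for i in sorted(counts)
        if counts[i] > FLOOD_THRESHOLD
    ]


# Hypothetical shift: one alarm every 30 minutes, plus a burst of 14 after 09:01.
start = datetime(2025, 1, 1, 6, 0)
end = start + timedelta(hours=8)
quiet = [start + timedelta(minutes=30 * i) for i in range(16)]
burst = [start + timedelta(hours=3, minutes=1, seconds=30 * i) for i in range(14)]
alarms = sorted(quiet + burst)

rate = average_alarms_per_hour(alarms, start, end)
floods = flood_windows(alarms, start, end)
print(rate, rate <= MAX_AVG_ALARMS_PER_HOUR, floods)
```

The sample shift shows why both numbers matter: the average over eight hours looks comfortable, yet one window is still a flood. A sliding window would catch bursts that straddle a boundary, which the fixed windows here can miss.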

Anyone who has stood in a control room during an upset knows what a flood feels like. The list scrolls faster than you can acknowledge it. The one alarm that matters — the seal pot losing level, say — sits three pages down under forty nuisance trips that all chained off the same root cause. So the copilot pitch lands: let the model read the flood, rank it, and tell the operator the probable root cause in a sentence. If it worked every time, it would be worth a great deal. The trouble is the phrase "every time."

It's worth being clear about what the standards actually ask for, because the copilot is often sold as a substitute for work the standards already prescribe. ISA-18.2 lays out an alarm-management lifecycle: identify, rationalise, classify, and document every alarm so that each one is genuine, actionable, and worth an operator's attention. A plant that has run that lifecycle — pruned the duplicates, suppressed the consequential trips, set deadbands properly — has already cut its flood at the source. A plant that hasn't is proposing to paste a probabilistic summariser over a problem that better engineering would have removed. The order of operations is the point. Rationalise the alarm system first; a copilot that summarises a rationalised system is helpful, but one that papers over a broken one just adds a second thing that can be wrong.

Where the demo skips: the model makes things up

Here's the part the procurement deck glosses. Language models confabulate. They produce confident, fluent, wrong answers, and they do it with exactly the same tone they use when they're right. This isn't a rough edge that better prompting sands off. NIST treats it as a defining hazard of the technology: its Generative AI Profile, NIST AI 600-1, published July 2024 as a companion to the AI Risk Management Framework, names confabulation as one of the risk categories specific to generative systems — outputs that are stated confidently but are false. A 2024 survey of LLM factuality out of MBZUAI and collaborators reaches the same blunt conclusion: instruction-tuned models frequently generate factually incorrect responses despite how convenient and fluent they are.

Think about what that means in the enclosure. A deterministic system — a PLC running ladder logic, an interlock wired to a relay — fails in ways you can trace. The logic is the logic. A language model has no such property. Ask it the same question twice and you can get two answers. Ask it about a failure mode that isn't well represented in its training or your documents, and it will not say "I don't know." It will give you a plausible paragraph. In a spreadsheet that's an annoyance. Standing in front of a furnace, a plausible paragraph about why the draft is dropping is worse than silence, because silence at least sends the operator back to the gauges.

And the failure compounds with how the answer is delivered. The model writes well. Fluency reads as competence to a tired human at 3 a.m., and a confident wrong answer is harder to catch than a hesitant one.

Retrieval helps, and your drawings are the bottleneck

The standard answer to confabulation is retrieval — ground the model in your own documents so it quotes the manual instead of inventing one. It's the right instinct, and it's why the connector layer matters more than the model. Point the copilot at the operating procedures, the equipment manuals, the maintenance history, and a question like "what's the trip setpoint on this compressor" stops being a guess and becomes a lookup. NIST's framework treats this kind of grounding and provenance as part of managing generative risk rather than an optional nicety.

But retrieval only moves the problem. It doesn't dissolve it. The model still chooses what to retrieve, still stitches passages together, and still phrases the result in its own words, which means it can misread a table, merge two pieces of equipment, or quote a revision of a procedure that was superseded three turnarounds ago. And the grounding is only as good as the documents underneath it. Walk into most plants and the P&IDs are a mix of a clean PDF set, a few marked-up prints taped to a desk, and tribal knowledge that lives in one fitter's head. Feed a model scanned drawings and it inherits every OCR error in them. So before anyone scopes a copilot, the honest first question isn't about the model. It's whether your document set is current, machine-readable, and trustworthy — because that, not the AI, is usually the long pole.
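One way to make "current, machine-readable, and trustworthy" concrete is to gate what reaches the retrieval index at all. The sketch below is an assumption about how such a gate might look, not a description of any product: the DocRevision fields, the document numbers, and the age limit are all hypothetical, and a real plant would pull them from its document control system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DocRevision:
    doc_id: str             # hypothetical document number
    revision: int
    issued: date
    machine_readable: bool  # False for scans with no verified text layer


def current_revisions(docs):
    """Keep only the highest revision of each document."""
    latest = {}
    for doc in docs:
        held = latest.get(doc.doc_id)
        if held is None or doc.revision > held.revision:
            latest[doc.doc_id] = doc
    return latest


def index_candidates(docs, today, max_age_days):
    """Split current revisions into ones to index and ones needing human review."""
    ready, review = [], []
    for doc in current_revisions(docs).values():
        stale = (today - doc.issued).days > max_age_days
        if stale or not doc.machine_readable:
            review.append(doc)
        else:
            ready.append(doc)
    return ready, review


# Hypothetical document set: a superseded procedure, its current revision,
# and an old scanned drawing.
docs = [
    DocRevision("SOP-114", 2, date(2019, 5, 1), True),
    DocRevision("SOP-114", 4, date(2024, 3, 1), True),
    DocRevision("PID-020", 1, date(2012, 9, 1), False),
]
ready, review = index_candidates(docs, today=date(2025, 1, 1), max_age_days=5 * 365)
print([d.doc_id for d in ready], [d.doc_id for d in review])
```

Superseded revisions never reach the index, so the model cannot quote them. Old or scanned documents go to a review pile instead of being silently trusted.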

This is also where the work pays back regardless of the copilot. Cleaning up the document and tag layer so a machine can read it is the same work that makes a control room legible to a new operator. The copilot is one consumer of that effort, not the reason for it.

The operator failure mode nobody scopes

So the real risk isn't only that the model is wrong. It's that the operator believes it. This has a name in human-factors work — automation bias — and it's the tendency to favour a machine's recommendation over your own read of the instruments, especially when you're tired, unsure, or under load. The conditions in a control room during an upset are precisely the conditions that produce it.

There's measurement here, not just theory. A 2024 study in International Studies Quarterly by Horowitz and Kahn ran a controlled experiment on when people defer to AI advice. According to their results, the deference is non-linear: people with the least AI background switched their answer to follow the AI roughly 19% of the time, those with the most switched about 22%, and the people in the middle — the ones with some familiarity, enough to trust it — peaked around 29%. The frightening cohort isn't the novice who distrusts the tool or the expert who checks it. It's the competent operator with moderate confidence who has learned the copilot is usually right. That's most of your shift.

Here's what surprised us on that visit, and it's why I keep coming back to this. The new operator wasn't the one we worried about. He didn't trust the screens yet, so he walked the line and read the local gauges. The risk sat with the experienced hand who'd been using a chatbot at home for a year, found it usually right, and had started treating "usually" as "always." A copilot doesn't only answer questions. It quietly reshapes who checks the instruments and who doesn't. That's a change to the control room that no one wrote into the scope.

Read-only first, or don't bother

None of this means the copilot has no place. It means you wire it like you'd wire any new node onto a plant network: assume it can fail, and make sure failure can't reach the process. The single most important design decision is the boundary between reading and writing.

A copilot that reads is a reference librarian. It pulls the manual, summarises the last six trips on a pump, surfaces the procedure for a startup. If it's wrong, the operator catches it against the gauges and the cost is a few wasted seconds. A copilot that writes — that pushes a setpoint, acknowledges an alarm, or touches a control loop — is a new actuator on your process with a non-deterministic brain behind it. Those are different machines with different risk profiles, and the gap between them is the whole game.

Capability	Read-only assistant	Write / action-taking
Worst-case failure	Bad advice the operator can check	Unsafe setpoint or action on live plant
Recovery	Ignore it, read the gauge	Interlock, trip, or incident
Network placement	Reads from historian / data diode	Needs a path into the control zone
Verification burden	Operator judgement	Formal safety case
Honest maturity today	Deployable with guardrails	Not yet, for most plants
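The read/write split in the table can also be enforced in software, as a second line behind the network design rather than a replacement for it. A minimal sketch, assuming the copilot reaches plant systems only through a catalogue of named tools: the tool names below are invented for illustration and do not correspond to any vendor's API.

```python
from enum import Enum


class Access(Enum):
    READ = "read"
    WRITE = "write"


# Hypothetical tool catalogue; every tool is classified before it is exposed.
TOOL_ACCESS = {
    "fetch_procedure": Access.READ,
    "summarise_trips": Access.READ,
    "write_setpoint": Access.WRITE,
    "acknowledge_alarm": Access.WRITE,
}


class ToolDenied(Exception):
    """Raised when the copilot requests a tool outside the read-only set."""


def authorise(tool_name):
    """Allow a tool call only if the tool is known and classified read-only."""
    access = TOOL_ACCESS.get(tool_name)
    if access is None:
        # Default deny: an unclassified tool is treated as unsafe.
        raise ToolDenied(f"unknown tool: {tool_name}")
    if access is not Access.READ:
        raise ToolDenied(f"write-capable tool blocked: {tool_name}")
    return tool_name


for requested in ("fetch_procedure", "write_setpoint", "tune_loop"):
    try:
        print("allowed:", authorise(requested))
    except ToolDenied as reason:
        print("denied:", reason)
```

The design choice that matters is the default. Anything the catalogue does not explicitly mark as read-only is refused, including tools nobody has classified yet.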

The network architecture follows from that boundary, and the standard to reach for is ISA/IEC 62443, the consensus series for industrial automation and control system security. Its model of zones and conduits is exactly the right frame here. The control system is a zone with its own security requirements. A copilot, with its model weights somewhere and its retrieval layer talking to a historian, does not belong in that zone. It belongs outside it, reading a copy of the data across a tightly defined conduit, ideally one that only flows one way. Least privilege isn't a slogan in 62443; it's the default posture — allow only the communication the process needs and block the rest.

In ISA-95 terms, keep the copilot at Level 3 or above, reading from the historian and the manufacturing systems, and keep it off Levels 1 and 2 where the controllers and instruments live. If you must give it visibility into live process values, give it a read-only feed over OPC UA with its own role and credentials — never a Modbus write register, never a shared engineering account. The protocols that move plant data were not designed assuming a probabilistic agent at the other end. Treat the copilot as untrusted until your architecture makes it harmless when it's wrong.
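Rules like these are simple enough to write down as a design-review check that runs before anything is wired. The sketch below is one reading of the paragraph above, not a 62443 assessment: the Connection fields, the system names, and the way the OPC UA rule is phrased are our assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    target: str              # hypothetical system name
    target_level: int        # ISA-95 level of the system being read
    protocol: str
    writable: bool
    dedicated_account: bool  # True if the copilot has its own role and credentials


def violations(copilot_level, connections):
    """List the ways a proposed copilot wiring breaks the rules in the text."""
    problems = []
    if copilot_level < 3:
        problems.append("copilot host sits below Level 3")
    for c in connections:
        if c.writable:
            problems.append(f"{c.target}: write access over {c.protocol}")
        if not c.dedicated_account:
            problems.append(f"{c.target}: shared account")
        if c.target_level < 3 and c.protocol != "OPC UA":
            problems.append(f"{c.target}: live values should arrive over OPC UA")
    return problems


# Hypothetical proposal: a clean historian feed and a shortcut into a controller.
proposed = [
    Connection("historian", 3, "OPC UA", writable=False, dedicated_account=True),
    Connection("dryer PLC", 1, "Modbus TCP", writable=True, dedicated_account=False),
]
for problem in violations(3, proposed):
    print(problem)
```

An empty list is the only acceptable result. The check does not prove the architecture is safe, but it does stop the obvious shortcuts from passing review unnoticed.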

The governance you'll have to show

Two years ago you could deploy a tool like this and tell no one. That's no longer true, and the change matters for whoever signs off. If a copilot starts informing how operators run a plant, it's an AI system doing real work, and there's now a stack of expectations around that.

The EU AI Act, Regulation (EU) 2024/1689, entered into force in 2024 with its obligations phasing in over the following years. Whether a given control-room assistant lands in its high-risk tier depends on the specific use, and that's a determination to make with counsel rather than from a blog. But the direction is set: an AI system that affects safety-relevant operation is the kind of system regulators now expect you to govern, document, and keep a human meaningfully in charge of. The Act's logic and the human-factors evidence point the same way — the operator has to stay the decision-maker, not the rubber stamp.

For the management side there's ISO/IEC 42001:2023, the first international standard for an artificial-intelligence management system, which gives you a recognised structure for governing AI tools the way ISO 9001 structures quality. And NIST's AI Risk Management Framework, released January 2023, lays out the Govern–Map–Measure–Manage loop that turns "we should be careful" into specific, auditable actions. None of these are heavy if you start early. They're a description of a well-run deployment: know what the tool does, measure whether it's right, keep a person accountable, and write it down. The plant that can show that has very little to fear from an audit. The plant that bolted a chatbot onto its historian over a weekend has a lot to explain.

What we'd actually put in

So where does this leave a real operations team weighing the pitch? Not at "no." At a narrower, more honest "yes."

Deploy the copilot read-only, against your documents and your historian, as a faster way to find what's already true. Let it answer "what's the startup procedure for this line," "what changed on this pump in the last month," "what does fault code E-217 mean on this drive." Those are retrieval questions with a checkable answer, and on those it earns its keep. It's the same kind of maintenance-support task where Siemens reports its Industrial Copilot saved an average of 25% of reactive maintenance time in early pilots, per its March 2025 announcement. That figure is a vendor's pilot number, not a guarantee for your plant, but the shape of the win is plausible: less time hunting through binders, more time on the repair itself.

Keep it out of the alarm-acknowledgement loop and off the setpoints until the technology and your safety case both mature, which for most plants is not today. Train operators on the failure mode, not just the feature — show them the model confabulating so they learn to distrust the fluent paragraph.

And measure the thing the way you'd measure any instrument. Before it goes live, build an acceptance test: a few hundred real questions with known, document-backed answers, scored for how often the copilot is right, how often it's confidently wrong, and how often it correctly admits it doesn't know. Run it again after every model or document change, because an update that helps general questions can quietly break a plant-specific one. NIST's Measure function exists for this — you don't get to claim a tool is trustworthy, you have to show it with numbers you can reproduce. Then watch the second-order effect in the control room itself: are operators still cross-checking the gauges, or has the copilot become the gauge? If acknowledgement times drop but near-misses tick up, the tool isn't helping, it's anaesthetising. That's a measurement you can only take on the floor.
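The acceptance test described above reduces to a small scoring harness. This is a sketch under stated assumptions: grading here is an exact string match, which a real test would replace with a human grader or a normalised comparison, and the Case structure, outcome names, and sample questions are ours.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

OUTCOMES = ("right", "confidently_wrong", "correct_abstain", "missed")


@dataclass(frozen=True)
class Case:
    question: str
    expected: Optional[str]  # None when the documents hold no answer
    answer: Optional[str]    # None when the copilot declined to answer


def outcome(case):
    """Classify one graded case."""
    if case.answer is None:
        return "correct_abstain" if case.expected is None else "missed"
    if case.expected is not None and case.answer == case.expected:
        return "right"
    return "confidently_wrong"


def score(cases):
    """Share of cases falling in each outcome."""
    if not cases:
        raise ValueError("no cases to score")
    counts = Counter(outcome(c) for c in cases)
    return {name: counts[name] / len(cases) for name in OUTCOMES}


def regressions(before, after):
    """Questions answered right before a change and not right after it."""
    was_right = {c.question for c in before if outcome(c) == "right"}
    return sorted(
        c.question for c in after
        if c.question in was_right and outcome(c) != "right"
    )


# Hypothetical runs before and after a model update; answers are placeholders.
before = [
    Case("q1", "answer A", "answer A"),
    Case("q2", "answer B", "answer X"),
    Case("q3", None, None),
    Case("q4", None, "invented answer"),
]
after = [
    Case("q1", "answer A", "answer Z"),
    Case("q2", "answer B", "answer B"),
    Case("q3", None, None),
    Case("q4", None, None),
]
print(score(before))
print(regressions(before, after))
```

The regression list is the part to watch. In the sample, the update fixes two cases and breaks one that used to be right, which a single accuracy figure would hide.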

Who owns it once it's running? That question decides as much as any architecture choice. A copilot that informs operations needs a named owner, a change log, and a retirement plan for the day a better model arrives — the same lifecycle discipline you'd apply to any instrument or control narrative. Treat it as a permanent unmonitored fixture and it will drift out of step with the plant it's meant to describe.

Building that measurement and the read-only telemetry path underneath it is the unglamorous part, and it's the part our edge telemetry and analytics platform exists to handle, so the model sits on a clean, governed feed rather than a tap improvised into the control network.

The copilot is a genuinely useful reference librarian for the control room. It is not yet an operator, and it is nowhere near an engineer. Wire it for the job it can do, fence it from the jobs it can't, and keep the human reading the gauges. The amber alarm on that dryer train still needed a person to walk over and look. A copilot could have told the new operator faster what it probably meant. It could not have told him it was right.
