Industrial Data Governance: Owning OT Data

What do we actually mean by industrial data governance?

Governance is the set of rules that decide who owns a piece of data, who may read or change it, how long it is kept, and how its quality is guaranteed. It's not the same as storing data, and it's not the same as locking it down. On a plant floor you already govern physical assets. Every valve carries a tag number, an owner, a maintenance history, and an access procedure. Industrial data governance applies that same discipline to the signals those assets throw off.

The confusion starts because the data is invisible. A flow transmitter produces a reading every second. That reading lands in a PLC register, gets pulled into a historian, copied into a shift report, fed to a model, and maybe shipped to an OEM's cloud for warranty analysis. At each hop someone is quietly deciding who owns the number and who gets to see it. Governance just makes those decisions explicit and writes them down before a dispute, an audit, or a breach forces the question.

So the working definition we use is narrow on purpose: ownership, access, retention, and quality, applied to operational data from the sensor up to the boardroom dashboard. Security is a neighbour, not a synonym. And the two only work when they are designed together, which we will get to.

Who owns the data coming off our machines?

Legally, it depends, and that answer used to be the end of the conversation. Raw machine data isn't protected by copyright the way a database or a piece of software is, and "ownership" in the property sense barely applies to a stream of numbers. What you really have is a tangle of contracts: the supply agreement with the equipment OEM, the service contract with your integrator, the terms of whatever cloud platform the telemetry flows into. Whoever wrote those terms first usually wrote them in their own favour.

That's changing in the EU. The Data Act, Regulation (EU) 2023/2854, applies from 12 September 2025 and gives the user of a connected product a direct right to access the data that product generates, and to share it with a third party of their choosing (European Commission, Data Act). For a plant that runs OEM-supplied compressors, turbines, or packaging lines, that's a real shift. The readings your own machines produce are no longer something the manufacturer can simply wall off behind a service portal.

But there's a catch worth knowing before you celebrate. The Data Act lets a data holder protect trade secrets and apply proportionate safeguards, so access isn't unconditional. And the "access by design" obligation, which requires products to be built so the data is reachable by default, only bites on products placed on the market after 12 September 2026. Older kit on your floor today may still need a contractual fight to open up. So read your service agreements now, while you have leverage at renewal, not after an incident.

How is governance different from cybersecurity? Do we need both?

You need both, and they fail in different ways. Security answers "can an attacker get in and what can they touch." Governance answers "who is allowed to touch it, who is accountable, and is the data fit to act on." A perfectly secured historian full of mislabelled, undated tags is a governance failure that no firewall will fix.

The standards bodies have started stitching the two together. When NIST released Cybersecurity Framework 2.0 on 26 February 2024, the headline change was a new Govern function added alongside Identify, Protect, Detect, Respond, and Recover. It puts strategy, roles, policy, and oversight at the centre of the model rather than treating them as an afterthought (NIST, 2024). Read that as the security world admitting what plant managers already knew: controls without accountability drift.

On the OT side, NIST SP 800-82 Revision 3, the Guide to Operational Technology Security published in September 2023, spends its length on exactly the constraints that make plant data hard to govern: availability and safety come first, latency budgets are tight, and you can't reboot a furnace to patch it (NIST SP 800-82r3). Governance has to live inside those constraints. A retention rule that throttles the control network, or a classification scheme that delays an alarm, is a rule that will be ignored within a week.

Where does governance actually start in the stack?

It starts at the tag, which means it starts with a model of the plant. The reference most operators already use is ISA-95, published internationally as IEC 62264, which organises an enterprise into levels that map cleanly onto the old Purdue model (ISA, ISA-95). Knowing which level a number was born at tells you almost everything about how to govern it.

ISA-95 level	What lives there	Governance concern
Level 0	The physical process: sensors and actuators	Calibration, units, timestamp source
Level 1	Sensing and manipulation: field devices, drives	Tag naming, signal quality flags
Level 2	Supervisory control: PLCs, DCS, SCADA	Who can write setpoints; read vs write rights
Level 3	Operations management: MES, MOM, historians	Contextualisation, retention, lineage
Level 4	Business planning: ERP, supply chain	Aggregation, reporting, external sharing

ISA-95 / IEC 62264 levels and where each governance concern bites.

Most governance debt is created at Level 1 and never repaid. A tag named FT_2103 with no engineering unit, no description, and a timestamp stamped by whatever clock the gateway happened to have is a liability the moment it leaves the panel. Fix it at the source and every downstream copy inherits the fix. Fix it in the data warehouse instead and you've signed up to fix it forever, once per report.

This is the unglamorous core of the work, and it's where our own crews spend most of their commissioning time before any model gets trained. Get the naming convention, the units, and the time base right at the edge, and governance higher up becomes bookkeeping. Skip it, and you're reconstructing meaning from memory three years later when the engineer who knew what FT_2103 meant has left.
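As a minimal sketch of what "fix it at the source" can look like, the check below refuses to treat a tag as governed until it carries a description, an engineering unit, and a named clock source. The `TagRecord` class, its field names, and the example description are assumptions made for illustration; they don't come from ISA-95 or any particular historian.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TagRecord:
    """Hypothetical edge-side record for one tag."""
    name: str
    description: Optional[str]
    unit: Optional[str]
    clock_source: Optional[str]  # which clock stamps the samples


def missing_metadata(tag: TagRecord) -> list[str]:
    """Return the governance fields that are absent or blank."""
    required = {
        "description": tag.description,
        "unit": tag.unit,
        "clock_source": tag.clock_source,
    }
    return [name for name, value in required.items()
            if not value or not value.strip()]


# The bare tag from the text: a name and nothing else.
bare = TagRecord("FT_2103", None, None, None)

# The same tag after commissioning (values are illustrative only).
documented = TagRecord("FT_2103", "Feed flow, line 2", "m3/h", "PLC clock, NTP-synced")

print(missing_metadata(bare))        # ['description', 'unit', 'clock_source']
print(missing_metadata(documented))  # []
```

Run as a commissioning gate, a check like this stops an undocumented tag before any downstream copy of it exists.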

How do we classify and control who sees what?

Borrow the architecture the security standard already gives you. ISA/IEC 62443, the horizontal standard series for industrial automation and control system security, asks you to break the plant into zones (groups of assets with shared protection needs) and conduits (the controlled channels between them). Its seven foundational requirements include data confidentiality and restricted data flow, which are governance ideas wearing a security badge (ISA/IEC 62443).

Practically, that means classifying data before you classify users. A reasonable starting taxonomy is three tiers: process telemetry that can stay inside the OT zone, contextualised operational data that can cross into the IT zone under controls, and aggregated reporting data that can leave the building. Each tier gets a default flow rule. And the default for the first tier is "nothing leaves without a reason," not "everything is shared unless someone objects." Defaults decide outcomes, because nobody re-reads the policy.
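The three tiers and their default flow rules can be sketched in a few lines. This is one possible encoding under stated assumptions: the `Tier` and `Zone` names, the `DEFAULT_REACH` table, and the `approved_exception` flag are all hypothetical, and a real scheme would record who approved the exception and why.

```python
from enum import Enum


class Tier(Enum):
    PROCESS_TELEMETRY = 1  # stays inside the OT zone
    OPERATIONAL = 2        # may cross into the IT zone under controls
    REPORTING = 3          # may leave the building


class Zone(Enum):
    OT = 1
    IT = 2
    EXTERNAL = 3


# Furthest zone each tier may reach by default.
DEFAULT_REACH = {
    Tier.PROCESS_TELEMETRY: Zone.OT,
    Tier.OPERATIONAL: Zone.IT,
    Tier.REPORTING: Zone.EXTERNAL,
}


def flow_allowed(tier: Tier, destination: Zone,
                 approved_exception: bool = False) -> bool:
    """Default-deny: data moves past its tier's reach only with a recorded reason."""
    if approved_exception:
        return True
    return destination.value <= DEFAULT_REACH[tier].value


print(flow_allowed(Tier.PROCESS_TELEMETRY, Zone.IT))   # False
print(flow_allowed(Tier.OPERATIONAL, Zone.IT))         # True
print(flow_allowed(Tier.REPORTING, Zone.EXTERNAL))     # True
```

The point of the design is that the permissive path needs an explicit argument, while the restrictive path is what you get by doing nothing.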

And keep write access on a far shorter leash than read access. Reading a setpoint is an analytics question. Writing one is a safety question. 62443 separates use control from confidentiality for this reason, and your governance register should too. The plant manager who can pull every trend in the historian should not, by the same credential, be able to push a value back to a Level 2 controller.
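A rough sketch of that separation keeps read rights and write rights as two independent sets, so that granting one never implies the other. The `Credential` class and the two example roles are assumptions for illustration, keyed on the ISA-95 level numbers used earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    """Hypothetical credential: read and write rights are tracked separately."""
    holder: str
    read_levels: frozenset = frozenset()
    write_levels: frozenset = frozenset()


def can_read(cred: Credential, level: int) -> bool:
    return level in cred.read_levels


def can_write(cred: Credential, level: int) -> bool:
    # Deliberately does not consult read_levels: reading never implies writing.
    return level in cred.write_levels


plant_manager = Credential("plant_manager", read_levels=frozenset({2, 3, 4}))
control_engineer = Credential(
    "control_engineer",
    read_levels=frozenset({2, 3}),
    write_levels=frozenset({2}),
)

print(can_read(plant_manager, 2))      # True
print(can_write(plant_manager, 2))     # False
print(can_write(control_engineer, 2))  # True
```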

Resist the urge to make the taxonomy clever. Three tiers that everyone understands beat nine tiers that only the author understands. A scheme is only as good as the operator who has to apply it at three in the morning during an upset, so the test of a classification isn't whether it looks complete on a slide. It's whether the night-shift engineer can place a new tag in the right tier without phoning anyone. If they can't, the scheme is decoration, and decoration doesn't survive contact with a running plant.

How do we share data with vendors and OEMs without leaking the plant?

This is the question that gets governance funded, because everyone has felt the pull both ways. The OEM's predictive-maintenance model genuinely works better with live data. The plant genuinely can't hand a third party an open pipe into the control network. You resolve it with transport, scope, and direction, in that order.

For transport, OPC UA is the workhorse, and it was built with a security model rather than having one bolted on later. It defines three message security modes (None, Sign, and SignAndEncrypt), authenticates clients and servers with X.509 certificates, and establishes session keys so that traffic can be both signed for integrity and encrypted in transit (OPC Foundation, Part 2 Security Model). Its information model also lets you publish a machine's data with consistent semantics through companion specifications, so the OEM consumes "motor temperature in degrees C" rather than guessing what AI_07 means.

For scope and direction, give the vendor a read-only view of a named subset, served from a Level 3 zone, never a connection into Level 2. A data diode or a tightly governed conduit beats a VPN that quietly grants more than anyone audited. Where the EU framework helps is the Data Governance Act, Regulation (EU) 2022/868, applicable since 24 September 2023, which created a class of regulated data intermediation services meant to broker exactly this kind of sharing in a neutral, trusted way (European Commission, Data Governance Act). You don't have to use one, but the model is a useful template even for a bilateral deal: a clear intermediary, explicit terms, no silent secondary use.
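To make "a read-only view of a named subset" concrete, here is a small sketch in plain Python. It only illustrates the scoping idea; it is not an OPC UA server, and the function name, tag names, and values are hypothetical. In practice the enforcement would sit in the Level 3 server or conduit, not in application code.

```python
from types import MappingProxyType


def vendor_view(snapshot: dict, allowed_tags: set) -> MappingProxyType:
    """Return a read-only mapping holding only the tags named in the agreement."""
    subset = {tag: snapshot[tag] for tag in allowed_tags if tag in snapshot}
    return MappingProxyType(subset)


# Hypothetical Level 3 snapshot; names and values are made up.
snapshot = {
    "motor_temp_degC": 71.5,
    "setpoint_pressure_bar": 6.2,
    "AI_07": 3.3,
}

view = vendor_view(snapshot, {"motor_temp_degC"})
print(dict(view))  # {'motor_temp_degC': 71.5}

try:
    view["motor_temp_degC"] = 0.0  # any write attempt is rejected
except TypeError:
    print("write rejected")
```

The vendor sees one named signal, copied out of the plant's own snapshot, and has no handle through which to push anything back.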

How do we keep the data trustworthy over time?

Trust comes from metadata, not from good intentions. Every governed signal should carry, at minimum, an engineering unit, a source identity, a quality flag, and a timestamp you can defend. Was that timestamp set when the sensor sampled, or when the record finally hit the database after a network stall? On a fast loop the difference reorders events and quietly poisons any model trained on them. So pin the time base at the lowest level you can.
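The reordering problem is easy to show with two samples. In this sketch, which uses invented tag names and timestamps, a valve command is sampled first but reaches the database late because of a network stall. Sorting by storage time puts the effect before the cause.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    tag: str
    value: float
    source_ts: float  # seconds; set when the sensor sampled
    stored_ts: float  # seconds; set when the record hit the database


samples = [
    # Sampled first, but delayed 0.9 s by a network stall.
    Sample("valve_cmd", 1.0, source_ts=10.00, stored_ts=10.90),
    # Sampled second, stored almost immediately.
    Sample("flow", 42.0, source_ts=10.20, stored_ts=10.25),
]

by_source = [s.tag for s in sorted(samples, key=lambda s: s.source_ts)]
by_stored = [s.tag for s in sorted(samples, key=lambda s: s.stored_ts)]

print(by_source)  # ['valve_cmd', 'flow']
print(by_stored)  # ['flow', 'valve_cmd']
```

A model trained on the second ordering would learn that flow changes before the valve moves, which is why the record should carry the source timestamp and say which clock produced it.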

Lineage is the other half. When a number lands on a director's dashboard, can you trace it back through every transformation to the transmitter that produced it? If a regulator, an auditor, or your own incident review asks "where did this figure come from," the honest answer should take minutes, not a week of spreadsheet archaeology. A disciplined historian and a consistent information model earn their cost right here. Our own work building out an edge telemetry and analytics platform keeps coming back to the same point: a model is only as honest as the lineage of the data feeding it, and most plants discover their lineage gaps the first time a number is challenged.
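One way to picture lineage is as a map from each derived figure to the inputs it was computed from. The sketch below walks that map upstream until it reaches signals with no parents. The node names and the `LINEAGE` dictionary are hypothetical; a real plant would pull this from its historian or information model rather than hand-write it.

```python
# Hypothetical lineage map: each node lists the inputs it was derived from.
LINEAGE = {
    "dashboard.daily_throughput": ["mes.hourly_flow_avg"],
    "mes.hourly_flow_avg": ["historian.FT_2103"],
    "historian.FT_2103": ["plc.FT_2103"],
    "plc.FT_2103": [],  # no parents: this is the source signal
}


def trace_to_sources(node: str, lineage: dict) -> list:
    """Walk upstream from a figure to the source signals behind it."""
    parents = lineage.get(node)
    if parents is None:
        # An undocumented hop is exactly the gap an audit will find.
        raise KeyError(f"lineage gap: no record for {node!r}")
    if not parents:
        return [node]
    sources = []
    for parent in parents:
        for source in trace_to_sources(parent, lineage):
            if source not in sources:
                sources.append(source)
    return sources


print(trace_to_sources("dashboard.daily_throughput", LINEAGE))
# ['plc.FT_2103']
```

The failure case matters as much as the success case: a missing entry raises immediately, so the gap is found during a routine check and not when a number is challenged.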

Retention deserves a real policy, not a default of "keep everything." High-resolution process data is cheap to generate and expensive to keep at full fidelity forever. Decide, per data class, how long you hold raw values, when you downsample, and when you delete. Storage that grows without a rule becomes a liability under both data-protection law and basic discovery risk. And the deletion rule matters as much as the keep rule. A clear, defensible retention schedule is the difference between "we removed it on policy" and a far worse conversation about why a number nobody could explain was still sitting on a server years after it stopped meaning anything.
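A per-class retention rule reduces to two thresholds and three outcomes. The sketch below assumes a hypothetical `RetentionRule` with placeholder durations; the 90-day and five-year figures are there only to make the example run and are not recommendations.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RetentionRule:
    keep_raw: timedelta      # hold full-resolution values this long
    delete_after: timedelta  # total age at which the data is removed


def retention_action(age: timedelta, rule: RetentionRule) -> str:
    """Decide what happens to data of a given age under one class's rule."""
    if age >= rule.delete_after:
        return "delete"
    if age >= rule.keep_raw:
        return "downsample"
    return "keep_raw"


# Placeholder values for one data class, for illustration only.
process_telemetry = RetentionRule(
    keep_raw=timedelta(days=90),
    delete_after=timedelta(days=5 * 365),
)

print(retention_action(timedelta(days=10), process_telemetry))    # keep_raw
print(retention_action(timedelta(days=200), process_telemetry))   # downsample
print(retention_action(timedelta(days=2000), process_telemetry))  # delete
```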

How does governance connect to the AI we want to run on the plant?

Directly, because a learning model is a governance amplifier. Feed it clean, well-labelled, well-timed data and it produces decisions you can defend. Feed it the contents of an ungoverned historian and it learns your labelling mistakes, then repeats them at machine speed across every line. The model doesn't know that FT_2103 drifted out of calibration in March; it only knows the numbers it was given.

So the governance work pays for itself twice. The first return is the obvious one: cleaner audits, clearer ownership, fewer arguments about whose number is right. The second is that the same metadata (units, quality flags, timestamps, and lineage) is exactly what a model needs to be trustworthy. A plant that has done the governance groundwork can deploy analytics in weeks. One that hasn't spends those weeks reconstructing what its own tags mean, and tends to ship a model nobody trusts. Governance isn't a tax on analytics. It's the substrate they run on.

What does a workable program look like for a mid-sized plant?

Start with a register, not a platform. List your significant data flows, and for each one name an owner, a classification, an access rule, and a retention period. A plant with a few hundred meaningful tag groups can do the first pass in a couple of workshops with the people who actually run the lines. The register is the program. The tooling comes after, to enforce what the register already decided.
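The register can start as something this small. The sketch below assumes a hypothetical `RegisterEntry` with one field per decision named above, plus a helper that lists which decisions are still open. The example flow and owner are invented.

```python
from dataclasses import dataclass, fields


@dataclass
class RegisterEntry:
    """One row of a hypothetical data-flow register."""
    flow: str
    owner: str = ""
    classification: str = ""
    access_rule: str = ""
    retention: str = ""


def gaps(entry: RegisterEntry) -> list:
    """Return the fields that nobody has decided yet."""
    return [f.name for f in fields(entry)
            if not getattr(entry, f.name).strip()]


# A first-workshop entry: owner and class agreed, the rest still open.
entry = RegisterEntry(
    flow="Line 3 filler telemetry to OEM cloud",
    owner="Packaging area lead",
    classification="operational",
)

print(gaps(entry))  # ['access_rule', 'retention']
```

A spreadsheet does the same job. What matters is that an empty cell is visible and has a name attached to it.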

Assign two roles even if they sit on one person at first. A data owner is accountable for a domain and makes the call on who gets access and why. A data steward does the daily work: naming, quality checks, fixing the broken tags. Without an owner, access requests stall or get rubber-stamped. And without a steward, quality rots. These aren't new full-time hires for most plants; they're responsibilities written next to names that already exist on the org chart.

Then tie it to the regime you already answer to. Essential and important entities in the EU sit under the NIS2 Directive, and the cybersecurity measures for several entity types are set out in Commission Implementing Regulation (EU) 2024/2690. ENISA's NIS360 2024 assessment found the electricity subsector among the most mature, while gas and several other sectors sit in what it calls the risk zone, with maturity lagging the criticality of the assets. It also flagged operational technology as a persistent weak point across the board (ENISA, NIS360 2024). Governance is how you turn that obligation into something a plant can actually run, rather than a binder nobody opens.

Where do most governance efforts go wrong?

The common failure is treating it as an IT project handed to people who have never stood in a control room. The rules come back elegant and unworkable: a classification scheme with nine tiers, an approval workflow that adds a day to every analytics request, a retention policy that ignores the safety case for keeping certain trends. So the plant routes around all of it, and you are back to invisible, ungoverned data, now with a binder that lies about it.

The second failure is collecting before deciding. It's tempting to historise every tag at full resolution "in case we need it," then sort out governance later. But later never comes, and you've built a large, undocumented liability that gets harder to classify the bigger it grows. Decide the rule for a data class before you open the firehose, not after.

The third is forgetting that governance isn't a project with an end date. Plants change. A new line, a new OEM contract, a new model in production all create data flows the register has never seen. If nobody owns the register, it's stale within a quarter. So pick a cadence, review it like you review a safety system, and keep the ownership real. Governance that isn't maintained is just documentation of how things used to be.
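A review cadence is simple to check mechanically. As a sketch, the function below lists register entries whose last review is older than the chosen cadence. The flow names, dates, and the 90-day interval are assumptions picked to mirror "stale within a quarter"; pick whatever interval matches your own review cycle.

```python
from datetime import date, timedelta


def overdue(last_reviewed: dict, today: date, cadence: timedelta) -> list:
    """Return the flows whose last review is older than the cadence."""
    return sorted(flow for flow, reviewed in last_reviewed.items()
                  if today - reviewed > cadence)


# Hypothetical register extract: flow name -> date of last review.
last_reviewed = {
    "historian_to_mes": date(2024, 1, 10),
    "oem_cloud_feed": date(2024, 5, 1),
}

print(overdue(last_reviewed, today=date(2024, 6, 1), cadence=timedelta(days=90)))
# ['historian_to_mes']
```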

None of this is exotic. It's the same accountability you already apply to physical assets, pointed at the signals they produce, and written down before someone else writes the rules for you.
