How to Measure ROI on Industrial AI

One number frames every industrial AI conversation we have: $1.4 trillion. That's the combined annual cost of unplanned equipment downtime across Fortune Global 500 companies, according to Siemens' True Cost of Downtime 2024 study.

According to the same Siemens study, that loss equals roughly 11% of those firms' total revenue, up from 8% before the pandemic, and a stalled line in a large automotive plant can run to $2.3 million an hour. So the prize is enormous and well documented. The problem is measuring how much of it AI actually claws back.

Here's the uncomfortable companion figure. In McKinsey's State of AI survey published in November 2025, just 39% of organizations reported any enterprise-level EBIT impact from AI, and most of those said it accounted for less than 5% of EBIT. McKinsey also found that roughly two-thirds of respondents hadn't yet scaled AI across the business at all. The technology is no longer the bottleneck. The measurement is.

This piece is about closing that gap on a plant floor: how to baseline before you deploy, what each value lever is honestly worth, how to attribute a result to a model rather than to the weather, and what the full cost of ownership looks like once you count the parts nobody budgets for. Every load-bearing figure below traces to a government, standards, or named vendor source, listed at the end.

Why the ROI is hard to see

Industrial AI rarely produces a single, clean line item. A vibration model doesn't write you a cheque. It changes when a bearing gets replaced, which changes how often a line stops, which changes throughput, scrap, and overtime. The value is genuine but diffuse, scattered across maintenance, production, and quality budgets that each have their own owner.

Three things make the measurement genuinely difficult.

First, the counterfactual. To claim a saving you have to know what would have happened without the model. A pump that didn't fail this quarter is not, by itself, evidence the model worked. Maybe it wouldn't have failed anyway.

Second, the baseline. Most plants can't state last year's unplanned downtime, scrap rate, or specific energy use to a defensible number. If you can't measure the before, you can't measure the after. And we see this constantly: the business case gets built on a remembered figure, not a recorded one.

Third, confounding. Output goes up the same quarter you commission a model, change a supplier, retrain operators, and hit a warmer-than-usual month. Untangling AI's share from everything else is the whole job, and it's where most ROI claims quietly fall apart.

But none of this is a reason to skip the math. It's a reason to do it with the same rigour you'd apply to a capital project, because that's what it is.

Measure the plant before you model it

The first deliverable of an AI project should be a measurement system, not a model. This isn't a new idea and it isn't AI-specific. The U.S. National Institute of Standards and Technology built its AI Risk Management Framework (AI RMF 1.0, released January 2023) around four functions: Govern, Map, Measure, and Manage. The Measure function exists precisely because you can't manage what you haven't quantified. NIST calls for quantitative and qualitative metrics, applied continuously, because systems drift as live data diverges from training data.

On the floor, that translates to a few concrete steps before any model goes live:

Instrument the baseline. Record the target metric — unplanned downtime hours, first-pass yield, kWh per tonne, OEE — for a representative period, ideally a full production cycle including seasonal swings. A quarter is thin. A year is defensible.
Define the metric to the decimal. "Downtime" means nothing until you've fixed whether it counts changeovers, micro-stops under five minutes, and planned maintenance. Two engineers will give you two numbers otherwise.
Capture the cost basis. What is an hour of downtime worth on this line, in lost contribution margin rather than revenue? What does a point of scrap cost in material plus reprocessing plus the energy already spent on it?
Log the confounders. Ambient temperature, feedstock grade, crew, product mix. You'll need these later to defend the result.
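
To make the "define the metric" step concrete, here is a minimal Python sketch of a downtime definition written down as code rather than left to memory. The Stop and DowntimeDefinition names, the default five-minute threshold, and the sample stops are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stop:
    minutes: float
    planned: bool = False      # planned maintenance window
    changeover: bool = False   # product changeover


@dataclass(frozen=True)
class DowntimeDefinition:
    """Hypothetical, explicit rules for what counts as unplanned downtime."""
    count_planned: bool = False
    count_changeovers: bool = False
    micro_stop_threshold_min: float = 5.0  # shorter stops are excluded


def unplanned_downtime_hours(stops, definition):
    total_min = 0.0
    for s in stops:
        if s.planned and not definition.count_planned:
            continue
        if s.changeover and not definition.count_changeovers:
            continue
        if s.minutes < definition.micro_stop_threshold_min:
            continue
        total_min += s.minutes
    return total_min / 60.0


# Made-up stop log for one shift.
stops = [
    Stop(120),
    Stop(3),
    Stop(45, changeover=True),
    Stop(240, planned=True),
    Stop(30),
]

strict = DowntimeDefinition()
loose = DowntimeDefinition(count_changeovers=True, micro_stop_threshold_min=0.0)

print(unplanned_downtime_hours(stops, strict))  # 2.5
print(unplanned_downtime_hours(stops, loose))   # 3.3
```

The same five stops give 2.5 hours under one definition and 3.3 under the other, which is the two-engineers problem in miniature.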

This work is unglamorous, and it's where most of the value gets decided. A model deployed onto a plant that already knows its numbers can prove its worth. A model deployed onto a plant that doesn't will generate dashboards nobody believes.

What each value lever is actually worth

Industrial AI pays off through a small set of levers, and each one demands a different measurement. The table below is a measurement-design checklist: for each lever, the primary metric you'd track, how you convert it to money, and the cleanest way to attribute a change to the model. Fill this in before you deploy, not after.

Value lever	Primary metric	Cost conversion	Attribution method
Predictive maintenance	Unplanned downtime hours; reactive-to-planned work order ratio	Lost contribution margin per stopped hour; emergency parts and overtime	Asset-level holdout or staged rollout
Quality / yield	First-pass yield; scrap and rework rate	Material + reprocessing + sunk energy per defect	Line-by-line A/B, same product mix
Throughput	OEE; effective rate vs. nameplate	Saleable output at realised price	Before/after with logged confounders
Energy	Specific energy use (kWh per tonne)	Energy price × tonnage	Weather- and load-normalised baseline

The maintenance lever is the most rigorously documented, so it's worth grounding. According to the U.S. Department of Energy's Operations & Maintenance Best Practices guidance, maintained by Pacific Northwest National Laboratory, a functioning predictive maintenance program saves 8% to 12% over a preventive program. Measured against a mostly reactive baseline — run it till it breaks — a facility could, per DOE, "easily recognize savings opportunities exceeding 30% to 40%." Those are condition-monitoring numbers that predate the current AI wave, which matters. AI doesn't invent the saving. It extends the reach of condition monitoring to assets that were never economic to instrument by hand.

The quality and yield lever is often larger than maintenance in process plants, but harder to source cleanly because it's so site-specific. Treat the public figures with care here. McKinsey's respondents reported 10–20% cost reductions in manufacturing and supply chain use cases, but that's self-reported survey data, not audited results, and it's an analyst figure rather than a primary measurement. So use it as a sanity check on your own business case, not as a target to copy.

Notice what the table forces. Every lever has a named metric and a named attribution method. And if you can't fill the row, you can't honestly claim the lever — that blank cell is the tell that the value is a guess.

Leading and lagging indicators

Maintenance value is lumpy, which trips up a lot of business cases. The lagging indicator — failures that didn't happen — shows up as nothing visible, punctuated by the occasional dramatic catch. So a quarter of "no failures" proves almost nothing on its own; the asset might simply have had a quiet quarter.

The fix is to track leading indicators alongside the lagging one. How many true alerts did the model raise, how many were acted on, and how many were false? What was the lead time between alert and the maintenance window? A model that reliably flags a degrading bearing two weeks out is delivering value you can see immediately, long before the avoided-failure tally accumulates enough events to be statistically convincing. Pair the two and you can argue the ROI in month three instead of waiting for year two.
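
As a rough illustration, the leading indicators named above can be computed from a simple alert log. This is a sketch under assumptions: the Alert fields, the sample dates, and the choice of a median for lead time are all hypothetical, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class Alert:
    raised_at: datetime
    acted_on: bool
    confirmed_true: bool                 # inspection confirmed real degradation
    maintenance_at: Optional[datetime]   # when the work was actually done


# Made-up alert log for one asset group.
alerts = [
    Alert(datetime(2025, 3, 1), True, True, datetime(2025, 3, 15)),
    Alert(datetime(2025, 3, 4), True, False, datetime(2025, 3, 6)),
    Alert(datetime(2025, 3, 10), False, True, None),
    Alert(datetime(2025, 3, 12), True, True, datetime(2025, 3, 22)),
]


def leading_indicators(log):
    raised = len(log)
    acted = sum(a.acted_on for a in log)
    true_alerts = sum(a.confirmed_true for a in log)
    # Lead time only counts true alerts that reached a maintenance window.
    lead_days = [
        (a.maintenance_at - a.raised_at).days
        for a in log
        if a.acted_on and a.confirmed_true and a.maintenance_at is not None
    ]
    return {
        "raised": raised,
        "acted_on_rate": acted / raised if raised else 0.0,
        "true_alert_rate": true_alerts / raised if raised else 0.0,
        "median_lead_days": median(lead_days) if lead_days else None,
    }


summary = leading_indicators(alerts)
print(summary)
```

Reported monthly, these figures give finance something to look at while the avoided-failure count is still too small to mean much.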

The false-alarm rate deserves its own line in the report, because it has a direct cost. Every nuisance alert sends a technician to a healthy asset, burns a maintenance window, and chips away at the operators' trust in the system. A model with a great catch rate and poor precision can still be a net loss once you price the wasted call-outs. So track precision and recall together, convert both to hours and money, and report the net. A number that only celebrates the saves and hides the false trips isn't an ROI figure. It's marketing.
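
A small Python sketch of that netting, assuming a flat avoided cost per true catch and a flat cost per false call-out. Both cost figures and the alert counts are invented for illustration; a real plant would use its own agreed conversion factors.

```python
def alert_economics(true_alerts, false_alerts, missed_failures,
                    avoided_cost_per_catch, cost_per_false_callout):
    """Precision, recall, and the net value of an alerting model."""
    raised = true_alerts + false_alerts
    actual_failures = true_alerts + missed_failures
    precision = true_alerts / raised if raised else 0.0
    recall = true_alerts / actual_failures if actual_failures else 0.0
    gross_saving = true_alerts * avoided_cost_per_catch
    false_alarm_cost = false_alerts * cost_per_false_callout
    return {
        "precision": precision,
        "recall": recall,
        "gross_saving": gross_saving,
        "false_alarm_cost": false_alarm_cost,
        "net": gross_saving - false_alarm_cost,
    }


# Same catch rate (8 of 10 failures), very different false-alarm load.
tuned = alert_economics(8, 40, 2, avoided_cost_per_catch=20_000,
                        cost_per_false_callout=1_500)
noisy = alert_economics(8, 120, 2, avoided_cost_per_catch=20_000,
                        cost_per_false_callout=1_500)

print(tuned["recall"], tuned["net"])   # 0.8 100000
print(noisy["recall"], noisy["net"])   # 0.8 -20000
```

Both hypothetical models catch 80% of failures, but the noisy one ends up 20,000 in the red once the wasted call-outs are priced in.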

Building the business case

A defensible industrial AI business case has three parts: the benefit side, the cost side, and the attribution method that connects a measured change to the model. Most cases get the first, half the second, and skip the third entirely.

The benefit side

Express benefits in contribution margin, not revenue, and in the currency the metric naturally produces. Avoided downtime converts through your per-hour margin figure. Reduced scrap converts through material plus reprocessing plus sunk energy. Yield improvement converts through saleable output at the realised price. Keep every conversion factor explicit and documented, because finance will challenge each one, and they should.
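
The three conversions can be sketched in a few lines of Python. Every factor and volume below is a made-up placeholder; the point is only that each factor is named, explicit, and kept in one place where finance can challenge it.

```python
# Hypothetical conversion factors, to be agreed with finance and versioned.
FACTORS = {
    "margin_per_downtime_hour": 12_000.0,    # lost contribution margin, not revenue
    "material_cost_per_scrap_unit": 20.0,
    "reprocessing_cost_per_scrap_unit": 6.0,
    "sunk_energy_cost_per_scrap_unit": 4.0,
    "realised_price_per_unit": 35.0,
}


def annual_benefit(downtime_hours_avoided, scrap_units_avoided,
                   extra_saleable_units, factors=FACTORS):
    """Convert measured metric changes into money, one lever at a time."""
    downtime = downtime_hours_avoided * factors["margin_per_downtime_hour"]
    scrap = scrap_units_avoided * (
        factors["material_cost_per_scrap_unit"]
        + factors["reprocessing_cost_per_scrap_unit"]
        + factors["sunk_energy_cost_per_scrap_unit"]
    )
    yield_gain = extra_saleable_units * factors["realised_price_per_unit"]
    return {
        "downtime": downtime,
        "scrap": scrap,
        "yield": yield_gain,
        "total": downtime + scrap + yield_gain,
    }


benefit = annual_benefit(downtime_hours_avoided=40,
                         scrap_units_avoided=1_500,
                         extra_saleable_units=2_000)
print(benefit)
```

Keeping the per-lever breakdown in the output, rather than only the total, makes it easier to see which conversion factor a disputed number depends on.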

Be conservative on the benefit and honest about its shape. As the lumpiness point above shows, you need enough observation time to see the pattern before you annualise a saving. Extrapolating a full-year number from one lucky month is the fastest way to lose credibility with the people who sign off capital.

The cost side

The total cost of ownership for an industrial model runs well past the data-science effort. We routinely see budgets that count the model and forget the plant. A full accounting includes:

Sensing and connectivity: ruggedized sensors, edge gateways, and the cabling or wireless to move data off the line, often via OPC-UA or Modbus into a historian.
Integration: wiring telemetry into existing control and MES layers across the ISA-95 stack. This is usually the largest and most underestimated line.
Data engineering: cleaning, aligning timestamps, and labelling. Industrial data is messy, and labels for rare failures are scarce by definition.
MLOps and retraining: monitoring for drift, retraining as process conditions move, and the people who own that loop. A model is a perishable asset, not a fixed one.
OT security: connecting previously isolated assets expands the attack surface. The ISA/IEC 62443 series exists for exactly this, and conformance work is a recurring cost, not a one-off.
Governance and compliance: documentation, impact assessment, and audit, which "The costs people forget" covers below.
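
A minimal Python sketch of that accounting, splitting each line into an upfront and a recurring part over a planning horizon. The CostLine structure, the three-year horizon, and every figure are hypothetical placeholders chosen only to show the shape of the sum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostLine:
    name: str
    upfront: float   # one-off spend
    annual: float    # recurring spend per year


# Invented figures for illustration only.
cost_lines = [
    CostLine("model development", 90_000, 0),
    CostLine("sensing and connectivity", 120_000, 10_000),
    CostLine("integration", 200_000, 15_000),
    CostLine("data engineering", 80_000, 20_000),
    CostLine("MLOps and retraining", 30_000, 60_000),
    CostLine("OT security", 40_000, 25_000),
    CostLine("governance and compliance", 20_000, 20_000),
]


def total_cost_of_ownership(lines, years):
    return sum(line.upfront + line.annual * years for line in lines)


tco = total_cost_of_ownership(cost_lines, years=3)
model_only = cost_lines[0].upfront
print(tco)                          # 1030000
print(round(model_only / tco, 3))   # 0.087
```

In this made-up example the model itself is under a tenth of the three-year total, which is the gap a model-only budget hides.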

The attribution method

This is the part that separates a measured ROI from a hopeful one. The cleanest approach an industrial setting allows is a holdout: run the model on some lines, assets, or shifts and not others, then compare. Where a true holdout isn't possible — a single critical asset, say — use a before-and-after with the confounders you logged earlier as controls, and state the uncertainty plainly. A staged rollout, line by line, gives you a natural comparison and de-risks the deployment at the same time.
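
One common way to read a holdout comparison is a difference-in-differences: the change on the lines with the model, minus the change on the lines without it. The article does not prescribe this estimator; it is swapped in here as a simple sketch, and the monthly downtime figures are invented.

```python
from statistics import mean


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change on treated lines minus change on holdout lines.

    A negative result means downtime fell more where the model ran.
    """
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical unplanned downtime, hours per month.
treated_before = [40, 44, 42]   # lines that later got the model
treated_after = [30, 32, 28]
control_before = [41, 39, 40]   # holdout lines, no model
control_after = [36, 38, 37]

effect = diff_in_diff(treated_before, treated_after,
                      control_before, control_after)
print(effect)  # -9
```

A naive before-and-after on the treated lines would claim 12 hours a month. The holdout lines improved by 3 hours on their own, so only 9 are attributable to the model under this sketch's assumptions, chiefly that both groups would otherwise have moved in parallel.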

So ask the awkward question before finance does: if we'd spent this money on better preventive schedules and operator training instead, how much of the gain would we have captured anyway? The honest answer is rarely zero. The AI-attributable share is what's left after you subtract it.

The costs people forget

Two cost categories get systematically underbudgeted, and both have grown sharply.

The first is the upkeep of the models themselves. A deployed model drifts as feedstock, equipment, and process setpoints change. Left unmonitored it decays silently, and a decayed model is worse than none because it carries false authority. NIST's AI RMF is blunt that measurement isn't a one-time gate; it's continuous, because the gap between training data and live data only widens. So budget for the loop, not just the launch.
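
As one example of what monitoring the loop can mean in practice, the sketch below computes a Population Stability Index (PSI) between training-time and live readings of a single sensor feature. PSI is one widely used drift statistic chosen here for illustration; it is not something NIST or this article prescribes, and the bin edges and readings are invented.

```python
import math


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index over fixed bin edges.

    0 means identical binned distributions; larger values mean more drift.
    eps keeps empty bins from producing log(0) or division by zero.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Bin index = number of edges the value has reached or passed.
            counts[sum(v >= e for e in edges)] += 1
        n = len(values)
        return [max(c / n, eps) for c in counts]

    e_shares, a_shares = shares(expected), shares(actual)
    return sum((a - e) * math.log(a / e) for a, e in zip(a_shares, e_shares))


edges = [10, 20, 30]                        # hypothetical amplitude bins
training = [5, 12, 14, 18, 22, 25, 27, 35]  # readings the model was trained on
live_same = list(training)
live_shifted = [15, 22, 26, 28, 31, 33, 36, 38]

print(psi(training, live_same, edges))
print(psi(training, live_shifted, edges))
```

What value should trigger a retrain is a site-specific judgment. The useful habit is computing the statistic on a schedule and recording it next to the business metric.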

The second is compliance, and the numbers here are no longer hypothetical. The EU Artificial Intelligence Act entered into force on 1 August 2024. Its prohibitions and AI-literacy provisions applied from 2 February 2025, the obligations for general-purpose AI from 2 August 2025, and the obligations for high-risk systems are due to apply from 2 August 2026. Industrial AI used as a safety component of regulated machinery can fall into the high-risk tier, which brings requirements for risk management, data governance, logging, human oversight, and conformity assessment.

And the penalties have teeth. According to Article 99 of the Act, the fines are tiered: up to €35 million or 7% of worldwide annual turnover for prohibited practices, up to €15 million or 3% for breaching most other obligations, and up to €7.5 million or 1% for supplying incorrect or misleading information. For a plant operator the cost usually isn't the fine itself; it's the documentation and assessment work needed to stay clear of it.

Alongside the law sits a voluntary standard worth naming. ISO/IEC 42001:2023, published in December 2023, is the first certifiable management-system standard for AI. It won't reduce your model's error rate, but it gives you an auditable structure for the governance the AI Act expects, and that structure has a real cost in time and process. Count it in the business case rather than discovering it during an audit.

How the number gets inflated

Most overstated ROI claims fail in one of a few predictable ways, and knowing them is the cheapest insurance you can buy.

Counting gross instead of net. The model caught a failure, but the team would have caught a third of those failures on the old route inspection too. Subtract what the baseline practice would have delivered.

Annualising a short run. One good quarter becomes a four-times-bigger headline number. So hold the claim until the observation window covers a representative cycle.

Ignoring the run cost. The benefit gets reported gross while the sensing, integration, retraining, and governance costs sit in someone else's budget. Net them against the gain or the ROI is fiction.

Crediting AI for a process fix. Sometimes the real win was the new vibration sensor or the discipline of finally writing maintenance procedures down. The model rode along. That's fine, but call it what it is.
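
The first and third of these corrections are simple arithmetic, sketched below in Python. The function names, the straight-line amortisation of upfront cost, and all figures are assumptions for illustration, not a standard ROI definition.

```python
def headline_roi(gross_benefit, annual_run_cost):
    """The inflated version: gross benefit against run cost only."""
    return (gross_benefit - annual_run_cost) / annual_run_cost


def net_annual_roi(gross_benefit, baseline_capture_share,
                   annual_run_cost, upfront_cost, amortisation_years):
    """Net version: subtract what the old practice would have caught anyway,
    and charge the full annual cost including amortised upfront spend."""
    attributable = gross_benefit * (1 - baseline_capture_share)
    annual_cost = annual_run_cost + upfront_cost / amortisation_years
    return (attributable - annual_cost) / annual_cost


# Hypothetical project figures.
gross = 300_000      # value of every failure the model flagged
baseline = 0.25      # share route inspections would have caught anyway
run_cost = 75_000
upfront = 300_000

print(headline_roi(gross, run_cost))                      # 3.0
print(net_annual_roi(gross, baseline, run_cost, upfront,
                     amortisation_years=4))               # 0.5
```

Same project, same year: a 300% headline becomes a 50% net return once the baseline practice and the full cost are counted. That is still worth having, but it is a different conversation with finance.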

Who owns the number

ROI doesn't measure itself, and the most common failure mode isn't technical. It's that nobody owns the figure. The data team owns model accuracy. Maintenance owns work orders. Production owns throughput. Finance owns the P&L. The AI saving lives in the gaps between them, so it falls to no one and gets reported by everyone, each with a different number.

The teams that prove value tend to do one thing differently: they name a single owner for the ROI calculation before the project starts, and they agree the baseline and the cost-conversion factors with finance up front. That sounds bureaucratic. It's the opposite. It means the result is settled before the model runs, so the post-deployment conversation is about a number both sides already accepted rather than a negotiation conducted after the fact, when everyone has a stake in the answer.

Get finance into the room early for a second reason too. The people who approve capital are the people who decide whether the next AI project gets funded. A maintenance engineer convinced the model works isn't enough. The win has to be legible to the person holding the budget, in their units, against a baseline they signed off on. We've watched good deployments stall not because they failed but because nobody could show, in finance's own terms, that they'd succeeded.

And there's a governance dimension here that overlaps neatly with the AI Act and ISO/IEC 42001 work. Naming an owner, recording a baseline, and documenting the measurement method are exactly the artefacts an auditor will ask for. So the discipline that makes your ROI defensible is largely the same discipline that makes your compliance defensible. Do it once.

A measurement framework you can run

Pulling this together into something a plant team can operate, the framework has four moving parts that map onto the NIST Measure function.

Pick the metric before the model. One primary metric per use case, defined to the decimal, with a recorded baseline and a known cost-conversion factor. If you can't state the baseline, you're not ready to deploy.
Instrument for attribution. Design the rollout so a comparison exists: a holdout line, a staged deployment, or at minimum a clean before-and-after with logged confounders.
Account for the whole cost. Sensing, integration, data engineering, MLOps, OT security to ISA/IEC 62443, and governance to the AI Act and ISO/IEC 42001. The model is a fraction of the total.
Monitor continuously. Track model performance and the business metric together, watch for drift, and retrain on a schedule. ROI measured once at go-live is a snapshot of a moving system.
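
For the fallback in the second step, a before-and-after with logged confounders, one simple option is to fit the baseline relationship between the metric and a confounder, then compare post-deployment readings against what that baseline predicts. The sketch below does this for specific energy use against ambient temperature with a hand-rolled least-squares line. The single confounder, the linear form, and all readings are simplifying assumptions.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical baseline period: ambient temperature (C) vs kWh per tonne.
base_temp = [10, 15, 20, 25, 30]
base_kwh = [100, 105, 110, 115, 120]

# Hypothetical post-deployment period, which happened to be warmer.
post_temp = [28, 30, 32]
post_kwh = [112, 113, 116]

slope, intercept = fit_line(base_temp, base_kwh)
expected = [slope * t + intercept for t in post_temp]
adjusted_saving = sum(e - a for e, a in zip(expected, post_kwh)) / len(post_kwh)
naive_change = sum(post_kwh) / len(post_kwh) - sum(base_kwh) / len(base_kwh)

print(round(naive_change, 2))      # 3.67
print(round(adjusted_saving, 2))   # 6.33
```

In this invented example the raw comparison says energy use got worse by about 3.7 kWh per tonne, while the temperature-adjusted view shows a saving of about 6.3. The warmer month hid the gain, which is why the confounders need logging before deployment.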

What the data implies for the next two to three years

Three trends in the numbers point the same way.

Adoption is climbing but value is concentrated. According to the U.S. Census Bureau's Business Trends and Outlook Survey, the largest real-time read on the question, the share of firms using AI to produce goods or services rose from about 4.6% at the start of 2024 to roughly 10% by September 2025, with broader any-function measures higher still — figures the Bureau set out in its AI Use at U.S. Businesses analysis and the Federal Reserve tracked in April 2026. Yet McKinsey's roughly 6% of "high performers" capture most of the measured EBIT impact. So the spread between adopters and value-getters is widening, and measurement discipline is much of what separates them.

Costs are shifting from models to plumbing. The model is increasingly a commodity; the baseline data, the integration into ISA-95 layers, the OT security, and the governance are where the money and the differentiation now sit. Plants that already instrumented themselves will deploy faster and prove value more cleanly than plants starting from a blank historian.

Governance is becoming a measured cost rather than an afterthought. With high-risk AI Act obligations due in August 2026 and ISO/IEC 42001 certification spreading, the documentation and assessment work moves onto the balance sheet. The plants that treat measurement and governance as part of the system, rather than as friction bolted on at the end, will be the ones that can state their AI ROI with a straight face.

That instrumented baseline and continuous-measurement layer is exactly what an edge telemetry and analytics platform is for: ruggedized sensing, edge telemetry, and learning models tied to a metric finance recognises. The point isn't the model itself but the proof — the before-and-after that shows what the model did. Start from the measurement question rather than from a model looking for a problem.

The honest summary is short. The trillion-dollar downtime number says the opportunity is genuine. The 39% EBIT number says most operators can't yet prove they're capturing it. The difference between those two figures isn't better algorithms. It's the unglamorous discipline of measuring the plant before you model it, accounting for the whole cost, and attributing the result to the model rather than to luck. Get that right and the return is there to be banked. Skip it and you'll own dashboards instead of returns.
