Vibration Analysis Meets Machine Learning, End to End

Done right, a vibration-plus-machine-learning setup gives you three things off one accelerometer: a named fault (outer race, inner race, rolling element, cage, imbalance, misalignment, looseness), a severity grade against a known scale, and a lead time measured in days to weeks before the bearing actually lets go. That's the output. The rest of this is how the signal gets from a bolt on a housing to that call, and where the pipeline quietly breaks if you're not watching.

What the pipeline actually produces

A working system does two jobs. It classifies the fault, so the planner knows whether to order a bearing or recheck alignment, and it estimates how long you've got, so the work goes into the next shutdown instead of an unplanned one. The estimate is the harder half. A localized bearing defect can announce itself in the high-frequency band long before it shows up in overall velocity or heat, and that gap is exactly the window you need if you want to trade reactive work for planned work.

So what does the call look like on a screen? Something like: outer-race defect, severity in the upper acceptable band and climbing, recommend replacement inside the next two to three weeks, with the envelope spectrum attached as evidence. That's the whole point of bolting machine learning onto vibration data. Not a prettier dashboard, but a fault name, a grade, and a date a planner can schedule against.

None of this replaces the standards engineers already trust. ISO 17359 still defines the procedure for setting up a condition-monitoring programme and pointing it at real failure modes, and ISO 20816 still defines how you grade broadband vibration severity (ISO 17359:2018). Machine learning sits on top of that frame. It doesn't get to ignore it.

Step 1: The sensor, and where you bolt it

Almost every bearing-fault pipeline starts with an accelerometer on or near the bearing housing. Placement and mounting decide your ceiling before any model exists. A magnet or a glued pad rolls off the high-frequency response; a stud into a flat, clean spot keeps it. Get this wrong and the resonances that carry the earliest defect energy never reach the digitizer, so no amount of clever modeling recovers them.

Sampling rate is the other hard limit. You have to clear Nyquist for the highest frequency you care about, and for bearings that's not shaft speed, it's the structural resonances the defect impacts excite, which live up in the kilohertz range. And there's a trap inside that limit: any energy above half your sample rate folds back down as aliases that look exactly like real, lower-frequency lines. So an analog anti-aliasing filter ahead of the digitizer isn't optional, and the sample rate has to leave headroom above the filter's corner. Skip the filter to save a part and you'll spend the next month chasing a fault frequency that doesn't exist.

The two reference rigs everyone benchmarks against make the point. The Case Western Reserve University bearing data center ran a 2 hp Reliance Electric motor from 1797 down to 1720 rpm under 0 to 3 hp of load, seeded single-point faults from 0.007 in to 0.040 in by electro-discharge machining on the inner race, rolling element, and outer race, and logged drive-end acceleration at both 12 kHz and 48 kHz (CWRU Bearing Data Center). The run-to-failure set in NASA's Prognostics Center of Excellence repository, supplied by the Center for Intelligent Maintenance Systems at the University of Cincinnati, spun four bearings at 2000 rpm under a 6000 lb radial load and sampled at 20 kHz, capturing one-second snapshots on a fixed interval until the bearings failed (NASA PCoE Data Set Repository). Note the speeds: both shafts turn at roughly 30 Hz, yet both rigs sample in the tens of kilohertz. Even a slow shaft needs fast sampling, because the diagnostic content sits well over an order of magnitude above the running frequency.
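The folding described above is simple arithmetic, and it's worth running once to see how convincing an alias looks. The sketch below is illustrative only: the function names are made up for this example, and the 2.56 margin is an assumed rule of thumb, not a figure from the rigs or standards cited here.

```python
def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Frequency at which a tone at f_hz appears after sampling at fs_hz
    with no anti-aliasing filter in front of the digitizer."""
    folded = f_hz % fs_hz               # sampling cannot tell f from f + k*fs
    return min(folded, fs_hz - folded)  # reflect into the 0..fs/2 band


def min_sample_rate(f_max_hz: float, margin: float = 2.56) -> float:
    """Sample rate that clears f_max_hz with headroom for the filter roll-off.

    margin=2.56 is an assumed rule of thumb; 2.0 is the bare Nyquist limit
    and leaves no room for a real filter's transition band.
    """
    return margin * f_max_hz


# A 13 kHz resonance sampled at 20 kHz shows up as a 7 kHz line.
print(alias_frequency(13_000, 20_000))   # 7000
# Content already below fs/2 stays where it is.
print(alias_frequency(3_000, 20_000))    # 3000
```

Nothing in the sampled data distinguishes that 7 kHz line from a real one, which is why the fix has to be analog and has to sit ahead of the converter.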

So before anyone talks models, fix three things on the wall: a stud-mounted accelerometer at a real load-bearing location, a sample rate that clears the resonance band with margin, and a tachometer or speed reference. That last one matters more than it looks, and Step 3 explains why.

Step 2: From raw acceleration to features

Raw acceleration is mostly noise to a human and, untreated, to a model too. The signal-processing layer turns it into descriptors. ISO 13374 names this structure cleanly: a Data Acquisition block converts the transducer output to a digital parameter, and a Data Manipulation block runs the signal analysis and computes the meaningful descriptors that everything downstream reads (ISO 13374-1:2003). Keep that separation. It's what lets you swap a model later without re-plumbing the whole stack.

Three feature families do most of the work. Time-domain statistics come first: RMS, peak, crest factor, kurtosis. Kurtosis is sensitive to the impulsiveness a fresh spall adds, and it's cheap enough to run at the edge on every record. The spectrum is next: an FFT of the raw signal shows shaft orders, blade pass, gear mesh, and electrical lines. It's useful for imbalance, misalignment, and looseness, but weaker for early bearing pits. And then there's the envelope spectrum, the one that earns its keep for bearings. You band-pass the signal around a structural resonance, take the amplitude envelope, then FFT that envelope. Each pass of a rolling element over a defect is a tiny impact that rings the resonance, so the envelope's repetition rate exposes the defect even when its energy is buried in the broadband spectrum.
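Here is a minimal NumPy sketch of the time-domain statistics and the envelope spectrum described above. It makes simplifying assumptions that a production pipeline probably wouldn't: the band-pass is a brick-wall mask in the frequency domain instead of a designed filter, the envelope comes from an FFT-based analytic signal, and the function names and the synthetic test signal are invented for illustration.

```python
import numpy as np


def time_domain_features(x):
    """RMS, peak, crest factor, and kurtosis of one vibration record."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove DC offset first
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    # Non-excess kurtosis: about 3 for Gaussian noise, higher when impulsive.
    kurtosis = np.mean(x ** 4) / np.mean(x ** 2) ** 2
    return {"rms": rms, "peak": peak,
            "crest_factor": peak / rms, "kurtosis": kurtosis}


def envelope_spectrum(x, fs, band):
    """Band-pass around a resonance, take the envelope, FFT the envelope.

    band is (low_hz, high_hz) with low_hz > 0. Returns (freqs_hz, magnitude).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.fft(x - x.mean())
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    # Keep only positive frequencies inside the band, doubled. That is the
    # band-pass and the analytic-signal (Hilbert) step in a single mask.
    keep = (freqs >= band[0]) & (freqs <= band[1])
    analytic = np.fft.ifft(np.where(keep, 2.0 * spectrum, 0.0))
    envelope = np.abs(analytic)
    magnitude = np.abs(np.fft.rfft(envelope - envelope.mean())) / n
    return np.fft.rfftfreq(n, d=1.0 / fs), magnitude


# Synthetic stand-in for a defect: a 3 kHz resonance whose amplitude is
# modulated at a 100 Hz "defect" rate, sampled at 20 kHz for one second.
fs = 20_000
t = np.arange(fs) / fs
x = (1.0 + 0.8 * np.cos(2 * np.pi * 100 * t)) * np.sin(2 * np.pi * 3000 * t)

f_env, mag = envelope_spectrum(x, fs, band=(2500, 3500))
print(f_env[np.argmax(mag)])          # strongest envelope line, near 100 Hz
print(time_domain_features(x))
```

The raw spectrum of that signal shows energy only around 3 kHz. The 100 Hz repetition rate appears once you look at the envelope, which is the whole reason the technique exists.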

Which family catches which fault isn't a matter of taste. It follows the physics of how each fault moves the machine:

Fault	Dominant signature	Feature that catches it
Imbalance	Strong 1x shaft-speed line, radial	Raw spectrum
Misalignment	2x line, axial component	Raw spectrum
Mechanical looseness	Many shaft-speed harmonics	Raw spectrum
Early bearing defect	High-frequency impacts at defect frequency	Envelope spectrum, kurtosis
Advanced bearing wear	Raised broadband floor	RMS velocity, ISO 20816 zone
Gear tooth fault	Mesh frequency with sidebands	Envelope and cepstrum

The envelope rates aren't arbitrary. Ball pass frequency outer race, ball pass frequency inner race, ball spin frequency, and fundamental train frequency are deterministic functions of bearing geometry and shaft speed. Compute them from the bearing's number of rollers, contact angle, and pitch and ball diameters, then look for energy at those exact lines and their harmonics and sidebands. A peak at the outer-race frequency points at the outer race. A peak at the inner-race frequency, modulated at shaft speed, points at the inner race. That physics is why these features beat raw waveforms for diagnosis: they encode what's actually rotating.
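The standard kinematic formulas behind those four rates fit in a few lines. This sketch assumes the usual textbook case: outer race stationary, inner race turning with the shaft, and pure rolling with no slip. Real lines usually land slightly off the computed values because of slip, so search a small band around each one. The geometry in the usage example is illustrative, not taken from a specific datasheet.

```python
import math


def bearing_frequencies(shaft_hz, n_rollers, ball_d, pitch_d, contact_deg=0.0):
    """Characteristic defect frequencies in Hz.

    ball_d and pitch_d only need to share a unit (mm or inches).
    Assumes a fixed outer race, rotating inner race, and no slip.
    """
    ratio = (ball_d / pitch_d) * math.cos(math.radians(contact_deg))
    return {
        # Fundamental train (cage) frequency
        "FTF": 0.5 * shaft_hz * (1 - ratio),
        # Ball pass frequency, outer race
        "BPFO": 0.5 * n_rollers * shaft_hz * (1 - ratio),
        # Ball pass frequency, inner race
        "BPFI": 0.5 * n_rollers * shaft_hz * (1 + ratio),
        # Ball spin frequency
        "BSF": 0.5 * (pitch_d / ball_d) * shaft_hz * (1 - ratio ** 2),
    }


# Illustrative geometry: 9 balls, 7.94 mm ball, 39.04 mm pitch diameter,
# zero contact angle, shaft turning at 30 Hz (1800 rpm).
for name, hz in bearing_frequencies(30.0, 9, 7.94, 39.04).items():
    print(f"{name}: {hz:.1f} Hz")
```

Two quick sanity checks fall out of the algebra: BPFO plus BPFI always equals the roller count times shaft speed, and BPFO is the roller count times FTF. If your computed lines fail either check, the geometry inputs are wrong.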

Step 3: Baselines and alarms before any model learns

This is the step rollouts skip and regret. Before a classifier sees a single example, you need a baseline of healthy behavior and an alarm scheme grounded in a standard. ISO 20816-1 sets out how to measure and evaluate broadband machine vibration and grades it into severity zones, from newly commissioned through acceptable for long-term running, then not acceptable for continuous operation, then high enough to cause damage (ISO 20816-1:2016). Those zones give you a defensible first alarm with zero training data, and a sanity check the model has to agree with. And that sanity check is worth more than it sounds, because a standards-based alarm is something an auditor and an OEM both already accept.
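A zone check like that needs no training at all, as the sketch below shows. The zone letters A to D follow ISO 20816-1's naming. The boundary values in the example are placeholders invented for illustration and are not taken from any standard: the real limits depend on machine type, size, and support, and have to come from the applicable part of the standard or from the OEM.

```python
from bisect import bisect_right

ZONES = ("A", "B", "C", "D")   # A: newly commissioned ... D: may cause damage


def severity_zone(rms_velocity_mm_s, boundaries_mm_s):
    """Map a broadband RMS velocity to a severity zone.

    boundaries_mm_s holds the three ascending limits A/B, B/C, and C/D.
    A reading exactly on a limit is graded into the higher zone.
    """
    limits = list(boundaries_mm_s)
    if len(limits) != 3 or limits != sorted(limits):
        raise ValueError("expected three ascending zone boundaries")
    return ZONES[bisect_right(limits, rms_velocity_mm_s)]


# PLACEHOLDER limits for illustration only, not from any standard.
example_limits = (2.0, 4.0, 8.0)
for reading in (1.2, 3.1, 5.5, 9.0):
    print(reading, severity_zone(reading, example_limits))
```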

Speed is the reason the tachometer mattered in Step 1. Bearing fault frequencies scale with rpm, so on a variable-speed drive the diagnostic lines move. Without a speed reference you either restrict analysis to steady-state windows or resample the signal into the angular domain (order tracking) so the lines stand still. Skip that on a VFD-driven machine and your envelope spectrum smears every fault into mush.
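Order tracking can be sketched with two interpolations. This version assumes a once-per-revolution tachometer pulse and treats shaft speed as constant within each revolution, which is a simplification; the function name, the 64 samples per revolution, and the synthetic speed ramp are all made up for the example.

```python
import numpy as np


def order_track(signal, fs, pulse_times, samples_per_rev=64):
    """Resample a time-domain record onto a uniform shaft-angle grid.

    pulse_times: timestamps (s) of a once-per-rev tach pulse, ascending.
    Returns the signal sampled at equal angle steps, so an FFT of the
    result is indexed in orders (cycles per revolution), not Hz.
    """
    signal = np.asarray(signal, dtype=float)
    pulse_times = np.asarray(pulse_times, dtype=float)
    revs = np.arange(len(pulse_times))           # shaft angle at each pulse
    uniform_revs = np.arange(0, revs[-1], 1.0 / samples_per_rev)
    # When did the shaft reach each uniform angle? Linear between pulses.
    resample_times = np.interp(uniform_revs, revs, pulse_times)
    sample_times = np.arange(len(signal)) / fs
    return np.interp(resample_times, sample_times, signal)


# Synthetic run-up from 20 Hz to 30 Hz over 4 s with a 3x-per-rev component.
fs = 5000
t = np.arange(4 * fs) / fs
angle = 20.0 * t + 1.25 * t ** 2                 # shaft angle in revolutions
x = np.sin(2 * np.pi * 3 * angle)
k = np.arange(int(angle[-1]) + 1)                # whole revolutions reached
pulses = (-20.0 + np.sqrt(400.0 + 5.0 * k)) / 2.5

tracked = order_track(x, fs, pulses, samples_per_rev=64)
orders = np.fft.rfftfreq(len(tracked), d=1.0 / 64)
spectrum = np.abs(np.fft.rfft(tracked))
print(orders[np.argmax(spectrum)])               # peak lands near order 3
```

In the time domain that component sweeps from 60 Hz to 90 Hz and smears across the spectrum. In the angle domain it collapses to a single line, and the bearing defect orders do the same.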

Set the baseline per machine, per operating point, not per fleet. Two nominally identical pumps on different foundations, suctions, and duty cycles have different healthy signatures. ISO 17359's whole approach is to direct monitoring at the credible failure modes for that asset and set alarm criteria against its own normal, not a generic table (ISO 17359:2018). Get the baseline right and the model's job shrinks to telling apart faults that all sit above alarm. Get it wrong and you've built a very expensive way to generate false alerts.
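One way to hold a per-asset, per-operating-point baseline is a table keyed on both, as in the sketch below. The record layout, the asset names, and the mean-plus-three-sigma alarm rule are assumptions made for illustration; ISO 17359 doesn't prescribe this particular statistic.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_baselines(healthy_records):
    """healthy_records: iterable of (asset_id, operating_point, value)
    collected while the machine was known to be healthy."""
    grouped = defaultdict(list)
    for asset_id, operating_point, value in healthy_records:
        grouped[(asset_id, operating_point)].append(value)
    # Need at least two readings per key to estimate a spread.
    return {key: (mean(vals), stdev(vals))
            for key, vals in grouped.items() if len(vals) >= 2}


def is_abnormal(baselines, asset_id, operating_point, value, k=3.0):
    """True when value exceeds this asset's own normal by k sigma.
    k=3.0 is an assumed starting point, to be tuned per site."""
    mu, sigma = baselines[(asset_id, operating_point)]
    return value > mu + k * sigma


# Two nominally identical pumps with different healthy RMS levels (mm/s).
history = [("pump-A", "full-flow", v) for v in (1.0, 1.1, 0.9, 1.0)]
history += [("pump-B", "full-flow", v) for v in (2.0, 2.2, 1.8, 2.0)]
baselines = build_baselines(history)

# The same 2.0 mm/s reading means different things on each machine.
print(is_abnormal(baselines, "pump-A", "full-flow", 2.0))   # True
print(is_abnormal(baselines, "pump-B", "full-flow", 2.0))   # False
```

A fleet-wide threshold would either miss pump A's doubling or alarm constantly on pump B.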

Step 4: The model

Now the learning layer. Two questions, two model shapes. Classification answers "what's wrong" by mapping a feature vector or a windowed signal to a fault label. Remaining-useful-life estimation answers "how long" by regressing time-to-failure from a degradation trend, which is why run-to-failure data like the IMS set matters: you can't learn a life curve from snapshots of healthy and broken with nothing in between.

So why is the life estimate so much harder than the fault label? Because degradation rarely climbs in a clean line. A bearing can sit in a stable defected state for weeks, then accelerate once the spall starts shedding debris that seeds further damage. A useful RUL model tracks a health indicator (rising RMS, rising kurtosis, growing envelope-peak energy) against a failure threshold and projects the trend, but it has to express uncertainty, because the projection is only as good as the assumption that today's rate of change holds. An honest system reports a window, not a single day, and widens that window when the trend is noisy.
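The simplest possible version of "project the trend and report a window" is a straight-line fit with an error band on the slope, sketched below. It is deliberately naive: it assumes the recent trend is linear and keeps its current rate, which is exactly the assumption the paragraph above warns about. The function name, the two-sigma band, and the sample data are illustrative choices, not a recommended prognostic method.

```python
import math

import numpy as np


def rul_window(times_days, indicator, threshold, z=2.0):
    """Project a health indicator to a failure threshold.

    Returns days remaining as earliest / nominal / latest, using the fitted
    slope plus and minus z standard errors. math.inf means the projected
    trend never reaches the threshold.
    """
    t = np.asarray(times_days, dtype=float)
    y = np.asarray(indicator, dtype=float)
    n = len(t)
    if n < 3:
        raise ValueError("need at least three points for a trend and its scatter")
    slope, intercept = np.polyfit(t, y, 1)
    residuals = y - (slope * t + intercept)
    # Standard error of the slope: noisier trends give a wider window.
    slope_se = math.sqrt(float(residuals @ residuals) / (n - 2)
                         / float(np.sum((t - t.mean()) ** 2)))
    level_now = slope * t[-1] + intercept

    def days_until(rate):
        if level_now >= threshold:
            return 0.0
        if rate <= 0:
            return math.inf
        return (threshold - level_now) / rate

    return {"earliest": days_until(slope + z * slope_se),
            "nominal": days_until(slope),
            "latest": days_until(slope - z * slope_se)}


days = list(range(11))
noisy_rms = [1.0 + 0.1 * d + 0.05 * (-1) ** d for d in days]
print(rul_window(days, noisy_rms, threshold=3.0))
```

Feed it a cleaner trend and the three numbers converge; feed it a noisier one and they spread apart, which is the behaviour you want the planner to see.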

The field has moved through three phases, and Yaguo Lei and colleagues mapped them in their 2020 review and roadmap in Mechanical Systems and Signal Processing: classical machine learning on hand-engineered features, then deep learning that learns features from the signal directly, and a near-term push toward transfer learning so a model trained on one machine or condition can carry over to a related one (Lei et al., 2020). That arc isn't academic trivia. It tells you where to spend. On a single well-instrumented critical asset with good physics features, a classical model (support-vector machine, random forest, gradient boosting) on envelope and time-domain features is often enough, cheaper to run, and far easier to explain to a reliability engineer. Deep models start to pay when you have many machines, raw high-rate data, and faults that resist hand-engineered features. The same review flags the standing obstacles: not enough labeled fault data, class imbalance (healthy hours vastly outnumber fault hours), and models that don't transfer across machines or operating conditions.
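Class imbalance is the obstacle you can at least partly address in a few lines. One common mitigation, sketched here, is to weight each class inversely to its frequency so the rare fault records aren't drowned out during training. The formula matches the "balanced" heuristic scikit-learn uses for class weights; the label names and counts are invented, and weighting is only one option alongside resampling or collecting more fault data.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight per class = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Illustrative label set: healthy records vastly outnumber fault records.
labels = ["healthy"] * 90 + ["outer_race"] * 10
print(balanced_class_weights(labels))
# healthy gets about 0.56, outer_race gets 5.0
```

Most classical learners accept per-class or per-sample weights, so this drops in without changing the model.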

Whatever the model, train on real features with a clear provenance and keep the Data Manipulation outputs in front of it. The public datasets are for prototyping and method comparison; the UCI Machine Learning Repository and similar archives are fine for that (UCI Machine Learning Repository). Your deployed model has to learn from your machines.

Where most rollouts go wrong

Here's the failure that gets buried in a slide deck reading 99% accuracy. That number is almost always measured wrong, and it doesn't survive contact with your plant.

A 2025 study by João Paulo Vieira and colleagues took a hard look at how bearing-fault models are evaluated and found the headline accuracies inflated by methodology, not skill. Sequential vibration data from a degrading bearing contains highly similar, near-duplicate segments; when you split that data randomly into train and test, near-twins land on both sides, the model memorizes instead of generalizing, and reported accuracy soars. Under evaluation protocols that block this leakage, the numbers drop sharply, revealing that benchmark results overstate real diagnostic reliability (Vieira et al., 2025). So treat any single accuracy figure with suspicion until you know how the data was split. We see the same thing in practice: a model that aced a vendor's benchmark, then folded the first time the plant ran a load case it hadn't seen.

Two rules keep you honest. First, split by time or by operating run, never randomly across a degradation sequence, so the test set is genuinely unseen. Second, assume domain shift is the default, not the exception. A model trained at one speed, load, and mounting will degrade on another machine, which is the precise problem transfer learning is meant to address and the reason a per-asset baseline from Step 3 is your backstop when the model is uncertain. Trust the standards-based alarm when the classifier disagrees with the velocity zone; that disagreement is usually the model failing, not physics.
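The first rule is easy to enforce in code: hold out whole runs, not random segments. In the sketch below the segment layout and the `run_id` field are hypothetical; use whatever identifies one continuous degradation sequence or operating run in your own data.

```python
def split_by_run(segments, test_runs):
    """Hold out entire runs so no run contributes to both train and test.

    segments: list of dicts, each carrying a 'run_id' key.
    test_runs: run ids reserved for evaluation.
    """
    held_out = set(test_runs)
    train = [s for s in segments if s["run_id"] not in held_out]
    test = [s for s in segments if s["run_id"] in held_out]
    return train, test


# Four consecutive, near-duplicate segments from each of three runs.
segments = [{"run_id": run, "index": i}
            for run in ("run-1", "run-2", "run-3") for i in range(4)]

train, test = split_by_run(segments, test_runs=["run-3"])
train_runs = {s["run_id"] for s in train}
test_runs = {s["run_id"] for s in test}
print(train_runs & test_runs)   # set(): no run appears on both sides
```

A random shuffle of those twelve segments would almost certainly put neighbours from the same run on both sides, which is the leakage the paragraph above describes.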

Step 5: Deployment, advisories, and the loop

The output has to land as an action, not a dashboard nobody reads. ISO 13374 again gives the shape: State Detection compares new data to the baseline and decides which abnormality zone it falls in, Health Assessment rates current condition and diagnoses the fault, Prognostic Assessment projects remaining life, and Advisory Generation turns all of that into a recommended action with the supporting evidence (ISO 13374-1:2003). An advisory that says "outer-race defect, severity rising, recommend replacement within two to three weeks, here's the envelope spectrum" is something a planner can act on. A raw anomaly score isn't.
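As a data structure, an advisory is just the outputs of the earlier blocks carried together with the evidence. The sketch below is one hypothetical shape for it: the class, its field names, and the asset tag are invented for illustration, and ISO 13374 defines the functional blocks, not this layout.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Advisory:
    asset: str
    zone: str                       # from State Detection
    fault: str                      # from Health Assessment
    rul_days: Tuple[float, float]   # window from Prognostic Assessment
    evidence: str                   # what the planner can inspect

    def render(self) -> str:
        """One line a planner can schedule against."""
        earliest, latest = self.rul_days
        return (f"{self.asset}: {self.fault}, severity zone {self.zone}; "
                f"recommend replacement within {earliest:.0f} to {latest:.0f} days. "
                f"Evidence: {self.evidence}")


advisory = Advisory(asset="P-101", zone="C", fault="outer-race defect",
                    rul_days=(14, 21), evidence="envelope spectrum")
print(advisory.render())
```

Keeping the window as two numbers, not one, carries the prognostic uncertainty all the way to the person scheduling the work.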

Architecturally, push the cheap, high-rate work to the edge and keep the expensive, slow-changing work central. Time-domain statistics, envelope features, and a first-pass classifier run fine on a gateway next to the machine, which cuts bandwidth and keeps a fast local alarm alive if the link drops. Heavier model training, fleet baselining, and life-curve fitting belong on the platform. That division is the backbone of how we build an edge telemetry and analytics platform: features and a light model at the edge, retraining and prognostics centrally, advisories back to the people who turn the wrenches.

Then close the loop. Every confirmed fault and every teardown is a label. Feed the outcome (what the bearing actually showed when it came out) back into the training set so the model improves on your machines and your failure modes, not a 2 hp lab motor's. A vibration-and-ML programme isn't a model you install once. It's a baseline, a standard, a feature pipeline, and a learning loop that gets sharper every time a bearing comes out on the bench and tells you whether the call was right.
