There is quite a bit of chatter about predictive maintenance today. Certainly, it seems that many people are in on this conversation and trying to distil the valuable information from the hype.
Fortunately, we can use our listening skills to filter out the parts of the conversation that are meaningful.
Recently, an academic group [1] made some efforts in this direction, using ubiquitous Deep Learning methods to classify different fault types in fan bearings equipped with vibration sensors. They made use of an online data set [2] whose creators “seeded” the bearings with faults of different sizes at different locations, then collected sensor measurements and made them openly available in MATLAB binary format.
The authors of [1] used the data set [2] to do standard “data science” work in building a machine-learning classifier.
While the data set is primarily useful for classifying the faults, we can do a thought experiment that explores a time-aspect of the data, explicitly asking,
Let’s imagine the fault size growing over time; can we then predict it, given the sensor data?
This is straightforward using BIOVIA Pipeline Pilot, which integrates with the two open-source data science platforms that are most widely used today: Python and R.
In fact, this particular problem reveals one very nice feature of “component-based” tools such as Pipeline Pilot – namely, that we can use both Python and R in the same data pipeline.
For this problem, the .mat file reader in Python’s SciPy library is very handy, and the pandas data manipulation and analysis package is excellent, in the opinion of this author, for feature engineering.
On the other hand, R has a very nice Hotelling package, which can help you determine the “T-Health” of a device. R also has the workhorse linear regression (lm()) method, which we used for our own work.
As stated above, we used scipy.io to read in the .mat files (there are a few dozen), and then pandas’ grouping capabilities to calculate simple statistics over the sensor data: mean, min, max, skew and kurtosis. Sensor output at the “drive end” of the shaft was 12,000 readings per second, and we calculated averages over 1 or 2 seconds, depending on the length of the measurement (typically 10 to 20 seconds).
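The read-and-summarise step described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline we actually ran: the variable name `DE_time`, the label values and the synthetic signal are all assumptions made so the example is self-contained, and the .mat file is written to an in-memory buffer rather than read from the real data set.

```python
import io

import numpy as np
import pandas as pd
from scipy.io import loadmat, savemat

FS = 12_000  # drive-end sampling rate quoted in the text (readings per second)


def window_stats(signal, fs=FS, window_s=1.0):
    """Summarise a 1-D vibration signal over fixed-length time windows."""
    s = pd.Series(np.asarray(signal, dtype=float).ravel(), name="vibration")
    # Integer window index for every reading (0 for the first second, and so on)
    window = np.arange(len(s)) // int(fs * window_s)
    grouped = s.groupby(window)
    stats = grouped.agg(["mean", "min", "max", "skew"])
    stats["kurtosis"] = grouped.agg(lambda x: x.kurt())
    return stats.rename_axis("window").reset_index()


# Stand-in for one measurement file: 10 s of synthetic noise saved to an
# in-memory .mat buffer. "DE_time" is a hypothetical variable name.
rng = np.random.default_rng(0)
buffer = io.BytesIO()
savemat(buffer, {"DE_time": rng.normal(size=(10 * FS, 1))})
buffer.seek(0)

mat = loadmat(buffer)
features = window_stats(mat["DE_time"])
features["fault_size"] = 0.007  # hypothetical label for this file
features["motor_load"] = 0      # hypothetical label for this file
print(features.head())
```

With real files, the same function would be applied to each file in turn and the per-file frames concatenated with `pd.concat`.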
The authors [2] found that a frequency-space analysis gave a much higher classification accuracy than a straightforward time-space analysis. However, we did not follow their procedure of FFT and then Principal Component Analysis, although these methods are available in Pipeline Pilot.
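For readers unfamiliar with the FFT-then-PCA idea, here is a minimal NumPy sketch of the general technique. It is not the cited authors' procedure and not something we ran on the bearing data: the window length, the two test tones and the helper names are assumptions chosen purely for illustration.

```python
import numpy as np


def spectrum_features(windows):
    """Magnitude spectrum of each row (one row per time window)."""
    return np.abs(np.fft.rfft(windows, axis=1))


def pca_scores(features, n_components=2):
    """Project rows onto the leading principal components via SVD."""
    centred = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


rng = np.random.default_rng(0)
t = np.arange(1200) / 12_000  # 0.1 s windows sampled at 12 kHz

# Synthetic stand-in: 20 windows dominated by a 100 Hz tone and 20 by a
# 300 Hz tone, each with a little added noise.
tones = np.repeat([100.0, 300.0], 20)[:, None]
windows = np.sin(2 * np.pi * tones * t) + 0.1 * rng.normal(size=(40, 1200))

scores = pca_scores(spectrum_features(windows))
print(scores[:3])
print(scores[-3:])
```

In this toy case the first principal component separates the two groups of windows by sign, which is the kind of structure a downstream classifier can exploit.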
The result of this analysis was a pandas DataFrame with columns for fault size, motor load and the statistics.
When we had the “sample statistics” above, we moved the data from Python to R seamlessly using Pipeline Pilot.
Once in R, we explored various linear and nonlinear models, predicting the fault size from the motor load and the statistics. As expected, the motor load was insignificant, as were any interaction terms involving this variable. Also unsurprisingly, there was a significant correlation between the max (or min) of the vibration and the fault size. More surprisingly, this relationship was nonlinear, with a significant curvature term.
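To make the idea of a curvature term concrete, the sketch below compares a straight-line fit with a quadratic fit using ordinary least squares in NumPy. It is a Python stand-in for the kind of model one would fit with R's lm(), not our actual analysis: the data are synthetic, and the coefficients used to generate them are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic, made-up data: peak vibration per window and a fault size that
# depends on it with a deliberately built-in quadratic term.
vib_max = rng.uniform(0.5, 3.0, size=200)
fault_size = (0.002 + 0.001 * vib_max + 0.003 * vib_max**2
              + rng.normal(scale=0.0005, size=200))


def fit(X, y):
    """Ordinary least squares: return coefficients and R-squared."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2


# Design matrices for fault_size ~ max and fault_size ~ max + max^2
X_lin = np.column_stack([np.ones_like(vib_max), vib_max])
X_quad = np.column_stack([X_lin, vib_max**2])

coef_lin, r2_lin = fit(X_lin, fault_size)
coef_quad, r2_quad = fit(X_quad, fault_size)

print("linear    R^2:", round(r2_lin, 4))
print("quadratic R^2:", round(r2_quad, 4))
print("curvature coefficient:", coef_quad[2])
```

A clearly non-zero coefficient on the squared term, together with a better fit than the straight line, is what a significant curvature term looks like in practice.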
Figure 1 below shows the various models: