In the last article we saw how the control charting technique can be used to predict an anomaly in a specific measurement parameter, such as temperature.
Another important technique for predicting an outcome is regression analysis. It is used to predict the value of a dependent variable (say, cutting tool life) from the value of an independent variable on which it depends (say, cutting speed).
Before doing a regression analysis, you must have some idea or suspicion that the dependent variable depends on the independent variable. For example, machinists may suspect from experience that higher cutting speeds shorten tool life. The first objective of a regression analysis is therefore to confirm whether this suspicion holds.
Creating the scatter diagram
Here is a table of readings from an experiment measuring the life (in minutes) of a cutting tool at three different cutting speeds (in meters/min). Life was measured for four tools at each speed so that there is a reasonable sample size at each level.
To make sense of this data, plot tool life (Y) against cutting speed (X). Y is the dependent variable and X is the independent variable on which Y is suspected to depend. This plot is called a scatter diagram and is shown below.
The plot makes it quite evident that tool life falls as speed increases: from the 30s and 40s of minutes at the lowest speed down to roughly 5 to 15 minutes at the highest. So the machinist's suspicion appears to be correct, and the first objective of the analysis, confirming the suspicion, has been met.
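The visual impression from the scatter diagram can be backed up with a number. A minimal sketch, assuming the readings are held in two plain Python lists, computes the Pearson correlation coefficient r: a value close to -1 indicates a strong negative linear relationship, and r² is the share of the variation in tool life explained by a straight line (the R² value Excel can display next to a trend line). Python 3.10+ also offers `statistics.correlation`, which gives the same result; the formula is spelled out here to show what is being measured.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between paired samples xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (zero if a variable never changes)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)
```

Usage, with `speeds` and `lives` as hypothetical lists holding the twelve readings from the table (each speed listed once per tool): `r = pearson_r(speeds, lives)`.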
The second objective of regression analysis is to predict the dependent variable Y (here, tool life) by measuring the independent variable X (the cutting speed). This is necessary because in real life it is not possible to test and plot every potential point on a graph like this.
To do this you need to ‘fit’ the regression line that best matches the scatter plot. You can do this in Excel by adding a trend line, as in the chart below. (Other statistical software tools can also be used, and if you love statistics you can do it by hand using the least squares method.)
Excel can also display the equation of the fitted line: y = -5.6594x + 194.38.
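For readers who want to see the least squares method itself, here is a minimal sketch. It computes the slope and intercept of the ordinary least-squares line, which is what Excel's linear trend line (and its `SLOPE` and `INTERCEPT` worksheet functions) compute, so for the same readings it should reproduce Excel's equation. The function name is illustrative.

```python
def least_squares_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Returns (slope, intercept). Requires at least two distinct x values.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Repeated x values, such as four tools tested at the same speed, are handled naturally: the line passes through the mean of each group when the groups are balanced.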
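This shows a linear relationship between tool life and cutting speed, and the negative slope indicates that life decreases as speed increases: each additional meter/min of speed shortens the predicted life by about 5.7 minutes. Taken literally, the intercept says that Y is 194.38 minutes when X is zero, i.e. a tool that stays static while touching the component it is meant to cut would last that many minutes! In practice zero speed lies far outside the speeds that were tested, so the intercept is simply where the fitted line crosses the axis and should not be read as a physical tool life.
Based on this equation you can now predict the tool life at a given cutting speed, say 35 meters/min, by substituting it for x. Two cautions apply. First, the prediction is not a certainty: additional statistical analysis can show with what probability (say 95% or 99%) the life will lie between two values, but let's not complicate this for now! Second, the line is only reliable within the range of speeds actually tested. At 35 meters/min it gives -5.6594 × 35 + 194.38 ≈ -3.7 minutes, an impossible negative life, which shows the danger of extrapolating beyond the data.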
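A prediction step can be sketched as a small function using the fitted coefficients from Excel. Because a straight line fitted over a handful of speeds should not be trusted far outside them, this sketch refuses to extrapolate. The tested range is passed in by the caller; the speeds 28 and 32 m/min in the usage line are hypothetical placeholders, since the actual speeds in the table are not restated here.

```python
SLOPE = -5.6594     # minutes of tool life per (meter/min) of cutting speed
INTERCEPT = 194.38  # minutes; where the fitted line crosses x = 0


def predict_tool_life(speed, min_speed, max_speed):
    """Predict tool life (minutes) at a cutting speed (meters/min).

    min_speed and max_speed are the lowest and highest speeds actually
    tested; requests outside that range raise instead of extrapolating.
    """
    if not (min_speed <= speed <= max_speed):
        raise ValueError(f"{speed} m/min is outside the tested range "
                         f"{min_speed}-{max_speed} m/min; refusing to extrapolate")
    return SLOPE * speed + INTERCEPT
```

For example, `predict_tool_life(30, min_speed=28, max_speed=32)` returns about 24.6 minutes, while asking for 35 m/min with the same range raises a `ValueError`.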
Let's instead look at a few more use cases. You may want to find the effect of temperature on the viscosity of oil and predict the viscosity, the effect of distance travelled on tire life, or the effect of the level of contaminants in lube oil on bearing wear. In each case you can then predict potential failure from the regression model you come up with.
How can this help with your IoT solution? As an example, you can capture the independent variable (X, e.g. distance travelled) in real time and use the analytics engine to predict the value of the dependent variable (Y, e.g. tire life). You can then configure your rules engine to send an alert if the analytics engine predicts with, say, 95% probability that the tire will fail within the next 100 km.
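The rules-engine step can be sketched as below. This is a hypothetical illustration, not any particular IoT platform's API: it models tread depth (rather than "tire life" directly) as a fitted straight line against distance, solves for the distance at which tread reaches a minimum depth (1.6 mm is used as an assumed default), and alerts when that point is within the next 100 km. The fixed `margin_km` is a crude stand-in for the 95% prediction interval mentioned above; a real implementation would use the interval's lower bound instead.

```python
from dataclasses import dataclass


@dataclass
class TreadWearModel:
    """Hypothetical fitted line: tread_depth_mm = intercept_mm + slope_mm_per_km * km."""
    slope_mm_per_km: float  # negative: tread wears down as distance grows
    intercept_mm: float     # fitted tread depth of a new tire
    margin_km: float = 0.0  # crude safety margin standing in for a prediction interval

    def km_at_failure(self, min_tread_mm: float) -> float:
        """Distance at which the fitted line reaches the minimum tread depth."""
        if self.slope_mm_per_km >= 0:
            raise ValueError("expected a negative wear slope")
        # Solve min_tread_mm = intercept_mm + slope_mm_per_km * km for km
        return (min_tread_mm - self.intercept_mm) / self.slope_mm_per_km


def should_alert(model: TreadWearModel, km_travelled: float,
                 min_tread_mm: float = 1.6, window_km: float = 100.0) -> bool:
    """Rules-engine check, run on each new odometer reading."""
    # Be pessimistic: pull the predicted failure point earlier by the margin
    km_left = model.km_at_failure(min_tread_mm) - model.margin_km - km_travelled
    return km_left <= window_km
```

All names and default values here are assumptions for illustration; the point is the flow from streamed X, through the fitted model, to a yes/no alert.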
In the examples above, we considered only one independent variable affecting the response (dependent) variable. In many cases, however, several independent variables influence the outcome. For instance, tire life may depend on both the distance travelled and the average roughness or bumpiness of the roads travelled. In such cases multiple linear regression is used.
Additionally, not all relationships are linear. In such cases non-linear curves and their equations have to be fitted instead.
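Multiple linear regression extends the straight line to one coefficient per independent variable. A minimal sketch using NumPy's least-squares solver (`numpy.linalg.lstsq`) is shown below; for the tire example the two columns of `X` might hold distance travelled and a hypothetical road-roughness index.

```python
import numpy as np


def fit_multiple_linear(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X: array-like of shape (n_samples, k), one column per independent variable.
    Returns the coefficients as an array [b0, b1, ..., bk].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept
    design = np.column_stack([np.ones(len(X)), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs
```

With a single column in `X`, this reduces to the same straight line as the one-variable fit.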
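One common way to handle a non-linear relationship is to transform it into a straight line. As an illustration not taken from the readings above, Taylor's tool life equation from machining, V·T^n = C (V = cutting speed, T = tool life, n and C constants), becomes linear after taking logarithms: ln T = (ln C)/n - (1/n)·ln V. The sketch below fits that line and converts its slope and intercept back to n and C. Note that this minimizes errors in ln T rather than in T itself; a direct non-linear least-squares fit is an alternative.

```python
import math


def fit_taylor_tool_life(speeds, lives):
    """Estimate n and C in Taylor's equation V * T**n = C.

    Fits a least-squares line of ln(T) against ln(V); its slope is -1/n
    and its intercept is ln(C)/n. Speeds and lives must be positive.
    """
    xs = [math.log(v) for v in speeds]
    ys = [math.log(t) for t in lives]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                   # equals -1/n
    intercept = mean_y - slope * mean_x  # equals ln(C)/n
    n = -1.0 / slope
    c = math.exp(intercept * n)
    return n, c
```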
Finally, be cautious about concluding that the independent variable actually causes the value of the dependent (response) variable; here you need to use experience and judgment. A good example: people's weight generally has a roughly linear relationship with their height, but increasing a person's weight does not increase their height!
If you have read my previous article, you may be able to answer the following: why do you think there is more than one reading of tool life (four, one from each tool tested) at each cutting speed? What statistical distribution are these readings likely to follow? Write your answers in the comment box!