Is your organization data literate?
by Murat Saglam
Do you remember the last time you heard this: “We have a lot of data but don’t know what to do with it”?
Not that long ago, right?
Then I am sure you have heard this as well: “We have a lot of problems that should be solved by using this data”.
These are the signs of a literacy problem. You know there is an important text in your hands, but you cannot make sense of it.
Today, organizations can experience a similar data literacy problem. By going digital, they have the “data”, the “words”, at hand, but the “value”, a “meaningful sentence”, comes only with real “digital transformation”.
So what is the missing piece? Why are we not digitally transformed although we are digital? Going digital is a necessary first step, but not a sufficient one. The enabler that combines multiple data sources in a value-generating way is advanced analytics. Machine learning (ML), a group of statistical tools, is among those analytical methods. I know you have heard enough about machine learning and are convinced that it is very popular these days, but bear with me for a simple sketch.
Why do we need machine learning in the first place?
Before machine learning, automating our work meant explicit coding, i.e. the meticulous “if ‘X’ happens, do ‘Y’ then ‘Z’; whenever ‘Z’ is problematic, do ‘X’ again” type of coding. This makes you heavily dependent on a domain expert. For example, if you want to automate a production process on the shop floor, you need a production engineer to be actively involved in the software coding process. Once the coding is done, you get a ‘model’ that can collect inputs and produce outputs automatically, without manual effort. Such models can be generated in many different ways, and machine learning is the umbrella term for a group of implicit model generation techniques.
ML differs from other methods in that it is the art of creating ‘models’ without explicit programming, sometimes even without needing an expert in the process.
Let me touch upon a couple of machine learning algorithms at a very high level. My goal is to give a sense of how powerful machine learning is and how it can unlock value. In other words, let’s see, with brief examples, how these algorithms can be some of the building blocks that help your organization become data literate.
Three Machine Learning Categories: Supervised Learning, Unsupervised Learning, and Reinforcement Learning.
1) Supervised Learning, can it generate a model to reduce Mean Time to Repair (MTTR)?
This category is where we still keep the production engineer busy. In fact, he or she will be our supervisor, teaching the ‘machine’ to come up with a ‘model’ that can work in a production environment on the shop floor.
With supervised learning, a predictive model can be developed for the production shop floor to predict a machine failure before it actually happens. This way, a maintenance team can be alerted in advance and be on their way to troubleshoot the problem, which ultimately reduces the MTTR. To be more precise, such a model is called a classifier, where we expect an expert to define the possible classes, e.g. the possible states of a machine or a production line such as runtime, downtime, failure, idle, etc. These are all possible outputs. The relevant inputs to the model could be data from IoT sensors, e.g. instantaneous line speed, or a pattern of preceding states, a history of previous failures, etc. For example, a failure could be more likely when there are too many state changes between runtime and downtime. The key message here is that an ML classifier needs a supervisor because it needs to train on many historical examples labeled by the expert. As the model trains, it first makes wrong predictions, then gets feedback from the expert labels and improves in an iterative way. Once the ML classifier is well trained, you can start using it to predict upcoming activity on the shop floor.
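The "wrong prediction, feedback, improvement" loop can be sketched in a few lines of Python. This is a toy perceptron, chosen here only because it is one of the simplest classifiers that learns this way. The two features (state changes per hour, speed deviation), the six labeled examples and all function names are assumptions for illustration, not data from a real shop floor.

```python
# Hypothetical expert-labeled history:
# (state changes per hour, speed deviation) -> 1 if a failure followed, else 0
history = [
    ((1.0, 0.2), 0),
    ((2.0, 0.1), 0),
    ((1.5, 0.3), 0),
    ((6.0, 0.8), 1),
    ((7.0, 0.9), 1),
    ((5.5, 0.7), 1),
]


def predict(weights, bias, features):
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train(examples, max_epochs=1000, learning_rate=0.1):
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in examples:
            # Feedback from the expert label: 0 if right, +1 or -1 if wrong.
            error = label - predict(weights, bias, features)
            if error:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:  # every labeled example is now predicted correctly
            break
    return weights, bias


weights, bias = train(history)

# Ask the trained model about a new, unlabeled reading.
print(predict(weights, bias, (6.5, 0.85)))
```

A real predictive maintenance model would use far more data, more features and a held-out test set, but the division of labor is the same: the expert supplies the labels, and the algorithm adjusts the model.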
2) Unsupervised Learning, can it help me to better distribute my products to different production lines?
Assigning products to different facilities can be vital if you are doing multi-SKU production. For example, clustering compatible products and producing them as a group could have a huge impact on productivity. But how do you define ‘compatibility’, and what is the ideal number of clusters? If these questions are relevant to you, unsupervised clustering algorithms may be helpful. If you aim for efficient, campaign-like production runs, the time lost while switching between products matters. Once historical change-over data on duration or effort is available, an unsupervised algorithm can find a suitable number of clusters and assign the products to those clusters for you, without a supervisor having to come and say “these are the possible classes”. That could improve the product-line distribution process.
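As a rough illustration, the sketch below groups products purely from pairwise change-over times. It uses single-linkage grouping with a cut-off, which is just one of many clustering techniques and is picked here for brevity. The product names, the minutes and the 10-minute cut-off are hypothetical.

```python
from itertools import combinations


def cluster_by_changeover(products, changeover_minutes, max_minutes):
    """Single-linkage grouping: two products share a cluster if a chain of
    change-overs, each no longer than max_minutes, connects them."""
    parent = {p: p for p in products}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    for a, b in combinations(products, 2):
        if changeover_minutes[frozenset((a, b))] <= max_minutes:
            parent[find(a)] = find(b)

    clusters = {}
    for p in products:
        clusters.setdefault(find(p), []).append(p)
    return sorted(clusters.values())


# Hypothetical historical change-over times in minutes, one per product pair.
changeover = {
    frozenset(pair): minutes
    for pair, minutes in {
        ("A", "B"): 5, ("A", "C"): 8, ("B", "C"): 6,
        ("D", "E"): 4,
        ("A", "D"): 45, ("A", "E"): 50, ("B", "D"): 40,
        ("B", "E"): 55, ("C", "D"): 60, ("C", "E"): 48,
    }.items()
}

groups = cluster_by_changeover(["A", "B", "C", "D", "E"], changeover, max_minutes=10)
print(groups)  # [['A', 'B', 'C'], ['D', 'E']]
```

Note that nobody specified "two clusters" in advance. The number of groups falls out of the data and the cut-off, which is the part a planner would still want to tune.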
3) Reinforcement Learning, can it help me with the supply chain management?
The interesting one!
In reinforcement learning, the learning agent receives feedback about its actions and decisions. Wait! Didn’t we just say that if there is expert feedback, it is called supervised learning?
Correct, but unlike supervised learning, the feedback does not tell the model what to do next. It only tells how rewarding the action was, and that’s it. Based on this reward, the reinforcement algorithm figures out what to do by itself. Reinforcement learning is therefore a more powerful and general tool, but it can be computationally very demanding.
How can reinforcement learning help us here? Think of a simple supply chain network made up of nodes, where each node orders products, manages inventory, and ships products to the next node. Such a network suffers from inventory costs and backorders if it is poorly managed. For example, ordering too little would result in high backorder costs for the current node and high inventory costs at the upstream node. Even without a supply chain expert telling us what the cost dynamics are at each node, if we only know the overall supply chain cost, a reinforcement model can learn the cost dynamics at each node from historical order and inventory data and suggest future orders for each intermediate node to maximize the overall supply chain reward, i.e. to keep the total cost low. For more, see the interesting study by Larry Snyder and Martin Takac from Lehigh University.
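To show the "reward only" idea in code, here is a deliberately tiny Python sketch: a single node that learns how much to order using an epsilon-greedy bandit, one of the simplest reinforcement learning schemes. It is a stand-in for illustration, not the multi-node method of the study mentioned above. The cost figures, the demand pattern and all names are assumptions.

```python
import random


def learn_order_quantity(actions, demand_fn, episodes=5000, epsilon=0.1,
                         holding_cost=1.0, backorder_cost=4.0, seed=0):
    """Learn an order size from reward feedback alone (epsilon-greedy bandit)."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in actions}  # running average reward per order size
    count = {a: 0 for a in actions}

    for _ in range(episodes):
        if rng.random() < epsilon:
            order = rng.choice(actions)          # explore: try something else
        else:
            order = max(actions, key=value.get)  # exploit: best estimate so far

        demand = demand_fn(rng)
        cost = (holding_cost * max(order - demand, 0)        # leftover stock
                + backorder_cost * max(demand - order, 0))   # unmet demand
        reward = -cost  # the agent only ever sees this number

        count[order] += 1
        value[order] += (reward - value[order]) / count[order]

    return max(actions, key=value.get), value


# Hypothetical setting: demand per period is 4, 5 or 6 units at random.
best_order, value = learn_order_quantity(
    actions=[3, 4, 5, 6, 7, 8],
    demand_fn=lambda rng: rng.choice([4, 5, 6]),
)
print(best_order)
```

The agent is never told that backorders cost four times as much as holding stock. It only sees the reward after each order and gradually favors the order sizes that have paid off best.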
Be it a supervised, an unsupervised or a reinforcement algorithm, machine learning is the new ABC of data literacy. But it takes careful selection and application of tools to bring real value. Thus, machine-learning-powered literacy also comes with a paradigm shift in team dynamics. The need for a domain expert will still be there, but in order to be competitive, ML-powered tools and data scientists have to be an integral part of a data literate team too.