Is your organization data literate?

Do you remember the last time you heard this: “We have a lot of data but don’t know what to do with it”?

Not that long ago, right?

Then I am sure you have heard this as well: “We have a lot of problems that should be solved using this data.”

These are the signs of a literacy problem. You know there is an important text in your hand, but you cannot make sense of it.

Today, organizations can experience a similar data literacy problem. By going digital, they have the “data”, the “words”, at hand, but the “value”, a “meaningful sentence”, comes only with real “digital transformation”.

So what is the missing piece? Why are we not digitally transformed even though we are digital? Going digital is a necessary first step, but it is not sufficient. The enabler that combines multiple pieces of “data” in a value-generating way is advanced analytics. Machine learning (ML), a family of statistical methods, is one of those analytical approaches. I know you have heard plenty about machine learning and need no convincing that it is popular these days, but bear with me for a quick sketch.

Why do we need machine learning in the first place?

Before machine learning, automating our work meant explicit coding, i.e. the meticulous “if ‘X’ happens, do ‘Y’ then ‘Z’; whenever ‘Z’ is problematic, do ‘X’ again” type of coding. This approach makes you heavily dependent on a domain expert. For example, if you want to automate a production process on the shop floor, you need a production engineer to be actively involved in the software coding process. Once the coding is finally done, you get a ‘model’ that collects inputs and produces outputs automatically, without manual effort. Such models can be generated in many different ways, and machine learning is the umbrella term for a group of implicit model generation techniques.
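To make the “explicit coding” idea concrete, here is a minimal sketch of a hand-written rule set. The function name, the sensor inputs, and the threshold values are all hypothetical; in practice a production engineer would have to supply every one of them.

```python
def explicit_rule_controller(line_speed, temperature):
    """Hand-coded rules. Every threshold below is a hypothetical value
    that a domain expert would have to provide and maintain."""
    if temperature > 90.0:   # assumed overheating limit
        return "stop_line"
    if line_speed < 20.0:    # assumed minimum healthy line speed
        return "raise_alert"
    return "keep_running"


# Each new situation needs a new rule written by hand.
print(explicit_rule_controller(line_speed=55.0, temperature=70.0))
```

The point of the sketch is the dependency it exposes: the logic lives in the engineer's head first and in the code second.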

ML differs from other methods in that it creates ‘models’ without being explicitly programmed, sometimes even without an expert in the process.

Let me touch on a couple of machine learning algorithms at a very high level. My goal is to give a sense of how powerful machine learning is and how it can unlock value. In other words, let’s see, through brief examples, how these algorithms can become building blocks that help your organization become data literate.

Three Machine Learning Categories: Supervised Learning, Unsupervised Learning, and Reinforcement Learning.

1) Supervised Learning: can it generate a model to reduce Mean Time to Repair (MTTR)?

This category still keeps the production engineer busy. In fact, he or she will be the supervisor who teaches the ‘machine’ to come up with a ‘model’ that can work in a production environment on the shop floor.

With supervised learning, a predictive model can be developed for the production shop floor to predict a machine failure before it actually happens. This way, a maintenance team can be alerted in advance and be on its way to troubleshoot the problem, which ultimately reduces the MTTR. To be more precise, such a model is called a classifier: we expect an expert to define the possible classes, e.g. the possible states of a machine or a production line, such as running, down, failed, or idle. These are all possible outputs. The relevant inputs to the model could be data from IoT sensors, e.g. instantaneous line speed, a pattern of preceding states, or the history of previous failures. For example, a failure could be more likely when the line switches too frequently between running and downtime.

The key message here is that an ML classifier needs a supervisor because it has to train on many historical examples labeled by the expert. As the model trains, it first makes wrong predictions, then gets feedback from the expert labels and improves iteratively. Once the ML classifier is well trained, you can start using it to predict upcoming activity on the shop floor.
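The train-on-labels loop described above can be sketched with a perceptron, one of the simplest classifiers that learns from its own mistakes. This is only an illustration, not a production model: the two features (state changes and line-speed drop, both scaled to 0-1), the eight labeled examples, and all function names are made up for the sketch.

```python
def predict(weights, bias, features):
    """Return 1 (failure expected) or 0 (no failure expected)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train_perceptron(examples, labels, epochs=100, learning_rate=0.1):
    """Adjust the weights whenever a prediction disagrees with the expert label."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in zip(examples, labels):
            error = label - predict(weights, bias, features)
            if error != 0:  # wrong prediction: use the label as feedback
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:   # every historical example is now classified correctly
            break
    return weights, bias


# Hypothetical history: (state changes per hour, line-speed drop), scaled to 0-1.
examples = [(0.1, 0.05), (0.2, 0.10), (0.1, 0.00), (0.3, 0.10),
            (0.8, 0.40), (0.9, 0.35), (0.7, 0.50), (1.0, 0.45)]
# Expert labels: 1 = a failure followed, 0 = no failure followed.
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_perceptron(examples, labels)
# Ask the trained model about a new, unlabeled observation.
print(predict(weights, bias, (0.85, 0.40)))
```

A real shop-floor classifier would use far more data, more classes than failure or no failure, and a held-out test set, but the feedback loop is the same.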

2) Unsupervised Learning: can it help me to better distribute my products across production lines?

Assigning products to different production lines or facilities can be vital if you run multi-SKU production. For example, clustering compatible products and producing them in groups could have a huge impact on productivity. But how do you define ‘compatibility’, and what is the ideal number of clusters? If these questions are relevant to you, unsupervised clustering algorithms may help. If you aim for efficient, campaign-like production runs, the time lost while switching between products matters. Once historical change-over data on duration or effort is available, an unsupervised algorithm can find a suitable number of clusters and assign the products to those clusters for you, without a supervisor telling it “these are the possible classes”. That could improve how products are distributed across lines.
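As a rough illustration of clustering from change-over data, the sketch below groups products whose change-over time is short, using single-linkage clustering cut at a time threshold. The product names, the change-over minutes, and the 10-minute threshold are all hypothetical. Note that the number of clusters is not given up front; it falls out of the data and the chosen threshold, which is one simple way (among many) to let the algorithm decide it.

```python
def cluster_by_changeover(products, changeover, max_minutes):
    """Group products linked by change-overs of at most max_minutes
    (single-linkage clustering cut at a threshold)."""
    parent = {p: p for p in products}

    def find(p):
        # Follow parent links to the representative of p's cluster.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (a, b), minutes in changeover.items():
        if minutes <= max_minutes:
            parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for p in products:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical historical change-over times in minutes between product pairs.
products = ["A", "B", "C", "D", "E"]
changeover = {
    ("A", "B"): 5, ("A", "C"): 8, ("B", "C"): 6, ("D", "E"): 4,
    ("A", "D"): 45, ("A", "E"): 50, ("B", "D"): 40,
    ("B", "E"): 42, ("C", "D"): 38, ("C", "E"): 41,
}

clusters = cluster_by_changeover(products, changeover, max_minutes=10)
print(clusters)  # [['A', 'B', 'C'], ['D', 'E']]
```

No one labeled the products in advance; the two groups emerge only from how cheap it is to switch between them.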

3) Reinforcement Learning: can it help me with supply chain management?

The interesting one!
In reinforcement learning, the learning agent receives feedback about its actions and decisions. Wait! Didn’t we just say that if there is expert feedback, this is called supervised learning?
Correct. But unlike supervised learning, the feedback does not tell the model what to do next; it only tells it how rewarding the last action was. Based on this reward, the reinforcement algorithm figures out what to do by itself. Reinforcement learning is therefore a more powerful and general tool, but it can be computationally very demanding.

How can reinforcement learning help us here? Think of a simple supply chain network made of nodes, where each node orders products, manages inventory, and ships products to the next node. Such a network suffers from inventory costs and backorders if it is poorly managed. For example, ordering too little would result in high backorder costs for the current node and high inventory costs at the upstream node. Even without a supply chain expert telling us what the cost dynamics are at each node, and knowing only the overall supply chain cost, a reinforcement model can learn the cost dynamics at each node from historical order and inventory data and suggest future orders for each intermediate node to maximize the overall supply chain reward, i.e. a low total cost. For more, see the interesting study by Larry Snyder and Martin Takac from Lehigh University [1].
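To show what “learning from a reward only” looks like, here is a heavily simplified sketch: tabular Q-learning for a single node instead of a whole network. It is not the method of the study cited above. All costs, the capacity, the demand pattern, and the function names are hypothetical, and unmet demand is penalized but not carried over as a backorder, which is a further simplification.

```python
import random

MAX_INVENTORY = 5            # hypothetical storage capacity of the node
ORDER_SIZES = [0, 1, 2, 3]   # hypothetical order quantities the agent may pick
HOLDING_COST = 1.0           # hypothetical cost per unit left in stock
SHORTAGE_COST = 4.0          # hypothetical cost per unit of unmet demand


def step(inventory, order, demand):
    """One period: receive the order, serve demand, return (next_inventory, reward)."""
    available = min(inventory + order, MAX_INVENTORY)
    unmet = max(demand - available, 0)
    next_inventory = max(available - demand, 0)
    cost = HOLDING_COST * next_inventory + SHORTAGE_COST * unmet
    return next_inventory, -cost  # the agent only ever sees this one number


def train(episodes=500, periods=50, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table of action values: q[inventory][action index]."""
    rng = random.Random(seed)
    q = [[0.0] * len(ORDER_SIZES) for _ in range(MAX_INVENTORY + 1)]
    for _ in range(episodes):
        inventory = 0
        for _ in range(periods):
            if rng.random() < epsilon:   # explore: try a random order size
                action = rng.randrange(len(ORDER_SIZES))
            else:                        # exploit: best order size known so far
                action = max(range(len(ORDER_SIZES)), key=lambda a: q[inventory][a])
            demand = rng.choice([0, 1, 2])  # hypothetical random demand
            next_inventory, reward = step(inventory, ORDER_SIZES[action], demand)
            # Q-learning update: move the estimate toward reward + discounted future value.
            target = reward + gamma * max(q[next_inventory])
            q[inventory][action] += alpha * (target - q[inventory][action])
            inventory = next_inventory
    return q


def greedy_policy(q):
    """For each inventory level, the order size with the highest learned value."""
    return [ORDER_SIZES[max(range(len(ORDER_SIZES)), key=lambda a: row[a])] for row in q]


q_table = train()
policy = greedy_policy(q_table)
print(policy)  # suggested order quantity for inventory levels 0..MAX_INVENTORY
```

Nobody told the agent the holding or shortage costs; it only observed the total reward after each decision and adjusted its ordering accordingly. A multi-node network like the one described above needs considerably heavier machinery, which is where the computational demand comes from.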

Be it a supervised, unsupervised, or reinforcement algorithm, machine learning is the new ABC of data literacy. But it takes careful selection and application of tools to bring real value. Machine-learning-powered literacy therefore also comes with a paradigm shift in team dynamics. Domain experts will still be needed, but to stay competitive, ML-powered tools and data scientists have to be an integral part of a data-literate team too.

References:

1. https://beergameai.github.io/
