digitalist.global // February 19 2018
What Machine Learning is Great for
We are all being exposed to a daily barrage of articles and TV shows about artificial intelligence and its promise to revolutionise everything about our existence. Despite the eye-catching headlines, the vast majority of corporate executives are still scratching their heads over what the fuss is all about and what it will mean for their companies and industries. Perceptions of AI range from something too complicated to grapple with to some sort of magic wand that could automagically fix any issue. The reality lies somewhere in between.
In this article, we focus on Machine Learning (ML), a sub-branch of AI, with the aim of demystifying it by explaining what kinds of problems it can solve and how.
ML focuses on teaching computers to learn without being explicitly programmed for specific tasks. The key idea behind ML is that it is possible to create algorithms that learn from data and make predictions on it.
The key difference between machine learning and traditional programming lies in what goes in and what comes out. In traditional programming, a program takes input data and applies rules, hardcoded by the developer of the solution, to produce an output.
In supervised machine learning, the machine is presented with input data together with the desired output, and the goal is to learn from those training examples in such a way that meaningful predictions can be made for new, unseen data.
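To make the contrast concrete, here is a minimal, self-contained Python sketch; it is an illustration, not code from the original article. The first function is traditional programming: a developer hardcodes the rule. The second is a toy supervised learner: given inputs (petal lengths) and desired outputs (species labels), it derives the rule itself by picking the cut point that misclassifies the fewest examples, i.e. a one-level decision tree, often called a decision stump. The function names and the sample measurements are hypothetical.

```python
from __future__ import annotations


# Traditional programming: the rule (a threshold) is written by hand.
# The 2.5 cm value is a hypothetical choice made by the developer.
def classify_by_hand(petal_length_cm: float) -> str:
    return "setosa" if petal_length_cm < 2.5 else "not setosa"


# Machine learning: the rule is derived from labelled examples.
# A toy one-feature decision stump, for illustration only.
def learn_threshold(lengths: list[float], labels: list[str], positive: str) -> float:
    values = sorted(set(lengths))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold: float) -> int:
        # Count examples where "below threshold" disagrees with "is positive".
        return sum((x < threshold) != (y == positive) for x, y in zip(lengths, labels))

    # Keep the candidate with the fewest training errors.
    return min(candidates, key=errors)


# Hypothetical training examples: petal lengths (cm) and their species.
lengths = [1.4, 1.3, 1.5, 4.7, 4.5, 5.1, 6.0]
labels = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]

threshold = learn_threshold(lengths, labels, positive="setosa")
print(threshold)  # 3.0


def classify_learnt(petal_length_cm: float) -> str:
    # Same shape of rule as classify_by_hand, but the threshold came from data.
    return "setosa" if petal_length_cm < threshold else "not setosa"


print(classify_learnt(1.7))  # setosa
```

Both classifiers end up as the same kind of rule; the difference is whether a person or the training data chose the threshold.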
This may seem like a counter-intuitive way of “solving” problems, so let’s demonstrate its usefulness through an example. Note that the aim here is not to go into the details of different machine learning algorithms and the training processes involved, but to provide an easily digestible example of what is meant by a “machine-learnt” solution.
Let’s assume that you’re a biologist and you want to create a tool for classifying an iris flower as one of three iris species. At your disposal is a data set that consists of samples of each of the three species of iris (Iris setosa, Iris virginica and Iris versicolor), and for each sample you have measurements of four features: the length and the width of the sepals and of the petals, in centimetres.
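The data set described here is the classic iris data set, which ships with scikit-learn. Assuming scikit-learn is the library in use (the DecisionTreeClassifier mentioned below is a scikit-learn class), a quick look at the structure of the data might be:

```python
from collections import Counter

from sklearn.datasets import load_iris

iris = load_iris()

# One row per flower, one column per feature.
print(iris.data.shape)            # (150, 4)
print(iris.feature_names)         # sepal/petal length and width, in cm
print(iris.target_names.tolist())  # ['setosa', 'versicolor', 'virginica']

# The species of each sample is stored as an index (0, 1 or 2) into target_names.
counts = Counter(str(iris.target_names[i]) for i in iris.target)
print(counts)  # 50 samples per species
```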
One way to go about solving this problem is to study the data set and manually work out the correlations between feature measurements and flower types. This is a really hard task, especially if the data set is large and has many features, simply because the human brain is not well suited to processing huge amounts of data and identifying patterns in them.
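A rough sketch of what that manual study might look like for just the two petal features, assuming scikit-learn's bundled copy of the data set. Printing per-species ranges is enough to see that setosa's petals are clearly smaller than the others, while the ranges of versicolor and virginica overlap, which is exactly where hand-written rules start to get fiddly.

```python
from sklearn.datasets import load_iris

iris = load_iris()
petal_length = iris.data[:, 2]  # column 2: petal length (cm)
petal_width = iris.data[:, 3]   # column 3: petal width (cm)

ranges = {}
for class_index, name in enumerate(iris.target_names.tolist()):
    mask = iris.target == class_index  # select the samples of this species
    ranges[name] = {
        "petal_length": (float(petal_length[mask].min()), float(petal_length[mask].max())),
        "petal_width": (float(petal_width[mask].min()), float(petal_width[mask].max())),
    }
    lo_len, hi_len = ranges[name]["petal_length"]
    lo_wid, hi_wid = ranges[name]["petal_width"]
    print(f"{name:>10}: petal length {lo_len:.1f} to {hi_len:.1f} cm, "
          f"petal width {lo_wid:.1f} to {hi_wid:.1f} cm")
```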
Instead, we can train a machine learning algorithm to derive the rules from the available data set.
We first load the data set that will be used to train our machine learning algorithm. The iris data set contains feature measurements for 150 iris flowers (50 of each species), plus the correct classification (species) of each sample. Variable X holds the feature data (input data) used to train the algorithm; for simplicity, we use only two of the four available features, petal length and petal width. Variable y holds the target data (output), which is the species of each sample. We chose to train a decision tree classifier (DecisionTreeClassifier), since this is a classification task and a decision tree is well suited to our data set. Finally, we fit the algorithm to the feature and target data so that it learns the solution.
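The steps just described map onto a short scikit-learn script. The sketch below is a reconstruction under assumptions, not necessarily the author's exact code: the variable name tree_clf and the max_depth=2 and random_state=42 settings are illustrative choices (a shallow tree is easier to read, and a fixed seed makes tie-breaking between equally good splits repeatable).

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load the iris data set: measurements plus the species of each sample.
iris = load_iris()

# Input data: keep only petal length and petal width (columns 2 and 3).
X = iris.data[:, 2:]
# Target data: species index for each sample (0=setosa, 1=versicolor, 2=virginica).
y = iris.target

# A decision tree classifier for this classification task.
# max_depth=2 and random_state=42 are illustrative assumptions.
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)

# Learn the rules from the labelled examples.
tree_clf.fit(X, y)

print(f"Training accuracy: {tree_clf.score(X, y):.2f}")
```

The slice `2:` selects the last two of the four feature columns (sepal length, sepal width, petal length, petal width), i.e. the two petal measurements.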
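The learnt decision tree (the solution) can be visualised as a flowchart: each internal node asks a yes/no question about petal length or petal width, and each leaf assigns a species.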
The training process has returned a set of rules that we can use to classify any new flower sample as one of the three iris species. Take an iris flower with a petal length of 2.5 cm and a petal width of 2 cm: following the learnt rules, its petals are too large for setosa, and a petal width above the tree's split of roughly 1.75 cm points to virginica rather than versicolor, so you would classify the new flower as virginica.
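To inspect the learnt rules as text rather than as a diagram, scikit-learn provides export_text, which prints a fitted tree as indented if/else conditions. The sketch below trains an illustrative two-level tree on the two petal features (max_depth=2 and random_state=42 are assumptions, not settings from the article) and then classifies the example flower with petal length 2.5 cm and petal width 2 cm.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X = iris.data[:, 2:]  # petal length and petal width (cm)
y = iris.target

# Illustrative settings (assumptions): a shallow tree with a fixed seed.
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)

# Print the learnt rules as text instead of a diagram.
rules = export_text(tree_clf, feature_names=["petal length (cm)", "petal width (cm)"])
print(rules)

# Classify a new flower: petal length 2.5 cm, petal width 2 cm.
new_flower = [[2.5, 2.0]]
predicted = str(iris.target_names[tree_clf.predict(new_flower)[0]])
print(predicted)  # virginica
```

Walking through the printed conditions by hand gives the same species as `predict`, since both follow the same learnt splits.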
Machine learning is not a panacea; it is best suited to problems that have no known solution, or where insights need to be extracted from very complex data sets. More specifically, machine learning is great for:
- Problems for which existing solutions require a lot of hand-tuning or long lists of rules.
- Complex problems for which a traditional approach yields no good solution, or for which no algorithm is known (e.g. driving a car).
- Fluctuating environments: a machine learning system can adapt to new data (e.g. predicting stock prices).
- Getting insights about complex problems from large amounts of data.
Machine learning won’t figure out what problem to solve. If you aren’t aligned with a human need, you’re just going to build a very powerful system to address a very small or even non-existent problem. As with traditional programming, the first step in the process must be to define the problem and set the objectives in business terms. In doing so, one may realise that the solution, or parts of it, can be framed as one or more machine learning problems. If that’s the case, then the most important and time-consuming part of the process comes into play: getting the data (listing the data you need and how much of it, checking legal obligations, converting the data into a format you can easily manipulate) and exploring the data (studying each attribute and its characteristics, visualising the data, identifying extra data that would be useful). Only once you get the data right can you proceed to training algorithms.
Advances in machine learning algorithms (especially in deep learning) have given us a powerful toolset for solving problems that were previously deemed too hard, if not impossible, to deal with. This opens up great opportunities for all sorts of businesses to improve products and processes and achieve significant competitive advantages. The tools by themselves are useless without the data that power them. Businesses need to think about which challenges they could tackle, or which improvements they could bring about, if they had the right data. It is if and only if (as mathematicians tend to say) business challenges are defined and the required data are identified that AI practices can deliver on their highly touted promise.
By Yiannis Maglaras, Principal Solution Architect, Digitalist
To hear more about machine learning and AI, please join us for a Super Power Breakfast: Co-creating with AI in Helsinki March 8!