Tuesday 6 February 2018

Artificial Intelligence & Explainability

Explainability has become a very hot issue within the very hot domain of Artificial Intelligence (AI).  Here is my perspective on the issue.  As usual, my views are based on a systems engineering (and real world application) approach ... oh, and these are my own personal views and don't represent those of my employer.

Most of the current excitement (I won't use the word hype) around AI stems from the publicity surrounding Machine Learning and, in particular, Deep Learning.  Deep Learning algorithms are not "explainable" and so ... with everyone assuming that AI is achieved by simply throwing data at Deep Learning algorithms ... it is no surprise that there is a lot of concern about explainability.

In reality, we need to think about systems and not algorithms (I say that quite a lot).

Consider, for example, a driverless car comprising multiple AI sub systems.  There may be a vision system that identifies and classifies vehicles.  There may be a laser range finder that calculates the distance to objects around the vehicle.  There may be a GPS system that identifies the location of the vehicle, supported by mapping data.  There may be a noise classification system that identifies the sound of sirens.  I would expect all this data to be fused together and fed into a decision engine, and I would expect the output of the decision engine to be explainable.  I would expect the system to state, "I made the decision to pull over to the side of the road because the noise classifier detected a siren approaching from behind, the vision system identified an emergency vehicle approaching from behind, the road was straight and it was a safe place to stop without causing an obstruction".

Within this explanation there are "classifications" that may have come from an AI sub system that cannot fully explain its reasoning.  For example, the vision system may not be able to explain how it identified an emergency vehicle approaching.  Whilst we may feel that is unacceptable, it is important to accept that there are many situations where we as human beings cannot explain our own image or sound recognition.  However, even when the classifier cannot explain its reasoning, it should be possible to design a Machine Learning system so that it retrieves the training data that influenced its classification decisions.  The vision system may not be able to explain its classification but it should be able to say, "here are the closest matching images in my training data".
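
That kind of training-evidence retrieval can be engineered in even when the classifier itself is a black box.  Here is a minimal sketch in Python, assuming the vision system exposes a feature embedding for each image and that we store the embeddings and labels of every training image alongside the model; the function name and data layout are illustrative assumptions, not any particular product's API:

```python
import numpy as np

# Hypothetical setup: the vision system's classifier also exposes a feature
# embedding for each image, and we keep the embeddings and labels of every
# training image alongside the model.
#   train_embeddings : array of shape (n_train, d)
#   train_labels     : list of n_train labels

def nearest_training_examples(query_embedding, train_embeddings, train_labels, k=5):
    """Return the k training examples most similar to the query image.

    This is the "here are the closest matching images in my training data"
    evidence: it does not explain the network's internals, but it shows
    which training data most resembles the input being classified.
    """
    # Cosine similarity between the query and every training embedding.
    q = query_embedding / np.linalg.norm(query_embedding)
    t = train_embeddings / np.linalg.norm(train_embeddings, axis=1, keepdims=True)
    similarities = t @ q
    top = np.argsort(similarities)[::-1][:k]
    return [(train_labels[i], float(similarities[i])) for i in top]
```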

Finally, remember that there are many different Machine Learning algorithms.  Rule induction systems such as C4.5 generate human-readable rules.  As such, these algorithms are able to explain their reasoning.
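
As a concrete illustration, here is a small sketch using scikit-learn's decision tree (CART, a close cousin of C4.5 rather than C4.5 itself): the trained model can print its decision logic as rules a human can read and audit.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Train a small decision tree and print its decision logic as rules a
# human can read and audit.
iris = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(iris.data, iris.target)

print(export_text(tree, feature_names=list(iris.feature_names)))
# The output is a nested set of if/then rules along the lines of:
# |--- petal width (cm) <= 0.80
# |   |--- class: 0
# ...
```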

In summary, whilst certain algorithms cannot explain their decision making, overall systems should be engineered to be explainable and machine learning systems should be able to support their decisions with training evidence.

Wednesday 28 June 2017

How Close Are We To The Singularity?

Having spent most of my career delivering Artificial Intelligence (AI) applications ... that solve real world problems ... I think it's fair to say that I'm passionate about AI!

In response to a discussion in the Twittersphere, I thought I would take a few minutes to ponder the question above.  Before I start, I should point out that the views I express here are my own personal views and do not in any way reflect the corporate position of IBM.

Let's start by reviewing what the great and the good thought!  Alan Turing predicted in 1950 that machines would answer questions in a way that was indistinguishable from a human expert within 50 years (i.e. by the year 2000).  In 1967 Minsky said, "Within a generation ... the problem of creating artificial intelligence will be substantially solved".  I think we would all agree that both predictions have been shown to be extremely optimistic.

At the 2012 Singularity Summit the predicted date for the Singularity was thought to be around 2040.  That is 23 years from now ... I have been working in AI for over 25 years so I have a reasonable grasp on how much progress is achievable in the predicted timeframe.

The very first AI conference I attended was the IEE Second International Conference on Artificial Neural Networks in November 1991.  Having just scanned through the proceedings it is clear to me that we have come a very long way in the core sensory and classification tasks.  In areas such as image recognition and classification there has been huge progress.

Sadly, machines still struggle with tasks that require higher levels of reasoning.  The type of tasks that I am referring to seem incredibly simple for humans but are extremely challenging for machines.  Two basic examples are pronoun resolution in documents and word identification in speech recognition.  In both these tasks, human beings draw on a vast knowledge base and other clues in order to correctly identify meaning.  We resolve pronouns because we know obscure facts or can make inferences about biographical information.  For example, in the sentence "the trophy didn't fit in the suitcase because it was too big", we instantly know that "it" refers to the trophy.  We identify words being spoken in a noisy environment by understanding the context of the conversation, the facial expressions of the speaker and many other hidden clues.

As a simple exercise, try reading a children's story and think about how you understand even the most basic of sentences.  You will find that your brain is leveraging a vast knowledge base ... the fact that a tortoise walks slowly or that cats like milk ... and picking up clues from the accompanying pictures ... such as the surprised look on the face of a butterfly.  Your brain is able to fuse the text in the sentence with the clues in the pictures and use your vast knowledge base to understand the story.  I haven't yet seen an AI system that can come close to that capability.  I have my own plans to invent one but that's another story!

Whilst I think we are making incredible progress in AI, my personal view is that there is a very long way to go.  We are still working on the core sensory skills (e.g. image recognition) and are a long way from the higher level reasoning skills required to enable the Singularity.

It's important to remember though that we can't predict the future through simple extrapolation.  Technological breakthroughs tend to result in step changes rather than gradual evolutions.

For the Singularity to happen, AI needs to deliver capabilities that can fuse input from multiple sources including a vast and deep knowledge base.  For me, one of the first indicators that we are making progress in this area will be when machines are able to leverage knowledge and common sense to perform basic resolution tasks in areas such as text analytics and speech recognition.

Putting aside my earlier caveat about predicting the future, I have to say that I believe 2040 is very optimistic!

Friday 2 December 2016

Machine Learning & Representative Training Data

In this blog, I’d like to touch upon the subject of “Representative Training Data” … as usual, I must issue the standard health warning that the views expressed are my own and not those of my employer.

With so much interest in Machine Learning, I am sure many of you will have heard about the importance of ‘Representative Training Data’.  The term is pretty self-explanatory: if you want to train a machine learning system you obviously need training data … and it would be a good idea if that data looked like the data that will be encountered in the real world.

Therefore, the principle is simple … or is it?

In real world applications there are a few additional points you may wish to consider.

Firstly, it’s important to understand how the machine learning algorithm will use the data.  Most algorithms aim to achieve the optimum performance across the entire data set.  Consider a fraud detection system that looks at online spending profiles and decides whether or not each profile is fraudulent.  What happens if only 1 in 100 profiles are fraudulent?  If the learning algorithm minimises the error across the entire data set, the system can achieve a 99% accuracy simply by declaring that every profile is “NOT FRAUDULENT”.  If the data has to be representative of the real world observations then surely it would be wrong to bias the training set by adding in more examples of fraudulent profiles?  In other words, the theory says that the training data must be representative but in practice this doesn’t really work.

There are really only two ways of dealing with this problem.  The first is to artificially bias the training data and the second is to use a machine learning algorithm that incorporates some form of cost function and therefore places a higher emphasis on accurately detecting the fraudulent cases.

My personal preference would always be to look at the cost function as biasing training data can often descend into a tail chase.  You find that increasing the number of data points in one class degrades the performance against another class … so you increase the number of data points in the other class … and so on and so on.
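
To make the cost function option concrete, here is a minimal sketch using scikit-learn's class weights on the fraud example above; the 99:1 weighting is illustrative and in practice would be derived from the real business cost of a missed fraud versus a false alarm:

```python
from sklearn.linear_model import LogisticRegression

# With ~1 fraudulent profile in 100, an unweighted model can reach 99%
# accuracy by predicting "NOT FRAUDULENT" for everything. Class weights
# make an error on the rare fraud class far more expensive than an error
# on the majority class, without distorting the training data itself.
model = LogisticRegression(
    class_weight={0: 1, 1: 99},  # illustrative: a missed fraud costs ~99x a false alarm
    max_iter=1000,
)
# model.fit(X_train, y_train)  # X_train, y_train: your representative training data

# Alternatively, class_weight="balanced" derives the weights automatically
# from the observed class frequencies in the training data.
```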

Secondly, it’s very important to ensure that your training data remains representative once the system has gone live.  It is not unusual for the data to change whilst the system is in production.  For example, I once delivered an entity extraction system that was designed to work on English language documents only.  The Client loaded in a document set that contained a number of foreign language documents and suddenly the system was generating all sorts of strange entities.  Fortunately, we had designed the system to monitor various statistics about both the input and the output … when erroneous entities started being generated, the statistics shifted sharply and alerted us to the problem.  It is critical that any system monitors its source content to ensure it remains consistent with the representative data used in training.
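
As a flavour of what that monitoring can look like, here is a minimal sketch of an input-side check for the English-only scenario above; the character-set heuristic and threshold are illustrative assumptions, and a real system would track richer statistics on both input and output:

```python
import string

# Characters we expect to dominate English-language documents.
EXPECTED_CHARS = set(string.ascii_letters + string.digits + string.punctuation + " \t\n")

def unexpected_char_ratio(text):
    """Fraction of characters outside the basic English character set."""
    if not text:
        return 0.0
    unexpected = sum(1 for ch in text if ch not in EXPECTED_CHARS)
    return unexpected / len(text)

def check_input_drift(documents, threshold=0.05):
    """Flag documents that look unlike the English training data.

    A crude proxy for "is this input still consistent with the
    representative data the system was trained on?"
    """
    suspicious = [d for d in documents if unexpected_char_ratio(d) > threshold]
    if suspicious:
        print(f"ALERT: {len(suspicious)} of {len(documents)} documents "
              f"deviate from the expected input profile")
    return suspicious
```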

Tuesday 29 September 2015

Machine Learning: Training & Test Sets

In my last blog, I touched upon the importance of defining high quality training and test data when deploying a Machine Learning solution.

In this blog, I promised to dig deeper into this subject.  Before I start, I must point out that the views and opinions expressed here are entirely my own and do not necessarily represent IBM’s positions, strategies or opinions.

Getting the data right is crucial in any machine learning solution and today I’d like to communicate three key messages:

  1. Understand the difference between academic testing and real world testing.
  2. Ensure you have representative data.
  3. Look at the data … take some time to read through it … you’d be surprised how revealing it can be.

So what do I mean by "understand the difference between academic testing and real world testing"?

In the academic world, researchers want to measure how effective different algorithms are.  This is normally done using a training set, a test set and a blind test set.  Basically, we take a set of ground truth data comprising example inputs together with the outputs we would expect the solution to generate.  We split that data into three and use a third of it to train the machine learning model.  Machine learning algorithms can normally be tuned in some way, so we use the second set as a Test Set.  This allows us to adjust the various training parameters, test and re-train to ensure that we have the optimum configuration.  Finally, once we’re happy that we have the optimum solution, the Blind Test Set is used to formally evaluate the solution's performance.  The Blind Test data has been kept completely isolated from the other data sets so there is no chance of contamination in the process.
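
In code, the three-way split might look like the following sketch, assuming X holds the example inputs and y the expected outputs:

```python
from sklearn.model_selection import train_test_split

# Hold back two thirds of the ground truth, then split that remainder in
# half: a Test Set for tuning and a Blind Test Set touched only once, for
# the formal evaluation.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=2 / 3, random_state=42)
X_test, X_blind, y_test, y_blind = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)

# Train on (X_train, y_train), tune parameters against (X_test, y_test),
# and report the final figures from (X_blind, y_blind) only.
```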

This method is ideal for an academic evaluation; however, in practical applications of Machine Learning there are other considerations.  Imagine you are developing a Question Answering solution for a bank.  What is most important to you: deploying the most effective machine learning solution, or deploying a solution that will always generate the correct answers?  The two are not necessarily the same thing.  Often we start projects with very little real world data and, by splitting that data into thirds, we immediately reduce the amount and quality of training data available.  Alternatively, if we simply use all the available data as training data, then we have no way of testing the system to know how it will behave against previously unseen data.  The counter-argument to holding out data is that if our available data is really that limited, then even breaking out a blind test set does not give us confidence that the tool will work against previously unseen data.

Unless we are working in a perfect environment where we know there is a huge set of statistically significant representative data, I prefer a bootstrapping approach.  I like to build systems where the customer knows and understands that the system will work for data that is in the training set.  If we encounter previously unseen data, then we add it to the training data and continue.  In practical terms this means adopting a process along the following lines (for a QA system):

  1. Collect as many questions as possible … ideally from a live system.
  2. Train the machine learning solution using all data.
  3. Test the solution automatically using the same data and ensure it is generating the answers you expect.  Note: don't assume that it will have learned to answer all questions correctly, as very few machine learning technologies do.  (A minimal sketch of this step follows the list.)
  4. Test with actual Users – real Users tend to misspell terms or enter different variants of questions.
  5. Identify any previously unseen questions that did not exist in your ground truth and add them to it.
  6. Re-train the solution and keep iterating until you are satisfied with the accuracy of the solution.
  7. Deploy to production and keep monitoring to ensure you pick up any previously unseen questions.
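
The automated test in step 3 can be as simple as the following sketch; answer_question is a hypothetical stand-in for whatever trained QA system you are iterating on:

```python
def regression_test(ground_truth, answer_question):
    """Replay every ground-truth question and report the failures.

    ground_truth   : list of (question, expected_answer) pairs
    answer_question: the trained QA system under test (hypothetical)
    """
    failures = []
    for question, expected in ground_truth:
        actual = answer_question(question)
        if actual != expected:
            failures.append((question, expected, actual))
    accuracy = 1 - len(failures) / len(ground_truth)
    print(f"{accuracy:.1%} of training questions answered correctly")
    # Don't expect 100%: very few machine learning technologies perfectly
    # reproduce their own training data.
    return failures
```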

A key element of the process I outlined above is ensuring you have representative data.  This is vitally important in any machine learning application.  If the system has not been trained with representative data, you cannot expect it to perform well.  Gathering representative data is often challenging: how do you collect data for an operational solution before deploying that operational solution?  There are approaches you could consider.  My preferred approach is to start small with a set of data developed internally.  Note that this data will not be representative, as you have developed it yourself.  However, you can use it to build a prototype that you then test with your employees and business partners.  They will enter more realistic (though still not fully representative) data that will allow you to improve your prototype before field testing with end Users.  At that stage you will need to position the technology appropriately and ensure the Users understand that they are part of the development process.  Finally, once you are satisfied, you can deploy to a production environment ... but keep monitoring!

When working with your training data, it’s really important to take the time to look at the actual data.  I personally like to read through data in its most raw form.  Often you will get summary reports saying that a solution is only 70% accurate or that certain groups of Users are unhappy.  Look at the data!  See exactly what is being asked of the solution and that will help you to understand the real issues.  You should expect to see ambiguous questions and inputs that would be difficult for a human being to interpret.  That doesn’t mean that you should accept the inaccuracy in the system – just that you may need to work on the User Interface or the processes for handling ambiguity or some other aspect of the solution.  You can only make wise decisions if you really understand the data so don’t be seduced by summary performance reports.

In my next blog I will talk more about representative training data and how that data is actually used by machine learning algorithms.

Friday 28 August 2015

Applying Machine Learning To Real World Problems

As the Chief Architect for IBM Watson Tooling, I am passionate about applying Machine Learning to real world business problems.  Inventing, implementing and applying Machine Learning solutions has been my (professional) life for over 20 years.

There is a huge amount of excitement around Artificial Intelligence (AI) right now and understandably Machine Learning is getting a lot of attention.   The last thing anyone wants is for failed projects to trigger a new AI Winter so it's really important that we get this right.  I therefore thought it would be helpful to publish a series of articles on the practical application of Machine Learning.

This blog is pitched at a non-technical audience; however, I'm more than happy to spin off deep techie conversations if required.  As a Master Inventor and senior technical leader in IBM, I must stress that this is my personal blog and that the views and opinions expressed are entirely my own.  They do not necessarily represent IBM's positions, strategies or opinions.

Machine Learning is all about computers learning from real world experience.  For example, if you want to build a system that answers questions on your corporate web site, you start by giving the computer a set of example questions and the answers you would like in response.  The Machine Learning system takes this training data and learns how to answer Questions.  Similarly, if you want to train a system to recognise people by the sound of their voices you collect examples of their voice recordings and label each recording to build a training set.
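
To give a feel for what "learning from examples" means in practice, here is a minimal, illustrative sketch of the question answering case using scikit-learn; the tiny training set and answer labels are invented for the example:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented training data: example questions labelled with the answer
# category we would like the system to return.
questions = [
    "What are your opening hours?",
    "When are you open?",
    "How do I reset my password?",
    "I forgot my password",
]
answers = ["opening_hours", "opening_hours", "password_reset", "password_reset"]

# Learn from the examples: convert each question to features, then fit a classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(questions, answers)

print(model.predict(["what time do you open?"]))  # expected: ['opening_hours']
```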

At a high level, this is an attractive message as it leaves the audience with the impression that everything is easy.  Grab some data, run it through a Machine Learning algorithm and you have a run-time system that you can apply easily to your live, operational data.

The devil though is in the detail and there are some important points that are worth noting:
  1. Machine Learning can only be effective when provided with good quality training data that is representative of the live operational data.  It's not a shortcut!  Ensuring you have the right data is an important and time-consuming aspect of the project.  I estimate that 80-90% of the time I spend on analytics projects is spent getting the data right.
  2. Machine Learning systems come in many different forms.  There are neural networks, probabilistic classifiers, Markov models, fuzzy networks and rules based systems.  All of these different AI algorithms can be trained using Machine Learning; Machine Learning simply describes how the algorithm is trained.  For example, I have a long history with rules based systems in defence applications.  Defence customers liked rules based systems as they could understand why a system took a certain action in response to an input.  However, I often hear rules based systems criticised because the rules are hard to define.  In my Defence systems, the rules were derived using Machine Learning.
  3. Be realistic about what you expect of a Machine Learning system.  Quite often there is a belief that if you throw a Machine Learning, or any AI, tool at a large pot of data it will discover something important.  For example, I have worked with law enforcement agencies where we were analysing huge amounts of data as part of criminal investigations.  Sometimes, the answer just didn't exist in the data!  In a Question Answering system, sometimes the Questions are so ambiguous that a human expert would struggle to understand them.  We shouldn't expect AI systems to solve unsolvable problems.
  4. Think hard about how you assess the accuracy of your system.  As with training data, it's important that any test data is representative of the live, operational data.  However, remember that these are statistical systems that can be skewed by the data.  Consider a Counter Fraud solution where the system has to identify cases of fraud.  If 75% of the test data cases are not fraudulent, then I can achieve an accuracy of 75% just by never declaring a case to be fraudulent.  Conversely, if I declare all cases to be fraudulent I achieve perfect recall on fraud but generate a huge number of false alarms in the process.  There's a whole raft of work on the science of measuring accuracy that I will discuss in a later blog; however, the key point to understand is that the test data can be skewed to alter the performance metrics (see the sketch after this list).
  5. Also remember to factor in the cost impact of any decisions.  Consider a Customer Relationship Management (CRM) solution.  You may develop a Machine Learning system to predict the likelihood of a customer leaving for an alternative supplier.  A system may be 100% accurate in predicting that some customers are going to stay whilst failing to predict that other, more valuable customers are going to leave.  In tuning the system, it's important to consider the cost impact of a wrong decision.
  6. Don't assume that Machine Learning will easily outperform other analysis techniques.  One of my earliest experiences was in applying Machine Learning in Formula 1, where many of the applications were in control systems.  Control theory is a huge discipline with massive amounts of research and a whole academic and professional discipline behind it.  Many of the systems we were looking at had been thoroughly analysed and modelled using control theory.  These existing engineering approaches generally performed better.  However, within control theory there were (are) specific problems that may benefit from a Machine Learning approach.  Engineers working in control systems may use Machine Learning as one of the tools in their toolkit.
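
Here is a small sketch of the skew described in point 4, showing why accuracy alone can mislead and why measures such as precision and recall matter:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# 75 genuine cases (0) and 25 fraudulent cases (1), as in point 4 above.
y_true = [0] * 75 + [1] * 25

never_fraud  = [0] * 100  # never declare fraud
always_fraud = [1] * 100  # declare everything fraud

for name, y_pred in [("never fraud", never_fraud), ("always fraud", always_fraud)]:
    print(name,
          "accuracy:",  accuracy_score(y_true, y_pred),
          "precision:", precision_score(y_true, y_pred, zero_division=0),
          "recall:",    recall_score(y_true, y_pred))
# never fraud : accuracy 0.75, but recall on fraud is 0.0
# always fraud: recall 1.0, but accuracy 0.25 and precision only 0.25
```
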
Understanding these basic concepts will help you to make good decisions in exploiting Machine Learning technology.  I've been working in this field for over 20 years and have seen many very successful projects.  The benefits of this technology are immense if you understand the basic principles and apply them correctly.

In my next blog I will talk more about training and test data and how to ensure you get the data right.