Unpacking AI Part 1: Using machine learning to predict loss

This post is the first installment in a three-part series called 'Unpacking AI', where we will take a pragmatic look at how AI can be used to augment and improve some of the fundamental aspects of the insurance industry. 

In a previous post, we shared some interesting ideas about how AI can transform commercial insurance. In this post, we are going to elaborate on one of those ideas by explaining what machine learning is, and how it can be used in an insurance context to predict loss.


“Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.”

— Artificial intelligence pioneer Arthur Samuel, 1959

Basically, machine learning is the idea that a computer program can teach itself to grow and change when exposed to new data. The operative word here is data. Generally, the more relevant data you expose the program to, the better its predictions become. In machine learning, this data is often referred to as ‘training data’.

There are a lot of different types of machine learning algorithms, but in this post, we are going to focus on a class of algorithms called supervised learning. Supervised learning is really useful for finding patterns in historical data, and using those patterns to predict likely future events.


An obvious application for supervised learning is in commercial lines underwriting. Underwriting is the backbone of the insurance industry, and if we distill it right down, an underwriter’s job is to protect the insurance company from taking on business that is not profitable. Underwriters do this by distinguishing between good risks and bad risks.

Good risks = Risks that are unlikely to result in claims

Bad risks = Risks which are likely to result in high levels of claims

To successfully identify a good risk, the underwriter must figure out what types of risks experience high losses. This assessment is made by using various pieces of information about the risk, such as the building type, location, and loss history, to calculate the probability of loss.


For this example, we will build a model to calculate the probability of loss for commercial buildings.

First, we need to build a dataset that describes a set of different buildings and the number of historical losses they have experienced. In commercial insurance, you would typically use four main categories to describe a building: construction type, occupancy, protection and exposure. For this example, we will keep things simple and use four basic fields: type of building, size of building, location, and number of losses per year. The first three describe the building; the last is the outcome we want the model to learn to predict.

Insurance Loss Prediction table
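To make the dataset a little more concrete, here is a minimal Python sketch of how rows like these might be stored and turned into numbers that an algorithm can work with. The records, field names and category values below are invented purely for illustration; they are not real loss data, and a real dataset would be far larger.

```python
from dataclasses import dataclass


@dataclass
class Building:
    building_type: str    # e.g. "office" or "warehouse" (hypothetical categories)
    size_sqft: float      # size of the building
    location: str         # e.g. "urban" or "industrial" (hypothetical categories)
    losses_per_year: int  # historical losses: the outcome we want to predict


# Invented example rows, for illustration only.
BUILDINGS = [
    Building("office", 12000, "urban", 0),
    Building("warehouse", 45000, "industrial", 2),
    Building("restaurant", 3500, "urban", 1),
    Building("office", 8000, "suburban", 0),
]


def one_hot(value, categories):
    """Turn a category into a list of 0/1 flags, one flag per known category."""
    return [1.0 if value == category else 0.0 for category in categories]


def encode(buildings):
    """Convert building records into numeric feature rows and 0/1 labels."""
    types = sorted({b.building_type for b in buildings})
    locations = sorted({b.location for b in buildings})
    max_size = max(b.size_sqft for b in buildings)

    features, labels = [], []
    for b in buildings:
        row = (
            one_hot(b.building_type, types)
            + one_hot(b.location, locations)
            + [b.size_sqft / max_size]  # scale size into the range 0..1
        )
        features.append(row)
        # Label is 1 if the building had at least one loss, otherwise 0.
        labels.append(1 if b.losses_per_year > 0 else 0)
    return features, labels


features, labels = encode(BUILDINGS)
```

Text fields such as building type and location become 0/1 flags, and size is scaled so that no single column dominates. Collapsing the loss count into a yes/no label is an assumption made here to match a model that predicts the probability of a loss.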

As you feed this data to the machine learning algorithm, the algorithm learns a model of the relationship between a building's characteristics and the probability of a claim within a certain time period.

We can ask the algorithm to identify buildings that have a high propensity for losses and, based on this, give each building a score between 0 and 1. In this model, the score represents the probability of a loss within a 12-month period.
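As a rough sketch of what producing such a score could look like, the Python example below trains a small logistic regression model with plain gradient descent. Logistic regression is our assumption here; it is just one common supervised learning technique that outputs a value between 0 and 1, and many others could be used instead. The training rows, feature layout and settings are all made up for illustration.

```python
import math

# Hypothetical training data. Each row is:
# [is_warehouse, is_urban, building size scaled to 0..1]
X = [
    [1.0, 0.0, 0.9],
    [1.0, 1.0, 0.8],
    [0.0, 1.0, 0.2],
    [0.0, 0.0, 0.1],
    [1.0, 0.0, 0.7],
    [0.0, 1.0, 0.3],
]
# Label: 1 = at least one loss in a 12-month period, 0 = no loss.
y = [1, 1, 0, 0, 1, 0]


def sigmoid(z):
    """Squash any real number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, row):
    """Score one building: the modelled probability of a loss."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def train(X, y, learning_rate=0.5, epochs=2000):
    """Fit logistic regression weights with batch gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(X, y):
            # Difference between the current prediction and the true label.
            error = predict_proba(weights, bias, row) - label
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        # Nudge each parameter against the average gradient.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


weights, bias = train(X, y)

# Score two buildings the model has not seen before.
large_warehouse = predict_proba(weights, bias, [1.0, 0.0, 0.85])
small_urban_building = predict_proba(weights, bias, [0.0, 1.0, 0.15])
print(f"Large warehouse score:      {large_warehouse:.2f}")
print(f"Small urban building score: {small_urban_building:.2f}")
```

With only six invented rows, the scores themselves mean very little. The point is the shape of the process: historical examples go in, and a function that scores new buildings comes out.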

This model can be used to help underwriters avoid underwriting risks which are likely to experience a loss.

The more relevant characteristics you feed to the algorithm, the more accurately its model can reflect the real nature of the risk. The value of machine learning compared to traditional risk models lies in its ability to introduce a richer, more diverse set of features or characteristics that are correlated with the risk. We will go into more detail on this in a future post.


Machine learning algorithms are not new; in fact, they have been around for decades. What is new is the amount of computing power and data that we now have access to: by one 2013 estimate, 90% of all the data in the world had been generated over the previous two years [1].

Generally, the more relevant data we feed to machine learning models, the more accurate our prediction of the probability of loss becomes. And the more accurate that prediction becomes, the better we are able to price insurance.

In the next installment of Unpacking AI, we will examine how new data sources can provide insurers with a wider set of observations about risk, and how this can further improve their ability to predict and price loss.

1. https://www.sciencedaily.com/releases/2013/05/130522085217.html