In our last post, we discussed how machine learning can be used to predict loss. In this post, we will illuminate how the use of new data sources can provide insurers with a wider set of observations about risk.
An insurer's ability to price risk accurately has always depended on the amount of information at its disposal. In the 18th century, the price of insurance for a cargo ship was based on a single feature: its destination.
Historically, there was a huge gap between an insurer's estimate of a risk and the manifestation of the risk in reality. As more data has become available, this gap has begun to close.
Rating models have improved, and insurers have been able to utilise demographic information, industry data, and retrospective claims experience to build an even sturdier foundation from which to model the future.
Now, due to the low cost of computation, data has become ubiquitous across all types of devices. This has led to a proliferation of new data sources that are orders of magnitude more granular than the data insurers are currently using.
The volume of observations per insurable unit is skyrocketing. The visualisation below compares the cargo ship example, where only one feature of the risk was needed, to today, where insurance companies like Metromile capture billions of observations, such as data about a car's acceleration, braking pattern and movements around a city, updated every second.
WHAT ARE NEW DATA SOURCES?
These data sources usually fall into one of two categories:
Humans recording data based on what they observe, using devices such as computers and mobile phones
Devices that emit and generate data, such as the Internet of Things (IoT), including wearable devices, satellites, GPS systems and drones
As mobile devices become more affordable and widely used, the number of observations recorded with them continues to grow. Today the number of active mobile devices on earth exceeds its human population.
Data is collected from sensors on manufacturing equipment, pipelines and weather stations. Wearable beacons and chips placed inside consumer products such as watches provide instantaneous information about consumer behavior and transactions. Gartner, Inc. has estimated there will be 6.4 billion connected devices or ‘things’ in use globally in 2016, forming a powerful web of interconnectivity and information.
These examples demonstrate a rapid decline in the costs of generating, capturing and storing data, providing an opportunity to represent and explain reality in a more accurate way.
IMPROVED RISK SEGMENTATION AND PRICING
Using AI methods such as machine learning and deep learning, insurers can harness and distil new data sources, allowing them to build models that are grounded in observations of the real world, in real time. Some examples of new data sources particularly relevant to insurance are restaurant reviews, employee satisfaction scores, social profiles of companies, house prices, and traffic flow data.
By layering new data with their internal data, insurers will be able to observe a risk from multiple angles, identifying new and previously under-observed features across a multitude of risks.
Introducing these features into machine learning models will significantly enhance an insurer's ability to predict the frequency and severity of loss.
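The idea of layering new external features onto internal data can be sketched in a few lines. This is a minimal, hypothetical illustration: the feature names, weights, and figures below are invented for the example, and in practice the weights would be learned by a fitted model rather than set by hand.

```python
# Illustrative sketch: layering external data onto an insurer's internal
# view of a risk, then adjusting a baseline claim-frequency estimate.
# All feature names, weights, and figures are hypothetical.

# Internal data the insurer already holds for one commercial policy
internal = {"base_claim_frequency": 0.05}  # expected claims per year

# New external observations of the same risk (examples from the post:
# review scores, employee satisfaction, traffic flow)
external = {
    "avg_restaurant_review": 4.2,   # out of 5
    "employee_satisfaction": 0.78,  # 0-1 scale
    "traffic_flow_index": 1.3,      # 1.0 = city average
}

# Hypothetical weights: positive values increase expected frequency,
# negative values decrease it. A real model would learn these from data.
weights = {
    "avg_restaurant_review": -0.004,
    "employee_satisfaction": -0.010,
    "traffic_flow_index": 0.015,
}

def adjusted_frequency(internal, external, weights):
    """Combine the baseline with a linear adjustment from new features."""
    adjustment = sum(weights[k] * external[k] for k in weights)
    return max(internal["base_claim_frequency"] + adjustment, 0.0)

freq = adjusted_frequency(internal, external, weights)
```

Here the external features nudge the baseline frequency down slightly, reflecting a lower-risk profile than internal data alone would suggest.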
As PWC suggests, capturing new data allows insurers to uncover previously hidden predictive features:
“Combining “small data”, partner data, and third party or publicly available information has allowed the identification of additional features of a given risk that are powerfully predictive of claims frequency and claims severity, and yet may not be obvious, and occasionally are even counter-intuitive.”
— Paul Delbridge, PWC
As the volume and diversity of observations per insurable object continue to increase, insurers' ability to tightly segment risks will improve, leading to more accurate and highly competitive pricing.
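Tighter segmentation can be illustrated with a toy example: a single behavioural feature from telematics data splits a small book of policies into segments, each priced from its own observed claim frequency. The "harsh_braking" feature and all figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical mini-portfolio: a telematics-derived harsh-braking rate
# (0-1 scale) and the observed claim count for each policy.
policies = [
    {"id": 1, "harsh_braking": 0.2, "claims": 0},
    {"id": 2, "harsh_braking": 0.9, "claims": 1},
    {"id": 3, "harsh_braking": 0.1, "claims": 0},
    {"id": 4, "harsh_braking": 0.8, "claims": 1},
]

def segment(policy):
    # One behavioural feature splits the book into two segments
    return "high" if policy["harsh_braking"] > 0.5 else "low"

# Tally claims and policy counts per segment
counts = defaultdict(lambda: [0, 0])  # segment -> [claims, policies]
for p in policies:
    s = segment(p)
    counts[s][0] += p["claims"]
    counts[s][1] += 1

# Price each segment from its own observed claim frequency
frequency = {s: c / n for s, (c, n) in counts.items()}
```

With only a portfolio-wide average, both groups would pay the same rate; with the extra feature, the low-risk segment can be priced more competitively while the high-risk segment is priced adequately.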