Unpacking AI part 3: Using machine learning to capture commercial property data

In an earlier post, we suggested that it was possible for an AI system to assess your eligibility for commercial insurance and issue you with a price in a matter of seconds, all without asking you a single question. From a technology standpoint, this is complex but possible.

In this post, we will explain how this can be achieved for commercial property insurance using third-party data and the information available online.


Underwriters have used the COPE (Construction, Occupancy, Protection and Exposure) framework for decades to determine the likelihood of loss for a commercial property.

The COPE framework looks at attributes like the location and age of a building, the types of businesses that occupy it and the materials it is built from. All of these factors help an insurer to determine whether a policy will be issued in the first place, and how it should be priced.

Traditionally, insurers gathered this information by asking insureds questions or by manually collating it from various sources. Now construction data is increasingly accessible from third-party datasets, for example via satellite and geospatial imagery, meaning it can be collected and synthesised using machine learning.


Not everything an insurer needs can be found online. The information required typically falls into one of three buckets:

  1. Trivial and easy to obtain, such as the name or address of your business
  2. Obtainable but not trivial, such as your loss history
  3. Not available online, such as the type of heating system installed in your office building

The third bucket usually covers the technical features of an individual building. These features are hard to observe: how many locks an office building has, whether it has sprinklers, or the types of appliances it contains.

Highly technical information is unlikely to be publicly available, as there is almost no incentive to publish it.

This means insurers face a tradeoff between obtaining comprehensive information and speed. They may opt to skip questions that are difficult to answer in favour of the value they gain from both improved customer experience and machine pricing.
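The tradeoff can be made concrete with a small sketch. It tags each form field with one of the three buckets above and decides whether to prefill it from external data, ask the customer, or skip it. The field catalogue, the importance scores and the `ask_threshold` value are all hypothetical and exist only to illustrate the idea; a real insurer would derive them from its own pricing model.

```python
from enum import Enum


class Availability(Enum):
    TRIVIAL = 1      # e.g. business name or address
    OBTAINABLE = 2   # e.g. loss history
    NOT_ONLINE = 3   # e.g. type of heating system


# Hypothetical catalogue: field -> (bucket, importance to pricing from 0 to 1).
FIELDS = {
    "business_name": (Availability.TRIVIAL, 0.2),
    "loss_history": (Availability.OBTAINABLE, 0.9),
    "heating_system": (Availability.NOT_ONLINE, 0.3),
    "sprinklers": (Availability.NOT_ONLINE, 0.8),
}


def plan_form(fields, ask_threshold=0.5):
    """Split fields into those to prefill, those to ask about and those to skip."""
    plan = {"prefill": [], "ask": [], "skip": []}
    for name, (availability, importance) in fields.items():
        if availability is not Availability.NOT_ONLINE:
            # Buckets 1 and 2 can be sourced without asking the customer.
            plan["prefill"].append(name)
        elif importance >= ask_threshold:
            # Not online, but important enough to justify a question.
            plan["ask"].append(name)
        else:
            # Not online and low value: trade completeness for speed.
            plan["skip"].append(name)
    return plan


print(plan_form(FIELDS))
```

Raising `ask_threshold` shortens the form at the cost of less complete data; lowering it does the opposite.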


At Cytora, we use a combination of human intelligence and machine learning to collect COPE data contained in unstructured web text. We started by capturing prior losses at a property level, and we are now augmenting this with occupancy, location and financial information.

The last step is to link all of the relevant pieces of information together to form a profile for a building or business. This is the most challenging task of all, as sources can be inconsistent or ambiguous. For example, if Factual and OpenStreetMap each list a different occupancy under the same address, an algorithm must decide which one is correct. At the core of this process is combining scale (to capture the full breadth of property attributes) and precision (to ensure the data is correct).
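As a rough illustration of the conflict described here, the sketch below groups records by a normalised address and picks the occupancy with the highest total source weight. This is not a description of Cytora's method: the weighted-vote approach, the `SOURCE_WEIGHTS` values and the sample records are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical reliability weights per source (illustrative, not measured).
SOURCE_WEIGHTS = {"factual": 0.6, "openstreetmap": 0.5, "company_website": 0.9}


def normalise_address(address: str) -> str:
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in address.lower()
    )
    return " ".join(cleaned.split())


def resolve_occupancy(records):
    """Return {address: (winning occupancy, share of total weight)}."""
    votes = defaultdict(lambda: defaultdict(float))
    for rec in records:
        key = normalise_address(rec["address"])
        # Unknown sources get a small default weight.
        votes[key][rec["occupancy"]] += SOURCE_WEIGHTS.get(rec["source"], 0.1)

    resolved = {}
    for key, by_occupancy in votes.items():
        best = max(by_occupancy, key=by_occupancy.get)
        confidence = by_occupancy[best] / sum(by_occupancy.values())
        resolved[key] = (best, round(confidence, 2))
    return resolved


# Made-up records for a single address, listed differently by each source.
records = [
    {"source": "factual", "address": "12 High St., London", "occupancy": "restaurant"},
    {"source": "openstreetmap", "address": "12 high st london", "occupancy": "cafe"},
    {"source": "company_website", "address": "12 High St, London", "occupancy": "restaurant"},
]

print(resolve_occupancy(records))
```

The confidence value gives a simple handle on precision: low-confidence profiles could be routed to a human reviewer rather than used directly.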


Below is the online form used by insurance firm AXA to issue quotes for business premises or shop insurance. After each section, we will explain where the required information could be captured from third-party datasets and extracted using machine learning.

AXA form 1

Field name: Your Occupation
Datasets where this information may be captured: LinkedIn, Companies House Public Register, company website.

Field name: How do you trade (for example, a shop, your home or online).
Datasets where this information may be captured: Factual, OpenStreetMap, Google Places API

AXA form 2

Field name: Business Name
Datasets where this information may be captured: LinkedIn, company website, Companies House Public Register, Duedil

Field name: Company Status
Datasets where this information may be captured: LinkedIn, Companies House Public Register

Field name: Business Start Date
Datasets where this information may be captured: LinkedIn, company website, Companies House Public Register, Duedil

Field name: Annual Turnover
Datasets where this information may be captured: Companies House Public Register/Duedil

Field name: Annual Wage Bill
Datasets where this information may be captured: LinkedIn (calculated by number of employees), Companies House Public Register/Duedil

Field name: Do you have any subsidiary companies
Datasets where this information may be captured: Company website, Companies House Public Register

AXA form 3

Field name: Where is it located
Datasets where this information may be captured: Company website, Factual, OpenStreetMap, Google Places API

Field name: Listed Building?
Datasets where this information may be captured: Unknown

Field name: Is it of standard construction?
Datasets where this information may be captured: Government datasets, satellite/geospatial imagery

Field name: Is it in good condition?
Datasets where this information may be captured: Satellite/geospatial imagery, review websites such as TripAdvisor

Field name: How often is it occupied?
Datasets where this information may be captured: Company website, Google Places API, Foursquare

AXA form 4

Field name: Do you have lockable external doors and windows?
Datasets where this information may be captured: Unknown

Field name: Is your premises front protected with roller shutter doors?
Datasets where this information may be captured: Google Places, Factual

AXA form 5

Field name: Have you had any claims or incidents in the last five years?
Datasets where this information may be captured: Cytora dataset (aggregated across billions of disparate data points online)

With a few exceptions, most of the information required to obtain commercial insurance for business premises exists online. Extracting and organising this information instantaneously at scale, however, is incredibly complex. There is a huge opportunity for those with advanced data extraction capabilities to provide value in this space.
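To put a number on "a few exceptions", the walkthrough above can be transcribed into a simple mapping and counted. The field labels are shortened here for readability, and an empty list stands for a field whose source is listed as unknown. This only counts fields with at least one candidate source; it says nothing about how reliably the data can be extracted in practice.

```python
# Field -> candidate sources, transcribed from the form walkthrough above.
# An empty list marks a field whose source is listed as "Unknown".
FIELD_SOURCES = {
    "Your occupation": ["LinkedIn", "Companies House", "Company website"],
    "How do you trade": ["Factual", "OpenStreetMap", "Google Places API"],
    "Business name": ["LinkedIn", "Company website", "Companies House", "Duedil"],
    "Company status": ["LinkedIn", "Companies House"],
    "Business start date": ["LinkedIn", "Company website", "Companies House", "Duedil"],
    "Annual turnover": ["Companies House", "Duedil"],
    "Annual wage bill": ["LinkedIn", "Companies House", "Duedil"],
    "Subsidiary companies": ["Company website", "Companies House"],
    "Location": ["Company website", "Factual", "OpenStreetMap", "Google Places API"],
    "Listed building": [],
    "Standard construction": ["Government datasets", "Satellite/geospatial imagery"],
    "Good condition": ["Satellite/geospatial imagery", "Review websites"],
    "Occupancy frequency": ["Company website", "Google Places API", "Foursquare"],
    "Lockable doors and windows": [],
    "Roller shutter doors": ["Google Places", "Factual"],
    "Claims in last five years": ["Cytora dataset"],
}


def coverage(field_sources):
    """Return the share of fields with a candidate source and the uncovered fields."""
    missing = [field for field, sources in field_sources.items() if not sources]
    share = (len(field_sources) - len(missing)) / len(field_sources)
    return share, missing


share, missing = coverage(FIELD_SOURCES)
print(f"{share:.1%} of fields have a candidate source; still to ask: {missing}")
```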


The nirvana for insurers is the ability to generate a policy price immediately, requiring only the address of a property from the insured. Instead of asking the customer to fill in a form, insurers will obtain the equivalent data to price risk from third-party datasets spanning imagery, web data and other sources.

This won’t happen overnight, but eventually we will see a systematic reduction in the number of fields on any given insurance form. Insurers will compete on the extent to which they can directly or indirectly replace the data inputs. Expressed differently, the competition will be to provide the form with the fewest possible inputs. 

The shift from asking the insured for data to leveraging third-party datasets will also open up the possibility for insurers to observe new features of a risk that are outside the knowledge or control of the insured. For example, reviews left on TripAdvisor could be used to supplement the assessment of restaurants and hotels. Glassdoor, Facebook and Google reviews can help insurers to evaluate the success of a business.

The advantages for insurers are reduced underwriting costs, improved customer experience and optimised pricing; for consumers, it could mean faster, lower-friction service, better pricing and virtually no paperwork.