How is big data transforming insurance?

Following the success of the presentation at the 2016 annual RMIA conference, our roving reporter interviewed the GB presenters, Peter Walker and Julian Martin.

Peter, why did you choose to talk about fraud under a topic of Big Data?

The RMIA’s suggested topic, “Ways in which big data is transforming insurance”, was huge and could have covered so many angles: marketing, underwriting, sales distribution, claims and so on. Julian and I asked ourselves how we could make it relevant to the risk management fraternity, and fraud is a significant risk to an insurer’s performance and results in so many ways.

It always surprises me that people outside the industry don’t understand the impact of insurance fraud. In Australia alone it is estimated to cost up to $2.2 billion each year … and it isn’t costing the anonymous “industry”; it is costing honest policyholders like you and me. Insurers have to increase their premiums to cover these costs, and this surcharge gets passed on to consumers. It has been estimated that it adds about $75 to every single policy we buy. People need to stop thinking of insurance fraud as a “victimless crime”.

But insurance fraud sounds such a dry topic; how did you make it interesting for the audience?

That’s true. We spent quite a bit of time deciding how we could make the presentation both informative and entertaining. That’s how we came up with the idea of concurrent presentations, in a sort of race. I delivered a traditional PowerPoint-style presentation on how modern predictive analytics and big data, including social media, could be the long-awaited step change for insurers in managing their fraud risk. At the same time, Julian demonstrated in real time how current tools are making big data solutions accessible and quick to deploy, by attempting to build and execute, from scratch, a big-data-driven fraud identification model for online purchases.

So Julian, was the IT demonstration truly live?

For sure. Pete made a joke at the start about never working on the stage with children, animals or live IT demonstrations, but I had confidence in the tools. Realistically the biggest challenge was time.
It’s a five-step process. It starts with loading some example online data that is known to be valid, then cleaning up the data. Known fraud data is then also loaded and formatted, and scores are assigned to the different elements, enabling the model to be trained. Once the model was trained, it was deployed as a web service in the Microsoft Azure cloud. We were then able to simulate an online purchase, submit it to the model for a verdict on whether the transaction was fraudulent, and show the outcome live in real time. The idea of the demonstration was to show that these tools are readily available and quick to use. There are many open source modules freely available on the internet, which means we don’t have to reinvent the wheel each time.
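For readers who want to see the shape of such a pipeline, the following is a minimal sketch in plain Python. It is not the model built on stage and does not use the Azure tooling: the feature names, the sample records and the choice of a naive Bayes scorer are all assumptions made for illustration, and a local function stands in for the deployed web service.

```python
import math

# Hypothetical features of an online purchase (illustrative only).
FEATURES = ("country_mismatch", "high_value", "new_account")

def clean(rows):
    """Drop incomplete records and normalise every feature to a boolean."""
    return [
        {f: bool(r[f]) for f in FEATURES}
        for r in rows
        if all(r.get(f) is not None for f in FEATURES)
    ]

def train(valid, fraud):
    """Fit a tiny naive Bayes model: one pair of log-odds weights per feature."""
    model = {"prior": math.log(len(fraud) / len(valid))}
    for f in FEATURES:
        # Add-one smoothing so a feature never seen in one class
        # does not produce a zero probability.
        p_fraud = (sum(r[f] for r in fraud) + 1) / (len(fraud) + 2)
        p_valid = (sum(r[f] for r in valid) + 1) / (len(valid) + 2)
        model[f] = (
            math.log(p_fraud / p_valid),              # weight if feature present
            math.log((1 - p_fraud) / (1 - p_valid)),  # weight if feature absent
        )
    return model

def fraud_probability(model, txn):
    """Score one transaction; stands in for a call to a deployed service."""
    log_odds = model["prior"]
    for f in FEATURES:
        present, absent = model[f]
        log_odds += present if txn[f] else absent
    return 1 / (1 + math.exp(-log_odds))

# Made-up data: transactions known to be valid, including one incomplete row.
valid_raw = [
    {"country_mismatch": 0, "high_value": 0, "new_account": 0},
    {"country_mismatch": 0, "high_value": 1, "new_account": 0},
    {"country_mismatch": 0, "high_value": 0, "new_account": 1},
    {"country_mismatch": 0, "high_value": 0, "new_account": 0},
    {"country_mismatch": 0, "high_value": 0, "new_account": None},
    {"country_mismatch": 0, "high_value": 0, "new_account": 0},
]
# Made-up data: transactions known to be fraudulent.
fraud_raw = [
    {"country_mismatch": 1, "high_value": 1, "new_account": 1},
    {"country_mismatch": 1, "high_value": 1, "new_account": 0},
    {"country_mismatch": 1, "high_value": 0, "new_account": 1},
    {"country_mismatch": 0, "high_value": 1, "new_account": 1},
]

model = train(clean(valid_raw), clean(fraud_raw))

# Simulate a suspicious purchase and an ordinary one.
suspicious = {"country_mismatch": True, "high_value": True, "new_account": True}
ordinary = {"country_mismatch": False, "high_value": False, "new_account": False}
print(f"suspicious: {fraud_probability(model, suspicious):.2f}")
print(f"ordinary:   {fraud_probability(model, ordinary):.2f}")
```

With this toy data the suspicious purchase scores well above 0.5 and the ordinary one well below it. A production model would need far more data, proper validation and a real deployment step.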

Peter, you said in the presentation that 10% to 12% of insurance claims are fraudulent; is it really that high?

Just to clarify, I said that the consensus of industry professionals, research papers and surveys suggests that 10-12% of all claims “contain some element of fraud”. The figure is correct and it is backed up by the Insurance Fraud Bureau of Australia, but there is an important distinction in the words, because it covers both hard and soft fraud. When most people think of insurance fraud they think of a person setting out to deliberately stage a claim. The industry calls this hard fraud. Soft fraud is where someone has had a genuine loss event but chooses to exaggerate their claim in some way. Soft fraud is still defrauding the insurance company and it still impacts the results, but there is a big difference between the two types, especially when it comes to fraud identification.

What has changed over the years to combat insurance fraud?

Ah, it would be easy to say “a lot … but not very much” at the same time! When I started life in claims, over 35 years ago, fraud checking was a manual process and we considered each case against a Fraud Identification Checklist. Nowadays most insurers have automated ‘red flag’ systems, where the claim system looks for criteria that match known past indicators of fraud. However, in reality, these red flags are not that different from the 35-year-old checklist.
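To make the idea of an automated red flag system concrete, here is a minimal sketch in Python. The three rules, the field names and the thresholds are hypothetical examples, not an actual insurer's checklist; the point is only that each flag is a fixed test against structured claim fields.

```python
# Hypothetical red flags: each rule is a fixed test on structured claim fields.
RED_FLAGS = {
    "claim_soon_after_inception": lambda c: c["days_since_policy_start"] < 30,
    "multiple_prior_claims": lambda c: c["prior_claims"] >= 3,
    "theft_without_police_report": lambda c: (
        c["loss_type"] == "theft" and not c["police_report"]
    ),
}

def raised_flags(claim):
    """Return the names of every red flag this claim triggers."""
    return [name for name, rule in RED_FLAGS.items() if rule(claim)]

# A made-up claim record for illustration.
claim = {
    "days_since_policy_start": 12,
    "prior_claims": 1,
    "loss_type": "theft",
    "police_report": False,
}
print(raised_flags(claim))
```

Because every rule is written in advance from past cases, a system like this can only find the patterns someone has already thought of, which is why it resembles the old manual checklist.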

So exactly how is big data transforming the fight against insurance fraud?

The presentation talked about this in detail, but I’ll try to simplify the answer by breaking it into two parts.

Firstly, one of the reasons that insurers are still relying on red flags which are fundamentally many years old is that the data they capture in their claim systems hasn’t significantly changed. There are some exceptions, but most claim systems in production use a database which consists of a finite number of alphanumeric data fields, which have fixed limitations on what can be stored. It is what we call highly structured data. However, anyone who has worked in claims knows that the truly meaningful information relating to the claim rarely sits in this structured data. It’s contained in statements of the circumstances of loss, in loss adjuster reports, in medical reports, in emails, even in the case manager’s own file notes. So we have a wealth of information, but for many insurers it remains inaccessible for data analysis because it is unstructured data that is not contained within the searchable data fields.

Document imaging technology and optical character recognition have moved on in recent years, and the cost of storage has come down significantly, so it is now quite possible for organisations to store large amounts of what we call ‘natural language’ data (that’s verbal and written information) in digital form. With advances in text mining, insurers now have the opportunity to search the digitised documents of the entire file. Add to this modern anomaly detection and link analysis principles, and the insurer has the ability not only to search but also to analyse connections in all of that rich material that exists outside of the structured data fields.
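As a rough illustration of text mining combined with link analysis, the sketch below pulls phone numbers out of free-text file notes and reports any pair of claims that share one. The claim IDs, the notes and the phone numbers are invented, and the simple regular expression (Australian-style mobile numbers) is an assumption; real text mining software is considerably more sophisticated.

```python
import re
from collections import defaultdict
from itertools import combinations

# Made-up free-text notes of the kind found outside structured fields.
NOTES = {
    "C-101": "Claimant called from 0412 345 678 to report rear-end collision.",
    "C-102": "Witness statement received; witness contact 0412345678.",
    "C-103": "Loss adjuster report: water damage, contact 0498 111 222.",
}

# Assumed pattern: ten-digit mobile numbers starting 04, optional spaces.
PHONE = re.compile(r"\b04\d{2}\s?\d{3}\s?\d{3}\b")

def extract_phones(text):
    """Return the phone numbers in a note, normalised by removing spaces."""
    return {re.sub(r"\s", "", match) for match in PHONE.findall(text)}

def linked_claims(notes):
    """Return (claim_a, claim_b, phone) for every pair sharing a number."""
    claims_by_phone = defaultdict(set)
    for claim_id, text in notes.items():
        for phone in extract_phones(text):
            claims_by_phone[phone].add(claim_id)
    links = set()
    for phone, claim_ids in claims_by_phone.items():
        for a, b in combinations(sorted(claim_ids), 2):
            links.add((a, b, phone))
    return links

for a, b, phone in sorted(linked_claims(NOTES)):
    print(f"{a} and {b} share phone number {phone}")
```

Here the claimant on one file and the supposedly independent witness on another turn out to use the same number, a connection that would never surface from the structured fields alone.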

Whilst this technology is available now, it is still not commonplace in the insurance claims department, so making this step alone has the power to transform the insurers’ fight against fraud.

The second part is the broader opportunity opened up by the new ability to search unstructured data. We do not have to limit ourselves to the digitised text in the claim file; we can access big data from multiple sources, including social media.

Does social media really make that much difference? Surely it is the direct claim information which is more relevant?

Yes, it makes a huge difference. We already have evidence that social media can be a game-changer in the fight against insurance fraud.

I made the point in my presentation that I try to keep my business life and my private life separate. However, it is becoming increasingly difficult to do this. Facebook now regularly suggests new ‘friends’ for me to connect with who are solely business connections of mine. They just happen to be connections through other sources, such as LinkedIn, or I just have their contact details in my phone. Like it or not, clearly there is some form of digital ID out there that is making that connection to me. Here’s another example: I searched for an organisation on my work laptop for a project the other day, and four hours later that same company was being promoted to me via Facebook. Tracking ‘cookies’ are used everywhere we go on the web, and would-be fraudsters face the same challenge. It is difficult to hide from social media … and indeed we have evidence that some fraudsters still seem oblivious to this and are prepared to share openly.

Can you give me an example of what you mean by open sharing?

Of course. Perhaps the biggest “cash for crash” fraud ring conviction in the UK is a perfect example. The Yandell family ran a car repair business and were at the centre of the crime ring, staging vehicle accident claims. Police investigations resulted in 81 people being convicted of charges of conspiracy to defraud, and social media played a part. For example, one of the claims made by a family member (Byron Yandell) alleged a collision with three other parties. He maintained that he didn’t know the others. However, police found photographs published on Facebook of Byron’s wedding which showed quite clearly that all three of the alleged injured parties were at the wedding reception!

The interconnectedness of the web seems impossible to avoid and it is this fact that is helping some insurers today, as more and more people openly share their lives in the cloud.

Finally, what do you see in the future for insurers in relation to fraud identification or prevention?

That’s perhaps the most difficult question of the interview. As recently as 10 years ago, no one was predicting what we have available today, and technology is moving forward at pace. Artificial Intelligence (AI) and machine learning are in their infancy for most claims departments, but as they mature they open up so many possibilities. For example, our current red flag systems are built on the structured data of past known fraud cases. If we took those same cases and opened up the big data streams associated with each, we could use systems (running text analytics, modern predictive modelling software and machine learning) to potentially identify new flags. We can get the computer system to identify any correlation between common data points in this new, very broad, data associated with these fraud claims. If the commonalities are frequent then, statistically, these are likely to be new additional indicators which can be used to refine and improve the current fraud rating system.
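One simple way to picture that search for commonalities is to compare how often each data point appears in known fraud cases with how often it appears in other claims. The sketch below does this with a basic ratio (often called lift). The attribute names, the sample cases and the threshold are all hypothetical, and real predictive modelling software would use far more rigorous statistics.

```python
from collections import Counter

def candidate_flags(fraud_cases, other_cases, min_lift=3.0):
    """Return attributes that are much more common in fraud than elsewhere.

    Each case is a set of attribute names. Lift is the attribute's rate in
    fraud cases divided by its rate in the other cases.
    """
    fraud_counts = Counter(attr for case in fraud_cases for attr in case)
    other_counts = Counter(attr for case in other_cases for attr in case)
    flags = {}
    for attr, count in fraud_counts.items():
        fraud_rate = count / len(fraud_cases)
        # Treat an unseen attribute as one occurrence to avoid dividing by zero.
        other_rate = max(other_counts[attr], 1) / len(other_cases)
        lift = fraud_rate / other_rate
        if lift >= min_lift:
            flags[attr] = lift
    return flags

# Made-up attributes mined from the wider data around each claim.
fraud_cases = [
    {"same_repairer", "late_night_loss", "shared_phone"},
    {"same_repairer", "shared_phone"},
    {"same_repairer", "rural_location"},
    {"shared_phone", "late_night_loss"},
]
other_cases = [
    {"late_night_loss"},
    {"rural_location"},
    {"same_repairer"},
    {"rural_location", "late_night_loss"},
    set(), set(), set(), set(),
]

for attr, lift in sorted(candidate_flags(fraud_cases, other_cases).items()):
    print(f"{attr}: {lift:.1f}x more common in known fraud cases")
```

In this toy data, a shared repairer and a shared phone number stand out, while a late-night loss or a rural location does not clear the threshold. Any indicator found this way would still need testing on fresh claims before being added to a rating system.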

There are also possibilities for predictive modelling to create fraud profiling which could be helpful in our flagging process.

At the moment most of the industry is still thinking about text searches. The next generation of social network analysis software designed to analyse unstructured data (such as IBM’s Watson product) is capable of including both static images and video streaming as data inputs. This suggests that our anti-fraud link analysis of the future will no longer be confined to text-based data and will be using all those images and videos people are so freely distributing today.

GB's Peter Walker and Julian Martin presenting at the 2016 RMIA conference