Data science: potential uses in risk management
Introduction
In recent years, developments in data science and computational capabilities have combined to enhance the toolkit available to risk managers. In particular, machine learning (ML) and artificial intelligence (AI) may ultimately have a profound impact on the insurance value chain, by providing innovative solutions and enhancing efficiencies and predictive capabilities. In this short note we explore some of the potential uses of data science techniques in the risk management of (re)insurance companies, together with some of the challenges associated with them.
Some potential use cases
Underwriting
Data science has already become well established in the field of underwriting, allowing firms to assess and price risks more accurately, to determine key drivers which may be used as proxies for the underlying risk exposure, to understand policyholder behaviour, and to set assumptions about likely future experience. This has significant benefits from a risk management perspective, allowing firms to manage underwriting risk exposures relative to appetite and to ensure suitable and sustainable pricing of risk. Enriching internal data with, for instance, external data sources or web-scraped information, in combination with machine learning techniques such as deep learning, has the potential to yield a competitive advantage in risk selection and pricing.
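By way of illustration, the short Python sketch below fits a Poisson GLM for claim frequency, a common baseline pricing approach onto which enriched external rating factors can be layered. The dataset and column names (policies.csv, claim_count, exposure, and so on) are hypothetical.

```python
# A minimal claim-frequency pricing sketch; the data file and all
# column names are hypothetical illustrations.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

policies = pd.read_csv("policies.csv")  # hypothetical internal dataset

# A Poisson GLM with policy exposure handled via the model's exposure
# term (log link), estimating a claim rate per unit of exposure.
freq_model = smf.glm(
    "claim_count ~ age + C(vehicle_group) + C(region)",
    data=policies,
    family=sm.families.Poisson(),
    exposure=policies["exposure"],
).fit()
print(freq_model.summary())
```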
Experience analysis
Traditional methodologies for the measurement of experience (for example, lapse investigations) are already founded on basic data science techniques. However, more advanced techniques, like machine learning classification methods or deep learning, can allow firms to make use of both structured and unstructured data and to explore the interconnectedness between different factors which might, collectively, influence the behaviours of policyholders. This can be especially useful in understanding dynamic behaviour, i.e., variations in expected behaviour over time, owing to the influence of other factors, which can significantly affect the overall potential variability in experience.
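As a rough sketch of the kind of classification approach described above, the snippet below fits a gradient boosting classifier to lapse experience data; the file and feature names are assumptions made for illustration.

```python
# A minimal lapse-classification sketch; the dataset and feature
# names are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

data = pd.read_csv("lapse_experience.csv")  # hypothetical dataset
features = ["policy_duration", "premium_size", "age", "channel_direct"]
X_train, X_test, y_train, y_test = train_test_split(
    data[features], data["lapsed"], test_size=0.25, random_state=42
)

clf = GradientBoostingClassifier(random_state=42)
clf.fit(X_train, y_train)

# AUC gives a quick view of how well the model separates lapsers from
# non-lapsers; feature importances hint at the key drivers.
print("AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
print(dict(zip(features, clf.feature_importances_)))
```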
Operational risk
Data science techniques can facilitate the detection of data outliers and inconsistencies, which can often act as lead indicators of operational risk exposure. This can be especially true in the case of fraud detection, where comparison of individual actions (or inactions) relative to a broad and deep data set can reveal otherwise hidden patterns. Of course, this also holds more generally. For example, analysis of blips in productivity across the operations team may identify mounting instability within the company’s administrative system in advance of a major service interruption, allowing the firm time to pre-emptively address the underlying issues. When using machine learning techniques for these purposes, however, special attention must be given to the explainability of model outcomes in order to ensure that the intended users of this information understand and accept it and therefore utilise it as part of their risk management activities.
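For illustration, the sketch below applies an isolation forest, one common outlier-detection technique, to a hypothetical transactions dataset; the field names and assumed anomaly rate are invented for the example.

```python
# A minimal outlier-detection sketch using an isolation forest.
import pandas as pd
from sklearn.ensemble import IsolationForest

txns = pd.read_csv("transactions.csv")  # hypothetical dataset
features = txns[["amount", "time_since_last_txn", "changes_to_details"]]

# 'contamination' is the expected share of anomalies; in practice it
# would need to be tuned to the portfolio.
iso = IsolationForest(contamination=0.01, random_state=42)
txns["anomaly_flag"] = iso.fit_predict(features)  # -1 marks an outlier

suspicious = txns[txns["anomaly_flag"] == -1]
print(f"{len(suspicious)} transactions flagged for manual review")
```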
Horizon scanning
More generally, data science techniques can act as a very effective means of horizon scanning to detect potential emerging risks, attempting to identify patterns in the data which may indicate that internal or external environmental factors are likely to impact the firm’s risk profile. Predictive models can be used to generate lead indicators which warn the firm of potentially adverse conditions, sufficiently far in advance for appropriate action to be taken. This can include, for example, measurement of the firm’s current risk profile relative to risk appetite or spotting anti-selective behaviour on the part of policyholders. This may be best illustrated with an example. Consider a firm whose stated appetite for lapse risk implies that it is willing to bear a loss of €X million over a 12-month time horizon. The firm’s predictive model is currently indicating that lapse risk is on the rise, and economic conditions are such that the model expects lapse risk to continue increasing until it far exceeds the firm’s risk appetite. This outcome is a lead indicator suggesting that the firm needs to take appropriate action now in order to remain within its risk appetite, e.g., implement a loyalty scheme to improve policyholder retention.
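The toy calculation below makes the idea concrete: it rolls the modelled lapse loss forward month by month at the model's projected growth rate and flags the first projected breach of the appetite limit. All figures are invented for the example.

```python
# A toy lead-indicator check against a lapse risk appetite limit.
# All figures are illustrative assumptions.
APPETITE_LIMIT_M = 50.0  # appetite: EUR 50m loss over a 12-month horizon
current_loss_m = 38.0    # modelled 12-month lapse loss today
monthly_growth = 0.04    # growth rate projected by the predictive model

loss = current_loss_m
for month in range(1, 13):
    loss *= 1 + monthly_growth
    if loss > APPETITE_LIMIT_M:
        print(f"Projected appetite breach in month {month}: "
              f"EUR {loss:.1f}m vs limit of EUR {APPETITE_LIMIT_M:.1f}m")
        break
else:
    print("No breach projected within the next 12 months")
```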
Other common examples, which are already in place at many firms, include real-time solvency and earnings monitors. Again, the data used here may be a mix of structured and unstructured data and may draw both from the firm’s own datasets and from external sources.
Generative AI is set to prove particularly useful in building scenarios to assist firms with horizon scanning. ‘What-if’ analyses, which may be conducted as part of the annual Own Risk and Solvency Assessment (ORSA) process, can help firms to uncover and better understand risk exposures, making them a very helpful tool for communication with key stakeholders.
Data visualisation
Effective, clear and concise communication to relevant stakeholders is a common challenge for risk managers. The volume and complexity of information is often such that it is difficult to distil the key messages and to bring these to life. Data visualisation techniques facilitate the translation of information into charts and graphs, which can be either static or interactive, making information much easier to comprehend and communicate.
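As a simple (static) illustration, the sketch below charts current risk exposures against appetite limits using matplotlib; the risk categories and figures are invented for the example.

```python
# A minimal risk dashboard chart; categories and figures are invented.
import matplotlib.pyplot as plt

risks = ["Lapse", "Mortality", "Market", "Operational"]
exposure = [38, 22, 45, 12]  # EUR m, current modelled exposure
appetite = [50, 30, 40, 20]  # EUR m, appetite limits

fig, ax = plt.subplots()
ax.bar(risks, exposure, label="Current exposure")
ax.plot(risks, appetite, "r_", markersize=30, label="Appetite limit")
ax.set_ylabel("12-month loss (EUR m)")
ax.set_title("Risk exposure versus appetite")
ax.legend()
plt.show()
```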
Challenges
Naturally, implementing data science techniques, in particular ML techniques, is not without its challenges. Even with sufficient resources and technical expertise, firms will need to consider:
Quantity and quality of data
ML models are data hungry, requiring vast amounts of data for training; generally, the more data available for training, the better the model is likely to perform. However, the quality of this data is equally crucial. Inaccurate, incomplete or inconsistent data can lead to poorly performing models. Therefore, data cleaning and preprocessing are essential steps.
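A few typical cleaning and preprocessing steps are sketched below in pandas; the dataset and column names are hypothetical.

```python
# A minimal data-cleaning sketch; file and column names are hypothetical.
import pandas as pd

raw = pd.read_csv("policy_data.csv")

# Inspect completeness and obvious inconsistencies before training.
print(raw.isna().mean().sort_values(ascending=False))  # missingness by column

raw = raw.drop_duplicates(subset="policy_id")  # remove duplicate records
raw = raw[raw["age"].between(0, 120)]          # drop implausible ages
raw["smoker"] = (
    raw["smoker"].str.strip().str.upper().map({"Y": 1, "N": 0})
)                                              # standardise inconsistent coding
raw["premium"] = raw["premium"].fillna(raw["premium"].median())  # impute gaps
```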
Model calibration
Data models often require fine-tuning of their parameters to improve their predictive accuracy. This trial-and-error process, known as model calibration, can be complex and time-consuming, especially for models with a large number of parameters.
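One common approach is an exhaustive grid search with cross-validation, sketched below using scikit-learn. The model, parameter grid and scoring metric are illustrative choices, and X_train and y_train are assumed to be the training data from the earlier lapse sketch.

```python
# A minimal calibration sketch: cross-validated grid search over
# hyperparameters. The grid shown is illustrative only.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 300],
    "learning_rate": [0.01, 0.1],
    "max_depth": [2, 3, 4],
}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="roc_auc",
)
search.fit(X_train, y_train)  # training data from the earlier lapse sketch
print(search.best_params_, search.best_score_)
```

Note that the number of candidate models grows multiplicatively with the grid, which is one reason calibration can be so time-consuming; randomised or Bayesian search are common alternatives for larger parameter spaces.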
Black box models
Many models, particularly deep learning models, are often seen as ‘black boxes’ due to their complex inner workings. This lack of interpretability can make it challenging to explain the model’s decisions to stakeholders, which is particularly problematic in sectors where explainability is crucial, such as insurance. To address this issue, several post-hoc methods have been developed to help explain a model’s outcome and improve user adoption.
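As an illustration, the sketch below applies the SHAP library, one widely used post-hoc explainability method, to the hypothetical lapse model fitted earlier (clf and X_test from that sketch).

```python
# A minimal post-hoc explainability sketch using SHAP; 'clf' and
# 'X_test' are the fitted model and hold-out data from the lapse sketch.
import shap

explainer = shap.TreeExplainer(clf)          # suited to tree-based models
shap_values = explainer.shap_values(X_test)  # per-feature contributions

# A global view of which features drive predictions across the book.
shap.summary_plot(shap_values, X_test)
```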
Bias in the data/data ethics
Data models, particularly ML and AI models, can inadvertently learn and perpetuate biases present in the training data. This can lead to unfair or discriminatory outcomes, raising serious ethical concerns. It is crucial to evaluate each model carefully to detect potential biases and to ensure that the data used for training is representative and free from bias (insofar as this is possible).
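A very simple check is sketched below: comparing model outcome rates across a protected characteristic and applying the common "four-fifths" rule of thumb. The data and the 0.8 threshold are illustrative.

```python
# A minimal disparate-impact check; the data and the 0.8 threshold
# (the "four-fifths" rule of thumb) are illustrative.
import pandas as pd

results = pd.DataFrame({
    "group":    ["A", "A", "B", "B", "B", "A"],
    "approved": [1,   0,   1,   1,   0,   1],
})

rates = results.groupby("group")["approved"].mean()
print(rates)

disparate_impact = rates.min() / rates.max()
if disparate_impact < 0.8:
    print(f"Potential bias: disparate impact ratio {disparate_impact:.2f}")
```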
Constant evolution
Data models are trained on historic data patterns. Where the nature of a given threat is constantly and rapidly changing, however, for example in the context of cybersecurity risk, models may be perpetually trying to keep pace with the emergence of new data. This makes it difficult to calibrate models accurately enough for them to maintain predictive capability.
Cyber risk and data leakage
ML systems can be vulnerable to cyber threats, including data breaches and adversarial attacks. In addition, when using open-source ML models, there’s a risk that sensitive data used for training could be leaked or misused. This is a major concern in fields like healthcare and finance, where data privacy is paramount. Ensuring the security of these systems is a significant challenge.
Model governance
When ML models are put into production, the management and oversight of these models throughout their lifecycle must be carefully implemented. This includes aspects like model validation, monitoring and maintenance. Poor model governance can lead to models which perform poorly or behave unpredictably. For instance, the distribution of data that a model receives can change over time, a phenomenon known as data drift. If not accounted for, data drift can lead to a deterioration in model performance. Thus, having early warning systems in place to detect such behaviour is essential.
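As a minimal illustration of such an early warning check, the sketch below uses a two-sample Kolmogorov-Smirnov test to compare a feature's training distribution with recent production data; the synthetic data and significance threshold are illustrative.

```python
# A minimal data-drift check using a two-sample KS test; the synthetic
# data and the 0.01 threshold are illustrative choices.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
train_ages = rng.normal(45, 12, 10_000)   # stand-in for training data
recent_ages = rng.normal(49, 12, 2_000)   # stand-in for recent live data

stat, p_value = ks_2samp(train_ages, recent_ages)
if p_value < 0.01:
    print(f"Possible data drift detected (KS statistic {stat:.3f})")
```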
Conclusion
In conclusion, whilst data science techniques offer immense potential, many challenges must be addressed to ensure their successful implementation and performance over time. This requires a multidisciplinary approach, combining expertise in data science, cybersecurity, ethics and domain-specific knowledge.