
The potential uses of data science in risk management

11 December 2023

Introduction

Over the past several years, developments in data science and computational capabilities have combined to enhance the toolkit available to risk managers. In particular, machine learning (ML) and artificial intelligence (AI) may ultimately have a profound impact on the insurance value chain by providing innovative solutions and enhancing efficiency and predictive capability. In this short note we explore some of the potential uses of data science techniques in the risk management of (re)insurance companies, together with some of the challenges associated with them.

Some potential use cases

Underwriting

Data science has already become well-established in the field of underwriting, allowing firms to more accurately assess and price risks, to determine key drivers which may be used as proxies for the underlying risk exposure, to understand policyholder behaviour, and to set assumptions about likely future experience. This has significant benefits from a risk management perspective, allowing firms to manage underwriting risk exposures relative to appetite and to ensure suitable and sustainable pricing of risk. Data enrichment, for instance via external data sources or web scraping combined with machine learning techniques such as deep learning, has the potential to yield a competitive advantage in risk selection and pricing.
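
To make the idea of key drivers used as proxies for risk concrete, the sketch below applies a simple multiplicative rating structure to a base premium. This is a minimal illustration only: the factor names and values are invented, not any firm's actual rating model.

```python
# Minimal sketch of a multiplicative rating structure: a base premium
# is adjusted by factors acting as proxies for the underlying risk.
# All values below are invented for illustration only.
BASE_PREMIUM = 500.0
RATING_FACTORS = {
    "smoker": {True: 1.6, False: 1.0},
    "age_band": {"18-35": 0.9, "36-55": 1.0, "56+": 1.3},
}

def annual_premium(smoker: bool, age_band: str) -> float:
    """Base premium times each applicable rating factor."""
    return (BASE_PREMIUM
            * RATING_FACTORS["smoker"][smoker]
            * RATING_FACTORS["age_band"][age_band])

print(annual_premium(smoker=True, age_band="56+"))  # highest-risk cell
```

In practice the factor values would themselves be estimated from data, for instance via a generalised linear model, with enriched external data feeding additional factors.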

Experience analysis

Traditional methodologies for the measurement of experience (for example, lapse investigations) are already founded on basic data science techniques. However, more advanced techniques, like machine learning classification methods or deep learning, can allow firms to make use of both structured and unstructured data and to explore the interconnectedness between different factors which might, collectively, influence the behaviours of policyholders. This can be especially useful in understanding dynamic behaviour, i.e., variations in expected behaviour over time, owing to the influence of other factors, which can significantly affect the overall potential variability in experience.
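
As a purely illustrative sketch of exploring interacting factors, the snippet below computes observed lapse rates across the interaction of two invented factors, age band and distribution channel; in practice an ML classifier would handle many more features, and the records here are made up for the example.

```python
from collections import defaultdict

# Toy experience study: observed lapse rates split by the interaction
# of two factors (age band x distribution channel). All records are
# invented for illustration.
policies = [
    # (age_band, channel, lapsed)
    ("young", "broker", 1), ("young", "broker", 1), ("young", "broker", 0),
    ("young", "direct", 0), ("young", "direct", 1),
    ("old", "broker", 0), ("old", "broker", 0),
    ("old", "direct", 0), ("old", "direct", 0), ("old", "direct", 1),
]

def lapse_rates(records):
    """Observed lapse rate per (age_band, channel) cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [lapses, exposures]
    for age_band, channel, lapsed in records:
        counts[(age_band, channel)][0] += lapsed
        counts[(age_band, channel)][1] += 1
    return {cell: lapses / n for cell, (lapses, n) in counts.items()}

for cell, rate in sorted(lapse_rates(policies).items()):
    print(cell, round(rate, 2))
```

Even in this toy case the cells differ markedly, which is the kind of interconnectedness that more advanced classification methods can surface automatically across many factors at once.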

Operational risk

Data science techniques can facilitate the detection of data outliers and inconsistencies, which can often act as lead indicators of operational risk exposure. This can be especially true in the case of fraud detection, where comparison of individual actions (or inactions) relative to a broad and deep data set can reveal otherwise hidden patterns. Of course, this also holds more generally. For example, analysis of blips in productivity across the operations team may identify mounting instability within the company’s administrative system in advance of a major service interruption, allowing the firm time to pre-emptively address the underlying issues. When using machine learning techniques for these purposes, however, special attention must be given to the explainability of model outcomes in order to ensure that the intended users of this information understand and accept it and therefore utilise it as part of their risk management activities.
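
A very simple form of such outlier detection can be sketched with z-scores: flag observations that sit unusually far from the rest of the sample. The transaction counts below are hypothetical, with a deliberate blip appended; real implementations would typically use more robust statistics or dedicated anomaly-detection models.

```python
import statistics

# Hypothetical daily transaction counts for an operations team; the
# final value is an artificial blip inserted for illustration.
daily_counts = [102, 98, 105, 101, 99, 103, 97, 100, 104, 55]

def flag_outliers(values, z_threshold=2.5):
    """Flag values whose z-score against the full sample exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / sd > z_threshold]

print(flag_outliers(daily_counts))  # the blip stands out
```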

Horizon scanning

More generally, data science techniques can act as a very effective means of horizon scanning to detect potential emerging risks, attempting to identify patterns in the data which may indicate that either internal or external environmental factors are likely to impact upon the firm’s risk profile. Predictive models can be used to generate lead indicators which warn the firm—sufficiently far in advance such that appropriate action can be taken—of potentially adverse conditions. This can include, for example, measurement of the firm’s current risk profile relative to risk appetite or spotting anti-selective behaviour on the part of policyholders. This may be best illustrated using an example. Consider a firm with a stated appetite for lapse risk which implies that it is willing to bear a loss of €X million over a 12-month time horizon. The firm has a predictive model, and it is currently indicating that lapse risk is on the rise. Economic conditions are such that the predictive model expects that lapse risk will continue to increase and eventually far exceed the firm’s risk appetite. This outcome would represent a lead indicator suggesting that the firm needs to take appropriate action now in order to remain within its risk appetite, e.g., implement a loyalty scheme to improve policyholder retention.
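The lapse example can be sketched numerically: fit a linear trend to recent loss estimates and project forward until the risk-appetite limit would be breached. The monthly figures and the EUR 10 million limit below are invented, and a real predictive model would be far richer than a straight-line extrapolation.

```python
# Lead-indicator sketch: project a linear trend in monthly lapse-loss
# estimates (EUR millions, invented figures) against an appetite limit.

def linear_trend(ys):
    """Least-squares slope and intercept for equally spaced observations."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    slope = num / den
    return slope, y_mean - slope * x_mean

def months_to_breach(losses, limit):
    """Months until the projected loss exceeds the limit (None if not rising)."""
    slope, intercept = linear_trend(losses)
    if slope <= 0:
        return None
    n = len(losses)
    m = 0
    while intercept + slope * (n - 1 + m) <= limit:
        m += 1
    return m

recent_losses = [4.0, 4.6, 5.1, 5.9, 6.4]  # last five monthly estimates
print(months_to_breach(recent_losses, limit=10.0))
```

A result of a handful of months would be the cue to act now, e.g., by launching the loyalty scheme, rather than waiting for the appetite to actually be breached.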

Other common examples, already in place at many firms, include real-time solvency and earnings monitors. Again, the data used here may be a mix of structured and unstructured data and may draw on both the firm's own datasets and external sources.

Generative AI is set to prove particularly useful in building scenarios to assist firms with horizon scanning. ‘What-if’ analyses, which may be conducted as part of the annual Own Risk and Solvency Assessment (ORSA) process, can help to uncover and better understand risk exposures, making generative AI a very helpful tool for communication with key stakeholders.

Data visualisation

Effective, clear and concise communication to relevant stakeholders is a common challenge for risk managers. The volume and complexity of information is often such that it is difficult to distil the key messages and to bring these to life. Data visualisation techniques facilitate the translation of information into charts and graphs, which can be either static or interactive, making information much easier to comprehend and communicate.
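
At its simplest, the translation of numbers into a visual form can be sketched in a few lines; the text-based bar chart below uses invented exposure figures, whereas real reporting would use a plotting or dashboard library with static or interactive charts.

```python
# Tiny illustration of turning figures into a visual form: a text
# bar chart of risk exposures by category (figures invented).
exposures = {"Lapse": 6, "Mortality": 3, "Operational": 9}

def bar_chart(values, width=20):
    """Render each value as a bar scaled to the largest entry."""
    peak = max(values.values())
    lines = []
    for label, v in values.items():
        bar = "#" * round(width * v / peak)
        lines.append(f"{label:<12}{bar} {v}")
    return "\n".join(lines)

print(bar_chart(exposures))
```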

Challenges

Naturally, implementing data science techniques, in particular ML techniques, is not without its challenges. Even with sufficient resources and technical expertise, firms will need to consider:

Quantity and quality of data

ML models are data hungry, requiring vast amounts of data for training. The more data available for training, the better the ML model will be able to perform. However, the quality of this data is equally crucial. Inaccurate, incomplete or inconsistent data can lead to poorly performing models. Therefore, data cleaning and preprocessing are essential steps.
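
A minimal sketch of such a cleaning step is shown below: records with a missing value are imputed with the median, and a plainly impossible value is dropped. The field names and records are invented for the example; production pipelines would apply far more extensive validation.

```python
# Illustrative cleaning/preprocessing step on invented policy records:
# impute missing sums assured with the median and drop impossible ages.
raw = [
    {"age": 34, "sum_assured": 100_000},
    {"age": 41, "sum_assured": None},     # missing value -> impute
    {"age": 250, "sum_assured": 80_000},  # impossible age -> drop
    {"age": 29, "sum_assured": 120_000},
]

def clean(records, max_age=120):
    """Drop records with impossible ages; impute missing sums assured."""
    known = sorted(r["sum_assured"] for r in records
                   if r["sum_assured"] is not None)
    median = known[len(known) // 2]
    out = []
    for r in records:
        if r["age"] > max_age:
            continue
        value = r["sum_assured"] if r["sum_assured"] is not None else median
        out.append({"age": r["age"], "sum_assured": value})
    return out

print(clean(raw))
```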

Model calibration

Data models often require fine-tuning of their parameters to improve their predictive accuracy. This trial-and-error process, known as model calibration, can be complex and time-consuming, especially for models with a large number of parameters.
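
One common way to structure this trial-and-error process is a grid search: score every combination of candidate parameter values and keep the best. The sketch below uses a hypothetical, hand-written error surface in place of a real validation score, which in practice would come from cross-validation on held-out data.

```python
import itertools

# Toy grid search over two hyperparameters. validation_error is a
# hypothetical stand-in for a cross-validated model score.
def validation_error(learning_rate, depth):
    return (learning_rate - 0.1) ** 2 + 0.01 * abs(depth - 4)

grid = {"learning_rate": [0.01, 0.1, 0.5], "depth": [2, 4, 8]}

def grid_search(score, grid):
    """Return the grid point with the lowest score, and that score."""
    best_params, best_err = None, float("inf")
    for combo in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        err = score(**params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err

print(grid_search(validation_error, grid))
```

The cost grows multiplicatively with each added parameter, which is exactly why calibration becomes time-consuming for models with many parameters and motivates smarter search strategies (random or Bayesian search).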

Black box models

Many models, particularly deep learning models, are often seen as ‘black boxes’ due to their complex inner workings. This lack of interpretability can make it challenging to explain the model’s decisions to stakeholders, which is particularly problematic in sectors where explainability is crucial, such as insurance. To address this issue, several post-hoc methods have been developed to help explain a model’s outcome and improve user adoption.
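
Permutation importance is one such post-hoc method: shuffle one feature at a time and measure how much the model's accuracy drops. The sketch below uses a hand-written stand-in for the black box and invented data; for real models a library implementation would normally be used.

```python
import random

# Permutation importance, a simple post-hoc explainability method:
# shuffle one feature at a time and measure the drop in accuracy.
# The "black box" below is a hand-written stand-in that, in truth,
# only uses feature 0.
def black_box(row):
    return 1 if row[0] > 0.5 else 0

data = [([0.9, 0.2], 1), ([0.8, 0.9], 1), ([0.1, 0.8], 0), ([0.2, 0.1], 0)]

def accuracy(rows):
    return sum(black_box(x) == y for x, y in rows) / len(rows)

def permutation_importance(rows, feature, seed=0):
    """Accuracy drop when the given feature column is shuffled."""
    rng = random.Random(seed)
    column = [x[feature] for x, _ in rows]
    rng.shuffle(column)
    shuffled = [(x[:feature] + [v] + x[feature + 1:], y)
                for (x, y), v in zip(rows, column)]
    return accuracy(rows) - accuracy(shuffled)

print(permutation_importance(data, 0), permutation_importance(data, 1))
```

Shuffling the unused feature leaves accuracy unchanged, correctly revealing that the model ignores it, which is the kind of evidence that helps intended users accept a model's outputs.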

Bias in the data/data ethics

Data models, particularly ML and AI models, can inadvertently learn and perpetuate biases present in the training data.

This can lead to unfair or discriminatory outcomes, raising serious ethical concerns. It is crucial to evaluate each model carefully to detect potential biases and to ensure that the data used for training is representative and free from bias (insofar as this is possible).
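
One basic check along these lines is demographic parity: compare the model's approval rate across groups and flag a large gap. The decisions below are invented, and parity is only one of several (sometimes mutually incompatible) fairness criteria.

```python
from collections import defaultdict

# Simple demographic-parity check on invented model decisions:
# (group, approved) pairs.
decisions = [("A", 1), ("A", 1), ("A", 0), ("B", 1), ("B", 0), ("B", 0)]

def approval_rates(records):
    """Approval rate per group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [approvals, count]
    for group, approved in records:
        totals[group][0] += approved
        totals[group][1] += 1
    return {g: a / n for g, (a, n) in totals.items()}

def parity_gap(records):
    """Largest difference in approval rates between any two groups."""
    rates = approval_rates(records)
    return max(rates.values()) - min(rates.values())

print(parity_gap(decisions))
```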

Constant evolution

Data models are trained on historical data patterns. Where the nature of a given threat is constantly and rapidly changing, however (for example, in the context of cybersecurity risk), models may be perpetually trying to keep pace with the emergence of new data. This makes it difficult to calibrate models accurately enough for them to maintain any predictive capability.

Cyber risk and data leakage

ML systems can be vulnerable to cyber threats, including data breaches and adversarial attacks. In addition, when using open-source ML models, there’s a risk that sensitive data used for training could be leaked or misused. This is a major concern in fields like healthcare and finance, where data privacy is paramount. Ensuring the security of these systems is a significant challenge.

Model governance

When ML models are put into production, the management and oversight of these models throughout their lifecycle must be carefully implemented. This includes aspects like model validation, monitoring and maintenance. Poor model governance can lead to models which perform poorly or behave unpredictably. For instance, the distribution of data that a model receives can change over time, a phenomenon known as data drift. If not accounted for, data drift can lead to a deterioration in model performance. Thus, having early warning systems in place to detect such behaviour is essential.
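
An early warning system for drift can be as simple as comparing incoming data against the training baseline. The sketch below alerts when the mean of a feature shifts by more than a chosen number of baseline standard deviations; the ages, window sizes and threshold are all illustrative, and production monitoring would track many features with more robust statistical tests.

```python
import statistics

# Simple data-drift monitor: alert when the mean of incoming data
# shifts from the training baseline by more than n_sd baseline
# standard deviations. All figures are illustrative.
def drift_alert(baseline, incoming, n_sd=2.0):
    mean = statistics.mean(baseline)
    sd = statistics.stdev(baseline)
    return abs(statistics.mean(incoming) - mean) > n_sd * sd

training_ages = [30, 35, 40, 38, 33, 36, 41, 37]
print(drift_alert(training_ages, [34, 39, 36]))  # stable portfolio
print(drift_alert(training_ages, [60, 65, 58]))  # portfolio has shifted
```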

Conclusion

In conclusion, whilst data science techniques offer immense potential, many challenges must be addressed to ensure their successful implementation and performance over time. This requires a multidisciplinary approach, combining expertise in data science, cybersecurity, ethics and domain-specific knowledge.


Raymond van Es

Amsterdam Insurance and Financial Risk | Tel: 31 6 1133 4000
