Classification of insurance claim emails
A hybrid machine learning case study
Email is still arguably the most popular means of official electronic communication. It offers full flexibility in conveying a message, which is paramount when the communication cannot be framed a priori. Therefore, even structured web forms very often contain open fields in which customers can express their message freely.
Such flexibility for the customer needs to be matched by the providers. Indeed, it is necessary to establish efficient processes to handle arbitrary communication.
Bot functionality is widely used for customer support in retail sales but not yet for purposes like insurance claims. In most cases, the flexibility offered to insurers’ clients is paid for with human workload on the provider side: claim handlers have to dig through piles of customer emails on a daily basis.
In most cases, insurance claim communication is simple and boils down to sending invoices as attachments, together with the accompanying salutation and valediction. Very often customers also ask for confirmation that their documents have been received. In such cases, a response from a claim handler is not needed because a confirmation email can be sent automatically once the invoices are processed. However, if a client asks an important question, it should be answered by a claim handler reasonably promptly. This is a matter of quality of service, and failing to do so may create a reputational risk. In a broader sense, it is part of the user experience (UX), which is currently one of the core focus points of many Insurtech-driven entities.
To facilitate the work of claim handlers, an automated filtering system can label the emails that require a human response and those that do not.
In a research study that we performed, a health insurance client faced significant redundancy: nearly 45% of incoming emails were passed to the claim handlers, whereas the share that truly required a response was expected to be around 14%. Therefore, the goal for this research was set to:
Develop a machine learning model that is capable of aiding in straight-through processing and filtering of claim emails without losses in customer satisfaction or increases in the workload of the hospitalization office.
Thus, the obvious criterion was not to deliver more work to the claim handlers than they already had at that time, but the real challenge was not to miss the important emails.
We analyzed nearly 10,000 emails received in several separate periods. Because the number of emails was relatively small compared with the number of distinct terms or words, we had to reduce the dimensionality of the terms while controlling the loss of information.
The email messages followed a three-step preparation process:
- Tokenization
- Stop-word and personal data removal
- Stemming
Tokenization splits the content into single elements, like words. Stop-words (e.g., the, a, for, under), as well as personal data, need to be removed in order to focus on the actual claim content. Stemming strips prefixes and suffixes from a word to leave its root or stem, e.g., walking and walked are both recorded as walk. This is done to reduce the number of unique tokens in the data.
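The three preparation steps can be sketched in a few lines of Python. This is only an illustrative sketch and not the pipeline used in the study: the stop-word list, the suffix list, and the rule that treats any token containing "@" or a digit as personal data are all assumptions made for the example. A production system would more likely rely on an established stemmer (such as Porter or Snowball) and a dedicated personal data detection step.

```python
# Illustrative assumptions: a tiny stop-word list and a crude suffix stripper.
STOP_WORDS = {"the", "a", "an", "for", "under", "and", "by", "of", "to", "in"}
SUFFIXES = ("ing", "ed", "es", "s")
PUNCTUATION = ".,;:!?()[]\"'"


def tokenize(text):
    """Split lowercased text on whitespace and trim surrounding punctuation."""
    tokens = (raw.strip(PUNCTUATION) for raw in text.lower().split())
    return [t for t in tokens if t]


def is_personal_data(token):
    """Hypothetical rule: email addresses and anything containing digits."""
    return "@" in token or any(ch.isdigit() for ch in token)


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop-words and personal data, then stem what is left."""
    kept = [
        t for t in tokenize(text)
        if t not in STOP_WORDS and not is_personal_data(t)
    ]
    return [stem(t) for t in kept]


tokens = preprocess("Walking and walked")
```

With this sketch, both "Walking" and "walked" reduce to the same token, which is the effect that shrinks the vocabulary.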
The body of an email also provided additional features, such as indicators for internal or confidential communication, the presence of a question mark, and the overall email length. In addition, two features were extracted from the subject line, i.e., forward and reply indicators. Furthermore, we also considered the number of attachments, extracted from the attachment line in the header of an email.
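As a rough illustration, these indicators could be derived as follows. The details are assumptions made for the example rather than the rules used in the study: the subject prefixes ("RE:", "FW:", "FWD:"), the keyword match for internal or confidential communication, and the semicolon-separated attachment line are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class EmailFeatures:
    is_forward: bool
    is_reply: bool
    has_question_mark: bool
    is_internal_or_confidential: bool
    body_length: int
    n_attachments: int


def extract_features(subject, body, attachment_line):
    """Derive simple indicator features from one email.

    Assumes (hypothetically) that attachment names are separated by ';'.
    """
    subject_lower = subject.strip().lower()
    attachments = [name for name in attachment_line.split(";") if name.strip()]
    return EmailFeatures(
        is_forward=subject_lower.startswith(("fw:", "fwd:")),
        is_reply=subject_lower.startswith("re:"),
        has_question_mark="?" in body,
        is_internal_or_confidential=bool(
            re.search(r"\b(internal|confidential)\b", body, re.IGNORECASE)
        ),
        body_length=len(body),
        n_attachments=len(attachments),
    )


features = extract_features(
    subject="RE: Invoice March",
    body="Could you confirm receipt?",
    attachment_line="invoice1.pdf; invoice2.pdf",
)
```

These features would sit alongside the text-derived variables as inputs to the classifier.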
We used Latent Dirichlet Allocation (LDA) as a data reduction technique. LDA, as opposed to other dimension reduction techniques, produces for each email a probability distribution over topics rather than a “hard” classification into a single topic. This allowed us to reduce the dimensionality without much loss of information. The technique groups terms and tokens into topics, which greatly reduces the number of variables for the classification task. However, the number of topics influences how the topics can be interpreted. Increasing the number of topics results in more information retention at the cost of independence between these topics. Therefore, different classification models were trained to determine the best-performing model given a specific pretrained LDA model.
We investigated several general classification models, not only to select the best-performing one but also to understand the applicability of these models in our particular use case of email filtering:
- Support-vector machine (SVM)
- K-nearest neighbor (KNN)
- eXtreme Gradient Boosting (XGBoost)
- Multilayer perceptron (MLP)
  - With one hidden layer
  - With two hidden layers
Due to the different computational properties of these models, they benefit either from greater independence between the variables or from greater information retention. The parameters of the LDA needed to be adjusted to reflect these properties and thus ensure the optimal performance of the classification models.
To achieve that, we considered a hybrid model setup: the LDA and a classification model worked in sequence as a single machine learning model, as shown in the diagram in Figure 1.
Figure 1: Single machine learning model
Hybrid model setup
For each of the classification models, the hyperparameters of the model in question and the number of topics in the LDA model were considered as tunable parameters. As such, the hyperparameter space always contained the number of topics.
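One way to picture this hybrid setup is as a single pipeline in which the LDA step feeds topic probabilities to the classifier, so that the number of topics is tuned like any other hyperparameter. The sketch below uses scikit-learn and a KNN classifier purely for illustration; the library, the toy emails, the labels, and the parameter values are assumptions and are not taken from the study.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline


def build_hybrid_model(n_topics, n_neighbors=3):
    """Chain term counts -> LDA topic probabilities -> classifier."""
    return Pipeline([
        ("counts", CountVectorizer()),
        ("lda", LatentDirichletAllocation(n_components=n_topics, random_state=0)),
        ("clf", KNeighborsClassifier(n_neighbors=n_neighbors)),
    ])


# Made-up, already preprocessed emails; 1 = needs a claim handler's response.
emails = [
    "pleas find invoic attach",
    "invoic attach confirm receipt",
    "attach invoic hospit stay",
    "send invoic attach thank",
    "question coverag polici reimburs",
    "why claim reject explain",
    "when reimburs paid question",
    "coverag question surgeri abroad",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

model = build_hybrid_model(n_topics=2)

# The number of topics is set together with the classifier's own parameters,
# which is what lets one optimizer search over both at once.
model.set_params(lda__n_components=3, clf__n_neighbors=1)
model.fit(emails, labels)

# Topic probabilities per email (rows sum to 1) are the classifier's inputs.
topic_probs = model[:-1].transform(emails)
scores = model.predict_proba(["question reimburs invoic"])[:, 1]
```

Because the whole chain is one estimator, any hyperparameter search that accepts the pipeline can vary `lda__n_components` alongside the classifier settings.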
We optimized the hyperparameters using Bayesian optimization, which uses the results of earlier evaluations to propose the set of hyperparameters that is expected to be most likely to improve the score of a model. This method greatly reduced the computation time in comparison to uninformed search methods such as random or grid search. However, the stop condition for Bayesian optimization depends on the number of iterations, which needs to be set beforehand.
The optimal models were determined by evaluating a custom performance measure based on a two-component metric consisting of a quality and a quantity score:
Both components of the scoring metric also had to satisfy their boundary conditions. These metrics hold only for a strict binary classification. Because the considered classification models output continuous predictions, a threshold needed to be chosen above which emails were classified as requiring the claim handler’s response. This threshold was chosen to minimize the quality scoring metric while satisfying the quantity score boundary.
The final scoring metric used the area under the curve (AUC) to combine both quantity and quality scores for all of the feasible threshold levels, yielding the results shown in Figure 2.
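The threshold choice can be sketched as a small constrained search. The exact definitions of the two scores are not reproduced here, so the sketch assumes that the quantity score is the share of all emails forwarded to claim handlers and that the quality score is the share of all emails that need a response but are not forwarded. Both definitions, the function names, and the toy data are assumptions for illustration only.

```python
def quantity_score(scores, threshold):
    """Assumed definition: share of emails flagged for a claim handler."""
    return sum(s >= threshold for s in scores) / len(scores)


def quality_score(scores, labels, threshold):
    """Assumed definition: share of emails needing a response that are missed."""
    missed = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    return missed / len(scores)


def choose_threshold(scores, labels, max_quantity):
    """Return (threshold, quality, quantity) with the lowest quality score
    among thresholds whose quantity score stays within max_quantity.

    Returns None if no candidate threshold satisfies the quantity bound.
    """
    best = None
    for threshold in sorted(set(scores)):
        quantity = quantity_score(scores, threshold)
        if quantity > max_quantity:
            continue  # would send too many emails to the claim handlers
        quality = quality_score(scores, labels, threshold)
        if best is None or quality < best[1]:
            best = (threshold, quality, quantity)
    return best


# Toy predictions and labels (1 = email requires a response).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05, 0.6, 0.15]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

# Bound the forwarded share by the 45% workload observed before the project.
result = choose_threshold(scores, labels, max_quantity=0.45)
```

Raising the threshold lowers the quantity score but can only increase the number of missed emails, so under these assumptions the search settles on the lowest threshold that respects the quantity bound.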
Figure 2: Final scoring metric
CLASSIFIER | AUC | QUANTITY | QUALITY |
---|---|---|---|
XGBOOST | 0.918 | 16.4% | 0.9% |
MLP 1 HIDDEN LAYER | 0.908 | 11.6% | 1.5% |
MLP 2 HIDDEN LAYERS | 0.901 | 11.2% | 2.3% |
SVM | 0.897 | 11.8% | 1.9% |
KNN | 0.695 | 16.8% | 3.0% |
The optimal classifier is the one with the highest AUC, which is inversely related to both the quantity and the quality scores. Although the combination of LDA with XGBoost has the highest AUC, this model setup also results in the second-highest, and thus second-worst, quantity score. This is caused by the choice of threshold. Because the quality score of this model setup is the best, and it still reduces the current workload by about 28 percentage points (from nearly 45% of incoming emails to 16.4%), it is considered the best-performing model.
Given the dynamic nature of email communication, the production process could greatly benefit from continuous monitoring, training, and testing of the proposed model. Indeed, learning from the past classification process and incorporating customers’ and claim handlers’ feedback could improve model quality far beyond the initial performance assessed on the limited email history.
To conclude, we found that implementing the proposed hybrid model for claim email filtering could significantly reduce the customer support workload while maintaining a satisfactory response quality level. Indeed, reducing the workload of the hospitalization office would free up time for answering the relevant questions and in this way increase customer satisfaction.