Three data and analytics pitfalls that risk managers of self-insured organizations should avoid

10 February 2022

Data and analytics play a role in most decisions we make these days. On a personal level, you use data when you research a big purchase online or select players for your fantasy football draft. From a risk management perspective, data and analytics can be used to evaluate potential acquisitions, identify safety and loss control improvements that lower workers’ compensation costs, or determine how much excess liability insurance to buy.

Data can help highlight trends, but sometimes the data source and format of the results can be deceptive and lead to inappropriate conclusions. It is important for you to understand the following three items when using data and analytics in your decision-making.

Understand the source of the data

It is important to first understand the source of the data underlying an analysis. Is it from reputable and unbiased origins? In a casualty actuarial report, the data sources commonly used are loss runs (e.g., paid and incurred loss information), exposure information (e.g., payroll, revenue, auto units), and insurance policy information (e.g., policy years and retentions).

For workers’ compensation, for example, loss data typically comes from the third-party administrator (TPA) or the carrier, and your actuary relies on this data with the understanding that it has not been manipulated. The payroll data used is either audited by your carrier or is your own estimate. The more comfortable you are with the data, the more comfortable you can be with the results, which helps you avoid the “garbage in, garbage out” problem.

When you are researching the reviews online for the lawnmower that you’ve been thinking about buying, ask yourself: where do those five-star reviews come from? Were consumers incentivized to leave a positive review? Are the reviews from actual consumers or from bots? Considering the source of the data underlying the results will help you gauge your reliance on it when making decisions for your business.

Evaluate the preparer and publisher of the analysis

Now that you’ve identified the source of the data and feel confident that it hasn’t been manipulated, the next step is to evaluate the source of the analysis. When your actuary prepares an analysis, they should be providing an independent view of your exposure to risk. There should be no outside influences, and the result you receive shouldn’t be intended to mislead you in your decision-making for reserving or budgeting.

This may not always be the case. While researching lawnmowers, you find a report online that says John Deere is the best one on the market. If that report were issued by John Deere, it would have a very different intention from a report issued by Consumer Reports. Understanding the preparer and publisher of the analysis can help you identify any inherent biases of the results. Author Luis Alberto Urrea says in his book, The Devil’s Highway: A True Story, “Numbers never lie, after all: they simply tell different stories depending on the math of the tellers.”

Review the format of the results

While numbers don’t lie, they can be formatted in different ways to tell different stories. When reviewing the results of an analysis, pay close attention to tables and graphs.

1. Have tables or graphs been manipulated to tell a particular story?

Figure 1: Total cost of risk, by percentiles, x-axis increments of 0.05

Figure 2: Total cost of risk, by percentiles, inconsistent x-axis increments

Figures 1 and 2 use the same underlying data, comparing the total cost of risk for three insurance program options; the only difference between the two graphs is the scaling of the x-axis. In Figure 1, the x-axis increments are spaced evenly at every 5th percentile, and there aren’t any kinks in the lines for Options 1 and 2. Instead, this graph indicates a steady increase for both options, with Option 1 better than Option 2 until approximately the 55th percentile, after which the relationship reverses.

In Figure 2, the x-axis increments are inconsistent, causing kinks in the Option 1 and Option 2 lines. Options 1 and 2 also appear to flatten out at the higher percentiles; however, this is an illusion created by the narrower 0.01 spacing from the 95th to the 99th percentile. In addition, there appears to be more overlap between Options 1 and 2, which could leave a reader indifferent between the two options.
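The flattening illusion can be reproduced with a few numbers. The sketch below is only an illustration: the percentiles and cost values are hypothetical and are not taken from the figures. It compares the change between adjacent plotted points (what the eye sees when every tick is drawn the same distance apart) with the change per percentile point (what the data actually does).

```python
# Hypothetical total cost of risk ($ thousands) at selected percentiles.
# These values are invented for illustration only.
percentiles = [0.85, 0.90, 0.95, 0.96, 0.97, 0.98, 0.99]
cost = [1000, 1100, 1250, 1290, 1340, 1410, 1530]


def change_per_tick(values):
    """Change between adjacent plotted points when every tick is drawn
    the same distance apart, regardless of the true percentile gap."""
    return [values[i] - values[i - 1] for i in range(1, len(values))]


def change_per_percentile_point(xs, values):
    """Change per one percentile point, using the true gap on the x-axis."""
    changes = []
    for i in range(1, len(xs)):
        gap_in_points = (xs[i] - xs[i - 1]) * 100
        changes.append((values[i] - values[i - 1]) / gap_in_points)
    return changes


per_tick = change_per_tick(cost)
per_point = change_per_percentile_point(percentiles, cost)

print("Per plotted tick:    ", per_tick)
print("Per percentile point:", [round(c) for c in per_point])
```

In this made-up data, the per-tick change drops sharply once the spacing narrows to 0.01, which is what makes a line look as if it flattens, while the change per percentile point keeps rising.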

Figure 3: Total cost of risk, by percentiles, truncated by adjusted y-axis

Figure 3 uses the same x-axis as Figure 1, but the y-axis has been adjusted to a minimum and maximum that don’t contain all the data points. Adjusting the y-axis can help zoom in on the picture, now demonstrating that Option 2 has a slight advantage over Options 1 and 3 at the 60th percentile, but a lot of information is lost outside the bounds of the y-axis. This may seem like an extreme example, but paying attention to the scales of the x- and y-axes of a graph can help you recognize when you aren’t receiving the whole story.

2. How is the data presented: raw or normalized?

Data can be presented as raw or normalized. Examples of raw data include claim counts, paid losses, and incurred losses. Normalized data, on the other hand, is a relative measure: the raw data has been adjusted to reflect differences in exposure, risk profile, or external factors such as inflation. An example of a normalized metric used in workers’ compensation is frequency, or the claim count per $1 million of payroll. Dividing by payroll normalizes the claim counts.

Figure 4: Claim count and ultimate losses, by policy year

Figure 4 contains the claim counts and ultimate losses for nine policy years. The claim counts appear to increase in most years. The ultimate losses follow a similar pattern. At first glance, these are not great results. Seeing a graph like this may cause you to scrutinize your loss control efforts or result in unnecessary action.

Figure 5: Severity and frequency, by policy year

Figure 5 presents the same information as Figure 4, but normalized. Severity, or average cost per claim, combines the ultimate losses and the claim counts from Figure 4 (ultimate losses divided by claim counts). The severity trend indicates an improvement, counter to the impression Figure 4 gives. The frequency in Figure 5 is mostly flat, indicating that additional safety efforts might not be needed. When payroll is growing, there is more exposure to loss, which often results in more claims and loss dollars. Normalizing claim counts by payroll gives a better indication of actual experience.

A third metric that isn’t shown in Figure 5 but is also commonly used in actuarial reports is loss rate, or ultimate losses per $100 of payroll. Normalizing ultimate losses by payroll might highlight a trend different from the one seen in raw ultimate losses.
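The three normalized metrics described above can be computed in a few lines. The sketch below uses hypothetical payroll, claim count, and ultimate loss figures (they are not the values behind Figures 4 and 5), chosen so that raw counts and losses rise each year while payroll grows at the same pace.

```python
# Hypothetical policy-year data, invented for illustration only.
# Each row: (policy year, payroll in $, claim count, ultimate losses in $)
policy_years = [
    ("20X1", 50_000_000, 100, 1_500_000),
    ("20X2", 60_000_000, 120, 1_740_000),
    ("20X3", 72_000_000, 144, 2_016_000),
]


def normalize(payroll, claims, ultimate):
    """Return (frequency, severity, loss rate) for one policy year."""
    frequency = claims / (payroll / 1_000_000)  # claims per $1 million of payroll
    severity = ultimate / claims                # average cost per claim
    loss_rate = ultimate / (payroll / 100)      # ultimate losses per $100 of payroll
    return frequency, severity, loss_rate


metrics = {year: normalize(p, c, u) for year, p, c, u in policy_years}

for year, (frequency, severity, loss_rate) in metrics.items():
    print(
        f"{year}: frequency={frequency:.2f}  "
        f"severity={severity:,.0f}  loss rate={loss_rate:.2f}"
    )
```

With these assumed inputs, claim counts and ultimate losses both increase every year, yet frequency stays flat and severity and loss rate decline, which is the kind of reversal that raw figures alone can hide.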

3. Are data point comparisons consistent?

Workers’ compensation losses can take years to develop and, when examining undeveloped losses, it’s easy to make the mistake of comparing inconsistent data points.

Figure 6: Incurred losses, currently valued, by policy year

Figure 6 appears to show a vast improvement in incurred loss experience between 20X7 and 20X9. Before celebrating this favorable outcome, it’s important to realize that this graph is shown as “currently valued,” meaning each policy year is evaluated at a different age. For example, 20X9 is 12 months from policy inception, 20X8 is 24 months old, 20X7 is 36 months old, and so on. A policy year may have only 50% to 70% of its ultimate workers’ compensation losses incurred at 12 months old.

Figure 7: Incurred losses, aged 12 months from inception, by policy year

In Figure 7, all policy years are shown aged 12 months from inception, or “green-to-green.” This demonstrates how comparing on a consistent basis can change the picture. Instead of the apparent improvement seen in Figure 6, Figure 7 illustrates stability in incurred losses between 20X6 and 20X9. Another observation may be that there has been a shift between 20X5 and 20X6, and identifying the cause of that shift may help direct your safety efforts.
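The difference between a “currently valued” view and a “green-to-green” view can be sketched with a small table of incurred losses by age. The amounts below are hypothetical and are not the values behind Figures 6 and 7; the function names are likewise only illustrative.

```python
# Hypothetical incurred losses ($ thousands) by policy year and age in months.
# Older years have more evaluations available than newer years.
incurred_by_age = {
    "20X6": {12: 600, 24: 850, 36: 950, 48: 1000},
    "20X7": {12: 610, 24: 860, 36: 960},
    "20X8": {12: 590, 24: 840},
    "20X9": {12: 605},
}


def currently_valued(triangle):
    """Latest available evaluation for each policy year.
    Each year is at a different age, so the values are not comparable."""
    return {year: ages[max(ages)] for year, ages in triangle.items()}


def at_age(triangle, age):
    """Every policy year at the same age, for a like-for-like comparison."""
    return {year: ages[age] for year, ages in triangle.items() if age in ages}


latest = currently_valued(incurred_by_age)
green_to_green = at_age(incurred_by_age, 12)

print("Currently valued:", latest)
print("Aged 12 months:  ", green_to_green)
```

In this invented example, the currently valued amounts fall steadily toward the most recent year, while the 12-month amounts are nearly level, so the apparent improvement comes from the age of the data rather than from the experience itself.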

4. What is the sample size of the study?

Paying attention to the sample size can help you avoid jumping to unwarranted conclusions. The “law of large numbers,” the idea that averages become more stable as the number of observations grows, is commonly mentioned when discussing sample size. When looking at the count of reviews on lawnmowers, would you have more confidence with five reviews or 500 reviews? Too small a sample may lead to increased volatility, which may make it hard to identify any underlying trends.

For example, for a company with an average of 10 workers’ compensation claims a year, each year could be vastly different. One year may have a $1 million loss, causing the total incurred losses in that year to be much higher than the surrounding years. One year may be fortunate to have all claims less than $25,000. Because of this increased volatility, it’s difficult to determine what losses will ultimately be for each year.

For a company with an average of 200 workers’ compensation claims a year, total losses will be more stable. Some of these claims will be large and some will be small, but the total losses will average out to a more consistent result year over year. A company with an average of 1,000 claims a year has even more consistency and may produce more credible results.

When an actuary provides a forecast, they rely on the history to predict the future. When the history is consistent, it’s easier to project the upcoming year’s losses with greater confidence. With increased volatility, due to low claim count or loss volumes, there can be a wide range around the average. The presence or absence of a large loss can completely change the result, increasing the uncertainty of the selection.
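A small simulation can illustrate how volatility shrinks as claim volume grows. This is a minimal sketch under simplifying assumptions that are not from the article: the claim count is held fixed each year, claim sizes follow a lognormal distribution with arbitrarily chosen parameters, and volatility is measured by the coefficient of variation (standard deviation divided by mean) of annual total losses.

```python
import random
import statistics


def simulate_annual_totals(claims_per_year, n_years, rng):
    """Simulate total losses for each year.
    Assumes a fixed claim count and lognormal claim sizes; both are
    simplifications chosen only for illustration."""
    return [
        sum(rng.lognormvariate(9.0, 1.2) for _ in range(claims_per_year))
        for _ in range(n_years)
    ]


def coefficient_of_variation(values):
    """Standard deviation relative to the mean: a unitless volatility measure."""
    return statistics.stdev(values) / statistics.mean(values)


rng = random.Random(2022)  # fixed seed so the run is repeatable

cv_by_size = {
    claims: coefficient_of_variation(simulate_annual_totals(claims, 300, rng))
    for claims in (10, 200, 1000)
}

for claims, cv in cv_by_size.items():
    print(f"{claims:>5} claims per year: coefficient of variation = {cv:.2f}")
```

Under these assumptions, the year-to-year swing in total losses relative to the average is largest for the 10-claim company and smallest for the 1,000-claim company, consistent with the range around a forecast widening as claim volume falls.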

In summary

If you don’t spend all your time in spreadsheets or crunching data, it can be easy to trip over one of these data and analytics hazards. Understanding the source of the underlying data, the party who prepared and published the results, and the format and content of the tables and graphs can prevent you from unintentionally misinterpreting your data.

Your actuary can help you decode your data and analytics so you are able to make an informed decision for your business. They can help organize data into a format that is user-friendly for all audiences. They might even be able to suggest which lawnmower to purchase!


About the Author(s)

Melissa Huenefeldt