Geocoding for insurance: How to ensure precision and accuracy in the geocoding workflow
Importance of geocoding in insurance
Geocoding is integral to assessing location-based risks for property insurers. By converting address data to specific locations, it allows insurers to identify, analyze, and price risk at more granular levels. Incorporating geocoding and the right quality controls into the insurance workflow provides a significant advantage. However, not all insurers include geocoding in their pricing and underwriting. And even those that do may fail to properly vet their geocoded results before binding a policy. In this article, the second in our series on geocoding for insurance, we discuss how to identify potential bad results throughout the geocoding workflow.
What is geocoding for property insurance?
Geocoding is a process that takes information like street addresses or place names and matches it to geographic latitude and longitude coordinates. For insurers, geocoding can be used to validate and standardize input addresses, as well as to convert property information to spatial data. With spatial data, insurers can more easily examine location-based risk factors. The geocoding workflow typically consists of three phases:
1. Reviewing and cleaning input addresses
2. Running addresses through a locator
3. Reviewing the output and re-geocoding as needed
For more information, see the first article in our series, Geocoding for insurance: An overview.
Understanding spatial uncertainty in the context of geocoding
Converting a street address to a single coordinate point inherently involves some level of spatial uncertainty because our homes, businesses, and equipment take up more physical space than a single point. Even with this limitation, geocoding still represents one of the best ways to estimate the location of an insured. It is important to recognize that, while geocoding is incredibly precise (it nearly always returns a specific coordinate point), it is not always accurate. Even the best possible point match cannot fully represent a building and, depending on the geocoder used, the result may more accurately reflect a delivery point or parcel centroid. In cases where no unique match is found, the resulting point could be interpolated between two known addresses, placed at a street corner, or even placed at a ZIP Code centroid. All of these results are returned as a precise latitude and longitude coordinate, but not all of them are equally accurate.
Figure 1: Precision vs. accuracy
Identifying geocoding errors and understanding their origins are critical for minimizing inaccuracies. The primary sources of uncertainty here include the quality and completeness of the input address data as well as the quality and recency of the reference data used in the geocoder. Various attributes of the geocoded result, known as a match, can serve as our first clue that there may be greater than acceptable levels of uncertainty in the estimated location of an insured property. As you will see, conducting quality checks at each step is vital for identifying potential inaccuracies.
Understanding the sources of error in geocoded results
Geocoding works by matching address elements (e.g., street number, street name, ZIP Code, city) to geocoder database records. Most geocoding errors occur when the application cannot match an input address in its entirety and therefore outputs a latitude/longitude point corresponding to the most geographically refined address element(s) it can match. Ultimately, the best results occur when the input address contains as many of the possible address elements as are available, ideally all of them.
Figure 2: Matching address elements
Let’s look at some examples given this hypothetical input address:
1234 Main Street
Columbus, Ohio 43205
USA
According to our geocoder, while there is no building that has this address, there is indeed a 1200 block on Main Street in Columbus. The geocoding application cannot match 1234 Main Street exactly, but it can estimate where this address would be if it did exist along the block.
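One common way a geocoder makes this kind of estimate is linear interpolation along the street segment. The sketch below is a minimal illustration, not any particular vendor's method. The address range (1200 to 1298) and the block's endpoint coordinates are invented for the example, and real geocoders typically also account for the side of the street and an offset from the centerline, which this sketch ignores.

```python
def interpolate_address(house_number, range_start, range_end, start_point, end_point):
    """Estimate (lat, lon) for a house number by linear interpolation along a block."""
    if range_end == range_start:
        return start_point
    # Fraction of the way along the block, clamped to the range [0, 1]
    fraction = (house_number - range_start) / (range_end - range_start)
    fraction = max(0.0, min(1.0, fraction))
    lat = start_point[0] + fraction * (end_point[0] - start_point[0])
    lon = start_point[1] + fraction * (end_point[1] - start_point[1])
    return (lat, lon)


# Hypothetical 1200 block of Main Street (coordinates invented for illustration)
block_start = (39.9570, -82.9700)
block_end = (39.9570, -82.9680)

estimate = interpolate_address(1234, 1200, 1298, block_start, block_end)
print(estimate)
```

The returned point is just as precise as a true address match, even though it only marks where the address would fall if house numbers were evenly spaced along the block.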
Now let’s consider the same address on Main Street, but this time the street name has also been misspelled:
1234 Maine Street
Columbus, Ohio 43205
USA
The geocoding application cannot match the street name, so it will likely defer to the next most geographically precise address element—the ZIP Code—and return a latitude/longitude point corresponding to the center of the ZIP Code area. While this results in the next-best possible match, depending on the intended purpose of the geographic analysis, it could be impractical to use.
Figure 3: Matching by ZIP Code
Base Map Service Layer Credits: Esri, NASA, NGA, USGS, FEMA, TomTom, Garmin, SafeGraph, GeoTechnologies, Inc, METI/NASA, USGS, EPA, NPS, US Census Bureau, USDA, USFWS.
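The fallback behavior in the two examples above can be sketched as a cascade that tries the most refined address elements first and then steps down. This is a simplified illustration under stated assumptions: the reference tables, coordinates, and match-level labels ("PointAddress", "Street", "Postal") are hypothetical, and commercial geocoders use fuzzy matching and far richer reference data than the exact lookups shown here.

```python
# Hypothetical reference data; a real geocoder's database is far larger.
POINT_ADDRESSES = {("1200", "MAIN ST", "43205"): (39.9570, -82.9700)}
STREETS = {("MAIN ST", "43205"): (39.9570, -82.9690)}  # representative point on the street
ZIP_CENTROIDS = {"43205": (39.9575, -82.9640)}


def match_address(number, street, zip_code):
    """Return (match_level, point) for the most refined element that matches."""
    street = street.upper().strip()
    if (number, street, zip_code) in POINT_ADDRESSES:
        return "PointAddress", POINT_ADDRESSES[(number, street, zip_code)]
    if (street, zip_code) in STREETS:
        return "Street", STREETS[(street, zip_code)]
    if zip_code in ZIP_CENTROIDS:
        return "Postal", ZIP_CENTROIDS[zip_code]
    return "Unmatched", None


# The street matches, so the result is a street-level estimate.
print(match_address("1234", "Main St", "43205"))
# The misspelled street fails to match, so the result falls back to the ZIP Code centroid.
print(match_address("1234", "Maine St", "43205"))
```

Both calls return a coordinate pair with the same number of decimal places. Only the match level reveals how much weaker the second result is.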
In both examples, the geocoder provided a precise location even though it could not find an exact match for the input address. It might be reasonable to assume that, because the geocoder could not find a match, the input address was wrong or did not exist. However, geocoders rely on data from local governments, and there is often a large gap between when new buildings are constructed and when their addresses and locations are available in commercial geocoding applications.
As these examples illustrate, we can’t assume, just because a precise latitude/longitude point has been returned for a given input address, that it is accurate. Fortunately, geocoding applications typically provide additional information that allows a user to gauge the accuracy of each match and filter out those that may be inappropriate. Special care should be taken when geocoding new construction.
From precision to accuracy: Best practices
To investigate the quality of the geocoded match, the user must carefully examine the results, paying particular attention to several metrics produced by the geocoder. These metrics describe characteristics of the match, including whether the address was matched to one or more locations, the match level at which the address was mapped, and the score of the location it was matched to.
Each of these variables should be considered as part of a thorough quality assessment for the geocoded match. For example, if the record shows that a match was successful as a point address, but has a low score, further investigation may be necessary to fix any issues the geocoder may have missed. Without proper data review, it is possible to overlook corrections that would increase the quality of results.
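A review step along these lines can be sketched as a simple rule-based filter over the three metrics. The field names (`candidate_count`, `match_level`, `score`), the set of acceptable match levels, and the score threshold of 90 are all assumptions made for illustration. Actual output fields and sensible thresholds depend on the geocoder in use and on the insurer's tolerance for uncertainty.

```python
# Assumed thresholds for illustration only; tune these to the geocoder and use case.
ACCEPTABLE_LEVELS = {"PointAddress", "Parcel"}
MIN_SCORE = 90


def review_reasons(match):
    """Return a list of reasons a geocoded match should be reviewed (empty if none)."""
    reasons = []
    if match["candidate_count"] > 1:
        reasons.append("matched to more than one location")
    if match["match_level"] not in ACCEPTABLE_LEVELS:
        reasons.append("match level too coarse: " + match["match_level"])
    if match["score"] < MIN_SCORE:
        reasons.append("score below threshold")
    return reasons


# Hypothetical geocoder output records
matches = [
    {"address": "A", "candidate_count": 1, "match_level": "PointAddress", "score": 100},
    {"address": "B", "candidate_count": 1, "match_level": "PointAddress", "score": 82},
    {"address": "C", "candidate_count": 3, "match_level": "Postal", "score": 95},
]

for m in matches:
    print(m["address"], review_reasons(m))
```

Record B corresponds to the case described above: a point address match with a low score, which passes the match-level check but is still flagged for further investigation.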
The scenarios described above do not consider instances when the geocoder returns a location that is technically correct but could be improved upon using additional reference data. For example, a single address could refer to a parcel that includes a single building, several buildings, or a plot of land. Because the geocoder returns an exact point, or set of coordinates, it could place that point anywhere within the parcel. Often, what is referred to as a “roof top” match may not accurately represent building locations within a given parcel.
Figure 4: Geocoding match inaccuracies
Base Map Service Layer Credits: Maxar, Microsoft.
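Where an additional reference source such as a building footprint dataset is available, one way to quantify this kind of gap is to measure the distance between the geocoded point and the building's location. The sketch below uses the standard haversine formula with a mean Earth radius of 6,371 km. The function names, the 25-meter tolerance, and the sample coordinates are hypothetical and chosen only to illustrate the check.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def offset_exceeds(geocoded, building, tolerance_m=25.0):
    """True if the geocoded point is farther from the building than the tolerance."""
    return haversine_m(geocoded[0], geocoded[1], building[0], building[1]) > tolerance_m


# Invented coordinates: a geocoded point and a building centroid on the same parcel
geocoded_point = (39.9570, -82.9700)
building_centroid = (39.9575, -82.9700)
print(offset_exceeds(geocoded_point, building_centroid))
```

An offset that is acceptable for routing may still matter for a flood zone determination, so the tolerance should reflect the intended use of the location.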
The ideal point location for geocoding depends on the application. Is the end goal of the analysis routing between locations? Identifying hazards? Flood zone determinations? Depending on the application, the line of business, specific local hazards, and pricing sophistication, different levels of uncertainty may be acceptable. However, absent an automated underwriting routine and periodic, systematic review of all geocoding results, insurers may not fully understand the uncertainty associated with their policy locations.