Why NetToolKit Geo?
Accuracy at an affordable price, with licensing freedom
Published on Oct 25, 2019
Comparison Of Geocoding Providers
At the end of the day, what good is a geocoder if it’s not accurate enough? If you didn’t care about accuracy, you could have a random number generator generate tuples for you and call it a day.
Many of the lower-cost geocoding providers unfortunately lack the accuracy that one might hope for. For example, the US Census Bureau offers a geocoding service that is free. Sounds great; it’s the government, and you’re looking for data that the government should know. Yes, in principle. In practice, the results are less than ideal. Take 3350 Scott Boulevard, Santa Clara, CA, which is an office park: the result from the US Census Bureau is the farthest off, on the opposite side of the street and about 1,000 feet from the closest corner.
Why does the government (which has parcel data) not know where to geocode the address? The answer lies in the fact that the “government” is actually made up of many different entities. Parcel data is collected and maintained by municipal governments, which need to know lot boundaries and such for zoning purposes or to resolve property disputes; there’s no particular reason for each of those local city and county governments to develop an in-house geocoder. The federal government, on the other hand, has the resources to develop and maintain a geocoder, but it does not have access to all of the parcel data that the many different local governments have collected.
The result from PostGIS is similarly off. OpenStreetMap and OpenCageData are much closer, but still across the street. NetToolKit returns what appears to be the geometric center of the office park, very close to Google’s result. How does NetToolKit do it? Very simply, NetToolKit has rooftop data from OpenAddresses, which has coordinates for most addresses in the US.
Another example: 4112 10th Street South, Moorhead, MN
You can see that OpenStreetMap and OpenCageData are the furthest off, and US Census Bureau and PostGIS are off in the opposite direction. What’s happening here? As explained elsewhere, the other providers appear to be using interpolation to compensate for the lack of actual underlying coordinates. It’s hard to provide accurate coordinates without the data at the individual address level; thanks, OpenAddresses.
We can find many, many more examples of how interpolation of TIGER data yields less than ideal results.
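To make the interpolation problem concrete, here is a minimal sketch of how a geocoder without address-level points might estimate a location from a street segment that carries only an address range. The function name, the segment endpoints, and the house-number range are hypothetical and chosen purely for illustration; real geocoders also deal with odd and even sides of the street, setbacks, and curved segments.

```python
def interpolate_address(house_number, range_start, range_end, start_point, end_point):
    """Estimate (lat, lon) for a house number by linear interpolation along a
    straight street segment whose endpoints carry an address range."""
    if range_end == range_start:
        return start_point
    # Fraction of the way along the segment, clamped to [0, 1].
    fraction = (house_number - range_start) / (range_end - range_start)
    fraction = max(0.0, min(1.0, fraction))
    lat = start_point[0] + fraction * (end_point[0] - start_point[0])
    lon = start_point[1] + fraction * (end_point[1] - start_point[1])
    return (lat, lon)


# Hypothetical segment: house numbers 4100-4198 between two made-up endpoints.
estimate = interpolate_address(4112, 4100, 4198, (46.8400, -96.7600), (46.8380, -96.7600))
```

The estimate assumes that addresses are spaced evenly along the block. When lots are uneven, or one large parcel covers most of the block, the interpolated point can land far from the actual building.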
In some cases, even the geocoding providers that generally provide reasonable results will yield inexplicable ones. We have seen a few cases where Mapbox (which generally returns reasonable results in the US) returns baffling coordinates, for example for 4645 Wyndham Lane, Suite 210, Frisco, TX
Many times, if you zoom in, you might see a plausible explanation, such as a similarly named street, or that the geocoding provider simply did not have any data for the street. In this case, we see that nope, Mapbox understood the street, and seems to have placed the address on some totally different street. We’ve seen a fair share of bugs in our own geocoder and we know that the outward manifestations of the bugs can at first seem inexplicable.
Putting aside interpolation and random bugs, issues with accuracy can manifest themselves in other ways too. If there is one unambiguous result, we might consider the return of multiple results as less accurate. For example, Mapbox returns three results for 13127 Raritan Ct., Westminster, CO: one that coincides with several other providers, one that is 2.5 miles away (“College Hills, Westminster, Colorado 80031”), and one that is over 1,200 miles away in Atlanta, Georgia (“Westminster, Atlanta, Georgia 30327, United States”). (For all you science nerds out there wondering if it’s really called an accuracy issue if you get three results and one of them is correct while the other two are significantly off, you might call it a precision issue, but we thought it might be overly pedantic to list it as a separate point.)
If you expect to get back one result, and you get back three, your confidence in the result might be shaken a little. Fortunately, Mapbox returns a relevance score with each of its results. In this case, Mapbox actually returns five results, but we filtered out the two that were least relevant (both with a score of 0.486). The three results that are displayed have relevance scores that are very close: 0.670667, 0.648667, 0.645493. You might think that you can just take the result with the highest relevance score, but that would fail in a case like 604 E El Camino Real, Sunnyvale, CA. Here, the other providers that we compare return a single result, but Mapbox returns five, and the top two results both have a relevance score of 1 even though they are about one mile apart from each other.
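One way a client might guard against this kind of ambiguity is to flag a response whenever results with nearly identical relevance scores are far apart. The sketch below is only an illustration: the result format (dicts with `relevance`, `lat`, and `lon` keys), the thresholds, and the function names are all assumptions, not any provider's actual API.

```python
import math


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    earth_radius_miles = 3958.8
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_miles * math.asin(min(1.0, math.sqrt(a)))


def is_ambiguous(results, score_margin=0.05, max_spread_miles=0.25):
    """Return True if a result scoring within score_margin of the best one
    lies more than max_spread_miles away from the best result."""
    if len(results) < 2:
        return False
    ranked = sorted(results, key=lambda r: r["relevance"], reverse=True)
    best = ranked[0]
    for other in ranked[1:]:
        if best["relevance"] - other["relevance"] > score_margin:
            break  # the remaining results score clearly lower
        distance = haversine_miles(best["lat"], best["lon"], other["lat"], other["lon"])
        if distance > max_spread_miles:
            return True
    return False
```

With thresholds like these, two results that both score 1 but sit a mile apart would be flagged for review rather than silently resolved to whichever one happens to be listed first.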
Another issue that we’ll file under accuracy is when a provider might have the data but is unable to return coordinates when asked. Let’s take a look at 19707 US-59 Humble, TX.
Here, the lower accuracy providers such as OpenCage and Texas A&M return some point in the city (it’s hard to tell exactly what Texas A&M is returning since it does not return a normalized address), and they do better than both Nominatim and US Census Bureau (which don’t return results) and PostGIS (which returns a completely different address). Mapbox, interestingly, returns a different address along the same freeway, which is further than the results from OpenCage and Texas A&M, but in some sense, more precise. Mapbox does have the data, however, since if you query instead for the address by the alias 19707 Highway 59 Humble, TX, it returns the expected address. The address is somewhat unusual -- usually we think of addresses in neighborhoods or more pedestrian-friendly roads. However, many businesses line the nation’s highways and people will want to know where those places are. Are both forms of the address valid? We think so. If anything, the first address is more specific, since it differentiates US-59 from I-59 (an entirely different highway). This particular issue would be very easy for Mapbox to fix, but it is not difficult to imagine there being other parsing issues.
Incidentally, this issue is also illustrative of how smaller geocoding providers can do better by specializing in one country.
Accuracy is important, and we at NetToolKit have invested a lot in building an accurate product for geocoding addresses in the US.
Cost is easy to understand, and much easier to assess. All else being equal, we prefer spending less, which leaves more money for things that people really care about, like amassing a giant stuffed teddy bear collection. Since geocoding results are often displayed on maps, we’ll include map pricing here alongside geocoding pricing.
Famously, Google raised their prices in 2018, going from $0.50 per thousand map views to $7 per thousand map views (with a $200 monthly credit). If your site has even a moderate number of map views (e.g. 50,000 views per month), that’s a lot of giant stuffed teddy bears. Google’s geocoding is less expensive, but not by much: $5 per thousand requests.
Mapbox, though not as accurate as Google, charges almost the same as Google for map views once you get beyond the free tiers: $5 per thousand requests, although their geocoding cost is substantially less at $0.75 per thousand for “temporary geocodes” (something to be touched on later).
For most websites, Bing Maps requires you to fill out a Request for Quote to find out pricing for more than 125,000 transactions per year.
TomTom’s pricing is pretty affordable at $0.50 for 1,000 transactions (minimum purchase of $25); however, it is difficult to assess their accuracy given that their terms prohibit doing so (“12.12. You shall not use the Maps APIs in connection with similar products of TomTom’s competitors to check, compare or benchmark the Maps APIs unless explicitly approved by TomTom in writing.” -- insecure much?). Someone who looked into this reported an accuracy rate of 73% for TomTom. We have, by the way, written in for permission to benchmark their service; so far, we have not heard back.
HERE has an interesting pricing model. On one hand, the free limit of 250,000 monthly transactions sounds rather generous (versus, for example, Google’s monthly allowance of 28,000); on the other hand, the freemium tier also limits developers to only 5,000 “Monthly Active Users” (which appears to refer to mobile device users, not web visitors). Hence, if you have a service with a limited number of users, then HERE might work out for you; otherwise, you could inquire about their Premier tier, which is presumably more expensive than their Pro tier, which has a monthly cost of $449.
NetToolKit’s service does not have any monthly fees and comes in at merely $0.10 for 1,000 transactions (minimum purchase of $10), well below other providers.
Among the lower accuracy providers, you can get free geocoding (for example, US Census Bureau or Nominatim). OpenCageData’s pricing can get quite inexpensive for high volumes. For example, if you’re willing to spend $500 per month, you can get 1,000 transactions for about $0.20 (they price their service by tier rather than by the number of transactions). One of the challenges of their pricing model is that if your geocoding needs are volatile, you might need to be vigilant about your pricing tier in order to avoid overpaying.
Not all lower accuracy providers come cheap, though. Texas A&M’s geocoder, for example, can run $15 for 1,000 transactions (low-volume), which is much more expensive than Google, or $1.20 for 1,000 transactions if you’re spending $1,200 per month.
For the uninitiated, selecting a mapping service might seem like a simple matter of assessing accuracy and cost and making a decision. So one would think. Unfortunately, the better known services all impose restrictions via their terms of service; some of those restrictions could pose roadblocks, as they did for us years ago.
A very common restriction among the more expensive providers (Google, Mapbox, HERE) is that geocoding results cannot be shown on maps from other services. This restriction might not seem like a big deal, but when your primary maps provider increases pricing to about 14 times their previous pricing (ahem, Google), the non-portability of your geocoding results can be a major issue if you have millions of addresses to work with. While you’re busy shipping features and keeping your service afloat, it’s nice to not have to quickly find another geocoding provider and re-geocode all of your addresses. There is, also, the financial cost of all of those geocoding transactions. Rather, what would be nice is to have geocoding results that you can use on any maps -- part of what we call free-range geocoding.
Another common restriction is only allowing caching of geocoding results under strict conditions. If your website is an online directory, and you enable geographical searches (e.g. restaurants near Mountain View, CA), you would want to cache the geocoding results for all of your entries so that your search can return in a reasonable time. Unfortunately, caching often brings its own set of compliance hurdles. HERE, for example, lists their Terms and Conditions, but that incorporates another document called the Acceptable Use Policy. Buried in the second document is:
Caching or storing any location data for the purpose of building a repository of location assets or scaling one request to serve multiple end users is prohibited. You may not use any HERE Property in a manner that pre-fetches, caches, or stores data or results, except:
- As explicitly allowed by the caching headers (HTTP/1.1 standard) returned by HERE services; or
- To the extent you are storing or caching for no more than thirty (30) days only to the extent necessary for enabling or improving an end user's use of the HERE services
So, clients who need to cache results need to check the max-age value in the cache-control header and save it to their database. You would then probably want to maintain a job that runs daily and refreshes any values that have expired (the lesser of the value from cache-control and 30 days). This scheme assumes that the first sentence is not violated (“for the purpose of building a repository of location assets or scaling one request to serve multiple end users is prohibited”) -- you wouldn’t be building for the purpose of serving multiple users per se, but that would be an effect. Are you supposed to re-geocode all of your entries for each unique user? Probably not, but what if someone at HERE wakes up ornery one day?
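The refresh scheme described above can be sketched as follows. This only illustrates the expiry arithmetic, assuming the response's `Cache-Control` header carries a `max-age` directive; the function names are hypothetical, and nothing here should be read as an interpretation of what HERE's terms actually permit.

```python
import re
from datetime import datetime, timedelta, timezone

MAX_CACHE_AGE = timedelta(days=30)  # upper bound taken from the quoted policy text


def parse_max_age(cache_control):
    """Return the max-age directive as a timedelta, or None if it is absent."""
    match = re.search(r"max-age\s*=\s*(\d+)", cache_control or "", re.IGNORECASE)
    return timedelta(seconds=int(match.group(1))) if match else None


def cache_expiry(fetched_at, cache_control):
    """Expiry time: the lesser of the header's max-age and 30 days.
    With no max-age directive, treat the result as not cacheable."""
    max_age = parse_max_age(cache_control)
    if max_age is None:
        return fetched_at
    return fetched_at + min(max_age, MAX_CACHE_AGE)


def needs_refresh(fetched_at, cache_control, now=None):
    """True once a cached result has reached its expiry time."""
    now = now or datetime.now(timezone.utc)
    return now >= cache_expiry(fetched_at, cache_control)
```

A daily job could store `fetched_at` and the raw header alongside each cached coordinate pair and re-geocode every row for which `needs_refresh` returns true.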
The terms for Google Maps are more straightforward, but similarly restrictive: “Customer can temporarily cache latitude (lat) and longitude (lng) values from the Geocoding API for up to 30 consecutive calendar days, after which Customer must delete the cached latitude and longitude values.” If your online directory had one million entries to geocode (in order to enable geographic searches), that would be $4,800 ($5/thousand queries - $200 monthly credit) every month.
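The arithmetic behind that monthly figure, as a small sketch. The price and credit are the ones quoted in this post; the function name is made up for illustration.

```python
def monthly_geocoding_cost(requests, price_per_thousand, monthly_credit=0.0):
    """Monthly cost in dollars after applying any recurring credit."""
    gross = requests / 1000 * price_per_thousand
    return max(0.0, gross - monthly_credit)


# One million entries re-geocoded every month at $5 per thousand, with a $200 credit.
google_cost = monthly_geocoding_cost(1_000_000, 5.00, monthly_credit=200.00)  # 4800.0
```

Because the cached values must be deleted after 30 days, this is a recurring cost rather than a one-time one.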
Mapbox is more lenient in this regard: their terms of service have no prohibition against caching geocoding results -- it’s just that “you may not perform bulk or automated queries.” So, if you run an online directory, have fun geocoding your many entries when a user tries to do a geographic search (e.g. restaurants near me) and returning in a reasonable time.
In terms of maps, Mapbox also has additional requirements. Mapbox uses OpenStreetMap, requiring an attribution to them. Beyond that, Mapbox also requires a link to themselves and a separate link to encourage end users to improve their map quality. All three links are in addition to Mapbox branding that is shown on the map.
Out of the tier of expensive providers, Bing appears to have the fewest restrictions. That was not the case at the beginning of this year (they appear to have updated one document in April 2019 and another in October 2019).
Many of the more severe restrictions (really Mapbox? no automated queries?) will likely get sorted out over time. However, the changing terms illustrate both:
1. the difficulty in staying in compliance even when customers are trying (for example, Google Maps used to not have the requirement to delete geocoding results within 30 days) and
2. the risk that terms might suddenly change to become less favorable.
Generally, the less expensive providers have no restrictions, whether on caching or on displaying results on others’ maps, and even allow reselling. We call that free-range geocoding.
For developers who are primarily working with addresses in the US, we think the choice is pretty clear. Yes, we’re biased, but NetToolKit offers online map tiles and accurate geocoding results at an affordable price without onerous licensing restrictions. If you think there are other dimensions that we should compare geocoding providers on, please let us know.