The Basics of Geocoding
Published on Aug 7, 2019
What is geocoding?
Geocoding is the process of converting street addresses (or other names of geographic places) into coordinates. For example, humans can easily understand an address such as “1600 Pennsylvania Ave NW, Washington, DC” or “2 Lincoln Memorial Cir NW, Washington, DC.” However, in order for computers to perform certain operations (such as finding the distance or a route between the two addresses), the human-readable addresses first need to be converted into coordinates that computers can work with. Geocoding is the software process that converts addresses into coordinates.
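To make the "distance between the two addresses" example concrete, here is a minimal sketch of what becomes possible once both addresses have been turned into coordinates. The coordinates below are approximate values assumed for illustration, and the function name haversine_km is our own; the haversine formula treats the Earth as a sphere, so the result is an estimate rather than a surveyed distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate (latitude, longitude) pairs, assumed for illustration only.
white_house = (38.8977, -77.0365)       # 1600 Pennsylvania Ave NW
lincoln_memorial = (38.8893, -77.0502)  # 2 Lincoln Memorial Cir NW

distance = haversine_km(*white_house, *lincoln_memorial)
print(f"{distance:.2f} km")
```

With these assumed coordinates the straight-line distance comes out at roughly 1.5 km. A driving or walking route would be longer, since it has to follow streets.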
What is geocoding used for?
Coordinates are used for operations as basic as plotting a point on a map. The automated plotting of points on maps opens up a plethora of possibilities. For example, if a chain restaurant wants to easily show where its locations are, it can geocode the addresses of its locations and plot them on a map. Likewise, if a city planner wants to see whether the city has enough fire stations, the planner can plot the addresses of the current fire stations and draw a radius around each point to indicate where fire trucks could arrive within a reasonable timeframe. On a personal level, individuals might want to visualize cities that they have visited. Each of these scenarios requires that software know how to plot the input, and geocoding enables that.
How does geocoding work?
There are several different facets to geocoding, including address parsing and normalization, conflict arbitration, name resolution, and performance optimization, but two basic components of geocoding are rooftop geocoding and interpolation.
At the most basic level, geocoding is done through database lookups: take an address, find it in a database, and return the associated longitude and latitude. Colloquially, this method might be called rooftop geocoding, and is generally accepted as the most precise form of geocoding since information is known about each specific land parcel.
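The lookup described above can be sketched in a few lines. Everything here is a simplified assumption: the table ROOFTOP_TABLE, its two approximate coordinates, the abbreviation list, and the function names are hypothetical stand-ins for what would really be a large parcel database and a far more thorough address normalizer.

```python
import re

# Hypothetical rooftop table: normalized address -> (latitude, longitude).
# Coordinates are approximate and included only for illustration.
ROOFTOP_TABLE = {
    "1600 PENNSYLVANIA AVE NW WASHINGTON DC": (38.8977, -77.0365),
    "2 LINCOLN MEMORIAL CIR NW WASHINGTON DC": (38.8893, -77.0502),
}

# A tiny, incomplete abbreviation list (assumed, not an official standard).
ABBREVIATIONS = {"AVENUE": "AVE", "CIRCLE": "CIR", "STREET": "ST", "NORTHWEST": "NW"}


def normalize(address):
    """Uppercase, replace punctuation with spaces, and abbreviate common words."""
    cleaned = re.sub(r"[^\w\s]", " ", address.upper())
    words = [ABBREVIATIONS.get(word, word) for word in cleaned.split()]
    return " ".join(words)


def rooftop_geocode(address):
    """Return (lat, lon) for a known address, or None if it is not in the table."""
    return ROOFTOP_TABLE.get(normalize(address))


print(rooftop_geocode("1600 Pennsylvania Avenue NW, Washington, DC"))
print(rooftop_geocode("123 Unknown Street, Nowhere, ZZ"))
```

The normalization step matters because the same address can be written many ways. Without it, "Avenue" and "Ave" would be treated as different keys and the lookup would miss.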
© OpenStreetMap contributors
We do not live in a world with a universal addressing system and a systematic street naming convention, and consequently, there is no way to directly translate a street address into coordinates without underlying data. This intrinsic reliance on underlying data means that, in theory, organizations that go out and regularly survey buildings and record geographic coordinates will have an advantage in providing a service over organizations that do not have access to such sources of data. Unsurprisingly, going out to gather such data street by street is very expensive, and companies that acquire such data tend to charge much more for their services.
Fortunately, many municipalities make geocoding-related data available. Apparently, some countries have contributed geocoding information for their entire area to OpenStreetMap, prompting some to believe that in those countries, OpenStreetMap geocoding can be more accurate than some commercial providers; unfortunately, that is not the case for the United States. Happily, many local and state governments have made coordinates available for parcels in their areas, and people have worked on aggregating and standardizing that data, publishing it at OpenAddresses.io. Those who are curious about the coverage of OpenAddresses can view their coverage map, which shows that coverage of the US is quite good.
(For those curious about the meaning of the different shades of green, we wrote to OpenAddresses and were told that “Light green is data from states or larger entities, dark green is data from counties or smaller entities.” They believe the distinction could be meaningful in that data from the smaller organizations is often rolled up into the data from the larger organizations, meaning that data at the local level could be fresher.)
What about the holes in coverage? For example, there are sections of Arizona and Tennessee that are not covered by OpenAddresses data. Another public source that has coordinates for individual parcels is the National Address Database, organized by the US Department of Transportation. Although its coverage is smaller than that of OpenAddresses, it does cover some areas that OpenAddresses does not.
What about gaps in coverage for both OpenAddresses and National Address Database? Perhaps the largest set of geocoding data from a single organization comes from the United States Census Bureau, which publishes TIGER, but the data comes with a twist: coordinates are at the level of street segments, not individual parcels.
© OpenStreetMap contributors
For example, suppose you were looking up 2964 Neet Avenue, San Jose, CA in the TIGER data. The data set has coordinates for either end of the street segment (stored in what is called the geometry of the street segment), and it also has information regarding the house numbers at either end (TIGER shows that one end of the street segment has house numbers 2900 and 2901, one house on either side of the street, while the other end has 3098 and 3099). House numbers on that street that fall outside of that range are likely part of another street segment (a street might be broken up into multiple segments) or might not exist (for example, in the case of someone misremembering an address).
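The range check for the Neet Avenue example might look like the sketch below. The AddressRange structure and matching_range function are hypothetical names, not TIGER's actual schema, and the code assumes the common convention that even and odd house numbers sit on opposite sides of the street (so 2900 pairs with 3098 and 2901 pairs with 3099).

```python
from typing import NamedTuple, Optional


class AddressRange(NamedTuple):
    """House-number range along one side of a street segment (hypothetical structure)."""
    from_number: int
    to_number: int


def matching_range(ranges, house_number) -> Optional[AddressRange]:
    """Return the side whose range and parity match the house number, else None."""
    for side in ranges:
        low, high = sorted(side)
        same_parity = house_number % 2 == side.from_number % 2
        if low <= house_number <= high and same_parity:
            return side
    return None


# The two sides of the Neet Avenue segment, using the house numbers quoted above.
neet_avenue_segment = [AddressRange(2900, 3098), AddressRange(2901, 3099)]

print(matching_range(neet_avenue_segment, 2964))  # AddressRange(from_number=2900, to_number=3098)
print(matching_range(neet_avenue_segment, 3200))  # None: another segment, or no such address
```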
For house numbers that are on the street segment, if you make the assumption that houses are evenly spaced, then you might come up with a reasonable approximation of the house location by interpolating between the endpoints of the street segment. Interpolation works better with some street segments than others. For example, for a certain part of Darnell Street in Houston, TX, TIGER reports one street segment starting at 5200 and ending at 5300. In reality, the street segment starts with 5203 and ends with 5231. Since the range in the database is much wider than it is in reality, any actual address on the street segment will be interpolated to a much smaller section of the actual street.
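The arithmetic behind interpolation is simple enough to show directly. This is a minimal sketch under two assumptions: houses are evenly spaced, and the segment is a straight line between its two endpoints (real segment geometries can bend). The function names and the endpoint coordinates in the last example are hypothetical.

```python
def fraction_along(house_number, from_number, to_number):
    """Fraction of the way along the segment, assuming evenly spaced houses."""
    if from_number == to_number:
        return 0.5  # single-number range: fall back to the midpoint
    fraction = (house_number - from_number) / (to_number - from_number)
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("house number is outside the segment's range")
    return fraction


def interpolate_point(fraction, start, end):
    """Linearly interpolate between two (lat, lon) endpoints of a straight segment."""
    return tuple(a + fraction * (b - a) for a, b in zip(start, end))


# 2964 Neet Avenue on the 2900 to 3098 side: about a third of the way along.
print(round(fraction_along(2964, 2900, 3098), 3))  # 0.323

# Darnell Street: the real houses (5203 to 5231) are squeezed into the part
# of the segment between 3% and 31% of its length, because the recorded
# range runs from 5200 to 5300.
print(fraction_along(5203, 5200, 5300))  # 0.03
print(fraction_along(5231, 5200, 5300))  # 0.31

# Hypothetical endpoint coordinates, only to show the final step.
print(interpolate_point(0.5, (37.000, -122.000), (37.002, -122.000)))
```

The Darnell Street numbers show the weakness described above: the last real house, 5231, lands less than a third of the way along the segment even though it actually sits at the far end.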
The US Census Bureau, for example, offers a geocoder based on the TIGER data, and you can see how their results are on the other side of the street segment compared to Google’s, Bing’s, and ours (the standard configuration for PostGIS relies heavily on TIGER data, so it is not surprising that its result is close to the US Census Bureau’s result). Interpolation is not going to be as precise as rooftop geocoding, where you have the specific coordinates for the address you are looking for. However, when you do not have rooftop coordinates available, interpolation is better than nothing. Fortunately, rooftop-precision coordinates are available for most of the US.
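One way to combine the two approaches is to try rooftop data first and fall back to interpolation only when needed, while reporting which method produced the answer. The sketch below is an assumption about how such a fallback could be wired together, not a description of any particular geocoder; the function name, the precision labels, and the sample data are all hypothetical.

```python
def geocode_with_fallback(address, rooftop_lookup, interpolated_lookup):
    """Try rooftop data first, then interpolation.

    Each lookup is a callable taking an address and returning (lat, lon) or None.
    Returns (coordinates, precision_label), or (None, None) if both fail.
    """
    sources = (("rooftop", rooftop_lookup), ("interpolated", interpolated_lookup))
    for precision, lookup in sources:
        coordinates = lookup(address)
        if coordinates is not None:
            return coordinates, precision
    return None, None


# Hypothetical stand-ins for real data sources; all coordinates are made up.
rooftop_data = {"10 EXAMPLE ST": (40.0001, -75.0001)}
interpolated_data = {
    "10 EXAMPLE ST": (40.0003, -75.0004),
    "20 EXAMPLE ST": (40.0010, -75.0010),
}

print(geocode_with_fallback("10 EXAMPLE ST", rooftop_data.get, interpolated_data.get))
print(geocode_with_fallback("20 EXAMPLE ST", rooftop_data.get, interpolated_data.get))
print(geocode_with_fallback("30 EXAMPLE ST", rooftop_data.get, interpolated_data.get))
```

Returning the precision label alongside the coordinates lets callers decide whether an interpolated result is good enough for their purpose.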
There is much more that can be written about geocoding, but if you are just looking to convert street addresses into latitude and longitude coordinates, please learn more about our service.