Introduction
In collaboration with the Treasurer/Tax Collector and the Department of Public Health, we have written a RESTful street address geocoding service.
This page describes that service.
We try to follow the OGC geocoding standard, which is described here.
The Request
Here is the DEV URL with an example address of "100 Main St":
The service uses the street segment network only.
These data are carefully maintained by Department of Public Works, and brought into EAS on a nightly basis.
We return a single best candidate and have not implemented a scoring system.
Instead of scoring we return exceptions to help you correct the input data.
The location is based on the linear interpolation of the matching street segment.
We offset by 4 meters (about 13 feet) to the even or odd side like this:
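The interpolation and side offset described above can be sketched roughly as follows. This is a minimal illustration, not the service's code. It assumes a single straight segment given as (lon, lat) end points, a (from, to) house-number range for each side of the segment, the common convention that "left" and "right" are judged looking from the segment's from-node toward its to-node, and a flat-earth (equirectangular) approximation, which is adequate for a 4 meter offset. The function name `interpolate_address` is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; fine for a short local offset


def interpolate_address(house_num, start, end, left_range, right_range, offset_m=4.0):
    """Hypothetical sketch: place house_num along a straight segment, then offset it.

    start, end: (lon, lat) of the segment's from-node and to-node.
    left_range, right_range: (from_house_num, to_house_num) for each side.
    Returns (lon, lat).
    """
    # Pick the side whose range has the same parity (even/odd) as house_num.
    if left_range[0] % 2 == house_num % 2:
        lo, hi = left_range
        side = 1.0   # left of the direction of travel
    else:
        lo, hi = right_range
        side = -1.0  # right of the direction of travel

    # Linear interpolation: fraction of the way along the segment (not clamped).
    t = 0.5 if hi == lo else (house_num - lo) / (hi - lo)

    # Equirectangular approximation: convert degree deltas to local meters.
    m_per_deg_lon = math.radians(1) * EARTH_RADIUS_M * math.cos(math.radians(start[1]))
    m_per_deg_lat = math.radians(1) * EARTH_RADIUS_M
    dx = (end[0] - start[0]) * m_per_deg_lon
    dy = (end[1] - start[1]) * m_per_deg_lat
    length = math.hypot(dx, dy)

    # Point on the centerline, shifted along the unit normal (-dy, dx), which points left.
    px = t * dx + side * (-dy / length) * offset_m
    py = t * dy + side * (dx / length) * offset_m
    return (start[0] + px / m_per_deg_lon, start[1] + py / m_per_deg_lat)
```

For a segment running due east, an even number with a right-side range of (2, 98) lands at the segment's midpoint for 50 and about 4 meters south of the centerline, while 51 on the left-side range (1, 99) lands about 4 meters north.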
The Response
We strongly prefer JSON (available now) but we can produce XML if you buy us a beer.
The JSON is formatted for readability but your browser may obscure that.
The way to see the nicely formatted JSON is to right-click on the web page and select "view source".
If you use Chrome or Firefox, you can press Ctrl-U.
If you use Internet Explorer, you can press Alt-V-C.
In any case, the JSON should look something like this:
The service uses EAS addresses. While EAS addresses are bound to the street network, the results will differ significantly from a street network geocoding service. For example, if you try to geocode "100 Main St" using this service, you will get zero candidates. That's because (at the time of this writing) there is no "100 Main St" in EAS. Although there is a street segment that supports "100 Main St", there is no building or proposed building that has that address.
Why use EAS geocoding? If you want units or parcels that are associated with an address, this is probably your best bet. EAS addresses are curated and maintained (not mined), and we synchronize the streets and parcels with Dept. of Public Works on a daily basis. To see what EAS addresses look like, check out the web interface, which is here (internal).
Roadmap
Currently, we are developing a geocoder that will run in ESRI ArcGIS. Eventually, we would like to eliminate the necessity of downloading and running a Python client.
User Contacts
Contact | Department |
---|---|
Richard Hagner | Treasurer/Tax Collector |
Darrell Ascano | Treasurer/Tax Collector |
----------------- | Dept. of Public Health |
Aksel Olsen | Dept. of Planning |
Example URLs
NOTE: All the links in this section point to an internal server that is not accessible from outside the city (sorry).
The URL is case sensitive.
Here is a simple query for 115 Main St.
A query for 1200 18th will result in 2 candidates.
Your query does not have to include the zip code.
A query for 2655 Hyde returns lots of parcels.
A query for 1000 Pine returns lots of units.
Example Response
We currently support JSON; see just below for an example. The basic structure follows the OGC geocoding standard, which ESRI helped specify. As you can infer from the sample below, EAS provides unit-level and parcel-level information. We currently exclude retired entities (base address, unit address, parcel) because that's what 95% of our users need. To see what's available, either contact us or take a look at the data model.
```json
{
  "spatialReference": { "wkid": 4326 },
  "candidates": [
    {
      "address": "115 MAIN ST",
      "location": { "y": 37.79152826999239, "x": -122.39400753158013 },
      "score": 100,
      "attributes": {
        "base_address_num": 115,
        "base_address_suffix": null,
        "street_name": "MAIN",
        "street_type": "ST",
        "zipcode": "94105",
        "parcels": [
          {
            "map_blk_lot": "3717012",
            "blk_lot": "3717012",
            "unit_num": null,
            "date_map_add": "1998-07-01",
            "date_map_drop": null,
            "address_base_flg": true
          },
          {
            "map_blk_lot": "3717013",
            "blk_lot": "3717013",
            "unit_num": null,
            "date_map_add": "1998-07-01",
            "date_map_drop": null,
            "address_base_flg": true
          }
        ],
        "units": []
      }
    }
  ]
}
```
Results and Performance
We have tested the DEV implementation against all of the Treasurer/Tax Collector addresses; here are the results.
found street match | found eas match | count | percent |
---|---|---|---|
false | false | 1444 | 2.72 |
true | false | 6481 | 12.24 |
true | true | 44996 | 85.02 |
total | | 52921 | 100 |
The service processes about 500 addresses per minute, which should be fine for our purposes.
We have done no performance optimization so we can probably make it faster if you have a compelling reason.
Technical Details
If you are interested in technical details you can find the code mostly in 2 places.
More or less...
And in an effort to set a good example we have written real unit tests which you can see here!
Examples
The nominal example:
You can send data with no street suffix if it's not ambiguous:
http://10.250.60.189/geocode/streetNetwork/findAddressCandidates?f=json&Address=157 Noe&Zip=94114
But be aware that there are ambiguous cases:
We do not parse degraded addresses:
Addresses that include extra information at the end will parse, but typically fail to return a candidate:
But if it's clearly a unit, it works:
...
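The query URLs above can also be built programmatically. The sketch below uses Python's standard `urllib.parse.urlencode`, reuses the streetNetwork path and the `f`, `Address` and `Zip` parameters from the example URL (keeping their capitalization, since the URL is case sensitive), and treats the zip code as optional. The function name `build_query_url` is hypothetical, and it assumes the server decodes `+` in the query string as a space, as is standard for form-encoded queries.

```python
from urllib.parse import urlencode

# Path taken from the streetNetwork example URL above; the host is internal to the city.
BASE_PATH = "/geocode/streetNetwork/findAddressCandidates"


def build_query_url(host, address, zip_code=None):
    """Hypothetical helper: build a findAddressCandidates URL that asks for JSON."""
    params = [("f", "json"), ("Address", address)]
    if zip_code:  # the zip code is optional
        params.append(("Zip", zip_code))
    # urlencode escapes spaces as '+', e.g. "157 Noe" -> "157+Noe"
    return "http://" + host + BASE_PATH + "?" + urlencode(params)
```

The resulting URL can be fetched with `urllib.request.urlopen`, but only from inside the city network.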
Field Definitions
- spatialReference - Defines the coordinate system for the specified coordinates. WKID 4326 refers to the geographic coordinate system (GCS) using the WGS 1984 datum.
- candidates - a list of matched address(es) and their attributes
  - address - concatenated street address
  - location - point within the parcel, not the centroid of the parcel
    - x - longitude
    - y - latitude
  - score - see below
  - attributes
    - base_address_num
    - base_address_suffix
    - street_name
    - street_type
    - zipcode
    - parcels
      - map_blk_lot - the "base parcel" identifier
      - blk_lot - the parcel identifier, aka assessor parcel number or APN
      - date_map_add - the date that the parcel was added to the city map by Public Works
      - date_map_drop - the date that the parcel was retired from the city map by Public Works
      - unit_num - unit number associated with this parcel, if present
      - address_base_flg
    - units - a list of the units for that address
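To show how these fields nest, here is a minimal sketch that flattens a response into one row per parcel, using only the standard `json` module. The function name `summarize_candidates` and the choice of output columns are illustrative assumptions; a query with zero candidates (such as "100 Main St") simply yields an empty list.

```python
import json


def summarize_candidates(response_text):
    """Hypothetical helper: flatten a response into one tuple per candidate parcel.

    Each tuple is (address, score, x, y, blk_lot, unit_num).
    """
    data = json.loads(response_text)
    rows = []
    for cand in data.get("candidates", []):
        loc = cand["location"]              # x = longitude, y = latitude
        attrs = cand.get("attributes", {})
        for parcel in attrs.get("parcels", []):
            rows.append((cand["address"], cand["score"], loc["x"], loc["y"],
                         parcel["blk_lot"], parcel.get("unit_num")))
    return rows
```

Applied to the 115 Main St sample response, this would produce two rows, one for each of the parcels 3717012 and 3717013.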
Scoring
If everything matches perfectly, you should get a score of 100.
If you misspell the street name by 1 letter the score goes down by 2.
If you leave off the zip code... TO DO.
If you leave off the street suffix, you may get multiple matches and the score goes down by 4.
If you have the wrong street suffix the score goes down by 4.
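The deductions listed above can be illustrated with a small arithmetic sketch. This is a hypothetical model, not the server's scoring code: it assumes the deductions simply add up and that the 2-point penalty applies per misspelled letter. The zip code deduction is not modeled because it is still to be determined.

```python
def sketch_score(misspelled_letters=0, suffix_missing=False, suffix_wrong=False):
    """Hypothetical model of the documented score deductions (not the server's code)."""
    score = 100                      # everything matches perfectly
    score -= 2 * misspelled_letters  # assumption: 2 points per misspelled letter
    if suffix_missing or suffix_wrong:
        score -= 4                   # missing or wrong street suffix
    return score
```

Under these assumptions, one misspelled letter plus a wrong suffix would give 100 - 2 - 4 = 94.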
Results and Performance
Using this example file from Dept. of Public Health, we saw a rate of about 1000 addresses per minute with barely any load on the servers. The results on this same dataset are summarized in this table.
score | percent |
---|---|
88 | 0.02 |
90 | 0.10 |
94 | 6.28 |
96 | 2.89 |
98 | 90.29 |
The example file contains 4,725 rows with 908 unique addresses. 896 of these addresses (98.7%) were resolved. Where multiple addresses are returned, a reasonably good result can be expected from accepting the match with the highest score. Of the 896 addresses that were resolved, 863 (96%) returned a single result, 33 (3%) returned two addresses, and 2 (0.2%) returned three addresses. The result with the highest score was the correct candidate in all cases.
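Accepting the highest-scoring match, as described above, takes only a few lines. This sketch assumes each candidate is a dictionary shaped like the JSON response (with a `score` key); candidates without a score are skipped, and on a tie the first candidate listed wins. The name `best_candidate` is hypothetical.

```python
def best_candidate(candidates):
    """Hypothetical helper: return the highest-scoring candidate, or None if none."""
    scored = [c for c in candidates if c.get("score") is not None]
    if not scored:
        return None
    # max() keeps the first candidate among equal scores
    return max(scored, key=lambda c: c["score"])
```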
Of the remaining 12 addresses that were not resolved:
Two addresses failed due to irretrievable data-entry errors:
625 \ St |
1344 St |
Seven addresses were not in the enterprise address database:
509 MINNA St |
24 WENTWORTH St |
6 St. Louis Alley |
15 ROMOLO Pl |
3101 MISSION St |
752 SHOTWELL St |
2725 VAN NESS Ave |
Five listed extra information or multiple street numbers, and were resolved manually:
Input address | Resolved as |
---|---|
2358 2362 15th St | 2362 15th St |
2855 2857 2859 BUSH St | 2855 Bush St |
3354 3358 cesar chavez St | 3354 Cesar Chavez St |
654 GRANT Ave 2 | 654 Grant Ave |
3148 CESAR CHAVEZ BLDG#14 St | 3148 Cesar Chavez |
In April 2013, a more challenging DPH data set was geocoded against the EAS geocoding service; the lessons learned are written up in Lessons from Geocoding Locations from ED Database.
Client
Here is a working command line client written for Python 2.7.
If you download it and run it with no arguments, you will get the following usage message:
```
usage:
c:\temp\geocoding_client_dph.py
<host ip> <host port>
<input file path> <output file path>
<address column name> [zip code column name]
example command line:
python geocoding_client_dph.py eas.sfgov.org 80 c:/temp/input.csv c:/temp/output.csv Address Zip
```
You must specify the host IP, host port, input file, output file name, name of the field containing addresses, and optionally the name of the field containing zip codes.
For <host IP>, you can run the client against the development (DEV), quality assurance (QA), or production (PROD) server; the production server is the most stable and is recommended.
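The positional arguments above map naturally onto Python's standard `argparse` module. The sketch below is a hypothetical Python 3 re-creation of the command-line interface for illustration only; the real client targets Python 2.7 and may parse its arguments differently. The destination names (`input_path`, `zip_column`, and so on) are assumptions.

```python
import argparse


def make_parser():
    """Hypothetical parser mirroring the client's documented positional arguments."""
    p = argparse.ArgumentParser(prog="geocoding_client_dph.py")
    p.add_argument("host", help="host name or IP of the DEV, QA or PROD server")
    p.add_argument("port", type=int, help="host port, e.g. 80")
    p.add_argument("input_path", help="input CSV file")
    p.add_argument("output_path", help="output CSV file")
    p.add_argument("address_column", help="name of the column holding addresses")
    # The zip code column is optional, so it is the only positional with nargs="?".
    p.add_argument("zip_column", nargs="?", default=None,
                   help="name of the column holding zip codes (optional)")
    return p
```

Parsing the example command line's arguments with `make_parser().parse_args([...])` yields `port == 80` and `zip_column == "Zip"`; leaving off the last argument yields `zip_column is None`.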
Running the client against this input file will produce a file like this. The output file will contain your original data fields, plus 18 additional fields with elements converted from the JSON query result.
As documented above, each address query can return several match candidates and several parcels. The output file will contain fields called match_input_row, match_candidate_row, and match_parcel_row that will help you decode these cases. The part of the number before the decimal point in the detail_row is the row in the input file that the query came from. Additional candidates returned are enumerated in the tenths column after the decimal point. Additional parcels returned are enumerated in the thousandths column. For example, detail_row 237.205 represents the fifth parcel of the second candidate returned for input row 237.
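The detail_row numbering described above can be decoded by treating the value as text rather than as a float, since a value like 237.205 has no exact binary floating-point representation. This sketch assumes that the tenths digit holds the candidate ordinal and that the hundredths and thousandths digits together hold the parcel ordinal (consistent with 237.205 meaning the fifth parcel of the second candidate), which limits it to fewer than 10 candidates and 100 parcels. Both function names are hypothetical.

```python
def encode_detail_row(input_row, candidate, parcel):
    """Hypothetical helper: pack the three ordinals into a detail_row string."""
    return "%d.%d%02d" % (input_row, candidate, parcel)


def decode_detail_row(detail_row):
    """Hypothetical helper: split e.g. "237.205" into (input_row, candidate, parcel)."""
    whole, _, frac = str(detail_row).partition(".")
    frac = frac.ljust(3, "0")  # restore trailing zeros a spreadsheet may have dropped
    return int(whole), int(frac[0]), int(frac[1:3])
```

For example, a spreadsheet that displays 237.210 as 237.21 still decodes to the tenth parcel of the second candidate.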
The first candidate and the first 1-2 parcels are generally, but not always, all that is necessary to resolve the address.
In the future, we would like to include command-line flags with the option to return only the best match.
In the future, we would also like to return only unique parcel matches.
Finally, in the future we would like to eliminate from the results retired parcels that no longer have mapped lot lines. The sample input file returns five map_blk_lots that have been retired (evidenced by the fact that "date_map_drop" is defined) with no alternatives for those addresses.
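Until the client does this itself, the last two items can be approximated on the returned parcel lists. This sketch treats a parcel as retired when `date_map_drop` is not null, as described above, and keeps the first occurrence of each `blk_lot`; using `blk_lot` as the uniqueness key is an assumption, and the function name is hypothetical.

```python
def active_unique_parcels(parcels):
    """Hypothetical helper: drop retired parcels and duplicate blk_lot values, keeping order."""
    seen = set()
    result = []
    for parcel in parcels:
        if parcel.get("date_map_drop") is not None:  # retired: a drop date is defined
            continue
        if parcel["blk_lot"] in seen:                # already kept this parcel
            continue
        seen.add(parcel["blk_lot"])
        result.append(parcel)
    return result
```

Note that for addresses whose only parcels are retired, this returns an empty list, so those rows still need manual review.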
You should be able to feed any standard comma-separated values (CSV) file to this client.
You'll need Python 2.7 or later, which is installed with ESRI ArcGIS Desktop 10.1.
Sometime in the future we hope to provide a web page for a file upload and bulk geocoding; this will move us closer to a self-serve model. For now, this solution is quite workable and will give us a good start on providing the web page. If you don't have the resources to install and run the client on your own, consider dropping us a line - maybe we can process it for you.
If you want to run this client alongside an ESRI product that uses a different version of Python, you'll need to take some special steps; we hope to include a note on how to do that later.
Server
The server code is here and here.
The tests are over here.
Invalid Address List
The following apparent errors in the EAS were discovered in the course of debugging and examining the results of this geocoder:
Address | Error Description |
---|---|
15 ROMOLO Pl | Not in EAS, although the parcel is mapped |
3148 CESAR CHAVEZ BLDG#14 St | Many addresses for one parcel |
752 SHOTWELL St | Not in EAS, although the parcel is mapped |
670 Natoma | Linked to two APNs, both of which are retired |
1443 Clayton St | Linked to many other addresses, probably erroneously |
1411 Mason St | Linked to an adjacent parcel, probably erroneously |