...

In collaboration with the Treasurer/Tax Collector and the Department of Public Health, we have written a RESTful street address geocoding service. This page describes that service. We try to follow the OGC geocoding standard, which is described here.

The service uses EAS addresses. While EAS addresses are bound to the street network, the results will differ significantly from those of a street network geocoding service. For example, if you try to geocode "100 Main St" using this service, you will get zero candidates. That's because (at the time of this writing) there is no "100 Main St" in EAS. Although there is a street segment that supports "100 Main St", there is no building or proposed building that has that address.

Why use EAS geocoding? If you want units or parcels that are associated with an address, this is probably your best bet. EAS addresses are curated and maintained (not mined), and we synchronize the streets and parcels with the Dept. of Public Works on a daily basis. To see what EAS addresses look like, check out the web interface, which is here (internal).

Roadmap

Currently, we are developing a geocoder that will run in ESRI ArcGIS. Eventually, we would like to eliminate the necessity of downloading and running a Python client.

...

Contact              Department
Richard Hagner       Treasurer/Tax Collector
Darrell Ascano       Treasurer/Tax Collector
Stephanie Cowles     Dept. of Public Health
Aksel Olsen          Dept. of Planning

...

The URL is case sensitive.

Here is a simple query for 115 Main St.

A query for 1200 18th will result in 2 candidates.

Your query does not have to include the zip code.

A query for 2655 Hyde returns lots of parcels.

A query for 1000 Pine returns lots of units.

Example Response

...

If you misspell the street name by 1 letter, the score goes down by 2.

If you leave off the zip code... TODO

If you leave off the street suffix, you may get multiple matches and the score goes down by 4.

If you have the wrong street suffix, the score goes down by 4.
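The score rules above can be sketched as simple arithmetic. This is only an illustration, not the service's scoring code: the starting score of 100, the issue names, and the idea that penalties add up when several issues occur together are all assumptions made for the example.

```python
# Hypothetical penalty table built from the rules listed above.
# The issue names are made up for this sketch; they are not service fields.
PENALTIES = {
    "street_name_off_by_one_letter": 2,
    "missing_street_suffix": 4,
    "wrong_street_suffix": 4,
}


def expected_score(issues, perfect_score=100):
    """Return the score expected for a query with the given issues.

    Assumes a perfect match scores 100 and that penalties simply add up;
    neither assumption is confirmed by the service documentation.
    """
    return perfect_score - sum(PENALTIES[issue] for issue in issues)


if __name__ == "__main__":
    print(expected_score(["street_name_off_by_one_letter"]))
    print(expected_score(["missing_street_suffix"]))
```

The zip code case is left out because its effect on the score is still marked TODO.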

Results and Performance

Using this example file from Dept. of Public Health, we saw a rate of about 1000 addresses per minute with barely any load on the servers. The results on this same dataset are summarized in this table.

...

2358 2362 15th St              2362 15th St
2855 2857 2859 BUSH St         2855 Bush St
3354 3358 cesar chavez St      3354 Cesar Chavez St
654 GRANT Ave  2               654 Grant Ave
3148 CESAR CHAVEZ BLDG#14 St   3148 Cesar Chavez

In April 2013, a more challenging DPH data set was geocoded against the EAS geocoding service, which resulted in some Lessons from Geocoding Locations from ED Database.

Client

Here is a working command line client written for Python 2.7.

If you download it and run it with no arguments, you will get the following usage message:

usage:
    c:\temp\geocoding_client_dph.py
    <host ip> <host port>
    <input file path> <output file path>
    <address column name> [zip code column name]
example command line:
    python geocoding_client_dph.py eas.sfgov.org 80 c:/temp/input.csv c:/temp/output.csv Address Zip

 

You must specify the host IP, host port, input file, output file name, name of the field containing addresses, and optionally the name of the field containing zip codes.
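To make the argument order concrete, here is a minimal sketch of how the positional arguments could be parsed with Python's standard argparse module. It is not taken from the actual client; the function name and the attribute names (host, port, input_path, and so on) are hypothetical.

```python
import argparse


def parse_client_args(argv):
    """Parse arguments in the order shown in the usage message.

    Illustrative only: the real client may parse its arguments differently.
    """
    parser = argparse.ArgumentParser(prog="geocoding_client_dph.py")
    parser.add_argument("host")                 # <host ip>
    parser.add_argument("port", type=int)       # <host port>
    parser.add_argument("input_path")           # <input file path>
    parser.add_argument("output_path")          # <output file path>
    parser.add_argument("address_column")       # <address column name>
    # The zip code column is optional, so it defaults to None when omitted.
    parser.add_argument("zip_column", nargs="?", default=None)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_client_args(
        ["eas.sfgov.org", "80", "c:/temp/input.csv", "c:/temp/output.csv", "Address", "Zip"]
    )
    print(args.host, args.port, args.address_column, args.zip_column)
```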

For <host IP>, you can run the client against the development (DEV), quality assurance (QA) or production (PROD) server. The production server is the most stable and is recommended.

Running the client against this input file will produce a file like this. The output file will contain your original data fields, plus 18 additional fields with elements converted from the JSON query result.

As documented above, each address query can return several match candidates and several parcels. The output file will contain fields called match_input_row, match_candidate_row, and match_parcel_row that will help you decode these cases. The part of the number before the decimal point in the detail_row is the row in the input file that the query came from. Additional candidates returned are enumerated in the tenths column after the decimal point. Additional parcels returned are enumerated in the thousandths column. For example, detail_row 237.205 represents the fifth parcel of the second candidate returned for input row 237.
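A detail_row value can be split back into its parts with a few lines of Python. This is a sketch under assumptions rather than code from the client: the function name is hypothetical, the value is assumed to have at most three decimal places, and the hundredths and thousandths digits are read together as the parcel number.

```python
def decode_detail_row(detail_row):
    """Split a detail_row such as "237.205" into (input_row, candidate, parcel).

    Assumptions for this sketch: the tenths digit is the candidate number and
    the remaining two decimal digits are the parcel number.
    """
    text = str(detail_row)
    whole, _, fraction = text.partition(".")
    # Pad with zeros so a value like "237.2" is treated as "237.200".
    fraction = (fraction + "000")[:3]
    input_row = int(whole)
    candidate = int(fraction[0])
    parcel = int(fraction[1:])
    return input_row, candidate, parcel


if __name__ == "__main__":
    print(decode_detail_row("237.205"))
```

Passing the value as a string, as read from the CSV, avoids any floating point rounding before the digits are split.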

...