...

In collaboration with the Treasurer/Tax Collector and the Department of Public Health, we have written a RESTful street address geocoding service. This page describes that service. We try to follow the OGC geocoding standard, which is described here. The service uses EAS addresses.

Although EAS addresses are bound to the street network, the results will differ significantly from those of a street network geocoding service. For example, if you try to geocode "100 Main St" using this service, you will get zero candidates. That's because (at the time of this writing) there is no "100 Main St" in EAS: although a street segment supports "100 Main St", no building or proposed building has that address.

Why use EAS geocoding? If you want the units or parcels that are associated with an address, this is probably your best bet. EAS addresses are curated and maintained (not mined), and we synchronize the streets and parcels with the Dept. of Public Works daily. To see what EAS addresses look like, check out the web interface, which is here (internal).
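Because the service only returns addresses that actually exist in EAS, an empty candidate list is a normal outcome rather than an error. The sketch below shows one way a client might summarize a response, assuming it has already been parsed into a Python dict with the `candidates` key shown in the example response on this page. The helper name `describe_result` is hypothetical.

```python
def describe_result(response):
    """Summarize a parsed EAS geocoding response (hypothetical helper).

    Returns a list of (address, score) pairs, one per candidate.
    An empty list means EAS has no such address, which is expected
    for addresses like "100 Main St" that a street segment supports
    but that no building or proposed building has.
    """
    candidates = response.get("candidates", [])
    return [(c["address"], c["score"]) for c in candidates]


no_match = {"spatialReference": {"wkid": 4326}, "candidates": []}
print(describe_result(no_match))  # -> []
```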

Roadmap

Currently, we are developing a geocoder that will run in ESRI ArcGIS. Eventually, we would like to eliminate the need to download and run a Python client.

...

Contact | Department
Richard Hagner | Treasurer/Tax Collector
Darrell Ascano | Treasurer/Tax Collector
Stephanie Cowles | Dept. of Public Health
Aksel Olsen | Dept. of Planning

...

The URL is case sensitive.

Here is a simple query for 115 Main St.

A query for 1200 18th returns two candidates.

Your query does not have to include the zip code.

A query for 2655 Hyde returns many parcels.

A query for 1000 Pine returns many units.
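The exact query URLs live in the links above and are not reproduced on this page, so the sketch below treats them as unknowns. It shows how a client might assemble a query with an optional zip code while leaving the endpoint's case untouched. The endpoint path `/geocode/findAddressCandidates` and the parameter names `Address` and `Zip` are placeholders of ours, not the service's documented interface; only the host `eas.sfgov.org` comes from this page.

```python
from urllib.parse import urlencode


def build_query_url(endpoint, address, zip_code=None):
    """Build a GET URL for an address query (sketch).

    `endpoint` and the parameter names "Address" and "Zip" are
    placeholders; substitute the real values from the service links.
    The URL is case sensitive, so the endpoint is used exactly as given.
    """
    params = {"Address": address}
    if zip_code:  # the zip code is optional
        params["Zip"] = zip_code
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint path, for illustration only.
url = build_query_url("http://eas.sfgov.org/geocode/findAddressCandidates", "115 Main St")
print(url)
```

The resulting URL can be fetched with `urllib.request.urlopen` and the body parsed with `json.loads`.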

Example Response

...

{
    "spatialReference": {
        "wkid": 4326
    },
    "candidates": [
        {
            "address": "115 MAIN ST",
            "location": {
                "y": 37.79152826999239,
                "x": -122.39400753158013
            },
            "score": 100,
            "attributes": {
                "base_address_num": 115,
                "base_address_suffix": null,
                "street_name": "MAIN",
                "street_type": "ST",
                "zipcode": "94105",
                "parcels": [
                    {
                        "map_blk_lot": "3717012",
                        "blk_lot": "3717012",
                        "unit_num": null,
                        "date_map_add": "1998-07-01",
                        "date_map_drop": null,
                        "address_base_flg": true
                    },
                    {
                        "map_blk_lot": "3717013",
                        "blk_lot": "3717013",
                        "unit_num": null,
                        "date_map_add": "1998-07-01",
                        "date_map_drop": null,
                        "address_base_flg": true
                    }
                ],
                "units": []
            }
        }
    ]
}
Field Definitions
  • spatialReference - defines the coordinate system for the returned coordinates. WKID 4326 refers to the geographic coordinate system (GCS) using the WGS 1984 datum.
  • candidates - a list of matched address(es) and their attributes
    • address - concatenated street address
    • location - a point within the parcel, not the centroid of the parcel
      • x - longitude
      • y - latitude
    • score - see below
    • attributes
      • base_address_num
      • base_address_suffix
      • street_name
      • street_type
      • zipcode
      • parcels
        • map_blk_lot - the "base parcel" identifier
        • blk_lot - the parcel identifier, also known as the assessor parcel number (APN)
        • date_map_add - the date that the parcel was added to the city map by Public Works
        • date_map_drop - the date that the parcel was retired from the city map by Public Works
        • unit_num - unit number associated with this parcel, if present
        • address_base_flg
      • units - a list of the units for that address
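Putting the field definitions together, a client usually wants the coordinates and parcel numbers of the highest-scoring candidate. The sketch below parses a trimmed copy of the example response and returns them. The function name `best_candidate` and the choice to take the maximum score are ours, not part of the service.

```python
import json

# Trimmed copy of the example response above.
SAMPLE = """
{"spatialReference": {"wkid": 4326},
 "candidates": [
  {"address": "115 MAIN ST",
   "location": {"y": 37.79152826999239, "x": -122.39400753158013},
   "score": 100,
   "attributes": {
     "zipcode": "94105",
     "parcels": [
       {"map_blk_lot": "3717012", "blk_lot": "3717012", "date_map_drop": null},
       {"map_blk_lot": "3717013", "blk_lot": "3717013", "date_map_drop": null}],
     "units": []}}]}
"""


def best_candidate(response):
    """Return (address, latitude, longitude, APNs) for the top-scoring
    candidate, or None when there are no candidates."""
    candidates = response.get("candidates", [])
    if not candidates:
        return None
    top = max(candidates, key=lambda c: c["score"])
    loc = top["location"]
    apns = [p["blk_lot"] for p in top["attributes"]["parcels"]]
    # x is longitude and y is latitude (WKID 4326, WGS 1984).
    return top["address"], loc["y"], loc["x"], apns


print(best_candidate(json.loads(SAMPLE)))
```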

...

If you misspell the street name by one letter, the score goes down by 2 (to 98 for an otherwise exact match).

If you leave off the zip code... TO DO.

If you leave off the street suffix, you may get multiple matches, and the score goes down by 4.

If you use the wrong street suffix, the score goes down by 4.
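Because each kind of problem above lowers the score by a fixed amount, a client can pick a cut-off score. The sketch below assumes a cut-off of 96 (100 minus 4), which is our choice and not a service recommendation; it keeps a match with one misspelled letter or a missing or wrong suffix and drops anything lower.

```python
def acceptable(candidates, min_score=96):
    """Return candidates scoring at least min_score, best first.

    The default of 96 (100 - 4) is a hypothetical cut-off: it keeps a
    match with one misspelled letter (-2) or with a missing or wrong
    street suffix (-4), and drops anything scored lower.
    """
    kept = [c for c in candidates if c["score"] >= min_score]
    return sorted(kept, key=lambda c: c["score"], reverse=True)


candidates = [
    {"address": "A", "score": 90},
    {"address": "B", "score": 100},
    {"address": "C", "score": 98},
]
print([c["address"] for c in acceptable(candidates)])  # -> ['B', 'C']
```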

Results and Performance

Using this example file from the Dept. of Public Health, we saw a rate of about 1000 addresses per minute with barely any load on the servers. The results on this same dataset are summarized in this table.

...

2358 2362 15th St | 2362 15th St
2855 2857 2859 BUSH St | 2855 Bush St
3354 3358 cesar chavez St | 3354 Cesar Chavez St
654 GRANT Ave | 2654 Grant Ave
3148 CESAR CHAVEZ BLDG#14 St | 3148 Cesar Chavez

 

In April 2013, a more challenging DPH data set was geocoded against the EAS geocoding service; the results are written up in Lessons from Geocoding Locations from ED Database.

Client

Here is a working command-line client written for Python 2.7.

If you download it and run it with no arguments, you will get the following usage message:

usage:
    c:\temp\geocoding_client_dph.py
    <host ip> <host port>
    <input file path> <output file path>
    <address column name> [zip code column name]
example command line:
    python geocoding_client_dph.py eas.sfgov.org 80 c:/temp/input.csv c:/temp/output.csv Address Zip

 

You must specify the host IP address (or host name, as in the example), host port, input file, output file name, and the name of the field containing addresses; the name of the field containing zip codes is optional.
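For readers writing their own client, the argument order above maps naturally onto Python's `argparse`. This is an illustrative equivalent, not the actual implementation of `geocoding_client_dph.py` (which targets Python 2.7); the function name `parse_args` and the attribute names are ours.

```python
import argparse


def parse_args(argv=None):
    """Parse arguments in the same order as the client's usage message.

    Illustrative argparse equivalent only, not the real client's code.
    """
    parser = argparse.ArgumentParser(description="Batch geocode a CSV file against EAS.")
    parser.add_argument("host", help="host IP address or name, e.g. eas.sfgov.org")
    parser.add_argument("port", type=int, help="host port, e.g. 80")
    parser.add_argument("input_path", help="input CSV file")
    parser.add_argument("output_path", help="output CSV file")
    parser.add_argument("address_column", help="name of the column holding addresses")
    parser.add_argument("zip_column", nargs="?", default=None,
                        help="optional name of the column holding zip codes")
    return parser.parse_args(argv)


args = parse_args(["eas.sfgov.org", "80", "c:/temp/input.csv",
                   "c:/temp/output.csv", "Address", "Zip"])
print(args.host, args.port, args.zip_column)  # -> eas.sfgov.org 80 Zip
```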

For <host IP>, you can run the client against the development (DEV), quality assurance (QA), or production (PROD) server; the production server is the most stable and is recommended.

Running the client against this input file will produce a file like this. The output file will contain your original data fields, plus 18 additional fields with elements converted from the JSON query result.

As documented above, each address query can return several match candidates and several parcels. The output file will contain fields called match_input_row, match_candidate_row, and match_parcel_row that will help you decode these cases. The part of the detail_row number before the decimal point is the row in the input file that the query came from. Additional candidates are enumerated in the tenths place after the decimal point, and additional parcels in the thousandths place. For example, detail_row 237.205 represents the fifth parcel of the second candidate returned for input row 237.
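If you only have the combined detail_row value, it can be split back into its parts. The sketch below works on the string form to avoid floating-point rounding, and it assumes the candidate number occupies the tenths digit and the parcel number the hundredths and thousandths digits, which agrees with the 237.205 example but is our reading rather than a documented format. The function name `decode_detail_row` is hypothetical.

```python
def decode_detail_row(detail_row):
    """Split a detail_row value such as "237.205" into
    (input row, candidate number, parcel number).

    Assumes the candidate number is the tenths digit and the parcel
    number fills the hundredths and thousandths digits. Parsing the
    string form avoids float rounding surprises.
    """
    whole, _, frac = str(detail_row).partition(".")
    frac = frac.ljust(3, "0")  # "237.2" -> "200"
    return int(whole), int(frac[0]), int(frac[1:3])


print(decode_detail_row("237.205"))  # -> (237, 2, 5)
```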

The first candidate and the first 1-2 parcels are generally, but not always, all that is necessary to resolve the address.

In the future we would like to include a command-line flag to return only the best match.

In the future we would also like to return only unique parcel matches.

Finally, in the future we would like to remove retired parcels that no longer have mapped lot lines from the results. The sample input file returns five map_blk_lots that have been retired (as shown by a non-null "date_map_drop") with no alternatives for those addresses.
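Until those options exist, both clean-ups can be done on the client side. The sketch below drops parcels whose `date_map_drop` is set and removes repeated `blk_lot` values from one candidate's parcel list. The function name `active_unique_parcels` is hypothetical, and the parcel values in the example are made up.

```python
def active_unique_parcels(parcels):
    """Drop retired parcels and duplicate APNs from a candidate's
    parcel list (client-side sketch of the planned behavior).

    A parcel counts as retired when date_map_drop is set;
    duplicates are detected by blk_lot, keeping the first one seen.
    """
    seen = set()
    result = []
    for parcel in parcels:
        if parcel.get("date_map_drop"):  # retired from the city map
            continue
        if parcel["blk_lot"] in seen:  # already kept this APN
            continue
        seen.add(parcel["blk_lot"])
        result.append(parcel)
    return result


# Made-up parcel values, for illustration only.
parcels = [
    {"blk_lot": "0001001", "date_map_drop": None},
    {"blk_lot": "0001001", "date_map_drop": None},
    {"blk_lot": "0001002", "date_map_drop": "2005-01-01"},
]
print([p["blk_lot"] for p in active_unique_parcels(parcels)])  # -> ['0001001']
```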

You should be able to feed any standard comma separated value (csv) file to this client.
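Reading such a file takes only the standard `csv` module. The sketch below yields each row together with its address and optional zip code, keeping every original field so it can be written back out next to the results. The function name `read_queries` and the sample rows are hypothetical.

```python
import csv
import io


def read_queries(csv_file, address_column, zip_column=None):
    """Yield (row, address, zip_code) for each row of a CSV file.

    `row` is a dict holding every original field, so it can be written
    back out next to the geocoding results. zip_code is None when no
    zip column was named or the cell is blank.
    """
    for row in csv.DictReader(csv_file):
        address = row[address_column].strip()
        zip_code = None
        if zip_column:
            zip_code = (row.get(zip_column) or "").strip() or None
        yield row, address, zip_code


sample = io.StringIO("Name,Address,Zip\nClinic A,115 Main St,94105\nClinic B,1000 Pine,\n")
for row, address, zip_code in read_queries(sample, "Address", "Zip"):
    print(row["Name"], "|", address, "|", zip_code)
```

For a file on disk, open it with `open(path, newline="")`, as the Python 3 `csv` documentation recommends.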

You'll need Python 2.7 or later, and you'll need the simplejson Python library, which is installed with ESRI ArcGIS Desktop 10.1.
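If you adapt the client yourself and simplejson is not available, a common Python idiom is to fall back to the standard library `json` module, which provides the same basic `loads`/`dumps` calls. This is a general pattern; this page does not say whether the provided client already does this.

```python
# Prefer simplejson when it is installed (for example, alongside
# ArcGIS Desktop 10.1); otherwise fall back to the standard library
# json module, which provides the same basic loads/dumps calls.
try:
    import simplejson as json
except ImportError:
    import json

result = json.loads('{"candidates": []}')
print(result["candidates"])  # -> []
```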

Sometime in the future we hope to provide a web page for file upload and bulk geocoding; this will move us closer to a self-serve model. For now, this solution is quite workable and gives us a good start toward that web page. If you don't have the resources to install and run the client yourself, consider dropping us a line; maybe we can process your file for you.

...