Introduction

In collaboration with the Treasurer/Tax Collector, we are making a street address validation service available. This page describes the service. This is the first version, and we are waiting for the Treasurer's comments.

Here is the DEV URL with an example address of "100 Main St":

http://10.250.60.189/validate/streetAddress/100%20main%20St/

Here are validation links to 10 random addresses from the Treasurer's database:

1501 HOWARD ST
PMB 169 3701 SACRAMENTO ST
1560 DAVIDSON AVE
331 CORTLAND AVE
2000 FILLMORE ST
1017 DIVISADERO ST
1618 UNION ST
2250 PINE ST
559 PACIFIC AVE
157 NOE ST
1000 18TH
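
The validation links follow the pattern of the DEV URL shown earlier: the address is percent-encoded and appended as the last path segment. A minimal sketch of how such a URL could be built, assuming the DEV host stays as written on this page; the helper name build_validation_url is hypothetical and not part of the service:

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2.7

# DEV base URL as given on this page; QA/PROD hosts would differ.
BASE_URL = "http://10.250.60.189/validate/streetAddress/"


def build_validation_url(address, base_url=BASE_URL):
    """Percent-encode a street address and append it as the final path segment.

    Hypothetical helper for illustration only.
    """
    # safe="" also encodes "/" so an address such as "100 1/2 MAIN ST"
    # cannot be mistaken for extra path segments.
    return base_url + quote(address.strip(), safe="") + "/"


for address in ["100 main St", "1501 HOWARD ST", "PMB 169 3701 SACRAMENTO ST"]:
    print(build_validation_url(address))
```

The URL is case sensitive, so the fixed path portion is kept exactly as documented and only the address segment is encoded.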

In collaboration with the Treasurer/Tax Collector and the Department of Public Health, we have written a RESTful street address geocoding service. This page describes that service. We try to follow the OGC geocoding standard, which is described here.

The service uses EAS addresses. Although EAS addresses are bound to the street network, the results will differ significantly from those of a street network geocoding service. For example, if you try to geocode "100 Main St" using this service, you will get zero candidates. That's because (at the time of this writing) there is no "100 Main St" in EAS: although there is a street segment that supports "100 Main St", there is no building or proposed building with that address.

Why use EAS geocoding? If you want the units or parcels that are associated with an address, this is probably your best bet. EAS addresses are curated and maintained (not mined), and we synchronize the streets and parcels with the Dept. of Public Works on a daily basis. To see what EAS addresses look like, check out the web interface, which is here (internal).

Roadmap

Currently, we are developing a geocoder that will run in ESRI ArcGIS. Eventually, we would like to eliminate the necessity of downloading and running a Python client.

User Contacts

Contact            Department
Richard Hagner     Treasurer/Tax Collector
Darrell Ascano     Treasurer/Tax Collector
-----------------  Dept. of Public Health
Aksel Olsen        Dept. of Planning

Example URLs

NOTE: all the links in this section point to an internal server that is not accessible from outside the city (sorry).

The URL is case sensitive.

Here is a simple query for 115 Main St.

A query for 1200 18th will result in 2 candidates.

Your query does not have to include the zip code.

A query for 2655 Hyde returns lots of parcels.

A query for 1000 Pine returns lots of units.

Example Response

We currently support JSON; see just below for an example. The basic structure follows the OGC geocoding standard, which ESRI helped specify. As you can infer from the sample below, EAS provides unit-level and parcel-level information. We currently exclude retired entities (base address, unit address, parcel) because that's what 95% of our users need. To see what's available, either contact us or take a look at the data model.

{
    "spatialReference": {
        "wkid": 4326
    },
    "candidates": [
        {
            "address": "115 MAIN ST",
            "location": {
                "y": 37.79152826999239,
                "x": -122.39400753158013
            },
            "score": 100,
            "attributes": {
                "base_address_num": 115,
                "base_address_suffix": null,
                "street_name": "MAIN",
                "street_type": "ST",
                "zipcode": "94105",
                "parcels": [
                    {
                        "map_blk_lot": "3717012",
                        "blk_lot": "3717012",
                        "unit_num": null,
                        "date_map_add": "1998-07-01",
                        "date_map_drop": null,
                        "address_base_flg": true
                    },
                    {
                        "map_blk_lot": "3717013",
                        "blk_lot": "3717013",
                        "unit_num": null,
                        "date_map_add": "1998-07-01",
                        "date_map_drop": null,
                        "address_base_flg": true
                    }
                ],
                "units": []
            }
        }
    ]
}
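
A response like the one above can be flattened into one row per candidate and parcel, which is convenient for spreadsheets. The following is a minimal sketch rather than the shipped client's logic: it works on a trimmed copy of the sample response, and the function name flatten_candidates and the output keys are assumptions made for illustration.

```python
import json

# Trimmed copy of the sample response shown above.
SAMPLE_RESPONSE = """
{
  "spatialReference": {"wkid": 4326},
  "candidates": [
    {
      "address": "115 MAIN ST",
      "location": {"y": 37.79152826999239, "x": -122.39400753158013},
      "score": 100,
      "attributes": {
        "zipcode": "94105",
        "parcels": [
          {"blk_lot": "3717012", "date_map_drop": null},
          {"blk_lot": "3717013", "date_map_drop": null}
        ],
        "units": []
      }
    }
  ]
}
"""


def flatten_candidates(response):
    """Return one flat dict per (candidate, parcel) pair.

    Hypothetical helper; the key names in the output are illustrative.
    """
    rows = []
    for candidate in response.get("candidates", []):
        attributes = candidate.get("attributes", {})
        # A candidate without parcels still yields one row, with blk_lot=None.
        parcels = attributes.get("parcels") or [{}]
        for parcel in parcels:
            rows.append({
                "address": candidate["address"],
                "longitude": candidate["location"]["x"],
                "latitude": candidate["location"]["y"],
                "score": candidate["score"],
                "zipcode": attributes.get("zipcode"),
                "blk_lot": parcel.get("blk_lot"),
            })
    return rows


rows = flatten_candidates(json.loads(SAMPLE_RESPONSE))
for row in rows:
    print("{address} {blk_lot} {longitude} {latitude}".format(**row))
```

Note that x is the longitude and y is the latitude, so the pair has to be swapped for tools that expect (latitude, longitude).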
Field Definitions
  • spatialReference - defines the coordinate system for the returned coordinates. WKID 4326 refers to the geographic coordinate system (GCS) using datum WGS 1984.
  • candidates - a list of matched address(es) and their attributes
    • address - concatenated street address
    • location - point within the parcel, not the centroid of the parcel
      • x - longitude
      • y - latitude
    • score - see below
    • attributes
      • base_address_num
      • base_address_suffix
      • street_name
      • street_type
      • zipcode
      • parcels
        • map_blk_lot - the "base parcel" identifier
        • blk_lot - the parcel identifier, aka assessor parcel number or APN
        • date_map_add - the date that the parcel was added to the city map by public works
        • date_map_drop - the date that the parcel was retired from the city map by public works
        • unit_num - unit number associated with this parcel, if present
        • address_base_flg
      • units - a list of the units for that address
Scoring

If everything matches perfectly, you should get a score of 100.

If you misspell the street name by 1 letter the score goes down by 2.

If you leave off the zip code... TO DO.

If you leave off the street suffix, you may get multiple matches and the score goes down by 4.

If you have the wrong street suffix the score goes down by 4.
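
The deductions listed above can be written out as a small lookup, which may help when interpreting scores in bulk results. This is only an illustrative sketch and not the server's scoring code: the issue names are invented, the zip code rule is omitted because it is still to be documented, and treating the deductions as additive is an assumption.

```python
# Deductions as documented in this section; the key names are hypothetical.
PENALTIES = {
    "street_name_one_letter_typo": 2,
    "missing_street_suffix": 4,
    "wrong_street_suffix": 4,
}


def expected_score(issues):
    """Estimate a score by subtracting each documented deduction from 100.

    Assumes deductions simply add up, which this page does not confirm.
    """
    return 100 - sum(PENALTIES[issue] for issue in issues)


print(expected_score([]))
print(expected_score(["street_name_one_letter_typo"]))
print(expected_score(["street_name_one_letter_typo", "missing_street_suffix"]))
```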

Results and Performance

Using this example file from Dept. of Public Health, we saw a rate of about 1000 addresses per minute with barely any load on the servers. The results on this same dataset are summarized in this table.

score  percent
88      0.02
90      0.10
94      6.28
96      2.89
98     90.29

The example file contains 4,725 rows with 908 unique addresses. 896 of these addresses (98.7%) were resolved. Where multiple candidates are returned, accepting the match with the highest score can be expected to give a reasonably good result. Of the 896 addresses that were resolved, 863 (96%) returned a single result, 33 (3.7%) returned two addresses, and 2 (0.2%) returned three addresses. The result with the highest score was the correct candidate in all cases.
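
Accepting the highest-scoring match can be sketched in a few lines. The function name best_candidate is hypothetical, and the two-candidate response below is made-up data that only mimics the structure of the service's JSON.

```python
def best_candidate(response):
    """Return the candidate with the highest score, or None if there are none.

    Hypothetical helper; on ties, max() keeps the first candidate encountered.
    """
    candidates = response.get("candidates", [])
    if not candidates:
        return None
    return max(candidates, key=lambda candidate: candidate["score"])


# Made-up response for illustration; the addresses and scores are not real results.
example_response = {
    "candidates": [
        {"address": "1200 18TH AVE", "score": 94},
        {"address": "1200 18TH ST", "score": 96},
    ]
}

print(best_candidate(example_response)["address"])
```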

Of the remaining 12 addresses that were not resolved:

Two addresses failed due to irretrievable data-entry errors:

625 \ St
1344  St

Seven addresses were not in the enterprise address database:

509 MINNA St
24 WENTWORTH St
6 St. Louis Alley
15 ROMOLO Pl
3101 MISSION St
752 SHOTWELL St
2725 VAN NESS Ave

Five listed extra information or multiple street numbers, and were resolved manually:

Input address                   Resolved address
2358 2362 15th St               2362 15th St
2855 2857 2859 BUSH St          2855 Bush St
3354 3358 cesar chavez St       3354 Cesar Chavez St
654 GRANT Ave  2                654 Grant Ave
3148 CESAR CHAVEZ BLDG#14 St    3148 Cesar Chavez

 

In April 2013, a more challenging DPH data set was geocoded against the EAS geocoding service, which resulted in some Lessons from Geocoding Locations from ED Database.

Client

Here is a working command line client written for Python 2.7.

If you download it and run it with no arguments, you will get the following usage message:

usage:
    c:\temp\geocoding_client_dph.py
    <host ip> <host port>
    <input file path> <output file path>
    <address column name> [zip code column name]
example command line:
    python geocoding_client_dph.py eas.sfgov.org 80 c:/temp/input.csv c:/temp/output.csv Address Zip

 

You must specify the host IP, host port, input file, output file name, name of the field containing addresses, and optionally the name of the field containing zip codes.

For <host IP>, you can run the client against the development (DEV), quality assurance (QA), or production (PROD) server. The production server is the most stable and is recommended.

Running the client against this input file will produce a file like this. The output file will contain your original data fields, plus 18 additional fields with elements converted from the JSON query result.

As documented above, each address query can return several match candidates and several parcels. The output file will contain fields called match_input_row, match_candidate_row, and match_parcel_row that will help you decode these cases. The part of the number before the decimal point in the detail_row is the row in the input file that the query came from. Additional candidates returned are enumerated in the tenths column after the decimal point. Additional parcels returned are enumerated in the thousandths column. For example, detail_row 237.205 represents the fifth parcel of the second candidate returned for input row 237.
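
The detail_row encoding described above can be decoded as follows. This is a sketch under stated assumptions: the function name decode_detail_row is hypothetical, the value is assumed to carry exactly three decimal places, and the parcel number is assumed to occupy the hundredths and thousandths digits together (so "05" reads as parcel 5). The value is parsed as decimal text to avoid floating-point rounding.

```python
from decimal import Decimal


def decode_detail_row(detail_row):
    """Split a detail_row such as 237.205 into (input_row, candidate, parcel).

    Hypothetical helper; assumes three decimal places, with the candidate in
    the tenths digit and the parcel in the remaining two digits.
    """
    # Format through Decimal so 237.2 becomes "237.200" without float artifacts.
    text = "{0:.3f}".format(Decimal(str(detail_row)))
    whole, fraction = text.split(".")
    return int(whole), int(fraction[0]), int(fraction[1:])


input_row, candidate, parcel = decode_detail_row("237.205")
print("input row {0}, candidate {1}, parcel {2}".format(input_row, candidate, parcel))
```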

The first candidate and the first 1-2 parcels are generally, but not always, all that is necessary to resolve the address.

In future we would like to include command-line flags with the option to only return the best match.

In future we would also like to return only unique parcel matches.

Finally, in future we would like to eliminate retired parcels that no longer have mapped lot lines from the results. The sample input file returns five map_blk_lots that have been retired (evidenced by the fact that "date_map_drop" is defined) with no alternatives for those addresses.
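
Until the client handles this itself, retired and duplicate parcels can be filtered out after the fact. A minimal sketch, assuming parcel records shaped like those in the JSON response; the function names are hypothetical and the retired record in the example is made up.

```python
def active_parcels(parcels):
    """Keep only parcels that have not been retired (date_map_drop is unset)."""
    return [parcel for parcel in parcels if parcel.get("date_map_drop") is None]


def unique_parcels(parcels):
    """Drop repeated parcels, keeping the first occurrence of each blk_lot."""
    seen = set()
    unique = []
    for parcel in parcels:
        key = parcel.get("blk_lot")
        if key not in seen:
            seen.add(key)
            unique.append(parcel)
    return unique


# Illustrative data: the first two records repeat a parcel from the sample
# response; the third is an invented retired parcel.
example_parcels = [
    {"blk_lot": "3717012", "date_map_drop": None},
    {"blk_lot": "3717012", "date_map_drop": None},
    {"blk_lot": "0000001", "date_map_drop": "2005-01-01"},
]

kept = unique_parcels(active_parcels(example_parcels))
print([parcel["blk_lot"] for parcel in kept])
```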

You should be able to feed any standard comma separated value (csv) file to this client.

You'll need Python 2.7 or later, which is installed with ESRI ArcGIS Desktop 10.1.

Sometime in the future we hope to provide a web page for a file upload and bulk geocoding; this will move us closer to a self-serve model. For now, this solution is quite workable and will give us a good start on providing the web page. If you don't have the resources to install and run the client on your own, consider dropping us a line - maybe we can process it for you.

If you want to run this client alongside an ESRI product that uses a different version of Python, you'll need to take some special steps. We hope to include a note on how to do that later.

Server

The server code is here and here.

The tests are over here.

Invalid Address List

The following apparent errors in the EAS were discovered in the course of debugging and examining the results of this geocoder:

Address                         Error Description
15 ROMOLO Pl                    Not in EAS, although the parcel is mapped
3148 CESAR CHAVEZ BLDG#14 St    Many addresses for one parcel
752 SHOTWELL St                 Not in EAS, although the parcel is mapped
670 Natoma                      Linked to two APNs, both of which are retired
1443 Clayton St                 Linked to many other addresses, probably erroneously
1411 Mason St                   Linked to an adjacent parcel, probably erroneously
