The Bulk Loader can load data into the EAS from either a shapefile or comma-separated value (CSV) file, and is run on the ETL server as a job.py
job. Currently, the ETL server is running the Windows operating system.
Ensure that your shapefile has a defined projection before attempting to load it into EAS
Shapefiles must be named bulkload.shp
and placed in the following directory on the ETL server:
C:\apps\eas_automation\app_data\data\bulkload_shapefile
Input shapefiles require the following schema:
field name | data type | length |
---|---|---|
OBJECTID | integer | 9 |
block | string | 80 |
lot | string | 80 |
unit | string | 80 |
unit_sfx | string | 80 |
st_num | integer | 4 |
st_num_sfx | string | 20 |
st_name | string | 32 |
st_suffix* | string | 10 |
st_type | string | 4 |
source | string | 32 |
st_prefix | string | 10 |
address | block | lot | unit | unit_sfx | st_num | st_num_sfx | st_name | st_suffix* | st_type | st_prefix |
---|---|---|---|---|---|---|---|---|---|---|
1 SOUTH VAN NESS AVE | 3506 | 001 | 1 | VAN NESS | AVE | SOUTH | ||||
20 25TH AVE NORTH | 1301 | 018 | 20 | 25TH | NORTH | AVE | ||||
127 A CENTRAL AVE | 1233 | 007 | 127 | A | CENTRAL | AVE | ||||
5900 03RD ST 2217 | 5431B | 093 | 2217 | 5900 | 03RD | ST | ||||
100 BROADWAY | 0141 | 003B | 100 | BROADWAY | ||||||
200 MAIN ST (APARTMENT) 2D | 3739 | 008 | 2 | D | 200 | MAIN | ST |
CSV files must be named bulkload_data.csv
and placed in the following directory on the ETL server:
C:\apps\eas_automation\app_data\data
Input CSV files require the following schema:
field name | data type | length |
---|---|---|
block | string | 256 |
lot | string | 256 |
unit | string | 256 |
unit_sfx | string | 256 |
street_number | string | 256 |
street_number_sfx | string | 256 |
street_name | string | 256 |
street_sfx* | string | 256 |
street_type | string | 256 |
load_source | string | 32 |
address | block | lot | unit | unit_sfx | street_number | street_number_sfx | street_name | street_sfx* | street_type |
---|---|---|---|---|---|---|---|---|---|
1 SOUTH VAN NESS AVE | 3506 | 001 | 1 | SOUTH VAN NESS | AVE | ||||
20 25TH AVE NORTH | 1301 | 018 | 20 | 25TH | NORTH | AVE | |||
127 A CENTRAL AVE | 1233 | 007 | 127 | A | CENTRAL | AVE | |||
5900 03RD ST 2217 | 5431B | 093 | 2217 | 5900 | 03RD | ST | |||
100 BROADWAY | 0141 | 003B | 100 | BROADWAY | |||||
200 MAIN ST (APARTMENT) 2D | 3739 | 008 | 2 | D | 200 | MAIN | ST |
* Use this field for streets that have a Street Name Post Directional, for example the "North" in "25th Ave North".
The Bulk Loader will produce an output table (bulkloader.address_extract
) that contains, among other information, the disposition of each input address record that was submitted to the Bulk Loader. The output table may already have records in it for any of the following reasons:
If you do not want these existing records to be considered in your upcoming bulk-load attempt, then these records should be deleted prior to running your bulk-load by following these steps:
bulkloader.address_extract
).Verify that the output table is not empty.
SELECT COUNT(*) AS record_count FROM bulkloader.address_extract; |
If the output table is not empty (record_count > 0
), then truncate the output table and vacuum the database.
TRUNCATE bulkloader.address_extract; VACUUM FULL ANALYZE; |
Verify that the output table is now empty (record_count = 0
).
SELECT COUNT(*) AS record_count FROM bulkloader.address_extract; |
The Bulk Loader is run on the ETL server, and depending on the number of address records that are going to be processed, the Bulk Loader will complete its work at any time from a few minutes, to a few hours. The following table contains some information for some typical runs.
environment | type | records | stage | bulk load |
---|---|---|---|---|
SF_DEV | shapefile | 558 | 7-9 seconds | 30-130 seconds |
SF_DEV | shapefile | 49,980 | 25 seconds | 5 minutes |
Move to the directory that contains the file job.py
.
cd C:\apps\eas_automation\automation\src |
Stage the input data.
python job.py --job stage_bulkload_shapefile --env [Your Environment] --action EXECUTE |
python job.py --job stage_bulkload_csv --env [Your Environment] --action EXECUTE |
Run the bulkload
job.
python job.py --job bulkload --env [Your Environment] --action EXECUTE |
Evaluate the results of the bulk-load by opening the database and inspecting the output table bulkloader.address_extract
for values in the field exception_text
Addresses that were successfully bulk-loaded will have a date in the field load_tms
and a NULL value in the field exception_text
.
-- Count records successfully bulk-loaded: SELECT COUNT(*) FROM bulkloader.address_extract WHERE NOT(load_tms IS NULL) AND (exception_text IS NULL); -- View records successfully bulk-loaded: SELECT * FROM bulkloader.address_extract WHERE NOT(load_tms IS NULL) AND (exception_text IS NULL); |
The presence of exception text indicates a problem with the source data.
-- Count records with exception: SELECT COUNT(*) FROM bulkloader.address_extract WHERE NOT(exception_text IS NULL); -- View records with exception: SELECT * FROM bulkloader.address_extract WHERE NOT(exception_text IS NULL); |
bulkloader.address_extract
. Instead, correct the corresponding records in the source data and then re-run the Bulk Loader.If block and lot values were included in the input data, a value should be found in the field address_x_parcel_id
. Run the query below to find any records that had block and lot input values but did not get assigned a value for address_x_parcel_id.
-- Count records where block and lot were provided but no address_x_parcel was assigned: SELECT COUNT(*) FROM bulkloader.address_extract WHERE (NOT (block IS NULL) OR NOT (block IS NULL)) AND address_x_parcel_id IS NULL; |
Base addresses that were submitted by the Bulk Loader but were already in the EAS are indicated by values in the fields address_base_id
and load_tms
, and NULL values in the fields street_segment
, zone_id
or exception_text
.
-- Count base addresses that were already in the EAS: SELECT COUNT(*) FROM bulkloader.address_extract WHERE NOT(address_base_id IS NULL) AND NOT(load_tms IS NULL) AND (street_segment_id IS NULL) AND (zone_id IS NULL) AND (exception_text IS NULL); -- View base addresses that were already in the EAS: SELECT * FROM bulkloader.address_extract WHERE NOT(address_base_id IS NULL) AND NOT(load_tms IS NULL) AND (street_segment_id IS NULL) AND (zone_id IS NULL) AND (exception_text IS NULL); |
Base addresses that were submitted by the Bulk Loader and were not already in the EAS are indicated by values in the street_segment_id
field.
-- Count base addresses that were not already in the EAS: SELECT COUNT(*) FROM bulkloader.address_extract WHERE NOT (street_segment_id IS NULL) -- View base addresses that were not already in the EAS: SELECT * FROM bulkloader.address_extract WHERE NOT (street_segment_id IS NULL) |
Evaluate records with non-NULL address_id
values to count/view addresses that were added to the EAS.
-- Count addresses added to the EAS: SELECT COUNT(*) FROM bulkloader.address_extract WHERE NOT (address_id IS NULL) -- View addresses added to the EAS: SELECT * FROM bulkloader.address_extract WHERE NOT (address_id IS NULL) |
Evaluate tables in the public
schema to cross-check record counts from the bulkloader
schema evaluation.
Find the change_request_id
of the most recent bulk load.
-- Get the change_request_id of the most recent bulk load: SELECT change_request_id FROM public.change_requests WHERE requestor_comment LIKE 'bulk load change request' ORDER BY change_request_id DESC LIMIT 1; -- Make note of this id for the queries in the subsequent steps. |
Count and view new base addresses associated with the change request.
-- Count new base addresses SELECT COUNT (*) FROM public.addresses WHERE activate_change_request_id = change_request_id_goes_here AND unit_num IS NULL -- View new base addresses SELECT * FROM public.addresses WHERE activate_change_request_id = change_request_id_goes_here AND unit_num IS NULL. |
Count and view new unit addresses associated with the change request.
--- Count new unit addresses SELECT COUNT (*) FROM public.addresses WHERE activate_change_request_id = change_request_id_goes_here AND NOT unit_num IS NULL -- View new unit addresses SELECT * FROM public.addresses WHERE activate_change_request_id = change_request_id_goes_here AND NOT unit_num IS NULL |
Count and view all new addresses associated with the change request.
-- Count all new addresses SELECT COUNT (*) FROM public.addresses WHERE activate_change_request_id = change_request_id_goes_here -- View all new addresses SELECT COUNT (*) FROM public.addresses WHERE activate_change_request_id = change_request_id_goes_here |
After successful loading and evaluating large batches of addresses, clean up the database by truncating the Bulk Loader output table (bulkloader.address_extract
) and vacuuming the database.
TRUNCATE bulkloader.address_extract; VACUUM FULL ANALYZE; |