Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Change title to make it distinct from the new 'Bulk Loader Process' page.



Section

Table of Contents

Prepare the Input Data

The Bulk Loader can load data into the EAS from either a shapefile or comma-separated value (CSV) file, and is run on the ETL server as a job.py job.  Currently, the ETL server is running the Windows operating system.

Shapefile

Ensure that your shapefile has a defined projection of NAD 1983 StatePlane California III FIPS 0403 Feet (EPSG 102643) before attempting to load it into EAS.

...

field namedata typelength
OBJECTIDinteger9
blockstring80
lotstring80
unitstring80
unit_sfxstring80
st_numinteger4
st_num_sfxstring20
st_namestring32
st_suffix*string10
st_typestring4
sourcestring32
st_prefixstring

10

Examples

addressblocklotunitunit_sfxst_numst_num_sfxst_namest_suffix*st_typest_prefix
1 SOUTH VAN NESS AVE
3506001

1
VAN NESS

AVESOUTH
20 25TH AVE NORTH1301018

20
25THNORTHAVE

127 A CENTRAL AVE

1233

007



127ACENTRAL
AVE
5900 03RD ST 22175431B0932217
5900
03RD
ST
100 BROADWAY0141003B

100
BROADWAY


200 MAIN ST (APARTMENT) 2D37390082D200
MAIN
ST

Comma-separated Value (CSV) file

CSV files must be named bulkload_data.csv and placed in the following directory on the ETL server:

...

field namedata typelength
blockstring256
lotstring256
unitstring256
unit_sfxstring256
street_numberstring

256

street_number_sfxstring256
street_namestring256
street_sfx*string256
street_typestring256
load_sourcestring32

Examples

addressblocklotunitunit_sfxstreet_numberstreet_number_sfxstreet_namestreet_sfx*street_type
1 SOUTH VAN NESS AVE
3506001

1
SOUTH VAN NESS

AVE
20 25TH AVE NORTH1301018

20
25THNORTHAVE

127 A CENTRAL AVE

1233

007



127ACENTRAL
AVE
5900 03RD ST 22175431B0932217
5900
03RD
ST
100 BROADWAY0141003B

100
BROADWAY

200 MAIN ST (APARTMENT) 2D37390082D200
MAIN
ST

...

* Use this field for streets that have a Street Name Post Directional, for example the "North" in "25th Ave North".

Prepare the Output Table

The Bulk Loader will produce an output table (bulkloader.address_extract) that contains, among other information, the disposition of each input address record that was submitted to the Bulk Loader.  The output table may already have records in it for any of the following reasons:

...

  1. Establish your SSH tunnel to the target EAS database server.
  2. Invoke pgAdmin and navigate to the target database, and then the output table (bulkloader.address_extract).
  3. Right-click the output table and invoke the Query Tool.
  4. Remove leftover records from the previous Bulk Loader iteration.

    Code Block
    languagesql
    TRUNCATE bulkloader.address_extract, bulkloader.blocks_nearest;


  5. Run garbage collection to free disk space held by truncated tuples.

    Code Block
    languagesql
    VACUUM FULL ANALYZE bulkloader.address_extract;
    VACUUM FULL ANALYZE bulkloader.blocks_nearest;


  6. Optional: count all records in the database for use in a before/after record count comparison.

    Code Block
    languagesql
    SELECT schemaname,relname,n_live_tup FROM pg_stat_user_tables ORDER BY schemaname,relname,n_live_tup;


Run the Bulk Loader

The Bulk Loader is run on the ETL server, and depending on the number of address records that are going to be processed, the Bulk Loader will complete its work at any time from a few minutes, to a few hours.  The following table contains some information for some typical runs.

...

  1. Remote desktop (RDP) into the ETL server.
  2. Move to the directory that contains the file job.py.

    Code Block
    languagetext
    cd C:\apps\eas_automation\automation\src


  3. Stage the input data.

    Code Block
    languagetext
    titleShapefile Input Format
    python job.py --job stage_bulkload_shapefile --env [Your<environment> Environment] --action EXECUTE


    Code Block
    languagetext
    titleCSV Input Format
    python job.py --job stage_bulkload_csv --env [Your<environment> Environment] --action EXECUTE


  4. Run the bulkload job.

    Code Block
    languagetext
    python job.py --job bulkload --env [Your Environment]<environment> --action EXECUTE


Evaluate the Results

Anchor
evaluate_bulkloader_schema
evaluate_bulkloader_schema
Evaluate 'bulkloader' schema

Evaluate the results of the bulk-load by opening the database and inspecting the output table bulkloader.address_extract for values in the field exception_text

  1. Addresses that were successfully bulk-loaded will have a date in the field load_tms and a NULL value in the field exception_text.

    Code Block
    languagesql
    -- Count records successfully bulk-loaded:
    SELECT COUNT(*)
    FROM bulkloader.address_extract
    WHERE NOT(load_tms IS NULL) AND (exception_text IS NULL);
    
    -- View records successfully bulk-loaded:
    SELECT *
    FROM bulkloader.address_extract
    WHERE NOT(load_tms IS NULL) AND (exception_text IS NULL);


  2. The presence of exception text indicates a problem with the source data.

    Code Block
    languagesql
    -- Count records with exception:
    SELECT COUNT(*)
    FROM bulkloader.address_extract
    WHERE NOT(exception_text IS NULL);
    
    -- View records with exception:
    SELECT *
    FROM bulkloader.address_extract
    WHERE NOT(exception_text IS NULL);
    
    -- View totals for each distinct 'exception_text' value
    SELECT exception_text, Count(*) 
    FROM bulkloader.address_extract
    GROUP BY exception_text
    ORDER BY exception_text;


  3. To rectify bulk-load exceptions, do not correct any exception records in the output table bulkloader.address_extract. Instead, correct the corresponding records in the source data and then re-run the Bulk Loader.
  4. If block and lot values were included in the input data, a value should be found in the field address_x_parcel_id. Run the query below to find any records that had block and lot input values but did not get assigned a value for address_x_parcel_id.

    Code Block
    languagesql
    -- Count records where block and lot were provided but no address_x_parcel was assigned:
    SELECT COUNT(*) FROM bulkloader.address_extract
    WHERE (NOT (block IS NULL) OR NOT (block IS NULL))
    AND address_x_parcel_id IS NULL;


  5. Base addresses that were submitted by the Bulk Loader but were already in the EAS are indicated by values in the fields address_base_id and load_tms, and NULL values in the fields std_unit, street_segment, zone_id, address_id and exception_text.

    Code Block
    languagesql
    -- Count base addresses that were already in the EAS:
    SELECT COUNT(*)
    FROM bulkloader.address_extract
    WHERE NOT(address_base_id IS NULL)
          AND NOT(load_tms IS NULL)
          AND (std_unit IS NULL)
          AND (street_segment_id IS NULL)
          AND (zone_id IS NULL)
          AND (address_id IS NULL)
          AND (exception_text IS NULL);
    
    -- View base addresses that were already in the EAS:
    SELECT *
    FROM bulkloader.address_extract
    WHERE NOT(address_base_id IS NULL)
          AND NOT(load_tms IS NULL)
          AND (std_unit IS NULL)
          AND (street_segment_id IS NULL)
          AND (zone_id IS NULL)
          AND (address_id IS NULL)
          AND (exception_text IS NULL);


  6. Count and view new base addresses by looking for non-NULL values in the street_segment_id field. Cross-check with the 'public' schema by counting new 'public.address_base' records as shown below in Evaluate 'public' schema.

    Code Block
    languagesql
    -- Count base addresses that were not already in the EAS:
    SELECT COUNT(*)
    FROM bulkloader.address_extract
    WHERE NOT (street_segment_id IS NULL);
    
    -- View base addresses that were not already in the EAS:
    SELECT *
    FROM bulkloader.address_extract
    WHERE NOT (street_segment_id IS NULL);


  7. Evaluate records with non-NULL address_id values to count/view unit addresses that were added to the EAS.

    Code Block
    languagesql
    -- Count unit addresses added to the EAS:
    SELECT COUNT(*)
    FROM bulkloader.address_extract
    WHERE NOT (address_id IS NULL);
    
    -- View unit addresses added to the EAS:
    SELECT *
    FROM bulkloader.address_extract
    WHERE NOT (address_id IS NULL);


Anchor
evaluate_public_schema
evaluate_public_schema
Evaluate 'public' schema

Evaluate tables in the public schema to cross-check record counts from the bulkloader schema evaluation.

  1. Find the change_request_id of the most recent bulk load.

    Code Block
    languagesql
    -- Get the change_request_id of the most recent bulk load:
    SELECT change_request_id
    FROM public.change_requests
    WHERE requestor_comment LIKE 'bulk load change request'
    ORDER BY change_request_id DESC
    LIMIT 1;
    
    -- Make note of this id for the queries in the subsequent steps.


  2. Count and view new base addresses associated with the change request.

    Code Block
    languagesql
    -- Count new base addresses
    SELECT COUNT (*)
    FROM public.addresses
    WHERE activate_change_request_id = change_request_id_goes_here
    AND unit_num IS NULL;
    
    -- View new base addresses
    SELECT *
    FROM public.addresses
    WHERE activate_change_request_id = change_request_id_goes_here
    AND unit_num IS NULL;


  3. Count and view new unit addresses associated with the change request.

    Code Block
    languagesql
    --- Count new unit addresses
    SELECT COUNT (*)
    FROM public.addresses
    WHERE activate_change_request_id = change_request_id_goes_here
    AND NOT unit_num IS NULL;
    
    -- View new unit addresses
    SELECT *
    FROM public.addresses
    WHERE activate_change_request_id = change_request_id_goes_here
    AND NOT unit_num IS NULL;


  4. Count and view all new 'public.addresses' records associated with the change request.

    Code Block
    languagesql
    -- Count all new addresses
    SELECT COUNT (*)
    FROM public.addresses
    WHERE activate_change_request_id = change_request_id_goes_here;
    
    -- View all new addresses
    SELECT *
    FROM public.addresses
    WHERE activate_change_request_id = change_request_id_goes_here;


  5. Count and view new 'public.address_base' records added to the EAS. Cross-check with the 'bulkloader' schema as shown above in Evaluate 'bulkloader' schema.

    Code Block
    languagesql
    -- Count new 'public.address_base' records
    SELECT COUNT(*) FROM public.address_base, public.addresses
    WHERE public.address_base.address_base_id = public.addresses.address_base_id
    AND public.addresses.address_base_flg = TRUE
    AND public.addresses.activate_change_request_id = change_request_id_goes_here;
    
    -- View new 'public.address_base' records
    SELECT activate_change_request_id, address_id, public.address_base.*
    FROM public.address_base, public.addresses
    WHERE public.address_base.address_base_id = public.addresses.address_base_id
    AND public.addresses.address_base_flg = TRUE
    AND public.addresses.activate_change_request_id = change_request_id_goes_here;


  6. Optional: count all records in the database for use in a before/after record count comparison.

    Code Block
    languagesql
    SELECT schemaname,relname,n_live_tup FROM pg_stat_user_tables ORDER BY schemaname,relname,n_live_tup;


Free space after large Bulk Loader batches

After successful loading and evaluating large batches of addresses, clean up the database by truncating and vacuuming the the Bulk Loader output tables.

...