Production Readiness

Risks and Mitigation

Here we enumerate the risks, and detail how the risk will be mitigated.

General Security Measures

Until we are better staffed to administer a public-facing site, we shall allow only traffic from the city and county to access the site and web services. By adopting this simple and effective measure, we can sidestep a number of other issues such as DOS and SQL injection attacks.
TODO - Will, can I ask you to do this work? I recall that we can use a subnet mask or equivalent.
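
As a rough illustration of the subnet idea, the check below tests whether a client address falls inside a list of allowed networks. This is only a sketch: the two CIDR ranges are placeholder documentation ranges, not the real city and county networks, and the name is_allowed is hypothetical. In practice the restriction would more likely be applied at the firewall or web server than in application code.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), NOT the real
# city/county networks. Replace with the actual subnets.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_allowed(client_ip):
    """Return True if client_ip falls inside any allowed network."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        # Anything that is not a valid IP address is rejected.
        return False
    return any(address in network for network in ALLOWED_NETWORKS)
```

A request handler would call is_allowed with the remote address and refuse the request when it returns False.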

Hardware Failure

Hardware failures are handled by applogic. As long as we do not use too many resources on a given grid, the hardware failover is automatic.
TODO - Will, is there a way we can be notified when there is a hardware failure? Do we need to do anything here?

Denial of Service Attacks

We will not be taking any measures to detect or to mitigate a DOS attack.

SQL Injection Attacks

The application code uses a framework to escape all user input, which effectively dispenses with this problem.
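
To show why handing user input to the database layer as data (rather than building SQL strings by hand) defuses injection, here is a minimal sketch. It uses Python's built-in sqlite3 module purely for illustration, not the application's actual framework or database, and the addresses table and find_addresses function are hypothetical.

```python
import sqlite3


def find_addresses(conn, street_name):
    # The ? placeholder makes the driver bind street_name as a value,
    # so it is never parsed as part of the SQL statement.
    cursor = conn.execute(
        "SELECT id, street FROM addresses WHERE street = ?",
        (street_name,),
    )
    return cursor.fetchall()


# Hypothetical in-memory table used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (id INTEGER PRIMARY KEY, street TEXT)")
conn.execute("INSERT INTO addresses (street) VALUES (?)", ("Market St",))
```

A classic injection string such as x' OR '1'='1 is treated as an ordinary (non-matching) street name and returns no rows.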

User Credential Security

Should we/can we enforce strong passwords?
TODO - Paul, check and see if we can do this.
Should we/can we force password changes?
TODO - Paul, check and see if we can do this.
Each year we will conduct a user account audit and disable or remove all accounts that are not in use.
DEFER

System Administration Credential Security

Administration credentials include:

ETL Processing

windows user account
ssh keys

Web Application

applogic grid root
root (or sudo) on each Linux VM
tomcat
geoserver
postgres
web application ("superuser" - Django)
apache httpd ("nobody"?)

We shall review all of the administration credentials, and do the following:

ensure that default passwords are not used
ensure that passwords are strong
TODO - Paul
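
The two checks above could be sketched roughly as follows. Everything specific here is an assumption for illustration: the list of default passwords, the 12-character minimum, and the three-character-class rule are not policies stated in this document.

```python
import string

# Hypothetical examples of vendor defaults; a real list would come from
# each product's documentation.
DEFAULT_PASSWORDS = {"admin", "password", "postgres", "tomcat", "geoserver", "changeme"}
MIN_LENGTH = 12  # assumed threshold, not a stated policy


def password_problems(password):
    """Return a list of reasons the password fails the review (empty if none)."""
    problems = []
    if password.lower() in DEFAULT_PASSWORDS:
        problems.append("default password")
    if len(password) < MIN_LENGTH:
        problems.append("too short")
    classes = [
        string.ascii_lowercase,
        string.ascii_uppercase,
        string.digits,
        string.punctuation,
    ]
    # Count how many character classes appear at least once.
    used = sum(any(ch in cls for ch in password) for cls in classes)
    if used < 3:
        problems.append("fewer than 3 character classes")
    return problems
```

An empty result means the credential passed all three checks; anything else should be changed before go-live.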

Viruses

We will not be scanning for viruses.

Data Center Connectivity Lost Indefinitely

With each version release we will back up the entire application to a secondary data center. We will make database backups every 24 hours and copy them to the secondary data center. Should we lose the primary data center, we will bring up the application using the most recent database backup. Performance of the application for the first day will be sluggish because the map cache will be empty or out of date. It will take overnight to reseed the map cache.

TODO - add job to completely reseed the map cache (Paul)

Database Integrity

We are using the Postgres database, which is known for stability and robustness.
In any event, every 24 hours we

run consistency checks to ensure integrity
run a full vacuum/analyze to guarantee good performance
backup the database
copy the backup to offsite grid
report any issues to application support

TODO - Paul with help from DBA and Will
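
A minimal sketch of how part of the nightly job might be scripted, assuming the standard PostgreSQL command line tools vacuumdb and pg_dump are on the path. The function names, backup file naming, and failure-collection approach are hypothetical. The consistency check and the offsite copy are left out because this document does not yet name the tools for them.

```python
import subprocess
from datetime import date


def nightly_commands(db_name, backup_dir, today):
    """Build the vacuum/analyze and backup commands for one night."""
    backup_file = f"{backup_dir}/{db_name}-{today.isoformat()}.dump"
    return [
        ["vacuumdb", "--full", "--analyze", db_name],
        ["pg_dump", "--format=custom", f"--file={backup_file}", db_name],
    ]


def run_and_collect_failures(commands, runner=subprocess.run):
    """Run each command in order and return (program, error) for any that fail."""
    failures = []
    for command in commands:
        try:
            runner(command, check=True)
        except (subprocess.CalledProcessError, OSError) as error:
            failures.append((command[0], str(error)))
    return failures
```

Any entries in the returned list are what would be reported to application support.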

Deployment practices

The Enterprise Addressing System includes a data server, a map server and a web server. Application deployments are managed using standard practices with 3 separate environments (DEV, QA, PROD). Changes of any sort are first tested in the DEV environment. If the tests pass, we apply the changes to QA where business users conduct testing. Only after the business users approve the changes do we release any changes to PROD. This includes everything from OS upgrades through to our own application code, minor and major.

Since all application resources are hosted "in the cloud", all deployment and ETL activities in PROD shall be conducted through SSH tunnels. Under the DEV and QA environments we shall occasionally allow non-SSH access for convenience.

For various legacy-oriented reasons, we were unable to employ standard practices for the extract, transform, and load (ETL) processes. While this process has been coded to support DEV, QA, and PROD environments, none of the participating systems have more than a single environment. (Is this correct?) This includes each of the data servers (SFGIS, DPW, ASR). The workstation that executes the ETL is virtualized but is not backed up and has no failover plan in place. (Is this correct?)
TODO - Jeff, please do what you can to bolt this down. Add more details here.

Known FME issues include the lack of FME License Manager failover and FME's use of a shrink-wrapped, non-production database. These and other FME issues are documented on Citypedia.

The Department of Technology has a Change Control system in place to advise and vet proposed changes to production systems. Change Controls will be created prior to the release of any changes to the Enterprise Addressing System production system, per departmental policy.

Development Practices

The software development team uses version control (Subversion), bug tracking (Jira), and wiki collaboration (Confluence), all of which are hosted by Atlassian. All source code, DDL, DML, design documents, etc. are stored on Atlassian. When a version is released, the repository is tagged. When bug fixes are made to production, a branch is created in the repository.

Disk Space

Every 24 hours we

check that there are no disk space issues
notify application support if there are any issues
TODO - Will, can I ask you to look at this? Paul can help.
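
A rough sketch of the daily disk space check using Python's standard shutil.disk_usage. The 90% threshold and the function name disk_issues are assumptions; the paths to check and the notification mechanism are still to be decided.

```python
import shutil

THRESHOLD = 0.90  # assumed: flag a filesystem at 90% full or more


def disk_issues(paths, threshold=THRESHOLD, usage=shutil.disk_usage):
    """Return a human-readable message for each path at or above the threshold."""
    issues = []
    for path in paths:
        stats = usage(path)
        if stats.total == 0:
            # Skip pseudo-filesystems that report no capacity.
            continue
        fraction = stats.used / stats.total
        if fraction >= threshold:
            issues.append(f"{path}: {fraction:.0%} used")
    return issues
```

A scheduled job would call disk_issues with the mount points of interest and send any returned messages to application support.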

Failover

Before we go into production and once every year, we will conduct a failover exercise to ensure that we are able to provide business continuity.
TODO - Paul

Recovery Time Objective

Without MAD, DBI will not be able to issue permits. (Is this correct?) The recovery time objective for the application is 2 hours. Hema, is this acceptable?

Recovery Point Objective

The recovery point objective for the database is 24 hours. Hema, is this acceptable? I am assuming that, given our limited resources, we will use database backups for recovery, and that we do not want to use replication. If we were to use replication, we could have a recovery point objective right down to a single transaction. But we have no expertise here and so I recommend against this approach.
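
To make the 24 hour objective concrete: with backup-based recovery, the objective is met only while the newest backup is no more than 24 hours old. A minimal sketch of that check follows; the function names are hypothetical, and how backup timestamps are collected is left open.

```python
from datetime import datetime, timedelta

RPO = timedelta(hours=24)  # recovery point objective from the text above


def latest_backup(backup_times):
    """Return the most recent backup time, or None if there are no backups."""
    return max(backup_times) if backup_times else None


def meets_rpo(backup_times, now):
    """True if the newest backup is recent enough to satisfy the RPO."""
    latest = latest_backup(backup_times)
    return latest is not None and now - latest <= RPO
```

Such a check could run alongside the nightly backup job and report to application support when it returns False.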

Administration Roles

We need some wiki pages and some people to fill these roles, as detailed below.

user administration

TODO - Hema, can DBI help with this?

linux system admin

SFGIS will do this
TODO - Paul - wiki pages

postgres db admin

SFGIS will do this
We should have a real DBA sign on.
TODO - Paul - wiki pages

production support