Risks

  • hardware failure
  • denial of service attacks
  • SQL injection attacks
  • credential security
  • viruses
  • data center is physically destroyed
  • data center connectivity is lost
  • database is corrupted
  • production support
  • key person risk
  • deployment practices
  • development practices

Mitigation of Risks

The following sections detail each risk, whether it will be addressed, and how it will be mitigated.

...

General Security Measures

Until we are better staffed to administer a public-facing site, we shall allow only traffic from the city and county networks to access the site and web services. Will, can I ask you to do this work?
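
The restriction itself will most likely live in the firewall or web server configuration, but as an illustration of the idea, a check in application code could look like the following sketch; the CIDR blocks are placeholders, not the real city and county ranges.

    # Illustrative only: the real restriction would live in the firewall or
    # web server configuration. The CIDR blocks below are placeholders, not
    # the actual city/county network ranges.
    import ipaddress

    ALLOWED_NETWORKS = [
        ipaddress.ip_network("203.0.113.0/24"),   # placeholder: city network
        ipaddress.ip_network("198.51.100.0/24"),  # placeholder: county network
    ]

    def is_allowed(client_ip: str) -> bool:
        """Return True if the client address falls inside an allowed network."""
        addr = ipaddress.ip_address(client_ip)
        return any(addr in net for net in ALLOWED_NETWORKS)

    if __name__ == "__main__":
        print(is_allowed("203.0.113.42"))   # True  (inside the placeholder city range)
        print(is_allowed("192.0.2.10"))     # False (outside both ranges)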

...

Hardware Failure

...

Hardware failures are handled by AppLogic. As long as we do not use too many resources on a given grid, hardware failover is automatic. Will, is there a way to be notified when there is a hardware failure?
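
Independent of whatever alerting AppLogic itself provides (the open question above), a simple external heartbeat check could email application support when the site stops answering. This is only a sketch; the host name and email addresses are placeholders.

    # A minimal external heartbeat check, independent of AppLogic's own
    # alerting. Host names and e-mail addresses are placeholders. Run from
    # cron on a machine outside the grid.
    import smtplib
    import socket
    from email.message import EmailMessage

    SITE = ("mad.example.org", 80)          # placeholder host and port
    SUPPORT = "app-support@example.org"     # placeholder address

    def site_is_up(host, port, timeout=5):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def notify(subject, body):
        msg = EmailMessage()
        msg["From"] = "monitor@example.org"  # placeholder sender
        msg["To"] = SUPPORT
        msg["Subject"] = subject
        msg.set_content(body)
        with smtplib.SMTP("localhost") as smtp:   # assumes a local mail relay
            smtp.send_message(msg)

    if __name__ == "__main__":
        if not site_is_up(*SITE):
            notify("MAD site unreachable", "Heartbeat check failed for %s:%d" % SITE)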

Denial of Service Attacks

...

The application code uses a framework that escapes all user input, which effectively eliminates this class of attack.
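
The framework already handles this for us, but for illustration, here is what the same principle looks like at the database layer, assuming psycopg2 as the Postgres driver; table and column names are made up.

    # Illustration of the same principle at the database layer: never
    # interpolate user input into SQL; let the driver bind parameters.
    # Table and column names are hypothetical; psycopg2 is assumed.
    import psycopg2

    def find_permits(conn, applicant_name):
        with conn.cursor() as cur:
            # The %s placeholder is bound by the driver, so a value such as
            # "'; DROP TABLE permits; --" is treated as plain data, not SQL.
            cur.execute(
                "SELECT permit_id, status FROM permits WHERE applicant = %s",
                (applicant_name,),
            )
            return cur.fetchall()

    if __name__ == "__main__":
        conn = psycopg2.connect("dbname=mad")  # placeholder connection string
        print(find_permits(conn, "O'Brien"))   # quotes are handled safely
        conn.close()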

...

User Credential Security

...

Should we/can we enforce strong passwords?
Should we/can we force periodic password changes?
Each year we will conduct a user account audit and disable or remove all accounts that are not in use (a sketch of this audit follows).
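
As a sketch of how the yearly audit could be scripted, assuming application accounts live in a database table with a last-login column; the table and column names are made up and would need to match the real schema.

    # Sketch of the yearly account audit. Table and column names are
    # hypothetical; adjust to the real schema.
    import psycopg2

    AUDIT_SQL = """
        UPDATE app_users
           SET enabled = FALSE
         WHERE last_login < now() - interval '1 year'
           AND enabled
        RETURNING username, last_login
    """

    def disable_stale_accounts(conn):
        with conn.cursor() as cur:
            cur.execute(AUDIT_SQL)
            disabled = cur.fetchall()
        conn.commit()
        return disabled

    if __name__ == "__main__":
        conn = psycopg2.connect("dbname=mad")  # placeholder connection string
        for username, last_login in disable_stale_accounts(conn):
            print("disabled %s (last login %s)" % (username, last_login))
        conn.close()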

...

System Administration Credential Security

...

We shall review all of the administration credentials and do the following (a sample check is sketched after this list):

  • ensure that default passwords are not used
  • ensure that passwords are strong
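
A sample check for the review above, assuming we keep a simple inventory of service accounts to test; the default-password list and minimum length are illustrative only.

    # Sketch of the credential review. The inventory format, the
    # default-password list, and the minimum length are assumptions.
    KNOWN_DEFAULTS = {"admin", "password", "postgres", "changeme", ""}
    MIN_LENGTH = 12

    def review(inventory):
        """Yield (service, problem) pairs for credentials that fail the policy."""
        for service, password in inventory:
            if password.lower() in KNOWN_DEFAULTS:
                yield service, "default or empty password"
            elif len(password) < MIN_LENGTH:
                yield service, "password shorter than %d characters" % MIN_LENGTH

    if __name__ == "__main__":
        sample = [("postgres superuser", "postgres"), ("map server admin", "s3cr3t")]
        for service, problem in review(sample):
            print("%s: %s" % (service, problem))
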
Viruses

We will not be scanning for viruses.

...

Data Center Is Physically Destroyed

...

todo

...

Production Support

...

todo

...

Key Person Risk

...

todo

...

Data Center Connectivity Lost Indefinitely

...

With each version release we will back up the entire application to a secondary data center. We will make database backups every 24 hours and copy them to the secondary data center. Should we lose the primary data center, we will bring up the application using the most recent database backup. Performance of the application for the first day will be sluggish because the map cache will be empty or out of date; reseeding the map cache will take overnight.
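
A sketch of the daily backup-and-copy job, assuming pg_dump and scp are available and the secondary data center host is reachable over SSH; host names, paths, and the database name are placeholders.

    # Sketch of the daily backup-and-copy job. Host names, paths, and the
    # database name are placeholders.
    import datetime
    import subprocess

    DB_NAME = "mad"
    BACKUP_DIR = "/var/backups/mad"
    SECONDARY = "backup@secondary-dc.example.org:/var/backups/mad/"

    def backup_and_copy():
        stamp = datetime.date.today().isoformat()
        dump_file = "%s/%s-%s.dump" % (BACKUP_DIR, DB_NAME, stamp)
        # Custom-format dump; pg_dump reads every row, so it also acts as a
        # basic integrity check -- it fails if a page cannot be read.
        subprocess.run(["pg_dump", "-Fc", "-f", dump_file, DB_NAME], check=True)
        subprocess.run(["scp", dump_file, SECONDARY], check=True)

    if __name__ == "__main__":
        backup_and_copy()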

...

Database Integrity

...

We are using the Postgres database, which is known for stability and robustness.
In any event, every 24 hours we do the following (see the sketch after this list):

  • run consistency checks to ensure integrity
  • run a full vacuum/analyze to guarantee good performance
  • back up the database
  • copy the backup to an offsite grid
  • report any issues to application support
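
A sketch of the nightly vacuum/analyze step and how failures could be reported, assuming psycopg2 as the driver; the backup-and-copy step itself is sketched under the connectivity-loss section above, and the connection string is a placeholder.

    # Sketch of the nightly maintenance step, assuming psycopg2 as the driver.
    import psycopg2

    def nightly_maintenance(dsn="dbname=mad"):   # placeholder connection string
        problems = []
        conn = psycopg2.connect(dsn)
        conn.autocommit = True      # VACUUM cannot run inside a transaction block
        try:
            with conn.cursor() as cur:
                # VACUUM FULL reclaims more space but takes exclusive locks;
                # plain VACUUM ANALYZE is the lighter-weight nightly option.
                cur.execute("VACUUM ANALYZE")
        except psycopg2.Error as exc:
            problems.append("VACUUM ANALYZE failed: %s" % exc)
        finally:
            conn.close()
        return problems

    if __name__ == "__main__":
        for problem in nightly_maintenance():
            print("report to application support:", problem)
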
Deployment Practices

The MAD application includes a data server, a map server, and a web server. Application deployments are managed using standard practices with three separate environments (DEV, QA, PROD). Changes of any sort are first tested in the DEV environment. If the tests pass, we apply the changes to QA, where business users conduct testing. Only after the business users approve the changes do we release them to PROD. This includes everything from OS upgrades through to our own application code changes, minor and major.

...

The software development team uses version control (Subversion), bug tracking (Jira), and wiki collaboration (Confluence), all of which are hosted by Atlassian. All source code, DDL, DML, design documents, etc. are stored with Atlassian. When a version is released, the repository is tagged. When bug fixes are made to production, a branch is created in the repository.

...

Disk Space

...

Every 24 hours we do the following (a monitoring sketch follows this list):

  • check that there are no disk space issues
  • notify application support if there are any issues

Will, can I ask you to look at this?
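
A sketch of the daily check, using only the Python standard library; the mount points and free-space threshold are placeholders, and the alert would be wired into whatever notification channel application support actually uses.

    # Sketch of the daily disk space check. Mount points and the free-space
    # threshold are placeholders.
    import shutil

    MOUNT_POINTS = ["/", "/var/lib/postgresql", "/var/backups"]  # placeholders
    MIN_FREE_FRACTION = 0.15   # alert when less than 15% of the disk is free

    def check_disk_space():
        alerts = []
        for mount in MOUNT_POINTS:
            usage = shutil.disk_usage(mount)
            free_fraction = usage.free / usage.total
            if free_fraction < MIN_FREE_FRACTION:
                alerts.append("%s is %.0f%% full" % (mount, 100 * (1 - free_fraction)))
        return alerts

    if __name__ == "__main__":
        for alert in check_disk_space():
            print("notify application support:", alert)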

Recovery Time Objective

Without MAD, DBI will not be able to issue permits. (Is this correct?)
The recovery time objective for the application is 2 hours. Hema, is this acceptable?

Recovery Point Objective

The recovery point objective for the database is 24 hours.

In some cases, such as a data center failure, the map cache that we fail over to will be unseeded. The cache can be reseeded overnight, but the responsiveness of the entire application will be slow until the cache is reseeded.

Access to the database is more important than access to the cached map data.

Upon a data center failover, we will have to reseed the cache.

Can we assume that we are not going to use replication?

If no replication, how much work can we lose? (1 day? 4 hours? 2 hours?)

Hema, is this acceptable? I am assuming that, given our limited resources, we will use database backups for recovery, and that we do not want to use replication. If we were to use replication, we could have a recovery point objective right down to a single transaction. But we have no expertise here, so I recommend against this approach.

Administration Roles

  • user administration
  • linux system admin
  • postgres db admin
  • production support

...

user administration

...

todo

...

linux system admin

...

todo

...

postgres db admin

...

todo

...

production support

...

todo