PostgreSQL Installation
Repository Setup
If you blindly type "yum install postgresql-devel", you may have trouble getting everything installed properly.
Use the postgresql repositories instead of the Centos repsositories.
Here is how you do that.
Pre-Installation
Before you install, you should make sure you have a clean slate.
First, look for other postgres installs. I found that my centos 5 comes with postgres 8.1 already installed.
I suggest that we remove these older versions before we install the newer ones.
If you really need both versions to work side by side - this wiki page won't help you.
yum list postgres* yum uninstall postgresql yum uninstall postgresql_XXX yum uninstall postgresql_YYY
If you installed from source you can go the the build directory and type
make uninstall
Installation
While we developed EAS under postgresql 8.3.7, installing 8.3.7 was giving us trouble. 8.3.11 was available via YUM and once we used YUM the install went smoothly.Using YUM is the way to go.
I followed these instructions:
Once you have YUM configured to use only the postgres respositories and not the Centos repos (see above), the following steps provide a pain free installation.
yum install postgresql yum install postgresql-devel yum install postgresql-server yum install postgis yum install postgresql-contrib
Additional Links
Initializing Database
Our Oracle trained DBAs decided to intialize the DB here
/ur01/pgsql/data
Then we want to create 2 databases.
createdb mad --owner postgres createdb sfmaps --owner postgres
At this point, you can use pg_restore.
For more on that read
http://eas.googlecode.com/svn/trunk/releases/1_0_prod/README
Links
- http://www.postgresql.org/docs/8.3/static/creating-cluster.html
- http://www.postgresql.org/docs/8.3/static/runtime-config-file-locations.html
- http://archives.postgresql.org/pgsql-admin/2007-05/msg00104.php
Performance Tuning
The EAS databases (mad, sfmaps) need to support 3 sorts of activities
- relatively high I/O because it's geospatial
- light OLTP actvity from single django web application
- most connections from single django web application (no connection pooling)
- small number of simultaneous connections (estimate max of 50)
- heavy write activity during overnight ETL processes
- heavy read processes from geoserver to produce web maps, especially during cache seeding
The host VMs for the DB have different amounts of RAM as shown in the table below.
Here I discuss the settings in the postgresql.conf file.
The default settings are for low resource environments and will result in relatively poor performance.
These settings are my "best guess" based on some reading (see links below).
We'll need to do some monitoring in PROD to tune these values.
parameter | value | value | value | links (below) |
---|---|---|---|---|
VM GB RAM | 4 | 6 | 6 |
|
shared_buffers | 1GB | 1.5GB | 1.5GB | 1,2 |
work_mem | 16MB | 16MB | 16MB | 2 |
maintenance_work_mem | 128MB | 128MB | 128MB | 2 |
wal_buffers | 1MB | 1MB | 1MB | 2 |
checkpoint_segments | 6 | 6 | 6 | 2 |
random_page_cost | 2 | 2 | 2 | 2 |
seq_page_cost | 1 | 1 | 1 | 2 |
effective_cache_size | 2GB | 3GB | 3GB | 1 |
max_connections | 100 | 100 | 100 | default |
A restart is needed for these configurations to take affect.
Links
1 - http://www.westnet.com/~gsmith/content/postgresql/pg-5minute.htm
2 - http://workshops.opengeo.org/postgis-intro/tuning.html
3 - http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
4 - http://postgis.refractions.net/docs/ch06.html
Security - Quick Start
This is my best guess on security and users - feel free to recommend alternatives.
For the time being, I have modified
- hba.conf
- postgresql.conf.
so as to allow the postgres user to connect to any database from any of
- localhost
- geoserver
- web server
This needs to be substantially refined for production.
The following section should address this in complete detail.
Security - Production
The dataserver should be accessible only from the following:
- web server
- geo server
- localhost
We will use postgres's hba.conf to control this access.
Remember that each environment (DEV, QA, PROD) will have it's own trio of severs:
- web server
- geoserver
- dataserver
See This diagram for an example.
Get the IPs for the hba.conf from Henry or Paul.
We want to be as strict as is reasonable - not as strict as possible.
Database Accounts
I know we'll want at least 2 users
- postgres
- eas_user
Let's discuss if there is any value in having additional users such as:
- geoserver user
- etl_user
Postgres User
The postgres user will be
- used mostly by DBAs
- be used by developers in DEV and QA
- used in emergency cases by developers in PROD
EAS User
The eas_user is for users of the EAS application...
- needs to be able to read and write to all tables the mad and sfmaps databases
- should we use a group or role here?
- to make things simple, we'll use eas_user for access from geoserver
For any new users, we'll need to set the correct access privileges.
Until now, I have been using the postgres user everywhere (sorry!).
Installation Errors
error:
configure: error: no acceptable C compiler found in $PATH
solution:
yum install gcc-c++
error:
configure: error: readline library not found
solution:
yum install readline-devel.x86_64
error:
configure: error: zlib library not found
solution:
yum install zlib.x86_64 zlib-devel.x86_64
During pg_restore you may see the following:
pg_restore --host localhost --port 5432 --username postgres --dbname mad ./mad-1_0-beta-2.backup pg_restore: [archiver (db)] Error while PROCESSING TOC: pg_restore: [archiver (db)] Error from TOC entry 734; 1255 116672554 FUNCTION st_simplifypreservetopology(geometry, double precision) postgres pg_restore: [archiver (db)] could not execute query: ERROR: could not find function "topologypreservesimplify" in file "/usr/lib64/pgsql/liblwgeom.so" Command was: CREATE FUNCTION st_simplifypreservetopology(geometry, double precision) RETURNS geometry AS '$libdir/liblwgeom', 'topolo... pg_restore: [archiver (db)] could not execute query: ERROR: function public.st_simplifypreservetopology(geometry, double precision) does not exist Command was: ALTER FUNCTION public.st_simplifypreservetopology(geometry, double precision) OWNER TO postgres; WARNING: errors ignored on restore: 2 [postgres@CentOSDB17882 ~]$
solution
Don't worry about these errors.
This is because we upgraded postgres and postgis.
Moreover, we do not use these functions.