pg_dump is a great tool. You can get a consistent database backup from a live database without impacting your running traffic. Many people don’t realize, however, that a pg_dump isn’t actually a complete backup…

pg_dump

pg_dump operates on a single database in a Postgres cluster. You provide it with the name of a database and, by default, it dumps all of that database’s data; you can also select individual tables, schemas, etc… with different flags. Essentially, given a database instance like so:

[Screenshot: a Postgres instance as seen in the pgAdmin object tree]

pg_dump is capable of operating at this level of the tree:

[Screenshot: the ‘Databases’ level of the pgAdmin object tree]

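If you only need part of a database, the flags look something like this (the table and schema names here are hypothetical):

pg_dump -t ledger billing > ledger.sql           # dump only the 'ledger' table from the 'billing' db
pg_dump -n accounting billing > accounting.sql   # dump only the 'accounting' schema
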
Okay, great. You’ll notice, however, that your backups don’t actually include the other objects that are associated with an instance (you can list them with psql, as shown after this list):

  • Tablespaces
  • Group Roles
  • Login Roles (users)
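
If you’re curious what those global objects look like in your own cluster, psql will show them directly:

psql -c '\du'    # list login and group roles
psql -c '\db'    # list tablespaces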

The other problem is that pg_dump can operate on one and only one database at a time.

Problems

Based on the above, we have a few major issues:

  1. pg_dump does not include tablespaces, groups or users
  2. pg_dump only dumps ONE of my databases in a cluster / instance

Solutions

Groups / Users / Tablespaces

The simple solution is to use pg_dumpall -g

-g = ‘dump only global objects (users, groups, tablespaces)’

Essentially, pg_dumpall is designed to dump ‘all objects in a cluster’. If you run pg_dumpall with no arguments, it will dump all data in all databases in plain-text mode. Frankly, I’ve never found this very useful for large databases. However, the -g flag says ‘only dump the global objects’.

Since I don’t recommend pg_dumpall (in most cases) for your data backups, the strategy I do recommend looks like the following:

pg_dumpall -g > globals.sql

pg_dump -d billing -F c -C > billingdb.bak

This gives me the globals (users, groups, tablespaces) AND a custom-format pg_dump (compressed, with a table of contents and support for selective restores) of the billing database. That leaves me with significantly more flexibility than a gigantic clear-text file of my data. With this type of backup, I could choose to restore only the ‘ledger’ table from the billingdb.bak file if I wanted, or I could restore the ‘ledger’ table definition with no data. The options are nearly endless. Just running a plain pg_dumpall would give me a giant text file that I’d need to sift through with an editor to accomplish the same goal.
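
As a quick illustration of that flexibility, selective restores against the custom-format archive look something like this (reusing the example database and table names from above):

pg_restore -l billingdb.bak                       # list the archive's table of contents
pg_restore -d billing -t ledger billingdb.bak     # restore only the 'ledger' table
pg_restore -d billing -t ledger -s billingdb.bak  # restore just the 'ledger' table definition, no data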

Multiple DBs

Okay, great! What if I have 5 databases per cluster that I need backed up?

pg_dump only operates on one database at a time, and I don’t want to hard-code a list of databases into my backup script! No problem, I say: you can use psql to generate a list of DBs and run pg_dump in a loop. My personal version looks something like:
printf "Start: date" >> $backupDir/$dt/backup.log
pg_dumpall -g > /$backupDir/$dt/globals.sql
dbs=psql -t -c "select array_to_string(ARRAY(select datname from pg_database),' ')" | sed -e '/^$/d' | sed -e 's/^[ \t]//g'
for db in $dbs
do
printf "Performing backup of: $db\n"
pg_dump -v -Fp -b -C $db > /$backupDir/$dt/$db.sql 2>> /$backupDir/$dt/backup.log
done
printf "Complete: date" >> $backupDir/$dt/backup.log

The complete version of this script also includes rotation logic to simplify your logical backup management.
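
The rotation piece isn’t shown here, but a minimal sketch (assuming each run writes into a dated directory under $backupDir, as in the loop above, and that seven days of retention is enough) could be as simple as:

# remove dated backup directories older than 7 days
find $backupDir -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +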