Helpful PostgreSQL Logging and Defaults

I use PostgreSQL every single day.  I develop against multiple versions of the database (four of them running in parallel as I write this), I have apps that use it, and I do my daily DBA-ing.  The biggest question I get from newbies and veterans alike is: “What are your PostgreSQL defaults?”

If you follow postgres, you already know that the default configuration (postgresql.conf) is very conservative from a resource (CPU, IOPS, disk space) perspective.  Over the last 10 years, we [the community] have developed some straightforward, easy-to-understand formulas that will help you tune… shared_buffers, for example.  The item that always gets left out, though, is logging.  As a developer, I’m always looking for ways to see “how the database is answering the questions I ask”.  When I get a new postgres instance set up, I have a personal (somewhere in the cobwebs) checklist that I run through.  Some items depend on the purpose of the deployment (shared_buffers, work_mem, etc…), some are things that I always set.  Aside from memory, the biggest set of “standard” items I set are all related to logging.  I’m big on monitoring (my pg_stat_activity patch was accepted back in 9.2) and having the right detail presented to me is important.

TL;DR

 logging_collector = on
 log_filename = 'postgresql-%a.log'
 log_truncate_on_rotation = on
 log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
 log_checkpoints = on
 log_autovacuum_min_duration = 0
 log_temp_files = 0
 log_lock_waits = on
The goal here is to have postgres ‘bubble up’ details about what’s going on so that we can see exactly what is happening, and when.  In addition to setting up the logging, you can pick up a great log analyzer (like pgBadger or pgCluu) that will break all of this down into a gorgeous, easy-to-use report.
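
If you don’t want to edit postgresql.conf by hand, here is a minimal sketch of applying these from psql, assuming PostgreSQL 9.4 or newer (ALTER SYSTEM).  Everything except logging_collector can take effect with a reload; logging_collector itself still needs a restart.

-- Reload-able settings: these take effect after pg_reload_conf()
ALTER SYSTEM SET log_filename = 'postgresql-%a.log';
ALTER SYSTEM SET log_truncate_on_rotation = on;
ALTER SYSTEM SET log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h ';
ALTER SYSTEM SET log_checkpoints = on;
ALTER SYSTEM SET log_autovacuum_min_duration = 0;
ALTER SYSTEM SET log_temp_files = 0;
ALTER SYSTEM SET log_lock_waits = on;
-- Restart-only: written to postgresql.auto.conf now, applied at the next restart
ALTER SYSTEM SET logging_collector = on;
SELECT pg_reload_conf();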

Scott’s logging defaults

logging_collector = on

By default, the logging_collector is disabled, which means that all of the info / error / warning, etc… messages that come out of postgres are sent to stderr.  The problem is, when postgres forks away from the terminal and you then close that terminal, you lose all of your logs.  Turning this setting on will write all of the log messages to a sub-directory of $PGDATA (log_directory = 'pg_log' by default, which I like).  This setting also requires a restart, so I always end up setting it before I even start postgres the first time.  Postgres will automatically create the pg_log directory (or whatever log_directory is set to) in the data directory.  Now, all of those info / error / warning messages get written to a log file in the pg_log directory.
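
A quick way to double-check this (and to see which parameters are restart-only in general) is to look at the context column in pg_settings; this is just a catalog query:

-- 'postmaster' context means restart-only; 'sighup' means a reload is enough
SELECT name, setting, context
  FROM pg_settings
 WHERE name IN ('logging_collector', 'log_directory', 'log_filename');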

log_filename = 'postgresql-%a.log'

This sets the name of the actual file that log messages will be written to (in your shiny, new pg_log log_directory).  The %a means that you’ll see Mon, Tue, Wed, Thu, Fri, Sat, Sun.  The patterns are based on standard strftime escapes (man page).  The reason I like using %a is that you get automatic rotation of the log files: you keep one week’s worth of logs, and when a file rotates seven days later you aren’t creating a huge number of new log files.

Note: Depending on my requirements, I will adjust this for production.  If I have a special retention policy that I need to abide by, I’ll make the filename: postgresql-YYYY-MM-DD-HH24mmSS.log  (log_filename = 'postgresql-%Y-%m-%d-%H%M%S.log').  The trouble with this is that you’ll need to deal with log cleanup yourself (a cron job to archive logs 30+ days old, for example).
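
For what it’s worth, the timestamped variant can be applied the same way as everything else; this is just a sketch (ALTER SYSTEM, so 9.4+), and the cleanup job itself has to live outside of postgres:

-- One uniquely-named file per rotation; nothing is ever overwritten,
-- so old files must be archived/removed by an external job (cron, logrotate, ...)
ALTER SYSTEM SET log_filename = 'postgresql-%Y-%m-%d-%H%M%S.log';
SELECT pg_reload_conf();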

log_truncate_on_rotation = on

This essentially says “when I switch log files, if a log already exists with that name, truncate the existing one and write new logs to an empty file”.

For example, on Monday, May 15th, we wrote a log file:

postgresql-Mon.log

Now, at midnight on Monday, May 22nd, postgres is going to rotate its log back to:

postgresql-Mon.log

This is our data from last week.  If you leave log_truncate_on_rotation = off (the default), then postgres will append to that log file.  That means, in December, you’ll have data for every Monday throughout the year in the same file.  If you set this to on, it will nuke the old data and give you only data from the most recent Monday.  If you need to keep log files for longer than 7 days, I recommend you use a more complete name for your log files (see log_filename above).
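
Since the weekday trick only works when the filename pattern, the rotation age and the truncation flag all agree, I find it handy to eyeball them together; a simple pg_settings query does the job:

-- the settings that have to agree for the weekly scheme:
-- filename pattern, rotation age/size, and the truncation flag
SELECT name, setting
  FROM pg_settings
 WHERE name IN ('log_filename', 'log_rotation_age',
                'log_rotation_size', 'log_truncate_on_rotation');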

log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '

The log_line_prefix controls what every single line in the log file looks like.  My personal favorite here is actually stolen directly from pgBadger’s suggested configuration.  pgBadger is an amazing tool for parsing and analyzing the postgres log files.  If you set log_line_prefix this way, pgBadger can provide incredible detail about what happened where, who did it, and when they did it.  Just to show you the difference….

Default log file messages:

LOG: database system was shut down at 2016-05-19 11:40:57 EDT
LOG: MultiXact member wraparound protections are now enabled
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
ERROR: column "create_error_for_fun" does not exist at character 8
STATEMENT: select create_error_for_fun;

With a rich log_line_prefix:

2016-05-19 11:46:22 EDT [14913]: [1-1] user=,db=,app=,client= LOG: database system was shut down at 2016-05-19 11:46:21 EDT
2016-05-19 11:46:22 EDT [14913]: [2-1] user=,db=,app=,client= LOG: MultiXact member wraparound protections are now enabled
2016-05-19 11:46:22 EDT [14911]: [3-1] user=,db=,app=,client= LOG: database system is ready to accept connections
2016-05-19 11:46:22 EDT [14917]: [1-1] user=,db=,app=,client= LOG: autovacuum launcher started
2016-05-19 11:46:27 EDT [14921]: [1-1] user=postgres,db=postgres,app=psql,client=[local] ERROR: column "create_error_for_fun" does not exist at character 8
2016-05-19 11:46:27 EDT [14921]: [2-1] user=postgres,db=postgres,app=psql,client=[local] STATEMENT: select create_error_for_fun;

Now, I know who, what, where, why and when.
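
One note on the app= field: it comes from application_name, which any client is free to set (psql fills in ‘psql’ on its own, as you can see above).  A tiny sketch, with a made-up label:

-- 'nightly_etl' is a hypothetical label; tag your own sessions so the log
-- (and pgBadger) can tell your apps apart
SET application_name = 'nightly_etl';
-- every log line from this session now shows app=nightly_etl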

log_checkpoints = on

Like any database, postgres is going to use the disks.  To deal with I/O efficiently, postgres (like many other databases) uses something called a checkpoint to synchronize its memory with those disks.  Checkpoints occur periodically based on a number of things (load, configuration, etc…).  The thing to keep in mind is that a checkpoint will use disk I/O; the busier the database, the more it requires.  Setting this to on means that you know without a doubt when a checkpoint occurred.  It’s the same ol’ story: “Every once in a while, I get long-running queries, different ones each time!”… “I’m seeing a spike in IOPS and I don’t know why!” … “Sometimes my data load gets bogged down for some reason!” … etc…

This very well could be due to a large checkpoint occurring.  Since checkpoints are based on load / configuration / time, it’s critical that the server write a log entry whenever a checkpoint occurs so that you’re not left in the dark.  There’s also useful information in these log entries about how postgres is behaving (it can even help you tune your memory settings).

2016-05-19 12:01:26 EDT [15274]: [1-1] user=,db=,app=,client= LOG: checkpoint starting: immediate force wait
2016-05-19 12:01:26 EDT [15274]: [2-1] user=,db=,app=,client= LOG: checkpoint complete: wrote 0 buffers (0.0%); 
     0 transaction log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=0.001 s; 
     sync files=0, longest=0.000 s, average=0.000 s

(NB: There was some line-wrap here; I’ve manually entered carriage returns.  The second entry prints as one long line.)

It’s important to note that this is going to increase the volume of your logs.  But the increase is minimal, and the benefits far outweigh the few extra bytes needed to write the messages.  (pgBadger will parse these up and give you a nice, clear picture of your checkpoint behavior.)
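
If you want a running total to go with the individual log messages, the statistics collector keeps cumulative checkpoint counters; this query is against the pg_stat_bgwriter view as it looks in the 9.x releases:

-- timed = triggered by checkpoint_timeout, req = requested (e.g. WAL volume);
-- counters are cumulative since the last stats reset
SELECT checkpoints_timed, checkpoints_req,
       checkpoint_write_time, checkpoint_sync_time,
       buffers_checkpoint
  FROM pg_stat_bgwriter;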

log_autovacuum_min_duration = 0

(The default is log_autovacuum_min_duration = -1, which disables this logging entirely.)  Unlike most of the settings above, this one isn’t just a boolean on/off switch.  Essentially, you are telling postgres: “When an autovacuum run takes x milliseconds or longer, write a message to the log”.  Setting this to 0 (zero) means that every autovacuum operation gets logged.

For the uninitiated, autovacuum is essentially a background process that does garbage collection in postgres.  If you’re just starting out, what you really need to know is the following:

  • It’s critical that autovacuum stay enabled
  • autovacuum is another background process that uses IOPS

Because autovacuum is necessary and uses IOPS, it’s critical that you know what it’s doing and when.  Just like log_checkpoints (above), autovacuum runs are based on load (thresholds on update / delete velocities on each table).  This means that vacuum can kick off at virtually any time.

2016-05-19 12:32:25 EDT [16040]: [4-1] user=,db=,app=,client= LOG: automatic vacuum of table "postgres.public.pgbench_branches": index scans: 1
 pages: 0 removed, 12 remain
 tuples: 423 removed, 107 remain, 3 are dead but not yet removable
 buffer usage: 52 hits, 1 misses, 1 dirtied
 avg read rate: 7.455 MB/s, avg write rate: 7.455 MB/s
 system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
2016-05-19 12:32:25 EDT [16040]: [5-1] user=,db=,app=,client= LOG: automatic analyze of table "postgres.public.pgbench_branches" system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
2016-05-19 12:32:27 EDT [16040]: [6-1] user=,db=,app=,client= LOG: automatic analyze of table "postgres.public.pgbench_history" system usage: CPU 0.01s/0.13u sec elapsed 1.70 sec

Just like with log_checkpoints, the information that we receive here is very rich and can be invaluable when trying to diagnose ‘fleeting’ behavior of a system.  As with log_checkpoints, you are increasing the volume of your log files.  In the past 13 years, I’ve found this type of data to be well worth the overhead.  The nice thing about this parameter is that you can throttle back the volume by saying “only log vacuums that take 10 seconds (10,000 milliseconds) or longer” (log_autovacuum_min_duration = 10000).
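
To cross-check the log against what the statistics collector thinks, something like this against pg_stat_user_tables shows when each table was last autovacuumed and how often:

-- last_autovacuum / autovacuum_count come from the stats collector;
-- n_dead_tup is roughly what drives the next autovacuum run
SELECT relname, last_autovacuum, autovacuum_count, n_dead_tup
  FROM pg_stat_user_tables
 ORDER BY last_autovacuum DESC NULLS LAST;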

log_temp_files = 0

To be as efficient as possible, postgres tries to do everything it can in memory.  Sometimes, you just run out of that fickle resource.  Postgres has a built-in ‘swap-like’ mechanism that will employ temp files within the data directory to deal with the issue.  If you’ve spent any time around disks (especially spinning rust), you’ll know that swapping can cause some serious performance issues.  Just like checkpoints and autovacuum, temp files are going to happen automatically.  Unlike those two processes, though, they only occur when the queries you are running actually need the temp space.  From a developer’s perspective, I want to know if the process that I’m engineering is going to use temp.  From a DBA’s perspective, I want to know if the dang developers did something that needs temp space :) (more likely, my dang maintenance jobs are using it).  To help in tuning your queries and maintenance processes, log your temp files.  The log will tell you what size temp file was needed and which query caused it:

2016-05-19 12:31:20 EDT [15967]: [1-1] user=postgres,db=postgres,app=pgbench,client=[local] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp15967.0", size 200204288
2016-05-19 12:31:20 EDT [15967]: [2-1] user=postgres,db=postgres,app=pgbench,client=[local] STATEMENT: alter table pgbench_accounts add primary key (aid)

The log_temp_files parameter is set in units of kilobytes; essentially, “write a log entry if a temp file of this many KB or more was created”.  I prefer 0 (zero) because I want to see every temp file that gets used.  Again, in my experience, the “juice is worth the squeeze” here.
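
The per-statement detail lives in the log, but if you just want to know how much temp a whole database is chewing through, pg_stat_database keeps running totals (the temp_files / temp_bytes columns appeared in 9.2):

-- cumulative since the last stats reset; a quick smell test before
-- digging into the log for the individual offending statements
SELECT datname, temp_files, temp_bytes
  FROM pg_stat_database
 ORDER BY temp_bytes DESC;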

log_lock_waits = on

Databases service lots of clients all trying to do very similar work against the same set of data.  This can cause contention (it’s the nature of the beast).  log_lock_waits lets you see where your contention is: any lock wait that lasts longer than deadlock_timeout (one second by default) gets logged, with detailed, specific information about what waited and the context in which it happened.

 2016-05-19 13:14:50 EDT [17094]: [1-1] user=postgres,db=postgres,app=psql,client=[local] LOG: process 17094 still waiting for RowExclusiveLock on relation 16847 of database 12403 after 1000.794 ms at character 13
 2016-05-19 13:14:50 EDT [17094]: [2-1] user=postgres,db=postgres,app=psql,client=[local] DETAIL: Process holding the lock: 17086. Wait queue: 17094.
 2016-05-19 13:14:50 EDT [17094]: [3-1] user=postgres,db=postgres,app=psql,client=[local] STATEMENT: delete from pgbench_tellers ;
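
If you want to see these messages for yourself, a two-session sketch against the pgbench tables (assuming a pgbench-initialized database) reproduces the output above:

-- session 1: take a lock that conflicts with row changes, and keep the
-- transaction open
BEGIN;
LOCK TABLE pgbench_tellers IN SHARE MODE;

-- session 2: this blocks; once it has waited longer than deadlock_timeout
-- (1s by default), log_lock_waits writes the LOG/DETAIL/STATEMENT lines above
DELETE FROM pgbench_tellers;

-- session 1: release the lock and let session 2 finish
ROLLBACK;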

These are what I call ‘reasonable defaults’ for logging in postgres.  Again, these are the settings that I configure every time I set up a new cluster, whether it’s for dev / test / toy / prod.

Happy querying!