Is my query stuck? PostgreSQL 9.6

One of the most common queries that I keep in my pocket is:

SELECT pid,
         datname,
         usename,
         now() - query_start AS runtime,
         waiting,
         state,
         query
FROM pg_stat_activity
WHERE waiting='true';

This gives you a quick look at any queries that are ‘waiting’.  Specifically, you will see queries that are explicitly blocked on some other database activity.  I’ve got a few db’s that I’ve moved to 9.6 for dev and test.  I got a call that ‘something seems gummed up’ and my muscle memory kicked in with my tried and true query above…

postgres@postgres=# SELECT pid,
postgres-#          datname,
postgres-#          usename,
postgres-#          now() - query_start AS runtime,
postgres-#          waiting,
postgres-#          state,
postgres-#          query
postgres-# FROM pg_stat_activity
postgres-# WHERE waiting='true';

ERROR:  column "waiting" does not exist
LINE 5:          waiting,

For a second, I was actually considering catalog corruption and other nefarious things… then I remembered that this changed in 9.6!  The latest and greatest release of PostgreSQL has added deeper introspection into wait events!  Pre-9.6 gave you a boolean to see if the query was blocked; then it was up to you to dig through all of pg_locks to see what was going on.  Now, waiting has been split into two new columns:

  • wait_event_type

  • wait_event

There’s a large list of values that can be filled in, but, essentially:

If the ‘wait_event’ column:

  • is NULL and state = ‘active’          – Running query, not blocked

  • is NOT NULL and state = ‘active’   – Query is executing, but blocked

So, if we modify my favorite query to find useful info about ‘blocked’ queries, it becomes:

SELECT pid,
         datname,
         usename,
         now() - query_start AS runtime,
         wait_event,
         wait_event_type,
         state,
         query
FROM pg_stat_activity
WHERE wait_event is NOT NULL
  AND state = 'active';

postgres@postgres=# SELECT pid,
postgres-#          datname,
postgres-#          usename,
postgres-#          now() - query_start AS runtime,
postgres-#          wait_event,
postgres-#          wait_event_type,
postgres-#          state,
postgres-#          query
postgres-# FROM pg_stat_activity
postgres-# WHERE wait_event is NOT NULL
postgres-#   AND state = 'active';

  pid  | datname  | usename  |     runtime     | wait_event | wait_event_type | state  |        query
-------+----------+----------+-----------------+------------+-----------------+--------+---------------------
 15975 | postgres | postgres | 00:01:17.781512 | relation   | Lock            | active | select * from test;
(1 row)


So, I’ve intentionally created a situation where the ‘test’ table would be blocked.  Now, I can very clearly see that my query, ‘select * from test’, is waiting for a Lock on a relation.  I still have to go to pg_locks to see exactly which other session is blocking me, but there’s a lot less work involved because I already know it’s a relation lock.
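If you do need the blocking session, 9.6 also added pg_blocking_pids(), which saves most of the trip through pg_locks. A small sketch, combining it with the query above:

```sql
-- pg_blocking_pids() is new in 9.6: for a given backend PID it returns
-- the array of PIDs that are blocking it, so you can skip decoding
-- pg_locks by hand.
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       query
FROM pg_stat_activity
WHERE wait_event IS NOT NULL
  AND state = 'active';
```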

There’s quite a list of possible wait_event / wait_event_type combinations on the PostgreSQL documentation site. It’s very easy to see if you’re blocked because of some internal Postgres behavior, slow responses from the OS / hardware, or your very own workload.

Happy PostgreSQL-ing!

Integrating Cassandra, Spark, PostgreSQL and Hadoop as a Hybrid Data Platform – Part 3

In a previous blog post we talked about the technical details behind the
new CassandraFDW that enables PostgreSQL to connect to a Cassandra
cluster. This provides PostgreSQL users the ability to easily read and
write data to a very popular open source multi-master database that can
support writes in multiple data centers. This means if you are storing
sensor data or large files in an easily scalable Cassandra cluster you
can join those NoSQL tables to your local PostgreSQL data as well!

Today my colleague Balakrishna Gatla will show you how to connect,
read data, write data, and join to a remote Cassandra table.

First – Configure Cassandra

If you don’t have a Cassandra cluster yet, you can use the BigSQL
package manager to install a local Cassandra instance for development.

Connect to Cassandra using CQLSH

$ cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.17 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.


In Cassandra, a keyspace is a container for your application data. It is
similar to the schema in a relational database.

cqlsh> CREATE KEYSPACE postgres
   ... WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };


cqlsh> USE postgres;

cqlsh:postgres> CREATE TABLE t1 (id int PRIMARY KEY, name text);
cqlsh:postgres> CREATE TABLE t2 (id int PRIMARY KEY, name text);

Insert some data into the tables

cqlsh:postgres> INSERT INTO t1 (id, name) VALUES (1, 'abc');
cqlsh:postgres> INSERT INTO t1 (id, name) VALUES (2, 'def');
cqlsh:postgres> INSERT INTO t1 (id, name) VALUES (3, 'ghi');

cqlsh:postgres> INSERT INTO t2 (id, name) VALUES (1, 'mno');
cqlsh:postgres> INSERT INTO t2 (id, name) VALUES (2, 'pqr');
cqlsh:postgres> INSERT INTO t2 (id, name) VALUES (3, 'stu');

Second – Set up Postgres Database

Install Cassandra_FDW in Postgres

It is very easy to install extensions using the BigSQL command-line
utility pgc:

(pgc) Install Cassandra_FDW

Go to the BigSQL directory and invoke the bigsql command-line tool to
install Cassandra FDW:

$ ./pgc install cassandra_fdw3-pg95
Get:1 cassandra_fdw3-pg95-3.0.0-1-linux64
 Unpacking cassandra_fdw3-pg95-3.0.0-1-linux64.tar.bz2

If you have not already, you can use pgc to also initialize a new
postgres cluster:

$ ./pgc init pg95

Connect via psql and create the extension in the Postgres Database:

postgres=# CREATE EXTENSION cassandra_fdw;

Like any FDW, you need to configure the remote server, the user
credentials to connect, and then the remote tables (think of them as
“views” against the remote server). First, create the Foreign Server to
provide connectivity details for Cassandra:

postgres=# CREATE SERVER cass_serv FOREIGN DATA WRAPPER cassandra_fdw
             OPTIONS (host '', port '9042');

Next, we create the user mapping for authenticating with Cassandra:

postgres=# CREATE USER MAPPING FOR public SERVER cass_serv
             OPTIONS (username 'test', password 'test');

We can now use the Import Foreign Schema feature to bring in a
definition for a specific remote table without specifying its DDL:

postgres=# IMPORT FOREIGN SCHEMA postgres LIMIT TO ("t1")
             FROM SERVER cass_serv INTO public;
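As a quick sanity check (this uses the standard information_schema view, nothing specific to the FDW), you can confirm the import worked:

```sql
-- The imported definition should now be listed as a foreign table
SELECT foreign_table_schema, foreign_table_name
FROM information_schema.foreign_tables;
```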

We can now query the remote table t1 in Postgres like any local table
(again, think of a “view” pointing to the remote server).

postgres=# SELECT * FROM public.t1;
 id | name
----+------
  1 | abc
  2 | def
  3 | ghi
(3 rows)

Third – Write Data to Foreign Table in Cassandra

Create a Foreign Table in Postgres using explicit DDL; this allows us
to specify the primary_key option, which is required to be able to
write to a Foreign Table:

postgres=# CREATE FOREIGN TABLE t2 (id int, name text) SERVER cass_serv
             OPTIONS (schema_name 'postgres', table_name 't2', primary_key 'id');

Let’s run a simple query against the Foreign Table:

postgres=# SELECT * FROM public.t2;
 id | name
----+------
  1 | mno
  2 | pqr
  3 | stu
(3 rows)

Now let us try writing to the table:

postgres=# INSERT INTO public.t2 (id, name) VALUES (4, 'vwx');
postgres=# INSERT INTO public.t2 (id, name) VALUES (5, 'asd');
postgres=# SELECT * FROM public.t2;
 id | name
----+------
  5 | asd
  1 | mno
  2 | pqr
  4 | vwx
  3 | stu
(5 rows)

postgres=# UPDATE public.t2 SET name = 'aaa' WHERE id = 5;
postgres=# SELECT * FROM public.t2;
 id | name
----+------
  5 | aaa
  1 | mno
  2 | pqr
  4 | vwx
  3 | stu
(5 rows)

postgres=# DELETE FROM public.t2 WHERE id = 5;
postgres=# SELECT * FROM public.t2;
 id | name
----+------
  1 | mno
  2 | pqr
  4 | vwx
  3 | stu
(4 rows)

Let us see the changes reflected from the Cassandra side:

cqlsh:postgres> SELECT * FROM t2;

 id | name
----+------
  1 |  mno
  2 |  pqr
  4 |  vwx
  3 |  stu

(4 rows)

Fourth – JOIN Postgres and Cassandra Tables

Finally, to make it more interesting, here’s an example of joining a
Postgres table to a remote Cassandra table:

Create a Postgres table:

postgres=# CREATE TABLE t3 (id int, city varchar(20));

Insert some data into it:

postgres=# INSERT INTO t3 VALUES (1, 'New York');
postgres=# INSERT INTO t3 VALUES (2, 'London');
postgres=# INSERT INTO t3 VALUES (3, 'Moscow');
postgres=# INSERT INTO t3 VALUES (4, 'New Delhi');
postgres=# INSERT INTO t3 VALUES (5, 'Dallas');
postgres=# SELECT * FROM t3;
 id |   city
----+-----------
  1 | New York
  2 | London
  3 | Moscow
  4 | New Delhi
  5 | Dallas
(5 rows)

Let us start with an equi-join between the Postgres and Cassandra tables:

postgres=# SELECT a.id, a.name, b.city FROM t1 a JOIN t3 b ON (a.id = b.id);
 id | name |   city
----+------+----------
  1 | abc  | New York
  2 | def  | London
  3 | ghi  | Moscow
(3 rows)

Next, LEFT OUTER JOIN the Postgres table to the Cassandra table:

postgres=# SELECT a.id, a.name, b.city FROM t1 a LEFT OUTER JOIN t3 b ON (a.id = b.id);
 id | name |   city
----+------+----------
  1 | abc  | New York
  2 | def  | London
  3 | ghi  | Moscow
(3 rows)

Next, RIGHT OUTER JOIN the Postgres table to the Cassandra table:

postgres=# SELECT a.id, a.name, b.city FROM t1 a RIGHT OUTER JOIN t3 b ON (a.id = b.id);
 id | name |   city
----+------+-----------
  1 | abc  | New York
  2 | def  | London
  3 | ghi  | Moscow
    |      | New Delhi
    |      | Dallas
(5 rows)

And finally, let us FULL OUTER JOIN the Postgres table to the Cassandra table:

postgres=# SELECT a.id, a.name, b.city FROM t1 a FULL OUTER JOIN t3 b ON (a.id = b.id);
 id | name |   city
----+------+-----------
  1 | abc  | New York
  2 | def  | London
  3 | ghi  | Moscow
    |      | New Delhi
    |      | Dallas
(5 rows)

We hope this helps you get started with the Cassandra FDW. For more
information, please visit our project on BitBucket.

Getting Postgres data into a shell script

I use Postgres for a lot of different things. Many times, it’s a full-blown application where I can directly access the database (e.g. Python + psycopg2, Java + JDBC, etc.). What if you’re using something a bit less formal, like a shell script?


psql is the native command-line interface that ships with a PostgreSQL server. It’s usually the first tool that most people end up using with Postgres. Its simplicity masks a very powerful ‘swiss-army knife’ that lets you work with data very quickly.

Let’s take a basic table “hosts”:

CREATE TABLE hosts (
 host_id SERIAL,
 hostname TEXT,
 host_group TEXT,
 host_ip TEXT,
 host_ssh_port INT,
 host_ssh_user TEXT,
 host_ssh_key_path TEXT
);

There are much, much better ways to model this! This is just a simple example.

Let’s say I want to ssh to all of my hosts and execute an ‘uptime’ command. How could I accomplish this from a shell script?

Let’s start by ssh-ing to one of the hosts:

ssh -i private_key -p 22 user@host "uptime"

Okay, now I want to do this for every host in my table. This means that somehow I need to get the data out of the database and into my shell script. Let’s write the query that we need:

SELECT host_ssh_key_path, host_ssh_port, host_ssh_user, host_ip FROM hosts;

When I do that, I get nicely formatted data back

[scottm@scottsmac : ~/test1 : 13:30:46]
$ psql
psql (9.5.2)
Type "help" for help.

postgres@postgres=# select host_ssh_key_path, host_ssh_port, host_ssh_user, host_ip FROM hosts;
 host_ssh_key_path | host_ssh_port | host_ssh_user | host_ip
-------------------+---------------+---------------+---------
 /infra/keys/vagrantkey | 2222 | vagrant |
 /infra/keys/vagrantkey | 2201 | vagrant |
(2 rows)


Great, but, how do I get this into my shell script?

psql gives us the ‘-c’ flag (command) option. This will connect to the database, execute the SQL and print the result to stdout.

[scottm@scottsmac : ~/test1 : 13:36:24]
$ psql -c "select host_ssh_key_path, host_ssh_port, host_ssh_user, host_ip FROM hosts;"
 host_ssh_key_path | host_ssh_port | host_ssh_user | host_ip
-------------------+---------------+---------------+---------
 /infra/keys/vagrantkey | 2222 | vagrant |
 /infra/keys/vagrantkey | 2201 | vagrant |
(2 rows)

[scottm@scottsmac : ~/test1 : 13:36:29]

Great, except there’s a lot of superfluous stuff there. We could do a lot of ‘sed’ and ‘awk’, but we don’t need to.

Now, we can use the ‘tuples-only’ option to psql (-t) to only print the results of our query:

[scottm@scottsmac : ~/test1 : 13:36:29]
$ psql -t -c "select host_ssh_key_path, host_ssh_port, host_ssh_user, host_ip FROM hosts;"
 /infra/keys/vagrantkey | 2222 | vagrant |
 /infra/keys/vagrantkey | 2201 | vagrant |

[scottm@scottsmac : ~/test1 : 13:39:44]

We’ve removed the header and the row count from the bottom.

Now we’ve got a bunch of extra whitespace; let’s get rid of it. We can do this by disabling ‘aligned’ mode (-A).

[scottm@scottsmac : ~/test1 : 13:41:38]
$ psql -A -t -c "select host_ssh_key_path, host_ssh_port, host_ssh_user, host_ip FROM hosts;"
[scottm@scottsmac : ~/test1 : 13:41:42]

Okay, now we’ve got our data in something that’s parseable with a simple awk script. So now we can dump this into a ‘for’ loop in a script:


for host in `psql -A -t -c "select host_ssh_key_path, host_ssh_port, host_ssh_user, host_ip FROM hosts;"`
do
 printf "$host\n"
done

The output:

[scottm@scottsmac : ~/test1 : 13:44:16]
$ ./
[scottm@scottsmac : ~/test1 : 13:44:17]

Now, let’s do a quick ‘awk’ to parse these into proper variables for our ssh:


for host in `psql -A -t -c "select host_ssh_key_path, host_ssh_port, host_ssh_user, host_ip FROM hosts;"`
do
 key=`echo $host | awk -F'|' '{print $1}'`
 port=`echo $host | awk -F'|' '{print $2}'`
 user=`echo $host | awk -F'|' '{print $3}'`
 ip=`echo $host | awk -F'|' '{print $4}'`

 printf "Connecting to $ip ... "
 ssh -i $key -p $port $user@$ip "uptime"
 printf "\n"
done

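As an aside, the four awk calls per row can be avoided by letting the shell split on the delimiter itself. A minimal sketch with a hard-coded sample row (the IP address is a made-up placeholder, since it only demonstrates the parsing):

```shell
#!/bin/bash
# One pipe-delimited row, in the same shape that `psql -A -t` emits.
# The IP ( is a hypothetical placeholder value.
row='/infra/keys/vagrantkey|2222|vagrant|'

# Let `read` do the splitting on '|' instead of spawning awk four times
IFS='|' read -r key port user ip <<< "$row"

# Print the ssh command we would run (instead of actually running it)
echo "ssh -i $key -p $port $user@$ip"
```

With live data this becomes `psql -A -t -c "…" | while IFS='|' read -r key port user ip; do … done`, which also copes with fields containing spaces.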

Now, when I run the script:

[scottm@scottsmac : ~/test1 : 13:48:18]
$ ./
Connecting to ... 18:48:20 up 23:16, 0 users, load average: 0.00, 0.01, 0.05

Connecting to ... 18:48:19 up 21 min, 0 users, load average: 0.00, 0.01, 0.04

[scottm@scottsmac : ~/test1 : 13:48:20]

Voila! Postgres data, reliably, from within your shell script.

Happy PostgreSQL-ing!

Which standby was that again?

In any modern PostgreSQL installation, there’s usually at least one streaming replication standby server feeding off your production master. In fact, there’s usually several. Being the good PostgreSQL administrator that you are, you’ve already enabled logging of replication commands in postgresql.conf thusly:

log_replication_commands = on

But what you’re probably not doing is having each standby automatically identify itself in your logs. So how do you accomplish this? It’s simple: “abuse” the application_name parameter in the standby’s recovery.conf:

primary_conninfo = 'host= application_name=$(hostname -s)'

With this simple little trick, you can now look at your logs (or your pgBadger report of the logs) and see which standby is responsible for which log entry.
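The same name also shows up on the master in pg_stat_replication, which makes for a quick health check. A minimal sketch:

```sql
-- Each standby now identifies itself by hostname in application_name
SELECT pid, application_name, client_addr, state, sync_state
FROM pg_stat_replication;
```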

Simple and sweet.

PL/pgSQL Profiler 3 (plprofiler) is released

Finding performance problems in deeply nested PL/pgSQL business logic is difficult.

One of the reasons is that PL/pgSQL hides anything happening in the PL code. None of the SQL queries inside all of the nested function and trigger calls ever show up in the PostgreSQL logs or pg_stat_activity. They can be tracked in pg_stat_statements, but then comes the hunt for where in the source code they are. With this "Cone of Silence", utilities like pgBadger cannot point to the actual problem. All pgBadger tells you is that "SELECT slowfunc(123)" took two hours.

To help DBAs and developers tackle this problem, I started working on the plprofiler code about 6 months ago. The result of this work, which was sponsored by OpenSCG, is plprofiler version 3. 

The feature highlights of version 3 are:

  • Can be used by unprivileged users to profile PL code executing in their session.
  • Can be used by a Postgres superuser to profile an entire, running application and collect the profiling data in shared memory.
  • A command-line utility that wraps around a Python class with functionality for:
    • Profiling of SQL statements that invoke functions or triggers. The SQL can be given on the command line or in a separate file.
    • Monitoring a running application for an arbitrary period of time.
    • Monitoring a specific running backend (by PID) for an arbitrary period of time.
    • Saving monitoring data from any of the above into permanent tables. These data sets are fully resolved and no longer rely on the system catalog to identify functions/triggers or to get their function source code.
    • Exporting/importing of saved data sets.
    • Generating a self contained HTML report from local, shared or saved data. The HTML report contains a FlameGraph based on the wall clock time spent in PL/pgSQL functions and triggers.
The best thing here is the actual end result of profiling, the HTML report. The FlameGraph in it makes it very easy to identify the functions/triggers that use excessive amounts of time, however deeply they may be nested. In addition, the report contains the full source code (as pop-ups) of a user-defined number of "Top" functions. That source code is presented together with per-source-line execution statistics: number of executions, cumulative time spent, and maximum execution time of that particular PL statement.

Sample Report of a pgbench implemented with PL/pgSQL functions

The first real-world case we used the new plprofiler to solve was an application consisting of approx. 600 PL/pgSQL functions with over 100,000 lines of PL code sitting on 1,800 tables with 13,000 indexes. Quite some schema, I would say, and all the business logic implemented in PL/pgSQL. The problem we found was a SELECT statement in a function, 9 call levels deep. The SELECT's WHERE clause did not specify the first column of the only available index, but due to the row width of the table, the PostgreSQL optimizer chose a bitmap scan using a full scan of the entire index. Creating one additional index on that table resulted in a net gain of factor 2.6 in overall application performance!

This was a really tough case and our remote DBAs had already spent days on the problem without really getting anywhere. What makes this so difficult is the combination of PL/pgSQL hiding all queries and that the query in question was still producing an index scan (of sorts), so the table wouldn't show up as suspicious in pg_stat_user_tables or the like either. Finding the problem with plprofiler version 3 took about an hour, and that without any prior knowledge of the customer's application code.

Granted, it still takes an experienced DBA to spot the particular problem that was the root cause in this case.

Anyhow, if you are interested and want to play around with plprofiler version 3, BigSQL has binary packages as part of the BigSQL distribution that work on Linux, Mac OSX and Windows.

Inaugural pgCMH mtg scheduled

As announced on our Twitter, we've scheduled our inaugural meeting for Jan 24, 2017 at 1800 hrs. The folks at CoverMyMeds will graciously provide the meeting space (and parking garage) while OpenSCG is buying the pizza!

At this first meeting, we'll be discussing what you, the members, would like to get from the meetings, we'll talk about future topics, and general organizational things. I know it's not exciting, but I need everyone's input to make this group something you get value from.

Please RSVP via the Meetup event so we have sufficient food and drink!

See you there!

PostgreSQL bloat estimates

The PostgreSQL storage manager is tasked with the complex job of satisfying ACID compliance. This is a well documented series of algorithms that I won’t go into here. Like any storage system or database, PostgreSQL tables can start to take on free space (bloat as it is sometimes known).  While all free space is not a bad thing, there are some situations where it can get unwieldy.

At a high-level:

  • During an UPDATE or DELETE in postgres, the row is not removed. It is marked as ‘free space’ for re-use.
  • If there are no inbound rows that fit into that free space (or you have no new rows), the ‘dead rows’ that were marked as free are now taking up disk space.

Anecdotally, I’ve seen a table using 100GB of disk space, with only a few thousand rows totaling less than 1 GB. This was a queue used by an old version of the quartz scheduler. The table was high volume, pure INSERT & DELETE. There was never any opportunity for dead space to be found and re-used.

So, we’ve identified traffic patterns with high-volume UPDATE & DELETE that can cause bloat, but, how do I actually monitor for it?

There are really two schools of thought here:

  1. Deep scan table, read all live & dead rows
  2. Catalog-based estimates

Is bloat really bad?

Before moving on to finding bloat, let’s understand the implications. If you’ve spent a lot of time with DBAs, you’ve probably heard them discussing free space. Free space is disk space that has been allocated, and is available for use. This type of free space is very efficient to use vs. allocating new blocks. Our goal isn’t to eliminate all bloat. The goal is to eliminate excessive bloat. The queries here are going to either definitively find or estimate the amount of available space in a relation. I don’t recommend going on a quixotic quest against all of your built-up free space. Just eliminate the extra bloat.

Deep scan

The first strategy is going to be the most accurate method of determining bloat. You’ll be able to see exactly how much bloat is in a table. There are a few issues with this strategy:

  • Deep scan of relation takes time
  • Deep scan demands I/O from server

In other words: It’s going to take a long time and most likely impact your system performance.

I’m not going to deep-dive on this option today; if you’re interested, have a look at the pgstattuple module. (NB: pgstattuple also has a pgstattuple_approx function in the latest versions. Although it approximates the bloat, making a deep scan slightly faster, it is still scanning the relation.)
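For reference, a deep scan with pgstattuple looks something like this (the table name is just an example; the extension ships with contrib):

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Scans the whole relation and returns exact tuple-level statistics
SELECT table_len, tuple_count, dead_tuple_count,
       dead_tuple_percent, free_space, free_percent
FROM pgstattuple('public.pgbench_accounts');
```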

Catalog-based estimates

Postgres gathers statistics about tables and indexes in order to query them efficiently (this is done via ‘ANALYZE’, a discussion for another day). A few well-placed queries can use these statistics to estimate the amount of bloat in a table. Since we are using the catalogs, no deep scan is required. The downside is that we won’t have 100% accurate statistics; this is only an estimate.

I prefer a strategy that allows me to quickly generate a list of potential problems before digging in further. Essentially, I start with an estimate then drill-in looking for more detail.

There are two very popular bloat estimation queries available to us. The most well-known was developed as part of a PostgreSQL plugin for Nagios. Personally, I’ve found the results of that one a bit harder to parse manually.

My current favorite query gives a nice, human-readable overview of potential bloat issues (from:

/* WARNING: executed with a non-superuser role, the query inspect only tables you are granted to read.
* This query is compatible with PostgreSQL 9.0 and more
*/
SELECT current_database(), schemaname, tblname, bs*tblpages AS real_size,
 (tblpages-est_tblpages)*bs AS extra_size,
 CASE WHEN tblpages - est_tblpages > 0
 THEN 100 * (tblpages - est_tblpages)/tblpages::float
 END AS extra_ratio, fillfactor, (tblpages-est_tblpages_ff)*bs AS bloat_size,
 CASE WHEN tblpages - est_tblpages_ff > 0
 THEN 100 * (tblpages - est_tblpages_ff)/tblpages::float
 END AS bloat_ratio, is_na
 -- , (pst).free_percent + (pst).dead_tuple_percent AS real_frag
FROM (
 SELECT ceil( reltuples / ( (bs-page_hdr)/tpl_size ) ) + ceil( toasttuples / 4 ) AS est_tblpages,
 ceil( reltuples / ( (bs-page_hdr)*fillfactor/(tpl_size*100) ) ) + ceil( toasttuples / 4 ) AS est_tblpages_ff,
 tblpages, fillfactor, bs, tblid, schemaname, tblname, heappages, toastpages, is_na
 -- , stattuple.pgstattuple(tblid) AS pst
 FROM (
 SELECT
 ( 4 + tpl_hdr_size + tpl_data_size + (2*ma)
 - CASE WHEN tpl_hdr_size%ma = 0 THEN ma ELSE tpl_hdr_size%ma END
 - CASE WHEN ceil(tpl_data_size)::int%ma = 0 THEN ma ELSE ceil(tpl_data_size)::int%ma END
 ) AS tpl_size, bs - page_hdr AS size_per_block, (heappages + toastpages) AS tblpages, heappages,
 toastpages, reltuples, toasttuples, bs, page_hdr, tblid, schemaname, tblname, fillfactor, is_na
 FROM (
 SELECT
 tbl.oid AS tblid, ns.nspname AS schemaname, tbl.relname AS tblname, tbl.reltuples,
 tbl.relpages AS heappages, coalesce(toast.relpages, 0) AS toastpages,
 coalesce(toast.reltuples, 0) AS toasttuples,
 coalesce(substring(
 array_to_string(tbl.reloptions, ' ')
 FROM '%fillfactor=#"__#"%' FOR '#')::smallint, 100) AS fillfactor,
 current_setting('block_size')::numeric AS bs,
 CASE WHEN version()~'mingw32' OR version()~'64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END AS ma,
 24 AS page_hdr,
 23 + CASE WHEN MAX(coalesce(null_frac,0)) > 0 THEN ( 7 + count(*) ) / 8 ELSE 0::int END
 + CASE WHEN tbl.relhasoids THEN 4 ELSE 0 END AS tpl_hdr_size,
 sum( (1-coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 1024) ) AS tpl_data_size,
 bool_or(att.atttypid = 'pg_catalog.name'::regtype) AS is_na
 FROM pg_attribute AS att
 JOIN pg_class AS tbl ON att.attrelid = tbl.oid
 JOIN pg_namespace AS ns ON ns.oid = tbl.relnamespace
 JOIN pg_stats AS s ON s.schemaname=ns.nspname
 AND s.tablename = tbl.relname AND s.inherited=false AND s.attname=att.attname
 LEFT JOIN pg_class AS toast ON tbl.reltoastrelid = toast.oid
 WHERE att.attnum > 0 AND NOT att.attisdropped
 AND tbl.relkind = 'r'
 GROUP BY 1,2,3,4,5,6,7,8,9,10, tbl.relhasoids
 ) AS s
 ) AS s2
) AS s3;
-- WHERE NOT is_na
-- AND tblpages*((pst).free_percent + (pst).dead_tuple_percent)::float4/100 >= 1


As I said, the complexity of these queries is high. Let’s take a look at the output:

current_database | schemaname |        tblname         | real_size  | extra_size |    extra_ratio    | fillfactor |  bloat_size  |    bloat_ratio    | is_na
-----------------+------------+------------------------+------------+------------+-------------------+------------+--------------+-------------------+-------
postgres         | public     | pgbench_accounts       | 1411891200 |   39084032 |  2.76820423556716 |         10 | -12316139520 |                 0 | f
postgres         | public     | pgbench_branches       |    1105920 |    1097728 |  99.2592592592593 |         10 |      1064960 |  96.2962962962963 | f
postgres         | public     | pgbench_history        |   16867328 |      73728 | 0.437105390966489 |        100 |        73728 | 0.437105390966489 | f
postgres         | public     | pgbench_tellers        |   21422080 |   21372928 |  99.7705544933078 |         10 |     20979712 |  97.9349904397706 | f
postgres         | public     | t_stats                |      32768 |          0 |                 0 |        100 |            0 |                 0 | f
postgres         | snapshots  | snap                   |      65536 |          0 |                 0 |        100 |            0 |                 0 | f
postgres         | snapshots  | snap_databases         |    2424832 |     327680 |  13.5135135135135 |        100 |       327680 |  13.5135135135135 | t
postgres         | snapshots  | snap_indexes           |    9330688 |     327680 |  3.51185250219491 |        100 |       327680 |  3.51185250219491 | t
postgres         | snapshots  | snap_pg_locks          |    5980160 |     483328 |  8.08219178082192 |        100 |       483328 |  8.08219178082192 | f
postgres         | snapshots  | snap_settings          |      24576 |          0 |                 0 |        100 |            0 |                 0 | f
postgres         | snapshots  | snap_stat_activity     |    1449984 |      65536 |  4.51977401129944 |        100 |        65536 |  4.51977401129944 | t
postgres         | snapshots  | snap_statio_all_tables |   29868032 |     974848 |  3.26385079539221 |        100 |       974848 |  3.26385079539221 | t
postgres         | snapshots  | snap_user_tables       |    5472256 |     270336 |  4.94011976047904 |        100 |       270336 |  4.94011976047904 | t

What we get here is a very nice overview that shows us (all sizes in bytes) the real_size of the relation, how much is extra, and what the ratio of extra is. The last column, is_na (is not applicable), is important. Essentially, this column is true if you’re using the ‘name’ datatype. The ‘name’ datatype throws off the bloat estimation and can give you invalid data.

Since these are estimates, we have to take them with a grain of salt. You probably want to compare it to your usage rates from pg_stat_user_tables if you’re not familiar with the workload already.

Reclaiming space is a topic for another day, but take a look at VACUUM FULL (requires a full table lock for compaction) or pg_repack (an extension that reclaims space with minimal locking).
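As a sketch of those two options (the table name is hypothetical, and pg_repack additionally requires its extension to be installed in the database):

```sql
-- Option 1: VACUUM FULL rewrites the table, but holds an ACCESS EXCLUSIVE
-- lock for the duration -- nothing else can touch the table.
VACUUM FULL pgbench_accounts;

-- Option 2: pg_repack rewrites the table online, invoked from the shell:
--   pg_repack -d postgres -t pgbench_accounts
```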

Now you have some information about your bloat. At this point, you can drill in with pgstattuple, or begin targeting some suspects with more aggressive vacuums / or move to compaction.

The other, previously mentioned, query comes from the popular Nagios check script:

SELECT
 current_database() AS db,
 schemaname,
 tablename,
 reltuples::bigint AS tups,
 relpages::bigint AS pages,
 otta,
 ROUND(CASE
 WHEN otta=0
 OR sml.relpages=0
 OR sml.relpages=otta THEN 0.0
 ELSE sml.relpages/otta::numeric END,1) AS tbloat,
 CASE
 WHEN relpages < otta THEN 0
 ELSE relpages::bigint - otta
 END AS wastedpages,
 CASE
 WHEN relpages < otta THEN 0
 ELSE bs*(sml.relpages-otta)::bigint
 END AS wastedbytes,
 CASE
 WHEN relpages < otta THEN
 '0 bytes'::text
 ELSE (bs*(relpages-otta))::bigint || ' bytes'
 END AS wastedsize, iname, ituples::bigint AS itups, ipages::bigint AS ipages, iotta, ROUND(CASE
 WHEN iotta=0
 OR ipages=0
 OR ipages=iotta THEN 0.0
 ELSE ipages/iotta::numeric END,1) AS ibloat,
 CASE
 WHEN ipages < iotta THEN 0
 ELSE ipages::bigint - iotta
 END AS wastedipages,
 CASE
 WHEN ipages < iotta THEN 0
 ELSE bs*(ipages-iotta)
 END AS wastedibytes,
 CASE
 WHEN ipages < iotta THEN
 '0 bytes'
 ELSE (bs*(ipages-iotta))::bigint || ' bytes'
 END AS wastedisize,
 CASE
 WHEN relpages < otta THEN
 CASE
 WHEN ipages < iotta THEN 0
 ELSE bs*(ipages-iotta::bigint)
 END
 ELSE CASE
 WHEN ipages < iotta THEN bs*(relpages-otta::bigint)
 ELSE bs*(relpages-otta::bigint + ipages-iotta::bigint)
 END END AS totalwastedbytes
FROM
 (SELECT nn.nspname AS schemaname,
 cc.relname AS tablename,
 COALESCE(cc.reltuples,
 0) AS reltuples,
 COALESCE(cc.relpages,
 0) AS relpages,
 COALESCE(bs,
 0) AS bs,
 COALESCE(CEIL((cc.reltuples*((datahdr+ma- (CASE
 WHEN datahdr%ma=0 THEN ma
 ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)),0) AS otta, COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages, COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes ALL cols
 FROM pg_class cc
 JOIN pg_namespace nn
 ON cc.relnamespace = nn.oid
 AND nn.nspname <> 'information_schema'
 LEFT JOIN
 (SELECT ma,
 bs,
 nspname,
 relname,
 (datawidth+(hdr+ma-(CASE
 WHEN hdr%ma=0 THEN ma
 ELSE hdr%ma END)))::numeric AS datahdr, (maxfracsum*(nullhdr+ma-(CASE
 WHEN nullhdr%ma=0 THEN ma
 ELSE nullhdr%ma END))) AS nullhdr2
 FROM
 (SELECT ns.nspname,
 tbl.relname,
 hdr,
 ma,
 bs,
 SUM((1-coalesce(null_frac,0))*coalesce(avg_width,
 2048)) AS datawidth,
 MAX(coalesce(null_frac,
 0)) AS maxfracsum,
 hdr+(SELECT 1+count(*)/8
 FROM pg_stats s2
 WHERE null_frac<>0
 AND s2.schemaname = ns.nspname
 AND s2.tablename = tbl.relname ) AS nullhdr
 FROM pg_attribute att
 JOIN pg_class tbl
 ON att.attrelid = tbl.oid
 JOIN pg_namespace ns
 ON ns.oid = tbl.relnamespace
 LEFT JOIN pg_stats s
 ON s.schemaname=ns.nspname
 AND s.tablename = tbl.relname
 AND s.inherited=false
 AND s.attname=att.attname,
 (SELECT
 (SELECT current_setting('block_size')::numeric) AS bs,
 CASE
 WHEN SUBSTRING(v
 FROM '#"[0-9]+.[0-9]+#"%' for '#') IN ('8.0','8.1','8.2') THEN 27
 ELSE 23
 END AS hdr,
 CASE
 WHEN v ~ 'mingw32'
 OR v ~ '64-bit' THEN 8
 ELSE 4
 END AS ma
 FROM (SELECT version() AS v) AS foo ) AS constants
 WHERE att.attnum > 0
 AND tbl.relkind='r'
 GROUP BY 1,2,3,4,5 ) AS foo ) AS rs
 ON cc.relname = rs.relname
 AND nn.nspname = rs.nspname
 LEFT JOIN pg_index i
 ON indrelid = cc.oid
 LEFT JOIN pg_class c2
 ON c2.oid = i.indexrelid ) AS sml;

Happy PostgreSQL-ing!

To reload or restart? (with a test script!)

Even the most seasoned DBA sometimes forgets whether a setting change needs a restart or a reload. So, how do we check, in a PostgreSQL database, whether a setting needs a reload or a restart?

Config File

The quickest way is to look at the default configuration file for the version of PostgreSQL that you’re running. The default file is very well commented, the most notable comment in the file will let you know that “(change requires restart)”. See the following excerpt from postgresql.conf with the associated comments:

listen_addresses = '*'      # what IP address(es) to listen on;
          # comma-separated list of addresses;
          # defaults to 'localhost'; use '*' for all
          # (change requires restart)
port = 5436       # (change requires restart)
max_connections = 100     # (change requires restart)
#superuser_reserved_connections = 3 # (change requires restart)

If you’re new to postgres or really need a super-fast way of determining if you have to restart or not, this is the quickest way.
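If you want all of the restart-only settings at once, grep works nicely. This sketch greps a small embedded excerpt so the example is self-contained; in real life you'd point grep at your actual postgresql.conf (the `/tmp` path here is just for illustration):

```shell
# Write a small sample config so this example is self-contained;
# normally you'd grep $PGDATA/postgresql.conf directly.
cat > /tmp/postgresql.conf.sample <<'EOF'
listen_addresses = '*'      # (change requires restart)
port = 5436                 # (change requires restart)
work_mem = 4MB
EOF

# Any line tagged "(change requires restart)" needs a full restart
grep -n 'change requires restart' /tmp/postgresql.conf.sample
```

This prints the two restart-only lines (with line numbers), and skips work_mem, which is reloadable.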


There’s actually more than one way to ‘set a setting’ in postgres. The config file method (above) shows us if we need a restart or not. I’d actually like to know a bit more:

  • Can I set this per-user
  • Can I set this per-database
  • Do I need to modify this globally
  • etc…

How can we tell? Let’s look in the database at the pg_catalog.pg_settings catalog view. (This is from my 9.5.2 instance).

postgres@postgres=# \d+ pg_catalog.pg_settings
                 View "pg_catalog.pg_settings"
     Column      |  Type   | Modifiers | Storage  | Description
-----------------+---------+-----------+----------+-------------
 name            | text    |           | extended |
 setting         | text    |           | extended |
 unit            | text    |           | extended |
 category        | text    |           | extended |
 short_desc      | text    |           | extended |
 extra_desc      | text    |           | extended |
 context         | text    |           | extended |
 vartype         | text    |           | extended |
 source          | text    |           | extended |
 min_val         | text    |           | extended |
 max_val         | text    |           | extended |
 enumvals        | text[]  |           | extended |
 boot_val        | text    |           | extended |
 reset_val       | text    |           | extended |
 sourcefile      | text    |           | extended |
 sourceline      | integer |           | plain    |
 pending_restart | boolean |           | plain    |
View definition:
 SELECT a.name, a.setting, a.unit, a.category, a.short_desc, a.extra_desc, a.context, a.vartype, a.source, a.min_val, a.max_val, a.enumvals, a.boot_val, a.reset_val, a.sourcefile, a.sourceline, a.pending_restart
   FROM pg_show_all_settings() a(name, setting, unit, category, short_desc, extra_desc, context, vartype, source, min_val, max_val, enumvals, boot_val, reset_val, sourcefile, sourceline, pending_restart);
Rules:
 pg_settings_n AS
    ON UPDATE TO pg_settings
   WHERE new.name <> old.name DO INSTEAD NOTHING
 pg_settings_u AS
    ON UPDATE TO pg_settings
   WHERE new.name = old.name DO  SELECT set_config(old.name, new.setting, false) AS set_config

This catalog view gives us quite a bit of information about each setting in postgres.

Aside: If you’re grokkin’ (or hatin’ on) my psql prompt, check it out here

So, how do we use this view, in the context of ‘do I need a restart, etc…’

postgres@postgres=# select name, context, setting::varchar(25) from pg_settings;
               name                  |  context   | setting
-------------------------------------+------------+---------------
 allow_system_table_mods | postmaster | off
 application_name | user | psql
 archive_command | sighup | (disabled)
 archive_mode | postmaster | off
 archive_timeout | sighup | 0
 array_nulls | user | on
 authentication_timeout | sighup | 60
 autovacuum | sighup | on
 autovacuum_analyze_scale_factor | sighup | 0.1
 autovacuum_analyze_threshold | sighup | 50
 autovacuum_freeze_max_age | postmaster | 200000000
 autovacuum_max_workers | postmaster | 3
 autovacuum_multixact_freeze_max_age | postmaster | 400000000
 autovacuum_naptime | sighup | 60
 autovacuum_vacuum_cost_delay | sighup | 20
 autovacuum_vacuum_cost_limit | sighup | -1
 autovacuum_vacuum_scale_factor | sighup | 0.2
 autovacuum_vacuum_threshold | sighup | 50
 autovacuum_work_mem | sighup | -1
 backslash_quote | user | safe_encoding
 bgwriter_delay | sighup | 200
 bgwriter_lru_maxpages | sighup | 100
 bgwriter_lru_multiplier | sighup | 2
 block_size | internal | 8192

There are actually 247 rows that get spit out as part of that query (I’ve cut the output down to give you a cross-section).

The context column is what we’re after here. This column tells us “in what context this setting can be applied”. Let’s take a look at what contexts we have.

SELECT context,
         count(context) contextcount
FROM pg_settings
GROUP BY context
ORDER BY  contextcount DESC;
postgres@postgres=# SELECT context,
postgres-#          count(context) contextcount
postgres-# FROM pg_settings
postgres-# GROUP BY context
postgres-# ORDER BY  contextcount DESC;
      context      | contextcount
-------------------+--------------
 user              |           96
 sighup            |           55
 postmaster        |           47
 superuser         |           30
 internal          |           15
 backend           |            2
 superuser-backend |            2
(7 rows)

So, what do each of these mean?


user

This means that the configuration can be set by any one of:

  • In your psql session
postgres@postgres=# set work_mem='400MB';

NB: This will only last while you’re logged in. You can also ‘reset work_mem’ to go back to the default.

  • ALTER USER [username] set [setting] = [value]
postgres@postgres=# alter user postgres set work_mem = '400MB';

NB: Each time the user logs in, this setting will be applied

  • ALTER DATABASE [dbname] set [setting] = [value]
postgres@postgres=# alter database postgres set work_mem = '400MB';

NB: Whenever someone logs in to this database, they’ll inherit that value

NB: If you set a value at both the user and database layer, the ‘user’ setting wins.

  • Config file
  • As always, you can also set this globally by modifying the config file
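The user-vs-database precedence mentioned above is easy to demonstrate. A quick sketch (the values here are just examples):

```sql
-- Set work_mem at both the database and the user level
ALTER DATABASE postgres SET work_mem = '200MB';
ALTER USER postgres SET work_mem = '400MB';

-- In a NEW session, as user postgres connected to database postgres:
SHOW work_mem;   -- the per-user value ('400MB') wins
```

Remember that per-user and per-database values are applied at login, so existing sessions keep whatever they started with.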


sighup

This means that you can reload postgres after changing the setting.
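For example, a sighup-context setting like archive_timeout can be changed and picked up without a restart. A sketch using ALTER SYSTEM (available since 9.4; you could just as well edit the config file by hand):

```sql
-- Change a sighup-context setting
ALTER SYSTEM SET archive_timeout = '300';

-- Ask the server to re-read its configuration (no restart needed)
SELECT pg_reload_conf();

-- Verify the new value
SHOW archive_timeout;
```

You can get the same reload from the shell with `pg_ctl reload`.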


postmaster

This means that you need to restart the database instance / cluster after making the change.
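As of 9.5, pg_settings can even tell you when a changed value is still waiting on a restart, via the pending_restart column:

```sql
-- After editing a postmaster-context setting (max_connections, say)
-- and reloading, anything still waiting on a restart shows up here:
SELECT name, setting
FROM pg_settings
WHERE pending_restart;
```

If this returns no rows, your reload covered everything you changed.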


superuser

You can set these in the config file. If you are a database superuser, you can also set them in your session with the ‘set’ command.


internal

These are settings that are quite a bit harder to change. These are typically things that are

  • set in source code and compiled
  • configured at initdb time
    Either way, you cannot change these settings for the instance you’re using. You would need to either recompile or re-initdb, and then dump / restore your data. These settings (‘block_size’, for example) completely change the way that the server accesses data.
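You can list the internal settings for your own cluster directly:

```sql
-- Settings fixed at compile time or initdb time; read-only for this cluster
SELECT name, setting, short_desc
FROM pg_settings
WHERE context = 'internal';
```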


backend

These are settings that you can change in a config file, but that cannot be modified for a live session. (You can also set some of these on your connection request.) Existing sessions need to disconnect and reconnect in order to pick up the change.


superuser-backend

Same as backend above, but only applies to users with the ‘superuser’ flag.

See the PostgreSQL documentation on pg_settings for more details.

Testing setting contexts

From time to time, it’s important to see exactly how a setting behaves. This perl program will connect to a database and run forever (ctrl+c to exit). You’ll need ‘DBD::Pg’ and ‘DBI’ installed.

Aside: Installing DBD::Pg is a bit out of scope, but, you can try:

perl -MCPAN -e 'install DBI'
perl -MCPAN -e 'install DBD::Pg'

If you have permissions issues, try that as root or with ‘sudo’…

Now, you can start this program. While it’s running, go ahead and change a setting and reload; the check runs every 2 seconds and will print the current value of ‘work_mem’ (change the ‘show work_mem’ query for the setting you care about).

use strict;
use warnings;
use DBI;

# NB: fill in your own host / database / credentials here
my $dbh = DBI->connect("dbi:Pg:host= port=5432 dbname=mydb", 'user', 'mypass');
my $sth = $dbh->prepare('show work_mem');

print "Checking state of 'work_mem'\n\n";
while (1) {
    $sth->execute();
    while ( my @row = $sth->fetchrow_array() ) {
        foreach (@row) {
            print $_;
        }
        print "\n";
    }
    sleep 2;
}
Happy PostgreSQL-ing!

Announcing pgCMH

I've been kicking around the idea of founding a Columbus-based PostgreSQL User Group for a while now. I even went so far as to float the idea to people at OLF in '14. After much hemming and hawing (and no one else stepping up in the interim), I've finally gone and done it.

pgCMH is the name of my newly formed group, and we're good to go. We've got our own Twitter (@pgCMH) as well as our own MeetUp page. We've got a sponsor providing food, and another providing the meeting location. Our first meeting will be in Jan, thanks to all the scheduling conflicts the upcoming holidays create.

Watch this space for updates, follow our Twitter, and join the mailing list on MeetUp. I'd love to get your participation and input. Let's make this group as wildly successful as we can!