Incremental pgBadger

You’re probably already running pgBadger to monitor your PostgreSQL logs. However, you’re probably not running it incrementally throughout the day. Most likely, you’ve set up a cron.daily job that runs pgBadger against yesterday’s log(s). And that’s great. Except when you get the dreaded “what just happened on the db?” email. Are you going to wait until tonight’s normal run of pgBadger to see what happened? Are you going to run a ‘one off’ pgBadger against today’s logfile and wait for it to process the entire log? Or are you going to copy the log off somewhere, edit it to cut it down, and then run pgBadger against this cut-down version (hoping you left enough in the log to see proper trending)?

No, most likely you’re going to look at your actual monitoring tool that does real-time monitoring of your db and try to figure things out from there. You are running some kind of db monitoring tool, right?

However, let’s say that for, uh, reasons, you only have pgBadger at your disposal right this instant. Well, if you were making use of pgBadger’s incremental mode you could simply fire off the next scheduled run and it would only process those log entries that were new since the last run. So, for example, if you had a cron.hourly run of pgBadger it would only process the last hour’s worth of entries to update today’s report. No waiting to process multiple hours of info that you don’t need, no editing of the logfile to remove things outside the window you care about, just run it and done.

Sounds nice, right? So let’s set this up, shall we? I’m assuming you’ve already set up postgresql.conf appropriately, but if you haven’t, please go do that first. The pgBadger website has good documentation on how to do so. According to the docs:

-I | --incremental     : use incremental mode, reports will be generated by days in a separate directory, --outdir must be set.

is how we turn on incremental mode. You’ll note that we also need to specify an output dir:

-O | --outdir path     : directory where out file must be saved

I usually stick the pgBadger output into the pg_log directory. In my mind, having the logs and the report on the logs next to each other makes sense, but feel free to stick yours wherever.

Finally, we probably don’t need pgBadger reports that are too old, and the docs say we can cull the cruft automatically:

-R | --retention N     : number of week to keep in incremental mode. Default to 0, disabled. Used to set the number of weel to keep in output directory. Older weeks and days directory are automatically removed.

(Ignore the typo, it’s that way in the code)

On my servers, I have PostgreSQL setup to log into a different file for each day of the week, with automatic rotation and truncation:

log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
log_rotation_age = 1d

so my cron.hourly pgBadger looks like:

pgbadger \
    -I \
    -O $PGDATA/pg_log \
    -R 12 \
    -q \
    $PGDATA/pg_log/postgresql-$(date --date yesterday +%a) \
    $PGDATA/pg_log/postgresql-$(date +%a)

which as you can see always feeds both yesterday’s and today’s log into pgBadger (since the cron runs at 2300 and then again at 0000, we need yesterday’s log to catch that last hour). Since we’re running in incremental mode, it knows at every run where it left off in the files the last time and does a seek to skip over that data. This cuts the run time down significantly even with the PostgreSQL logging cranked up. You can see it here:

DEBUG: Starting reading file postgresql-Wed.log...
DEBUG: Start parsing at offset 412677131 of file postgresql-Wed.log to 433543395
[======================>  ] Parsed 413815537 bytes of 433543395 (95.45%), queries

As you can see, it jumps right in at 95% of the file and only processes the newest 5%. In fact, pgBadger reports “statistics gathering took: 20 wallclock secs (19.44 usr + 0.17 sys = 19.61 CPU)” on my overloaded MacBook!

So there you have it. Not counting the time it takes you to ssh to your server, it would have taken all of 20 seconds to have an updated report of what just happened on your database!

Keep in mind, this is also with a single thread. pgBadger has the ability to run multi-threaded. See the --help for details.
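
For example, the hourly job above could be spread across CPU cores with the -j option (4 jobs here is purely an illustration; size it to your own server):

pgbadger \
    -I \
    -O $PGDATA/pg_log \
    -R 12 \
    -q \
    -j 4 \
    $PGDATA/pg_log/postgresql-$(date --date yesterday +%a) \
    $PGDATA/pg_log/postgresql-$(date +%a)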


Extending Postgres by BigSQL

Recently we had a customer request to build a custom extension against the Postgres by BigSQL distribution. Even though BigSQL ships with a large set of commonly used extensions and a good collection of FDWs, these kinds of custom build requests still crop up, which is a testament to how powerful the Postgres extension model is.

BigSQL makes it easy to build custom extensions for new requirements by including all of the required header files and libraries along with the Postgres by BigSQL installation. This is one of the key benefits of the open source BigSQL distribution as opposed to a proprietary open source-based approach.

One might think this would be a challenge, because a BigSQL user is free to relocate the installation anywhere he/she wants, or to use a DevOps Sandbox, which means the Postgres-related files can live wherever the user chooses. In this blog I’ll show you how easy it is. Here is an example of building an extension where the Postgres binaries sit in my home directory (I am using the DevOps Sandbox on CentOS 6.7).

Building an Extension from Source

The extension build uses pg_config to find the location of the installed files, so the only thing we have to do is put the directory containing pg_config on our PATH, wherever that is:

export PATH=/home/vagrant/bigsql/pg95/bin:$PATH
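
A quick sanity check confirms that the sandbox’s pg_config is the one being picked up (the output shown assumes the 9.5.3 sandbox used here; your version string may differ):

$ which pg_config
/home/vagrant/bigsql/pg95/bin/pg_config
$ pg_config --version
PostgreSQL 9.5.3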

After that, building any custom extension should be a breeze.
Get the required branch of the extension:

git clone -b VERSION3_1_STABLE

Install any external library requirements

$ sudo yum install openssl-devel.x86_64
$ sudo yum install readline-devel

(of course, we need build tools like gcc and make)
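
With pg_config on the PATH and the build tools in place, the build itself is the usual PGXS dance (a sketch; the exact targets depend on the extension’s Makefile, some extensions want USE_PGXS=1 set explicitly, and no sudo is needed here because the sandbox lives under the user’s home directory):

$ cd <extension-source-directory>
$ make
$ make install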


The build should go through without any issues.

Getting an Extension from PGXN (PostgreSQL Extension Network)

The Postgres community has many extensions available, and PGXN makes it easy to find and install open source extensions published by other community members. Postgres by BigSQL works perfectly with PGXN.
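
If the pgxn command line client is not already on the machine, it can typically be pulled from PyPI (this assumes Python and pip are available; any install method for pgxnclient works just as well):

$ sudo pip install pgxnclient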

Just like in the case above, we need to add the path to pg_config before running the PGXN client:

$ export PATH=/home/vagrant/bigsql1/pg95/bin:$PATH

Now we can install any extension we need, like:

$ pgxn install temporal_tables

Once the installation completes, the extension is ready to use:

postgres=# create extension temporal_tables;

Up Next

We just covered how easy it is to build and install PostgreSQL extensions on Linux. In my next blog post, I’ll show how to build and install extensions on Windows. Postgres by BigSQL uses MinGW (and gcc) to compile Postgres and related extensions on Windows, which is great because it keeps the full tool chain open source (no proprietary compiler required) and lets extension authors support Windows without maintaining an MSVC-specific build system or making code and configuration changes.

Hello, Hola, Salut, Ciao, Hallo, Ola’

Hey everyone!

My name’s Douglas Hunley, and I recently joined OpenSCG as a Sr Architect. I’ve been working with PostgreSQL since around version 7.3.4 (back in ’03) and I’m a certified PostgreSQL DBA (and trainer), an RHCSA, and recently a certified AWS Technical Professional.  Several lifetimes ago I was also a certified Oracle DBA on 8i (what can I say? I was young and stupid). I’m really excited to be joining OpenSCG in their technical services organization and I’m pretty stoked about our BigSQL offering too! I’ll be blogging here from time to time about various PostgreSQL and OpenSCG things and I’m also now managing the @openscg twitter account, so give me a shout there if you like!

And in case I don’t see you, good afternoon, good evening, and goodnight! :)

Auditing database changes using pgaudit

Auditing database activity is a very common requirement in the database world, whether driven by regulations such as Sarbanes-Oxley or by the need to understand service-level impacts. The reason for auditing, the granularity of the audit information, and the type of auditing may differ. The PostgreSQL Audit Extension (pgaudit) provides most of the necessary features. The BigSQL project ships the ready-to-use pgaudit extension along with Postgres binaries for different platforms.


Let us explore how easy it is to set it up with Postgres 9.6.

PostgreSQL needs to load the pgaudit library during startup. To achieve this, we just specify one parameter and restart the PostgreSQL instance:

 postgres=# alter system set shared_preload_libraries='pgaudit';
$ ./pgc restart pg96
pg96 stopping
pg96 starting on port 5432

It’s good to verify this step once the Postgres instance comes back up:

postgres=# select name,setting,pending_restart from pg_settings where name='shared_preload_libraries';
           name           | setting | pending_restart
--------------------------+---------+-----------------
 shared_preload_libraries | pgaudit | f

Example Use Case

A typical case for auditing is change control in a controlled production environment. Security policies may require that every change to database structure (DDL) be audited / auditable.

Enabling DDL auditing at Instance Level

 postgres=# alter system set pgaudit.log = 'ddl';

This doesn’t require a bounce.

Note: pgaudit allows finer granularity, at the database level or even for a particular role (a quick sketch follows). Please see the official documentation for all available options and extensive features.
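
A minimal sketch of that finer granularity (the database and role names here are made up for illustration; the pgaudit.log classes are the documented ones):

postgres=# alter database salesdb set pgaudit.log = 'ddl';
postgres=# alter role auditor set pgaudit.log = 'read, write';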

Testing the DDL Audit

Let’s try creating a new table:

postgres=# create table t1 (id int,name varchar(30));

Postgres writes all audit information to the standard PostgreSQL log file, as specified by the log_directory and log_filename parameters.

Now the following audit lines appear in the log file:

2016-09-15 08:05:42 EDT [4143]: [9-1] user=postgres,db=postgres,app=psql,client=[local] LOG:  AUDIT: SESSION,2,1,DDL,CREATE TABLE,,,"create table t1 (id int,name varchar(30));",<not logged>

Let’s try altering the table:

postgres=# alter table t1 alter  column name type varchar(50);

The following line appears in the log file:

2016-09-15 08:08:10 EDT [4143]: [18-1] user=postgres,db=postgres,app=psql,client=[local] LOG:  AUDIT: SESSION,3,1,DDL,ALTER TABLE,,,alter table t1 alter  column name type varchar(50);,<not logged>

Now let’s try creating a function; I am going to use JavaScript (PL/V8):

postgres=# CREATE FUNCTION plv8max (a integer, b integer)
postgres-# RETURNS integer AS $$
postgres$#   if (a > b)
postgres$#     return a;
postgres$#   else
postgres$#     return b;
postgres$# $$ LANGUAGE plv8;

This results in the following audit entry:

2016-09-15 08:10:15 EDT [4143]: [19-1] user=postgres,db=postgres,app=psql,client=[local] LOG:  AUDIT: SESSION,4,1,DDL,CREATE FUNCTION,,,"CREATE FUNCTION plv8max (a integer, b integer)
RETURNS integer AS $$
  if (a > b)
    return a;
  else
    return b;
$$ LANGUAGE plv8;",<not logged>


The pgaudit extension can audit not only DDL but SELECTs, INSERTs, UPDATEs, etc. as well. We should also note that extensive auditing can cause excessive I/O on the system and hurt overall performance.

Integrating Cassandra, Spark, PostgreSQL and Hadoop as a Hybrid Data Platform – Part 2

In the previous post, we provided a business and architectural
background for the Postgres FDWs that we are developing for Spark,
Hadoop and Cassandra. In particular, we highlighted the key benefits of
bringing Cassandra and PostgreSQL together.

With this post, we will start taking a more technical look at the
Cassandra FDW.

The C* FDW speaks natively with Cassandra on two levels; it:

  • uses the binary CQL protocol instead of the legacy Thrift protocol.
  • directly relies on the DataStax Native C++ driver for Cassandra.

The DataStax C++ driver is performant and feature-rich; various load
balancing and routing options are available and configurable. We are
already making use of some of these features and plan to provide more of
these to our users.
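
As a small preview of what using the FDW looks like from the SQL side, here is a rough sketch (the option names shown are assumptions based on the extension’s documentation; the getting-started post will walk through this properly):

CREATE EXTENSION cassandra_fdw;

CREATE SERVER cass_serv FOREIGN DATA WRAPPER cassandra_fdw
    OPTIONS (host '127.0.0.1');

CREATE USER MAPPING FOR public SERVER cass_serv
    OPTIONS (username 'test', password 'test');

CREATE FOREIGN TABLE test_cass (id int, name text)
    SERVER cass_serv
    OPTIONS (schema_name 'example', table_name 'test', primary_key 'id');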

While exploring Cassandra as a Postgres user, the defaults such as
automatic inclusion of the ALLOW FILTERING clause are useful as they
allow gradual familiarity; especially useful in small development
environments. Our intent is to support tuning for large environments
but to default to a configuration geared toward existing PostgreSQL
users.
At this point, let us consider whether we are introducing a new SPOF by
using PostgreSQL with a Cassandra system. We believe not; a PostgreSQL
node at the edge of a Cassandra cluster – as a transactional or open-SQL
end point – is not at all the same as a central master node critical to
the operation of an entire cluster. We see some trade-offs but mostly
we see benefits of bringing PostgreSQL to Cassandra in this way as we
intend to elucidate through this series.

In the next post, we will show you how to get started with the Cassandra FDW.

Integrating Cassandra, Spark, PostgreSQL and Hadoop as a Hybrid Data Platform

Today many organizations struggle to keep up with their database
requirements, for example, to:

  • store and analyze high-velocity and rapidly-growing data such as logs,
    package tracking events, sensor readings and machine-generated data;
  • ensure 24/7 availability of customer-facing websites, services and
    apps even when a subset of their data centers, servers or data are
    unavailable;
  • support fast-growing internet-scale businesses by adding relatively
    inexpensive data servers rather than requiring million-dollar
    investments in high-end servers and storage.

Our industry is increasingly producing and exploring various Open Source
systems to provide solutions for requirements like these. However, many
such systems intending to offer degrees of Scalability and
Availability choose architectures that impose inherent limitations.

Many of these architectures have a node or a collection of nodes that
are treated as special. Think Master-Slave, NameNode-DataNode and so
forth. While each of these models serves a different set of use cases,
a common attribute across them is that they have a SPOF (Single Point
of Failure). Even when they offer some level of multiplicity to deal
with the SPOF issue, the problems continue: these special nodes can
become bottlenecks for the operations that only they are allowed to
carry out. Capacity Planning, Backup and Recovery, Fault
Tolerance, Disaster Recovery and similar areas of operation all
become more complex. Moreover, the non-special nodes are typically
underutilized or entirely passive. Many of these architectures make it
virtually impossible to achieve peta-scale, multi-thousand-node clusters
with linear growth and failure tolerance atop today’s
dynamically-orchestrated infrastructure.

Enter Cassandra – A peer-to-peer, multi-datacenter active-active,
peta-scale, fault-tolerant distributed database system. Nowadays, it is
hard not to have heard of this excellent system as its user-base
continues to grow. The key thing to realize is that its peer-to-peer
architecture is the basis for its SPOF-free operation with the
understanding that failures are the norm in clustered environments.
Cassandra is also well known for lowering the latency relative to many
other big data systems. It is in use by over 1500 organizations
including Netflix, eBay, Instagram and CERN. To get an idea of the
scale, Apple’s production deployment has been well known in the
Cassandra community to comprise 75,000 nodes storing over 10 PB but in
September last year at the Cassandra Summit, their deployment was
reported to have exceeded 100,000 nodes.

We are great believers in Cassandra and Spark and are building a hybrid
data platform bringing the benefits of these systems to PostgreSQL. We
also hope that the benefits of the PostgreSQL platform will have a wider
reach through this. Our distribution, Postgres by BigSQL, provides easy
access to these two systems through our FDW extensions CassandraFDW and
HadoopFDW. The HadoopFDW extension provides not just access to Hadoop
but also to Spark which uses the same underlying network protocol and
SQL parser.

The combined array of advanced features that these two FDWs support is
impressive: write support (INSERT/UPDATE/DELETE), predicate pushdown,
IMPORT FOREIGN SCHEMA, and JOIN pushdown. We believe that of all the
externally-maintained FDWs, these two FDW extensions represent the
cutting-edge in terms of the PostgreSQL FDW technology as an
implementation of SQL/MED for big data systems.

With that context, we will focus on the CassandraFDW in the next blog
post in this series.

pgBackRest with PostgreSQL Sandbox on Debian / Ubuntu

pgBackRest is one of the most powerful backup solutions available for PostgreSQL. It has enterprise-level features like compression, multiple channels (threads) of backup execution, incremental and differential backups, etc.
The official documentation is Debian-centric in its focus. I wanted to test it out with the PostgreSQL sandbox from the BigSQL project.

Setting up PostgreSQL Sandbox and Installing pgBackRest

The BigSQL Project makes it easy to install and set up PostgreSQL and its associated components across different operating systems. In this document, we are going to look at how to set it up on Ubuntu 14.04. Linux binaries of the sandbox can be downloaded from the BigSQL download page.

The sandbox installation requires only the unpacking of the downloaded file.

tar -xvf bigsql-9.5.3-5-linux64.tar.bz2
cd bigsql/

Using the command line utility (pgc) supplied with the sandbox, it’s very easy to initialize and start a PostgreSQL instance.

./pgc init pg95
./pgc start pg95

A PostgreSQL instance should now be up and running.
The same pgc utility can be used to install pgBackRest.

./pgc install backrest

Install Perl Dependencies

An important aspect to keep in mind is that pgBackRest is written in Perl and has many dependencies on different Perl libraries and modules.
An easy way to install all the dependencies in one shot is to instruct the apt-get utility to install one of the leaf components in the dependency chain.

sudo apt-get install libdbd-pg-perl

This command should fetch all the perl dependencies of pgBackRest.

Setting Up a Backup Repository Directory

Set up a backup repository directory for pgBackRest with the following commands.

sudo mkdir /var/log/pgbackrest
sudo chmod 750 /var/log/pgbackrest

IMPORTANT for this test:

  1. pgbackrest and the postgres server process should run as the same OS user.
  2. The backup repository directory should be owned by the same OS user.

Change the ownership of the repository directory to the user under which the postgres process is running. If the user is “postgres” and the group is “postgres” then:

sudo chown -R postgres:postgres /var/log/pgbackrest

pgBackRest Configuration

sudo vi /etc/pgbackrest.conf

Append the following entries to this file.
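
Something along these lines works for this walkthrough (a sketch using the pgBackRest 1.x option names; the db-path shown is an assumption, so point it at your actual data directory):

[global]
repo-path=/var/log/pgbackrest

[demo]
db-path=/home/postgres/bigsql/data/pg95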



Note: if the entries already exist, modify them accordingly.

Change the ownership of this configuration file to the OS user that owns the postgres and pgbackrest processes:

sudo chown -R postgres:postgres /etc/pgbackrest.conf
sudo chmod 640 /etc/pgbackrest.conf

Modifying Database Parameters

The archive_command needs to be modified to use pgbackrest. If the pgbackrest executable isn’t on the PATH, make sure the full path is specified:

alter system set archive_command = '/home/postgres/bigsql/backrest/bin/pgbackrest --stanza=demo archive-push %p';

A few other parameters that are also important for the proper working of pgBackRest:

alter system set archive_mode=on;
alter system set listen_addresses = '*';
alter system set max_wal_senders=3;
alter system set wal_level = 'hot_standby';

Modification of all these parameters requires a restart of the PostgreSQL instance.

./pgc restart pg95

In the event that our operating system user doesn’t exist as a superuser in our database, we need to create the user and assign superuser privileges:

postgres=# create user vagrant with password 'vagrant';
postgres=# alter user vagrant with superuser;

Backing up database using pgBackRest

pgBackRest uses the .pgpass file for authentication.
Add a line to .pgpass with the password of the superuser in the following format:
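
The standard .pgpass format is hostname:port:database:username:password. For the vagrant superuser created in this walkthrough, the entry would look something like this (the wildcards and password are illustrative; adjust them to your setup):

*:5432:*:vagrant:vagrant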


Once this is done, we are ready to back up the PostgreSQL instance.

backrest/bin/pgbackrest --stanza=demo --db-socket-path=/tmp --log-level-console=info backup

Restoring from backup

Imagine a scenario where the files in your data directory are corrupt or lost and you want to restore it from backup.
The first step is to bring down the PostgreSQL instance. This should release all file descriptors pointing to the current data directory.

Clean up the Data directory:
Before restoring the backup, make sure that the data directory is clean and is stored on a reliable medium. The full path to the new data directory should be the same as the previous one (we can override this default, but for the sake of simplicity let’s assume that the location remains the same).

Run the pgBackRest “restore” command to restore the data directory from the latest backup.

backrest/bin/pgbackrest --stanza=demo --db-socket-path=/tmp --log-level-console=info restore

Now we should be able to start up the PostgreSQL instance with the restored data directory.

./pgc start pg95

Our PostgreSQL cluster is now back online from the backup we restored.

MySQL Foreign Data Wrapper : A quick tour

Data centers are no longer dominated by a single DBMS. Many companies have heterogeneous environments and may want their Postgres database to talk to other database systems. Foreign Data Wrappers can be the right solution for many scenarios. The BigSQL Project provides a well tested, ready to use MySQL FDW with Postgres. This makes life easy for a DevOps or DataCenter person.

Here is a quick tour on how to configure Foreign Data Wrappers for MySQL, so that Postgres can query a MySQL table. For this quick guide, I use a CentOS Linux machine. This, or a similar setup, should work fine on all other operating systems.

Setting up a MySQL server for the test

In this demo I’m going to create a table in MySQL which should be available to Postgres through the FDW.
The FDW can talk to any MySQL distribution including Oracle’s MySQL, Percona Server or MariaDB. I’m going to use MariaDB, which is more community friendly.

Install MariaDB Server and Start the service

$ sudo yum install mariadb-server.x86_64
$ sudo systemctl start mariadb

Connect as root user of mariadb and create a database

$ mysql -uroot
MariaDB [(none)]> create database postgres;

Connect to Database and create a table

MariaDB [(none)]> use postgres;
MariaDB [postgres]> create table t1m(id int,name varchar(30));

Insert some data in the table:

MariaDB [postgres]> insert into t1m values (1,'abc');
Query OK, 1 row affected (0.04 sec)

MariaDB [postgres]> insert into t1m values (2,'def');
Query OK, 1 row affected (0.00 sec)

MariaDB [postgres]> insert into t1m values (3,'hij');
Query OK, 1 row affected (0.03 sec)

Setting up Postgres Database

Install Postgres

For this test, I’m going to use the Postgres DevOps Sandbox from the BigSQL project.
Download the Sandbox from BigSQL
Since this is a sandbox, you just need to unpack it

$ tar -xvf bigsql-9.5.3-5-linux64.tar.bz2

Install MySQL FDW

Go to the unpacked directory and invoke the bigsql command line tool to install MySQL FDW

$ cd bigsql
$ ./pgc list
Category      Component            Version   Status          Port  Updates
PostgreSQL    pg92                 9.2.17-5  NotInstalled
PostgreSQL    pg93                 9.3.13-5  NotInstalled
PostgreSQL    pg94                 9.4.8-5   NotInstalled
PostgreSQL    pg95                 9.5.3-5   NotInitialized
Extensions    cassandra_fdw3-pg95  3.0.0-1   NotInstalled
Extensions    hadoop_fdw2-pg95     2.5.0-1   NotInstalled
Extensions    mysql_fdw2-pg95      2.1.2-1   NotInstalled
Extensions    oracle_fdw1-pg95     1.4.0-1   NotInstalled
Extensions    orafce3-pg95         3.3.0-1   NotInstalled
Extensions    pgtsql9-pg95         9.5-1     NotInstalled
Extensions    pljava15-pg95        1.5.0-1   NotInstalled
Extensions    plv814-pg95          1.4.8-1   NotInstalled
Extensions    postgis22-pg95       2.2.2-2   NotInstalled
Extensions    slony22-pg95         2.2.5-2   NotInstalled
Extensions    tds_fdw1-pg95        1.0.7-1   NotInstalled
Servers       bam2                 1.5.0     NotInstalled
Servers       cassandra30          3.0.6     NotInstalled
Servers       hadoop26             2.6.4     NotInstalled
Servers       hive2                2.0.1     NotInstalled
Servers       pgbouncer17          1.7.2-1   NotInstalled
Servers       pgha2                2.1b      NotInstalled
Servers       pgstudio2            2.0.1-2   NotInstalled
Servers       spark16              1.6.1     NotInstalled
Servers       tomcat8              8.0.35    NotInstalled
Servers       zookeeper34          3.4.8     NotInstalled
Applications  backrest             1.02      NotInstalled
Applications  birt                 4.5.0     NotInstalled
Applications  ora2pg               17.4      NotInstalled
Applications  pgbadger             8.1       NotInstalled
Frameworks    java8                8u92      NotInstalled
$ ./pgc install mysql_fdw2-pg95
Get:1 mysql_fdw2-pg95-2.1.2-1-linux64
Unpacking mysql_fdw2-pg95-2.1.2-1-linux64.tar.bz2

Note: We can use the same command line tool to initialize a new Postgres cluster.

$ ./pgc init pg95

## Initializing pg95 #######################

Superuser Password [password]:
Confirm Password:
Giving current user permission to data dir

Initializing Postgres DB at:
-D "/home/vagrant/bigsql/data/pg95"

Using PostgreSQL Port 5432

Password securely remembered in the file: /home/vagrant/.pgpass

to load this postgres into your environment, source the env file:

Create the extension in the postgres database

create extension mysql_fdw;

Create foreign server

postgres=# CREATE SERVER mysql_svr
postgres-# FOREIGN DATA WRAPPER mysql_fdw
postgres-# OPTIONS (host 'localhost', port '3306');

Create foreign table

postgres=# CREATE FOREIGN TABLE mysql_tab (
postgres(# id int,
postgres(# name varchar(30)
postgres(# )
postgres-# SERVER mysql_svr
postgres-# OPTIONS (dbname 'postgres', table_name 't1m');

Create user mapping

postgres=# CREATE USER MAPPING FOR postgres
postgres-# SERVER mysql_svr
postgres-# OPTIONS (username 'root');

(If your user requires password authentication to MySQL, you have to pass that as well, in the form OPTIONS (username 'username', password 'password').)

Now everything is set. You can test by querying the table:

postgres=# select * from mysql_tab;
 id | name
----+------
  1 | abc
  2 | def
  3 | hij
(3 rows)
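
To see what is actually being pushed down to MySQL, EXPLAIN VERBOSE on the foreign table is handy (the exact plan output varies with the FDW version, so this is just the command to try):

postgres=# explain verbose select * from mysql_tab where id = 2;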

Note: The MySQL FDW for Postgres requires the MySQL client library. Please make sure it is present on the LD_LIBRARY_PATH. If the installed file has a different (for example, versioned) name, you may have to create a softlink with the name the FDW expects, typically libmysqlclient.so.