Extending Postgres by BigSQL

Recently we had a customer request to build a custom extension against the Postgres by BigSQL distribution. Even though BigSQL ships with a large set of commonly used extensions and a good collection of FDWs, these kinds of custom build requirements always crop up, given how powerful the Postgres extension model is.

BigSQL makes it easy to build custom extensions for new requirements by including all of the required header files and libraries along with the Postgres by BigSQL installation. This is one of the key benefits of the open source BigSQL distribution as opposed to a proprietary open source-based approach.

One may think this would be a challenge, because a BigSQL user has the freedom to relocate the installation anywhere he/she wants, or to use a DevOps Sandbox. That means the Postgres-related files can live anywhere the user chooses. In this blog I’ll show you how easy it is. Here is an example of how to build extensions where the Postgres binaries sit in my home directory (I am using the DevOps Sandbox on CentOS 6.7).

Building extension from Source

The extension build uses pg_config to work out the location of the required files, so the only thing we have to do is put the directory containing pg_config on the PATH, wherever it is:

export PATH=/home/vagrant/bigsql/pg95/bin:$PATH

After that, building any custom extension should be a breeze.
Get the required branch of the extension:

git clone -b VERSION3_1_STABLE https://github.com/ossc-db/pg_bulkload.git

Install any external library requirements:

$ sudo yum install openssl-devel.x86_64
$ sudo yum install readline-devel

(of course, we need build tools like gcc and make)
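With pg_config on the PATH, a typical PGXS-style build and install should look something like this (no sudo is needed for the install step, since the BigSQL installation lives in the user's home directory):

$ cd pg_bulkload
$ make
$ make install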


The build should go through without any issues.

Getting Extension from PGXN (PostgreSQL Extension Network)

The Postgres community has many extensions available, and PGXN makes it easy to find and install open source extensions published by other community members. Postgres by BigSQL works perfectly with PGXN.

Just like in the case above, we need to add the path to pg_config before running the PGXN client:

$ export PATH=/home/vagrant/bigsql1/pg95/bin:$PATH
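If the PGXN client itself is not installed yet, it is distributed as a Python package, so something along these lines should work (exact package names can vary by distribution):

$ sudo yum install python-pip
$ sudo pip install pgxnclient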

Now we can install any extension we need, for example:

$ pgxn install temporal_tables

Once the installation completes, the extension is ready to use:

postgres=# create extension temporal_tables;
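To give a feel for what the extension does, here is a minimal sketch based on the temporal_tables documentation; the table and column names are just illustrative:

CREATE TABLE employees (
    name       text PRIMARY KEY,
    salary     numeric,
    sys_period tstzrange NOT NULL DEFAULT tstzrange(current_timestamp, null)
);

-- history table that will receive old row versions
CREATE TABLE employees_history (LIKE employees);

-- the versioning() trigger shipped with temporal_tables keeps sys_period
-- and the history table up to date on every INSERT/UPDATE/DELETE
CREATE TRIGGER versioning_trigger
    BEFORE INSERT OR UPDATE OR DELETE ON employees
    FOR EACH ROW EXECUTE PROCEDURE versioning('sys_period', 'employees_history', true);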

Up Next

We just covered how easy it is to build and install PostgreSQL extensions on Linux. In my next blog post, I’ll show how to build and install extensions on Windows. Postgres by BigSQL uses MinGW (and gcc) to compile Postgres and related extensions on Windows, which is great because it keeps the full toolchain open source (no proprietary compiler required) and lets extension authors support Windows without maintaining an MSVC-specific build system or making code and configuration changes.

Relocatable PG RPMs

Relocatable PostgreSQL RPMs – You asked and we delivered!

In our years of dealing with PostgreSQL users, one of the main gripes they have with PostgreSQL is not a missing feature or capability, but rather the fact that the provided RPMs are not relocatable. The reasons for wanting relocatable RPMs are many: the company they work for has specific standards, they are installing in a lab alongside other installations and want to keep their install separate, or they simply want to place things in a more logical/accessible place.

Before now, the only ways available to users to do this were to use the graphical installer or to build Postgres from source. If you are setting up a DevOps environment on Linux or in the cloud, a graphical installer doesn’t work well and, let’s be honest, who wants to be pestered with advertising while doing an installation? Building Postgres from source is certainly another option, but most DBAs don’t have the time to work through that process and handle any errors that might occur. They could hand the task off to their dev or sysadmin team, but then they have to wait for that to happen and/or lose control of the way it is built. And besides, you want a consistent binary build on each server in your organization, not a compiled-from-source version that uses different versions of libraries or config settings on different servers over time.

So, in our Postgres by BigSQL distribution, we decided to make it easy for users to install Postgres wherever they want, simply by using the 100% open source RPM that we build and make available to everyone. Our RPM does not modify system libraries and by default installs to a user-friendly location.


But you have total control - you simply use the --prefix <path> option to tell the package manager to install the software where you want. For example:
# rpm -ivh --prefix /home/postgres/PG96 postgresql-9.6.0-1-x64-bigsql.rpm
warning: postgresql-9.6.0-1-x64-bigsql.rpm: Header V3 DSA/SHA1 Signature, key ID 38379aec: NOKEY
Preparing...                   ################################# [100%]
Updating / installing...
   1:postgresql96-9.6.0-1      ################################# [100%]
PostgreSQL 9.6 is now installed in /home/postgres/PG96.

To initialize, run /etc/init.d/postgresql-96 start as root user.

And there you have it….PostgreSQL installed where you want it!

Now that you have PostgreSQL installed, be sure to take the next steps and initialize the PostgreSQL database and start using it.
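As a rough sketch of those next steps, using the example prefix above (the psql path is an assumption about where the binaries land under your chosen prefix):

# /etc/init.d/postgresql-96 start
# su - postgres
$ /home/postgres/PG96/bin/psql postgres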





Holy Easy PostgreSQL deployment

Holy Easy PostgreSQL deployment!

In case you missed it, the BigSQL team released an awesome package manager for installing and configuring PostgreSQL and many related, useful components. The package manager can be found here: https://www.bigsql.org/package-manager.jsp.

Playfully named pgc, for ‘pretty good command line’, it is a utility similar to yum or apt-get that allows you to install, configure, update and manage Postgres-related components, including foreign data wrappers, stored procedure languages, connectors, devops tools, HA tools and monitoring tools. Common uses:

  • Provisioning Postgres (9.2 through 9.6, including multiple versions on the same server)
  • Installing pgBouncer, Backrest, and other community projects
  • Scripting configurations in Chef or other devops tools

pgc runs on Linux, Windows and OS X and supports the exact same CLI on each, so it is an ideal provisioning/management tool for multi-OS environments.
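For a flavor of the CLI, a typical session looks something like the following (component names can vary between releases):

$ ./pgc list                # show available and installed components
$ ./pgc install pg96        # install PostgreSQL 9.6
$ ./pgc start pg96          # start the new instance
$ ./pgc update              # refresh the list of available updates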

pgc not only allows you to get and install these components, but you can also use it to update each component as new updates become available.

Auditing database changes using pgaudit

Auditing database activity is a very common requirement in the database world, whether driven by regulations such as Sarbanes-Oxley or by the need to track service-level impacts. The reason for auditing, the granularity of the audit information and the type of auditing may differ, but the PostgreSQL Audit extension (pgaudit) provides most of the necessary features. The BigSQL project ships a ready-to-use pgaudit extension along with the Postgres binaries for different platforms.


Let us explore how easy it is to set it up with Postgres 9.6.

PostgreSQL needs to load the pgaudit library during startup. To achieve this, we just need to set one parameter and restart the PostgreSQL instance:

 postgres=# alter system set shared_preload_libraries='pgaudit';
$ ./pgc restart pg96
pg96 stopping
pg96 starting on port 5432

It's good to verify this step once the Postgres instance comes up:

postgres=# select name,setting,pending_restart from pg_settings where name='shared_preload_libraries';
           name            | setting | pending_restart
---------------------------+---------+-----------------
 shared_preload_libraries  | pgaudit | f

Example Use Case

A typical case of auditing is change control in a controlled production environment. Security policies may require that every change to the database structure (DDL) be audited.

Enabling DDL auditing at Instance Level

 postgres=# alter system set pgaudit.log = 'ddl';

This doesn’t require a bounce.

pgaudit allows us to have more granularity, at the database level or even for a particular role. Please see the official documentation for all the available options and features.
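For example, the same setting can be scoped to a single database or to a particular role (the database and role names below are just placeholders):

postgres=# alter database sales_db set pgaudit.log = 'ddl, write';
postgres=# alter role app_admin set pgaudit.log = 'all';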

Testing the DDL Audit

Let's try creating a new table:

postgres=# create table t1 (id int,name varchar(30));

Postgres keeps all audit information in PostgreSQL's standard log file, as specified by the log_directory and log_filename parameters.

Now the following audit lines appear in the log file:

2016-09-15 08:05:42 EDT [4143]: [9-1] user=postgres,db=postgres,app=psql,client=[local] LOG:  AUDIT: SESSION,2,1,DDL,CREATE TABLE,,,"create table t1 (id int,name varchar(30));",<not logged>

Let’s try altering the table:

postgres=# alter table t1 alter  column name type varchar(50);

The following line appears in the log file:

2016-09-15 08:08:10 EDT [4143]: [18-1] user=postgres,db=postgres,app=psql,client=[local] LOG:  AUDIT: SESSION,3,1,DDL,ALTER TABLE,,,alter table t1 alter  column name type varchar(50);,<not logged>

Now let's try creating a function; I am going to use JavaScript (V8):

postgres=# CREATE FUNCTION plv8max (a integer, b integer)
postgres-# RETURNS integer AS $$
postgres$#   if (a > b)
postgres$#     return a;
postgres$#   else
postgres$#     return b;
postgres$# $$ LANGUAGE plv8;

This results in the following audit entry:

2016-09-15 08:10:15 EDT [4143]: [19-1] user=postgres,db=postgres,app=psql,client=[local] LOG:  AUDIT: SESSION,4,1,DDL,CREATE FUNCTION,,,"CREATE FUNCTION plv8max (a integer, b integer)
RETURNS integer AS $$
if (a > b)
return a;
else
return b;
$$ LANGUAGE plv8;",<not logged>


The pgaudit extension can audit not only DDL, but SELECTs, INSERTs, UPDATEs, etc. as well. We should also note that extensive auditing can cause excessive I/O and hurt overall system performance.
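One way to keep the audit volume down is pgaudit's object-level auditing, which limits logging to specific tables rather than every statement in the session. Roughly, following the model in the official documentation (the auditor role name is a placeholder), it looks like this:

postgres=# create role auditor;
postgres=# alter system set pgaudit.role = 'auditor';
postgres=# select pg_reload_conf();
postgres=# grant select, update on t1 to auditor;

From this point on, SELECTs and UPDATEs against t1 show up in the log as OBJECT audit entries.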

How to install and configure the Oracle Foreign Data Wrapper


Preinstall Checklist:

   – If you have not already, please download and install the “Microsoft Visual C++ 2015 Redistributable” and a Java Runtime Environment (version 7 or 8).

Oracle Instant Client install and config:

          01. Download the Oracle Instant Client packages and unzip them into the same directory:




          02. Create an “Oracle Instant Client” directory in Program Files and copy instantclient_12_1 into it.

          03. Double click on “odbc_install.exe”

          04. Create system variables:

                   name: TNS_ADMIN

                   value: C:\Program Files\Oracle Instant Client\instantclient_12_1

                   name: ORACLE_HOME

                   value: C:\Program Files\Oracle Instant Client\instantclient_12_1

          05. Edit a system variable (append to the existing value):

                   name: PATH

                   value (append): C:\Program Files\Oracle Instant Client\instantclient_12_1

          06. (optional) Attempt to run “sqlplus” from the command line. If all is properly configured, you should be prompted to enter a username/password. You can exit “sqlplus” at this point if you wish and continue with this guide.

NOTE 1: For Windows 7, to edit system variables, right click on “My Computer”, then click on “Advanced System Settings” in the left of the screen. A popup window titled “System Properties” will appear. Click on the “Advanced” tab and look to the bottom of the window. Above “OK” and “Cancel”, there will be an option titled “Environment Variables”. Click on it and find the pane “System Variables” in the newly created popup window. Here you can create and edit your system variables.

NOTE 2: If you receive an error about “msvcr100.dll” being missing, it may be necessary to copy this dll out of your /Java/VERSION/bin folder and into your Oracle Instant Client directory.

The steps below assume PostgreSQL has previously been installed and properly configured.

Configure the Oracle FDW

          01. Open up a PSQL shell, and enter:

                   “CREATE EXTENSION oracle_fdw;”

          Expected Output: CREATE EXTENSION

Example configuration

The instructions below are an example of how to set up PostgreSQL to connect to an Oracle database through the Oracle Foreign Data Wrapper. Your specific settings will be different, but the syntax should remain the same. For our purposes, I will show you how to connect to an Oracle database with the “SCOTT” schema installed. This part of the how-to assumes your Oracle database is already set up and configured to accept remote database connections.

          01. CREATE SCHEMA scott;

                   Expected Output: CREATE SCHEMA

          02. CREATE SERVER oracle FOREIGN DATA WRAPPER oracle_fdw OPTIONS (dbserver '//' );

                   Expected Output: CREATE SERVER

          03. CREATE USER MAPPING FOR postgres SERVER oracle OPTIONS (user 'SCOTT', password 'tiger');

                   Expected Output: CREATE USER MAPPING

          04. IMPORT FOREIGN SCHEMA “SCOTT” from SERVER oracle INTO scott;

                   Expected Output: IMPORT FOREIGN SCHEMA

          05. set search_path='scott';

                   Expected Output: SET

Now we can explore the imported data.
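For example (assuming the import's default case folding mapped the Oracle table SCOTT.EMP to a foreign table named emp):

postgres=# \det scott.*
postgres=# SELECT empno, ename, sal FROM emp LIMIT 5;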


Building PostgreSQL on Windows for Performance

Before PostgreSQL 9.1, adding additional features like data types was non-standard and awkward, but 9.1 brought extensions. With extensions, as well as the plumbing for things like foreign data wrappers, their use has exploded over the last few years. This is great for people who are using PostgreSQL on Linux or similar environments, but people on Windows have been pretty much out of luck. Most people have to hope the extension they want to use is popular enough that someone skilled with Windows build environments decided to release it on Windows. It just was not as simple as on Linux, where you just run a “make install”.

For years, it has been possible to replicate the Linux build experience on Windows with the MinGW project, but historically the resulting binaries have been significantly slower than the binaries produced by the Microsoft compiler. Since I last played with compilers on Windows, I have been pleasantly surprised to find that the open source build tools have caught up with the commercial tools and in some ways surpassed them.

To check the performance of the resulting binaries from the compilers, I tried a couple of CPU intensive tests. The first one was just a SELECT only pgbench test. The second was running the square roots of random numbers and then sorting the results. For the Windows binaries built with Microsoft Visual Studio, I used the PostgreSQL Windows installer from the PostgreSQL community site at http://www.postgresql.org/download/windows/. The Mingw-w64 build came from the bigsql.org site at http://www.bigsql.org/se/postgresql/packages.jsp

The version strings of the 2 binaries are below.

test=# select version();
 PostgreSQL 9.5.1, compiled by Visual C++ build 1800, 64-bit
(1 row)

test=# select version();
 PostgreSQL 9.5.1 on x86_64-pc-mingw64, compiled by gcc.exe (Rev5, Built by MSYS2 project) 4.9.2, 64-bit
(1 row)

For these tests, the only tuning of the postgresql.conf file was to set shared_buffers to 2GB. All other settings were left at the defaults.
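In other words, the only line changed in postgresql.conf for both builds was:

shared_buffers = 2GB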

The SELECT only pgbench test was conducted with the following command:

pgbench -n -S -c 10 -t 100000 -U postgres test

MSVC: 34246 TPS
GCC:  34706 TPS

That is a minor 1.3% performance improvement. It is not a ground-breaking gain, but the gcc-built binaries were consistently faster.

The second test was a bit more CPU intensive running the following query:

select sqrt((random() * g)::numeric) from generate_series(1, 1000) g order by 1;

This was put in a text file and run through pgbench with the following command:

pgbench.exe -n -f test.sql -c 8 -t 10000 -U postgres test

This resulted in a larger performance gain of 3.8%.

I started this testing trying to find an easier way to build extensions on Windows, hoping not to take too much of a performance hit, but ended up finding that the easier way was actually a little faster. I am thinking it's time to ditch those proprietary build tools and stick with open source.


Stinger Delivers Fast Hive SQL via YARN & TEZ

The latest BigSQL bundle now includes Stinger and Tez to dramatically improve performance and extend the SQL dialect. In the latest distribution of BigSQL Release 2, it is extremely easy to get started exploring Tez and Stinger in your own sandbox environment. Taking advantage of YARN, Stinger is now delivering on its three main goals of Speed, SQL Enhancements & Scale.



The Stinger initiative (Hive 0.13 & Tez 0.3) calls for a 100X performance improvement over current Hive (Hive 0.10) batch queries. Key developments helping to achieve these results are:

  • Leveraging ORC Files: these columnar formats allow for faster reads, writes and processing of Hive data. The format supports predicate pushdown, compression, and even allows lightweight indexes to be stored within the files.
  • Continued Optimizations in the Hive Query Engine: enhancements such as avoiding unneeded writes, support for hot containers (saving launch/shutdown overhead), an in-memory cache, and leveraging richer Metastore statistics have all helped improve latency and throughput.



Full support for the DECIMAL, TIMESTAMP, DATE & VARCHAR datatypes, as well as a rich set of semantic improvements.



Designed for large-scale data environments, Stinger's goal is to deliver even when you are analyzing petabytes of data.


Stinger in Action!

Below is a screenshot from the BigSQL YARN console showing the execution of a Tez job.

At OpenSCG we focus on helping you get the most value and leverage out of your data environments. It's an exciting time in data, and we thrive on helping our clients understand the latest changes in data technology. We realize that there are great opportunities in new NoSQL and clustered data solutions and take pride in helping you evaluate your data options.

tPostgres Features Sybase & SQL Server Compatibility

We are pleased to announce version 9.3.2 of tPostgres. Find out more at http://www.tpostgres.org

The transformation of the tPostgres: Sybase & SQL Server Compatibility project is now complete. We are fully based on the PGTSQL open source PostgreSQL extension and are fully working on the very latest and greatest point release of PG (9.3.2).

Additionally, we have done away with the bulky and awkward 32-bit Linux virtual machine. We now have safe and simple-to-use native command line sandbox installers for Windows, OS X & Linux.

We’ve still got documentation work to do, but there are excellent Transact-SQL programming examples in the PGTSQL regression tests. Please take a look for yourself and consider contributing to the project at https://bitbucket.org/openscg/pgtsql



Oozie- Hadoop Workflow Scheduler in BigSQL

This tutorial will show you how to create an Oozie workflow using either BigSQL Hue or the Oozie command line tools, in the latest version of BigSQL. Oozie is a workflow scheduler that allows you to manage Hadoop jobs. You can also create an Oozie workflow that incorporates shell scripts, map-reduce applications, and even Pig, Sqoop, and Hive jobs. This tutorial will just use the sample map-reduce word count job from Oozie.

First, you will need to set up the workflow.xml and libraries for the job:

   $ cd $BIGSQL_HOME/examples/more/oozie 
   $ tar -xvf oozie-examples.tar.gz

This example is in the examples/apps/map-reduce directory. The input data is examples/input-data/text/data.txt. You will need to make two changes in the file examples/apps/map-reduce/job.properties (a sketch of the edited line follows this list):

  • change localhost to your computer’s host name
  • change the port number from 8020 to 9000
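After those edits, the nameNode line in job.properties would look something like this (the host name is a placeholder for your own):

# before
nameNode=hdfs://localhost:8020
# after
nameNode=hdfs://your-hostname:9000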

Now, move the whole example directory to your user directory in HDFS:

   $ hadoop fs -put examples /user/cadym/.

If you want to just run the map-reduce example from the command line you would then use this command:

   $ $OOZIE_HOME/bin/oozie job -oozie http://localhost:11000/oozie -config examples/apps/map-reduce/job.properties -run

To check the status of that job, you can use the command:

   $ $OOZIE_HOME/bin/oozie job -oozie http://localhost:11000/oozie -info <Job ID>

The file workflow.xml in examples/apps/map-reduce is the description of the workflow. For example, the line below shows that when preparing to start the job, the output directory will be deleted. The job.properties file will pass variables, such as the nameNode address, the root example directory, and the output directory to the workflow.xml file.

     <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}"/>

The configuration section of workflow.xml defines the settings for the map-reduce job, such as the mapper class that will be used.


Next, the workflow.xml file explains what needs to happen after the mapreduce action finishes. Since this is a simple example, after the mapreduce job it will just go to the end rather than to another step. If there is an error, the application will be killed with the error message shown below:

     <ok to="end"/>
     <error to="fail"/>
<kill name="fail">
   <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>

To run this Oozie job in Hue, you can just click on editor from the workflow menu in the top tool bar. Once on that page, you can click on the import button on the right side of the screen.

Here, you can give the workflow a name. Then, next to “Local workflow.xml file”, you will need to click browse and select the examples/apps/map-reduce/workflow.xml file. Next, click import.

Now, you need to edit the properties of the mr-node step of your workflow.

All you need to add here is the location of the jar file for this workflow. Click on the “..” after the “Jar name” field.

Select the jar: examples/apps/map-reduce/lib/oozie-examples- Then click done.

Now, you can save your workflow and click submit on the left toolbar. A window will pop up asking you to specify the workflow variables.

This will bring you to the Oozie Dashboard where you can track the progress of each step in your workflow.

Once the job finishes, you can go to the file browser and see a word count file in the examples/output-data/map-reduce directory.