Screen Shot 2016-10-14 at 10.35.27 AM

Holy Easy PostgreSQL deployment

Holy Easy PostgreSQL deployment!

In case you missed it, the BigSQL team released an awesome package manager for installing and configuring PostgreSQL and many related, useful components. The package manager can be found here:

Playfully named pgc, for ‘pretty good command line’, pgc is a utility similar to yum or apt-get that allows you to install, configure, update and manage Postgres related components including foreign data wrappers, stored procedure languages, connectors, devops tools, HA tools and monitoring tools. Common uses:

  • Provision Postgres (9.2 through 9.6, including multiple versions on same server)
  • Installing pgBouncer, Backrest, and other community projects
  • Scripting configurations in chef or other devops tools

PGC runs on Linux, Windows and OS X and supports the same exact cli so it is an ideal provisioning/management tool for multi OS environments.

PGC not only allows you to get and install these components but you can use pgc to update each component as new updates become available. Read more

Integrating Cassandra, Spark, PostgreSQL and Hadoop as a Hybrid Data Platform – Part 2

In the previous post, we provided a business and architectural
background for the Postgres FDWs that we are developing for Spark,
Hadoop and Cassandra. In particular, we highlighted the key benefits of
bringing Cassandra and PostgreSQL together.

With this post, we will start taking a more technical look at the
Cassandra FDW.

The C* FDW speaks natively with Cassandra on two levels; it:

  • uses the binary CQL protocol instead of the legacy Thrift protocol.
  • directly relies on the DataStax Native C++ driver for Cassandra.

The DataStax C++ driver is performant and feature-rich; various load
balancing and routing options are available and configurable. We are
already making use of some of these features and plan to provide more of
these to our users.

While exploring Cassandra as a Postgres user, the defaults such as
automatic inclusion of the ALLOW FILTERING clause are useful as they
allow gradual familiarity; especially useful in small development
environments. Our intent is to support tuning for large environments
but to default to a configuration geared toward existing PostgreSQL

At this point, let us consider whether we are introducing a new SPOF by
using PostgreSQL with a Cassandra system. We believe not; a PostgreSQL
node at the edge of a Cassandra cluster – as a transactional or open-SQL
end point – is not at all the same as a central master node critical to
the operation of an entire cluster. We see some trade-offs but mostly
we see benefits of bringing PostgreSQL to Cassandra in this way as we
intend to elucidate through this series.

In the next post, we will show you how to get started with the Cassandra

Improvement in read-write transactions turning off update_process_title

Improve PostgreSQL on Windows performance by 100%

It sounds like click-bait, or one of those late night TV promotions – “Improve your database performance by 100% – by changing just this one setting!” But in this case, it’s true – you can drastically improve PostgreSQL on Windows performance by changing one configuration setting – and we made this the default in our Postgres by BigSQL distribution for 9.2 thru 9.6.

tl;dr – if you have high query load, change “update_process_title” to ‘off’ on Windows, and get 100% more throughput.

Improve postgresql performance by turning off update_process_title

Performance Improvement by turning off update_process_title

Most Postgres DBA’s already know that they need to tune settings for shared buffers, WAL segments, checkpoints, etc, to get the best performance from their database. If you are running PostgreSQL on Windows, there’s another setting that you need to look at, specifically “update_process_title”. Changing this setting from “on” to “off” can improve throughput on a heavy query load by close to 100%

We ran a series of benchmark tests in our performance lab and you can see the dramatic improvement in the graphs displayed. We tested PostgreSQL 9.5 on a 16-core Windows server with fast SSD drives using a standard pgbench run in both read-only and read-write modes. Scaling from 4 to 40 clients shows a plateau in throughput (measured by TPS) after 8 clients when the setting is set to “on”. Changing the update_process_title setting to “off” allows PostgreSQL to continue to scale throughput, showing increasing TPS up to 40 clients. The throughput at 32 read-only clients increases from 20K TPS to 58K TPS (180% higher) and at 40 clients continues to climb to 76K TPS (270% higher).

Improvement in read-write transactions turning off update_process_title

Improvement in read-write transactions turning off update_process_title

This performance gain is seen for both read-only and read-write workloads. With 32 clients, the write throughput increases from 2,700 TPS to 7,700 TPS (180% higher) and at 40 clients continues to climb to 8,200 (200% higher).

The update_process_title setting controls whether or not Postgres will update the process description that you see when querying the system list of running commands based on the current SQL statement being processed. On Linux this is done using ps, on Windows it requires the ProcessExplorer tool. Updating the process description becomes a bottleneck on Windows, and limits the throughput even on a high-end server. Not many Windows admins actually use this information on a regular basis, so unless you are actively debugging a slow or long running process using this process information, you should leave this turned off.

Takayuki Tsunakawa originally tracked down this bottleneck and created a patch for PostgreSQL 9.6 that has changed the default to be ‘off’ on Windows. We have made the same setting change in BigSQL distributions of version 9.2 thru 9.5 as well as 9.6. So even if you’re not ready to move to the new 9.6 version, when you install Postgres by BigSQL on Windows you are getting the best performance out-of-the-box.



Building PostgreSQL on Windows for Performance

Before PostgreSQL 9.1, adding additional features like data types was non-standard and awkward, but 9.1 brought extensions. By adding extensions as well as the plumbing for things like foreign data wrappers, the use of extensions have exploded over the last few years. This is great for people who are using PostgreSQL on Linux or similar type environments, but the people in Windows have been pretty much out of luck. Most people need to hope the extension they want to use is popular enough that someone skilled with Windows build environments decided to release the extension on Windows. It just was not as simple as Linux where you just run a “make install”.

For years, there has been the ability to replicate the Linux build experience on Windows with the Mingw project, but historically the resulting binaries have been significantly slower than the binaries produced by the Microsoft compiler. Since I last played with compliers on Windows, I have been pleasantly surprised that the open source build tools have caught up with the commercial tools and in some ways surpassed them.

To check the performance of the resulting binaries from the compilers, I tried a couple of CPU intensive tests. The first one was just a SELECT only pgbench test. The second was running the square roots of random numbers and then sorting the results. For the Windows binaries built with Microsoft Visual Studio, I used the PostgreSQL Windows installer from the PostgreSQL community site at The Mingw-w64 build came from the site at

The version strings of the 2 binaries are below.

test=# select version();
 PostgreSQL 9.5.1, compiled by Visual C++ build 1800, 64-bit
(1 row)

test=# select version();
 PostgreSQL 9.5.1 on x86_64-pc-mingw64, compiled by gcc.exe (Rev5, Built by MSYS2 project) 4.9.2, 64-bit
(1 row)

For these tests, the only tuning of the postgresql.conf file was to set shared_buffers to 2GB. All other settings were left at the defaults.

The SELECT only pgbench test was conducted with the following command:

pgbench -n -S -c 10 -t 100000 -U postgres test
MSVC 34246 TPS
GCC 34706 TPS

That is a minor 1.3% performance improvement. That is not a ground breaking gain, but the gcc built binaries were consistently faster.

The second test was a bit more CPU intensive running the following query:

select sqrt((random() * g)::numeric) from generate_series(1, 1000) g order by 1;

This was put in a text file and run through pgbench with the following command:

pgbench.exe -n -f test.sql -c 8 -t 10000 -U postgres test

This resulted in a larger performance gain of 3.8%.

I started this testing with trying to find an easier way to build extensions on Windows with hopes of not taking too much of a performance hit, but ended up finding that the easier way was actually a littler faster. I am thinking its time to ditch those proprietary build tools and stick with open source.

Video on FDW by Jim

Video – Behold! The Power of Foreign Data Wrappers!

If you’ve never heard of PostgreSQL’s “foreign data wrappers” — you’ve been missing out on a very powerful tool to connect and integrate heterogenous databases. By leveraging FDW’s, you can put PostgreSQL at the center of a integrated database federation and read and write data from other PostgreSQL servers, other relational databases (Oracle, MySQL), many NoSQL systems (MongoDB, Hadoop) and even Twitter!

Heck – FDW’s are so powerful that MongoDB just released their BI connector as a PostgreSQL FDW!! (excellent write up by John De Goes of SlamData)

At this year’s PostgresOpen, Jim Mlodgenski from OpenSCG gave a great talk about Federated PostgreSQL using FDW’s. Watch this video to learn the basics of what FDW’s are, how they work, and how you can easily combine data from PostgreSQL, other DBMS’s, and NoSQL db’s all in a single SQL query.

Thanks to Daily Tech Video for highlighting this talk in their post [Video 378] Jim Mlodgenski: PostgreSQL Federation.


PostgreSQL Studio 2.0 is out

PostgreSQL Studio is a fully open source web based UI for PostgreSQL that is ideal for Postgres in the cloud. With the impending release of PostgreSQL 9.5, PostgreSQL Studio 2.0 has just been released supporting some of the new features in Postgres. The big new feature that requires UI changes is support for Row Level Security. With Row Level Security, policies need to be created on each table for the role accessing the data. These policies can become rather complex so seeing tied to the table can greatly help with understanding the security placed on your data.


In addition to the new features, we also updated the versions of the libraries and stopped support for JDK 6.

You can download PostgreSQL Studio at:

And while I have you attention…
The call for presentations for PGConf US 2016 is now open! Submissions will be allowed through January 31st, 2016. All speakers will be notified by February 15, 2016. Please note that as with our past conferences, there will be no extensions of the submission deadline. For more information and submission details, please visit

PL/pgSQL Profiler

PostgreSQL PL/pgSQL Profiler

Some of our customers really like writing their business logic inside of PostgreSQL. While this is really cool that PostgreSQL is capable of handling, trying to performance tune large amounts of PL/pgSQL code becomes unwieldy. If your functions are small enough, it’s possible add some logging statements, but that is not possible with hundreds or even thousands lines of legacy code.

Several years ago as part of the PL/pgSQL debugger, Korry Douglas wrote a PL/pgSQL profiler, but over the years, it seems to have suffered from bit rot. A profiler for PL/pgSQL code helps solve a lot of problems and gives us insight into how your server side code is running.

Below is an example output from the profiler showing how many times each line of code executed and what was the time taken for each line.

PL/pgSQL Profiler



The plprofiler has not been tested is many different environments yet, so be careful in rolling it out to production servers. Check it out and let me know if you find any issues.


Synchronous Commit

Synchronous Commit

While I was at PGConf.EU a couple of weeks ago in Madrid, I attended a talk by Grant McAlister discussing Amazon RDS for PostgreSQL.  While it was interesting to see how Amazon had made it very simple for developers to get a production PostgreSQL instance quickly, the thing that really caught my eye was the performance benchmarks comparing the fsync and synchronous commit parameters.

sync_commitFrighteningly, it is not that uncommon for people to turn off fsync to get a performance gain out of their PostgreSQL database. While the performance gain is dramatic, it carries the risk that your database could become corrupt. In some cases, this may be OK, but these cases are really rather rare. A more common case is a database where it is OK to lose a little data in the event of a crash. This is where synchronous commit comes in. When synchronous commit is off, the server returns back success immediately to the client, but waits to flush the data to disk for a short period of time. When the data is ultimately flushed it is still properly sync to disk so there is no chance of data corruption. The only risk if the event of a crash is that you may lose some transactions. The default setting for this window is 200ms.

In Grant’s talk, he performed a benchmark that showed turning off synchronous commit gave a bigger performance gain than turning off fsync. He performed an insert only test so I wanted to try a standard pgbench test. I didn’t come up with the same results, but the I still saw a compelling case for leaving fsync on while turning off synchronous commit.

I ran a pgbench test with 4 clients and a scaling factor of 100 on a small EC2 instance running 9.3.5. What I saw was turning off fsync resulted in a 150% performance. Turning off synchronous commit resulted in a 128% performance gain. Both are dramatic performance gains, but the synchronous commit option has a lot less risk.


Speaking of conferences, the call for papers is open for PGConf US 2015. If there is a topic you’d like to present in New York in March, submit it here.


Stinger Delivers Fast Hive SQL via YARN & TEZ

The latest BigSQL bundle now includes Stinger and Tez to dramatically improve performance and extend the SQL dialect. In the latest distribution of BigSQL Release 2, it is extremely easy to get started exploring Tez and Stinger in your own sandbox environment. Taking advantage of YARN, Stinger is now delivering on its three main goals of Speed, SQL Enhancememnts & Scale..



The Stinger initiative (Hive 0.13 & Tez-0.3) calls for a 100X performance over current Hive (Hive 0.10) batch queries. Key developments helping to achieve these results are:

  • Leveraging ORC Files these columnar formats allow for faster read, writes and processing of Hive data. The format supports predicate pushdowns, compression and even allows light-weight indexes to be stored within the files.
  • Continued Optimizations in the Hive Query Engine Continuous enhancements include avoiding unneeded writes, support for hot containers (saves launch shutdown overhead), in memory cache, leveraging wider Metastore statistics have all helped improve latency and throughput.



Full support DECIMAL, TIMESTAMP, DATE & VARCHAR datatypes as well as a rich set of semantic improvements.



Designed for large scale data environments, Stinger’s goal is to deliver even when you are analyzing Petabytes of data.


Stinger in Action!

Below is a screenshot from the BigSQL YARN console showing the execution of a Tez job.

At OpenSCG we focus on helping you get the most value and leverage out of your data environments. Its an exciting time in DATA and we thrive on helping our clients understand the latest changes in data technology. We realize that there are great opportunities in new NoSQL and clustered data solutions and take pride in helping you evaluate your data options.

BigSQL Adds Cassandra to the Hadoop and PostgreSQL Bundle

The BigSQL developer’s bundle has added a significant new component: Cassandra 2.0.4.  All of the bundles and VM’s now allow developers to get up and running with the latest Hadoop ecosystem and Cassandra.   Join our free Beta program now!  

As a significant number of our clients are leveraging Cassandra for Highly Available Multi Partition solutions, it was a natural fit to add to our bundle.    

In addition to Cassandra the latest beta includes all of the standard BigSQL components and many enhancements:

  • Continued HadoopFDW development (Postgres  – Hadoop Foreign data wrapper) fixes and improvements
  • JDK 7 is now a pre-req on all platforms
  • upgrade PostgreSQL from 9.3.1 to 9.3.2
  • upgrade DataFu from 1.1.1 to 1.2.0
  • upgrade PostgresqlStudio from 1.0 to 1.1
  • upgrade Hue from 3.0.0 to 3.5.
  • autostart Oozie & Hue in the vm
  • upgrade Sqoop from 1.4.4 to 1.99.3 (and autostart it)
  • include pgTSQL functionality from 
  • compact our quickstart virtual machine from 3 GB to 1.5 GB
  • configure eclipse for hadoop in the vm
At OpenSCG we are focus on helping you get the most value and leverage out of your data environments.  Its an exciting time in DATA and we thrive on helping our clients understand the latest changes in data technology.  We realize that there are great opportunities in new NOSQL and clustered data solutions and take pride in helping you evaluate your data options.