Extending Postgres by BigSQL

Recently we had a customer request to build a custom extension against the Postgres by BigSQL distribution. Even though BigSQL ships with a large set of commonly used extensions and a good collection of FDWs, these kinds of custom build requirements always crop up, which speaks to how powerful the Postgres extension model is.

BigSQL makes it easy to build custom extensions for new requirements by including all of the required header files and libraries along with the Postgres by BigSQL installation. This is one of the key benefits of the truly open source BigSQL distribution, as opposed to a proprietary "open source-based" approach.

One might think this would be a challenge, because a BigSQL user has the freedom to relocate the installation anywhere he/she wants, or to use a DevOps Sandbox, which means the Postgres files can live anywhere on the system. In this blog I’ll show you how easy it is. Here is an example of how to build extensions where the Postgres binaries sit in my user home directory (I am using the DevOps Sandbox on CentOS 6.7).

Building an Extension from Source

The extension build uses pg_config to find the location of all the required files, so the only thing we have to do is put pg_config on the PATH, wherever it is:

export PATH=/home/vagrant/bigsql/pg95/bin:$PATH
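You can verify that the right pg_config will be picked up (the path below is from my sandbox; yours will differ):

$ which pg_config
/home/vagrant/bigsql/pg95/bin/pg_config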

After that, building any custom extension should be a breeze.
Get the required branch of the extension:

git clone -b VERSION3_1_STABLE https://github.com/ossc-db/pg_bulkload.git

Install any external library requirements:

$ sudo yum install openssl-devel.x86_64
$ sudo yum install readline-devel

(of course, we need build tools like gcc and make)
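With the dependencies in place, the build itself is the usual two-step; since pg_bulkload’s Makefile locates everything through pg_config, no extra configuration is needed, and because the BigSQL installation lives in my home directory, no sudo is needed for the install:

$ cd pg_bulkload
$ make
$ make install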


The build should go through without any issues.

Getting an Extension from PGXN (PostgreSQL Extension Network)

The Postgres community has many extensions available, and PGXN makes it easy to find and install open source extensions published by other community members. Postgres by BigSQL works perfectly with PGXN.

Just like in the case above, we need to put pg_config on the PATH before running the pgxn client:

$ export PATH=/home/vagrant/bigsql1/pg95/bin:$PATH

Now we can install any extension we need, like:

$ pgxn install temporal_tables

Once the installation completes, the extension is ready to use:

postgres=# create extension temporal_tables;
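For instance, temporal_tables provides a versioning() trigger function that maintains a history table automatically. A minimal setup looks something like this (a sketch based on the extension’s documented usage; the table and trigger names are mine):

CREATE TABLE employees (
    name       text PRIMARY KEY,
    salary     numeric,
    sys_period tstzrange NOT NULL DEFAULT tstzrange(current_timestamp, null)
);
CREATE TABLE employees_history (LIKE employees);

-- record every change to employees in employees_history
CREATE TRIGGER versioning_trigger
BEFORE INSERT OR UPDATE OR DELETE ON employees
FOR EACH ROW EXECUTE PROCEDURE versioning('sys_period', 'employees_history', true);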

Up Next

We just covered how easy it is to build and install PostgreSQL extensions on Linux. In my next blog post, I’ll show how to build and install extensions on Windows. Postgres by BigSQL uses MinGW (and gcc) to compile Postgres and related extensions on Windows, which is great because it keeps the full tool chain open source (no proprietary compiler required) and enables extension authors to support Windows without maintaining an MSVC-specific build system or making code and configuration changes.

Why Open Source over Open Source Based?

The Postgres landscape has become very competitive in recent years as the database project has matured and grown exponentially in popularity. Since more and more users are deploying Postgres to support a wide variety of applications, there is a need for experts to help make sure those deployments are successful. This has led to the birth of a number of Postgres-focused companies, as well as existing companies either expanding or creating a Postgres practice. All in all, this growth of expertise is a good thing for the Postgres community. More people using Postgres makes it a better product, and when more companies are using it and hiring Postgres talent, many give back to the community, making the original Postgres project stronger and better each year. That is one of the main benefits of open source: joined forces working together for the betterment of the project. The other benefit is, of course, that it is 100% free to use.

However, there is an often misunderstood or, more accurately, misrepresented side of this story. Over the years, many folks have taken the hard work of the community and turned it into something they call ‘open source based’ solutions. There is nothing actually wrong with anyone doing this, and there certainly isn’t anything that prevents it in the Postgres world. The issue at hand is how these solutions are represented. Many of these companies incorrectly market themselves as open source companies in an effort to ride the open source wave and take advantage of the market’s momentum. These so-called ‘open source based’ solutions, to be perfectly clear, are not open source. They are proprietary solutions, which means that in order to use them, you must pay someone for the right to use them. The issue here is not that someone is trying to build a business on their own solutions but rather how they cause confusion in the market. These companies often walk a fine line when discussing or marketing their solutions by saying ‘open source’, but when the marketing layers are peeled back, you find a proprietary product with the same traps as traditional commercial database solutions.

The reason this distinction is so important is that more and more companies and people are looking to get away from proprietary databases today, to avoid the pain of being locked into any one technology, paying too much for a commodity, and having to deal with vendors that have historically raised prices as they see fit. Using a true open source database means you are not tied to any one vendor, ever. You can choose to pay for support or services at any time you like, but you have the freedom to stop paying whenever you want and still continue to use the open source database. Open source based solutions make it seem like you are free to do the same, but if you decide to stop paying for the right to use them, you must de-install or risk being out of compliance and liable. That means you are forced into a migration effort. Who wants to be in that situation?

The solution is to skip the open source based solutions altogether. Yes, the move from a proprietary solution to a pure open source database may take a little longer and thus cost a bit more, but it is a one-time cost, and the end result, freedom from both technology lock-in and vendor lock-in, has substantial long-term benefits, so you will see a return on the investment relatively quickly. Even the open source based solutions that promise an easier migration and a quicker return on investment fail to point out that the migration may still take quite some time and be difficult (so much so that it is nearly the same effort to rewrite things).

There are also plenty of options for getting support for your open source databases, so you don’t risk losing the quality of support you are accustomed to getting from a proprietary vendor. That substantially reduces the perceived risk of using open source databases. Industry experts such as Gartner have acknowledged that open source databases have matured to the point that they are viable alternatives to commercial databases, and that in the coming years large percentages of applications will be developed on or migrated to open source databases, not open source based databases.

The choice is ultimately yours, but do yourself a favor: do the appropriate due diligence and make sure all paths and associated risks are clearly understood.

My PL/pgSQL code is stuck, but where?

In the short time that I have been enhancing the PL profiler (https://bitbucket.org/openscg/plprofiler), I have been asked multiple times if it can be abused as a debugging aid. Not directly. The conversation goes something like this:

Q: When my PL code is stuck somewhere, can I turn on profiling and see where?
A: No.
Q: Will this feature be added?
A: No.

Of course that would be a useful feature. I don't argue that. And it seems that this is precisely how Oracle users find out where their PL/SQL code gets stuck or moves like molasses. However, the reason I am against adding this is that a profiler, or even parts of it, should not be enabled 24x7 on a production database server just so it can eventually be used to extract some debugging information some day. There are other ways to get that information, and I will show you one of them.

If a program is stuck somewhere, one uses a debugger to get a backtrace. This works with C programs, like the database server, provided that symbol information is available. In that backtrace (and some variables) we can also find the PL/pgSQL backtrace. Having symbol information available is also useful in the case of a program crash, to find out why it crashed by loading the core dump into the debugger.

Every line of PL code that is executed goes through the PL/pgSQL executor function exec_stmt(). In that stack frame we find the current PL code's line number in a variable. Likewise, every function execution goes through either plpgsql_exec_function() or plpgsql_exec_trigger(). In those stack frames we find the OID of the function as well as its signature (name and call argument types).
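Extracting that by hand with gdb looks roughly like this (a sketch; the frame numbers are made up, and the exact variable layout varies with the PostgreSQL version and available symbol information):

$ gdb -p 13239
(gdb) bt                        # full C backtrace of the backend
(gdb) frame 7                   # pick an exec_stmt() frame
(gdb) print stmt->lineno        # current PL/pgSQL line number
(gdb) frame 9                   # pick the plpgsql_exec_function() frame
(gdb) print func->fn_oid        # OID of the PL/pgSQL function
(gdb) print func->fn_signature  # name and call argument types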

Doing this by hand, several times over for a deeply nested PL/pgSQL program, is tedious and no fun. So here is a little script called plbacktrace (https://github.com/wieck/plbacktrace.git) that does all of that. It is invoked with the PID of the PostgreSQL backend and will output information like this:

[postgres@db1 tmp]$ ./plbacktrace.py 13239
fn_oid=105672 lineno=5 func="life_get_cell_pl(life_board,integer,integer)"
fn_oid=105673 lineno=12 func="life_num_neighbors_pl(life_board,integer,integer)"
fn_oid=105675 lineno=11 func="life_tick_pl(life_board)"
fn_oid=105674 lineno=22 func="life_pl(integer,integer,boolean)"
I will explain another time why I have a database server playing Conway's Game of Life written in PL/pgSQL.

Regards, Jan


PostgreSQL Studio 2.0 is out

PostgreSQL Studio is a fully open source web-based UI for PostgreSQL that is ideal for Postgres in the cloud. With the impending release of PostgreSQL 9.5, PostgreSQL Studio 2.0 has just been released supporting some of the new features in Postgres. The big new feature that requires UI changes is support for Row Level Security. With Row Level Security, policies need to be created on each table for the role accessing the data. These policies can become rather complex, so seeing them tied to the table can greatly help with understanding the security placed on your data.
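For example, a minimal policy restricting rows to their owning role looks like this (a sketch against the 9.5 syntax; the table, column, and policy names are made up for illustration):

CREATE TABLE accounts (manager text, balance numeric);
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;
-- each role only sees the rows where it is the manager
CREATE POLICY manager_rows ON accounts
    USING (manager = current_user);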


In addition to the new features, we also updated the versions of the underlying libraries and dropped support for JDK 6.

You can download PostgreSQL Studio at: http://www.postgresqlstudio.org/

And while I have your attention…
The call for presentations for PGConf US 2016 is now open! Submissions will be accepted through January 31st, 2016. All speakers will be notified by February 15, 2016. Please note that, as with our past conferences, there will be no extensions of the submission deadline. For more information and submission details, please visit http://www.pgconf.us/2016/submit/

Synchronous Commit


While I was at PGConf.EU a couple of weeks ago in Madrid, I attended a talk by Grant McAlister discussing Amazon RDS for PostgreSQL.  While it was interesting to see how Amazon had made it very simple for developers to get a production PostgreSQL instance quickly, the thing that really caught my eye was the performance benchmarks comparing the fsync and synchronous commit parameters.

Frighteningly, it is not that uncommon for people to turn off fsync to get a performance gain out of their PostgreSQL database. While the performance gain is dramatic, it carries the risk that your database could become corrupt. In some cases, this may be OK, but those cases are really rather rare. A more common case is a database where it is OK to lose a little data in the event of a crash. This is where synchronous commit comes in. When synchronous commit is off, the server returns success to the client immediately, but waits a short period of time before flushing the data to disk. When the data is ultimately flushed, it is still properly synced to disk, so there is no chance of data corruption. The only risk in the event of a crash is that you may lose some transactions. The default setting for this window is 200ms.
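A nice property is that synchronous_commit can be changed per session, or even per transaction, so you can relax durability only for the work that tolerates it (a small sketch):

-- relax durability for everything in this session
SET synchronous_commit = off;

-- or for a single transaction only
BEGIN;
SET LOCAL synchronous_commit = off;
-- ... low-value writes ...
COMMIT;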

In Grant’s talk, he performed a benchmark showing that turning off synchronous commit gave a bigger performance gain than turning off fsync. He performed an insert-only test, so I wanted to try a standard pgbench test. I didn’t come up with the same results, but I still saw a compelling case for leaving fsync on while turning off synchronous commit.

I ran a pgbench test with 4 clients and a scaling factor of 100 on a small EC2 instance running 9.3.5. What I saw was that turning off fsync resulted in a 150% performance gain, while turning off synchronous commit resulted in a 128% performance gain. Both are dramatic gains, but the synchronous commit option carries a lot less risk.
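For reference, the shape of that run was roughly the following (the run duration here is my assumption; the post does not record the exact invocation):

$ pgbench -i -s 100 bench    # initialize with scaling factor 100
$ pgbench -c 4 -T 300 bench  # 4 clients, standard TPC-B-like test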


Speaking of conferences, the call for papers is open for PGConf US 2015. If there is a topic you’d like to present in New York in March, submit it here.


FTP I/O from PostgreSQL

PostgreSQL applications at times need to communicate with external servers using FTP. Oracle has the UTL_TCP package, which is very useful for PL/SQL applications. PostgreSQL doesn’t have an equivalent package, but it does have something really powerful: the ability to write functions in various scripting languages. PostgreSQL has four languages that are part of the core distribution: PL/pgSQL, PL/Tcl, PL/Perl, and PL/Python. The example I give below uses PL/Python to write to an FTP location.

Consider a scenario where you want to run a few PL/pgSQL routines to create daily data files and then place them on an FTP server for your customer. Theoretically, this is possible using external shell scripts, but you could very easily run into synchronization issues between your database’s functions and the scripts in your OS. A very easy way out is to write a small PL/Python function like the one below that gets called once your daily routines are done within the database.

    CREATE OR REPLACE FUNCTION put_file_on_ftp (file_name TEXT)
    RETURNS void AS $$
        from ftplib import FTP
        ftp = FTP('ftp.testing.com')
        ftp.login('user_name', 'password')
        ftp.cwd('/target/dir')
        ftp.storbinary('STOR my_file.bin', open(file_name, 'rb'))
        ftp.quit()
    $$ LANGUAGE plpythonu
    COST 100;

(This assumes the plpythonu language is available in your database, e.g. via CREATE EXTENSION plpythonu.)

Let me explain the body of the above function line by line.

from ftplib import FTP

This line imports the FTP class from the ftplib module. The FTP class implements the client side of the FTP protocol and is used in the subsequent steps of the function.

ftp = FTP('ftp.testing.com')

This returns an instance of the FTP class connected to the host passed as the parameter.

ftp.login('user_name', 'password')

The instance returned by the FTP class above is used here to actually log in to the server. If no user name and password are provided, an anonymous login is assumed. Note that most FTP commands are allowed only after the user has successfully logged in.

Obviously you shouldn’t hard-code your username and password in a function, but I am doing it here just to simplify the example.


ftp.cwd('/target/dir')

This line switches to the target directory given as the parameter.

ftp.storbinary('STOR my_file.bin', open(file_name, 'rb'))

This call stores the file specified in file_name (including the complete path on the source server) on the target server as my_file.bin. The file transfer is in binary format, which is why the local file is opened in 'rb' mode. The first argument should always be a valid STOR command. (The final ftp.quit() in the function simply closes the connection cleanly.)

For more commands that you can use for FTP using PL/Python, please refer to the link: https://docs.python.org/2/library/ftplib.html.
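Once the function is created, shipping a file after your daily routines finish is a one-liner (the path here is just an example):

SELECT put_file_on_ftp('/tmp/daily_data.csv');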

Hope this was helpful!


Hierarchical Queries using PostgreSQL

One of the major hurdles faced when converting Oracle PL/SQL code to any other database is finding the equivalent of START WITH and CONNECT BY PRIOR. This SQL construct in Oracle provides a very powerful way of writing hierarchical queries, something that comes in really handy when dealing with hierarchical data, which is fairly common in any enterprise.

Well, there is good news. There is an equivalent available in PostgreSQL … and it is not too hard!

Basically, in order to write the equivalent query in PostgreSQL, we use the WITH RECURSIVE syntax, which lets the query refer to its own output and hence iterate recursively through the result set.

Let me explain with the help of a simple example.

Consider a simple table ‘places’ that lists the name of a place in the column ‘place’ and where that place is located in the column ‘is_in’.

example=# \d places
    Table "public.places"
 Column | Type | Modifiers
--------+------+-----------
 place  | text |
 is_in  | text |

Consider the following data in this table:

example=# select * from places;
     place      |     is_in
----------------+----------------
 U.S.A          |
 California     | U.S.A
 Florida        | U.S.A
 Massachusetts  | U.S.A
 San Francisco  | California
 Los Angeles    | California
 San Diego      | California
 Miami          | Florida
 Orlando        | Florida
 Tampa          | Florida
 United Kingdom |
 London         | United Kingdom
 Leeds          | United Kingdom
(13 rows)

As you can see, the data forms a hierarchy: U.S.A and United Kingdom at the top level, states under U.S.A, and cities under the states or countries they belong to.


An example result set someone would want to retrieve from such data is ‘give me all the places that are in U.S.A’. In Oracle, the query would look like the following:

 SELECT op.place, op.is_in
   FROM places op
  START WITH op.place = 'U.S.A'
CONNECT BY PRIOR op.place = op.is_in

The PostgreSQL equivalent would be:

WITH RECURSIVE q AS (SELECT place, is_in
                       FROM places
                      WHERE place = 'U.S.A'
                      UNION ALL
                     SELECT m.place, m.is_in
                       FROM places m
                       JOIN q ON q.place = m.is_in)
        SELECT place, is_in FROM q;

This gives the following result:

     place     |   is_in
---------------+------------
 U.S.A         |
 California    | U.S.A
 Florida       | U.S.A
 Massachusetts | U.S.A
 San Francisco | California
 Los Angeles   | California
 San Diego     | California
 Miami         | Florida
 Orlando       | Florida
 Tampa         | Florida
(10 rows)

Let’s break this query down to understand what’s going on.

The general structure of a WITH RECURSIVE query is:

a)    The non-recursive term

b)    UNION or UNION ALL, depending on whether or not you want to have duplicates

c)    Recursive term containing a reference to the query's own output

Query evaluation is done as follows (taken from PostgreSQL docs):

  1. Evaluate the non-recursive term. For UNION (but not UNION ALL), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.
  2. So long as the working table is not empty, repeat these steps:
    1. Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For UNION (but not UNION ALL), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.
    2. Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.

An important thing to ensure while writing hierarchical queries like this is the end state, i.e. making sure that, eventually, the recursive portion of the query will not return any rows. Without this end state, the query will go into an infinite loop. To illustrate, let’s suppose that in the above example we have another row in the ‘places’ table saying that ‘California’ is in ‘San Francisco’.

example=# select * from places;
      place     |     is_in
----------------+----------------
 U.S.A          |
 California     | U.S.A
 Florida        | U.S.A
 Massachusetts  | U.S.A
 San Francisco  | California
 Los Angeles    | California
 San Diego      | California
 Miami          | Florida
 Orlando        | Florida
 Tampa          | Florida
 United Kingdom |
 London         | United Kingdom
 Leeds          | United Kingdom
 California     | San Francisco
(14 rows)
As you can see, the data essentially says that ‘San Francisco’ is in ‘California’ and ‘California’ is in ‘San Francisco’. This data will throw our query above into an infinite loop. The simplest way to avoid such a situation is to use UNION instead of UNION ALL to join the recursive and non-recursive portions of the hierarchical query. With that change, the results are as follows:

example=# WITH RECURSIVE q AS (SELECT place, is_in
                                 FROM places
                                WHERE place = 'U.S.A'
                                UNION
                               SELECT m.place, m.is_in
                                 FROM places m
                                 JOIN q ON q.place = m.is_in)
                  SELECT place, is_in FROM q;
     place     |     is_in
---------------+----------------
 U.S.A         |
 California    | U.S.A
 Florida       | U.S.A
 Massachusetts | U.S.A
 San Francisco | California
 Los Angeles   | California
 San Diego     | California
 Miami         | Florida
 Orlando       | Florida
 Tampa         | Florida
 California    | San Francisco
(11 rows)
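UNION only helps when the revisited rows are exact duplicates. A more general safeguard, from the standard recursive-query toolbox, is to carry along the path of places visited so far and stop whenever a place repeats (a sketch):

WITH RECURSIVE q AS (SELECT place, is_in, ARRAY[place] AS path
                       FROM places
                      WHERE place = 'U.S.A'
                      UNION ALL
                     SELECT m.place, m.is_in, q.path || m.place
                       FROM places m
                       JOIN q ON q.place = m.is_in
                      WHERE m.place <> ALL (q.path))  -- stop on a cycle
        SELECT place, is_in FROM q;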

I hope you find this blog helpful!

PostgreSQL Studio 1.2 Released

I am happy to announce that version 1.2 of PostgreSQL Studio, an open source web interface for PostgreSQL, was released on February 23, 2014. This release contains some new functionality, mostly around improving SQL Worksheet, as well as some bug fixes.

The changes in this release are:


  • Added Open and Save buttons to SQL Worksheet
  • Moved the close button on SQL Worksheet to the upper right corner
  • Improved the look and feel of pop-up windows
  • Auto select the first object in a schema
  • Added a limit drop down to SQL Worksheet


  • Removed extra new line characters when displaying a function
  • Added a click handler to reselect an already selected row
  • Added a scroll panel for the Data tab for wide tables

The new version is available from http://www.postgresqlstudio.org/download/

tPostgres Features Sybase & SQL Server Compatibility

We are pleased to announce version 9.3.2 of tPostgres. Find out more at http://www.tpostgres.org

The transformation of the tPostgres: Sybase & SQL Server Compatibility project is now complete. We are fully based on the PGTSQL open source PostgreSQL extension and are running on the very latest and greatest point release of PG (9.3.2).

Additionally, we have done away with the bulky and awkward 32-bit Linux virtual machine. We now have safe and simple-to-use native command line sandbox installers for Windows, OS X & Linux.

We’ve still got documentation work to do, but there are excellent Transact-SQL programming examples in the PGTSQL regression tests. Please take a look for yourself and consider contributing to the project at https://bitbucket.org/openscg/pgtsql