
Building PostgreSQL on Windows for Performance

Before PostgreSQL 9.1, adding additional features like data types was non-standard and awkward, but 9.1 brought extensions. With extensions, as well as the plumbing for things like foreign data wrappers, the use of extensions has exploded over the last few years. This is great for people who are using PostgreSQL on Linux or similar environments, but people on Windows have been pretty much out of luck. Most people need to hope the extension they want to use is popular enough that someone skilled with Windows build environments decided to release it on Windows. It just was not as simple as Linux, where you just run a "make install".
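
For comparison, installing an extension on Linux is normally just the standard PGXS flow. A minimal sketch (the extension name and URL are hypothetical):

# grab the extension source (hypothetical project)
git clone https://github.com/example/my_extension.git
cd my_extension

# PGXS locates the server headers through pg_config on the PATH
make
sudo make install

# then, in psql:
#   CREATE EXTENSION my_extension;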

For years, the MinGW project has made it possible to replicate the Linux build experience on Windows, but historically the resulting binaries have been significantly slower than the binaries produced by the Microsoft compiler. Since I last played with compilers on Windows, I have been pleasantly surprised to find that the open source build tools have caught up with the commercial tools and in some ways surpassed them.

To check the performance of the binaries produced by each compiler, I tried a couple of CPU-intensive tests. The first one was just a SELECT-only pgbench test. The second was taking the square roots of random numbers and then sorting the results. For the Windows binaries built with Microsoft Visual Studio, I used the PostgreSQL Windows installer from the PostgreSQL community site at http://www.postgresql.org/download/windows/. The MinGW-w64 build came from the bigsql.org site at http://www.bigsql.org/se/postgresql/packages.jsp

The version strings of the two binaries are below.

test=# select version();
 version
-------------------------------------------------------------
 PostgreSQL 9.5.1, compiled by Visual C++ build 1800, 64-bit
(1 row)

test=# select version();
 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 9.5.1 on x86_64-pc-mingw64, compiled by gcc.exe (Rev5, Built by MSYS2 project) 4.9.2, 64-bit
(1 row)

For these tests, the only tuning of the postgresql.conf file was to set shared_buffers to 2GB. All other settings were left at the defaults.
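
In postgresql.conf terms, the single change from the defaults was:

# postgresql.conf
shared_buffers = 2GB    # everything else left at the default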

The SELECT only pgbench test was conducted with the following command:

pgbench -n -S -c 10 -t 100000 -U postgres test

MSVC: 34246 TPS
GCC: 34706 TPS

That is a minor 1.3% performance improvement. It is not a groundbreaking gain, but the gcc-built binaries were consistently faster.

The second test was a bit more CPU intensive running the following query:

select sqrt((random() * g)::numeric) from generate_series(1, 1000) g order by 1;

This was put in a text file and run through pgbench with the following command:

pgbench.exe -n -f test.sql -c 8 -t 10000 -U postgres test

MSVC: 610 TPS
GCC: 634 TPS

This resulted in a larger performance gain of 3.8%.

I started this testing hoping to find an easier way to build extensions on Windows without taking too much of a performance hit, but I ended up finding that the easier way was actually a little faster. I am thinking it's time to ditch those proprietary build tools and stick with open source.


PostgreSQL Studio 2.0 is out

PostgreSQL Studio is a fully open source web-based UI for PostgreSQL that is ideal for Postgres in the cloud. With the impending release of PostgreSQL 9.5, PostgreSQL Studio 2.0 has just been released supporting some of the new features in Postgres. The big new feature requiring UI changes is support for Row Level Security. With Row Level Security, policies need to be created on each table for the role accessing the data. These policies can become rather complex, so seeing them tied to the table can greatly help with understanding the security placed on your data.
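
For those who have not worked with Row Level Security yet, a minimal sketch of a policy looks like the following (the accounts table, its owner column, and the roles are hypothetical):

-- Turn on row level security for the table
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;

-- Let each role read only the rows it owns
CREATE POLICY account_owner ON accounts
    FOR SELECT
    USING (owner = current_user);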

[Screenshot: row level security policies shown alongside the table in PostgreSQL Studio]

In addition to the new features, we also updated the versions of the libraries we use and dropped support for JDK 6.

You can download PostgreSQL Studio at: http://www.postgresqlstudio.org/

And while I have your attention…
The call for presentations for PGConf US 2016 is now open! Submissions will be accepted through January 31st, 2016. All speakers will be notified by February 15, 2016. Please note that, as with our past conferences, there will be no extensions of the submission deadline. For more information and submission details, please visit http://www.pgconf.us/2016/submit/


PostgreSQL PL/pgSQL Profiler

Some of our customers really like writing their business logic inside of PostgreSQL. While it is really cool that PostgreSQL is capable of handling this, trying to performance tune large amounts of PL/pgSQL code becomes unwieldy. If your functions are small enough, it's possible to add some logging statements, but that is not practical with hundreds or even thousands of lines of legacy code.
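
To make that concrete, the logging-statement approach amounts to something like this sketch (the function and its body are hypothetical):

CREATE OR REPLACE FUNCTION my_func() RETURNS void AS $$
DECLARE
    t0 timestamptz := clock_timestamp();
BEGIN
    PERFORM pg_sleep(0.1);  -- stand-in for the real business logic
    RAISE NOTICE 'step 1 took %', clock_timestamp() - t0;
END;
$$ LANGUAGE plpgsql;

That works for a handful of functions, but scattering timing code through a large legacy code base is exactly the problem a profiler avoids.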

Several years ago, as part of the PL/pgSQL debugger, Korry Douglas wrote a PL/pgSQL profiler, but over the years it seems to have suffered from bit rot. A profiler for PL/pgSQL code helps solve a lot of problems and gives insight into how your server-side code is running.

Below is an example output from the profiler showing how many times each line of code executed and the time taken for each line.

[Screenshot: plprofiler output showing per-line execution counts and timings]


The plprofiler has not been tested in many different environments yet, so be careful rolling it out to production servers. Check it out and let me know if you find any issues.

https://bitbucket.org/openscg/plprofiler



Synchronous Commit

While I was at PGConf.EU a couple of weeks ago in Madrid, I attended a talk by Grant McAlister discussing Amazon RDS for PostgreSQL.  While it was interesting to see how Amazon had made it very simple for developers to get a production PostgreSQL instance quickly, the thing that really caught my eye was the performance benchmarks comparing the fsync and synchronous commit parameters.

Frighteningly, it is not that uncommon for people to turn off fsync to get a performance gain out of their PostgreSQL database. While the performance gain is dramatic, it carries the risk that your database could become corrupt. In some cases that may be OK, but those cases are really rather rare. A more common case is a database where it is OK to lose a little data in the event of a crash. This is where synchronous commit comes in. When synchronous commit is off, the server returns success to the client immediately, but waits a short period of time before flushing the data to disk. When the data is ultimately flushed, it is still properly synced to disk, so there is no chance of data corruption. The only risk in the event of a crash is that you may lose some transactions. The default setting for this window is 200ms.
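
Unlike fsync, synchronous commit can also be changed per session or even per transaction, so you can keep full durability globally and relax it only for writes you can afford to lose. A sketch (the audit_log table is hypothetical):

-- Relax durability for the current session only
SET synchronous_commit TO off;

-- Or for a single transaction
BEGIN;
SET LOCAL synchronous_commit TO off;
INSERT INTO audit_log (message) VALUES ('page view');
COMMIT;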

In Grant's talk, he showed a benchmark where turning off synchronous commit gave a bigger performance gain than turning off fsync. He performed an insert-only test, so I wanted to try a standard pgbench test. I didn't come up with the same results, but I still saw a compelling case for leaving fsync on while turning off synchronous commit.

I ran a pgbench test with 4 clients and a scaling factor of 100 on a small EC2 instance running 9.3.5. What I saw was that turning off fsync resulted in a 150% performance gain, while turning off synchronous commit resulted in a 128% performance gain. Both are dramatic gains, but the synchronous commit option carries a lot less risk.
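
For reference, the benchmark was along these lines; the initialization is standard, though the exact run length is an assumption on my part:

pgbench -i -s 100 -U postgres test         # initialize with a scaling factor of 100
pgbench -n -c 4 -t 10000 -U postgres test  # standard TPC-B-like run with 4 clients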


Speaking of conferences, the call for papers is open for PGConf US 2015. If there is a topic you’d like to present in New York in March, submit it here.


PostgreSQL Dollar Quoting

I recently attended an excellent meetup about Redshift, and one of the comments by the presenter was about the trouble of running the UNLOAD command. The trouble they were having was that the UNLOAD command takes an SQL statement as a parameter, but if that SQL statement contains strings, you need to escape everything, which makes it fairly unreadable.

We can see an example of this in PostgreSQL using the dblink extension:

SELECT *
  FROM dblink('dbname=postgres', 'SELECT * FROM test WHERE b = ''2014-02-02''')
    AS t(a int, b date);

Since Redshift is a derivative of PostgreSQL, the dollar quoting syntax also works there. Dollar quoting is a non-standard way of denoting string constants, but it makes things much simpler to read.

SELECT *
  FROM dblink('dbname=postgres', $$ SELECT * FROM test WHERE b = '2014-02-02' $$)
    AS t(a int, b date);
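
The tag between the dollar signs can be any identifier, so you can even nest one dollar-quoted string inside another:

SELECT $body$ String with 'quotes' and $$nested$$ dollar quotes $body$;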

Trigger Overhead (Part 2)

I found a bit more time to dig into the timing of triggers and their overhead, so I wanted to see how much the choice of procedural language affects performance. I followed the same testing methodology from my original trigger test. For this test, I created an empty trigger in each of the following languages:

PL/pgSQL

CREATE FUNCTION empty_trigger() RETURNS trigger AS $$
BEGIN
 RETURN NEW;
END;
$$ LANGUAGE plpgsql;

C

#include "postgres.h"
#include "commands/trigger.h"
PG_MODULE_MAGIC;
Datum empty_c_trigger(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(empty_c_trigger);
Datum
empty_c_trigger(PG_FUNCTION_ARGS)
{
 TriggerData *tg;
 HeapTuple ret;
tg = (TriggerData *) (fcinfo->context);
 if (TRIGGER_FIRED_BY_UPDATE(tg->tg_event))
 ret = tg->tg_newtuple;
 else
 ret = tg->tg_trigtuple;
return PointerGetDatum(ret);
}

PL/Python

CREATE FUNCTION empty_python_trigger() RETURNS trigger AS $$
return
$$ LANGUAGE plpythonu;

PL/Perl

CREATE FUNCTION empty_perl_trigger() RETURNS trigger AS $$
 return; 
$$ LANGUAGE plperl;

PL/Tcl

CREATE FUNCTION empty_tcl_trigger() RETURNS trigger AS $$
 return [array get NEW]
$$ LANGUAGE pltcl;

PL/Java

package org.postgresql.pljava;

import java.sql.ResultSet;
import java.sql.SQLException;

import org.postgresql.pljava.TriggerData;
import org.postgresql.pljava.TriggerException;

public class TriggerTest {
    static void test(TriggerData td) throws SQLException {
        ResultSet _new = td.getNew();
    }
}

PL/v8

CREATE FUNCTION empty_v8_trigger() RETURNS trigger AS $$
 return NEW;
$$
LANGUAGE plv8;

PL/R

CREATE FUNCTION empty_r_trigger() RETURNS trigger AS $$
 return(pg.tg.new)
$$ LANGUAGE plr;
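
Each function was then attached with an ordinary row-level trigger. A sketch using the PL/pgSQL version and the trigger_test table from the original post:

CREATE TRIGGER empty_trigger BEFORE INSERT ON trigger_test
    FOR EACH ROW EXECUTE PROCEDURE empty_trigger();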

All of the triggers essentially return NEW, so we're basically measuring the overhead of starting up the trigger function. I then timed inserting 100,000 rows with the triggers in place and compared them to inserting into a table without a trigger. Some of the timings I found were obvious, such as C being the fastest, but others were pretty surprising.

One of the bigger things I noticed is that, of the three built-in higher-level languages, Python has much less overhead than Perl and Tcl.

The other notable point was how little overhead PL/Java had compared to the other languages. PL/Java only had more overhead than C, PL/pgSQL and PL/Python.

The moral of the story is that when writing triggers, some choices matter a lot. If you're writing a simple trigger that just ensures a column equals the current timestamp, don't write it in PL/v8 just because it's cool. Use PL/pgSQL for the simple things and save the other languages for your more complex logic, where the overhead of starting them up won't be noticed.

Trigger Overhead

I recently had discussions with some folks about triggers in PostgreSQL. They had two main questions.

  1. What is the overhead of putting a trigger on a table?
  2. Should a trigger function be generic with IF statements to do different things for INSERT, UPDATE and DELETE?

So I created a simple test to verify some assumptions.

First, I created a simple table and made it UNLOGGED. I didn’t want the overhead of the WAL to possibly dwarf the timings of the triggers.

CREATE UNLOGGED TABLE trigger_test (
 key serial primary key, 
 value varchar, 
 insert_ts timestamp, 
 update_ts timestamp
);

I then created two scripts to push through pgbench and get some timings.

INSERTS.pgbench

INSERT INTO trigger_test (value) VALUES ('hello');

UPDATES.pgbench

\set keys :scale
\setrandom key 1 :keys
UPDATE trigger_test SET value = 'HELLO' WHERE key = :key;

I ran these with the following pgbench commands:
pgbench -n -t 100000 -f INSERTS.pgbench postgres
pgbench -n -s 100000 -t 10000 -f UPDATES.pgbench postgres

The result is that I created 100,000 rows in the test table and then randomly updated 10,000 of them. I ran these commands several times, dropping and recreating the test table between each iteration, and the average tps values I was seeing were:
Inserts: 4510 tps
Updates: 4349 tps
Then to get the overhead of a trigger, I created a trigger function that just returns. I then repeated the process of running the pgbench commands as before.

CREATE FUNCTION empty_trigger() RETURNS trigger AS $$
BEGIN
 RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER empty_trigger BEFORE INSERT OR UPDATE ON trigger_test
 FOR EACH ROW EXECUTE PROCEDURE empty_trigger();

The results with the empty trigger were:
Inserts: 4296 tps
Updates: 3988 tps

That results in a 4.8% overhead for inserts and an 8.3% overhead for updates. I didn't dig further into why the overhead of a trigger on an update appears to be almost twice as high as on an insert; I'll leave that for a follow-up when I have some more time. While a 4%-8% overhead from placing a trigger on a table will likely not be noticed in most real-world applications, the overhead of what is executed inside the trigger function can be, which led to the next topic.

I then wanted to see the overhead of having a single trigger function versus having separate trigger functions for inserts and updates.

For a single trigger function, I used the following:

CREATE FUNCTION single_trigger() RETURNS trigger AS $$
BEGIN
 IF (TG_OP = 'INSERT') THEN
 NEW.insert_ts = CURRENT_TIMESTAMP;
 RETURN NEW;
 ELSIF (TG_OP = 'UPDATE') THEN
 NEW.update_ts = CURRENT_TIMESTAMP;
 RETURN NEW;
 END IF;
 RETURN NULL; 
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER single_trigger BEFORE INSERT OR UPDATE ON trigger_test
 FOR EACH ROW EXECUTE PROCEDURE single_trigger();

And for separate trigger functions, I used:

CREATE FUNCTION insert_trigger() RETURNS trigger AS $$
BEGIN
 NEW.insert_ts = CURRENT_TIMESTAMP;
 RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE FUNCTION update_trigger() RETURNS trigger AS $$
BEGIN
 NEW.update_ts = CURRENT_TIMESTAMP;
 RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER insert_trigger BEFORE INSERT ON trigger_test
 FOR EACH ROW EXECUTE PROCEDURE insert_trigger();
CREATE TRIGGER update_trigger BEFORE UPDATE ON trigger_test
 FOR EACH ROW EXECUTE PROCEDURE update_trigger();

I then reran the same process as before to see the overhead.

Single Trigger Inserts: 3569 tps
Single Trigger Updates: 3450 tps

Separate Triggers Inserts: 3623 tps
Separate Triggers Updates: 3870 tps

It turns out that splitting the trigger function into separate functions does make a difference. For the insert trigger, keeping things in a single trigger function only added 1.5% of overhead, but for the update trigger, the single trigger function added nearly 11% overhead. This is most likely due to the update case being handled second in the trigger function. That's another thing to dig into when there is time.
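
If the branch order really is the culprit, the fix for an update-heavy workload would be as simple as checking the UPDATE case first. A sketch (untested, same logic as above):

CREATE OR REPLACE FUNCTION single_trigger() RETURNS trigger AS $$
BEGIN
    IF (TG_OP = 'UPDATE') THEN       -- hot path checked first
        NEW.update_ts = CURRENT_TIMESTAMP;
        RETURN NEW;
    ELSIF (TG_OP = 'INSERT') THEN
        NEW.insert_ts = CURRENT_TIMESTAMP;
        RETURN NEW;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;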


PostgreSQL Videos

With all of the new users coming to PostgreSQL, one of the things we wanted to do for PGConf NYC this year was to have a video showing the history of PostgreSQL. We ended up with a very professional video showing how the features have grown over the years, and judging by the applause of the attendees, it delivered on our goals. You can see it at:
https://www.youtube.com/watch?v=2RSkI9dxdbo

However, our proof-of-concept video was very different. Since it was done by me and my video editing skills are very limited, it's not nearly as clean and professional. I also have the feeling that Jonathan Katz was afraid we'd freak out some people with the Five Finger Death Punch background music, so we ended up going the more professional route. I still think it had some potential, so I posted it up to YouTube:
http://youtu.be/AYn2ukNEmQM

Also, thanks to Kirk Roybal for putting together the animation of PostgreSQL source code changes over the years.
https://www.youtube.com/watch?v=HsxwNvlKZRU


PostgreSQL Studio is now on Heroku

One of my main objectives when writing PostgreSQL Studio was to help new users learn PostgreSQL. One of the main drivers of new PostgreSQL users is the different cloud players, with Heroku being one of the biggest, so it only made sense to start there. I've gone through the process of making PostgreSQL Studio an add-on for Heroku, so now you can add it to your existing Heroku applications. This lets you work with your Heroku Postgres databases through a web-based UI, graphically see your schemas, and run your queries without the need to install anything client side. The PostgreSQL Studio add-on is currently in beta, so it's free to add to your applications.
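
Attaching it to an existing app should be the usual one-liner (assuming the add-on slug matches the URL below; my-app is a placeholder):

heroku addons:add pgstudio --app my-app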

Try it out, report bugs, submit patches.

You can find more about the add-on at:
https://addons.heroku.com/pgstudio

And you can learn more about PostgreSQL Studio at:
http://www.postgresqlstudio.org/

PostgreSQL Studio 1.2 Released

I am happy to announce that version 1.2 of PostgreSQL Studio, an open source web interface for PostgreSQL, was released on February 23, 2014. This release contains some new functionality, mostly around improving SQL Worksheet, as well as some bug fixes.

The changes in this release are:

Features

  • Added Open and Save buttons to SQL Worksheet
  • Moved the close button on SQL Worksheet to the upper right corner
  • Improved the look and feel of pop-up windows
  • Auto select the first object in a schema
  • Added a limit drop down to SQL Worksheet

Bugs

  • Removed extra new line characters when displaying a function
  • Added a click handler to reselect an already selected row
  • Added a scroll panel for the Data tab for wide tables

The new version is available from http://www.postgresqlstudio.org/download/