Win-task-manager

Building PostgreSQL on Windows for Performance

Before PostgreSQL 9.1, adding additional features like data types was non-standard and awkward, but 9.1 brought extensions. By adding extensions as well as the plumbing for things like foreign data wrappers, the use of extensions have exploded over the last few years. This is great for people who are using PostgreSQL on Linux or similar type environments, but the people in Windows have been pretty much out of luck. Most people need to hope the extension they want to use is popular enough that someone skilled with Windows build environments decided to release the extension on Windows. It just was not as simple as Linux where you just run a “make install”.

For years, there has been the ability to replicate the Linux build experience on Windows with the Mingw project, but historically the resulting binaries have been significantly slower than the binaries produced by the Microsoft compiler. Since I last played with compliers on Windows, I have been pleasantly surprised that the open source build tools have caught up with the commercial tools and in some ways surpassed them.

To check the performance of the resulting binaries from the compilers, I tried a couple of CPU intensive tests. The first one was just a SELECT only pgbench test. The second was running the square roots of random numbers and then sorting the results. For the Windows binaries built with Microsoft Visual Studio, I used the PostgreSQL Windows installer from the PostgreSQL community site at http://www.postgresql.org/download/windows/. The Mingw-w64 build came from the bigsql.org site at http://www.bigsql.org/se/postgresql/packages.jsp

The version strings of the 2 binaries are below.

test=# select version();
 version
-------------------------------------------------------------
 PostgreSQL 9.5.1, compiled by Visual C++ build 1800, 64-bit
(1 row)

test=# select version();
 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 9.5.1 on x86_64-pc-mingw64, compiled by gcc.exe (Rev5, Built by MSYS2 project) 4.9.2, 64-bit
(1 row)

For these tests, the only tuning of the postgresql.conf file was to set shared_buffers to 2GB. All other settings were left at the defaults.

The SELECT only pgbench test was conducted with the following command:

pgbench -n -S -c 10 -t 100000 -U postgres test
MSVC 34246 TPS
GCC 34706 TPS

That is a minor 1.3% performance improvement. That is not a ground breaking gain, but the gcc built binaries were consistently faster.

The second test was a bit more CPU intensive running the following query:

select sqrt((random() * g)::numeric) from generate_series(1, 1000) g order by 1;

This was put in a text file and run through pgbench with the following command:

pgbench.exe -n -f test.sql -c 8 -t 10000 -U postgres test
MSVC 610 TPS
GCC 634 TPS

This resulted in a larger performance gain of 3.8%.

I started this testing with trying to find an easier way to build extensions on Windows with hopes of not taking too much of a performance hit, but ended up finding that the easier way was actually a littler faster. I am thinking its time to ditch those proprietary build tools and stick with open source.

Video on FDW by Jim

Video – Behold! The Power of Foreign Data Wrappers!

If you’ve never heard of PostgreSQL’s “foreign data wrappers” — you’ve been missing out on a very powerful tool to connect and integrate heterogenous databases. By leveraging FDW’s, you can put PostgreSQL at the center of a integrated database federation and read and write data from other PostgreSQL servers, other relational databases (Oracle, MySQL), many NoSQL systems (MongoDB, Hadoop) and even Twitter!

Heck – FDW’s are so powerful that MongoDB just released their BI connector as a PostgreSQL FDW!! (excellent write up by John De Goes of SlamData)

At this year’s PostgresOpen, Jim Mlodgenski from OpenSCG gave a great talk about Federated PostgreSQL using FDW’s. Watch this video to learn the basics of what FDW’s are, how they work, and how you can easily combine data from PostgreSQL, other DBMS’s, and NoSQL db’s all in a single SQL query.

Thanks to Daily Tech Video for highlighting this talk in their post [Video 378] Jim Mlodgenski: PostgreSQL Federation.

pgstudio-small

PostgreSQL Studio 2.0 is out

PostgreSQL Studio is a fully open source web based UI for PostgreSQL that is ideal for Postgres in the cloud. With the impending release of PostgreSQL 9.5, PostgreSQL Studio 2.0 has just been released supporting some of the new features in Postgres. The big new feature that requires UI changes is support for Row Level Security. With Row Level Security, policies need to be created on each table for the role accessing the data. These policies can become rather complex so seeing tied to the table can greatly help with understanding the security placed on your data.

rls_policy

In addition to the new features, we also updated the versions of the libraries and stopped support for JDK 6.

You can download PostgreSQL Studio at: http://www.postgresqlstudio.org/

And while I have you attention…
The call for presentations for PGConf US 2016 is now open! Submissions will be allowed through January 31st, 2016. All speakers will be notified by February 15, 2016. Please note that as with our past conferences, there will be no extensions of the submission deadline. For more information and submission details, please visit http://www.pgconf.us/2016/submit/

PL/pgSQL Profiler

PostgreSQL PL/pgSQL Profiler

Some of our customers really like writing their business logic inside of PostgreSQL. While this is really cool that PostgreSQL is capable of handling, trying to performance tune large amounts of PL/pgSQL code becomes unwieldy. If your functions are small enough, it’s possible add some logging statements, but that is not possible with hundreds or even thousands lines of legacy code.

Several years ago as part of the PL/pgSQL debugger, Korry Douglas wrote a PL/pgSQL profiler, but over the years, it seems to have suffered from bit rot. A profiler for PL/pgSQL code helps solve a lot of problems and gives us insight into how your server side code is running.

Below is an example output from the profiler showing how many times each line of code executed and what was the time taken for each line.

PL/pgSQL Profiler

 

 

The plprofiler has not been tested is many different environments yet, so be careful in rolling it out to production servers. Check it out and let me know if you find any issues.

https://bitbucket.org/openscg/plprofiler

 

Synchronous Commit

Synchronous Commit

While I was at PGConf.EU a couple of weeks ago in Madrid, I attended a talk by Grant McAlister discussing Amazon RDS for PostgreSQL.  While it was interesting to see how Amazon had made it very simple for developers to get a production PostgreSQL instance quickly, the thing that really caught my eye was the performance benchmarks comparing the fsync and synchronous commit parameters.

sync_commitFrighteningly, it is not that uncommon for people to turn off fsync to get a performance gain out of their PostgreSQL database. While the performance gain is dramatic, it carries the risk that your database could become corrupt. In some cases, this may be OK, but these cases are really rather rare. A more common case is a database where it is OK to lose a little data in the event of a crash. This is where synchronous commit comes in. When synchronous commit is off, the server returns back success immediately to the client, but waits to flush the data to disk for a short period of time. When the data is ultimately flushed it is still properly sync to disk so there is no chance of data corruption. The only risk if the event of a crash is that you may lose some transactions. The default setting for this window is 200ms.

In Grant’s talk, he performed a benchmark that showed turning off synchronous commit gave a bigger performance gain than turning off fsync. He performed an insert only test so I wanted to try a standard pgbench test. I didn’t come up with the same results, but the I still saw a compelling case for leaving fsync on while turning off synchronous commit.

I ran a pgbench test with 4 clients and a scaling factor of 100 on a small EC2 instance running 9.3.5. What I saw was turning off fsync resulted in a 150% performance. Turning off synchronous commit resulted in a 128% performance gain. Both are dramatic performance gains, but the synchronous commit option has a lot less risk.

 

Speaking of conferences, the call for papers is open for PGConf US 2015. If there is a topic you’d like to present in New York in March, submit it here.

stinger

Stinger Delivers Fast Hive SQL via YARN & TEZ




The latest BigSQL bundle now includes Stinger and Tez to dramatically improve performance and extend the SQL dialect. In the latest distribution of BigSQL Release 2, it is extremely easy to get started exploring Tez and Stinger in your own sandbox environment. Taking advantage of YARN, Stinger is now delivering on its three main goals of Speed, SQL Enhancememnts & Scale..

 

SPEED

The Stinger initiative (Hive 0.13 & Tez-0.3) calls for a 100X performance over current Hive (Hive 0.10) batch queries. Key developments helping to achieve these results are:

  • Leveraging ORC Files these columnar formats allow for faster read, writes and processing of Hive data. The format supports predicate pushdowns, compression and even allows light-weight indexes to be stored within the files.
  • Continued Optimizations in the Hive Query Engine Continuous enhancements include avoiding unneeded writes, support for hot containers (saves launch shutdown overhead), in memory cache, leveraging wider Metastore statistics have all helped improve latency and throughput.

 

SQL ENHANCEMENTS

Full support DECIMAL, TIMESTAMP, DATE & VARCHAR datatypes as well as a rich set of semantic improvements.

 

Scale

Designed for large scale data environments, Stinger’s goal is to deliver even when you are analyzing Petabytes of data.

 

Stinger in Action!

Below is a screenshot from the BigSQL YARN console showing the execution of a Tez job.

At OpenSCG we focus on helping you get the most value and leverage out of your data environments. Its an exciting time in DATA and we thrive on helping our clients understand the latest changes in data technology. We realize that there are great opportunities in new NoSQL and clustered data solutions and take pride in helping you evaluate your data options.
cassandraeye2

BigSQL Adds Cassandra to the Hadoop and PostgreSQL Bundle

The BigSQL developer’s bundle has added a significant new component: Cassandra 2.0.4.  All of the bundles and VM’s now allow developers to get up and running with the latest Hadoop ecosystem and Cassandra.   Join our free Beta program now!  

As a significant number of our clients are leveraging Cassandra for Highly Available Multi Partition solutions, it was a natural fit to add to our bundle.    

In addition to Cassandra the latest beta includes all of the standard BigSQL components and many enhancements:

  • Continued HadoopFDW development (Postgres  – Hadoop Foreign data wrapper) fixes and improvements
  • JDK 7 is now a pre-req on all platforms
  • upgrade PostgreSQL from 9.3.1 to 9.3.2
  • upgrade DataFu from 1.1.1 to 1.2.0
  • upgrade PostgresqlStudio from 1.0 to 1.1
  • upgrade Hue from 3.0.0 to 3.5.
  • autostart Oozie & Hue in the vm
  • upgrade Sqoop from 1.4.4 to 1.99.3 (and autostart it)
  • include pgTSQL functionality from tpostgresq.org 
  • compact our quickstart virtual machine from 3 GB to 1.5 GB
  • configure eclipse for hadoop in the vm
At OpenSCG we are focus on helping you get the most value and leverage out of your data environments.  Its an exciting time in DATA and we thrive on helping our clients understand the latest changes in data technology.  We realize that there are great opportunities in new NOSQL and clustered data solutions and take pride in helping you evaluate your data options.

BigSQL Hadoop + Postgres now Available as a VM

With the continued success of the BigSQL developer’s bundle, we are proud to announce the availability of a new VM image with BigSQL preinstalled.  Primarily created to provide our Windows friends access to the technology bundle, the Image is based on CentOS 6.4 Desktop including many tools to begin exploring today.  Join our free Beta program now!  

We see this as a quick way to open up the platform for Windows clients while we complete our native Windows bundle distribution which is on our roadmap and should be in alpha by late Q1 of 2014.

The beta includes all of the standard BigSQL components plus:

  • Hadoop-2.2
  • Hive-0.12
  • HBase-0.96
At OpenSCG we are focused on integrating existing SQL infrastructure WITH NoSQL extensions to ensure our clients get the most out of their investment dollars.  We understand DATA and thrive on helping our clients get maximum value out of DATA!
Slide09

Installing and Running BigSQL, the Hadoop and Postgres Bundle

This tutorial will show you how to install and run BigSQL. You can download the bundle from BigSQL.org and get started using Hadoop, Hive, Hbase, and PostgreSQL in less than ten minutes. The video below goes over how to:

  • Complete the prerequisites (the commands used can be found here)
  • Start the bundle
  • Open the psql, hive, and hbase shells
  • Use the web administration console

[youtube url=”http://www.youtube.com/watch?v=y6hG4QM24QE”]

header

JasperReports with Hadoop – Tutorial

This tutorial covers step by step instruction on running Jasper Reports with Hadoop using Jaspersoft Studio easy and simple to use environment . Also, publishing the created sample report with Jasperserver that is connected to Jaspersoft Studio .

1. Prerequisites

a) BigSQL Tutorial is required to be followed till Step 3 http://www.bigsql.org/se/tutorial.jsp

Step 1 : Creating example.customer_history in HIVE and loading values.
Step 2 : Creating example.customer_history as foreign table in Postgres
Step 3 : Verifying example.customer_history

2. Things required to be downloaded

a) Download Jaspersoft Studio for your OS version from http://community.jaspersoft.com/project/jaspersoft-studio/releases

  • Select your OS version from Link column

  • After downloading completes , install the Studio

b) Download JasperReports Server from http://community.jaspersoft.com/project/jasperreports-server/releases for your OS version

c) Download JDBC file from http://jdbc.postgresql.org/download/postgresql-9.2-1003.jdbc4.jar

3. Simple Data Adaptor configuration

a) Verify if BIGSQL is running

$ ./bigsql status
#################################################
# BIGSQL:- your BIGSQL version
# TIME: current time
# HOST: your hostname
# JAVA: path to your JAVA_HOME
# OS: your OS
#################################################

## Checking critical ports ################
Postgres port 5432 is busy.
ZooKeeper port 2181 is busy.
HDFS port 50070 is busy.
HBase port 60010 is busy.
Hive port 9083 is busy.
Tomcat port 8080 is busy.

b) Run the installed Jaspersoft Studio instance

1) For DataAdapter , click Under Repositor Tab → Right click on Data Adaptors → Create New Data Adaptor → Next → Database JDBC Connection → Next

2) Database Location

    • Enter Name “BigSQL”
    • JDBC Driver : org.postgresql.Driver
    • JDBC Url : jdbc:postgresql://localhost:5432/postgres
    • Username : postgres
    • Password : password

3) Driver Classpath

    • Click on Add Button → Select Driver Path → Click Open

c) Click Finish Button to add the Data Adaptor

4. Easy Report Creation method

a) Create a new Project

  • File → New → Project → JasperReports Project .

b) Create New Jasper Reports

  • File → New → Jasper Report → Blank A4 → Enter Name: bigsql.jrxml → Finish.

c) Select created report and open “source” view .

  • Select all and Delete selected.

d) Hadoop JasperTest Report

“copy the below report in xml and below paste it in the ‘source’ tab of the created report”

<?xml version="1.0" encoding="UTF-8"?>
<!-- Created with Jaspersoft Studio version last-->
<jasperReportxmlns="http://jasperreports.sourceforge.net/jasperreports"xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"xsi:schemaLocation="http://jasperreports.sourceforge.net/jasperreports http://jasperreports.sourceforge.net/xsd/jasperreport.xsd"name="BigSQL"language="groovy"pageWidth="595"pageHeight="842"columnWidth="535"leftMargin="20"rightMargin="20"topMargin="20"bottomMargin="20"uuid="73c28c9e-7a45-457e-bb9c-d3b4a665aec8">
<propertyname="com.jaspersoft.studio.data.defaultdataadapter"value="BigSQL"/>
<stylename="Title"fontName="Times New Roman"fontSize="50"isBold="true"pdfFontName="Times-Bold"/>
<stylename="SubTitle"forecolor="#736343"fontName="Arial"fontSize="18"/>
<stylename="Column header"forecolor="#666666"fontName="Arial"fontSize="12"isBold="true"/>
<stylename="Detail"fontName="Arial"fontSize="12"/>
<stylename="Row"mode="Transparent">
<conditionalStyle>
<conditionExpression><![CDATA[$V{REPORT_COUNT}%2 == 0]]></conditionExpression>
<stylebackcolor="#E6DAC3"/>
</conditionalStyle>
</style>
<queryString>
<![CDATA[Select * from example.customer_history;]]>
</queryString>
<fieldname="hist_id"class="java.lang.Integer"/>
<fieldname="h_date"class="java.sql.Timestamp"/>
<fieldname="h_amount"class="java.math.BigDecimal"/>
<fieldname="h_data"class="java.lang.String"/>
<background>
<bandsplitType="Stretch"/>
</background>
<title>
<bandheight="136"splitType="Stretch">
<staticText>
<reportElementuuid="3cbe62a2-fc4f-4470-8a41-04ba5a76f4ff"style="Title"x="170"y="40"width="263"height="62"/>
<textElementverticalAlignment="Middle">
<fontpdfFontName="Times-Roman"/>
</textElement>
<text><![CDATA[BigSQL Report]]></text>
</staticText>
<staticText>
<reportElementuuid="9c023796-93e4-4fbf-a609-9537018b189c"style="SubTitle"x="350"y="50"width="196"height="22"/>
<textElement>
<fontfontName="Times New Roman"pdfFontName="Times-Roman"/>
</textElement>
<text><![CDATA[with Jasper Reports]]></text>
</staticText>
<image>
<reportElementuuid="805e8dcf-c46e-49df-9647-4f14646c972d"x="-1"y="0"width="161"height="110"/>
<imageExpression><![CDATA["/Users/alisaggu/Desktop/logo10.png"]]></imageExpression>
</image>
</band>
</title>
<pageHeader>
<bandsplitType="Stretch"/>
</pageHeader>
<columnHeader>
<bandheight="16"splitType="Stretch">
<line>
<reportElementuuid="26e9c0f5-5aea-4bd6-848c-8f9820d2cb6c"positionType="FixRelativeToBottom"x="0"y="15"width="555"height="1"/>
<graphicElement>
<penlineWidth="0.5"lineColor="#999999"/>
</graphicElement>
</line>
<staticText>
<reportElementuuid="ad2d59e2-f15f-4475-9fc4-8068d571e552"style="Column header"x="0"y="0"width="138"height="15"forecolor="#736343"/>
<textElement/>
<text><![CDATA[Hist Id]]></text>
</staticText>
<staticText>
<reportElementuuid="90d84a3f-e23f-4090-9db8-42fca0ced0ab"style="Column header"x="138"y="0"width="138"height="15"forecolor="#736343"/>
<textElement/>
<text><![CDATA[Date Due]]></text>
</staticText>
<staticText>
<reportElementuuid="50862542-096a-4072-8665-88c1238fa7c5"style="Column header"x="276"y="0"width="138"height="15"forecolor="#736343"/>
<textElement/>
<text><![CDATA[Total Amount]]></text>
</staticText>
<staticText>
<reportElementuuid="8e8735a4-a896-4eef-876a-0472fabf5493"style="Column header"x="414"y="0"width="138"height="15"forecolor="#736343"/>
<textElement/>
<text><![CDATA[Raw Data]]></text>
</staticText>
</band>
</columnHeader>
<detail>
<bandheight="15"splitType="Stretch">
<frame>
<reportElementuuid="840d241f-46d5-4267-abf2-fbfabe77a1ba"style="Row"mode="Opaque"x="0"y="0"width="555"height="15"/>
<textFieldisStretchWithOverflow="true">
<reportElementuuid="904e88f2-9f49-494c-9e20-7a0f2a66c46e"style="Detail"x="0"y="0"width="138"height="15"/>
<textElement/>
<textFieldExpression><![CDATA[$F{hist_id}]]></textFieldExpression>
</textField>
<textFieldisStretchWithOverflow="true">
<reportElementuuid="745e034c-c2b7-406d-97b4-19f6a5a1f2ec"style="Detail"x="138"y="0"width="138"height="15"/>
<textElement/>
<textFieldExpression><![CDATA[$F{h_date}]]></textFieldExpression>
</textField>
<textFieldisStretchWithOverflow="true">
<reportElementuuid="c174e25f-9c68-4386-8968-1bda99810e3e"style="Detail"x="276"y="0"width="138"height="15"/>
<textElement/>
<textFieldExpression><![CDATA[$F{h_amount}]]></textFieldExpression>
</textField>
<textFieldisStretchWithOverflow="true">
<reportElementuuid="aa181d78-3fb6-47dd-ae2f-bc1aafba23ae"style="Detail"x="414"y="0"width="138"height="15"/>
<textElement/>
<textFieldExpression><![CDATA[$F{h_data}]]></textFieldExpression>
</textField>
</frame>
</band>
</detail>
<columnFooter>
<bandheight="45"splitType="Stretch">
<line>
<reportElementuuid="7c17f014-3c79-4265-b557-b38cf17c5f0f"positionType="FixRelativeToBottom"x="0"y="3"width="555"height="1"/>
<graphicElement>
<penlineWidth="0.5"lineColor="#999999"/>
</graphicElement>
</line>
</band>
</columnFooter>
<pageFooter>
<bandheight="25"splitType="Stretch">
<frame>
<reportElementuuid="1f9a92db-af7d-429a-ba60-a43bd78a783e"mode="Opaque"x="-21"y="1"width="597"height="24"forecolor="#D0B48E"backcolor="#F2EBDF"/>
<textFieldevaluationTime="Report">
<reportElementuuid="17fc328e-c16b-477d-a061-a733a9167907"style="Column header"x="533"y="0"width="40"height="20"forecolor="#736343"/>
<textElementverticalAlignment="Middle">
<fontsize="10"isBold="false"/>
</textElement>
<textFieldExpression><![CDATA[" " + $V{PAGE_NUMBER}]]></textFieldExpression>
</textField>
<textField>
<reportElementuuid="8087e1e6-d232-4090-97c3-72d455154cab"style="Column header"x="453"y="0"width="80"height="20"forecolor="#736343"/>
<textElementtextAlignment="Right"verticalAlignment="Middle">
<fontsize="10"isBold="false"/>
</textElement>
<textFieldExpression><![CDATA["Page "+$V{PAGE_NUMBER}+" of"]]></textFieldExpression>
</textField>
<textFieldpattern="EEEEE dd MMMMM yyyy">
<reportElementuuid="2a3dc088-bcb4-4488-8bc8-7809bdaf9901"style="Column header"x="22"y="1"width="197"height="20"forecolor="#736343"/>
<textElementverticalAlignment="Middle">
<fontsize="10"isBold="false"/>
</textElement>
<textFieldExpression><![CDATA[new java.util.Date()]]></textFieldExpression>
</textField>
</frame>
</band>
</pageFooter>
<summary>
<bandsplitType="Stretch"/>
</summary>
</jasperReport>

5. Preview your created report

Save and click on Preview to view the report .

6. Finally , Publishing the Report

a) Install the downloaded JasperReports Server and launch it .

Login the web interface with admin account .
username : jasperadmin
password : jasperadmin

b) Create a new JasperReports Server connection under repository tab in Jaspersoft Studio

c) A dialog to insert the server data appears. Fill it in as follows:

  • Name: the name of the connection. You can use any name you want. For this example we will leave the default name: JasperReports Server.
  • URL: the address of the server. The default address is already correct if we are using a local server. For this example the correct address is http://localhost:8080/jasperserver/
  • User: the username to access the server. The default for the local server is “jasperadmin”.
  • Password: as with the username, for the local server by default it is “jasperadmin”.

Then click the Test Connection button to test the connection. If everything is working, click Finish.

d) Open the report and click the button with a blue arrow in the upper-right corner of the designer. In the opened window you can browse the server directory structure to choose where to place the report. Select the Reports folder and name the report Then click Next.

e) Create the DataSource from Local Data source.

  • Select Datasource JDBC , click NEXT
  • Enter name and id then click next
  • Click Import from Jaspersoft Studio and Select BigSQL -> JDBC connection and click finish to create the datasource

f) On the web interface logged on with admin account , click on the folder Reports .

g) click on BigSQL published report to view it in the next screen