Relocatable PG RPMs

Relocatable PostgreSQL RPMs – You asked and we delivered!

In our years of working with PostgreSQL users, one of the main gripes we hear is not about a missing feature or capability but rather that the provided RPMs are not relocatable. The reasons for wanting relocatable RPMs are many: the company they work for has specific standards, they are installing in a lab alongside other installations and want to keep their install separate, or they simply want to place things in a more logical, accessible location.

Until now, the only ways to do this were to use the graphical installer or build Postgres from source. If you are setting up a DevOps environment on Linux or in the cloud, a graphical installer doesn't work well and, let's be honest, who wants to be pestered with advertising while doing an installation? Building Postgres from source is certainly another option, but most DBAs don't have the time to work through that process and handle any errors that might occur. They could hand off the task to their dev or sysadmin team, but then they have to wait for that to happen and/or lose control over how it is built. And besides, you want a consistent binary build on every server in your organization, not a compiled-from-source version that uses different library versions or configuration settings on different servers over time.

So, in our Postgres by BigSQL distribution, we decided to make it easy for users to install Postgres wherever they want, simply by using the 100% open source RPM that we build and make available to everyone. Our RPM does not modify system libraries and, by default, installs to a user-friendly location.


But you have total control: simply use the --prefix <path> option to tell the package manager to install the software where you want. For example:
# rpm -ivh --prefix /home/postgres/PG96 postgresql-9.6.0-1-x64-bigsql.rpm
warning: postgresql-9.6.0-1-x64-bigsql.rpm: Header V3 DSA/SHA1 Signature, key ID 38379aec: NOKEY
Preparing...                   ################################# [100%]
Updating / installing...
   1:postgresql96-9.6.0-1      ################################# [100%]
PostgreSQL 9.6 is now installed in /home/postgres/PG96.

To initialize, run /etc/init.d/postgresql-96 start as root user.
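Following that message, a quick way to bring the relocated install up and confirm it might look like this (the bin/ subdirectory under the chosen prefix is an assumption about the package layout):

# /etc/init.d/postgresql-96 start
# /home/postgres/PG96/bin/psql --version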

And there you have it: PostgreSQL installed where you want it!

Now that you have PostgreSQL installed, be sure to check out how to take the next steps, initialize the PostgreSQL database, and start using it here:


Why Open Source over Open Source Based?

The Postgres landscape has become very competitive in recent years as the database project has matured and grown exponentially in popularity. Since more and more users are deploying Postgres to support a wide variety of applications, there is a need for experts to help make sure those deployments are successful. This has led to the birth of a number of Postgres-focused companies, as well as existing companies expanding or creating a Postgres practice. All in all, this growth of expertise is a good thing for the Postgres community. More people using it makes it a better product, and when more companies are using it and hiring Postgres talent, many give back to the community, making the original Postgres project stronger and better each year. That is one of the main benefits of open source: joined forces working together for the betterment of the project. The other benefit is, of course, that it is 100% free to use.

However, there is an often misunderstood or, more accurately, misrepresented side of this story. Over the years, many folks have taken the hard work of the community and turned it into something they call 'open source based' solutions. There is nothing actually wrong with doing this, and there certainly isn't anything that prevents it in the Postgres world. The issue at hand is how these solutions are represented. Many of these companies incorrectly market themselves as open source companies in an effort to ride the open source wave and take advantage of the market's momentum. These so-called 'open source based' solutions, to be perfectly clear, are not open source. They are proprietary solutions, which means that in order to use them, you must pay someone for the right to do so. The issue here is not that someone is trying to build a business on their own solutions but rather the confusion they cause in the market. These companies often walk a fine line when discussing or marketing their solutions by saying 'open source', but when the marketing layers are peeled back, you find a proprietary product with the same traps as traditional commercial database solutions.

The reason this distinction is so important is that more and more companies and people are looking to move away from proprietary databases to avoid the pain of being locked into any one technology, paying too much for a commodity, and dealing with vendors that historically raise prices as they see fit. Using a true open source database means you are not tied to any one vendor, ever. You can choose to pay for support or services at any time, but you have the freedom to stop paying whenever you want and still continue to use the open source database. Open source based solutions make it seem like you are free to do the same, but if you decide to stop paying for the right to use them, you must uninstall or risk being out of compliance and liable. That means you are forced into a migration effort. Who wants to be in that situation?

The solution is to skip the open source based solution altogether. Yes, the move from a proprietary solution to a pure open source database may take a little longer and thus cost a bit more, but it is a one-time cost, and the end result, being free of both technology lock-in and vendor lock-in, has substantial long-term benefits, so you will see a return on the investment relatively quickly. Even those open source based solutions that promise an easier migration and a quicker return on investment fail to point out that the migration may still take quite some time and be difficult (so much so that it is nearly the same effort to rewrite things).

There are also plenty of options for getting support for your open source databases, so you don't risk losing the quality of support you are accustomed to getting from a proprietary vendor. That substantially reduces the perceived risk of using open source databases. Industry experts such as Gartner have acknowledged that open source databases have matured to the point of being viable alternatives to commercial databases, and that in the coming years large percentages of applications will be developed on or migrated to open source databases, not open source based databases.

The choice is ultimately yours, but do yourself a favor: do the appropriate due diligence and make sure all paths and their associated risks are clearly understood.

BigSQL Hadoop + Postgres now Available as a VM

With the continued success of the BigSQL developer's bundle, we are proud to announce the availability of a new VM image with BigSQL preinstalled. Primarily created to give our Windows friends access to the technology bundle, the image is based on CentOS 6.4 Desktop and includes many tools so you can begin exploring today. Join our free Beta program now!

We see this as a quick way to open up the platform to Windows clients while we complete our native Windows bundle distribution, which is on our roadmap and should be in alpha by late Q1 of 2014.

The beta includes all of the standard BigSQL components plus:

  • Hadoop-2.2
  • Hive-0.12
  • HBase-0.96
At OpenSCG we are focused on integrating existing SQL infrastructure WITH NoSQL extensions to ensure our clients get the most out of their investment dollars.  We understand DATA and thrive on helping our clients get maximum value out of DATA!

Installing and Running BigSQL, the Hadoop and Postgres Bundle

This tutorial will show you how to install and run BigSQL. You can download the bundle and get started using Hadoop, Hive, HBase, and PostgreSQL in less than ten minutes. The video below goes over how to:

  • Complete the prerequisites (the commands used can be found here)
  • Start the bundle
  • Open the psql, hive, and hbase shells
  • Use the web administration console
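For reference, a minimal command-line sketch of those steps, assuming the default bundle layout (psql, hive, and hbase shell are the standard client shells; the web console URL is not shown here):

$ ./bigsql start
$ psql postgres
$ hive
$ hbase shell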

[Video: Installing and Running BigSQL, the Hadoop and Postgres Bundle]


Hadoop Tutorials: Hive, HBase, Flume, Sqoop, Thrift, and More

Over the last several months the team at OpenSCG has created a number of Hadoop tutorials covering key projects in the Hadoop ecosystem, including Hive, HBase, Flume, Sqoop, and Thrift. These tutorials are designed to highlight the usage of Hadoop with working examples that can be quickly worked through and modified! To help you find what you are looking for, here is a quick overview of our tutorials, complete with links!

As a prerequisite, we use BigSQL as our developer bundle. It can be installed on OS X and Linux in under 10 minutes and includes all the components needed for the tutorial series.

BigSQL comes with its own self-contained tutorial that can be found here: BigSQL Tutorial

A great place to start is the Hadoop Hive Tutorial – Zero to Results in 15 Minutes. In addition to providing examples of Hive in action, the tutorial covers the installation of BigSQL for first-timers.

Interested in HBase? Check out our Hadoop HBase Tutorial.

BigSQL bundles Postgres into the distribution. This makes it very easy to learn Sqoop and Flume!

Starting with Sqoop, we have the Hadoop Sqoop Tutorial – HDFS and Postgres made easy!

And for Flume, see Using Hadoop to Flume Twitter Data.

And finally, we take a look at Thrift in the Hadoop Thrift Tutorial.

Keep checking back, as we plan to continue creating tutorials that help demystify Hadoop with real-world working examples!


Hadoop Sqoop Tutorial HDFS Hive & Postgres

The Hadoop Sqoop Tutorial is the third in the series providing easy-to-follow examples of using a Hadoop cluster in conjunction with Postgres. This tutorial builds on the Hadoop Hive Tutorial, which should be considered a prerequisite if you want to run the examples provided. In this tutorial, we will create a Postgres table from public domain data provided by the New York City health department. The data has been slightly cleaned in the version attached to this blog article to save everyone time. :-)  This data will then be moved into HDFS/Hive using the Sqoop IMPORT command. Finally, we will run some Hive queries and Postgres queries to validate the transfer! Assuming all of the prerequisites are in place, this should be a 15-minute exercise.

Sqoop is really not that difficult once your environment is up and running! The best way to learn Sqoop is to jump in and start testing out the various options and configuration settings Sqoop makes available.

Let’s Get Started!!

First, though, let's make sure that Sqoop is available and ready to go:

  • Change to the tutorial directory (if you're not already there):

$ cd tutorial

  • Check the version of Sqoop you are running:

$ sqoop version

Sqoop 1.4.3 git commit id 82fde21d0f6c89059bade4588f3ba6989989dc76 Compiled by hari on Tue Feb 26 11:30:27 PST 2013 

If you do not see a similar response, go back to the first tutorial and ensure you have Hadoop properly installed.

Now let's create a table in Postgres to hold the data we will eventually import into HDFS using Sqoop.

Here is a zip file containing a subset of the data in CSV format: Tutorial Data: Sample Data

  • Unzip the data in the tutorial directory.  

Use the PostgreSQL COPY command to load the data into the Postgres tables.

  • Of course we need to create the tables before using the COPY command.

CREATE  TABLE nycrestdata (
s_camis     TEXT,
s_dba       TEXT,
s_boro      TEXT,
s_building  TEXT,
s_street    TEXT,
s_zipcode   TEXT,
s_phone     TEXT,
s_cuscode   TEXT,
s_insp_date TEXT,
s_action    TEXT,
s_viocode   TEXT,
s_score     TEXT,
s_grade     TEXT,
s_g_date    TEXT,
s_r_date    TEXT
);

Copy and paste the above code into psql and run it to create the table.
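If you are not already in a psql session, connect to the bundled Postgres first; the host, port, database, and user below mirror the connection string used with Sqoop later in this tutorial:

$ psql -h localhost -p 5432 -U postgres postgres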

  • Now use the COPY command to load the text file into the table:

COPY nycrestdata FROM '/home/YOURHOME/bigsql/bigsql-9.3beta2-2/tutorial/restdata.csv' (DELIMITER ','); 
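As a quick sanity check (not part of the original steps), confirm that the rows loaded:

SELECT count(*) FROM nycrestdata;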

Sqoop prefers data with a Primary Key.

  • Add a PRIMARY KEY to the data:
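The original post does not show the statement; one simple approach is to add a surrogate key column (the column name id is an illustrative choice):

-- add a surrogate primary key so Sqoop has a split column to work with
ALTER TABLE nycrestdata ADD COLUMN id SERIAL PRIMARY KEY;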

Don't forget to \q out of Postgres.

Let's verify that Sqoop can see the table in PostgreSQL:
$ sqoop list-tables --connect jdbc:postgresql://localhost:5432/postgres --username postgres --password password 

13/06/20 17:48:43 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 
13/06/20 17:48:43 INFO manager.SqlManager: Using default fetchSize of 1000
nycrestdata

Great, there is our table, nycrestdata!
Before moving further, let's quickly create a Sqoop options file to store the parameters we will be using repeatedly.
Here are the options file contents; create a file called sqoop_parms.txt containing this data and save it to the tutorial directory:

  • Create the Options File

# Options file for Sqoop import

# Connect parameter and value
# Username parameter and value
# Password
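Filled in to match the connection used with list-tables above, the parameter lines themselves would look like this (Sqoop options files take one argument per line; adjust the values for your environment):

--connect
jdbc:postgresql://localhost:5432/postgres
--username
postgres
--password
password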

Now we will look at the tables in PostgreSQL again using the sqoop_parms.txt file:

sqoop list-tables --options-file /home/YOURHOME/bigsql/bigsql-9.3beta2-2/tutorial/sqoop_parms.txt

Now let’s import the data from PostgreSQL to HDFS!

sqoop import --options-file /home/YOURHOME/bigsql/bigsql-9.3beta2-2/tutorial/sqoop_parms.txt --table nycrestdata --hive-import

You should see the MapReduce jobs execute; assuming success, we now have the table in both PostgreSQL and Hadoop Hive.
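To double-check from the Hive side (an optional step not in the original walkthrough), you can list the Hive tables:

$ hive -e "SHOW TABLES;"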

Let's run some queries to ensure that the import worked and the data matches.

Let's take a quick look at the number of restaurants for each grade (there may be multiple grades per establishment in the data!):


  • Run a few SQL test queries:

SELECT s_grade,count(*) FROM nycrestdata group by s_grade;

FYI, a grade of Z is a temporary grade until the data is updated! It's the C's you have to look out for!

SELECT DISTINCT s_dba from nycrestdata where s_grade LIKE '%C%';

Now let's see if the Hive data correlates!

From the hive> prompt:

SELECT s_grade,count(*) FROM nycrestdata group by s_grade;

And now for the C’s again!

SELECT DISTINCT s_dba from nycrestdata where s_grade LIKE 'C';

Looks like the data matches as expected! Our next tutorial will cover extended functions of the FDW.

If you have not seen the foreign data wrapper in action, check out this tutorial!


Meetup: Leveraging Hadoop in your PostgreSQL Environment

Our very own Jim “Jimbo” Mlodgenski will be hosting a Meetup Thursday June 27th at 6pm to discuss: Leveraging Hadoop in your PostgreSQL Environment!

The event will be graciously held at:
199 Lafayette Street
Suite 3B
New York, NY

To register, click here.


There is buzz everywhere about Apache Hadoop and rightfully so. It’s an open-source framework that enables the construction of distributed, data-intensive applications running on clusters of commodity hardware. In short, it’s letting people get real value out of massive amounts of data without spending a fortune on proprietary software or high end hardware. But where does this leave the traditional relational databases like PostgreSQL?

This talk will begin with a discussion of the strengths of PostgreSQL and Hadoop. We will then lead into a high level overview of Hadoop and its community of projects like Hive, Flume and Sqoop. Finally, we will dig down into various use cases detailing how you can leverage Hadoop technologies for your PostgreSQL databases today. The use cases will range from using HDFS for simple database backups to using PostgreSQL and Foreign Data Wrappers to do low latency analytics on your Big Data.


PostgreSQL FDW to Hadoop Hive Tutorial

PostgreSQL Foreign Data Wrapper to Hadoop Hive Tutorial 2

In our last tutorial, Hadoop Hive Tutorial – Zero to Results in 15 Minutes, we installed Postgres and Hadoop and ran some simple queries against a sample dataset directly in Hive tables. In this tutorial we will create an FDW (Foreign Data Wrapper) in Postgres and access the same data using psql within Postgres. If you have not completed the first tutorial, it is recommended that you at least ensure that you have set up BigSQL, created the Hive tables, and populated the Hive data.

Let’s Roll!

Verify that you have started BigSQL and run the environment script (see the first tutorial if you need help with this step!).

The first step is to create the EXTENSION for the FDW from within the psql interpreter.
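The statement itself is not reproduced in the original post; based on the hadoop_fdw wrapper name used in the next step, it would be along these lines:

CREATE EXTENSION hadoop_fdw;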


Now we will create a SERVER using the EXTENSION we just created:

CREATE SERVER bigsql_server1 FOREIGN DATA WRAPPER hadoop_fdw
OPTIONS (url 'jdbc:hive://localhost:10000/default');

Next we will create a USER MAPPING for the SERVER we just created:

-- FOR PUBLIC applies the mapping to every local user; substitute a specific role if preferred
CREATE USER MAPPING FOR PUBLIC SERVER bigsql_server1
OPTIONS (username 'user', password 'password');

Finally, we will create the FOREIGN TABLE with the mappings from the Hive data store:

CREATE FOREIGN TABLE salesdata (
 s_num       FLOAT,
 s_borough   INT,
 s_neighbor  TEXT,
 s_b_class   TEXT,
 s_c_p       TEXT,
 s_block     TEXT,
 s_lot       TEXT,
 s_easement  TEXT,
 w_c_p_2     TEXT,
 s_address   TEXT,
 s_app_num   TEXT,
 s_zip       TEXT,
 s_res_units TEXT,
 s_com_units TEXT,
 s_tot_units INT,
 s_sq_ft     FLOAT,
 s_g_sq_ft   FLOAT,
 s_yr_built  INT,
 s_tax_c     INT,
 s_b_class2  TEXT,
 s_price     FLOAT,
 s_sales_dt  TEXT)
 SERVER bigsql_server1
 OPTIONS (table 'salesdata');

Now, assuming everything went well, we should be able to run some queries against the Hive table!

SELECT * from salesdata LIMIT 5;

You should see 5 records from the table.

  • Take a quick look at the structure of the salesdata table:

\d salesdata;

postgres=# \d salesdata;

Foreign table "public.salesdata"
Column      |       Type       | Modifiers | FDW Options
s_num       | double precision |           |
s_borough   | integer          |           |
s_neighbor  | text             |           |
s_b_class   | text             |           |
s_c_p       | text             |           |
s_block     | text             |           |
s_lot       | text             |           |
s_easement  | text             |           |
w_c_p_2     | text             |           |
s_address   | text             |           |
s_app_num   | text             |           |
s_zip       | text             |           |
s_res_units | text             |           |
s_com_units | text             |           |
s_tot_units | integer          |           |
s_sq_ft     | double precision |           |
s_g_sq_ft   | double precision |           |
s_yr_built  | integer          |           |
s_tax_c     | integer          |           |
s_b_class2  | text             |           |
s_price     | double precision |           |
s_sales_dt  | text             |           |
Server: bigsql_server1
FDW Options: ("table" 'salesdata')
  • Let's run a query to see the sales by neighborhood in Manhattan:

SELECT s_neighbor, sum(s_price) FROM  salesdata GROUP BY s_neighbor;

Results should look similar to:

postgres=# SELECT s_neighbor,SUM(s_price) FROM salesdata GROUP BY s_neighbor;

         s_neighbor         |    sum
  FINANCIAL                 | 1402998865
  MANHATTAN-UNKNOWN         |          0
  LITTLE ITALY              |  250821741
  ALPHABET CITY             |  211818189
  MIDTOWN CBD               | 3203149051
  KIPS BAY                  |  422209848
  GRAMERCY                  | 1021019603
  HARLEM-UPPER              |  258439080
  JAVITS CENTER             |  148603249
  UPPER WEST SIDE (79-96)   | 1422565321
  UPPER WEST SIDE (96-116)  |  595009679
  MORNINGSIDE HEIGHTS       |   29809357
  FLATIRON                  | 1612799353
  HARLEM-CENTRAL            |  791802512
  LOWER EAST SIDE           |  658796147
  UPPER EAST SIDE (96-110)  |  210717582
  EAST VILLAGE              |  468213911
  FASHION                   | 1714396813
  CIVIC CENTER              |  612244158
  MANHATTAN VALLEY          |  220308024
  MIDTOWN EAST              | 1294451495
  GREENWICH VILLAGE-WEST    | 1670496319
  SOHO                      | 2929228590
  SOUTHBRIDGE               |  978509618
  UPPER EAST SIDE (59-79)   | 4325223036
  HARLEM-WEST               |  160041248
  CHELSEA                   | 2250264798
  CLINTON                   |  806269355
  MURRAY HILL               | 1441188759
  UPPER EAST SIDE (79-96)   | 3159172312
  CHINATOWN                 |  288838109
  INWOOD                    |  120332695
  HARLEM-EAST               |  217612503
  TRIBECA                   | 1663415012
  MIDTOWN WEST              | 6728533426
  UPPER WEST SIDE (59-79)   | 3404609800
 (39 rows)

How many properties in each community?

SELECT s_neighbor, count(s_price) FROM salesdata GROUP BY s_neighbor;

postgres=# SELECT s_neighbor, count(*) FROM salesdata GROUP BY s_neighbor;

         s_neighbor         | count
  FINANCIAL                 |   491
  MANHATTAN-UNKNOWN         |     3
  LITTLE ITALY              |   101
  ALPHABET CITY             |   128
  MIDTOWN CBD               |   235
  KIPS BAY                  |   304
  GRAMERCY                  |   504
  HARLEM-UPPER              |   161
  JAVITS CENTER             |    38
  UPPER WEST SIDE (79-96)   |  1028
  UPPER WEST SIDE (96-116)  |   461
  FLATIRON                  |   528
  HARLEM-CENTRAL            |   867
  LOWER EAST SIDE           |   439
  UPPER EAST SIDE (96-110)  |   131
  EAST VILLAGE              |   233
  FASHION                   |   249
  CIVIC CENTER              |   263
  MANHATTAN VALLEY          |   268
  MIDTOWN EAST              |  1196
  SOHO                      |   432
  SOUTHBRIDGE               |    70
  UPPER EAST SIDE (59-79)   |  2408
  HARLEM-WEST               |    67
  CHELSEA                   |   961
  CLINTON                   |   495
  MURRAY HILL               |   771
  UPPER EAST SIDE (79-96)   |  1929
  CHINATOWN                 |   175
  INWOOD                    |   117
  HARLEM-EAST               |   298
  TRIBECA                   |   694
  MIDTOWN WEST              |  5856
  UPPER WEST SIDE (59-79)   |  1985
 (39 rows)

You will notice the results match those of the queries run directly in Hive. In our next tutorial we will add some additional data, create some cross-platform joins, and continue to exercise BigSQL.
For more information on BigSQL, click here.


Hadoop Hive Tutorial – Zero to Results in 15 Minutes

This tutorial walks through using Hadoop's Hive to access data stored in HDFS. It is the first tutorial in the series (more to follow) and covers installing Hadoop as part of BigSQL and running some simple queries.

The platform: Linux (RHEL, CentOS, or Ubuntu) or Mac OS X. The Hadoop and Postgres install comes from the BigSQL developer bundle. It takes about 10 to 15 minutes to install the bundle and get up and running (don't skip the prerequisites).

The data: the original source data is from the NYC Department of Finance and contains Rolling Sales files for the last 12 months in New York City. The data for Manhattan was used in this tutorial. A slightly cleaned file that can be used for the tutorial is available below.

Let’s get started!

  • If you have not previously installed BigSQL, download the bundle and untar the file. Make sure to use the exact filename you downloaded:

tar -xf bigsql-9.3xxx-linux64.tar.bz2

  • navigate to the home directory of your BigSQL install.  

I have installed BigSQL in the following directory: /home/cgerth/bigsql/bigsql-9.3beta1-7; simply cd to this directory.

  • start BigSQL

./bigsql start

  • execute the environment script
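The post does not show the command; if your bundle ships a setENV.sh script at the top level (an assumption; check your install directory for the actual name), source it:

$ . ./setENV.sh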


  • create a tutorial directory, change to that directory

mkdir tutorial

cd tutorial

  • put the following three files in the tutorial directory: ex2data.csv, ex1createHiveTables.sql, and ex1loadHiveTables.sql. The files are zipped here

use the tool of your choice!

  • create the Hive table structure. Take a look at the ex1createHiveTables.sql script, which is used to create the Hive table.

CREATE TABLE salesdata (
s_num FLOAT,
s_borough INT,
s_neighbor STRING,
s_b_class STRING,
s_c_p STRING,
s_block STRING,
s_lot STRING,
s_easement STRING,
w_c_p_2 STRING,
s_address STRING,
s_app_num STRING,
s_zip STRING,
s_res_units STRING,
s_com_units STRING,
s_tot_units INT,
s_sq_ft FLOAT,
s_g_sq_ft FLOAT,
s_yr_built INT,
s_tax_c INT,
s_b_class2 STRING,
s_price FLOAT,
s_sales_dt STRING
)
-- the comma delimiter below is assumed from the CSV source file
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';


$ hive -v -f ex1createHiveTables.sql

  • load the datafile into the structure created. The ex1loadHiveTables.sql script can be used for this. Take a look at the script before running.
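The load script's contents are not reproduced in the post; for the CSV file in the tutorial directory, it would typically be a single LOAD DATA statement along these lines (the exact path and options are assumptions):

LOAD DATA LOCAL INPATH 'ex2data.csv' INTO TABLE salesdata;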


hive -v -f ex1loadHiveTables.sql

  • Run a quick query to test the data load with the following command.


SELECT * from salesdata LIMIT 5;

You should see 5 records from the table.

  • Take a quick look at the Hive table we created:

DESCRIBE salesdata;

col_name    data_type    comment
s_num    float    
s_borough    int    
s_neighbor    string    
s_b_class    string    
s_c_p    string    
s_block    string    
s_lot    string    
s_easement    string    
w_c_p_2    string    
s_address    string    
s_app_num    string    
s_zip    string    
s_res_units    string    
s_com_units    string    
s_tot_units    int    
s_sq_ft    float    
s_g_sq_ft    float    
s_yr_built    int    
s_tax_c    int    
s_b_class2    string    
s_price    float    
s_sales_dt    string    
Time taken: 0.194 seconds
  • Let's run a query to see the sales by neighborhood in Manhattan:

SELECT s_neighbor, sum(s_price) FROM  salesdata GROUP BY s_neighbor;

Results should look similar to:
ALPHABET CITY 2.11818189E8
CHELSEA 2.250264798E9
CHINATOWN 2.88838109E8
CIVIC CENTER 6.12244158E8
CLINTON 8.06269355E8
EAST VILLAGE 4.68213911E8
FASHION 1.714396813E9
FINANCIAL 1.402998865E9
FLATIRON 1.612799353E9
GRAMERCY 1.021019603E9
HARLEM-EAST 2.17612503E8
HARLEM-UPPER 2.5843908E8
HARLEM-WEST 1.60041248E8
INWOOD 1.20332695E8
JAVITS CENTER 1.48603249E8
KIPS BAY 4.22209848E8
LITTLE ITALY 2.50821741E8
LOWER EAST SIDE 6.58796147E8
MIDTOWN CBD 3.203149051E9
MIDTOWN EAST 1.294451495E9
MIDTOWN WEST 6.728533424E9
MURRAY HILL 1.441188759E9
SOHO 2.929228594E9
SOUTHBRIDGE 9.78509618E8
TRIBECA 1.663415012E9
UPPER EAST SIDE (59-79) 4.325223036E9
UPPER EAST SIDE (79-96) 3.159172312E9
UPPER EAST SIDE (96-110) 2.10717582E8
UPPER WEST SIDE (59-79) 3.4046098E9
UPPER WEST SIDE (79-96) 1.422565321E9
UPPER WEST SIDE (96-116) 5.95009679E8
Time taken: 3.656 seconds
  • How many properties in each community?

SELECT s_neighbor, count(s_price) FROM salesdata GROUP BY s_neighbor;

ALPHABET CITY                128
 CHELSEA                      961
 CHINATOWN                    175
 CIVIC CENTER                 263
 CLINTON                      495
 EAST VILLAGE                 233
 FASHION                      249
 FINANCIAL                    491
 FLATIRON                     528
 GRAMERCY                     504
 HARLEM-CENTRAL               867
 HARLEM-EAST                  298
 HARLEM-UPPER                 161
 HARLEM-WEST                  67
 INWOOD                       117
 JAVITS CENTER                38
 KIPS BAY                     304
 LITTLE ITALY                 101
 LOWER EAST SIDE              439
 MANHATTAN VALLEY             268
 MIDTOWN CBD                  235
 MIDTOWN EAST                 1196
 MIDTOWN WEST                 5856
 MURRAY HILL                  771
 SOHO                         432
 SOUTHBRIDGE                  70
 TRIBECA                      694
 UPPER EAST SIDE (59-79)      2408
 UPPER EAST SIDE (79-96)      1929
 UPPER EAST SIDE (96-110)     131
 UPPER WEST SIDE (59-79)      1985
 UPPER WEST SIDE (79-96)      1028
 UPPER WEST SIDE (96-116)     461
 Time taken: 3.79 seconds
  • Interesting that MANHATTAN-UNKNOWN had 3 transactions but no $ volume; let's take a closer look:

SELECT s_price FROM salesdata WHERE s_neighbor LIKE '%UNKNOWN%';

Mapred Local Task Succeeded . Convert the Join into MapJoin
 Time taken: 3.479 seconds

Yes, as expected, they are all zero-value transactions.

This completes the first Hive tutorial.

The next tutorial in the series is PostgreSQL FDW to Hive.

Please share feedback and look for more tutorials moving forward.

Click here for more information on BigSQL!