How to install and configure the Oracle Foreign Data Wrapper


Preinstall Checklist:

   – If you have not already done so, download and install the “Microsoft Visual C++ 2015 Redistributable” and the Java Runtime Environment (version 7 or 8).

Oracle Instant Client install and config:

          01. Download the following packages and unzip them into the same directory:

                   instantclient-basic-windows

                   instantclient-odbc-windows

                   instantclient-sqlplus-windows

          02. Create an “Oracle Instant Client” folder in Program Files and copy the instantclient_12_1 directory into it.

          03. Double click on “odbc_install.exe”

          04. Create system variables:

                   name: TNS_ADMIN

                   value: C:\Program Files\Oracle Instant Client\instantclient_12_1

                   name: ORACLE_HOME

                   value: C:\Program Files\Oracle Instant Client\instantclient_12_1

          05. Edit an existing system variable:

                   name: PATH

                   value: append C:\Program Files\Oracle Instant Client\instantclient_12_1 to the existing PATH value

          06. (Optional) Attempt to run “sqlplus” from the command line. If all is properly configured, you should be prompted to enter a username/password. You can exit “sqlplus” at this point if you wish and continue with this guide.

NOTE 1: For Windows 7, to edit system variables, right click on “My Computer”, then click on “Advanced System Settings” on the left of the screen. A popup window titled “System Properties” will appear. Click on the “Advanced” tab and look to the bottom of the window. Above “OK” and “Cancel”, there will be a button titled “Environment Variables”. Click on it and find the “System Variables” pane in the popup window that appears. Here you can create and edit your system variables.

NOTE 2: If you receive an error about “msvcr100.dll” being missing, it may be necessary to copy this DLL out of your /Java/VERSION/bin folder and into your Oracle Instant Client directory.

The following steps assume PostgreSQL has previously been installed and properly configured.

Configure the Oracle FDW

          01. Open a psql shell and enter:

                   CREATE EXTENSION oracle_fdw;

          Expected Output: CREATE EXTENSION
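
Optionally, you can confirm that the extension is registered before continuing by querying the pg_extension catalog (this check is not part of the original steps):

                   SELECT extname, extversion FROM pg_extension WHERE extname = 'oracle_fdw';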

Example configuration

The instructions below are an example of how to set up PostgreSQL to connect to an Oracle database through the Oracle Foreign Data Wrapper. Your specific settings will be different, but the syntax should remain the same. For our purposes, I will show you how to connect to an Oracle database with the “SCOTT” demo schema installed. This part of the how-to assumes your Oracle database is already set up and configured to accept remote connections.

          01. CREATE SCHEMA scott;

                   Expected Output: CREATE SCHEMA

          02. CREATE SERVER oracle FOREIGN DATA WRAPPER oracle_fdw OPTIONS (dbserver '//192.168.1.254/YOURSERVERNAME');

                   Expected Output: CREATE SERVER

          03. CREATE USER MAPPING FOR postgres SERVER oracle OPTIONS (user 'SCOTT', password 'tiger');

                   Expected Output: CREATE USER MAPPING

          04. IMPORT FOREIGN SCHEMA "SCOTT" FROM SERVER oracle INTO scott;

                   Expected Output: IMPORT FOREIGN SCHEMA

          05. SET search_path = 'scott';

                   Expected Output: SET

Now we can explore the imported data.
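
For example, assuming the standard SCOTT demo schema (with its EMP and DEPT tables) was imported, and that the import folded the Oracle names to lowercase (oracle_fdw's "case" import option controls this), queries like the following should work from the psql shell:

          \det scott.*
          SELECT ename, job, sal FROM emp LIMIT 5;
          SELECT d.dname, count(*) AS employees FROM emp e JOIN dept d ON e.deptno = d.deptno GROUP BY d.dname;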


Building PostgreSQL on Windows for Performance

Before PostgreSQL 9.1, adding additional features like data types was non-standard and awkward, but 9.1 brought extensions. With extensions, as well as the plumbing for things like foreign data wrappers, extension use has exploded over the last few years. This is great for people who are using PostgreSQL on Linux or similar environments, but people on Windows have been pretty much out of luck. Most people have to hope the extension they want to use is popular enough that someone skilled with Windows build environments decided to release it on Windows. It just was not as simple as on Linux, where you can just run “make install”.

For years, there has been the ability to replicate the Linux build experience on Windows with the MinGW project, but historically the resulting binaries have been significantly slower than the binaries produced by the Microsoft compiler. Since I last played with compilers on Windows, I have been pleasantly surprised to find that the open source build tools have caught up with the commercial tools and in some ways surpassed them.

To check the performance of the resulting binaries from the two compilers, I tried a couple of CPU-intensive tests. The first one was just a SELECT-only pgbench test. The second was computing the square roots of random numbers and then sorting the results. For the Windows binaries built with Microsoft Visual Studio, I used the PostgreSQL Windows installer from the PostgreSQL community site at http://www.postgresql.org/download/windows/. The MinGW-w64 build came from the bigsql.org site at http://www.bigsql.org/se/postgresql/packages.jsp

The version strings of the two binaries are below.

test=# select version();
 version
-------------------------------------------------------------
 PostgreSQL 9.5.1, compiled by Visual C++ build 1800, 64-bit
(1 row)

test=# select version();
 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 9.5.1 on x86_64-pc-mingw64, compiled by gcc.exe (Rev5, Built by MSYS2 project) 4.9.2, 64-bit
(1 row)

For these tests, the only tuning of the postgresql.conf file was to set shared_buffers to 2GB. All other settings were left at the defaults.
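
For reference, the current value can be checked from psql, and on PostgreSQL 9.4 and later it can be persisted with ALTER SYSTEM (a server restart is still required for shared_buffers to take effect):

test=# SHOW shared_buffers;
test=# ALTER SYSTEM SET shared_buffers = '2GB';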

The SELECT-only pgbench test was conducted with the following command:

pgbench -n -S -c 10 -t 100000 -U postgres test
MSVC 34246 TPS
GCC 34706 TPS

That is a minor 1.3% performance improvement. It is not a ground-breaking gain, but the gcc-built binaries were consistently faster.

The second test was a bit more CPU intensive running the following query:

select sqrt((random() * g)::numeric) from generate_series(1, 1000) g order by 1;

This was put in a text file and run through pgbench with the following command:

pgbench.exe -n -f test.sql -c 8 -t 10000 -U postgres test
MSVC 610 TPS
GCC 634 TPS

This resulted in a larger performance gain of 3.8%.

I started this testing trying to find an easier way to build extensions on Windows, hoping not to take too much of a performance hit, but I ended up finding that the easier way was actually a little faster. I am thinking it's time to ditch those proprietary build tools and stick with open source.


Stinger Delivers Fast Hive SQL via YARN & TEZ




The latest BigSQL bundle now includes Stinger and Tez to dramatically improve performance and extend the SQL dialect. In the latest distribution of BigSQL Release 2, it is extremely easy to get started exploring Tez and Stinger in your own sandbox environment. Taking advantage of YARN, Stinger is now delivering on its three main goals of Speed, SQL Enhancements & Scale.

 

SPEED

The Stinger initiative (Hive 0.13 & Tez 0.3) calls for a 100X performance improvement over current Hive (Hive 0.10) batch queries. Key developments helping to achieve these results are:

  • Leveraging ORC files: this columnar format allows for faster reads, writes, and processing of Hive data. The format supports predicate pushdown, compression, and even allows lightweight indexes to be stored within the files.
  • Continued optimizations in the Hive query engine: enhancements such as avoiding unneeded writes, support for hot containers (saving launch/shutdown overhead), in-memory caching, and leveraging wider Metastore statistics have all helped improve latency and throughput.

 

SQL ENHANCEMENTS

Full support for the DECIMAL, TIMESTAMP, DATE & VARCHAR datatypes, as well as a rich set of semantic improvements.
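
As a small illustration of both the ORC storage format mentioned above and the newly supported datatypes, a table definition along these lines should work in Hive 0.13 (the table and column names are invented for this example):

CREATE TABLE orders_orc (
    order_id   INT,
    amount     DECIMAL(10,2),
    customer   VARCHAR(100),
    order_date DATE,
    created_at TIMESTAMP
)
STORED AS ORC                             -- columnar ORC storage
TBLPROPERTIES ('orc.compress' = 'ZLIB');  -- per-file compression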

 

SCALE

Designed for large-scale data environments, Stinger’s goal is to deliver even when you are analyzing petabytes of data.

 

Stinger in Action!

Below is a screenshot from the BigSQL YARN console showing the execution of a Tez job.

At OpenSCG we focus on helping you get the most value and leverage out of your data environments. It’s an exciting time in data, and we thrive on helping our clients understand the latest changes in data technology. We realize that there are great opportunities in new NoSQL and clustered data solutions and take pride in helping you evaluate your data options.

tPostgres Features Sybase & SQL Server Compatibility

We are pleased to announce version 9.3.2 of tPostgres. Find out more at http://www.tpostgres.org

The transformation of the tPostgres: Sybase & SQL Server Compatibility project is now complete. We are fully based on the PGTSQL open source PostgreSQL extension and are fully working on the very latest and greatest point release of PG (9.3.2).

Additionally, we have done away with the bulky and awkward 32-bit Linux virtual machine. We now have safe and simple-to-use native command line sandbox installers for Windows, OSX & Linux.

We’ve still got documentation work to do, but there are excellent regression-test Transact-SQL programming examples in the PGTSQL code. Please take a look for yourself and consider contributing to the project at https://bitbucket.org/openscg/pgtsql

–Luss


Oozie: Hadoop Workflow Scheduler in BigSQL

This tutorial will show you how to create an Oozie workflow using either BigSQL Hue or the Oozie command line tools in the latest version of BigSQL. Oozie is a workflow scheduler that allows you to manage Hadoop jobs. You can also create an Oozie workflow that incorporates shell scripts, map-reduce applications, and even Pig, Sqoop, and Hive jobs. This tutorial will just use the sample map-reduce word count job from Oozie.

First, you will need to set up the workflow.xml and libraries for the job:

   $ cd $BIGSQL_HOME/examples/more/oozie 
   $ tar -xvf oozie-examples.tar.gz

This example is in the examples/apps/map-reduce directory. The input data is examples/input-data/text/data.txt. You will need to make two changes in the file examples/apps/map-reduce/job.properties:

  • change localhost to your computer’s host name
  • change the port number from 8020 to 9000

Now, move the whole example directory to your user directory in HDFS:

   $ hadoop fs -put examples /user/cadym/.

If you want to just run the map-reduce example from the command line you would then use this command:

   $ $OOZIE_HOME/bin/oozie job -oozie http://localhost:11000/oozie -config examples/apps/map-reduce/job.properties -run

To check the status of that job, you can use the command:

   $ $OOZIE_HOME/bin/oozie job -oozie http://localhost:11000/oozie -info <job ID>

The file workflow.xml in examples/apps/map-reduce is the description of the workflow. For example, the line below shows that when preparing to start the job, the output directory will be deleted. The job.properties file will pass variables, such as the nameNode address, the root example directory, and the output directory to the workflow.xml file.

<prepare>
     <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}"/>
</prepare>

The configuration section defines the configuration for the mapreduce job, like the mapper class that will be used:

<property>
     <name>mapred.reducer.class</name>
     <value>org.apache.oozie.example.SampleReducer</value>
</property>

Next, the workflow.xml file explains what needs to happen after the mapreduce action finishes. Since this is a simple example, after the mapreduce job it will just go to the end rather than to another step. If there is an error, the application will be killed with the error message shown below:

     <ok to="end"/>
     <error to="fail"/>
   </action>
<kill name="fail">
   <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>

To run this Oozie job in Hue, click on the editor from the workflow menu in the top toolbar. Once on that page, you can click on the import button on the right side of the screen.

Here, you can give the workflow a name. Then, next to “Local workflow.xml file”, you will need to click browse and select the examples/apps/map-reduce/workflow.xml file. Next, click import.

Now, you need to edit the properties of the mr-node step of your workflow.

All you need to add here is the location of the jar file for this workflow. Click on the “..” after the “Jar name” field.

Select the jar: examples/apps/map-reduce/lib/oozie-examples-4.0.0.2.0.6.0-101.jar. Then click done.

Now, you can save your workflow and click submit on the left toolbar. A window will pop up asking you to specify the job variables, such as the output directory.

This will bring you to the Oozie Dashboard where you can track the progress of each step in your workflow.

Once the job finishes, you can go to the File Browser and see a word count file in the examples/output-data/map-reduce directory.


BigSQL Adds Cassandra to the Hadoop and PostgreSQL Bundle

The BigSQL developer’s bundle has added a significant new component: Cassandra 2.0.4. All of the bundles and VMs now allow developers to get up and running with the latest Hadoop ecosystem and Cassandra. Join our free Beta program now!

As a significant number of our clients are leveraging Cassandra for highly available, multi-partition solutions, it was a natural fit to add to our bundle.

In addition to Cassandra, the latest beta includes all of the standard BigSQL components and many enhancements:

  • Continued HadoopFDW (the Postgres-to-Hadoop foreign data wrapper) development, fixes, and improvements
  • JDK 7 is now a prerequisite on all platforms
  • upgrade PostgreSQL from 9.3.1 to 9.3.2
  • upgrade DataFu from 1.1.1 to 1.2.0
  • upgrade PostgresqlStudio from 1.0 to 1.1
  • upgrade Hue from 3.0.0 to 3.5
  • autostart Oozie & Hue in the VM
  • upgrade Sqoop from 1.4.4 to 1.99.3 (and autostart it)
  • include pgTSQL functionality from tpostgres.org
  • compact our quickstart virtual machine from 3 GB to 1.5 GB
  • configure Eclipse for Hadoop in the VM

At OpenSCG we focus on helping you get the most value and leverage out of your data environments. It’s an exciting time in data, and we thrive on helping our clients understand the latest changes in data technology. We realize that there are great opportunities in new NoSQL and clustered data solutions and take pride in helping you evaluate your data options.

Sqoop2 in BigSQL Hue

This tutorial will show you how to import PostgreSQL data into the Hadoop File System using the Sqoop 2 application within Hue. Hue, or the Hadoop User Experience, allows you to use Oozie, Hive, Pig, Postgres, Sqoop, HBase, ZooKeeper, and Hadoop from a browser. Since release three of the BigSQL Quick-start Virtual Machine comes with Hue already installed, I will be using that for this tutorial.

In the virtual machine, open a terminal window, go to the bigsql-9.3r2-b3 directory and start BigSQL:

$ cd bigsql-9.3r2-b3
$ ./bigsql.sh start

Next, set up the BenchmarkSQL customer history table:

$ . ./setENV.sh
$ psql

postgres=# CREATE USER benchmarksql WITH SUPERUSER PASSWORD 'password';
postgres=# GRANT ALL PRIVILEGES ON DATABASE postgres TO benchmarksql;
postgres=# \q

$ cd examples/run/
$ ./runSQL.sh props.pg sqlTableCreates
$ ./runLoader.sh props.pg numWarehouses 1
$ ./runSQL.sh props.pg sqlIndexCreates
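
Before moving on to Hue, a quick sanity check from psql confirms that the loader populated the history table (the schema-qualified name below matches the schema used later in the Sqoop job; adjust it if your BenchmarkSQL tables were created elsewhere):

$ psql
postgres=# SELECT count(*) FROM benchmarksql.history;
postgres=# \q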

Now, open Hue by going to localhost:8000/ in the virtual machine’s web browser. Log in with the username bigsql and the password bigsql.

In the Data Browsers menu, select Sqoop Transfer. Then click to add a new job.

Next, you will need to create a new connection to your PostgreSQL database. Click on +Add a new connection.

Name the connection, make sure generic-jdbc-connector is selected, and define the class as org.postgresql.Driver. For this example we’re connecting to the postgres database as the benchmarksql user. Enter  jdbc:postgresql://localhost:5432/postgres as the connection string, benchmarksql as the username, password as the password, and then click Save.

Now you can name the job, select the Import job type, select the connection that you just created, and then click Next.

The next page is where you will fill out the information about where you are importing from. For this tutorial, specify that the schema name is Benchmarksql, that the table name is History, and then click Next.

Next, describe where you want Sqoop to move the data to. Choose HDFS as the storage type, Text File as the output format, no compression, and /user/data/example/hive_customer_history as the output directory. Then, click Save and run.

Once the job starts running, a notification will pop up next to the Job Browser. If you go there, you will be able to see the status of the mappers and reducers, and other information about the job.

Once the job has finished, you will be able to go to the File Browser and see the data in HDFS.

Sqoop 2, Importing from PostgreSQL to HDFS

This tutorial will show how to configure Sqoop 2, start the server and client, and import a table from Postgres to HDFS. The latest version of Sqoop is part of BigSQL release two, which you can download here.

The first part of this tutorial involves creating the customer history Postgres table using BenchmarkSQL:

$ . ./setENV.sh
$ psql

postgres=# CREATE USER benchmarksql WITH SUPERUSER PASSWORD 'password';
postgres=# GRANT ALL PRIVILEGES ON DATABASE postgres TO benchmarksql;
postgres=# \q

$ cd examples/run/
$ ./runSQL.sh props.pg sqlTableCreates
$ ./runLoader.sh props.pg numWarehouses 1
$ ./runSQL.sh props.pg sqlIndexCreates

Next, go to the Sqoop home directory and set up Sqoop for BigSQL. Make sure that the sqoop/server/conf/catalina.properties file has the location of all of Hadoop’s jars listed in the common.loader property, and that the sqoop/server/conf/sqoop.properties file has your Hadoop configuration location set in the property org.apache.sqoop.submission.engine.mapreduce.configuration.directory.

$ cd $SQOOP_HOME/
$ mkdir $SQOOP_HOME/lib
$ cp $PGHOME/jdbc/*.jar $SQOOP_HOME/lib/.

Next, start the Sqoop server and then the client shell:

$ ./sqoop.sh server start
$ ./sqoop.sh client

First, you need to define the connection to Postgres through the create connection command:

sqoop:000> create connection --cid 1
Creating connection for connector with id 1
Please fill following values to create new connection object
Name: benchmarksql

Connection configuration

JDBC Driver Class: org.postgresql.Driver
JDBC Connection String: jdbc:postgresql://localhost:5432/postgres
Username: benchmarksql
Password: password
JDBC Connection Properties: 
There are currently 0 values in the map:
entry# 

Security related configuration options

Max connections: 
New connection was successfully created with validation status FINE and persistent id 1

Next, create an import job that uses that connection:

sqoop:000> create job --xid 1 --type import
Creating job for connection with id 1
Please fill following values to create new job object
Name: Customer History

Database configuration

Schema name: benchmarksql
Table name: history
Table SQL statement: 
Table column names: 
Partition column name: 
Nulls in partition column: 
Boundary query: 

Output configuration

Storage type: 
  0 : HDFS
Choose: 0 
Output format: 
  0 : TEXT_FILE 
  1 : SEQUENCE_FILE 
Choose: 0 
Compression format:
  0 : NONE
  1 : DEFAULT 
  2 : DEFLATE 
  3 : GZIP 
  4 : BZIP2 
  5 : LZO 
  6 : LZ4 
  7 : SNAPPY 
Choose: 0 
Output directory: /user/data/example/hive_customer_history 

Throttling resources 
Extractors: 
Loaders: 

New job was successfully created with validation status FINE and persistent id 1

Start the job by using the following command:

sqoop:000> start job --jid 1
Submission details
Job ID: 1
Server URL: http://localhost:12000/sqoop/
Created by: cady
Creation date: 2013-12-26 13:45:48 EST
Lastly updated by: cady
External ID: job_1388079921472_0001

http://localhost:8088/proxy/application_1388079921472_0001/

2013-12-26 13:45:48 EST: BOOTING  - Progress is not available

You can track the job at the link provided, or use the status command:
sqoop:000> status job --jid 1

Submission details
Job ID: 1
Server URL: http://localhost:12000/sqoop/
Created by: cady
Creation date: 2013-12-26 13:45:48 EST
Lastly updated by: cady
External ID: job_1388079921472_0001

http://localhost:8088/proxy/application_1388079921472_0001/

2013-12-26 13:47:02 EST: RUNNING  - 15.00 %

Now if you browse the Hadoop file system, all of the data from the Postgres table will be in the /user/data/example/hive_customer_history directory.

Hive, Postgres, and Hadoop Foreign Data Wrapper Video Tutorial

This demo shows how to run an example in BigSQL that uses Hive, Hadoop, PostgreSQL, and the Hadoop Foreign Data Wrapper to leverage the power of Hadoop from within PostgreSQL. This tutorial uses BigSQL 9.3 Release 2 – Beta2, which includes Hadoop-2.2 and Hive-0.12. You can download the newest release, or the Quick start VM for VMware, at BigSQL.org.

In fewer than four minutes this tutorial will show you how to:

  • Create a Hive table
  • Load a CSV file into a Hive table (a HiveQL sketch of these first two steps follows this list)
  • Set up the Hadoop Foreign Data Wrapper in PostgreSQL
  • Access data in Hadoop from PostgreSQL
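
As a preview of the first two steps, a minimal HiveQL sketch looks something like this (the table layout and file path are invented for illustration and are not the exact example used in the video):

-- 1. Create a simple comma-delimited Hive table
CREATE TABLE web_visits (
    visit_id   INT,
    page       STRING,
    visit_time STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- 2. Load a local CSV file into the table
LOAD DATA LOCAL INPATH '/tmp/web_visits.csv' INTO TABLE web_visits;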

Watch the video: http://www.youtube.com/watch?v=YI006EMo_SY