Sqoop2 in BigSQL Hue

This tutorial will show you how to import PostgreSQL data into the Hadoop Distributed File System (HDFS) using the Sqoop 2 application within Hue. Hue, the Hadoop User Experience, lets you work with Oozie, Hive, Pig, Postgres, Sqoop, HBase, ZooKeeper, and Hadoop from a browser. Release three of the BigSQL Quick-start Virtual Machine comes with Hue already installed, so I will be using it for this tutorial.

In the virtual machine, open a terminal window, go to the bigsql-9.3r2-b3 directory and start BigSQL:

$ cd bigsql-9.3r2-b3
$ ./ start

Next, set up the benchmarksql customer history table:

$ . ./
$ psql

postgres=# CREATE USER benchmarksql WITH SUPERUSER PASSWORD 'password';
postgres=# GRANT ALL PRIVILEGES ON DATABASE postgres TO benchmarksql;
postgres=# \q

$ cd examples/run/
$ ./ sqlTableCreates
$ ./ numWarehouses 1
$ ./ sqlIndexCreates
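
Before moving on, you can sanity-check the setup from a terminal. The session below is a sketch: it assumes the BigSQL PostgreSQL server is running locally on the default port 5432 and that psql is on your PATH.

```
$ psql -U benchmarksql -d postgres -c "\dt benchmarksql.*"
$ psql -U benchmarksql -d postgres -c "SELECT count(*) FROM benchmarksql.history;"
```

If the history table is listed and the count returns without an error, the Sqoop job will have data to import.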

Now, open Hue by going to localhost:8000/ in the virtual machine's web browser. Log in with the username bigsql and the password bigsql.

In the Data Browsers menu, select Sqoop Transfer. Then click to add a new job.

Next, you will need to create a new connection to your PostgreSQL database. Click on +Add a new connection.

Name the connection, make sure generic-jdbc-connector is selected, and set the JDBC driver class to org.postgresql.Driver. For this example we're connecting to the postgres database as the benchmarksql user. Enter jdbc:postgresql://localhost:5432/postgres as the connection string, benchmarksql as the username, password as the password, and then click Save.
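
The connection string follows the standard PostgreSQL JDBC URL format, jdbc:postgresql://host:port/database. A minimal sketch of how it is assembled (the variable names here are hypothetical, chosen for illustration):

```shell
# Hypothetical variables; adjust host, port, and database for your setup.
DB_HOST=localhost
DB_PORT=5432
DB_NAME=postgres

# Build the JDBC URL that goes into the Hue connection form.
JDBC_URL="jdbc:postgresql://${DB_HOST}:${DB_PORT}/${DB_NAME}"
echo "$JDBC_URL"
```

If you are connecting to a database on another machine or a non-default port, only the host and port parts of the URL change; the driver class stays org.postgresql.Driver.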

Now you can name the job, select Import as the job type, select the connection that you just created, and then click Next.

The next page is where you will fill out the information about where you are importing from. For this tutorial, specify that the schema name is Benchmarksql, that the table name is History, and then click Next.

Next, describe where you want Sqoop to move the data. Choose HDFS as the storage type, Text File as the output format, no compression, and /user/data/example/hive_customer_history as the output directory. Then, click Save and run.

Once the job starts running, a notification will pop up next to the Job Browser. If you go there, you will be able to see the status of the mappers and reducers, along with other information about the job.

Once the job has finished, you will be able to go to the File Browser and see the data in HDFS.
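
You can also inspect the imported files from a terminal. The commands below are a sketch: they assume the hdfs client is on your PATH, and the exact part-file names Sqoop writes may vary, so a wildcard is used.

```
$ hdfs dfs -ls /user/data/example/hive_customer_history
$ hdfs dfs -cat /user/data/example/hive_customer_history/* | head
```

The first command lists the output files the job created; the second prints the first few imported rows as text.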