This tutorial will show how to configure Sqoop 2, start the server and client, and import a table from Postgres to HDFS. The latest version of Sqoop is part of BigSQL release two, which you can download here.

The first part of this tutorial involves creating the customer history Postgres table using BenchmarkSQL:

$ . ./setENV.sh
$ psql

postgres=# CREATE USER benchmarksql WITH SUPERUSER PASSWORD 'password';
postgres=# GRANT ALL PRIVILEGES ON DATABASE postgres TO benchmarksql;
postgres=# \q

$ cd examples/run/
$ ./runSQL.sh props.pg sqlTableCreates
$ ./runLoader.sh props.pg numWarehouses 1
$ ./runSQL.sh props.pg sqlIndexCreates
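
As a quick sanity check before moving on, you can count the rows BenchmarkSQL loaded into the history table (the exact count depends on how many warehouses you loaded, and the table may live in the public schema depending on your BenchmarkSQL configuration):

$ psql
postgres=# SELECT count(*) FROM history;
postgres=# \q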

Next, go to Sqoop home and set up Sqoop for BigSQL. Make sure that the sqoop/server/conf/catalina.properties file lists the location of all of Hadoop's jars in the common.loader property, and that the sqoop/server/conf/sqoop.properties file points the org.apache.sqoop.submission.engine.mapreduce.configuration.directory property at your Hadoop configuration directory.
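
For reference, those entries look roughly like the following (the Hadoop paths below are placeholders; substitute the jar and configuration locations from your own BigSQL install):

# sqoop/server/conf/catalina.properties
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,/path/to/hadoop/*.jar,/path/to/hadoop/lib/*.jar,/path/to/hadoop-hdfs/*.jar,/path/to/hadoop-mapreduce/*.jar,/path/to/hadoop-yarn/*.jar

# sqoop/server/conf/sqoop.properties
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/path/to/hadoop/conf/

With those properties in place, create a lib directory under Sqoop home and copy the Postgres JDBC driver into it: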

$ cd $SQOOP_HOME/
$ mkdir $SQOOP_HOME/lib
$ cp $PGHOME/jdbc/*.jar $SQOOP_HOME/lib/.

Next, start the Sqoop server and open the client shell:

$ ./sqoop.sh server start
$ ./sqoop.sh client
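
If the client does not connect to the local server automatically, you can point it at the server explicitly from the Sqoop shell (12000 is the default Sqoop 2 server port):

sqoop:000> set server --host localhost --port 12000 --webapp sqoop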

First, you need to define the connection to Postgres through the create connection command:

sqoop:000> create connection --cid 1
Creating connection for connector with id 1
Please fill following values to create new connection object
Name: benchmarksql

Connection configuration

JDBC Driver Class: org.postgresql.Driver
JDBC Connection String: jdbc:postgresql://localhost:5432/postgres
Username: benchmarksql
Password: password
JDBC Connection Properties: 
There are currently 0 values in the map:
entry# 

Security related configuration options

Max connections: 
New connection was successfully created with validation status FINE and persistent id 1
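
If you want to double-check what was stored, the show connection command displays the saved connection definition:

sqoop:000> show connection --xid 1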

Next, create an import job that uses that connection:

sqoop:000> create job --xid 1 --type import
Creating job for connection with id 1
Please fill following values to create new job object
Name: Customer History

Database configuration

Schema name: benchmarksql
Table name: history
Table SQL statement: 
Table column names: 
Partition column name: 
Nulls in partition column: 
Boundary query: 

Output configuration

Storage type: 
  0 : HDFS
Choose: 0 
Output format: 
  0 : TEXT_FILE 
  1 : SEQUENCE_FILE 
Choose: 0 
Compression format:
  0 : NONE
  1 : DEFAULT 
  2 : DEFLATE 
  3 : GZIP 
  4 : BZIP2 
  5 : LZO 
  6 : LZ4 
  7 : SNAPPY 
Choose: 0 
Output directory: /user/data/example/hive_customer_history 

Throttling resources 
Extractors: 
Loaders: 

New job was successfully created with validation status FINE and persistent id 1
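
You can review the stored job definition in the same way before running it:

sqoop:000> show job --jid 1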

Start the job by using the following command:

sqoop:000> start job --jid 1
Submission details
Job ID: 1
Server URL: http://localhost:12000/sqoop/
Created by: cady
Creation date: 2013-12-26 13:45:48 EST
Lastly updated by: cady
External ID: job_1388079921472_0001
    http://localhost:8088/proxy/application_1388079921472_0001/
2013-12-26 13:45:48 EST: BOOTING  - Progress is not available

You can track the job at the link provided, or use the status command:

sqoop:000> status job --jid 1

Submission details
Job ID: 1
Server URL: http://localhost:12000/sqoop/
Created by: cady
Creation date: 2013-12-26 13:45:48 EST
Lastly updated by: cady
External ID: job_1388079921472_0001
    http://localhost:8088/proxy/application_1388079921472_0001/
2013-12-26 13:47:02 EST: RUNNING  - 15.00 %
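
If you need to cancel a running import, the same job can be stopped from the shell:

sqoop:000> stop job --jid 1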

Now if you browse the Hadoop file system, all of the data from the Postgres table will be in the /user/data/example/hive_customer_history directory.
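
To verify the import from the command line, you can list and inspect the output files with the Hadoop shell (assuming the hadoop client is on your PATH; Sqoop writes one output file per map task):

$ hadoop fs -ls /user/data/example/hive_customer_history
$ hadoop fs -cat /user/data/example/hive_customer_history/* | head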