Oozie: Hadoop Workflow Scheduler in BigSQL

This tutorial shows you how to create an Oozie workflow, using either BigSQL Hue or the Oozie command-line tools, in the latest version of BigSQL. Oozie is a workflow scheduler that lets you manage Hadoop jobs. An Oozie workflow can also incorporate shell scripts, map-reduce applications, and even Pig, Sqoop, and Hive jobs. This tutorial uses the sample map-reduce word-count job that ships with Oozie.

First, you will need to set up the workflow.xml and libraries for the job:

   $ cd $BIGSQL_HOME/examples/more/oozie 
   $ tar -xvf oozie-examples.tar.gz

This example is in the examples/apps/map-reduce directory. The input data is in examples/input-data/text/data.txt. You will need to make two changes in the file examples/apps/map-reduce/job.properties:

  • change localhost to your computer’s host name
  • change the port number from 8020 to 9000
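
With those two edits in place, the key lines of job.properties look something like the following (a sketch; myhost stands in for your actual host name, and the remaining values are the stock example defaults, which may differ in your version):

```properties
nameNode=hdfs://myhost:9000
jobTracker=myhost:8021
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
```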

Now, move the whole example directory to your user directory in HDFS (replace cadym in the command below with your own user name):

   $ hadoop fs -put examples /user/cadym/.

If you want to run the map-reduce example directly from the command line, use this command:

   $ $OOZIE_HOME/bin/oozie job -oozie http://localhost:11000/oozie -config examples/apps/map-reduce/job.properties -run

To check the status of that job, you can use the command:

   $ $OOZIE_HOME/bin/oozie job -oozie http://localhost:11000/oozie -info <Job ID>

The file workflow.xml in examples/apps/map-reduce is the description of the workflow. For example, the line below shows that when preparing to start the job, the output directory will be deleted. The job.properties file will pass variables, such as the nameNode address, the root example directory, and the output directory to the workflow.xml file.

     <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}"/>
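
That <delete> element lives inside the action's <prepare> block, which Oozie runs before launching the job itself. The surrounding structure looks roughly like this (a sketch, abbreviated to show where the prepare step fits; the action name mr-node matches the step edited in Hue later in this tutorial):

```xml
<action name="mr-node">
    <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}"/>
        </prepare>
        <!-- configuration section goes here -->
    </map-reduce>
</action>
```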

The configuration section defines the settings for the map-reduce job, such as the mapper class that will be used:
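
In the stock example, that section looks roughly like the following (a sketch based on the standard Oozie map-reduce example; the exact class and property names may differ in your version):

```xml
<configuration>
    <property>
        <name>mapred.mapper.class</name>
        <value>org.apache.oozie.example.SampleMapper</value>
    </property>
    <property>
        <name>mapred.reducer.class</name>
        <value>org.apache.oozie.example.SampleReducer</value>
    </property>
    <property>
        <name>mapred.input.dir</name>
        <value>/user/${wf:user()}/${examplesRoot}/input-data/text</value>
    </property>
    <property>
        <name>mapred.output.dir</name>
        <value>/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}</value>
    </property>
</configuration>
```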


Next, the workflow.xml file specifies what happens after the map-reduce action finishes. Since this is a simple example, the workflow simply transitions to the end after the map-reduce job rather than to another step. If there is an error, the workflow is killed with the error message shown below:

     <ok to="end"/>
     <error to="fail"/>

     <kill name="fail">
         <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
     </kill>

To run this Oozie job in Hue, click Editor in the workflow menu in the top toolbar. Once on that page, click the import button on the right side of the screen.

Here, you can give the workflow a name. Then, next to “Local workflow.xml file”, you will need to click browse and select the examples/apps/map-reduce/workflow.xml file. Next, click import.

Now, you need to edit the properties of the mr-node step of your workflow.

All you need to add here is the location of the jar file for this workflow. Click the “..” button after the “Jar name” field.

Select the jar examples/apps/map-reduce/lib/oozie-examples- and then click Done.

Now, you can save your workflow and click submit on the left toolbar. A window will pop up asking you to fill in the variables the workflow uses.

This will bring you to the Oozie Dashboard where you can track the progress of each step in your workflow.

Once the job finishes, you can go to the file browser and see the word-count output in the examples/output-data/map-reduce directory.