
JasperReports with Hadoop – Tutorial

This tutorial gives step-by-step instructions for running JasperReports against Hadoop using Jaspersoft Studio's easy and simple-to-use environment, and then publishing the created sample report to a JasperReports Server connected to Jaspersoft Studio.

1. Prerequisites

a) The BigSQL Tutorial (http://www.bigsql.org/se/tutorial.jsp) must be followed through Step 3:

Step 1: Creating example.customer_history in Hive and loading values.
Step 2: Creating example.customer_history as a foreign table in Postgres.
Step 3: Verifying example.customer_history.

2. Required downloads

a) Download Jaspersoft Studio for your OS version from http://community.jaspersoft.com/project/jaspersoft-studio/releases

  • Select your OS version from the Link column

  • After the download completes, install the Studio

b) Download JasperReports Server from http://community.jaspersoft.com/project/jasperreports-server/releases for your OS version

c) Download the PostgreSQL JDBC driver from http://jdbc.postgresql.org/download/postgresql-9.2-1003.jdbc4.jar

3. Simple Data Adapter configuration

a) Verify that BigSQL is running

$ ./bigsql status
#################################################
# BIGSQL:- your BIGSQL version
# TIME: current time
# HOST: your hostname
# JAVA: path to your JAVA_HOME
# OS: your OS
#################################################

## Checking critical ports ################
Postgres port 5432 is busy.
ZooKeeper port 2181 is busy.
HDFS port 50070 is busy.
HBase port 60010 is busy.
Hive port 9083 is busy.
Tomcat port 8080 is busy.

b) Run the installed Jaspersoft Studio instance

1) For the Data Adapter, under the Repository tab: right-click on Data Adapters → Create New Data Adapter → Next → Database JDBC Connection → Next

2) Database Location

    • Enter Name “BigSQL”
    • JDBC Driver : org.postgresql.Driver
    • JDBC Url : jdbc:postgresql://localhost:5432/postgres
    • Username : postgres
    • Password : password

3) Driver Classpath

    • Click on Add Button → Select Driver Path → Click Open

c) Click the Finish button to add the Data Adapter
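
The connection settings entered above can also be sanity-checked outside the Studio. The listing below is an optional, minimal sketch; it assumes the PostgreSQL JDBC jar downloaded in section 2(c) is on the classpath and uses the same local BigSQL defaults (localhost:5432, user postgres, password password) to open a connection and count the rows in example.customer_history:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class BigSqlConnectionCheck {
    public static void main(String[] args) throws Exception {
        // Same settings as the "BigSQL" data adapter defined above
        String url = "jdbc:postgresql://localhost:5432/postgres";
        try (Connection conn = DriverManager.getConnection(url, "postgres", "password");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("select count(*) from example.customer_history")) {
            rs.next();
            System.out.println("Rows in example.customer_history: " + rs.getLong(1));
        }
    }
}

If the count matches what you loaded in the BigSQL tutorial, the data adapter settings are correct and the report query used later will work.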

4. Easy report creation

a) Create a new Project

  • File → New → Project → JasperReports Project.

b) Create New Jasper Reports

  • File → New → Jasper Report → Blank A4 → Enter Name: bigsql.jrxml → Finish.

c) Select the created report and open the “Source” view.

  • Select all and delete the selection.

d) Hadoop JasperTest Report

Copy the XML report below and paste it into the “Source” tab of the created report:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Created with Jaspersoft Studio version last-->
<jasperReport xmlns="http://jasperreports.sourceforge.net/jasperreports" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://jasperreports.sourceforge.net/jasperreports http://jasperreports.sourceforge.net/xsd/jasperreport.xsd" name="BigSQL" language="groovy" pageWidth="595" pageHeight="842" columnWidth="535" leftMargin="20" rightMargin="20" topMargin="20" bottomMargin="20" uuid="73c28c9e-7a45-457e-bb9c-d3b4a665aec8">
<property name="com.jaspersoft.studio.data.defaultdataadapter" value="BigSQL"/>
<style name="Title" fontName="Times New Roman" fontSize="50" isBold="true" pdfFontName="Times-Bold"/>
<style name="SubTitle" forecolor="#736343" fontName="Arial" fontSize="18"/>
<style name="Column header" forecolor="#666666" fontName="Arial" fontSize="12" isBold="true"/>
<style name="Detail" fontName="Arial" fontSize="12"/>
<style name="Row" mode="Transparent">
<conditionalStyle>
<conditionExpression><![CDATA[$V{REPORT_COUNT}%2 == 0]]></conditionExpression>
<style backcolor="#E6DAC3"/>
</conditionalStyle>
</style>
<queryString>
<![CDATA[Select * from example.customer_history;]]>
</queryString>
<field name="hist_id" class="java.lang.Integer"/>
<field name="h_date" class="java.sql.Timestamp"/>
<field name="h_amount" class="java.math.BigDecimal"/>
<field name="h_data" class="java.lang.String"/>
<background>
<band splitType="Stretch"/>
</background>
<title>
<band height="136" splitType="Stretch">
<staticText>
<reportElement uuid="3cbe62a2-fc4f-4470-8a41-04ba5a76f4ff" style="Title" x="170" y="40" width="263" height="62"/>
<textElement verticalAlignment="Middle">
<font pdfFontName="Times-Roman"/>
</textElement>
<text><![CDATA[BigSQL Report]]></text>
</staticText>
<staticText>
<reportElement uuid="9c023796-93e4-4fbf-a609-9537018b189c" style="SubTitle" x="350" y="50" width="196" height="22"/>
<textElement>
<font fontName="Times New Roman" pdfFontName="Times-Roman"/>
</textElement>
<text><![CDATA[with Jasper Reports]]></text>
</staticText>
<image>
<reportElement uuid="805e8dcf-c46e-49df-9647-4f14646c972d" x="-1" y="0" width="161" height="110"/>
<imageExpression><![CDATA["/Users/alisaggu/Desktop/logo10.png"]]></imageExpression>
</image>
</band>
</title>
<pageHeader>
<band splitType="Stretch"/>
</pageHeader>
<columnHeader>
<band height="16" splitType="Stretch">
<line>
<reportElement uuid="26e9c0f5-5aea-4bd6-848c-8f9820d2cb6c" positionType="FixRelativeToBottom" x="0" y="15" width="555" height="1"/>
<graphicElement>
<pen lineWidth="0.5" lineColor="#999999"/>
</graphicElement>
</line>
<staticText>
<reportElement uuid="ad2d59e2-f15f-4475-9fc4-8068d571e552" style="Column header" x="0" y="0" width="138" height="15" forecolor="#736343"/>
<textElement/>
<text><![CDATA[Hist Id]]></text>
</staticText>
<staticText>
<reportElement uuid="90d84a3f-e23f-4090-9db8-42fca0ced0ab" style="Column header" x="138" y="0" width="138" height="15" forecolor="#736343"/>
<textElement/>
<text><![CDATA[Date Due]]></text>
</staticText>
<staticText>
<reportElement uuid="50862542-096a-4072-8665-88c1238fa7c5" style="Column header" x="276" y="0" width="138" height="15" forecolor="#736343"/>
<textElement/>
<text><![CDATA[Total Amount]]></text>
</staticText>
<staticText>
<reportElement uuid="8e8735a4-a896-4eef-876a-0472fabf5493" style="Column header" x="414" y="0" width="138" height="15" forecolor="#736343"/>
<textElement/>
<text><![CDATA[Raw Data]]></text>
</staticText>
</band>
</columnHeader>
<detail>
<band height="15" splitType="Stretch">
<frame>
<reportElement uuid="840d241f-46d5-4267-abf2-fbfabe77a1ba" style="Row" mode="Opaque" x="0" y="0" width="555" height="15"/>
<textField isStretchWithOverflow="true">
<reportElement uuid="904e88f2-9f49-494c-9e20-7a0f2a66c46e" style="Detail" x="0" y="0" width="138" height="15"/>
<textElement/>
<textFieldExpression><![CDATA[$F{hist_id}]]></textFieldExpression>
</textField>
<textField isStretchWithOverflow="true">
<reportElement uuid="745e034c-c2b7-406d-97b4-19f6a5a1f2ec" style="Detail" x="138" y="0" width="138" height="15"/>
<textElement/>
<textFieldExpression><![CDATA[$F{h_date}]]></textFieldExpression>
</textField>
<textField isStretchWithOverflow="true">
<reportElement uuid="c174e25f-9c68-4386-8968-1bda99810e3e" style="Detail" x="276" y="0" width="138" height="15"/>
<textElement/>
<textFieldExpression><![CDATA[$F{h_amount}]]></textFieldExpression>
</textField>
<textField isStretchWithOverflow="true">
<reportElement uuid="aa181d78-3fb6-47dd-ae2f-bc1aafba23ae" style="Detail" x="414" y="0" width="138" height="15"/>
<textElement/>
<textFieldExpression><![CDATA[$F{h_data}]]></textFieldExpression>
</textField>
</frame>
</band>
</detail>
<columnFooter>
<band height="45" splitType="Stretch">
<line>
<reportElement uuid="7c17f014-3c79-4265-b557-b38cf17c5f0f" positionType="FixRelativeToBottom" x="0" y="3" width="555" height="1"/>
<graphicElement>
<pen lineWidth="0.5" lineColor="#999999"/>
</graphicElement>
</line>
</band>
</columnFooter>
<pageFooter>
<band height="25" splitType="Stretch">
<frame>
<reportElement uuid="1f9a92db-af7d-429a-ba60-a43bd78a783e" mode="Opaque" x="-21" y="1" width="597" height="24" forecolor="#D0B48E" backcolor="#F2EBDF"/>
<textField evaluationTime="Report">
<reportElement uuid="17fc328e-c16b-477d-a061-a733a9167907" style="Column header" x="533" y="0" width="40" height="20" forecolor="#736343"/>
<textElement verticalAlignment="Middle">
<font size="10" isBold="false"/>
</textElement>
<textFieldExpression><![CDATA[" " + $V{PAGE_NUMBER}]]></textFieldExpression>
</textField>
<textField>
<reportElement uuid="8087e1e6-d232-4090-97c3-72d455154cab" style="Column header" x="453" y="0" width="80" height="20" forecolor="#736343"/>
<textElement textAlignment="Right" verticalAlignment="Middle">
<font size="10" isBold="false"/>
</textElement>
<textFieldExpression><![CDATA["Page "+$V{PAGE_NUMBER}+" of"]]></textFieldExpression>
</textField>
<textField pattern="EEEEE dd MMMMM yyyy">
<reportElement uuid="2a3dc088-bcb4-4488-8bc8-7809bdaf9901" style="Column header" x="22" y="1" width="197" height="20" forecolor="#736343"/>
<textElement verticalAlignment="Middle">
<font size="10" isBold="false"/>
</textElement>
<textFieldExpression><![CDATA[new java.util.Date()]]></textFieldExpression>
</textField>
</frame>
</band>
</pageFooter>
<summary>
<band splitType="Stretch"/>
</summary>
</jasperReport>

5. Preview your created report

Save and click on Preview to view the report.
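
If you would rather check the report from plain Java than from the Studio preview, the following is a minimal sketch using the JasperReports library. It assumes the jasperreports jar (plus Groovy, since the report declares language="groovy") and the PostgreSQL JDBC jar are on the classpath, that bigsql.jrxml is in the working directory, and that the imageExpression path in the title band points to an image that exists on your machine:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.HashMap;
import net.sf.jasperreports.engine.JasperCompileManager;
import net.sf.jasperreports.engine.JasperExportManager;
import net.sf.jasperreports.engine.JasperFillManager;
import net.sf.jasperreports.engine.JasperPrint;
import net.sf.jasperreports.engine.JasperReport;

public class FillBigSqlReport {
    public static void main(String[] args) throws Exception {
        // Compile the JRXML saved from Jaspersoft Studio
        JasperReport report = JasperCompileManager.compileReport("bigsql.jrxml");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/postgres", "postgres", "password")) {
            // The report's own queryString runs over this JDBC connection
            JasperPrint print = JasperFillManager.fillReport(report, new HashMap<String, Object>(), conn);
            JasperExportManager.exportReportToPdfFile(print, "bigsql.pdf");
        }
        System.out.println("Wrote bigsql.pdf");
    }
}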

6. Finally, publishing the report

a) Install the downloaded JasperReports Server and launch it.

Log in to the web interface with the admin account:
username : jasperadmin
password : jasperadmin

b) Create a new JasperReports Server connection under repository tab in Jaspersoft Studio

c) A dialog to insert the server data appears. Fill it in as follows:

  • Name: the name of the connection. You can use any name you want. For this example we will leave the default name: JasperReports Server.
  • URL: the address of the server. The default address is already correct if we are using a local server. For this example the correct address is http://localhost:8080/jasperserver/
  • User: the username to access the server. The default for the local server is “jasperadmin”.
  • Password: as with the username, for the local server by default it is “jasperadmin”.

Then click the Test Connection button to test the connection. If everything is working, click Finish.

d) Open the report and click the button with a blue arrow in the upper-right corner of the designer. In the window that opens you can browse the server directory structure to choose where to place the report. Select the Reports folder, name the report, and then click Next.

e) Create the Data Source from a local data source.

  • Select Datasource JDBC and click Next
  • Enter a name and id, then click Next
  • Click Import from Jaspersoft Studio, select the BigSQL JDBC connection, and click Finish to create the data source

f) On the web interface, logged in with the admin account, click on the Reports folder.

g) Click on the published BigSQL report to view it on the next screen.


Tutorial – Run BIRT against Hadoop

This tutorial covers setting up BIRT (Business Intelligence and Reporting Tools) to run against Hadoop. BIRT is a relatively simple yet powerful analytics tool, and this tutorial will help you harness that power against Hadoop.

1. Pre-Requisites

BigSQL Tutorial

The BigSQL Tutorial (http://www.bigsql.org/se/tutorial.jsp) needs to be followed through Step 3:

Step 1: Creating example.customer_history in Hive and loading values.
Step 2: Creating example.customer_history as a foreign table in Postgres.
Step 3: Verifying example.customer_history.

Eclipse with BIRT

a) Download Eclipse with the BIRT framework for your OS version from http://www.eclipse.org/downloads/packages/eclipse-ide-java-and-report-developers/keplerr

  • Select your OS version from the Link column

  • Select a nearby mirror for the download location

b) Extract the downloaded .tar file and run Eclipse, setting up the workspace location.

2. Creating the report in Eclipse

a) Run an instance of Eclipse with BIRT reporting configured.

b) From the menu, click on Window → Open Perspective → Report Design

c) From the menu, click on File → New → Report

  • Enter Parent Folder
  • Enter File Name
  • Click on the Finish button

3. Data Source configuration

a) Verify that BigSQL is running

$ ./bigsql status

#################################################
# BIGSQL:- your BIGSQL version
# TIME: current time
# HOST: your hostname
# JAVA: path to your JAVA_HOME
# OS: your OS
#################################################
## Checking critical ports ################
Postgres port 5432 is busy.
ZooKeeper port 2181 is busy.
HDFS port 50070 is busy.
HBase port 60010 is busy.
Hive port 9083 is busy.
Tomcat port 8080 is busy.

b) Open Window → Show View → Data Explorer

c) Download the PostgreSQL JDBC driver from http://jdbc.postgresql.org/download/postgresql-9.2-1003.jdbc4.jar

d) Right-click on the Data Sources folder and select ‘New Data Source’

e) Select ‘JDBC Data Source’ and click Next button.

f) Click on Manage Drivers and add the JDBC driver downloaded in (c).

g) Select/Enter:

  • Driver Class : org.postgresql.Driver
  • Database URL : jdbc:postgresql:postgres
  • User Name : postgres
  • Password : password

h) Click the “Test Connection” button to verify the connection

4. Dataset and Report

a) Right-click on ‘Data Sets’ and select ‘New Data Set’

b) Select the JDBC data source, enter a data set name, and click Next.

c) Enter in the “Query Text” window :

select hist_id , h_date , h_data from example.customer_history;
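
Optionally, the same query can be checked over JDBC before building the table. This minimal sketch assumes the local BigSQL defaults used throughout the tutorial; the shorthand URL from step 3(g) is equivalent to jdbc:postgresql://localhost:5432/postgres, and the output shows the columns the BIRT data set will expose:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class BirtDataSetCheck {
    public static void main(String[] args) throws Exception {
        // Shorthand URL from step 3(g): database "postgres" on localhost:5432
        String url = "jdbc:postgresql:postgres";
        try (Connection conn = DriverManager.getConnection(url, "postgres", "password");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "select hist_id, h_date, h_data from example.customer_history")) {
            while (rs.next()) {
                System.out.printf("%d | %s | %s%n",
                        rs.getInt("hist_id"), rs.getTimestamp("h_date"), rs.getString("h_data"));
            }
        }
    }
}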

d) Drag the created Data Set and drop it in the blank report window to create the reporting table.

5. Report View

In the top menu, click on Run → View Report → In Web Viewer

The Three “V”s of Big Data

Big Data. What’s the big deal, really? Data has been around for a long time, and organizations have been using it to their advantage since forever. So what is so different now?
Legitimate questions from anyone who is sick of hearing all the hype around Big Data. Granted, you will come across many instances where the term is used just because it is the fashionable thing to do. At the same time, the drive towards what is called Big Data is very real. The acceleration is due to expansion of data in not one, not two, but three dimensions simultaneously. These three dimensions are famously known as the Three “V”s of Big Data.

Volume 

This is the obvious one. From mega to giga to tera to petabytes, the volume of data is increasing exponentially. You can understand this quite simply by comparing the size of your typical text file with that of a typical MP3. Take it forward and compare sizes with video files, and then take it a step further to look at the size of what is now the norm for all of us, i.e. HD videos. The increase in size is pretty dramatic, and of course that means the volume of data being collected at any place increases.
But that’s not the only thing that drives volume. Sources of data have increased significantly as well. Take, as an example, a typical cell phone we carry around. Not so long ago, the only data it had was voice calls and texts. With the advent of smartphones and the mind-boggling array of sensors and other equipment within them, the amount of data they carry is staggering. Combine this with the Internet explosion, social media, digitized businesses, and e-commerce, and you will begin to understand why it’s not only the size of individual files that is driving data volumes in today’s world.
It is no longer ponds or lakes of data organizations are dealing with, it is vast expanses of oceans. And when you are in an ocean, you have a whole different set of challenges to navigate your way through the hazards of rough seas.

Velocity

This is probably the least understood of the 3 “V”s. There are multiple angles to look at velocity as a dimension in which data is expanding.
First is the sheer rate at which data is coming in. Along with ever-faster connectivity, this correlates directly with the volume dimension described above.
It is not just the rate at which data is pouring in, it is also how quickly it needs to be processed. Batch processing is not good enough, business managers need data in real-time. As I mentioned in a previous post (http://openscg.com/2013/07/bi-and-hadoop/), “I will get back to you tomorrow” is no longer an acceptable answer!
Thirdly, data is only useful as long as it is being processed faster than it is entering the system. If the velocity of data processing is less than the velocity of data entering the system, all you would end up with is an ever-growing backlog. And of course, because the demand is for real-time data, back-logged information is not really of any use to anyone.
Lastly, you need to be prepared for not only consistently high velocity, but also acceleration and, at times, bursts of intense activity.

Variety

Think about all the various types of data that you have the ability to gather with today’s technology: audio/video streams, GPS, sensors, documents, SMS/MMS, flash, images, etc.
Every organization gathers data and every organization wants to use that data. In order to convert this ‘dark’ unstructured data into information, traditional structured approaches no longer suffice.
Wrapping up, I hope you can see now why Big Data has gained so much momentum lately and why organizations need to take urgent steps to ensure they are not left behind in this race. Proper tools are the key!

BI and Hadoop

The Data Warehousing Institute anticipates that Hadoop technologies will soon become a common and valuable complement to established products and practices for BI (http://tdwi.org/research/2013/04/tdwi-best-practices-report-integrating-hadoop-into-business-intelligence-and-data-warehousing.aspx). Increasingly, data-conscious businesses are putting their BI groups under pressure for all sorts of data and analytics. Where, some time back, a weekly report summarizing a few select KPIs was enough, expectations from BI have increased exponentially. The team is now required not only to furnish the required KPIs but also to lend a hand in operational troubleshooting, run ad-hoc analytics, integrate with development teams in order to guide them on reporting needs, and of course always be available with data at their fingertips. “I will get back to you tomorrow” is no longer an acceptable answer!

In terms of availability of tools, on the face of it, the demands conflict with each other. While some tools out there will help you store long-term data, they will not let you run ad-hoc queries efficiently. Some other tools will be very good at ad-hoc analytics, but they will bog you down when you need to load huge quantities of data in real time. BI seems to be in a state of conflict all the time.

We think BigSQL is the answer. BigSQL brings the best of two worlds together in a single entity. On one hand, you get Hadoop’s scalability, high-availability, and ultra-fast bulk loads. On the other hand, you get to leverage PostgreSQL’s ANSI compliant SQL analytics and structured data approach. What BigSQL does for you is to help avoid the ‘either/or’ question. You no longer have to choose between SQL and NoSQL. You get both at the same time … you get to harness that immense power.

Go on, take a peek … http://www.bigsql.org/