BigSQL Up and Running

Big Data solutions are collections of related projects, each suited to a specific function within the overall solution. BigSQL can serve as a Big Data warehousing solution: in addition to the archetypal Big Data components (the Hadoop Distributed File System and MapReduce), BigSQL includes additional projects that let you integrate Big Data with relational data and query both through SQL.

These components are designed to work together and to be interchangeable, which gives you great flexibility.

Installation works the same way as with most open source projects: you can build each component from source, or you can lay down an integrated bundle. However you do it, you need to know which components are being installed, how they are configured, and what they are doing.

The first time through, you can follow any of the excellent texts on installing a Data Warehousing solution with Big Data; if you follow a procedure in a book, you will end up with that implementation working in your environment. OpenSCG has taken all the components required in a Big Data warehousing solution and bundled them together as BigSQL. You can download it, run it, and within a few minutes have a complete BigSQL environment up and running.

This post takes you through the installation process so you can understand which components are being installed and where.

Whether you build each component from source, have someone do it for you, or use an installer bundle, if you are working with BigSQL you need to understand conceptually which components make up your solution. What follows is an annotated description of what happens when you run the BigSQL installer. BigSQL is a superb mechanism for:

  • Creating quick test instances on the fly for demos, proofs of concept, benchmarking, etc.
  • Understanding the installation and configuration of a BigSQL solution (and, by extension, any Big Data solution)
  • Gaining a deeper understanding of the core components within BigSQL
  • Taking the open source installer bundles and using them to create your own for Dev, QA, Integration, Regression and Production environments

BigSQL installs equally easily on Mac OS X and Linux. Mac OS X makes an excellent development environment, while Linux is typically used for production.

First, download your BigSQL bundle here.

Second, fear not! You can’t really make a mistake. If you run into problems or have had enough fun for one day, you can blow the whole thing away and just restart.

Installing and Running BigSQL

Unzip the tarball you just downloaded and go to the newly created bigsql directory. Mine is bigsql-13.02.7; yours will be named for whatever version you downloaded.

./bigsql start

Regardless of what other components may be incorporated into the BigSQL bundle, once you have it on your machine, it's just a question of typing

./bigsql start

Now, when you do this, a number of things happen behind the scenes, and I'm going to take this opportunity to explain them. On my Mac (2.4 GHz Intel Core i5, 8 GB RAM), it takes about 90 seconds to unzip the BigSQL tarball (your mileage may vary; I had other things going on at the same time).

So, while the tarball is unzipping you can read what will happen when you start BigSQL for the first time:

./bigsql start

  1. BigSQL will check whether the default ports for PostgresHA, ZooKeeper, Hadoop and Hive are available. If they are:
    1. BigSQL will set up the required environment variables for each of these components
  2. BigSQL will then start each of the components, initializing any that do not yet exist, in the following order:
    1. BigSQL will initialize ZooKeeper and start it on its default port
    2. BigSQL will initialize PostgresHA and start it on its default port
  3. BigSQL will initialize Hadoop
  4. Create a Hadoop Distributed File System
  5. Start the Hadoop NameNode on its default port and start the Job Tracker
  6. BigSQL will then initialize Hive
  7. Create the Hive Metastore in PostgresHA
  8. Start the MetaStore
  9. Start the Hive Server, and finally
  10. Check the status of the newly installed BigSQL Data Warehouse
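The port check in step 1 can be sketched in a few lines of bash. This is illustrative, not the bundle's actual code, and the port numbers are the stock defaults for each component (2181 ZooKeeper, 5432 PostgreSQL/PostgresHA, 8020 HDFS NameNode, 10000 Hive Server) — if you have changed any of them, adjust accordingly:

```shell
#!/usr/bin/env bash
# A sketch of the pre-flight check: verify each component's default
# port is free before starting it. Uses bash's /dev/tcp pseudo-device,
# so this needs bash (not a plain POSIX sh).

port_free() {
  # Succeeds when nothing is listening on 127.0.0.1:$1.
  ! (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

for port in 2181 5432 8020 10000; do
  if port_free "$port"; then
    echo "port $port is free"
  else
    echo "port $port is already in use -- stop the conflicting service first"
  fi
done
```

A port that is already in use almost always means another instance (a system PostgreSQL, for example) is running; stopping it, or reconfiguring the conflicting component, clears the way for a clean start.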

BigSQL is now up and running on your machine. By my count, on my machine the process took about 30 seconds, and when it runs it looks like this:

BigSQL initializes ZooKeeper.

ZooKeeper Start

BigSQL initializes Hadoop.

Once Hadoop is initialized BigSQL creates an HDFS file system and starts its NameNode and the Job Tracker:

Hadoop Start

BigSQL initializes Hive.

Once Hive is initialized BigSQL creates the Hive MetaStore in PostgreSQL, starts the MetaStore and starts the Hive Server:

Hive Start

and lastly

BigSQL is ready for use!

BigSQL Status
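If you want to confirm the status for yourself, the same /dev/tcp probe works in reverse: once everything is up, each default port should have a listener. Again, this is a hedged sketch rather than anything the bundle ships, and the port numbers are the stock defaults:

```shell
#!/usr/bin/env bash
# An illustrative post-start probe: after `./bigsql start`, each of
# the default ports should accept a connection. Requires bash for
# the /dev/tcp pseudo-device.

listening() {
  # Succeeds when something is listening on 127.0.0.1:$1.
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

for svc in "ZooKeeper:2181" "PostgresHA:5432" "NameNode:8020" "HiveServer:10000"; do
  name=${svc%%:*}
  port=${svc##*:}
  if listening "$port"; then
    echo "$name ($port): up"
  else
    echo "$name ($port): not responding"
  fi
done
```

If any component shows "not responding", the logs under the bundle directory are the first place to look.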

And, just in case you have any questions:

The BigSQL mailing list is available here: