Ganglia in the cloud (unicast instead of multicast)

Last time, I talked about how much I like using Ganglia for monitoring a large number of distributed servers.

One of the issues I ran into that is barely covered in the documentation is how to set this up if you cannot use multicast.  Multicast is the default method that ganglia uses to discover nodes.  This is great; it means that auto-discovery works… kinda.  The issue is that most cloud providers squash your ability to do multicast.  This is a good thing: can you imagine having to share a room with the guy who can't stop screaming through the bull-horn every 2 milliseconds?  So, if I want to use ganglia in EC2, the Amazon cloud, how do I go about doing that?

To get around this issue, you need to configure ganglia in unicast mode.  This is the mysterious part: what exactly is it, where do I set it, and how do I have multiple clusters in unicast mode all report to the same web UI?  Most of the tutorials I read alluded to the fact that you *could* have multiple clusters set up in ganglia, and most speculated [ some even correctly ] about how to do it, but none really implemented it.  So, here is how you can disable multicast in ganglia and, instead, enable unicast with multiple clusters.

To get started, there are a couple of ganglia components that you really need to be familiar with.

gmetad

gmetad is the 'server' side of ganglia.  It is responsible for taking the data from the remote collectors and stuffing it into the backend database (ganglia uses rrdtool).  You'll have one of these bad-boys running for each web UI you have set up.
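(If you're curious where that data actually lands: on most packaged installs, gmetad writes its RRD files under /var/lib/ganglia/rrds, one directory per cluster and one per host.  That path is just the usual default and is controlled by the rrd_rootdir setting in gmetad.conf, so your packaging may differ.)

# Poke around where gmetad keeps its round-robin databases.
# /var/lib/ganglia/rrds is the common default (see rrd_rootdir in gmetad.conf);
# adjust the path if your distro puts it somewhere else.
ls /var/lib/ganglia/rrds/
ls /var/lib/ganglia/rrds/ClusterA/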

Configuration

First of all, take a look at the full, default config file.  It's got a lot of great comments in there and really helps to explain everything from soup to nuts.  That being said, here's what I used (and my comments) to get me up and running.

Configuring this is done in /etc/gmetad.conf by default.

# Each 'cluster' is its own data-source
# I have two clusters, so, 2 data-sources
# ... plus my local host
data_source "Local" localhost
data_source "ClusterA" localhost:8650
data_source "ClusterB" localhost:8655

# I have modified this from the default rrdtool
# storage config for my purposes, I want to
# store 3 full years of datapoints. Sure there
# is a storage requirement, but that's what I need.
RRAs "RRA:AVERAGE:0.5:1:6307199" "RRA:AVERAGE:0.5:4:1576799" "RRA:AVERAGE:0.5:40:52704"

Essentially, the above sets up two clusters, ClusterA and ClusterB.  The sources for these are localhost:8650 and localhost:8655 respectively (don't worry, I'll explain that bit below…).  The other thing for me is that I need to keep 3 full years of real datapoints.  (rrdtool is designed to 'aggregate' your data after some time.  If you don't adjust it, you lose resolution to the aggregation, which can be frustrating.)
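In case you're wondering where those row counts come from, here's the arithmetic.  This assumes the stock 15-second polling/step interval (the data_source default); if you poll at a different interval, scale accordingly:

# 3 years of full-resolution (1 step = 15 s) samples:
#   3 * 365 * 24 * 60 * 60 s = 94,608,000 s
#   94,608,000 s / 15 s per step ~= 6,307,200 rows  -> the first RRA
# The second RRA keeps ~1.6 million rows at 4 steps (1 minute) per row,
# which also works out to about 3 years at 1-minute resolution.
# The third keeps ~52,700 rows at 40 steps (10 minutes) per row, roughly a year.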

gmond

gmond is a data-collector.  It will, essentially, collect data from a host and send it … somewhere.  Let’s discuss where.

Before we address the multiple clusters piece, here’s how you disable multicast.  The default config file will contain three sections that you really care about:

The things we need to change are:

  1. the cluster -> name attribute
  2. comment out the udp_send_channel -> mcast_join parameter
  3. comment out the udp_recv_channel -> mcast_join parameter
  4. comment out the udp_recv_channel -> bind parameter


/* If a cluster attribute is specified, then all gmond hosts are wrapped inside
* of a <CLUSTER> tag. If you do not specify a cluster tag, then all <HOSTS> will
* NOT be wrapped inside of a <CLUSTER> tag. */
cluster {
name = "unspecified"
owner = "unspecified"
latlong = "unspecified"
url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like. Gmond
used to only support having a single channel */
udp_send_channel {
# Comment this out for unicast
#mcast_join = 239.2.11.71
port = 8649
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
# Comment this out for unicast
#mcast_join = 239.2.11.71
port = 8649
# Comment this out for unicast
#bind = 239.2.11.71
}

So, in order to convert this to unicast, you just comment out the mcast lines above, point the udp_send_channel at a specific host, and set the port to some available port… that simple!

So, I have 3 clusters: Local, ClusterA and ClusterB.  To get this working with unicast (unicast meaning that each node talks to one specific endpoint), I need to have a separate gmond running on my server for EACH cluster.

So, on the ganglia server, I have 3 gmond config files:

Local:

/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
 name = "Local"
 owner = "Scottie"
 latlong = "unspecified"
 url = "unspecified"
}
/* The host section describes attributes of the host, like the location */
host {
 location = "GangliaSever"
}

/* Feel free to specify as many udp_send_channels as you like. Gmond
 used to only support having a single channel */
udp_send_channel {
host = localhost
port = 8649
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
port = 8649
}

/* You can specify as many tcp_accept_channels as you like to share
 an xml description of the state of the cluster */
tcp_accept_channel {
 port = 8649
}

Remember the 'data_source' lines from your gmetad.conf file?  Well, if you look up, you'll see that the data_source for the 'Local' cluster was 'localhost' (which defaults to port 8649).  Essentially, gmetad will talk to this gmond on localhost:8649 to receive its data.  Now, the remainder of your gmond.conf file is important; it dictates all of the monitoring that the gmond instance will do.  Only change the sections that I have listed above.
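A quick way to sanity-check that a given gmond is alive and serving data on its tcp_accept_channel is to grab its XML dump by hand; gmond spits out the current cluster state and closes the connection, so netcat works nicely (the ports here are the ones from this setup):

# Dump the XML that gmetad would pull from each collector.
nc localhost 8649 | head     # 'Local' cluster gmond
nc localhost 8650 | head     # ClusterA gmond
nc localhost 8655 | head     # ClusterB gmond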

Now for the two remaining clusters:

ClusterA:

/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
 name = "ClusterA"
 owner = "Scottie"
 latlong = "unspecified"
 url = "unspecified"
}
/* The host section describes attributes of the host, like the location */
host {
 location = "GangliaSever"
}

/* Feel free to specify as many udp_send_channels as you like. Gmond
 used to only support having a single channel */
udp_send_channel {
host = localhost
port = 8650
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
port = 8650
}

/* You can specify as many tcp_accept_channels as you like to share
 an xml description of the state of the cluster */
tcp_accept_channel {
 port = 8650
}

ClusterB:

/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
 name = "ClusterB"
 owner = "Scottie"
 latlong = "unspecified"
 url = "unspecified"
}
/* The host section describes attributes of the host, like the location */
host {
 location = "GangliaSever"
}

/* Feel free to specify as many udp_send_channels as you like. Gmond
 used to only support having a single channel */
udp_send_channel {
host = localhost
port = 8655
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
port = 8655
}

/* You can specify as many tcp_accept_channels as you like to share
 an xml description of the state of the cluster */
tcp_accept_channel {
 port = 8655
}
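One gotcha: since all three of these gmonds live on the same server, each one needs its own config file and its own running process.  I just point each instance at its file with gmond's -c flag (the file names below are only an example of how I lay mine out):

# Start one gmond per cluster on the Ganglia server,
# each pointed at its own config file (example paths).
gmond -c /etc/ganglia/gmond-local.conf
gmond -c /etc/ganglia/gmond-clusterA.conf
gmond -c /etc/ganglia/gmond-clusterB.conf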

Now that we’ve got our ‘server’ setup to receive data for each of our clusters, we need to configure the actual hosts that are part of that cluster to forward data in.  Essentially, this is going to be the same ‘gmond’ configuration, but will forward data to the ‘gmond’ that we just setup on the server.

Let’s say we have three hosts:

Grumpy (our local server)

Sleepy (ClusterA)

Doc (ClusterB)

Now, let’s configure their gmond’s to talk to our server (Grumpy) and start saving off our data.  First of all, Grumpy is already configured up and running, so if you connected to the ganglia interface at this point ( and your gmetad is running ), you should see ‘Grumpy’ showing up in the ‘Local’ cluster.

On each of these hosts, you only change the host field to be the name or IP address of your ganglia 'server' (udp_send_channel -> host).  On Sleepy (ClusterA):

/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
 name = "ClusterA"
 owner = "Scottie"
 latlong = "unspecified"
 url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
 location = "GangliaSever"
}

/* Feel free to specify as many udp_send_channels as you like. Gmond
 used to only support having a single channel */
udp_send_channel {
host = grumpy
port = 8650
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
port = 8650
}

/* You can specify as many tcp_accept_channels as you like to share
 an xml description of the state of the cluster */
tcp_accept_channel {
 port = 8650
}

On Doc (ClusterB), you make the same change (udp_send_channel -> host):

/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
 name = "ClusterB"
 owner = "Scottie"
 latlong = "unspecified"
 url = "unspecified"
}
/* The host section describes attributes of the host, like the location */
host {
 location = "GangliaSever"
}

/* Feel free to specify as many udp_send_channels as you like. Gmond
 used to only support having a single channel */
udp_send_channel {
host = grumpy
port = 8655
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
port = 8655
}

/* You can specify as many tcp_accept_channels as you like to share
 an xml description of the state of the cluster */
tcp_accept_channel {
 port = 8655
}

Once you start the gmond process on each server, wait a few minutes and they will appear in the ganglia interface. Simple as that!
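For what it's worth, "start the gmond process" usually just means kicking the packaged init script on each box.  The service name differs by distro (commonly gmond on Red Hat-style systems and ganglia-monitor on Debian/Ubuntu), so adjust to taste:

# Red Hat / CentOS style
sudo service gmond restart

# Debian / Ubuntu style
sudo service ganglia-monitor restart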


Ganglia monitoring of large (and small) clusters

Lately, I've been spending more time back in the performance-testing world.  The problem is that the testing we're doing is taking days and, in some cases, weeks to run.  No matter how diligent you are, you're not going to be staring at your CPU / memory graphs the whole time.  The question is: what open-source tool should I be using to collect my metrics?

Previously, I've always used nmon and nmon analyser to collect and inspect (respectively) my metrics.  There are a few issues with it, however, the most glaring of which is that the analyser tool is an Excel macro (gross, out comes the Windows VM).  More recently I've been using Cacti, which is a great tool for collecting system metrics, but the rrdtool defaults are a bit weak on data retention.  Basically, you end up losing full-resolution data after 24 hours.  Now, granted, I can modify Cacti to increase the number of data points stored, but there are a few issues:

  1. Data Collection is kludgy
  2. The server has a LOT to do
  3. The interface is beginning to age
  4. Adding a new host is kind of like pulling out your wisdom teeth

So, dentistry aside, I found Ganglia.  Under the covers, Ganglia is really using the same database technology as Cacti (rrdtool), but the defaults can be changed in one simple place (the RRAs line in gmetad.conf).  In 30 seconds, I had reconfigured ALL RRD databases and metrics to store 3 years of full data points.  Pretty simple and powerful.

The big win for me though was provisioning.  The environment I'm working in has a new machine showing up each day (or an old machine re-purposed), so setup needs to be quick.  With Ganglia, there are two methods for doing this:

1. Multicast (the default)

It is what it sounds like.  You turn on the data collector on a host and before you even know it… your host is in the web interface.  This is really great when dealing with large clusters ( http://coen.boisestate.edu/ece/raspberry-pi/ ) in a lab where boxes come in and out before you know it.

2. Unicast (the reality)

Multicast doesn't work in EC2, or in most corporate networks for that matter.  Your production environment is 4 firewalls and 9 routers away from where your graphing node is.  The configuration for this mode is a bit more up-front work, but once you get it set up, you just start the collector daemon and it connects to the mothership and does the rest (provisioning, initial graphing, etc…).

 

If you're looking for a monitoring solution that gives you all the data points, is easy to provision, and is open-source… gotta go Ganglia!

 

BigSQL Architecture


BigSQL.

From data to information, from information to insight.

A state-of-the-art Big Data Warehouse solution that is fast, secure and continuously available. BigSQL will scale from your desktop to the cloud. Run real-time OLAP directly from the world's most secure RDBMS.

Get started with BigSQL right now.

You can immediately put BigSQL to work on your Relational Data and Big Data. BigSQL is an integrated bundle of:

  • PostgresHA – Highly Available PostgreSQL, the world's most advanced open source database,
  • Hadoop, the archetypical Big Data solution, and
  • Hive, an implementation of relational table abstractions.

BigSQL Architecture.


These components make up the core BigSQL engine, and together they give you a Highly Available Big Data Warehouse solution.

When you add in components like Enterprise Search (Solr), Streams Processors (Pig), and ETL (Sqoop), you have all the components required to analyze real-time data directly from PostgreSQL, including your NoSQL data in Hadoop.

Linear Scalability.

BigSQL leverages the linear scalability of Hadoop and HDFS across low cost commodity hardware and/or the cloud. It can easily scale to petabytes of information.

Platform Ubiquity.

BigSQL will lay down cleanly on 64-bit Linux (production) and 64-bit OS X (development) distros.

24 x 7.

Every part of your Big Data stack should be hardened. The Hive Relational Metastore in BigSQL is PostgresHA, a Highly Available PostgreSQL implementation that can be set up and distributed exactly the same way you would any Big Data implementation. You can have active/standby clusters in the same datacenter but in different racks, and you can stream to a remote Disaster Recovery node.

Open Source.

Every component of BigSQL is Open Source. Some components serve double duty.

ZooKeeper is used as the distributed coordinator for HDFS and as the distributed lock manager in Hive. PostgreSQL, through PostgresHA, is the relational metastore in Hive and a Relational Data Warehouse in its own right.

Each software component is free, and it runs on cheap, readily available hardware. If you cobble together enough Raspberry Pis, your entire hardware and software stack could be open source.

Security.

BigSQL is built on PostgreSQL, the world's most secure RDBMS.

Data Equivalence.

BigSQL gives you equivalent access to your Big Data and Relational Data through psql (the PostgresHA Command Line Interface) and the Hadoop Foreign Data Wrapper.

Help.

OpenSCG, the people that built BigSQL, are here to help, from package customization through on-site consulting to 24 x 7 database administration.


“From data to information, from information to insight.”