Friday, December 26, 2014

Sensu Plugins for Tracking Metrics

At this point, you should already be familiar with setting up Graphite, Sensu, and Grafana. I've written this article, first, as a supplement to those guides and, second, to shed some light on how to use these plugins, whose documentation is unfortunately still a bit underdeveloped. Before we dive in, here's a screenshot of the metrics I'm currently tracking with Sensu.

So from a high level, it's easy to see that intrepid, my logging and monitoring server, is currently the busiest of the three, with elon, my application server, catching up when CI is building. As your systems grow in complexity, remember that this data should remain simple. With the right setup, you'll identify and diagnose issues faster and make better data-driven business decisions.

Prerequisites: First, set up Graphite, Sensu, and Grafana. This guide was written for Ubuntu Linux users.

Troubleshooting

Let's get this out of the way before we go any further. If your Sensu server or any of your Sensu clients fails to restart after you make changes to any of the plugins, remember that you can view errors in the Sensu dashboard. And, of course, you can tail the logs.

# tail the server
sudo tail -f /var/log/sensu/sensu-server.log

# tail the client
sudo tail -f /var/log/sensu/sensu-client.log
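Alongside the logs, it's worth ruling out a common culprit: a malformed JSON file in /etc/sensu/conf.d (a stray trailing comma, for instance) can keep Sensu from starting. Here's a minimal sketch of a hypothetical check_json_dir helper that reports any file that fails to parse. It assumes python3 is installed, since `python3 -m json.tool` exits non-zero on invalid JSON.

```shell
#!/bin/sh
# Hypothetical helper: flag any JSON file in a config directory that fails to parse.
# Assumes python3 is available; `python3 -m json.tool` exits non-zero on invalid JSON.
check_json_dir() {
  dir=${1:-/etc/sensu/conf.d}   # default to the Sensu config directory
  status=0
  for f in "$dir"/*.json; do
    [ -e "$f" ] || continue     # skip the literal pattern if nothing matched
    if ! python3 -m json.tool "$f" > /dev/null 2>&1; then
      echo "invalid JSON: $f"
      status=1
    fi
  done
  return "$status"
}
```

Call check_json_dir with no arguments to scan /etc/sensu/conf.d, or pass another directory; it returns non-zero if any file failed to parse.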

Planning Our Naming Hierarchy

To easily distinguish the statistics we're tracking from other events and carbon stats, we're going to create a top-level stats namespace and then group further down the line as necessary. You'll notice some patterns as you carry out the steps ahead. For instance, every *_metrics.json check we implement passes our custom namespace to the Sensu plugin with the --scheme flag. Here's the structure, broken down:

# here's our full nginx scheme
--scheme stats.:::name:::.nginx

# let's break this down
--scheme        # flag
stats.          # top-level 'stats' namespace
:::name:::      # we extract the name from the client
.nginx          # we add another level for 'nginx'
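Once the client's name is filled in, the scheme becomes the prefix of every metric the plugin reports, and the plugin appends its own metric names below it. The metric plugins print one line per metric in Graphite's plaintext format, `<path> <value> <timestamp>`. This sketch uses a hypothetical graphite_path helper to show how the segments join into one dotted path; the metric name active, the value 291, and the timestamp are made-up sample data.

```shell
# Hypothetical helper: join namespace segments with dots into a Graphite path.
graphite_path() {
  path=
  for segment in "$@"; do
    path=${path:+$path.}$segment   # insert a dot only between segments
  done
  printf '%s\n' "$path"
}

# One Graphite plaintext line: "<path> <value> <timestamp>" (sample values).
printf '%s %s %s\n' "$(graphite_path stats client1 nginx active)" 291 1419600000
# prints: stats.client1.nginx.active 291 1419600000
```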

From the scheme breakdown above, you can see that each level of the hierarchy is separated by a dot, which is how Graphite organizes metric names. This may also be the first time you've seen the :::property::: convention. It simply lets you reference values in the /etc/sensu/conf.d/client.json file of each Sensu client. By adding a reference to each client's name, you add an easily identifiable level to the stats namespace for each server. Let's take another look at one of these client.json files.

{
  "client": {
    "name": "client1",
    "address": "x.x.x.x",
    "mongodbPort": 27018,
    "subscriptions": [ "nginx_metrics", "mongodb_metrics", "all" ]
  }
}

As you can see, we've defined name, address, mongodbPort, and subscriptions in our client object. You can keep adding as many properties here as you'd like; they're available to your check commands through the :::property::: syntax and are passed through to the handlers.
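Conceptually, the Sensu client replaces each :::property::: token in a check's command with the matching value from its client.json before running the command. Here's a rough shell imitation of that substitution; expand_tokens is a hypothetical helper, not part of Sensu, and it assumes simple values that contain no | or & characters.

```shell
# Hypothetical helper that imitates Sensu's :::property::: substitution.
# Usage: expand_tokens TEMPLATE key=value [key=value ...]
# Values must not contain '|' or '&', which sed would treat specially here.
expand_tokens() {
  template=$1
  shift
  for pair in "$@"; do
    key=${pair%%=*}     # text before the first '='
    value=${pair#*=}    # text after the first '='
    template=$(printf '%s' "$template" | sed "s|:::${key}:::|${value}|g")
  done
  printf '%s\n' "$template"
}

expand_tokens '/etc/sensu/plugins/mongodb-metrics.rb --scheme stats.:::name:::.mongodb --port :::mongodbPort:::' \
  name=client1 mongodbPort=27018
# prints: /etc/sensu/plugins/mongodb-metrics.rb --scheme stats.client1.mongodb --port 27018
```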

Example Plugins

Nginx Metrics

First, turn on the Nginx status page by following this guide. Once your Nginx status page is up, make sure to deny requests from all IPs except those of your server and clients. Then carry out the following steps.

# on server and client
sudo cp /opt/sensu-community-plugins/plugins/nginx/nginx-metrics.rb /etc/sensu/plugins/

# on server
sudo vi /etc/sensu/conf.d/nginx_metrics.json

{
  "checks": {
    "nginx_metrics": {
      "type": "metric",
      "handlers": ["graphite"],
      "command": "/etc/sensu/plugins/nginx-metrics.rb --url http://:::address:::/nginx_status --scheme stats.:::name:::.nginx",
      "interval": 60,
      "subscribers": ["nginx_metrics"]
    }
  }
}

// on client
{
  "client": {
    "name": "client1",
    "address": "x.x.x.x",
    "subscriptions": [ "nginx_metrics"]
  }
}

Notice that we're referencing :::address::: because we need it to build the full URL of the Nginx status page that we pass into the nginx-metrics.rb plugin.
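To get a feel for what nginx-metrics.rb reports, here's a rough awk sketch that turns the standard stub_status page into Graphite lines. The helper name nginx_status_to_graphite and the metric names (active, accepts, handled, requests, reading, writing, waiting) are my assumptions and may not match the plugin's exact output.

```shell
# Hypothetical helper: convert Nginx stub_status output (on stdin) into
# Graphite plaintext lines of the form "<scheme>.<metric> <value> <timestamp>".
# Usage: curl -s http://x.x.x.x/nginx_status | nginx_status_to_graphite SCHEME [TIMESTAMP]
nginx_status_to_graphite() {
  scheme=$1
  ts=${2:-$(date +%s)}   # default to the current Unix time
  awk -v s="$scheme" -v t="$ts" '
    /^Active connections:/ { print s ".active " $3 " " t }
    /^ *[0-9]+ +[0-9]+ +[0-9]+/ {
      print s ".accepts " $1 " " t
      print s ".handled " $2 " " t
      print s ".requests " $3 " " t
    }
    /^Reading:/ {
      print s ".reading " $2 " " t
      print s ".writing " $4 " " t
      print s ".waiting " $6 " " t
    }
  '
}
```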

VMStats

This plugin allows us to collect basic system metrics, such as CPU usage, free memory, and context switches per second.

# on server and client
sudo cp /opt/sensu-community-plugins/plugins/system/vmstat-metrics.rb /etc/sensu/plugins/

# on server
sudo vi /etc/sensu/conf.d/vmstat_metrics.json

// on server
{
  "checks": {
    "vmstat_metrics": {
      "type": "metric",
      "handlers": ["graphite"],
      "command": "/etc/sensu/plugins/vmstat-metrics.rb --scheme stats.:::name:::.vmstat",
      "interval": 60,
      "subscribers": [ "vmstat_metrics" ]
    }
  }
}

// on client
{
  "client": {
    "name": "client1",
    "address": "x.x.x.x",
    "subscriptions": [ "nginx_metrics", "vmstat_metrics" ]
  }
}
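Under the hood, these numbers come from the vmstat command. As a rough illustration (not the plugin's actual code), this sketch pulls a few fields from vmstat data lines. The hypothetical vmstat_summary helper assumes the standard procps column order `r b swpd free buff cache si so bi bo in cs us sy id wa st`, with memory reported in KiB.

```shell
# Hypothetical helper: summarize vmstat data lines read from stdin.
# Assumed column order (procps vmstat): r b swpd free buff cache si so bi bo in cs us sy id wa st
vmstat_summary() {
  # Only lines that start with a number are data; header lines are skipped.
  awk '/^ *[0-9]/ { printf "free_kb=%s context_switches=%s cpu_user=%s cpu_idle=%s\n", $4, $12, $13, $15 }'
}

# The first vmstat report averages since boot, so sample twice and keep the last line:
#   vmstat 1 2 | tail -n 1 | vmstat_summary
```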

Load Metrics

This plugin provides Linux load averages. If you've ever run the uptime or top commands, they're the three numbers that look like this:

load average: 0.07, 0.05, 0.09

These three numbers are averages over progressively longer stretches of time (the one-, five-, and fifteen-minute averages). The first thing to note is that lower numbers are better. Anything below a threshold of 1.00 is nothing to be concerned about, and our three numbers above are well below it. Once the load goes over that threshold, it's akin to traffic backing up on a bridge. This is resolved by widening the bridge (increasing memory) or by adding more bridges (CPU cores). So, to review: with a single core, 0-0.99 is safe, 1.00 means you are exactly at capacity, and anything over 1.00 means there's a backup. Things are more complicated when multiple cores are involved, but that's beyond the scope of this article.
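To make the single-core rule of thumb concrete, here's a small sketch with a hypothetical classify_load helper. It reads the three averages from /proc/loadavg, whose first three fields are the one-, five-, and fifteen-minute load averages, and labels each one against the 1.00 threshold.

```shell
# Hypothetical helper: label a load average using the single-core rule of thumb.
# Below 1.00 is safe, exactly 1.00 is at capacity, above 1.00 means a backlog.
classify_load() {
  awk -v load="$1" 'BEGIN {
    l = load + 0                   # force a numeric comparison
    if (l < 1.0)       print "safe"
    else if (l == 1.0) print "at capacity"
    else               print "backed up"
  }'
}

# /proc/loadavg looks like "0.07 0.05 0.09 1/123 4567" on Linux.
if [ -r /proc/loadavg ]; then
  read -r one five fifteen rest < /proc/loadavg
  for avg in "$one" "$five" "$fifteen"; do
    echo "$avg: $(classify_load "$avg")"
  done
fi
```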

# on server and client
sudo cp /opt/sensu-community-plugins/plugins/system/load-metrics.rb /etc/sensu/plugins/

# on server
sudo vi /etc/sensu/conf.d/load_metrics.json

{
  "checks": {
    "load_metrics": {
      "type": "metric",
      "handlers": ["graphite"],
      "auto_tag_host": "yes",
      "command": "/etc/sensu/plugins/load-metrics.rb --scheme stats.:::name:::.load",
      "interval": 10,
      "subscribers": [ "load_metrics" ]
    }
  }
}

// on client
{
  "client": {
    "name": "client1",
    "address": "x.x.x.x",
    "subscriptions": [ "nginx_metrics", "vmstat_metrics", "load_metrics" ]
  }
}

MongoDB Metrics

These are basic statistics on your MongoDB servers. At the time of writing, the mongodb-metrics plugin on the master branch of sensu-community-plugins has an issue with the command that tells MongoDB to return its server status. I could tell you where in the plugin to make that fix, but you're better off just grabbing the version of the plugin from the milestone_V.01_WIP release.

# on server and client
sudo gem install mongo
sudo gem install bson_ext
cd /etc/sensu/plugins/ && sudo wget https://raw.githubusercontent.com/sensu/sensu-community-plugins/milestone_V.01_WIP/plugins/mongodb/mongodb-metrics.rb
sudo chmod 755 /etc/sensu/plugins/mongodb-metrics.rb

# on server
sudo vi /etc/sensu/conf.d/mongodb_metrics.json

{
  "checks": {
    "mongodb_metrics": {
      "type": "metric",
      "handlers": ["graphite"],
      "command": "/etc/sensu/plugins/mongodb-metrics.rb --scheme stats.:::name:::.mongodb --port :::mongodbPort:::",
      "interval": 30,
      "subscribers": ["mongodb_metrics"]
    }
  }
}

// on client
{
  "client": {
    "name": "client1",
    "address": "x.x.x.x",
    "mongodbPort": 27018,
    "subscriptions": [ "nginx_metrics", "vmstat_metrics", "load_metrics", "mongodb_metrics" ]
  }
}

So by the end of this, we have a total of four subscriptions. Keep in mind you don't always have to specify a separate subscription for each plugin. You have the option to consolidate any of the subscription labels or use "all", for example. Anyhow, that's all for this article. I hope this was helpful.
