Thursday, May 28, 2015

Analyzing Your Applications with StatsD, InfluxDB, and Grafana

StatsD was originally developed by Flickr before being adopted by Etsy. It's a Node.js-powered network daemon that listens for statistics such as counters and timers over UDP or TCP and acts as a local aggregator. StatsD then sends the aggregated metrics to a backend for time-series storage and visualization. Two of the most popular pairings used with StatsD are Whisper + Graphite and InfluxDB + Grafana.
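
To make the "local aggregator" idea concrete, here's a rough Python sketch of what happens within a single flush interval: counters are summed and timers are averaged. This is a simplified illustration, not StatsD's actual code (which is Node.js and also handles gauges, sets, sample rates, and percentiles). The metric names page.views and db.query are made up for the example.

```python
from collections import defaultdict


def parse_metric(line):
    """Split a StatsD line such as 'page.views:1|c' into (name, value, type)."""
    name, rest = line.split(":", 1)
    # Anything after the type (e.g. a '|@0.1' sample rate) is ignored here.
    value, metric_type = rest.split("|")[:2]
    return name, float(value), metric_type


def aggregate(lines):
    """Combine one flush interval's worth of raw lines into per-metric summaries."""
    counters = defaultdict(float)
    timers = defaultdict(list)
    for line in lines:
        name, value, metric_type = parse_metric(line)
        if metric_type == "c":       # counter: sum the increments
            counters[name] += value
        elif metric_type == "ms":    # timer: collect samples, average below
            timers[name].append(value)
    timer_means = {name: sum(vals) / len(vals) for name, vals in timers.items()}
    return dict(counters), timer_means


# Four raw lines arriving during one flush interval (hypothetical metrics).
received = ["page.views:1|c", "page.views:1|c", "db.query:120|ms", "db.query:80|ms"]
counters, timer_means = aggregate(received)
print(counters)     # {'page.views': 2.0}
print(timer_means)  # {'db.query': 100.0}
```

The backend only ever sees the two summarized values, which is why the application can fire off metrics as often as it likes without hammering the time-series store.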

The diagram above describes the data flow: metrics travel from the application to StatsD, are stored in InfluxDB, and are finally visualized in Grafana. If you'd like to get a better understanding of how Graphite and Grafana work together, head over to my guide about setting up Grafana for Graphite and Sensu.

Prerequisites: You'll want to first get familiar with Vagrant. This guide was written for Mac OS X users.

I'm going to spend the next few sections detailing the manual steps to get everything up and running in a local environment.

Preparing a VM with Vagrant

# create a directory for the vm config
mkdir -p ~/vagrants/statsd && cd ~/vagrants/statsd
vagrant init ubuntu/trusty64

# edit the newly created Vagrantfile
nano Vagrantfile

# uncomment the following line
config.vm.network "private_network", ip: "192.168.33.10"

# save and exit out of the file, spin up and ssh into the vm
vagrant up
vagrant ssh
Now that we've SSH'd into our VM, let's set up all of our services.

Installing InfluxDB

sudo apt-get update

cd ~
sudo wget https://s3.amazonaws.com/influxdb/influxdb_latest_amd64.deb
sudo dpkg -i influxdb_latest_amd64.deb
sudo /etc/init.d/influxdb start

This is all you need to get InfluxDB up and running. For full instructions, follow this link. Now open up a browser on your host machine (Mac OS X), and navigate to http://192.168.33.10:8083. Enter the username root and password root and click Connect, which takes you to the following screen:

Under Database Details, enter "demo" as the Database Name, and click Create Database. That's all you'll need to do for now. When setting InfluxDB up in production, make sure to refer to the official documentation, especially the section about file limits.

Installing StatsD

sudo apt-get update
sudo apt-get install git nginx

cd /opt
sudo git clone https://github.com/etsy/statsd.git

cd statsd
sudo add-apt-repository ppa:chris-lea/node.js
sudo apt-get update
sudo apt-get install nodejs
sudo npm install statsd-influxdb-backend -d

sudo cp exampleConfig.js config.js

Now edit /opt/statsd/config.js so that StatsD sends its metrics to InfluxDB.

# remove this block
{
  graphitePort: 2003
, graphiteHost: "graphite.example.com"
, port: 8125
, backends: [ "./backends/graphite" ]
}
# add this block
{
  influxdb: {
    host: '127.0.0.1', // InfluxDB host (default 127.0.0.1)
    port: 8086, // InfluxDB port (default 8086)
    database: 'demo',  // InfluxDB db instance (required)
    username: 'root', // InfluxDB db username (required)
    password: 'root', // InfluxDB db password (required)
    flush: {
      enable: true // enable regular flush strategy (default true)
    },
    proxy: {
      enable: false, // enable the proxy strategy (default false)
      suffix: 'raw', // metric name suffix (default 'raw')
      flushInterval: 1000
    }
  },
  port: 8125, // statsD port
  backends: ['./backends/console', 'statsd-influxdb-backend'],
  debug: true,
  legacyNamespace: false
}

Start StatsD with the following command. It runs in the foreground, so either open a new shell window or a separate tmux pane to continue with the remaining steps, or leave this command until the end.

nodejs /opt/statsd/stats.js /opt/statsd/config.js
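
If you want a quick smoke test before installing anything else, here's a minimal Python sketch that sends two metrics to StatsD over UDP using only the standard library. It assumes StatsD is listening on 127.0.0.1:8125 as configured above; the metric names demo.requests and demo.latency are made up. With the console backend and debug: true enabled, the foreground process should print them around the next flush.

```python
import socket


def format_metric(name, value, metric_type):
    """Build a StatsD datagram payload, e.g. b'demo.requests:1|c'."""
    return "{}:{}|{}".format(name, value, metric_type).encode("ascii")


def send_metric(name, value, metric_type, host="127.0.0.1", port=8125):
    """Fire a single metric at StatsD over UDP; no reply is expected."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(format_metric(name, value, metric_type), (host, port))
    finally:
        sock.close()


if __name__ == "__main__":
    send_metric("demo.requests", 1, "c")     # increment a counter by 1
    send_metric("demo.latency", 320, "ms")   # record a 320 ms timing
```

Because UDP is fire-and-forget, the script exits cleanly even when StatsD isn't running, so a silent run proves nothing on its own. Watch the StatsD console output to confirm the metrics arrived.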

Installing Grafana

# download and unpack the grafana 1.9.0 release tarball
cd /opt
sudo curl -O -L http://grafanarel.s3.amazonaws.com/grafana-1.9.0.tar.gz
sudo tar xf grafana-1.9.0.tar.gz
sudo cp -R grafana-1.9.0 /usr/share/grafana

# copy the sample config file
cd /usr/share/grafana/
sudo cp config.sample.js config.js

sudo nano config.js

Now let's set up Grafana to hook into InfluxDB.

// datasources, add as many as needed
datasources: {
  influxdb: {
    type: 'influxdb',
    url: "http://192.168.33.10:8086/db/demo",
    default: true,
    username: 'root',
    password: 'root'
  },
},

Serving Grafana through Nginx

Add a server block like the following to your nginx configuration (for example, in a file under /etc/nginx/sites-enabled/):

server {
  listen                9300;

  access_log            /var/log/nginx/grafana.access.log;
  error_log            /var/log/nginx/grafana.error.log;

  location / {
    root /usr/share/grafana;
  }
}

Don't forget to restart nginx with sudo service nginx restart. Access your Grafana dashboard by navigating to http://192.168.33.10:9300.
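
If a page doesn't load, it helps to know which service is the culprit. Here's a small Python 3 sketch, meant to be run from the host machine, that checks whether each HTTP endpoint used in this guide answers at all. The URLs come from the steps above; the helper names is_reachable and check_all are just for illustration, and treating any HTTP status (even an error) as "up" is my own simplification.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def is_reachable(url, timeout=3):
    """Return True if any HTTP response comes back from url, even an error status."""
    try:
        urlopen(url, timeout=timeout).close()
        return True
    except HTTPError:
        return True    # the server answered, just not with a success status
    except (URLError, OSError):
        return False   # connection refused, timed out, or unroutable


# Endpoints set up earlier in this guide.
ENDPOINTS = {
    "InfluxDB admin UI": "http://192.168.33.10:8083",
    "InfluxDB HTTP API": "http://192.168.33.10:8086",
    "Grafana via nginx": "http://192.168.33.10:9300",
}


def check_all(endpoints=ENDPOINTS):
    """Print one status line per endpoint."""
    for label, url in endpoints.items():
        status = "up" if is_reachable(url) else "DOWN"
        print("{:<20} {:<28} {}".format(label, url, status))
```

Calling check_all() prints one line per service. A DOWN next to the Grafana entry usually points at nginx rather than Grafana itself, since Grafana 1.9 is only static files.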

Sending Test Data to StatsD

A Python Client

Perform the following steps in the VM:
sudo apt-get install python-pip
sudo pip install statsd

curl -O https://gist.githubusercontent.com/roblayton/31201173f4b96d9f72d5/raw/9d429db53682c7d1e5774a16775f84738cbf9bcf/statsdtest.py
python statsdtest.py

Now navigate to Grafana and you should see the data that was entered into InfluxDB.
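
If nothing shows up in Grafana, you can ask InfluxDB directly whether any series were written. The Python 3 sketch below builds a query URL against the same /db/demo path that the Grafana datasource uses. The u, p, and q parameters and the list series statement are assumptions based on the InfluxDB 0.8-style HTTP API; if influxdb_latest installed a newer release, the API will differ. The helper names series_url and run_query are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def series_url(query, host="192.168.33.10", port=8086, database="demo",
               username="root", password="root"):
    """Build a query URL for the 0.8-style /db/<name>/series endpoint (assumed)."""
    params = urlencode([("u", username), ("p", password), ("q", query)])
    return "http://{}:{}/db/{}/series?{}".format(host, port, database, params)


def run_query(query):
    """Fetch and decode the JSON result; needs the VM and InfluxDB to be running."""
    with urlopen(series_url(query), timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


# Prints the URL only, so this is safe to run without the VM:
print(series_url("list series"))
# http://192.168.33.10:8086/db/demo/series?u=root&p=root&q=list+series
```

Calling run_query("list series") from the host should return the names StatsD has written so far. An empty result suggests the problem is between StatsD and InfluxDB rather than in Grafana.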

Comments:

  1. thanks for this.
    Just one question: I dont see any data in grafana, could you give me a example sql-statement for your data?
    thanks
    Zalu

  2. Hi Rob, all my statsd 30000 metrics are going into a single Influxdb database my_db. Can you please tell me how I can split the statsd data into multiple databases on influxdb. If i dont do this, i simply cant use Grafana, it just goes on spinning trying to load 30000 measurements when i'm trying to build a query. With multiple databases, i can create multiple instances on Grafana, this would solve my problem, appreciate your help in this and could you please write back to me at sraj9 at yahoo, cheers
