Graylog on AWS with Ansible, Part IV

An all-in-one log management solution for smaller sites

This is the final post in my series on Graylog, AWS, and Ansible. In the previous three posts, I went over how to provision, configure, and install content on an all-in-one Graylog installation using the provided AMI and Ansible to do most of the heavy lifting.

In this post, I'll go over the final step: connecting a Graylog collector and syslog forwarding to begin shipping log data into the Graylog instance.

Getting Started

Before diving in, let's go over specifically what we want to accomplish here. I'm big on checklists, and creating an outline to organize and plan your efforts can be especially helpful when putting together new systems like this.

Provision infrastructure resources (Part I)

Define the instances, network resources and storage devices we'll need for the all-in-one setup. Ensure these resources are configured and secured appropriately, and connect everything together.

Configure the Graylog server (Part II)

Install the required packages and services, ready the storage devices, and install and configure the all-in-one Graylog package.

Set up inputs, streams, and extractors (Part III)

Import an initial set of streams and extractors and set up a few Graylog inputs to enable the service to accept and parse incoming log data appropriately.

Install the Graylog collector (Part IV)

Set up the Graylog collector to monitor its own log files and those of the Graylog server, and turn on syslog forwarding.

Since we're going to use Ansible, this outline translates nicely into its built-in system of roles, plays, and tasks. We'll define a Graylog role, and playbooks for each of the four goals outlined above. Additionally, we want to ensure that our playbooks are written to be idempotent — this allows us to run the plays safely at any time without worrying about duplicating changes.

The Playbook

This playbook continues from the previous one where we configured the Graylog appliance to be able to accept incoming log data. The final step is to set up the server itself to forward log data into the running Graylog instance. This log data will come from two types of sources: syslog data and data from log files on disk.

---
- name: Configure graylog collector on the graylog instance
  tags: collector
  hosts: tag_Name_graylog
  remote_user: ubuntu
  become: yes

As before, we use the hosts argument in conjunction with the AWS EC2 dynamic inventory script to indicate that this play should run on all hosts whose Name tag equals graylog.
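For reference, the EC2 dynamic inventory script groups instances by their tags when it emits inventory JSON. A hand-built sketch of that shape (the hostname is made up, and the real script emits many more groups and host variables):

```python
import json

# A hand-built sketch of the JSON an EC2 dynamic inventory script emits.
# Instances tagged Name=graylog land in the "tag_Name_graylog" group,
# which is exactly what the play's `hosts:` line targets.
inventory = {
    "tag_Name_graylog": {
        "hosts": ["ec2-203-0-113-10.compute-1.amazonaws.com"],
    },
    "_meta": {
        "hostvars": {
            "ec2-203-0-113-10.compute-1.amazonaws.com": {
                "ec2_tag_Name": "graylog",
            }
        }
    },
}

print(json.dumps(inventory["tag_Name_graylog"]["hosts"]))
```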

Install the Collector

Installing the collector is fairly straightforward. Graylog provides a package repository that we need to download and add, after which we can install the collector itself.

  tasks:
    - name: Download graylog repository package
      get_url:
        url: https://packages.graylog2.org/repo/packages/graylog-collector-latest-repository-ubuntu14.04_latest.deb
        dest: /tmp/graylog-collector-latest-repository-ubuntu14.04_latest.deb

    - name: Install repository package
      apt:
        deb: /tmp/graylog-collector-latest-repository-ubuntu14.04_latest.deb

    - name: Ensure graylog-collector is installed
      register: collector
      apt:
        name: graylog-collector
        update_cache: yes
        cache_valid_time: 86400
        state: present

We use the get_url module to download the repository package. Unlike the uri module, get_url has no special Python package requirements on the host, so it should work on any instance (assuming it runs Ubuntu, as the Graylog all-in-one AMI does).

Configure Syslog Forwarding

Forwarding local syslog messages to Graylog is extremely easy: a single line in a new configuration file under /etc/rsyslog.d will do the trick:

    - name: Configure syslog forwarding
      copy:
        content: |
          *.* @localhost:514;RSYSLOG_SyslogProtocol23Format
        dest: /etc/rsyslog.d/30-graylog.conf
        mode: 0644
        owner: root
        group: root
      notify:
        - restart collector
        - restart rsyslog

Once this is done, we trigger the notification handlers for restarting both the collector and the rsyslog daemon. The actual handlers are covered further below.
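Under the hood, the RSYSLOG_SyslogProtocol23Format template tells rsyslog to emit RFC 5424-style messages. A rough Python sketch of what such a datagram looks like (the field values here are invented for illustration, and real rsyslog output includes more detail):

```python
import socket
from datetime import datetime, timezone

def build_syslog_message(facility, severity, hostname, app, msg):
    """Build a minimal RFC 5424-style message, similar in shape to
    what RSYSLOG_SyslogProtocol23Format produces."""
    pri = facility * 8 + severity
    timestamp = datetime.now(timezone.utc).isoformat()
    # <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG
    return f"<{pri}>1 {timestamp} {hostname} {app} - - - {msg}"

message = build_syslog_message(1, 6, "graylog", "demo", "hello from rsyslog")

# Fire-and-forget UDP send to the local syslog input on port 514,
# the same destination configured in 30-graylog.conf above.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(message.encode(), ("localhost", 514))
sock.close()
```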

Configure the Collector

One thing to note about the Graylog AMI is that the log files under /var/log/graylog/ are not world-readable by default. To allow the collector to read and forward them, we'll first have to recursively set the permissions to grant it access.

    - name: Configure log directory permissions
      file:
        path: /var/log/graylog
        state: directory
        mode: "g+rX,o+rX"
        recurse: true

    - name: Configure graylog collector
      copy:
        src: ./collector.conf
        dest: /etc/graylog/collector/collector.conf
        mode: 0644
        owner: root
        group: root
      notify:
        - restart collector

Additionally, we lay down our collector.conf file, the contents of which I will go over next.
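The symbolic mode g+rX,o+rX in the permissions task uses a capital X deliberately: unlike x, it adds the execute bit only to directories (and to files that already have an execute bit), so directories become traversable while plain log files become readable without becoming executable. A quick Python sketch of that rule (a simplified model of chmod's behavior, not a reimplementation):

```python
import stat

def apply_rX(mode, read_bit, exec_bit, is_dir):
    """Apply chmod-style '+rX' for one permission class (group or
    other): always add read; add execute only for directories or for
    files that already have an execute bit set somewhere."""
    mode |= read_bit
    if is_dir or (mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)):
        mode |= exec_bit
    return mode

# Directory at 0o750: g+rX,o+rX yields 0o755 (traversable by all).
d = apply_rX(0o750, stat.S_IRGRP, stat.S_IXGRP, is_dir=True)
d = apply_rX(d, stat.S_IROTH, stat.S_IXOTH, is_dir=True)

# Regular log file at 0o640: g+rX,o+rX yields 0o644 (readable, not executable).
f = apply_rX(0o640, stat.S_IRGRP, stat.S_IXGRP, is_dir=False)
f = apply_rX(f, stat.S_IROTH, stat.S_IXOTH, is_dir=False)

print(oct(d), oct(f))  # 0o755 0o644
```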

Collector Configuration File

The Graylog collector configuration file is a HOCON-formatted file that contains instructions on which files to monitor and where to send the log data. In our case, we will be setting up monitoring for each of Graylog's individual subsystems: the collector itself, elasticsearch, etcd, mongodb, nginx, the graylog server, and the graylog web interface.

For some files, we simply split on newlines. Others, however, can contain Java-style stack traces, in which case we use the content-splitter and content-splitter-pattern directives to split on the initial timestamp of each log message. This allows an entire multi-line stack trace to be shipped as a single log message.
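The effect of a PATTERN content splitter can be approximated with a regular expression that starts a new message at every line matching the pattern. A hedged Python sketch (the collector's actual splitting logic may differ in detail):

```python
import re

# Start a new message wherever a line begins with an ISO-style timestamp,
# mirroring content-splitter-pattern = "^\\d{4}-\\d{2}-\\d{2}T".
SPLIT_PATTERN = re.compile(r"^(?=\d{4}-\d{2}-\d{2}T)", re.MULTILINE)

log = (
    "2016-04-23T08:00:01 INFO Collector started\n"
    "2016-04-23T08:00:02 ERROR Something failed\n"
    "java.lang.RuntimeException: boom\n"
    "    at com.example.Main.run(Main.java:42)\n"
)

messages = [m.strip() for m in SPLIT_PATTERN.split(log) if m.strip()]

# The stack trace stays attached to its ERROR line as one message.
print(len(messages))  # 2
print(messages[1].splitlines()[0])
```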

Additionally, we tag each input with a useful name to help identify the source of the log messages from within Graylog.

server-url = "http://localhost:12900/"
collector-id = "graylog"

inputs {
  collector_log {
    type = "file"
    path = "/var/log/graylog-collector/collector.log"
    reader-interval = "1s"
    content-splitter = "PATTERN"
    content-splitter-pattern = "^\\d{4}-\\d{2}-\\d{2}T"
    message-fields = {
      "tag" = "graylog.collector"
    }
    outputs = "collector_log"
  }

  elasticsearch {
    type = "file"
    path = "/var/log/graylog/elasticsearch/graylog2.log"
    reader-interval = "1s"
    content-splitter = "PATTERN"
    content-splitter-pattern = "^\\[\\d{4}-\\d{2}-\\d{2} "
    message-fields = {
      "tag" = "elasticsearch"
    }
    outputs = "generic"
  }

  etcd {
    type = "file"
    path = "/var/log/graylog/etcd/current"
    reader-interval = "1s"
    message-fields = {
      "tag" = "etcd"
    }
    outputs = "generic"
  }

  mongodb {
    type = "file"
    path = "/var/log/graylog/mongodb/current"
    reader-interval = "1s"
    message-fields = {
      "tag" = "mongodb"
    }
    outputs = "generic"
  }

  nginx_error {
    type = "file"
    path = "/var/log/graylog/nginx/error.log"
    reader-interval = "1s"
    message-fields = {
      "tag" = "nginx.error"
    }
    outputs = "generic"
  }

  graylog_server {
    type = "file"
    path = "/var/log/graylog/server/current"
    reader-interval = "1s"
    content-splitter = "PATTERN"
    content-splitter-pattern = "\\d{4}-\\d{2}-\\d{2}_\\d{2}:\\d{2}:\\d{2}\\.\\d{5} +\\w+ +\\["
    message-fields = {
      "tag" = "graylog.server"
    }
    outputs = "generic"
  }

  graylog_web {
    type = "file"
    path = "/var/log/graylog/web/application.log"
    reader-interval = "1s"
    content-splitter = "PATTERN"
    content-splitter-pattern = "^\\d{4}-\\d{2}-\\d{2} " // Make sure to escape the \ character!
    message-fields = {
      "tag" = "graylog.web"
    }
    outputs = "generic"
  }
}

outputs {
  collector_log {
    type = "gelf"
    host = "localhost"
    port = 12201
  }

  generic {
    type = "gelf"
    host = "localhost"
    port = 12222
  }
}

There are two outputs: one for the collector's own logs, and another for generic GELF-formatted log data. This lets us define specific extractors for various fields in the collector logs. If, later on, we wanted to add extractors for any of the log data being sent to the 'generic' endpoint, we could create a new input from within Graylog, update the collector to send that data to the new endpoint, and then add the appropriate extractors.
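GELF itself is a simple JSON-over-UDP format. As a rough illustration (not the collector's actual implementation), a minimal GELF payload aimed at the 'generic' input on port 12222 could be built and sent like this:

```python
import json
import socket
import time
import zlib

def build_gelf(host, short_message, **extra_fields):
    """Build a minimal GELF 1.1 payload. Custom fields are prefixed
    with an underscore, as the GELF spec requires."""
    payload = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time(),
    }
    for key, value in extra_fields.items():
        payload[f"_{key}"] = value
    return json.dumps(payload).encode()

# GELF UDP accepts zlib-compressed JSON datagrams.
datagram = zlib.compress(build_gelf("graylog", "nginx error sample", tag="nginx.error"))

# Fire-and-forget UDP send to the 'generic' GELF input.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(datagram, ("localhost", 12222))
sock.close()
```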

Notification Handlers

The final piece to the playbook is to set up notification handlers to restart the collector and rsyslog.

  handlers:
    - name: restart collector
      service: name=graylog-collector state=restarted

    - name: restart rsyslog
      service: name=rsyslog state=restarted

Wrapping Up

And that's it! We now have a fully-configured Graylog all-in-one setup that is collecting and monitoring its own logs. Once the collection is set up, you can log in to the Graylog instance and start looking at the data coming in.

Of course, the real value comes once you start shipping actual application or system log data from other instances, but with this setup in place it's very easy to add a collector to another instance or configure syslog forwarding to do exactly that.

I hope you enjoyed this series and that it has helped give an overview of how even a small website can take advantage of Graylog's log collection and management tools. Feel free to leave comments or email us with any corrections or questions!
