Graylog on AWS with Ansible, Part III

An all-in-one log management solution for smaller sites

In the previous entries of this series of posts, I went over provisioning AWS resources to create a simple all-in-one Graylog server, and configuring the Graylog server itself. The next step to turning the logging service into something usable is to populate it with inputs, streams, and extractors to parse and index the logs appropriately.

Check out Part I for a little more context around why we chose Graylog, Ansible, and AWS.

Getting Started

Before diving in, let's go over specifically what we want to accomplish here. I'm big on checklists, and creating an outline to organize and plan your efforts can be especially helpful when putting together new systems like this.

Provision infrastructure resources (Part I)

Define the instances, network resources and storage devices we'll need for the all-in-one setup. Ensure these resources are configured and secured appropriately, and connect everything together.

Configure the Graylog server (Part II)

Install the required packages and services, ready the storage devices, and install and configure the all-in-one Graylog package.

Set up inputs, streams, and extractors (Part III)

Import an initial set of streams and extractors and set up a few Graylog inputs to enable the service to accept and parse incoming log data appropriately.

Install the Graylog collector (Part IV)

Set up the Graylog collector to monitor its own log files and those of the Graylog server, and turn on syslog forwarding.

Since we're going to use Ansible, this outline translates nicely into its built-in system of roles, plays, and tasks. We'll define a Graylog role, and playbooks for each of the four goals outlined above. Additionally, we want to ensure that our playbooks are written to be idempotent — this allows us to run the plays safely at any time without worrying about duplicating changes.
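To make the mapping concrete, one possible repository layout might look like the following (the file names here are illustrative, not prescriptive — use whatever naming fits your setup):

```
site.yml
provision.yml        # Part I: AWS resources
configure.yml        # Part II: server setup
content.yml          # Part III: inputs, streams, extractors
collector.yml        # Part IV: log collection
roles/
  graylog/
    tasks/
    templates/
```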

The Playbook

This playbook will focus on adding content to the Graylog server. This content consists of inputs, streams, extractors, and grok patterns; you can find out more about each via the documentation links. We will add them using Graylog's built-in REST API, which makes it easy to interact with a running Graylog instance and make changes to almost every part of the system.

- name: Import content into graylog
  tags: content
  hosts: tag_Name_graylog
  remote_user: ubuntu
  become: yes

Again, as with the previous play, we select the hosts on which to run the tasks via the auto-generated AWS tag groups. The tag_KEY_VALUE naming convention is used by the AWS dynamic inventory script to automatically group EC2 instances; see the docs for more information. The remote user is set to ubuntu, since that is the SSH user defined in the Graylog AMI used to provision the instance. We also set become to yes to ensure that all the commands are run via sudo.
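With the dynamic inventory script in place, running the play looks something like this (the playbook filename is an assumption on my part; `--tags content` matches the tag defined in the play above):

```
ansible-playbook -i ec2.py content.yml --tags content
```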

Installing Prerequisites

Since we are going to be using the REST API, the easiest way to manage this is with the Ansible uri module, which requires that httplib2 be installed on the remote machine. This is one of the few times we have to make changes to a host solely to support the use of Ansible, but from a practical standpoint, adding a single Python module doesn't seem like too much to ask:

    - name: Ensure pip is installed
      apt:
        name: python-pip
        state: present

    - name: Ensure httplib2 is installed
      pip:
        name: httplib2
        state: latest

Loading the Content Pack

Once the prereqs are out of the way, it's time to load in our content. Graylog provides a very convenient feature called Content Packs that allows you to import and export what amounts to an entire Graylog setup all in one go. You can browse and download a wide variety of content packs over on the Graylog Marketplace, or you can create your own very easily, as the format is simple JSON. Take a look at the marketplace for some examples to follow if you want to create your own.
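To give a sense of the shape, here is a skeleton of a content pack (the field names below are from memory of the 1.x bundle format and may vary by Graylog version — follow a Marketplace pack for the authoritative structure):

```json
{
  "name": "Example pack",
  "description": "Streams and extractors for our all-in-one server",
  "category": "Operating Systems",
  "inputs": [],
  "streams": [],
  "outputs": [],
  "dashboards": [],
  "grok_patterns": []
}
```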

Once you have your content pack ready to go, we'll use the REST API to import it into the Graylog system:

    - name: Check for content pack
      uri:
        url: "http://localhost:12900/system/bundles/{{ bundle_id }}"
        user: admin
        password: "{{ admin_pass }}"
        return_content: no
        status_code: 200,404
      register: bundle

    - name: Upload content pack
      uri:
        url: http://localhost:12900/system/bundles
        method: POST
        HEADER_Content-Type: "application/json"
        body: "{{ lookup('template', 'content-pack.json') }}"
        user: admin
        password: "{{ admin_pass }}"
        status_code: 201
      register: upload
      when: bundle.status == 404
      changed_when: yes

    - name: Apply content pack
      uri:
        url: "http://localhost:12900/system/bundles/{{ bundle_id }}/apply"
        method: POST
        user: admin
        password: "{{ admin_pass }}"
        status_code: 204
      register: apply
      when: bundle.status == 404
      changed_when: yes

As you can see, we make the process idempotent by first checking whether the content pack has already been imported, and only uploading and applying the pack if it does not exist. Unfortunately, there is no way to 'un-apply' a content pack, so there is no good way to push updates if you change the pack outside of Graylog. The best practice is to make any changes within the Graylog UI, export them back out as needed, and use this playbook only when initially setting up your Graylog server.
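The gating logic is simple enough to state on its own. As a sketch (this function is mine, not part of Graylog or Ansible), the decision the playbook makes reduces to:

```python
def plan_actions(check_status):
    """Decide which REST calls to make based on the status code returned
    by GET /system/bundles/<id>, mirroring the playbook's
    'when: bundle.status == 404' conditions."""
    if check_status == 200:
        # Pack already imported: nothing to do (idempotent no-op).
        return []
    if check_status == 404:
        # Pack missing: upload it, then apply it.
        return ["upload", "apply"]
    # Any other status means something unexpected happened.
    raise ValueError("unexpected status: %d" % check_status)
```

This is why the check task accepts both 200 and 404 as valid status codes: either answer is useful, and only a 404 triggers the two follow-up calls.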

Final Thoughts

Once the content pack has been applied, all of its streams, inputs, extractors, and grok patterns will be available for Graylog to use. You may still need to configure or tweak them to your needs, but using content packs does go a long way toward saving time when setting up an initial cluster.

Coming up in the final installment, we'll start to push some actual log data into the system using Graylog's own Graylog Collector. Stay tuned!