Graylog on AWS with Ansible, Part I

An all-in-one log management solution for smaller sites

These days, developers are kind of spoilt for choice when it comes to application log management solutions. It's not a bad situation to be in, but it also means choosing the right product for your needs may come down to splitting hairs. In a previous post, I outlined a number of possibilities for log collection and analysis, but for this post I want to focus on one in particular: Graylog.

Why Graylog?

Most log management solutions, whether hosted services or products you deploy on your own infrastructure, are pretty solid. Graylog has a few specific advantages for smaller shops, however, due to the ease with which you can deploy and configure it, and the fact that it's available as an "all-in-one" single-package installation.

At Touchstone, we use Graylog for all of our log collection, analysis, and reporting, and we couldn't be happier with the results. For anyone who is used to Splunk, Graylog seems to come the closest to replicating its interface and architecture. Why does this matter? Check out what Graylog themselves have to say on the subject.

Getting Started

Before diving in, let's go over specifically what we want to accomplish here. I'm big on checklists, and creating an outline to organize and plan your efforts can be especially helpful when putting together new systems like this.

Provision infrastructure resources (Part I)

Define the instances, network resources and storage devices we'll need for the all-in-one setup. Ensure these resources are configured and secured appropriately, and connect everything together.

Configure the Graylog server (Part II)

Install the required packages and services, ready the storage devices, and install and configure the all-in-one Graylog package.

Set up inputs, streams, and extractors (Part III)

Import an initial set of streams and extractors and set up a few Graylog inputs to enable the service to accept and parse incoming log data appropriately.

Install the Graylog collector (Part IV)

Set up the Graylog collector to monitor its own log files and those of the Graylog server, and turn on syslog forwarding.

Since we're going to use Ansible, this outline translates nicely into its built-in system of roles, plays, and tasks. We'll define a Graylog role, and playbooks for each of the four goals outlined above. Additionally, we want to ensure that our playbooks are written to be idempotent — this allows us to run the plays safely at any time without worrying about duplicating changes.
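As a sketch, the four parts might hang together in a top-level playbook along these lines (the file names and comments here are illustrative, not prescribed by the series):

```yaml
# site.yml -- a hypothetical top-level playbook tying the four parts together
- include: provision.yml   # Part I   -- AWS resources (tagged "provision")
- include: configure.yml   # Part II  -- Graylog server packages and storage
- include: inputs.yml      # Part III -- inputs, streams, and extractors
- include: collector.yml   # Part IV  -- Graylog collector and syslog forwarding
```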

The Playbook

Since we're using AWS, we could just provision the necessary resources manually using the AWS console or CLI. The process, however, can be error-prone and isn't easily repeatable. Automating the process of resource provisioning with Ansible gives a huge number of benefits: You can see at a glance exactly what resources are required, you can make small tweaks and deploy them in seconds without reconfiguring the whole stack, and, perhaps most importantly, you can avoid silly mistakes like forgetting to lock down your security groups.

We'll create a playbook specifically for provisioning the system, and use the built-in AWS modules along with the AWS Dynamic Inventory script to simplify the process.

- name: Provision graylog AWS resources
  tags: provision
  hosts: localhost
  connection: local
  gather_facts: False

Unlike most Ansible plays, this one doesn't run against any existing hosts. Instead, since we are provisioning new resources, we set it to run against our localhost and turn off gather_facts (since we don't need facts about our local machine).

IAM Roles and Users

Next, we'll begin defining the tasks for this play:

    - name: Provision IAM role and instance profile
      iam:
        iam_type: role
        name: graylog
        state: present
      register: role

    - name: Provision IAM user for SMTP access
      iam:
        iam_type: user
        name: graylog
        state: present
        access_key_state: create
      register: user

    - name: Add IAM policy to allow SMTP access
      iam_policy:
        iam_type: user
        iam_name: graylog
        policy_name: "SendRawEmail"
        policy_json: |
          {
            "Statement": [
              {
                "Effect": "Allow",
                "Action": "ses:SendRawEmail",
                "Resource": "*"
              }
            ]
          }
        state: present

    - name: Set IAM access and secret keys for later use
      set_fact:
        access_key: "{{ user.user_meta.access_keys[0].access_key_id if user.user_meta is defined else '' }}"
        secret_key: "{{ user.user_meta.access_keys[0].secret_access_key if user.user_meta is defined else '' }}"

The first few tasks define the IAM resources we'll be using for the Graylog system. We create an IAM role and instance profile, create a user with an access key, give the user permission to use SES to send emails via SMTP, and then use the set_fact module to store the access and secret keys for use later on.

One great thing about using Ansible this way is that you never have to generate and store the graylog user's access and secret keys anywhere. In fact, you as a developer don't ever need to know what the secret is. Ansible will tell AWS to generate the keys, store them in a fact, and we'll put them into the instance's user_data later on in the play.
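As an aside: if you later configure Graylog to send email over SES SMTP (as Part II will touch on), the raw secret key isn't used directly as the SMTP password — AWS documents a derivation for the classic SES SMTP credential. A minimal sketch of that legacy (pre-SigV4) scheme:

```python
import base64
import hashlib
import hmac

def ses_smtp_password(secret_key: str) -> str:
    # AWS derives the legacy SES SMTP password from the IAM secret key by
    # signing the literal string "SendRawEmail" with HMAC-SHA256, prefixing
    # a version byte of 0x02, and base64-encoding the result.
    sig = hmac.new(secret_key.encode("utf-8"), b"SendRawEmail", hashlib.sha256).digest()
    return base64.b64encode(b"\x02" + sig).decode("utf-8")
```

The SMTP username is simply the IAM access key ID; only the password needs deriving.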

Instance and Security Group

    - name: Ensure default EC2 security group
      ec2_group:
        name: default
        description: default VPC security group
        region: "{{ region }}"
        purge_rules: no
        purge_rules_egress: no
        state: present
      register: default_group

    - name: Provision EC2 security group
      ec2_group:
        name: graylog
        description: "Graylog security group"
        region: "{{ region }}"
        rules:
          - proto: tcp
            from_port: 22
            to_port: 22
            cidr_ip: "{{ ssh_ip }}"
        state: present
      register: group

    - name: Tag EC2 security group
      ec2_tag:
        resource: "{{ group.group_id }}"
        region: "{{ region }}"
        tags:
          Name: graylog
        state: present

The next step is to provision and deploy a new EC2 instance, starting with an associated security group. Here, most of the parameters are defined via variables, like {{ region }}. For brevity's sake, I've left out the variable declaration section of the playbook, but if you're familiar with Ansible they should be self-explanatory. If not, check out the documentation.
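To give a more concrete picture of those variables, here is a hypothetical vars section — the names match the tasks in this playbook, but every value is a placeholder you'd adjust for your own account:

```yaml
vars:
  region: us-east-1
  availability_zone: us-east-1a
  vpc_subnet_id: subnet-0123abcd   # a subnet in your default VPC
  ssh_ip: 203.0.113.0/32           # your admin IP, for SSH access
  key_name: my-keypair             # an existing EC2 key pair name
  instance_type: t2.medium         # at least 4GB of RAM (see below)
  image_id: ami-xxxxxxxx           # the Graylog all-in-one AMI for your region
  ebs_volume_size: 15              # in GB
  dns_name: example.com            # your Route53 hosted zone
```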

You'll notice that the first task simply ensures that you have a default VPC security group defined. AWS provides this group for all default VPCs, and it comes with a built-in rule that allows all traffic among instances that have this security group assigned. This is a simple but effective way to secure internal services like Graylog, so that all external (Internet) traffic is disallowed by default. Locking down your Graylog installation like this is essential, because the API and Elasticsearch components have no built-in authentication by default.

For organizational purposes, I've added a tag to the security group so that it shows up with a friendly name in the AWS console, but that's optional. Also, if you don't need any Graylog-specific security group rules, you can omit the graylog group entirely.
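If you do want to expose more than SSH — say, the web interface to your admin IP, or a GELF input to hosts inside the VPC — the rules list could be extended along these lines. The port numbers are Graylog defaults, and vpc_cidr is a hypothetical variable for your VPC's address range:

```yaml
rules:
  - proto: tcp
    from_port: 22
    to_port: 22
    cidr_ip: "{{ ssh_ip }}"
  - proto: tcp            # Graylog web interface
    from_port: 9000
    to_port: 9000
    cidr_ip: "{{ ssh_ip }}"
  - proto: tcp            # GELF TCP input, internal traffic only
    from_port: 12201
    to_port: 12201
    cidr_ip: "{{ vpc_cidr }}"
```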

    - name: Provision EC2 instance
      ec2:
        key_name: "{{ key_name }}"
        instance_type: "{{ instance_type }}"
        image: "{{ image_id }}"
        group_id:
          - "{{ group.group_id }}"
          - "{{ default_group.group_id }}"
        instance_profile_name: graylog
        region: "{{ region }}"
        zone: "{{ availability_zone }}"
        vpc_subnet_id: "{{ vpc_subnet_id }}"
        user_data: |
          #!/bin/bash
          mkdir -p /etc/graylog
          echo '{{ access_key }}' > /etc/graylog/iam_access_key
          echo '{{ secret_key }}' > /etc/graylog/iam_secret_key
        wait: yes
        exact_count: 1
        count_tag:
          Name: graylog
        instance_tags:
          Name: graylog
      register: ec2

For the instance itself, we use the count_tag and exact_count arguments, along with instance_tags, to ensure that exactly 1 graylog instance exists. This ensures that you can run the playbook as often as required without spinning up additional instances. Also, note that we use the user_data argument to provide a script that will write the previously-generated access_key and secret_key into the /etc/graylog/ directory on startup.

A note on the instance_type variable: The all-in-one package requires a server with at least 4GB of RAM, so a t2.micro is out of the question. You'll have to use at least a t2.medium in order for all the services to work properly. Check Amazon's documentation for a list of instance types and RAM configurations.

For the image_id, you can use the latest AMI from the list published here.

Attached Storage and Networking

    - name: Attach EBS volume
      ec2_vol:
        name: graylog
        instance: "{{ ec2.tagged_instances[0].id }}"
        device_name: "/dev/sdf"
        region: "{{ region }}"
        volume_type: "gp2"
        volume_size: "{{ ebs_volume_size }}"
        state: present
      register: volume_id
      when: ec2.tagged_instances is defined

Next, we define and attach a new EBS volume. This ensures that Graylog has enough space to store the log data in Elasticsearch, rather than relying on the default 8GB boot volume that comes with the AMI. Unfortunately, I haven't found a way to use the ec2_vol module to create a new volume only when one isn't already attached, so I am using the when condition to run this task only if a new instance was launched. Using a separate volume like this also makes it easier to expand the storage space in the future.

As far as the volume size, choose something that fits with your needs. By default, the Graylog server is set to store 1GB per index and keep 10 indices before purging data, so with additional overhead for storage from other services, a 15GB volume works nicely.
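That sizing is simple enough to express as arithmetic; a throwaway helper (my own, not part of the playbook) makes the assumptions explicit:

```python
def graylog_volume_gb(index_size_gb: int = 1, max_indices: int = 10,
                      overhead_gb: int = 5) -> int:
    """Rough EBS sizing: space for the retained Elasticsearch indices,
    plus headroom for journals, MongoDB, and the other all-in-one services."""
    return index_size_gb * max_indices + overhead_gb
```

With the Graylog defaults (1GB per index, 10 indices kept) and roughly 5GB of overhead, this lands on the 15GB figure suggested above.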

    - name: Attach Elastic IP
      ec2_eip:
        instance_id: "{{ ec2.tagged_instances[0].id }}"
        in_vpc: yes
        region: "{{ region }}"
        state: present
      register: eip
      when: ec2.tagged_instances is defined

    - name: Provision public DNS record
      route53:
        command: create
        zone: "{{ dns_name }}"
        record: "graylog.{{ dns_name }}"
        type: A
        ttl: 300
        value: "{{ eip.public_ip }}"
        overwrite: yes
      when: eip.public_ip is defined

    - name: Provision private DNS record
      route53:
        command: create
        zone: "{{ dns_name }}"
        private_zone: yes
        record: "graylog.{{ dns_name }}"
        type: CNAME
        ttl: 300
        value: "ec2-{{ eip.public_ip | regex_replace('\\.', '-') }}.{{ region }}.compute.amazonaws.com"
        overwrite: yes
      when: eip.public_ip is defined

The next step is to define an Elastic IP and DNS records, so that other instances can ship their logs to the Graylog server using a DNS name rather than a hard-coded IP address. This assumes you are using Route53 for managing DNS, which allows you to serve both private and public zones for the same domain name. We need the private DNS so that our instances can send log data to Graylog within the VPC, keeping sensitive log data off the public network. This works as long as your VPC is set up to use Amazon's DNS, so that DNS queries from an instance resolve to the corresponding private IPs. The public DNS is there to let us connect to the web interface by name.
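For what it's worth, the regex_replace in the private-record task is only dash-ifying the IP to reconstruct Amazon's public DNS hostname. The equivalent logic in Python (assuming a region where the name takes the ec2-<dashed-ip>.<region>.compute.amazonaws.com form — us-east-1 uses an older compute-1.amazonaws.com suffix instead):

```python
def ec2_public_dns(public_ip: str, region: str) -> str:
    # Mirrors the Jinja2 expression used for the private CNAME value:
    # dots in the public IP become dashes, then the region-specific
    # compute.amazonaws.com suffix is appended.
    return "ec2-{}.{}.compute.amazonaws.com".format(
        public_ip.replace(".", "-"), region)
```

Inside the VPC, that public hostname resolves to the instance's private IP, which is exactly why the private record points at it.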

    - name: Add public IP to group
      add_host:
        hostname: "{{ eip.public_ip }}"
        groups: tag_Name_graylog
      when: eip.public_ip is defined

Finally, we add the Elastic IP's address to the tag_Name_graylog group. This is done using the add_host module so that subsequent plays can reference the new instance correctly. The condition is there because this task is only needed if a new IP was defined; otherwise, the EC2 dynamic inventory script will automatically pick up the instance by its tag.
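A later play can then target that group directly — a hypothetical opening for the Part II playbook (the remote user depends on which AMI you chose):

```yaml
- name: Configure graylog server
  hosts: tag_Name_graylog
  remote_user: ubuntu   # placeholder; use the default user for your AMI
  become: yes
```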

Coming Up

Running the full play should set you up with all the AWS resources you need to get started with Graylog. In the next post of this series, I'll go over installing and configuring the Graylog server we just provisioned. Stay tuned!


Comments

chris (Oct 19, 2016): Excellent work -- can you include the variable declaration section too? It isn't immediately obvious which variables are carrying over from previous steps and which need to be specified. I know Ansible will fail gently and I can add the variables needed, but if we had a template to work from that would be awesome. Thanks again for posting your process and sharing!

Paul (Aug 10, 2017): Can you publish your playbook on github?

Aaron (Aug 10, 2017): Sure thing, I put together an example repo with the files mentioned throughout the series here: