Configuring infrastructure to run containers (e.g. Docker) in the Amazon cloud, at scale, is not trivial. Intimate knowledge of AWS-specific services such as Elastic Load Balancing, VPC networking, and more is required to arrive at a solution that makes the best use of AWS services. Docker for AWS simplifies all of this by using Amazon’s CloudFormation to automate the deployment and configuration of EC2, Auto Scaling, IAM, DynamoDB, SQS, VPC networking, Elastic Load Balancing (ELB), and CloudWatch Logs. The resulting cluster is managed by Docker Swarm to deliver a production-ready and highly available container infrastructure.

Oh, and did I mention it can be configured in about 20 seconds by answering a few simple questions? That is the power of Docker for AWS. If you already know the Docker API and don’t want to waste hours sweating through another AWS training regimen in the DevOps “weight room”, Docker for AWS is for you.

However, Docker for AWS is currently tailored to stateless applications: Elastic Block Store (EBS) and Elastic File System (EFS) are left out. These are critical services that allow databases, key-value stores, legacy applications, and more to store their data outside of the container and persist it beyond the container lifecycle. Wouldn’t it be cool to have this added as an available service so even your persistent applications can be supported?

REX-Ray makes it possible to use AWS EBS volumes. It has recently been containerized and integrated as a Docker 1.13 Managed Plugin, extending REX-Ray’s functionality as the most flexible volume driver. The Managed Plugin System makes it super easy to get REX-Ray installed and configured with a single command:
$ docker plugin install rexray/ebs REXRAY_PREEMPT=true EBS_ACCESSKEY=<my access key or blank for IAM> EBS_SECRETKEY=<my secret key or blank for IAM>
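
Once the install completes and the requested privileges are accepted, a quick sanity check (plain Docker CLI, nothing specific to this setup) is to confirm the plugin shows up as enabled:

$ docker plugin ls
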
If you haven’t seen it in action yet, check out this video.

Getting the REX-Ray Amazon EBS plugin working with Docker for AWS currently requires customizing the CloudFormation template. Editing the template directly has benefits such as Auto Scaling Groups that will always deploy nodes with the REX-Ray plugin already installed. Wouldn’t it be great to see this as a core part of Docker for AWS?

This is a technology preview and should be considered experimental. If you want to see all the steps, go to Install REX-Ray as a Plugin on Docker for AWS (Cloudformation) in {code} Labs.

There are three main things that need to be done for a successful deployment. We will expand on each below; the details are intended as discussion points to expose the changes that might be needed for Docker for AWS to natively embrace EBS and EFS storage.

  1. Install the plugin on all nodes
  2. Define IAM Roles
  3. Configure Host Access to EBS Volumes

Install the Plugin on All Nodes

Both ManagerLaunchConfig1 and NodeLaunchConfig1 need the plugin install command to execute during deployment. This command automatically installs and enables the rexray/ebs plugin. Setting the pre-emption flag to true enables high availability of volumes between hosts, and setting the keys to "" allows the use of IAM roles.
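
For illustration only (the exact UserData wiring in the generated template is more involved, and resource names such as ManagerLaunchConfig1 come from the template itself), the addition amounts to running the install non-interactively on every node at boot:

$ docker plugin install rexray/ebs --grant-all-permissions REXRAY_PREEMPT=true EBS_ACCESSKEY="" EBS_SECRETKEY=""

The --grant-all-permissions flag skips the interactive privilege prompt so the command can run unattended from a launch script.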

Define IAM Roles

The IAM role is critical to making sure the hosts themselves are given the proper permissions for volume administration. The documentation specifies all the permissions needed for EBS and EFS access. It’s not necessary to include the EFS permissions, but it’s worthwhile to add them now so it will be possible to use Amazon Elastic File System (EFS) when an official REX-Ray plugin is available. IAM roles can only be applied to hosts on launch and cannot be applied after creation.
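
As an abbreviated illustration only (the REX-Ray documentation has the authoritative and complete list of actions), the EBS portion of the policy attached to the role looks along these lines:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:AttachVolume",
        "ec2:CreateVolume",
        "ec2:CreateSnapshot",
        "ec2:CreateTags",
        "ec2:DeleteVolume",
        "ec2:DescribeAvailabilityZones",
        "ec2:DescribeInstances",
        "ec2:DescribeVolumes",
        "ec2:DescribeSnapshots",
        "ec2:DetachVolume"
      ],
      "Resource": "*"
    }
  ]
}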

Configure Host Access to EBS Volumes

The last edit required is to pin the hosts being deployed to a single availability zone. By default, Docker for AWS deploys both manager and worker nodes in the same region but across multiple availability zones. This increases availability because a failed node triggers the restart of its containers on a new host, which may be in a different availability zone. However, a limitation of EBS is that a volume is only accessible to hosts in the same availability zone. Pre-emption is the benefit REX-Ray brings to this single-zone architecture: it will forcefully detach a volume from a failed host and mount it to a new host requesting access. This feature brings high availability to a container with a persistent volume. Edit both the Manager and Node Auto Scaling Group’s VPCZoneIdentifier and remove the extra availability zones so only one remains.
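
As a rough sketch (the subnet logical name here is illustrative; the actual resource names come from your generated template), the edited property ends up referencing a single subnet instead of one per availability zone:

"VPCZoneIdentifier": [ { "Ref": "PubSubnetAz1" } ]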

Alternatively, a better way to make sure containers have access to their EBS volumes, which isn’t available today, would be through the use of node label metadata. A feature request (#6) has been filed to have each node automatically assigned a label based on the availability zone it is placed in. At that point, hosts can be spread across zones and containers can be constrained based on the node label. If 3 of my 8 deployed nodes have been given the label az=us-east-1b, then I can create a service that constrains the container to only run on hosts in that zone:

$ docker service create --replicas 1 --mount type=volume,target=/data,source=data,volume-driver=rexray/ebs --constraint 'node.labels.az == us-east-1b' busybox
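
Until that automation exists, a label like the az key used above (a convention for this example, not something Docker creates for you) can be applied by hand from a manager node with standard Swarm tooling:

$ docker node update --label-add az=us-east-1b <node-id>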

Get started by looking at all the directions, including a complete CloudFormation template, over at Install REX-Ray as a Plugin on Docker for AWS (Cloudformation) in {code} Labs. If you want to see this added, make it known! Leave a comment or use #dockerforaws and #rexray in a tweet.

We’re excited to bring this integration to Docker for AWS and extend its capability further. We look forward to bringing this same functionality to Docker for Azure and Docker for GCE (currently in beta) when PR #372 and PR #394 of libStorage bring in support for these platforms.

**UPDATE**

Issue #6 in Docker for AWS has been addressed, and node labels have been added in Beta 18. View the changelog for Edge versions; a quick way to inspect a node’s labels is shown after the list below.

Label types include:

  • os (linux)
  • region (us-east-1, etc)
  • availability_zone (us-east-1a, etc)
  • instance_type (t2.micro, etc)
  • node_type (worker, manager)
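
To verify the labels on a given node (whether they surface as Swarm node labels or engine labels can vary by release, so this checks both using standard docker node inspect templates):

$ docker node inspect --format '{{ .Spec.Labels }} {{ .Description.Engine.Labels }}' <node-id>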

Comments

  • Leonid Makarov

    Thanks for the post and the adjusted Docker for AWS + REX-Ray template! I tried it and it does work.
    What would be the best way to keep the template up to date as Docker releases new versions of their Docker for AWS CloudFormation template?
    Do you plan to keep it up-to-date?

    • The goal of this was to show that it can be done. We want to see this baked directly into the Docker for AWS template, but it’s going to take a community effort. If this is something you want to see, I would suggest creating an issue and working on getting people to give it some thumbs up. https://github.com/docker/for-aws

  • Stroebs

    While this is awesome, it destroys the resiliency of swarm, restricting it to a single AZ. Simply including it in the Docker for AWS template wouldn’t be enough – EFS would need to take its place from the get-go.

    • This is the nature of block storage and a limitation of EBS with Amazon. EBS must be pinned to a single AZ. Simply saying “use EFS” isn’t a viable alternative: EFS is really slow in comparison and isn’t fit for applications that require even modestly high I/O.
      Kendrick Coleman
      kendrickcoleman.com
      @KendrickColeman
      github.com/kacole2

      • Good point – Would you say something like GlusterFS on top of EBS-backed nodes would be better? It’s a great battle every day getting teams to understand the ephemeral nature of cloud and Docker, so finding a good middle-ground would be great.
        In the interim I’m settling for this: https://hub.docker.com/r/vipconsult/moby-nfs-mount/
        The use-case of which is storing WordPress content/media.

        • I can’t speak for GlusterFS. There is a similar product from Dell EMC called ScaleIO that accomplishes the same distributed block storage filesystem but is better suited for DAS. However, in our testing, we’ve found that the networking bandwidth between EC2 hosts and EBS volumes isn’t great enough to sustain reliable performance while replicating metadata. I would assume the same would be for GlusterFS but you would need to test that on your own.

          I understand there is a will to “architect for failure” but ideally you need to architect for the application. Hacking around this would be creating some sort of cron job to copy/replicate/snap the volume to every other AZ where Swarm is running and making sure it’s been given the same name. This allows swarm to restart the container in a different AZ and mount the volume using REX-Ray. It’s going to be up to you to create the backend process to do replication between volumes at certain intervals.
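
          As a very rough sketch of that idea (assuming the AWS CLI is available; all identifiers are placeholders), the periodic job could snapshot the source volume and recreate it in another zone, applying the same Name so the copy carries the same volume name:

          $ aws ec2 create-snapshot --volume-id <source-volume-id> --description "swarm data replica"
          $ aws ec2 create-volume --snapshot-id <snapshot-id> --availability-zone us-east-1c
          $ aws ec2 create-tags --resources <new-volume-id> --tags Key=Name,Value=data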