May 30, 2020

Blue/Green ECS Deployments with CloudFormation

Recently the CloudFormation team released a transform that enables Blue/Green deployments for ECS using CodeDeploy. I’ve been using ECS for a number of years now, as I find it a lot simpler to understand than, say, Kubernetes. I believe Kubernetes is better suited to a team of teams developing microservices, where you have the necessary staff in something akin to a platform team to maintain the Kubernetes environment and associated services. Even with services like EKS there is still more to maintain with Kubernetes.

But this isn’t a Kubernetes vs ECS article, it’s about the new Blue/Green Transform in CloudFormation.

As I have deployed a number of services for organisations on to ECS over the past few years, I’ve developed my own approaches to deployments. Traditionally the simplest approach has been a rolling deployment, which takes one container out at a time and replaces it with another. This is relatively easy to do with CloudFormation. I have also utilised direct integration with tools such as Jenkins, CodeDeploy (which gained native Blue/Green support in late 2018) and CodePipeline.
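As a rough sketch of what a rolling deployment looks like in CloudFormation, the service below uses the default ECS deployment controller with limits set in DeploymentConfiguration. The logical IDs (RollingService, Cluster, TaskDefinition) and the percentages are illustrative assumptions, not taken from any template discussed here.

```yaml
Resources:
  RollingService:                      # hypothetical logical ID
    Type: 'AWS::ECS::Service'
    Properties:
      Cluster: !Ref Cluster            # assumed to be defined elsewhere in the template
      TaskDefinition: !Ref TaskDefinition
      DesiredCount: 2
      DeploymentController:
        Type: ECS                      # the default rolling-update controller
      DeploymentConfiguration:
        MinimumHealthyPercent: 50      # keep at least half the tasks running
        MaximumPercent: 100            # never run more than DesiredCount tasks
```

With these values and two desired tasks, ECS has to stop one old task before it can start a replacement, which gives the one-at-a-time behaviour described above.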

As I love CloudFormation, it’s always a bit frustrating when you can’t do everything you want with native AWS support. Writing custom Transforms and Custom Resources is just as frustrating, as you know native support is very likely to arrive at some point.

Let’s take a look

To get started, take a look at the CloudFormation documentation, where you will find an example template.

The first thing to note is that the template, as is, won’t work…

The TaskDefinition sets ExecutionRoleArn, but the template uses Ref on the role, and as per the documentation for AWS::IAM::Role, Ref returns the role name, not the ARN. The fix is to use GetAtt with the Arn attribute instead.
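In template terms this is a one-line change. The fragment below is a sketch that reuses the ECSTaskExecutionRole logical ID from the example template; the BlueTaskDefinition name and the trimmed-down property list are assumptions for illustration.

```yaml
Resources:
  BlueTaskDefinition:                  # logical ID assumed for illustration
    Type: 'AWS::ECS::TaskDefinition'
    Properties:
      # Broken: !Ref on an AWS::IAM::Role returns the role *name*
      # ExecutionRoleArn: !Ref ECSTaskExecutionRole
      # Fixed: !GetAtt with the Arn attribute returns the role ARN
      ExecutionRoleArn: !GetAtt ECSTaskExecutionRole.Arn
      # ...remaining TaskDefinition properties unchanged
```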

git diff for ecs blue/green example template

So after fixing that, I was able to deploy the test stack.

successfully deployed stack

The test stack worked as expected and I had a working nginx.

working nginx example

I edited the template and set the Image for the TaskDefinition to amazon/amazon-ecs-sample, and I got the example PHP application. Next I tried to make things a bit more interesting.

I decided to change the template so the Image could be set via a parameter, and this is where things started to go very wrong.

--- orig.yml	2020-06-02 20:26:03.000000000 +1000
+++ bluegreen.yml	2020-06-02 19:56:28.000000000 +1000
@@ -5,6 +5,10 @@
     Type: 'AWS::EC2::Subnet::Id'
   Subnet2:
     Type: 'AWS::EC2::Subnet::Id'
+  Image:
+    Type: 'String'
+    Default: 'nginxdemos/hello:latest'
+    Description: 'nginxdemos/hello:latest or amazon/amazon-ecs-sample'
 Transform:
   - 'AWS::CodeDeployBlueGreen'
 Hooks:
@@ -150,7 +154,7 @@
       ExecutionRoleArn: !GetAtt ECSTaskExecutionRole.Arn
       ContainerDefinitions:
         - Name: DemoApp
-          Image: 'nginxdemos/hello:latest'
+          Image: !Ref Image
           Essential: true
           PortMappings:
             - HostPort: 80

I changed the Image parameter on a stack update and got the following error:

Transform error

Technically this is correct. Only the parameter value changed, so the Image: !Ref Image property in the template itself stayed the same. The transform therefore couldn’t recognise a change in the TaskDefinition or TaskSet, and couldn’t perform the transformation.

Anyway, I decided to press on and integrate the transform into an existing template. This particular setup is a set of stacks that use Fn::ImportValue for some of the services. For example, the Application Load Balancer and ECS Cluster are created in one stack, and a PHP application is then deployed as a task and service on to the cluster and joined to the Application Load Balancer and default listeners from another stack. On top of that, the VPC and Subnets are deployed in a base networking stack.
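To make the cross-stack layout concrete, here is a minimal sketch of how an application stack might pull in values exported by the networking and cluster stacks. All export names ('network-VpcId', 'cluster-HttpListenerArn') and logical IDs are hypothetical, and the rule priority and path pattern are placeholders.

```yaml
Resources:
  AppTargetGroup:                                  # hypothetical logical ID
    Type: 'AWS::ElasticLoadBalancingV2::TargetGroup'
    Properties:
      Port: 80
      Protocol: HTTP
      TargetType: ip
      VpcId: !ImportValue 'network-VpcId'          # hypothetical export from the networking stack
  AppListenerRule:                                 # hypothetical logical ID
    Type: 'AWS::ElasticLoadBalancingV2::ListenerRule'
    Properties:
      ListenerArn: !ImportValue 'cluster-HttpListenerArn'   # hypothetical export from the ALB/cluster stack
      Priority: 10
      Conditions:
        - Field: path-pattern
          Values:
            - '/*'
      Actions:
        - Type: forward
          TargetGroupArn: !Ref AppTargetGroup
```

The point is that the listener, cluster and VPC live outside the application template, whereas the documentation's example declares everything in a single stack.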

This proved very complicated, and as of this writing I still haven’t fully integrated ECS Blue/Green Deployment via CloudFormation into this set of stacks.

The documentation suggests you can integrate into an existing stack, but I think it would take a fair bit of effort. If you can tolerate a short outage, or can move DNS from one ALB to another (from an old stack to a new stack, for example), it would be simpler to migrate that way.

While trying to do this for myself though, I ran into the following problem, which looks like it has something to do with the fact that I was changing the TaskDefinition mid-flight.

Transform AWS::CodeDeployBlueGreen failed with: Failed to transform template. One and only one TaskSet can be associated with an ECS Service. Found 0
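The error message suggests the transform looks for exactly one AWS::ECS::TaskSet that points at the service. A minimal sketch of that wiring, loosely following the shape of the documentation's example template, is below. The logical IDs and property values are assumptions, and unrelated properties (such as the NetworkConfiguration a Fargate task set needs) are omitted.

```yaml
Resources:
  ECSDemoService:                      # logical IDs here are assumed for illustration
    Type: 'AWS::ECS::Service'
    Properties:
      Cluster: !Ref ECSDemoCluster
      DesiredCount: 1
      DeploymentController:
        Type: EXTERNAL                 # deployments are driven from outside ECS
  BlueTaskSet:
    Type: 'AWS::ECS::TaskSet'
    Properties:
      Cluster: !Ref ECSDemoCluster
      Service: !Ref ECSDemoService     # the single TaskSet associated with the service
      TaskDefinition: !Ref BlueTaskDefinition
      LaunchType: FARGATE
      Scale:
        Unit: PERCENT
        Value: 100
      LoadBalancers:
        - ContainerName: DemoApp
          ContainerPort: 80
          TargetGroupArn: !Ref ALBTargetGroupBlue
  PrimaryTaskSet:
    Type: 'AWS::ECS::PrimaryTaskSet'
    Properties:
      Cluster: !Ref ECSDemoCluster
      Service: !Ref ECSDemoService
      TaskSetId: !GetAtt BlueTaskSet.Id
```

A service that was originally created with a TaskDefinition directly on the AWS::ECS::Service has no TaskSet resource at all, which would be consistent with the "Found 0" in the message.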

Once I tore down the existing stack and deployed fresh I managed to get it deployed, but I haven’t been able to get the tasks running and attached to the ALB.

I’ve also tried to deploy an update to a stack, and every time I get the following. There isn’t much information here to help understand what’s going on.

Transform AWS::CodeDeployBlueGreen failed with: Failed to transform template. Failed to transform template

Conclusion

After many hours with it, I think the transform has a lot of promise, but there are a few rough edges that could do with smoothing out to make it something really great, and I’m sure the team will get to them.

Have a go yourself and let me know how you get on.

© Greg Cockburn
