Continuous Deployment

John Cox
Dec 16, 2020 · 9 min read

This is a consolidation of 4 blog posts I made when leading engineering for AngelMD. This content was originally published in 2018.

Goals

In DevOps, the goal is to automate everything and eliminate risk. To be more specific about our team, our goals were as follows:

  1. master branch should always reflect what’s in production.
  2. Stage and production environments should be the same with the exception of data.
  3. Merging a branch into master will deploy the latest code without need for human interaction.
  4. End users should never be negatively impacted when new code is deployed.
  5. Deployments should happen in minutes and as many times per week/day/hour as necessary.
  6. We’re developers by trade. This has to be as low-maintenance as possible.

I’m happy to report that we were able to accomplish all of these goals. This is the first of a series of posts that will walk you through what we did and why. In this post we’ll talk about some of the prerequisites that we had to establish. These speak to culture, process and technology.

Prerequisites

Tests
I shouldn’t have to say this, but I know I do… Get your test coverage up AND make sure that they’re actually good tests.

If your reaction to the above statement is something like, “That increases time-to-market too much,” or “We can’t afford the investment at this point in the project,” then stop reading. You have precluded yourself from CI/CD by way of your own priorities. To put it another way, you are choosing to spend exponentially more resources over time paying down technical debt and throwing humans at QA.

Tests are your first and most-important line of defense in risk-mitigation. The term “tests” is fairly general and is used in a very general way here on purpose. No matter what kind of tests or technology we’re talking about, tests should be automatable, run in a reasonable amount of time, and have output that will determine whether or not your app should deploy. In our case we require 100% passing tests and no drop in test coverage for any given build.
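
To make that go/no-go contract concrete, a build step really only needs to propagate the test runner’s exit code. A minimal sketch, assuming an rspec suite (your runner and thresholds will differ):

#!/usr/bin/env bash
set -euo pipefail   # any failing command aborts the build immediately

bundle exec rspec   # a non-zero exit here stops the pipeline before any deploy runs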

Feature Flags
TL;DR: Feature flags let you deploy code without exposing a feature to end users. These have been essential when we need to validate functionality in our stage environment. Once validated, we simply flip the switch in production for end users. When you’re constantly deploying master, they’re a must.
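
The mechanics depend on your flag tooling, but the idea is just a named switch you can read and flip per environment without a deploy. A purely hypothetical sketch against an imaginary admin endpoint (nothing here is our actual API):

# hypothetical endpoints and flag name, for illustration only
curl -s https://stage-api.example.com/admin/flags/new_dashboard              # check the flag in stage
curl -s -X PUT https://api.example.com/admin/flags/new_dashboard \
     -d 'enabled=true'                                                       # flip it on in production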

Automated Deployments
In order to meet the requirements of low-maintenance and zero-downtime deployments, we ended up using the AWS CLI and DNS updates for our React app deployments and a combination of Docker, Terraform and AWS ECS for our API. Then, of course, it all has to be orchestrated by some kind of build system. For us that’s CircleCI.

Whatever your technology tastes are, the main idea is to figure out a way to get code deployed without bouncing a server that a user is connected to. There are so many great tools out there that are cheap, if not free. Find the right tools and put them to work the way they were intended to be used.

The Build

Now that you have your goals and pre-reqs in place, it’s time to start talking about the build process. Put very simply, we treat the build process as a script that automatically runs tests any time we’re about to merge code into master and then automatically deploys code to the stage and production environments after the merge occurs. Here’s what the flow looks like:

  1. Push to a feature branch
  2. Open a pull request into master
  3. Build server runs tests and static code analysis on the feature branch
  4. If the build passes and the code review goes well, merge into master and delete the feature branch
  5. Build server runs tests and static code analysis on master
  6. If the build on master passes, deploy to stage and production environments

For the sake of this installment, we’ll focus on the details of step 3. We’ll look at step 6 for our React app and Rails app in a couple of future installments.

Build Server

We use CircleCI for our build server and general deployment coordinator. It’s simple to use for the tools that we develop with, but there are a ton of options, some free, some not, that might be better suited for your tooling. Generally speaking, here are the steps that our build server executes for a build:

  1. Spin up a container with the appropriate environment set up for running the app
  2. Install dependencies
  3. Checkout the branch in question
  4. Run the test suite (for us it’s rspec spec or yarn test)
  5. Report test coverage to static code analysis tool
  6. Tell Github whether tests passed or not

Here’s what our .circleci/config.yml looks like for that process for our Rails app:

defaults: &defaults
  # our repo...get it? dont_fear, the repo...MORE COWBELL!
  working_directory: ~/angelMD/dont_fear
  parallelism: 1
  shell: /bin/bash --login
  environment:
    CIRCLE_ARTIFACTS: /tmp/circleci-artifacts
    CIRCLE_TEST_REPORTS: /tmp/circleci-test-results
    AWS_DEFAULT_REGION: us-west-2
    TERRAFORM_VER: 0.9.6
    PATH: $PATH:$HOME/.local/bin:$HOME/bin
    CC_TEST_REPORTER_ID: some_id
    ELASTICSEARCH_VERSION: 5.3.3
    TZ: "/usr/share/zoneinfo/America/Denver"
  docker:
    # our own base container with some dependencies baked in to speed up builds
    - image: angelmd/api_test_image
      auth:
        username: $DOCKER_USER # stored in circleci
        password: $DOCKER_PASS # stored in circleci
      command: /sbin/init
      environment:
        BASH_ENV: /root/.bashrc
        TZ: "/usr/share/zoneinfo/America/Denver"

version: 2
jobs:
  build:
    <<: *defaults
    steps:
      - run: echo 'export PATH=$PATH:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' > $BASH_ENV
      - run: echo $PATH
      - checkout
      - run: mkdir -p $CIRCLE_ARTIFACTS $CIRCLE_TEST_REPORTS
      - run:
          working_directory: ~/angelMD/dont_fear
          command: 'echo ''America/Denver'' | tee -a /etc/timezone; dpkg-reconfigure -f noninteractive tzdata; service postgresql restart; '
      - run: gem install bundler -v 1.15.1
      - run: echo -e "export RAILS_ENV=test\nexport RACK_ENV=test" >> $BASH_ENV
      - restore_cache:
          keys:
            - gem-cache-{{ arch }}-{{ .Branch }}-{{ checksum "Gemfile.lock" }}
            - gem-cache-{{ checksum "Gemfile.lock" }}
      - run: 'bundle check --path=vendor/bundle || bundle install --path=vendor/bundle --jobs=4 --retry=3 '
      - save_cache:
          key: gem-cache-{{ arch }}-{{ .Branch }}-{{ checksum "Gemfile.lock" }}
          paths:
            - vendor/bundle
      - run: curl -L https://codeclimate.com/downloads/test-reporter/test-reporter-latest-linux-amd64 > ./cc-test-reporter
      - run: chmod +x ./cc-test-reporter
      - run:
          name: Running Redis
          command: service redis-server start
          background: true
      - run:
          name: Running Elasticsearch
          command: /bin/su - elasticsearch -c "/elasticsearch-5.3.3/bin/elasticsearch -d"
          background: true
      - run: wget --waitretry=5 --retry-connrefused -v http://127.0.0.1:9200/
      - run: mv config/database.build.yml config/database.yml
      - run:
          command: bundle exec rake db:create db:schema:load --trace
          environment:
            RAILS_ENV: test
            RACK_ENV: test
      - run: bundle exec rake db:seed
      - run: mv config/application.yml.test config/application.yml
      - run: ./cc-test-reporter before-build
      - run: bundle exec rspec; ./cc-test-reporter after-build --exit-code $?
      - store_test_results:
          path: /tmp/circleci-test-results
      - store_artifacts:
          path: /tmp/circleci-artifacts
  deploy:
    <<: *defaults
    steps:
      - run: echo 'export PATH=$PATH:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' > $BASH_ENV
      - run: echo $PATH
      - checkout
      - setup_remote_docker:
          docker_layer_caching: true
      - run: bin/build.sh
      - run: bin/push.sh
      - run: bin/deploy.sh
workflows:
  version: 2
  test-deploy:
    jobs:
      - build
      - deploy:
          filters:
            branches:
              only: master
          requires:
            - build

Static Code Analysis

If you’re not using tools that enforce code quality standards, you need to get on it. All of us get sloppy sometimes, usually because we’re in a hurry. Having code standards that are enforced in code reviews is OK. Having them enforced by a machine is WAY better. Better still is when the machine-enforced standards can hold up a merge into master.

At AngelMD we use Code Climate for static code analysis on our Rails app and React app. It watches GitHub for pull requests and runs its analysis, giving us a nice green check or red X depending on the result. Does that mean we do absolutely everything it tells us to do? Absolutely not. But it does catch a lot of little things that keep our code readable and maintainable over the long haul.

Test Coverage

As I mentioned above, our build server reports test coverage to Code Climate, which in turn reports back to GitHub on a pull request. Once again, this is a go/no-go check for our code. We will not merge code that drops test coverage*. We also have a threshold of at least 50% coverage on any new code created in a given pull request. That’s a bare minimum that draws raised eyebrows during code reviews. 100% coverage is much preferred.

* Sometimes we do see a drop in test coverage by some fraction of a percent. Usually that’s because the overall number of lines in the project decreased for some reason. That’s passable. We just keep an eye on the trend of our test coverage to make sure that it stays high — like > 92%.

That covers the concepts and some of the details of the build. Next we’ll take a look at deployment. We’ll also cover our infrastructure as a part of those details.

Frontend Push & Deploy

This will be a pretty short one. If you haven’t read them already, I’d recommend that you start at the beginning of this series and work your way up to this point. In this installment we’ll cover how we set up our infrastructure for our React app and how we update the app.

Infrastructure
Once compiled, a React app is just a collection of static files — JavaScript, CSS, images, etc. As such, we elected to just dump the files into AWS S3 and put the AWS CloudFront CDN in front of them. We use Terraform to set up our infrastructure. It also serves as the cornerstone of our disaster recovery plan.

Assume the steps below are done for two identical environments for stage and production.

Push
Once the app is compiled, we use the AWS CLI to create a new folder in our bucket for the site. For convenience we name the folder after the git commit hash. We then drop our static files into that folder. Easy, right?
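
In practice it boils down to a couple of AWS CLI calls, roughly like this sketch (the bucket name and build directory are placeholders):

HASH=$(git rev-parse HEAD)
yarn build                                                  # compile the React app into ./build
aws s3 sync ./build "s3://example-frontend-bucket/$HASH/"   # one folder per commit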

Deploy
Now, to make the new version of the site public, all we do is update our CloudFront origin using terraform apply with the new origin passed in as a variable. To make sure that Terraform has the information it needs for each environment, we have separate files with that information that get sourced before applying. We also store Terraform state in S3, since all of our build server infrastructure, which executes these steps, is ephemeral.
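
Roughly, the deploy step looks something like the following sketch (the variable and file names here are illustrative, not our exact ones):

export TF_VAR_origin_path="/$(git rev-parse HEAD)"                 # point CloudFront at the new folder
. stage_env                                                        # per-environment settings
terraform init -backend-config="bucket=example-terraform-state"    # remote state lives in S3
terraform apply -var-file=frontend.tfvars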

Backend Push & Deploy

The Push

Once the build server finishes running tests on master (and they all pass), we have it build a new Docker image. To keep the CircleCI configuration a little more concise, we put the commands in a little bash script:

HASH=$(git rev-parse HEAD)
rm ./version
echo $HASH > ./version
docker build --rm=false -t angelmd/app_name .

Notice that we also echo the git commit hash into a file called version. We do this so we always know which version of the API is live. When the app starts up, it grabs that hash from the version file and stores it in a constant. We then put it in a response header on every request. This helps a LOT when debugging.
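
That makes checking the live version as simple as inspecting a response header. A sketch (the endpoint and header name here are hypothetical):

# dump response headers, discard the body, and look for the version header
curl -s -D - -o /dev/null https://api.example.com/status | grep -i 'x-api-version'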

Now that we have a Docker image, let’s tag it and push it to Docker Hub with another bash script:

set -euo pipefail
IFS=$'\n\t'
docker login -u $DOCKER_USER -p $DOCKER_PASS
REMOTE=angelmd
NAME=app_name
HASH=$(git rev-parse HEAD)
docker tag $REMOTE/$NAME $REMOTE/$NAME:$HASH
docker push $REMOTE/$NAME:$HASH
docker tag $REMOTE/$NAME $REMOTE/$NAME:latest
docker push $REMOTE/$NAME:latest
docker logout

The Deploy

The goal is to have nobody notice any downtime when deploying. A container scheduling service is the perfect tool for just that. We use AWS ECS, but you could use any flavor that suits you. The Kubernetes service from AWS, EKS, is pretty slick also. If you feel like standing up your own cluster, by all means.

Now that we have an updated container pushed to our container repository, all we have to do is tell our container scheduling service that we want to run a new version of the container. Once again, we turn to our old friend, Terraform, to update our infrastructure. This time it’s just a small tweak to the task definition in ECS (again in a short bash script):

export PATH="$PATH:/root/.local/bin"
export TF_VAR_docker_tag=$(git rev-parse HEAD)
export PATH="$PATH:"
cd dont-fear-service
. stage_env
terraform apply -var-file=dont-fear-service.tfvars
. prod_env
terraform apply -var-file=dont-fear-service.tfvars

The key line above is export TF_VAR_docker_tag=$(git rev-parse HEAD). This variable is picked up by Terraform and interpolated into the task definition update. Basically it tells ECS, “Hey, go grab the Docker image tagged with the latest git commit hash and deploy it to the cluster.”

YES!

And that’s it. Just wait for the cluster update to roll out, and the new code is deployed. We also wanted an easy indication of when that code was spinning up, so we created a Slack incoming webhook that posts to a #deployments channel. As the application spins up, an initializer lets us know which git commit is being deployed, with a link to that commit on GitHub.
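
In our case that post comes from a Rails initializer, but the webhook call itself is trivial. Sketched in shell, it’s roughly the following (the environment variable name holding the webhook URL is an assumption):

curl -s -X POST -H 'Content-type: application/json' \
  --data "{\"text\": \"Deploying $(git rev-parse HEAD)\"}" \
  "$SLACK_DEPLOY_WEBHOOK_URL"   # hypothetical env var holding the Slack incoming-webhook URL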
