Clever Engineering Blog — Always a Student

How Clever Secures Infrastructure Secrets Using AWS SSM Parameter Store

By Ulzii Otgonbaatar on

Context

At Clever, we rely on nearly two thousand infrastructure secrets like DB access keys, API tokens, and session secret keys to provide our services to students and teachers. 

Properly securing these secrets so that we don’t expose them in our various environments takes considerable engineering effort. In fact, securing secrets is a hard problem for a wide variety of systems. The Vulnerability Disclosure Programs (VDPs) of some high-profile entities such as Twitter[1], for example, have revealed secret-leakage flaws, and the list of entities that have suffered from leaked secrets only grows over time.

At Clever, we’ve had our own struggles with secrets leaking into our internal logs. Clever’s build pipeline used CloudFormation templates in which secrets were expanded at build time. The downside of this approach was that the templates referred to secrets by value, which meant the secrets were readable in the AWS console and could end up in logs. We have had incidents where exactly that happened. This resulted in a large effort in which we had to clean the logs and rotate the secrets to restore their integrity.

Architecture Design

Our design for secret storage has four goals.

  1. Secrets by reference: If we solely use a secret’s reference, and if our deploy system can interpret a secret by reference, then we’re less concerned with the effects of secrets leaking through environment variables. We still have some cases where secrets are used by value, for instance when creating and debugging the secrets, but the fewer places raw values are used, the lower the risk of these secrets leaking.
  2. Integration with AWS: To keep our maintenance overhead down, we wanted a system that would work well with our current deployment process. If the new secret store didn’t require us to write any additional systems to interface with it, we wouldn’t have any additional services to maintain.
  3. Versioning: We’ve sometimes encountered issues with secrets being improperly rotated through human error. With versioning, we know the chain of previous secrets and can easily roll back to a known good value.
  4. Low cost: Given that we already rely on many free AWS-provided services, we wanted a design that adds little to no cost.

Two AWS services met all of our requirements, most notably allowing us to access secrets via reference: AWS Secrets Manager and AWS SSM Parameter Store. 

The secrets in both of these services can be embedded in CloudFormation templates via reference. The table below shows a comparison between the two AWS services. One of the stark differences is that AWS Secrets Manager costs $0.40 per secret per month, whereas SSM Parameter Store allows up to 10,000 standard parameters per region at no cost.

Clever uses multiple AWS regions to meet its reliability and resiliency goals. To give applications access to their secrets, we must replicate secrets across all the regions we operate in. A potentially fatal drawback of using SSM Parameter Store is that it does not automatically replicate secrets across regions. Hence, whenever we create or update a secret, we write it to every region ourselves. Since we don’t need to replicate secrets very often in a given time period, this fulfills our resiliency requirements without much burden.
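As an illustration of this kind of manual replication, here is a minimal sketch that writes one SecureString parameter to several regions. It is not Clever's internal CLI: the function names, the `client_factory` hook, and the example region names are assumptions made for this sketch. The only real API used is boto3's SSM `put_parameter` call.

```python
from typing import Callable, Dict, Iterable


def default_client_factory(region: str):
    """Build a real SSM client. boto3 is imported lazily so the sketch loads without it."""
    import boto3  # third-party AWS SDK, only needed for real calls

    return boto3.client("ssm", region_name=region)


def replicate_secret(
    name: str,
    value: str,
    regions: Iterable[str],
    client_factory: Callable[[str], object] = default_client_factory,
) -> Dict[str, int]:
    """Write the same SecureString parameter to every region (hypothetical helper).

    Returns a mapping of region -> parameter version reported by SSM.
    An exception from any region propagates, so the caller knows
    replication did not complete.
    """
    versions: Dict[str, int] = {}
    for region in regions:
        client = client_factory(region)
        response = client.put_parameter(
            Name=name,
            Value=value,
            Type="SecureString",  # stored encrypted with a KMS key
            Overwrite=True,       # updating an existing parameter creates a new version
        )
        versions[region] = response["Version"]
    return versions
```

A call such as `replicate_secret("DB_PASSWORD", value, ["us-west-1", "us-west-2"])` would write the parameter once per region. Because each write is independent, a failure partway through leaves the regions out of sync, so a real tool would likely want to retry or report which regions succeeded.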

Ultimately, we settled on AWS SSM Parameter Store as it ticked all the boxes with low cost as an added bonus.

The diagram above shows how Clever’s infrastructure secrets are embedded either by raw value (the old way) or by reference during deployment (our new scheme). To make the migration process as smooth as possible, the deployment service uses a feature flag based on the application name to dynamically determine which secrets storage scheme to use. 
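The per-application feature flag could be as simple as a set lookup. The sketch below is only an assumption about its shape: the constant names, the function, and the example application names are hypothetical and not taken from Clever's deployment service.

```python
from typing import FrozenSet

BY_VALUE = "by_value"          # old scheme: raw secret values expanded at build time
BY_REFERENCE = "by_reference"  # new scheme: references to Parameter Store entries


def secrets_scheme(app_name: str, migrated_apps: FrozenSet[str]) -> str:
    """Pick the secrets storage scheme for one application.

    Applications listed in `migrated_apps` use references;
    everything else stays on the old by-value path until it is migrated.
    """
    return BY_REFERENCE if app_name in migrated_apps else BY_VALUE


# Hypothetical usage: only "example-api" has been switched over so far.
migrated = frozenset({"example-api"})
scheme_for_api = secrets_scheme("example-api", migrated)
scheme_for_worker = secrets_scheme("example-worker", migrated)
```

Defaulting unknown applications to the old scheme keeps the migration opt-in, so an application can be moved (or moved back) by changing only the flag.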

In our new secrets management system, a Clever engineer starts by creating a secret using an internal Command Line Interface (CLI), which puts this secret in the AWS SSM Parameter Store. She then adds a line, as shown below, in our production deployment configuration file to indicate that it is an infrastructure secret.

service/production.yml

 > DB_PASSWORD: secret://DB_PASSWORD

During deployment, our deployment service passes the Amazon Resource Name (ARN) of each secret into the CloudFormation template for the corresponding ECS service, so that the service can decrypt and read the value of the secret.
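To make the reference-to-ARN step concrete, here is a rough sketch of how a deploy tool might turn `secret://` entries into the `Secrets` list of an ECS container definition in a CloudFormation template. The helper names, the account ID, the region, and the assumption that the reference name maps directly to the parameter name are all illustrative; the article does not describe how Clever's service actually does this.

```python
from typing import Dict, List, Tuple

SECRET_PREFIX = "secret://"


def parameter_arn(name: str, region: str, account_id: str) -> str:
    """Build the ARN of an SSM parameter.

    The ARN has a single slash after "parameter", whether or not
    the parameter name itself starts with one.
    """
    return f"arn:aws:ssm:{region}:{account_id}:parameter/{name.lstrip('/')}"


def split_env(
    config: Dict[str, str], region: str, account_id: str
) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Separate plain environment variables from secret references.

    Returns (plain_env, secrets), where `secrets` follows the Name/ValueFrom
    shape used by the Secrets property of an ECS container definition.
    """
    plain: Dict[str, str] = {}
    secrets: List[Dict[str, str]] = []
    for key, value in config.items():
        if value.startswith(SECRET_PREFIX):
            secret_name = value[len(SECRET_PREFIX):]
            secrets.append(
                {"Name": key, "ValueFrom": parameter_arn(secret_name, region, account_id)}
            )
        else:
            plain[key] = value
    return plain, secrets


# Hypothetical input mirroring the production.yml line shown earlier.
plain_env, secret_refs = split_env(
    {"DB_PASSWORD": "secret://DB_PASSWORD", "LOG_LEVEL": "info"},
    region="us-west-1",
    account_id="123456789012",
)
```

Only the ARN ends up in the template, so anyone reading the template or the deploy logs sees a reference rather than the raw value.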

Cost

As Clever engineers introduce new services and infrastructure secrets, we want the cost of secrets storage to scale well. This required us to do back-of-the-envelope calculations for the cost of AWS SSM Parameter Store.

There are two parts to the cost calculation: the storage cost per secret and the cost per SSM transaction, i.e. per API call to AWS SSM Parameter Store. If we used AWS Secrets Manager, the storage cost would be on the order of thousands of dollars per month.

Hosting our secrets as a standard SecureString parameter incurred no cost as AWS allows up to 10,000 parameters per region at no cost.

As for the API operations, the number of transactions scales with the number of secrets, the number of deployment environments, and the number of deploys per service. We initially estimated that this approach would cost $24 per month: at $0.05 per 10,000 Parameter Store High Throughput API interactions, ~160K transactions daily come to about $0.80 per day. The actual cost we are averaging is ~$15 per month.
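The back-of-the-envelope numbers can be reproduced in a few lines. This is only a sketch of the arithmetic: the function names are made up, a 30-day month is assumed, and the prices are the ones quoted in this post ($0.05 per 10,000 API interactions, $0.40 per Secrets Manager secret per month). The region count is left as a parameter because the post does not state it.

```python
from decimal import Decimal


def monthly_api_cost(
    daily_calls: int,
    price_per_10k: Decimal = Decimal("0.05"),
    days: int = 30,  # assumed month length
) -> Decimal:
    """Monthly cost of Parameter Store API interactions."""
    return Decimal(daily_calls) / Decimal(10_000) * price_per_10k * days


def monthly_secrets_manager_storage(
    secret_count: int,
    region_count: int,
    price_per_secret: Decimal = Decimal("0.40"),
) -> Decimal:
    """Monthly storage cost if every secret lived in Secrets Manager in every region."""
    return price_per_secret * secret_count * region_count


# ~160K transactions per day, as estimated above.
estimated_api_cost = monthly_api_cost(160_000)

# Roughly two thousand secrets, shown for a single region.
single_region_storage = monthly_secrets_manager_storage(2_000, region_count=1)
```

With these inputs, `estimated_api_cost` comes to $24 and `single_region_storage` to $800 per month. Since every secret is replicated to each region, the Secrets Manager figure multiplies by the number of regions, which is how the storage cost reaches the thousands.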

Wrapping Up

Designing an AWS SSM Parameter Store-based solution and migrating our high-value infrastructure secrets to it has been a step in the right direction. By using an AWS-integrated secret store with our templated deploy pipeline via AWS CloudFormation, we avoid having to pass around secret values and can rely on secret references instead. Combined with AWS IAM, we scope down which secrets are accessible in the first place. Together, these two factors let us use secrets in AWS securely, which benefits all future deploys.