Thomas Gray recently posted about how he and his team integrated with Vault for secret management, using Rancher as a source of truth for authentication and authorisation. This is a follow-on post that discusses how my team and I approached a similar problem.
What did we need to do?
We were building authentication and authorisation services to dish out access tokens that provide access to sensitive, personal data. Our most important secrets were arguably the private keys used to sign these tokens but we had a range of others. We needed a way to store and retrieve these secrets securely to reduce the risk of leaking sensitive data. If one of our signing keys was stolen, then a malicious party could sign access tokens without our knowledge and gain access to this data.
Our stack was made up of docker containers orchestrated by Rancher. We did not want to store secrets in source code or in the images themselves. We needed two things:
- A secure way to store secrets at rest
- A secure way for services to retrieve the secrets they needed to run
How did we do it?
HashiCorp’s Vault seemed to solve our storage problems for us. We could choose a secret backend that was appropriate for our environment and split the master key amongst members of our devops and techops teams. Each shareholder could store their share on a hardware device and keep it on their person for when the vault needed unsealing.
Finally, to keep the root token safe, we removed the need for it. The root token is used only at initialisation time, to store the secrets and set up their policies. This is done by an automated script in the presence of more than one operator. Once that is complete, every reference to the root token is removed. If a root token is needed in the future, a new one must be generated from a threshold number of shares, which requires more than one shareholder.
The problem still remained as to how the services would retrieve the secrets. Turns out Rancher had already suggested a way forward with the secrets bridge that was in beta at the time of writing. We ended up taking a lot of inspiration from this project.
In our model there are four components: Vault, the Rancher server, a ‘secret bridge’ and a ‘secret agent’.
The services requiring secrets sat in Rancher environments Env 1 to Env N. Vault and the Rancher server sat in a Management environment, and a Tools environment was used to isolate the secret bridge containers, of which there was one for each environment.
For each host in Env 1 we have a secret agent container which is responsible for listening for Docker start events. When a service starts up, the secret agent requests a scoped token from the secret bridge for that container. The secret agents do not communicate with Vault directly.
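As a rough illustration of the agent's job, the sketch below filters a stream of Docker events down to container starts and builds a request for the bridge. The `Type`, `Action` and `Actor.ID` fields follow the Docker events API; the payload shape sent to the bridge and the function names are assumptions made for this example, not our actual implementation.

```python
from typing import Iterable, Iterator


def started_container_ids(events: Iterable[dict]) -> Iterator[str]:
    """Yield the container ID for every container 'start' event."""
    for event in events:
        # Ignore network, volume and image events, and non-start actions.
        if event.get("Type") == "container" and event.get("Action") == "start":
            yield event["Actor"]["ID"]


def token_request_for(container_id: str, host: str) -> dict:
    """Build the body the agent would send to the secret bridge.

    Hypothetical payload: the real bridge API may look quite different.
    """
    return {"container_id": container_id, "host": host}


if __name__ == "__main__":
    sample_events = [
        {"Type": "container", "Action": "create", "Actor": {"ID": "aaa"}},
        {"Type": "container", "Action": "start", "Actor": {"ID": "bbb"}},
        {"Type": "network", "Action": "connect", "Actor": {"ID": "ccc"}},
    ]
    for cid in started_container_ids(sample_events):
        print(token_request_for(cid, host="host-1"))
```

In a real agent the `events` iterable would come from the Docker daemon's event stream rather than a hard-coded list.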
The secret bridge requests tokens from Vault for services that have started up in the environment it is responsible for. It communicates with Vault directly, but does so using a scoped token that gives it no access to any secrets; it can only request tokens. Of course, we want to protect against the secret bridge using these tokens itself, and this is where response wrapping comes in. The tokens the secret bridge requests for services come wrapped in a temporary, single-use token. If a service receives a wrapping token that has already been used, its request to Vault will fail and the service can raise a security alert. Similarly, we raise a security alert if the service has not received a token within a short time frame, as this suggests the token may have been captured.
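To make response wrapping a little more concrete, here is a minimal sketch of the two HTTP requests involved, built with Python's standard library but deliberately not sent. The `X-Vault-Wrap-TTL` header, the `/v1/auth/token/create` and `/v1/sys/wrapping/unwrap` endpoints, and the `wrap_info.token` and `auth.client_token` response fields come from Vault's HTTP API. The Vault address, the policy name, the TTL values and the function names are assumptions for illustration only.

```python
import json
import urllib.request

VAULT_ADDR = "https://vault.example.internal:8200"  # hypothetical address


def build_wrapped_token_request(bridge_token, policies, wrap_ttl="60s"):
    """Bridge side: ask Vault for a new token, returned response-wrapped."""
    body = json.dumps({"policies": policies, "ttl": "1h"}).encode()
    return urllib.request.Request(
        VAULT_ADDR + "/v1/auth/token/create",
        data=body,
        method="POST",
        headers={
            "X-Vault-Token": bridge_token,
            # Asks Vault to wrap the response in a single-use token.
            "X-Vault-Wrap-TTL": wrap_ttl,
        },
    )


def build_unwrap_request(wrapping_token):
    """Service side: exchange the single-use wrapping token for the real one."""
    return urllib.request.Request(
        VAULT_ADDR + "/v1/sys/wrapping/unwrap",
        method="POST",
        headers={"X-Vault-Token": wrapping_token},
    )


def wrapping_token_from(create_response):
    """The bridge only ever sees this temporary token, never the real one."""
    return create_response["wrap_info"]["token"]


def client_token_from(unwrap_response):
    """The service's scoped token, available only after a successful unwrap."""
    return unwrap_response["auth"]["client_token"]
```

A service would send the unwrap request with `urllib.request.urlopen` and treat an HTTP error as the signal to raise a security alert, since a failed unwrap suggests the wrapping token was already used or has expired.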
The image below shows the flow:
While it may seem that there are a lot of actors in this flow, they all have their part to play:
- the secret agent initiates the secret retrieval process
- Rancher provides a way to ensure the start event is genuine
- the secret bridge retrieves a one-time token within a restricted environment
- Vault wraps the service’s scoped token in a cubbyhole for one-time retrieval
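The verification step that Rancher enables could look something like the sketch below. It is only an illustration: the `lookup` callable stands in for a call to the Rancher API, and the record fields (`environment`, `state`, `started_at`) as well as the recency check are hypothetical choices for this example rather than Rancher's actual schema or our exact rules.

```python
import time
from typing import Callable, Optional


def verify_start_event(
    container_id: str,
    expected_env: str,
    lookup: Callable[[str], Optional[dict]],
    now: Optional[float] = None,
    max_age_seconds: float = 30.0,
) -> bool:
    """Return True only if the orchestrator confirms the container start."""
    record = lookup(container_id)  # hypothetical wrapper around the Rancher API
    if record is None:
        return False  # unknown container: the event cannot be trusted
    if record.get("environment") != expected_env:
        return False  # the container belongs to another environment
    if record.get("state") != "running":
        return False
    current = time.time() if now is None else now
    age = current - record["started_at"]
    # Only accept containers that started very recently.
    return 0 <= age <= max_age_seconds
```

Passing the lookup in as a parameter keeps the check easy to test without a running Rancher server, and makes it obvious that the bridge trusts Rancher rather than the agent for the facts about a container.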
Although this model helps to mitigate many of the risks in secret retrieval, it still relies on the Rancher server for authenticating the containers that start up. If Rancher becomes compromised then so might the secrets. It’s essential that access to Rancher is well controlled and is tightly secured.
The secret bridge needs to be supplied with Rancher API keys in order to verify the containers that are started. Currently we have this as a manual process but I’d like to make use of Vault’s other authentication mechanisms so that the secret bridge can retrieve the API keys from Vault itself. Watch this space.