In the last post of the series about Sigstore, I will look at the most exciting part of the implementation – ephemeral keys, or what the Sigstore team calls keyless signing. The post will go over the second and third scenarios I outlined in Implementing Containers’ Secure Supply Chain with Sigstore Part 1 – Signing with Existing Keys and go deeper into the experience of validating artifacts and moving artifacts between registries.

Using Sigstore to Sign with Ephemeral Keys

Using Cosign to sign with ephemeral keys is still an experimental feature, though it is planned to graduate out of experimental status in v1.14.0 (see the following PR). Signing with ephemeral keys is relatively easy.

$ COSIGN_EXPERIMENTAL=1 cosign sign 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1
Generating ephemeral keys...
Retrieving signed certificate...
 Note that there may be personally identifiable information associated with this signed artifact.
 This may include the email address associated with the account with which you authenticate.
 This information will be used for signing this artifact and will be stored in public transparency logs and cannot be removed later.
 By typing 'y', you attest that you grant (or have permission to grant) and agree to have this information stored permanently in transparency logs.
Are you sure you want to continue? (y/[N]): y
Your browser will now be opened to:
https://oauth2.sigstore.dev/auth/auth?access_type=online&client_id=sigstore&code_challenge=e16i62r65TuJiklImxYFIr32yEsA74fSlCXYv550DAg&code_challenge_method=S256&nonce=2G9cB5h89SqGwYQG2ey5ODeaxO8&redirect_uri=http%3A%2F%2Flocalhost%3A33791%2Fauth%2Fcallback&response_type=code&scope=openid+email&state=2G9cB7iQ7BSXYQdKKe6xGOY2Rk8
Successfully verified SCT...
Warning: Tag used in reference to identify the image. Consider supplying the digest for immutability.
"562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample" appears to be a private repository, please confirm uploading to the transparency log at "https://rekor.sigstore.dev" [Y/N]: y
tlog entry created with index: 5133131
Pushing signature to: 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample

You are sent to authenticate using OpenID Connect (OIDC) via the browser. I used my GitHub account to authenticate.

Once authenticated, you are redirected back to localhost, where Cosign reads the code query string parameter from the URL and verifies the authentication.

Here is what the redirect URL looks like.

http://localhost:43219/auth/callback?code=z6dghpnzujzxn6xmfltyl6esa&state=2G9dbwwf9zCutX3mNevKWVd87wS

I have also pushed v2 and v3 of the image to the registry and signed them using the approach above. Here is the new state in my registry.

Artifact Tag | Artifact Type | Artifact Digest
v1 | Image | sha256:9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4
sha256-9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4.sig | Signature | sha256:483f2a30b765c3f7c48fcc93a7a6eb86051b590b78029a59b5c2d00e97281241
v2 | Image | sha256:d4d59b7e1eb7c55b0811c3dfd3571ab386afbe6d46dfcf83e06343e04ae888cb
sha256-d4d59b7e1eb7c55b0811c3dfd3571ab386afbe6d46dfcf83e06343e04ae888cb.sig | Signature | sha256:8c43d1944b4d0c3f0d7d6505ff4d8c93971ebf38fc60157264f957e3532d8fd7
v3 | Image | sha256:2e19bd9d9fb13c356c64c02c574241c978199bfa75fd0f46b62748f59fb84f0a
sha256-2e19bd9d9fb13c356c64c02c574241c978199bfa75fd0f46b62748f59fb84f0a.sig | Signature | sha256:cc2a674776dfe5f3e55f497080e7284a5bd14485cbdcf956ba3cf2b2eebc915f

If you look at the console output, you will also see that one of the lines mentions tlog. This is the index in the Rekor transparency log where the signature’s receipt is stored. For the three signatures that I created, the indexes are:

  • 5133131 for v1
  • 5133528 for v2
  • 5133614 for v3

That is it! I have signed my images with ephemeral keys, and I have the tlog entries that correspond to the signatures. It is a fast and easy experience.

Verifying Images Signed With Ephemeral Keys

Verifying the images signed with ephemeral keys is built into the Cosign CLI.

$ COSIGN_EXPERIMENTAL=1 cosign verify 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1 | jq . > flasksample-v1-ephemeral-verification.json
Verification for 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1 --
The following checks were performed on each of these signatures:
- The cosign claims were validated
- Existence of the claims in the transparency log was verified offline
- Any certificates were verified against the Fulcio roots.

The outputs from the verification of flasksample:v1, flasksample:v2, and flasksample:v3 are available on GitHub. A few things to note about the output from the verification:

  • The output JSON contains the logIndex as well as the logID, which I assumed I could use to search for the receipts in Rekor. I was initially confused about the purpose of the logID, but I will get into that a little later!
  • There is a body field that I assume is the actual signature. This JSON field is not yet documented, and with such a generic name, it is hard to know what it holds (see the sketch after this list).
  • The type field seems to be a free-text field. I would expect it to be something more structured, with the values coming from a list of possible and, most importantly, standardized types.
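For reference, here is how those fields can be pulled out of the verification output with jq. This is a minimal sketch assuming the flasksample-v1-ephemeral-verification.json file produced above and the bundle layout Cosign emitted at the time of writing; adjust the field paths if your version differs.

# Show the Rekor log index and logID recorded in the bundle
$ jq '.[0].optional.Bundle.Payload | {logIndex, logID}' flasksample-v1-ephemeral-verification.json

# The body field is Base64-encoded; decoding it should reveal the Rekor entry
# with the signature and the signing certificate
$ jq -r '.[0].optional.Bundle.Payload.body' flasksample-v1-ephemeral-verification.json | base64 -d | jq .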

Search and Explore Rekor

The goal of my second scenario – Sign Container Images with Ephemeral Keys from Fulcio is not only to sign images with ephemeral keys but also to invalidate one of the signed artifacts. Unfortunately, documentation and the help output from the commands are scarce. Also, searching on Google how to invalidate a signature in Rekor yields no results. I decided to start exploring the Rekor logs to see if that may help.

There aren’t many commands that you can use in Rekor. The four things you can do are: get records; search by email, SHA, or artifact; upload an entry or artifact; and verify an entry or artifact. Using the information from the outputs in the previous section, I can get the entries for the three images I signed using the log indexes.

$ rekor-cli get --log-index 5133131 > flasksample-v1-ephemeral-logentry.json
$ rekor-cli get --log-index 5133528 > flasksample-v2-ephemeral-logentry.json
$ rekor-cli get --log-index 5133614 > flasksample-v3-ephemeral-logentry.json

The outputs from the above commands for flasksample:v1, flasksample:v2, and flasksample:v3 are available on GitHub.

The first thing I noted is that the Rekor CLI does not return the log entries in JSON format. This is different from what Cosign returns and is a bit inconsistent. Second, the log entries output by the Rekor CLI are not the same as the verification outputs returned by Cosign; the Cosign verification output provides different information than the Rekor log entry. This raises the question: “How does Cosign get this information?” First, though, let’s see what else Rekor can give me.

I can use Rekor search to find all the log entries that I created. This will include the ones for the three images above and, theoretically, everything else I signed.

$ rekor-cli search --email toddysm_dev1@outlook.com
Found matching entries (listed by UUID):
24296fb24b8ad77aaf485c1d70f4ab76518483d5c7b822cf7b0c59e5aef0e032fb5ff4148d936353
24296fb24b8ad77a3f43ac62c8c7bab7c95951d898f2909855d949ca728ffd3426db12ff55390847
24296fb24b8ad77ac2334dfe2759c88459eb450a739f08f6a16f5fd275431fb42c693974af3d5576
24296fb24b8ad77a8f14877c718e228e315c14f3416dfffa8d5d6ef87ecc4f02f6e7ce5b1d5b4e95
24296fb24b8ad77a6828c6f9141b8ad38a3dca4787ab096dca59d0ba68ff881d6019f10cc346b660
24296fb24b8ad77ad54d6e9bb140780477d8beaf9d0134a45cf2ded6d64e4f0d687e5f30e0bb8c65
24296fb24b8ad77a888dc5890ac4f99fc863d3b39d067db651bf3324674b85a62e3be85066776310
24296fb24b8ad77a47fae5af8718673a2ef951aaf8042277a69e808f8f59b598d804757edab6a294
24296fb24b8ad77a7155046f33fdc71ce4e291388ef621d3b945e563cb29c2e3cd6f14b9ba1b3227
24296fb24b8ad77a5fc1952295b69ca8d6f59a0a7cbfbd30163c3a3c3a294c218f9e00c79652d476

Note that the result lists UUIDs that are different from the logID properties in the verification output JSON. You can get log entries using the UUID or the logIndex but not using the logID. The UUIDs are not present in the Cosign output mentioned in the previous section, while the logID is. However, it is unclear what the logID can be used for and why the UUID is not included in the Cosign output.
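For completeness, entries can be fetched by UUID the same way they can by log index. Here is a sketch using one of the UUIDs from the search output above:

$ rekor-cli get --uuid 24296fb24b8ad77a8f14877c718e228e315c14f3416dfffa8d5d6ef87ecc4f02f6e7ce5b1d5b4e95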

The Rekor search command supposedly allows you to search by artifact and SHA. However, the expected format for those values is not documented. Using the image name or the image SHA yields no results.

$ rekor-cli search --artifact 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample
Error: invalid argument "562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample" for "--artifact" flag: Key: '' Error:Field validation for '' failed on the 'url' tag
$ rekor-cli search --sha sha256:9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4
no matching entries found
$ rekor-cli search --sha 9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4
no matching entries found

I think the above are the core search scenarios for container images (and other artifacts), but it seems they are either not implemented or not documented. Neither the Rekor GitHub repository, the Rekor public documentation, nor the Rekor Swagger has any more details on the search. I filed an issue for Rekor to ask how the artifact search works.

Coming back to the main goal of invalidating a signed artifact, I couldn’t find any documentation on how to do that. The only apparent options to invalidate the artifacts are either uploading something to Rekor or removing the signature from Rekor. I looked at all options to upload entries or artifacts to Rekor, but the documentation mainly describes how to sign and upload entries using other types like SSH, X509, etc. It does seem to me that there is no capability in Rekor to say: “This artifact is not valid anymore”.

I thought that looking at how Rekor verifies signatures may help me understand the approach.

Verifying Signatures Using Rekor CLI

I decided to explore how the signatures are verified and reverse engineer the process to understand if an artifact signature can be invalidated. Rekor CLI has a verify command. My assumption was that Rekor’s verify command worked the same as the Cosign verify command. Unfortunately, that is not the case.

$ rekor-cli verify --artifact 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1
Error: invalid argument "562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1" for "--artifact" flag: Key: '' Error:Field validation for '' failed on the 'url' tag
$ rekor-cli verify --entry 24296fb24b8ad77a8f14877c718e228e315c14f3416dfffa8d5d6ef87ecc4f02f6e7ce5b1d5b4e95
Error: invalid argument "24296fb24b8ad77a8f14877c718e228e315c14f3416dfffa8d5d6ef87ecc4f02f6e7ce5b1d5b4e95" for "--entry" flag: Key: '' Error:Field validation for '' failed on the 'url' tag

Unfortunately, due to a lack of documentation and examples, I wasn’t able to figure out how this worked without browsing the code. While that kind of digging is always an option, I would expect an easier experience as an end user.

I was made aware of the following blog post, though. It describes how to handle account compromise. To put it in context, if my GitHub account is compromised, this blog post describes the steps I need to take to invalidate the artifacts. I do have two problems with this proposal:

  1. As you remember, in my scenario, I wanted to invalidate only the flasksample:v2 artifact, and not all artifacts signed with my account. If I follow the proposal in the blog post, I will invalidate everything signed with my GitHub account, which may result in outages.
  2. The proposal relies on the consumer of artifacts to constantly monitor the news for what is valid and what is not; which GitHub account is compromised and which one is not. This is unrealistic and puts too much manual burden on the consumer of artifacts. In an ideal scenario, I would expect the technology to solve this with a proactive way to notify the users if something is wrong rather than expect them to learn reactively.

At this point in time, I will call this scenario incomplete. Yes, I am able to sign with ephemeral keys, but on its own this does not seem to offer anything unique. The ease of key generation is what the team seems to be calling attention to, and it does make signing much less intimidating to new users, but I could still generate a new SSH or GPG key every time I need to sign something. Trusting Fulcio’s root does not automatically increase my security – I would even argue the opposite. Making it easier for everybody to sign does not increase security, either. Let’s Encrypt already proved that. While Let’s Encrypt made an enormous contribution to our privacy and helped secure every small business site, the ease and accessibility with which it works mean that every malicious site now also has a certificate. The lock in the address bar is no longer a sign of security. We are all excited about the benefits, but I bet very few of us are also excited for this to help the bad guys. We need to think beyond simple signing and ensure that the whole end-to-end experience is secure.

I will move to the last scenario now.

Promoting Sigstore Signed Images Between Registries

In the last scenario, I wanted to test the promotion of images between registries. Let’s create a v4 of the image and sign it using an ephemeral key. Here are the commands, with the output omitted.

$ docker build -t 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v4 .
$ docker push 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v4
$ COSIGN_EXPERIMENTAL=1 cosign sign 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v4

The Rekor log index for the signature is 5253114. I can use Crane to copy the image and the signature from AWS ECR into Azure ACR.

$ crane copy 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v4 tsmacrtestcssc.azurecr.io/flasksample:v4
$ crane copy 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:sha256-aa2690ed4a407ac8152d24017eb6955b01cbb0fc44afe170dadedc30da80640a.sig tsmacrtestcssc.azurecr.io/flasksample:sha256-aa2690ed4a407ac8152d24017eb6955b01cbb0fc44afe170dadedc30da80640a.sig
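As an aside, instead of constructing the sha256-<image-digest>.sig tag by hand, Cosign can print the signature reference for you. This is a small sketch; cosign triangulate was available in the Cosign version I used, and it should print the same .sig reference that I copied above.

# Print the registry reference where Cosign stores the signature for this image
$ cosign triangulate 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v4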

Also, let’s validate the ephemeral key signature using the image in Azure ACR.

$ COSIGN_EXPERIMENTAL=1 cosign verify tsmacrtestcssc.azurecr.io/flasksample:v4 | jq .
Verification for tsmacrtestcssc.azurecr.io/flasksample:v4 --
The following checks were performed on each of these signatures:
 - The cosign claims were validated
 - Existence of the claims in the transparency log was verified offline
 - Any certificates were verified against the Fulcio roots.

Next, I will sign the image with a key stored in Azure Key Vault and verify the signature.

$ cosign sign --key azurekms://tsm-kv-usw3-tst-cssc.vault.azure.net/sigstore-azure-test-key-ec tsmacrtestcssc.azurecr.io/flasksample:v4
Warning: Tag used in reference to identify the image. Consider supplying the digest for immutability.
Pushing signature to: tsmacrtestcssc.azurecr.io/flasksample
$ cosign verify --key azurekms://tsm-kv-usw3-tst-cssc.vault.azure.net/sigstore-azure-test-key-ec tsmacrtestcssc.azurecr.io/flasksample:v4
Verification for tsmacrtestcssc.azurecr.io/flasksample:v4 --
The following checks were performed on each of these signatures:
 - The cosign claims were validated
 - The signatures were verified against the specified public key
[{"critical":{"identity":{"docker-reference":"tsmacrtestcssc.azurecr.io/flasksample"},"image":{"docker-manifest-digest":"sha256:aa2690ed4a407ac8152d24017eb6955b01cbb0fc44afe170dadedc30da80640a"},"type":"cosign container image signature"},"optional":null}]

Everything worked as expected. This scenario was very smooth, and I was able to complete it in less than a minute.

Summary

So far, I have just scratched the surface of what the Sigstore project could accomplish. While going through the scenarios in these posts, I had a bunch of other thoughts, so I wanted to highlight a few below:

  • Sigstore is built on a good idea: leverage ephemeral keys for signing container images (and other software). However, ephemeral keys alone do not provide higher security if there is no good process to invalidate the signed artifacts. With traditional X.509 certificates, one can use CRLs (Certificate Revocation Lists) or OCSP (Online Certificate Status Protocol) to revoke certificates. Although those mechanisms are criticized a lot, the process of invalidating artifacts signed with ephemeral keys and Sigstore does not seem like an improvement at the moment. I look forward to the improvements in this area as further discussions happen.
  • Sigstore, like nearly all open-source projects, would benefit greatly from better documentation and consistency in the implementation. Inconsistent messages, undocumented features, myriad JSON schemas, multiple identifiers used for different purposes, variable naming conventions in JSONs, and unpredictable output from the command line tools are just a few things that can be improved. I understand that some of the implementation was driven by requirements to work with legacy registries but going forward, that can be simplified by using OCI references. The bigger the project grows, the harder it will become to fix those.
  • The experience that Cosign offers is what makes the project successful. Signing and verifying images using the legacy X.509 and the ephemeral keys is easy. Hiding the complexity behind a simple CLI is a great strategy to get adoption.

I tested Sigstore a year ago and asked myself the question: “How do I solve the SolarWinds exploit with Sigstore?” Unfortunately, Sigstore doesn’t make it easier to solve that problem yet. With my experience above in mind, I would expect a lot of changes in the future as Sigstore matures.

Unfortunately, there is no viable alternative to Sigstore on the market today. Notary v1 (or Docker Content Trust) proved not to be flexible enough. Notary v2 is still in the works and has yet to show what it can do. However, the lack of alternatives does not automatically mean that we can skip the due diligence required for a security product of such importance. Sigstore has had a great start, and this series proves to me that we’ve got a lot of work ahead of us as an industry to solve our software supply chain problems.

In my previous post, Implementing Containers’ Secure Supply Chain with Sigstore Part 1 – Signing with Existing Keys, I went over the Cosign experience of signing images with existing keys. As I concluded there, the signing was easy to achieve, with just a few hiccups here and there. It does seem that Cosign does a lot behind the scenes to make it easy. Though, after looking at the artifacts stored in the registry, I got curious about how the signatures and attestations are saved. Unfortunately, the Cosign specifications are a bit light on details, and it seems they were created after or evolved together with the implementation. Hence, I decided to take the reverse-engineering approach to understand what is saved in the registry.

At the end of this post, I will validate the signatures using the Cosign CLI and complete my first scenario.

The Mystery Behind Cosign Artifacts

First, to be able to store Cosign artifacts in a registry, you need to use an OCI-compliant registry. When Cosign signs a container image, an OCI artifact is created and pushed to the registry. Every OCI artifact has a manifest and layers. The manifest is standardized, but the layers can be anything that can be packed in a tarball. So, Cosign’s signature should be in the layer pushed to the registry, and the manifest should describe the signature artifact. For image signatures, Cosign tags the manifest with a tag that uses the following naming convention: sha256-<image-digest>.sig.
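As a quick illustration of that convention, here is a sketch using Crane to derive the signature tag from the image digest (the repository reference is the one from my examples):

# Get the digest of the signed image
$ crane digest 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1
sha256:9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4

# Replace the ':' with '-' and append .sig to get the signature tag:
# sha256-9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4.sig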

From the examples in my previous post, when Cosign signed the 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1 image, it created a new artifact and tagged it sha256-9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4.sig. Here are the details of the image that was signed.

And here are the details of the signature artifact.

What is Inside Cosign Signature Artifact?

I was curious about what the signature artifact looks like. Using Crane, I can pull the signature artifact. All files are available in my GitHub test repository.

# Sign into the registry
$ aws ecr get-login-password --region us-west-2 | crane auth login --username AWS --password-stdin 562077019569.dkr.ecr.us-west-2.amazonaws.com

# Pull the signature manifest
$ crane manifest 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:sha256-9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4.sig | jq . > flasksample-v1-signature-manifest.json

# Pull the signature artifact as a tarball and unpack it into ./sigstore-signature
$ crane pull 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:sha256-9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4.sig flasksample-v1-signature.tar.gz
$ mkdir sigstore-signature
$ tar -xvf flasksample-v1-signature.tar.gz -C ./sigstore-signature
$ cd sigstore-signature/
$ ls -al
total 20
drwxrwxr-x 2 toddysm toddysm 4096 Oct 14 10:08 .
drwxrwxr-x 3 toddysm toddysm 4096 Oct 14 10:07 ..
-rw-r--r-- 1 toddysm toddysm  272 Dec 31  1969 09b3e371137191b52fdd07bdf115824b2b297a2003882e68d68d66d0d35fe1fc.tar.gz
-rw-r--r-- 1 toddysm toddysm  319 Dec 31  1969 manifest.json
-rw-r--r-- 1 toddysm toddysm  248 Dec 31  1969 sha256:00ce5fed483997c24aa0834081ab1960283ee9b2c9d46912bbccc3f9d18e335d
$ tar -xvf 09b3e371137191b52fdd07bdf115824b2b297a2003882e68d68d66d0d35fe1fc.tar.gz 
tar: This does not look like a tar archive

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now

Surprisingly to me, the inner tarball (09b3e371137191b52fdd07bdf115824b2b297a2003882e68d68d66d0d35fe1fc.tar.gz) does not seem to be a tarball at all, although it has the proper extension. I assumed this was the actual signature blob, but I couldn’t confirm that without knowing how to manipulate the archive. Interestingly, opening the file in a simple text editor reveals that it is a plain JSON file with a .tar.gz extension. Looking into the other two files, manifest.json and sha256:00ce5fed483997c24aa0834081ab1960283ee9b2c9d46912bbccc3f9d18e335d, it looks like all the files are some kind of manifests, but it is not very clear what they are for. I couldn’t find any specification explaining the content of the layer and the meaning of the files inside it.
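If you want to confirm this yourself, here is a quick sketch (the blob name is the one from the listing above; the exact output of file will vary):

# 'file' reports the blob as JSON/ASCII text rather than gzip data
$ file 09b3e371137191b52fdd07bdf115824b2b297a2003882e68d68d66d0d35fe1fc.tar.gz

# Pretty-print the blob to see its content
$ jq . 09b3e371137191b52fdd07bdf115824b2b297a2003882e68d68d66d0d35fe1fc.tar.gz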

Interestingly, Cosign offers a tool to download the signature.

$ cosign download signature 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1 | jq . > flasksample-v1-cosign-signature.json

The resulting signature is available in the GitHub repository. The page linked above claims that you can verify the signature in another tool, but I couldn’t immediately find details on how to do that. I decided to leave this for some other time.

The following stand out from this experience.

  • First (and typically a red flag when I am evaluating software), why does the JSON file have a tarball file extension? Normally, this is a malicious practice, and it’s especially concerning in this context. I am sure it will get fixed, and an explanation will be provided now that I have filed an issue for it.
  • Why are there so many JSON files? Trying to look at all the available Cosign documentation, I couldn’t find any architectural or design papers that explain those decisions. It seems to me that those things got hacked on top of each other as needs arose. There may be GitHub issues discussing those design decisions, but I didn’t find any in a quick search.
  • The signature is not in any of the files that I downloaded. The signature is stored as an OCI annotation in the manifest called dev.cosignproject.cosign/signature (see the sketch after this list). So, do I even need the rest of the artifact?
  • Last but not least, it seems that the Cosign tool is the only one that understands how to work with the stored artifacts, which may result in tool lock-in. Although there is a claim that I can verify the signature with a different tool, without a specification, it will be hard to implement such a tool.
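Here is the sketch referenced in the list above – extracting the signature annotation from the signature manifest that I pulled earlier with Crane. The jq path assumes the manifest layout I saw, with a single layer carrying the annotation:

# Print the Base64-encoded signature stored as a layer annotation
$ jq -r '.layers[0].annotations."dev.cosignproject.cosign/signature"' flasksample-v1-signature-manifest.json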

What is Inside Cosign Attestation Artifact?

Knowing how the Cosign signatures work, I would expect something similar for the attestations. The mental hierarchy I have built in my mind is the following:

+ Image
  - Image signature
  + Attestation
    - Attestation signature

Unfortunately, this is not the case. There is no separate artifact for the attestation signature. Here are the steps to download the attestation artifact.

# Pull the attestation manifest
$ crane manifest 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:sha256-9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4.att | jq . > flasksample-v1-attestation-manifest.json

# Pull the attestation artifact as a tarball and unpack it into ./sigstore-attestation
$ crane pull 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:sha256-9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4.att flasksample-v1-attestation.tar.gz
$ mkdir sigstore-attestation
$ tar -xvf flasksample-v1-attestation.tar.gz -C ./sigstore-attestation/
sha256:728e26b36817753d90d7de8420dacf4fa1bcf746da2b54bb8c59cd047a682198
c9880779c90158a29f7a69f29c492551261e7a3936247fc75f225171064d6d32.tar.gz
ff626be9ff3158e9d2118072cd24481d990a5145d10109affec6064423d74cc4.tar.gz
manifest.json

All attestation files are available in my GitHub test repository. Knowing what was done for the signature, the results are more or less what I expected. Also, I think I am getting a sense of the design by slowly reverse-engineering it.

The manifest.json file describes the archive. It points to the config sha256:728e26b36817753d90d7de8420dacf4fa1bcf746da2b54bb8c59cd047a682198 file and the two layers c9880779c90158a29f7a69f29c492551261e7a3936247fc75f225171064d6d32.tar.gz and ff626be9ff3158e9d2118072cd24481d990a5145d10109affec6064423d74cc4.tar.gz. I was not sure what the config file sha256:728e26b36817753d90d7de8420dacf4fa1bcf746da2b54bb8c59cd047a682198 is used for, so I decided to ignore it. The two layer JSONs (which were both JSON files, despite the tar.gz extensions) were more interesting, so I decided to dig more into them.
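To see what the attestation artifact is made of without unpacking the tarball, the manifest can be inspected directly. Here is a small sketch against the attestation manifest pulled above; it only uses standard OCI manifest fields, so it should hold across versions:

# List the layer digests and media types of the attestation artifact
$ jq -r '.layers[] | "\(.digest)  \(.mediaType)"' flasksample-v1-attestation-manifest.json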

The first thing to note is that the layer JSONs (here and here) for the attestations have a different format from the layer JSON for the signature. While the signature seems to be something proprietary to Cosign, the attestations have a payload type of application/vnd.in-toto+json, which hints at something more widely accepted. While this is not an official IANA media type, there is at least an in-toto specification published. The payload looks a lot like a Base64-encoded string, so I gave it a try.

# This decodes the SLSA provenance attestation
$ cat ff626be9ff3158e9d2118072cd24481d990a5145d10109affec6064423d74cc4.tar.gz | jq -r .payload | base64 -d | jq . > ff626be9ff3158e9d2118072cd24481d990a5145d10109affec6064423d74cc4.tar.gz.payload.json

# And this decodes the SPDX SBOM
$ cat c9880779c90158a29f7a69f29c492551261e7a3936247fc75f225171064d6d32.tar.gz | jq -r .payload | base64 -d | jq . > c9880779c90158a29f7a69f29c492551261e7a3936247fc75f225171064d6d32.tar.gz.payload.json

Both files are available here and here. If I want to get the SBOM or the SLSA provenance, I need to get the predicate value from the above JSONs, decode it, and then use it. I didn’t go into that because it was not part of my goals for this experiment.
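For illustration, extracting the predicate from the already-decoded SLSA payload would look something like this (a sketch; the output file name is hypothetical, and for the SBOM attestation the predicate may need an additional decode, as mentioned above):

# Pull the in-toto predicate out of the decoded SLSA provenance payload
$ jq '.predicate' ff626be9ff3158e9d2118072cd24481d990a5145d10109affec6064423d74cc4.tar.gz.payload.json > flasksample-v1-slsa-predicate.json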

Note one thing! As you remember from the beginning of this section, I expected to have signatures for the attestations, and I do! They are just not where I expected them to be. The signatures are part of the layer JSONs (the ones with the strange extensions). If you want to extract the signatures, you need to get the signatures value from them.

# Extract the SBOM attestation signature
$ cat c9880779c90158a29f7a69f29c492551261e7a3936247fc75f225171064d6d32.tar.gz | jq -r .signatures > flasksample-v1-sbom-signatures.json

# Extract the SLSA provenance attestation signature
$ cat ff626be9ff3158e9d2118072cd24481d990a5145d10109affec6064423d74cc4.tar.gz | jq -r .signatures > flasksample-v1-slsa-signature.json

The SBOM signature and the SLSA provenance signature are available on GitHub. For whatever reason, the key ID is left blank.

Here are my takeaways from this experience.

  • Cosign uses a myriad of nonstandard JSON file formats to store signatures and attestations. These still need to be documented and standardized (except for the in-toto one).
  • To get to the data, I need to make several conversions from JSON to Base64-encoded content and back, which increases not only the computation needed but also the probability of errors and bugs, so I would recommend making that simpler.
  • All attestations are stored in a single OCI artifact, and there is no way to retrieve a single attestation based on its type. In my example, if I need to get only the SLSA provenance (785 bytes), I still need to download the SBOM, which is 1.5 MB. This is 1500 times more data than I need. The impact will be performance and bandwidth cost, and at scale, would make this solution the wrong one for me.

Verifying Signatures and Attestations with Cosign

Cosign CLI has commands for signature and attestation verification.

# This one verifies the image signature
$ cosign verify --key awskms:///61c124fb-bf47-4f95-a805-65dda7cd08ae 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1 > sigstore-verify-signature-output.json

Verification for 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1 --
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - The signatures were verified against the specified public key

# This one verifies the image attestations
$ cosign verify-attestation --key awskms:///61c124fb-bf47-4f95-a805-65dda7cd08ae 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1 > sigstore-verify-attestation-output.json

Verification for 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1 --
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - The signatures were verified against the specified public key

The output of the signature verification and the output of the attestation verification are available on GitHub. The signature verification is as expected. The attestation verification is more interesting, though. It is another file in non-standard JSON. It looks as though the developer took the two JSONs from the blobs and concatenated them, perhaps unintentionally. Below is a screenshot of the output loaded in Visual Studio Code. Notice that there is no comma between lines 10 and 11. Also, the JSON objects are not wrapped in a JSON array. I’ve opened another issue for the non-standard JSON output from the verification command.

With this, I was able to complete my first scenario – Sign Container Images With Existing Keys Stored in a KMS.

Summary

Here is the summary for the second part of my experience:

  • It seems that the Sigstore implementation grew organically, and some specs were written after the implementation was done. Because many pieces are missing specifications and documentation, it will be hard to develop third-party tooling or even maintain the code easily until those are written. The more the project grows, the harder and slower it will be to add new capabilities, and the risk of unintended side effects and even security bugs will grow.
  • There are certain architectural choices that I would question. I have already mentioned the issue with saving all attestations in a single artifact and the numerous proprietary manifests and JSON files. I would also like to dissect the lack of separation between Cosign CLI and Cosign libraries. If they were separate, it would be easier to use them in third-party tooling to verify or sign artifacts.
  • Finally, the above two tell me that there will be a lot of incompatible changes in the product going forward. I would expect this from an MVP but not from a V1 product that I will use in production. If the team wants to move to a cleaner design and more flexible architecture, a lot of the current data formats will change. This, of course, can be hidden behind the Cosign CLI, but that means that I need to take a hard dependency on it. It will be interesting to understand the plan for verification scenarios and how Cosign can be integrated with various policies and admission controllers. Incompatible changes in the verification scenario can, unfortunately, result in production outages.

My biggest concern so far is the inconsistent approach to the implementation and the lax architectural principles and documentation of the design decisions. On the bright side, verifying the signatures and the attestations using the Cosign CLI was very easy and smooth.

In my next post, I will look at the ephemeral key signing scenario and the capabilities to revoke signed artifacts. I will also look at the last scenario that involves the promotion and re-signing of artifacts.


Today, the secure supply chain for software is top of mind for every CISO and enterprise leader. After the President’s Executive Order (EO), many efforts were spun off to secure the supply chain. One of the most prominent is, of course, Sigstore. I looked at Sigstore more than a year ago and was excited about the idea of ephemeral keys. I thought it might solve some common problems with signing, like reducing the blast radius if a signing key is compromised or a signing identity is stolen.

Over the past twelve months, I’ve spent a lot of time working on a secure supply chain for containers at Microsoft and gained a deep knowledge of the use cases and myriad of scenarios. At the same time, Sigstore gained popularity, and more and more companies started using it to secure their container supply chains. I’ve followed the project development and the growth in popularity. In recent weeks, I decided to take another deep look at the technology and evaluate how it will perform against a few core scenarios to secure container images against supply chain attacks.

This will be a three-part series going over the Sigstore experience for signing containers. In the first part, I will look at the experience of signing with existing long-lived keys as well as adding attestations like SBOMs and SLSA provenance documents. In the second part, I will go deeper into the artifacts created during the signing and reverse-engineer their purpose. In the third part, I will look at the signing experience with short-lived keys as well as promoting signatures between registries.

Before that, though, let’s look at some scenarios that I will use to guide my experiment.

Containers’ Supply Chain Scenarios

Every technology implementation (I believe) should start with user scenarios. Signing container images is not a complete scenario but a part of a larger experience. Below are the experiences that I would like to test as part of my experiment. Also, I will do this using the top two cloud vendors – AWS and Azure.

Sign Container Images With Existing Keys Stored in a KMS

In this scenario, I will sign the images with keys that are already stored in my cloud key management systems (AWS KMS or Azure Key Vault). The goal here is to enable enterprises to use existing keys and key management infrastructure for signing. Many enterprises already use this legacy signing scenario for their software, so there is nothing revolutionary here except the additional artifacts.

  1. Build a v1 of a test image
  2. Push the v1 of the test image to a registry
  3. Sign the image with a key stored in a cloud KMS
  4. Generate an SBOM for the container image
  5. Sign the SBOM and push it to the registry
  6. Generate an SLSA provenance attestation
  7. Sign the SLSA provenance attestation and push it to the registry
  8. Pull and validate the SBOM
  9. Pull and validate the SLSA provenance attestation

A note! I will cheat with the SLSA provenance attestations because the SLSA tooling works better in CI/CD pipelines than with manual Docker build commands that I will use for my experiment.

Sign Container Images with Ephemeral Keys from Fulcio

In this scenario, I will test how signing with ephemeral keys (what Sigstore calls keyless signing) improves the security of the containers’ supply chain. The term keyless signing is a bit misleading because keys are still involved in generating the signature; the difference is that the keys are generated on demand by Fulcio and have a short lifespan (I believe 10 minutes or so). I will not generate SBOMs and SLSA provenance attestations for this second scenario, but you can assume that they may also be part of it in a real-life application. Here is what I will do:

  1. Build a v1 of a test image
  2. Push the v1 of the test image to a registry
  3. Sign the image with an ephemeral key
  4. Build a v2 of the test image and repeat steps 2 and 3 for it
  5. Build a v3 of the test image and repeat steps 2 and 3 for it
  6. Invalidate the signature for v2 of the test image

The premise of this scenario is to test a temporary exploit of the pipeline. This is what happened in the SolarWinds supply chain compromise, and I would like to understand how we might be able to use Sigstore to prevent such an attack in the future or how it could reduce the blast radius. I don’t want to invalidate the signatures for v1 and v3 because that would be similar to the traditional signing approach with long-lived keys.

Acquire OSS Container Image and Re-Sign for Internal Use

This is a common scenario that I’ve heard from many customers. They import images from public registries, verify them, scan them, and then want to re-sign them with internal keys before allowing them to be used. So, here is what I will do:

  1. Build an image
  2. Push it to the registry
  3. Sign it with an ephemeral key
  4. Import the image and the signature from one registry (ECR) into another (ACR)
    Those steps will simulate importing an image signed with an ephemeral key from an OSS registry like Docker Hub or GitHub Container Registry.
  5. Sign the image with a key from the cloud KMS
  6. Validate the signature with the cloud KMS certificate

Let’s get started with the experience.

Environment Set Up

To run the commands below, you will need to have AWS and Azure accounts. I have already created container registries and set up asymmetric keys for signing in both cloud vendors. I will not go over the steps for setting those up – you can follow the vendor’s documentation for that. I have also set up AWS and Azure CLIs so I can sign into the registries, run other commands against the registries and retrieve the keys from the command line. Once again, you can follow the vendor’s documentation to do that. Now, let’s go over the steps to set up Sigstore tooling.

Installing Sigstore Tooling

To go over the scenarios above, I will need to install the Cosign and Rekor CLIs. Cosign is used to sign the images and also interacts with Fulcio to obtain the ephemeral keys for signing. Rekor is the transparency log that keeps a record of the signatures done by Cosign using ephemeral keys.

When setting up automation for either signing or signature verification, you will only need to install Cosign. If you need to add or retrieve Rekor records that are not related to signing or attestation, you will need to install the Rekor CLI.

You have several options to install the Cosign CLI; however, the only documented option to install the Rekor CLI is using Golang or building from source (for which you also need Golang). One note: the installation instructions for all Sigstore tools are geared toward Golang developers.
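For reference, here is the Golang-based installation I am referring to. This is a sketch based on the module paths documented for the releases I used; double-check them against the official instructions:

# Install the Rekor CLI (requires Golang)
$ go install github.com/sigstore/rekor/cmd/rekor-cli@latest

# Cosign can be installed the same way if you prefer Golang over the other options
$ go install github.com/sigstore/cosign/cmd/cosign@latest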

The next thing is that on the Sigstore documentation site, I couldn’t find information on how to verify that the Cosign binaries I installed were the ones the Sigstore team produced. And the last thing that I noticed after installing the CLIs is the details I got about the binaries. Running cosign version and rekor-cli version gives the following output.

$ cosign version
  ______   ______        _______. __    _______ .__   __.
 /      | /  __  \      /       ||  |  /  _____||  \ |  |
|  ,----'|  |  |  |    |   (----`|  | |  |  __  |   \|  |
|  |     |  |  |  |     \   \    |  | |  | |_ | |  . `  |
|  `----.|  `--'  | .----)   |   |  | |  |__| | |  |\   |
 \______| \______/  |_______/    |__|  \______| |__| \__|
cosign: A tool for Container Signing, Verification and Storage in an OCI registry.

GitVersion:    1.13.0
GitCommit:     6b9820a68e861c91d07b1d0414d150411b60111f
GitTreeState:  "clean"
BuildDate:     2022-10-07T04:37:47Z
GoVersion:     go1.19.2
Compiler:      gc
Platform:      linux/amd64
$ rekor-cli version
  ____    _____   _  __   ___    ____             ____   _       ___
 |  _ \  | ____| | |/ /  / _ \  |  _ \           / ___| | |     |_ _|
 | |_) | |  _|   | ' /  | | | | | |_) |  _____  | |     | |      | |
 |  _ <  | |___  | . \  | |_| | |  _ <  |_____| | |___  | |___   | |
 |_| \_\ |_____| |_|\_\  \___/  |_| \_\          \____| |_____| |___|
rekor-cli: Rekor CLI

GitVersion:    v0.12.2
GitCommit:     unknown
GitTreeState:  unknown
BuildDate:     unknown
GoVersion:     go1.18.2
Compiler:      gc
Platform:      linux/amd64

The Cosign CLI provides details about the build of the binary; the Rekor CLI does not. Using the above process to install the binaries may seem insecure, but this seems to be by design, as explained in Sigstore Issue #2300: Verify the binary downloads when installing from .deb (or any other binary release).

Here is the catch, though! I looked at the above experience as a novice user going through the Sigstore documentation. Of course, as with any other technical documentation, this one is incomplete and not up to date with the implementation. There is no documentation on how to verify the Cosign binary, but there is documentation describing how to verify Rekor binaries. If you go to the Sigstore GitHub organization and specifically to the Cosign and Rekor release pages, you will see that they’ve published the signatures and the SBOMs for both tools. You will also find binaries for Rekor that you can download. So you can verify the signature of the release binaries before installing. Here is what I did for the Rekor CLI version that I had downloaded:

$ COSIGN_EXPERIMENTAL=1 cosign verify-blob \
    --cert https://github.com/sigstore/rekor/releases/download/v0.12.2/rekor-cli-linux-amd64-keyless.pem \
    --signature https://github.com/sigstore/rekor/releases/download/v0.12.2/rekor-cli-linux-amd64-keyless.sig \
    https://github.com/sigstore/rekor/releases/download/v0.12.2/rekor-cli-linux-amd64

tlog entry verified with uuid: 38665ab8dc42600de87ed9374e86c83ac0d7d11f1a3d1eaf709a8ba0d9a7e781 index: 4228293
Verified OK

Verifying the Cosign binary is trickier, though, because you need to have Cosign already installed to verify it. Here is the output if you already have Cosign installed and you want to move to a newer version:

$ COSIGN_EXPERIMENTAL=1 cosign verify-blob \
    --cert https://github.com/sigstore/cosign/releases/download/v1.13.0/cosign-linux-amd64-keyless.pem \
    --signature https://github.com/sigstore/cosign/releases/download/v1.13.0/cosign-linux-amd64-keyless.sig \
    https://github.com/sigstore/cosign/releases/download/v1.13.0/cosign-linux-amd64

tlog entry verified with uuid: 6f1153edcc399b22b016709a218127fc7d5e9fb7071cd4812a9847bf13f65190 index: 4639787
Verified OK

If you are installing Cosign for the first time and downloading the binaries from the release page, you can follow a process similar to the one for verifying Rekor releases. I have submitted an issue to update the Cosign documentation with release verification instructions.

I would rate the installation experience no worse than any other tool geared toward hardcore engineers.

Let’s get into the scenarios.

Using Cosign to Sign Container Images with a KMS Key

Here are the two images that I will use for the first scenario:

$ docker images
REPOSITORY                                                 TAG       IMAGE ID       CREATED         SIZE
562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample   v1        b40ba874cb57   2 minutes ago   138MB
tsmacrtestcssc.azurecr.io/flasksample                      v1        b40ba874cb57   2 minutes ago   138MB

Using Cosign With a Key Stored in AWS KMS

Let’s go over the AWS experience first.

# Sign into the registry
$ aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 562077019569.dkr.ecr.us-west-2.amazonaws.com
Login Succeeded

# And push the image after that
$ docker push 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1

Signing the container image with the AWS key was relatively easy. Though, be careful when you omit the host: make sure you add that third slash; otherwise, you will get errors. Here is what I got on the first attempt, which puzzled me a little.

$ cosign sign --key awskms://61c124fb-bf47-4f95-a805-65dda7cd08ae 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1
Error: signing [562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1]: getting signer: reading key: kms get: kms specification should be in the format awskms://[ENDPOINT]/[ID/ALIAS/ARN] (endpoint optional)
main.go:62: error during command execution: signing [562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1]: getting signer: reading key: kms get: kms specification should be in the format awskms://[ENDPOINT]/[ID/ALIAS/ARN] (endpoint optional)

$ cosign sign --key awskms://arn:aws:kms:us-west-2:562077019569:key/61c124fb-bf47-4f95-a805-65dda7cd08ae 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1
Warning: Tag used in reference to identify the image. Consider supplying the digest for immutability.
Error: signing [562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1]: recursively signing: signing digest: getting fetching default hash function: getting public key: operation error KMS: GetPublicKey, failed to parse endpoint URL: parse "https://arn:aws:kms:us-west-2:562077019569:key": invalid port ":key" after host
main.go:62: error during command execution: signing [562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1]: recursively signing: signing digest: getting fetching default hash function: getting public key: operation error KMS: GetPublicKey, failed to parse endpoint URL: parse "https://arn:aws:kms:us-west-2:562077019569:key": invalid port ":key" after host

Of course, when I typed the URIs correctly, the image was signed, and the signature got pushed to the registry.

$ cosign sign --key awskms:///61c124fb-bf47-4f95-a805-65dda7cd08ae 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1
Warning: Tag used in reference to identify the image. Consider supplying the digest for immutability.
Pushing signature to: 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample

Interestingly, I didn’t get the tag warning when using the Key ID incorrectly. I got it when I used the ARN incorrectly as well as when I used the Key ID correctly. Also, I struggled to interpret the error messages, which made me wonder about the consistency of the implementation, but I will cover more about that in the conclusions.
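Before moving on to Azure, here is a quick reference for the AWS KMS URI forms that worked for me with the endpoint omitted (the alias form comes from the Cosign help text, and I did not test it, so treat it as an assumption):

# Key ID with the endpoint omitted – note the third slash
awskms:///61c124fb-bf47-4f95-a805-65dda7cd08ae

# Key ARN with the endpoint omitted
awskms:///arn:aws:kms:us-west-2:562077019569:key/61c124fb-bf47-4f95-a805-65dda7cd08ae

# Key alias (untested by me; the alias name here is hypothetical)
awskms:///alias/my-signing-key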

One nice thing was that I was able to copy the Key ID and the Key ARN and directly paste them into the URI without modification. Unfortunately, this was not the case with Azure Key Vault 🙁.

Using Cosign to Sign Container Images With Azure Key Vault Key

According to the Cosign documentation, I had to set three environment variables to use keys stored in Azure Key Vault. It looks as if a service principal is the only authentication option that Cosign has implemented, so I created one and gave it all the necessary permissions to Key Vault. I also set the required environment variables with the service principal credentials.

As I hinted above, my first attempt to sign with a key stored in Azure Key Vault failed. Unlike the AWS experience, copying the key identifier from the Azure Portal and pasting it into the URI (without the https:// part) won’t do the job.

$ cosign sign --key azurekms://tsm-kv-usw3-tst-cssc.vault.azure.net/keys/sigstore-azure-test-key/91ca3fb133614790a51fc9c04bd96890 tsmacrtestcssc.azurecr.io/flasksample:v1
Error: signing [tsmacrtestcssc.azurecr.io/flasksample:v1]: getting signer: reading key: kms get: kms specification should be in the format azurekms://[VAULT_NAME][VAULT_URL]/[KEY_NAME]
main.go:62: error during command execution: signing [tsmacrtestcssc.azurecr.io/flasksample:v1]: getting signer: reading key: kms get: kms specification should be in the format azurekms://[VAULT_NAME][VAULT_URL]/[KEY_NAME]

If you decipher the help text that you get from the error message: kms specification should be in the format azurekms://[VAULT_NAME][VAULT_URL]/[KEY_NAME], you would assume that there are two ways to construct the URI:

  1. Using the key vault name and the key name like this
    azurekms://tsm-kv-usw3-tst-cssc/sigstore-azure-test-key
    The assumption is that Cosign automatically appends .vault.azure.net at the end.
  2. Using the key vault hostname (not URL or identifier) and the key name like this
    azurekms://tsm-kv-usw3-tst-cssc.vault.azure.net/sigstore-azure-test-key

The first one just hung for minutes and did not complete. I’ve tried it several times, but the behavior was consistent.

$ cosign sign --key azurekms://tsm-kv-usw3-tst-cssc/sigstore-azure-test-key tsmacrtestcssc.azurecr.io/flasksample:v1
Warning: Tag used in reference to identify the image. Consider supplying the digest for immutability.
^C
$

I assume the problem is that it tries to connect to a host named tsm-kv-usw3-tst-cssc, but it does not seem to ever time out. The hostname option brought me a step further. It seems that the call to Azure Key Vault was made, and I got the following error:

$ cosign sign --key azurekms://tsm-kv-usw3-tst-cssc.vault.azure.net/sigstore-azure-test-key tsmacrtestcssc.azurecr.io/flasksample:v1
Warning: Tag used in reference to identify the image. Consider supplying the digest for immutability.
Error: signing [tsmacrtestcssc.azurecr.io/flasksample:v1]: recursively signing: signing digest: signing the payload: keyvault.BaseClient#Sign: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="Forbidden" Message="The user, group or application 'appid=04b07795-xxxx-xxxx-xxxx-02f9e1bf7b46;oid=f4650a81-f57d-4fb3-870c-e84fe859f68a;numgroups=1;iss=https://sts.windows.net/08c1c649-bfdd-439e-8e5b-5ff31c72ce4e/' does not have keys sign permission on key vault 'tsm-kv-usw3-tst-cssc;location=westus3'. For help resolving this issue, please see https://go.microsoft.com/fwlink/?linkid=2125287" InnerError={"code":"ForbiddenByPolicy"}
main.go:62: error during command execution: signing [tsmacrtestcssc.azurecr.io/flasksample:v1]: recursively signing: signing digest: signing the payload: keyvault.BaseClient#Sign: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="Forbidden" Message="The user, group or application 'appid=04b07795-8ddb-461a-bbee-02f9e1bf7b46;oid=f4650a81-f57d-4fb3-870c-e84fe859f68a;numgroups=1;iss=https://sts.windows.net/08c1c649-bfdd-439e-8e5b-5ff31c72ce4e/' does not have keys sign permission on key vault 'tsm-kv-usw3-tst-cssc;location=westus3'. For help resolving this issue, please see https://go.microsoft.com/fwlink/?linkid=2125287" InnerError={"code":"ForbiddenByPolicy"}

Now, this was a very surprising error. And mainly because the AppId from the error message (04b07795-xxxx-xxxx-xxxx-02f9e1bf7b46) didn’t match the AppId (or Client ID) of the environment variable that I have set as per the Cosign documentation.

$ echo $AZURE_CLIENT_ID
a59eaa16-xxxx-xxxx-xxxx-dca100533b89

Note that I masked parts of the IDs for privacy reasons.

My first assumption was that the AppId from the error message was for my user account, with which I signed in using the Azure CLI. This assumption turned out to be true. Not knowing the intended behavior, I filed an issue for the Sigstore team to clarify and document the Azure Key Vault authentication behavior. After restarting the terminal (it seems restarting is the norm in today’s software products 😉 ), I was able to move another step forward. Now, having signed in with only the service principal credentials, I got the following error:

$ cosign sign --key azurekms://tsm-kv-usw3-tst-cssc.vault.azure.net/sigstore-azure-test-key tsmacrtestcssc.azurecr.io/flasksample:v1
Warning: Tag used in reference to identify the image. Consider supplying the digest for immutability.
Error: signing [tsmacrtestcssc.azurecr.io/flasksample:v1]: recursively signing: signing digest: signing the payload: keyvault.BaseClient#Sign: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="BadParameter" Message="Key and signing algorithm are incompatible. Key https://tsm-kv-usw3-tst-cssc.vault.azure.net/keys/sigstore-azure-test-key/91ca3fb133614790a51fc9c04bd96890 is of type 'RSA', and algorithm 'ES256' can only be used with a key of type 'EC' or 'EC-HSM'."
main.go:62: error during command execution: signing [tsmacrtestcssc.azurecr.io/flasksample:v1]: recursively signing: signing digest: signing the payload: keyvault.BaseClient#Sign: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="BadParameter" Message="Key and signing algorithm are incompatible. Key https://tsm-kv-usw3-tst-cssc.vault.azure.net/keys/sigstore-azure-test-key/91ca3fb133614790a51fc9c04bd96890 is of type 'RSA', and algorithm 'ES256' can only be used with a key of type 'EC' or 'EC-HSM'."

Apparently, I had generated an incompatible key! Note that RSA keys are not supported by Cosign, as I documented in the following Sigstore documentation issue. After generating a new key, the signing finally succeeded.
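For reference, here is how a compatible key can be created with the Azure CLI. This is a sketch of what I would run to create the EC key used below; Cosign defaults to ES256, which pairs with a P-256 EC key:

# Create an EC P-256 key in the existing key vault
$ az keyvault key create --vault-name tsm-kv-usw3-tst-cssc --name sigstore-azure-test-key-ec --kty EC --curve P-256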

$ cosign sign --key azurekms://tsm-kv-usw3-tst-cssc.vault.azure.net/sigstore-azure-test-key-ec tsmacrtestcssc.azurecr.io/flasksample:v1
Warning: Tag used in reference to identify the image. Consider supplying the digest for immutability.
Pushing signature to: tsmacrtestcssc.azurecr.io/flasksample

OK! I was able to get through the first three steps of Scenario 1: Sign Container Images With Existing Keys Stored in a KMS. Next, I will add some other artifacts to the image – aka attestations. I will use only one of the cloud vendors for that because I don’t expect differences in the experience.

Adding SBOM Attestation With Cosign

Using Syft, I can generate an SBOM for the container image that I have built. Then I can use Cosign to sign and push the SBOM to the registry. Keep in mind that you need to be signed into the registry to generate the SBOM. Below are the steps to generate the SBOM (nothing to do with Cosign). The generated SBOM is also available in my GitHub test repo.

# Sign into AWS ECR
$ aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 562077019569.dkr.ecr.us-west-2.amazonaws.com

# Generate the SBOM
$ syft packages 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1 -o spdx-json > flasksample-v1.spdx

The Cosign CLI’s help shows the following usage for adding an attestation to an image using an AWS KMS key.

cosign attest --predicate <FILE> --type <TYPE> --key awskms://[ENDPOINT]/[ID/ALIAS/ARN] <IMAGE>

When I was running this test, there was no explanation of what the --type <TYPE> parameter was. I decided just to give it a try.

$ cosign attest --predicate flasksample-v1.spdx --type sbom --key awskms:///arn:aws:kms:us-west-2:562077019569:key/61c124fb-bf47-4f95-a805-65dda7cd08ae 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1
Error: signing 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1: invalid predicate type: sbom
main.go:62: error during command execution: signing 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1: invalid predicate type: sbom

Trying spdx-json as a type also didn’t work. There are a couple of places, here and here, where the Cosign documentation talks about custom predicate types, but none of the examples showed how to use the parameter. I decided to give it one last try.

$ cosign attest --predicate flasksample-v1.spdx --type "cosign.sigstore.dev/attestation/v1" --key awskms:///arn:aws:kms:us-west-2:562077019569:key/61c124fb-bf47-4f95-a805-65dda7cd08ae 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1
Error: signing 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1: invalid predicate type: cosign.sigstore.dev/attestation/v1
main.go:62: error during command execution: signing 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1: invalid predicate type: cosign.sigstore.dev/attestation/v1

Obviously, this was not yet documented, and it was not clear what values could be provided for it. Here is the issue asking to clarify the purpose of the --type <TYPE> parameter. From the documentation examples, it seemed that this parameter could be safely omitted. So, I gave it a shot! Running the command without the parameter worked fine and pushed the attestation to the registry.

$ cosign attest --predicate flasksample-v1.spdx --key awskms:///arn:aws:kms:us-west-2:562077019569:key/61c124fb-bf47-4f95-a805-65dda7cd08ae 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1
Using payload from: flasksample-v1.spdx
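If you are curious where these artifacts end up, you can list the repository tags; crane (from the go-containerregistry project) works well for that, and cosign triangulate prints the tag where Cosign stores the signature for an image. A quick sketch, using my repository (substitute your own):

# List all tags in the repository; the .sig and .att tags show up next to the image tags
$ crane ls 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample

# Print the tag under which Cosign stores the signature for a given image
$ cosign triangulate 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1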

One thing that I noticed with the attestation experience is that it pushed a single artifact with .att at the end of the tag. I will come back to this in the next post. Now, let’s push the SLSA attestation for this image.

Adding SLSA Attestation With Cosign

As I mentioned above, I will cheat with the SLSA attestation because I performed all those steps manually and docker build doesn’t generate SLSA provenance. I will use this sample for the SLSA provenance attestation.

$ cosign attest --predicate flasksample-v1.slsa --key awskms:///arn:aws:kms:us-west-2:562077019569:key/61c124fb-bf47-4f95-a805-65dda7cd08ae 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1
Using payload from: flasksample-v1.slsa

Cosign did something, as we can see on the console as well as in the registry – the digest of the .att artifact changed.

The question, though, is what exactly happened?

In the next post of the series, I will go into detail about what is happening behind the scenes, where I will look deeper at the artifacts created by Cosign.

Summary

To summarize my experience so far, here is what I think.

  • As I mentioned above, the installation experience for the tools is no worse than that of any other tool targeted at engineers. Improvements in the documentation would be beneficial for the first-use experience, and I filed a few issues to help with that.
  • Signing with a key stored in AWS was easy and smooth. Unfortunately, the Azure Key Vault implementation followed the same pattern as the AWS one. I think it would be better to follow the conventions of each specific cloud vendor. There is no expectation that every cloud vendor will follow the same naming, URI, and other patterns; changing those may result in more errors than benefits for the user.
  • While Cosign hides a lot of the complexities behind the scenes, providing some visibility into what is happening would be good. For example, if you create a key in Azure Key Vault with the Cosign CLI, it will automatically create a key type that it supports (see the sketch below). That avoids the issue I encountered with the RSA key, but it may not be the main scenario used in the enterprise.
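Here is a minimal sketch of what that looks like; the key name is a hypothetical one, and the same azurekms:// reference can then be passed to cosign sign:

# Let Cosign create a compatible key directly in the Azure Key Vault
$ cosign generate-key-pair --kms azurekms://tsm-kv-usw3-tst-cssc.vault.azure.net/sigstore-azure-test-key-ec2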

Next time, I will spend some time looking at the artifacts created by Cosign and understanding their purpose, as well as how to verify those using Cosign and the keys stored in the KMS.

In my last post, Implementing Quarantine Pattern for Container Images, I wrote about how to implement a quarantine pattern for container images and how to use policies to prevent the deployment of an image that doesn’t meet certain criteria. In that post, I also mentioned that the quarantine flag (not to be confused with the quarantine pattern 🙂) has certain disadvantages. Since then, Steve Lasker has convinced me that the quarantine flag could be useful in certain scenarios. Many of those scenarios are new and will play a role in the containers’ secure supply chain improvements. Before we look at the scenarios, let’s revisit how the quarantine flag works.

What is the Container Image Quarantine Flag?

As you remember from the previous post, the quarantine flag is set on an image at the time the image is pushed to the registry. The expected workflow is shown in the flow diagram below.

The quarantine flag stays set on the image until the Quarantine Processor completes its actions and removes the image from quarantine. We will go into detail about what those actions can be later on in the post. The important thing to remember is that, while in quarantine, the image can be pulled only by the Quarantine Processor. Neither the Publisher, the Consumer, nor any other actor should be able to pull the image from the registry while it is in quarantine. The way this is achieved is through special permissions assigned to the Quarantine Processor that the other actors do not have. Such permissions can be quarantine pull and quarantine push, which should allow pulling artifacts from and pushing artifacts to the registry while the image is in quarantine.

Inside the registry, you will have a mix of images that are in quarantine and images that are not. The quarantined ones can only be pulled by the Quarantine Processor, while others can be pulled by anybody who has access to the registry.

Quarantining images is a capability that needs to be implemented in the container registry. This is not a standard capability, though, and very few, if any, container registries implement it. Azure Container Registry (ACR) has a quarantine feature that is in preview (a sketch of how to enable it follows the list below). As explained in the previous post, the quarantine flag’s limitations are still valid. Mainly, those are:

  • If you need to have more than one Quarantine Processor, you need to figure out a way to synchronize their operations. The Quarantine Processor that completes the last action should remove the quarantine flag.
  • Using asynchronous processing is hard to manage. The Quarantine Processor manages all the actions and changes the flag. If you have an action that requires asynchronous processing, the Quarantine Processor needs to wait for the action to complete in order to evaluate the result and change the flag.
  • Last, you should not set the quarantine flag again once you have removed it. If you do that, you may break a lot of functionality and bring down your workloads. The problem is that you do not have granular control over who can and cannot pull the image, other than giving them the Quarantine Processor role.
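For reference, here is a rough sketch of how the ACR preview quarantine policy could be enabled through the ARM API with the Azure CLI. Treat the subscription, resource group, registry name, and API version as placeholders and assumptions; since the feature is in preview, the exact request may change:

# Enable the quarantine policy on the registry (placeholder values; use the
# api-version documented for the quarantine preview)
$ az rest --method patch \
    --url "https://management.azure.com/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.ContainerRegistry/registries/<registry-name>?api-version=<api-version>" \
    --body '{"properties": {"policies": {"quarantinePolicy": {"status": "enabled"}}}}'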

With all that said, though, if you have a single Quarantine Processor, the quarantine flag can be used to prepare the image for use. This can be very helpful in the secure supply chain scenarios for containers, where the CI/CD pipelines do not only push the images to the registries but also produce additional artifacts related to the images. Let’s look at a new build scenario for container images that you may want to implement.

Quarantining Images in the CI/CD Pipeline

The one place where the quarantine flag can prove useful is in the CI/CD pipeline used to produce a compliant image. Let’s assume that for an enterprise, a compliant image is one that is signed, has an SBOM that is also signed, and passed a vulnerability scan with no CRITICAL or HIGH severity vulnerabilities. Here is the example pipeline that you may want to implement.

In this case, the CI/CD agent is the one that plays the Quarantine Processor role and manages the quarantine flag. As you can see, the quarantine flag is automatically set in step 4 when the image is pushed. Steps 5, 6, 7, and 8 are the different actions performed on the image while it is in quarantine. Until those actions are complete, the image should not be pullable by any consumer. For example, some of those actions, like the vulnerability scan, may take a long time to complete. You don’t want a developer to accidentally pull the image before the vulnerability scan is done. If one of those actions fails for any reason, the image should stay in quarantine as non-compliant.
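To make those steps more concrete, here is a rough shell sketch of what the CI/CD agent could run while the image sits in quarantine. The tool choices (Trivy for the vulnerability scan, Syft for the SBOM, Cosign for signing) and the exact mapping to the numbered steps are my assumptions, and the last step depends on the registry's own quarantine API:

# Scan the quarantined image and fail the pipeline on CRITICAL or HIGH findings
$ trivy image --exit-code 1 --severity CRITICAL,HIGH <registry>/<image>:<tag>

# Generate an SBOM and attach it as a signed attestation
$ syft packages <registry>/<image>:<tag> -o spdx-json > sbom.spdx.json
$ cosign attest --predicate sbom.spdx.json --key <key-ref> <registry>/<image>:<tag>

# Sign the image itself
$ cosign sign --key <key-ref> <registry>/<image>:<tag>

# Finally, remove the quarantine flag through the registry's own (non-OCI) API
# so that regular consumers can pull the image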

Protecting developers from pulling non-compliant images is just one of the scenarios that a quarantine flag can help with. Another one is avoiding triggers for workflows that are known to fail if the image is not compliant.

Using Events to Trigger Image Workflows

Almost every container registry has an eventing mechanism that allows you to trigger workflows based on events in the registry. Typically, you would use the image push event to trigger the deployment of your image for testing or production. In the above case, if your enterprise has a policy for only deploying images with signatures, SBOMs, and vulnerability reports, your deployment will fail if the deployment is triggered right after step 4. The deployment should be triggered after step 9, which will ensure that all the required actions on the image are performed before the deployment starts.

To avoid triggering the deployment prematurely, the image push event should be delayed until after step 9. A separate quarantine push event can be emitted in step 4 and used to trigger actions related to the quarantine of the image. A note of caution here, though! As mentioned previously, synchronizing multiple actors that can act on the quarantine flag can be tricky. If the CI/CD pipeline is your Quarantine Processor, you may feel tempted to use the quarantine push event to trigger some other workflow or long-running action. An example of such an action is asynchronous malware scanning and detonation, which cannot run as part of the CI/CD pipeline. The things to be aware of are:

  • To be able to pull the image, the malware scanner must also have the Quarantine Processor role assigned. This means that you will have more than one concurrent Quarantine Processor acting on the image.
  • The Quarantine Processor that finishes first will either remove the quarantine flag prematurely or need to wait for all other Quarantine Processors to complete. This, of course, adds complexity to managing the concurrency and the various race conditions.

I would strongly suggest having only one Quarantine Processor and managing all activities from it. Otherwise, you can end up with images in inconsistent states that do not meet your compliance criteria.
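As an illustration, ACR webhooks can already be scoped to specific actions, which makes it possible to wire the quarantine workflow and the deployment workflow to different events. Here is a sketch; the registry name, webhook names, and endpoint URIs are placeholders, and the quarantine action is only meaningful on a quarantine-enabled registry:

# Trigger the quarantine workflow only on quarantine events
$ az acr webhook create --registry <registry-name> --name quarantinehook \
    --uri https://example.com/quarantine-workflow --actions quarantine

# Trigger the deployment workflow only on regular image push events
$ az acr webhook create --registry <registry-name> --name pushhook \
    --uri https://example.com/deploy-workflow --actions push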

When Should Events be Fired?

We already mentioned in the previous section the various events you may need to implement in the registry:

  • A quarantine push event is used to trigger workflows that are related to images in quarantine.
  • An image push event is the standard event triggered when an image is pushed to the registry.

Here is a flow diagram of how those events should be fired.

This flow offers a logical sequence of events that can be used to trigger the relevant workflows. The quarantine workflow should be triggered by the quarantine push event, while all other workflows should be triggered by the image push event.

If you look at the current implementation of the quarantine feature in ACR, you will notice that both events are fired if the registry quarantine is not enabled (note that the feature is in preview, and functionality may change in the future). I find this behavior confusing. The reason, albeit philosophical, is simple – if the registry doesn’t support quarantine, then it should not send quarantine push events. The behavior should be consistent with any other registry that doesn’t have quarantine capability, and only the image push event should be fired.

What Data Should the Events Contain?

The consumers of the events should be able to make a decision on how to proceed based on the information in the event. The minimum information that needs to be provided in the event should be:

  • Timestamp
  • Event Type: quarantine or push
  • Repository
  • Image Tag
  • Image SHA
  • Actor

This information will allow the event consumers to subscribe to registry events and properly handle them.
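As an illustration only (this is a hypothetical payload, not the schema of any specific registry), an event carrying that minimum information could look like this:

{
  "timestamp": "2022-10-30T18:25:43Z",
  "eventType": "quarantine",
  "repository": "flasksample",
  "tag": "v1",
  "digest": "sha256:...",
  "actor": "ci-cd-pipeline"
}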

Audit Logging for Quarantined Images

Because we are discussing a secure supply chain for containers, we should also think about traceability. For quarantine-enabled registries, a log message should be added at every point the status of the image is changed. Once again, this is something that needs to be implemented by the registry, and it is not standard behavior. At a minimum, you should log the following information:

  • When the image is put into quarantine (initial push)
    • Timestamp
    • Repository
    • Image Tag
    • Image SHA
    • Actor/Publisher
  • When the image is removed from quarantine (quarantine flag is removed)
    Note: if the image is removed from quarantine, the assumption is that it passed all the quarantine checks.

    • Timestamp
    • Repository
    • Image Tag
    • Image SHA
    • Actor/Quarantine Processor
    • Details
      Details can be free-form or semi-structured data that can be used by other tools in the enterprise.

One question that remains is whether a message should be logged if the quarantine does not pass after all actions are completed by the Quarantine Processor. It would be good to get the complete picture from the registry log and understand why certain images stay in quarantine forever. On the other hand, though, the image doesn’t change its state (it stays in quarantine anyway), and the registry would need to provide an API just to log the message. Because the API to remove the quarantine flag is not a standard OCI registry API, a single API can be provided to both remove the quarantine flag and log the audit message if the quarantine doesn’t pass. The ACR quarantine feature uses the custom ACR API to do both.

Summary

To summarize, if implemented by a registry, the quarantine flag can be useful in preparing the image before allowing its wider use. The quarantine activities on the image should be done by a single Quarantine Processor to avoid concurrency and inconsistencies in the registry. The quarantine flag should be used only during the initial setup of the image before it is released for wider use. Reverting to a quarantine state after the image is published for wider use can be dangerous due to the lack of granularity for actor permissions. Customized policies should continue to be used for images that are published for wider use.

While working on a process of improving the container secure supply chain, I often need to go over the current challenges of patching container vulnerabilities. With the introduction of Automatic VM Patching, having those conversations is even more challenging because there is always the question: “Why can’t we patch containers the same way we patch VMs?” Really, why can’t we? First, let’s look at how VM and container workloads differ.

How do VM and Container Workloads Differ?

VM-based applications are considered legacy applications, and VMs fall under the category of Infrastructure-as-a-Service (IaaS) compute services. One of the main characteristics of IaaS compute services is the persistent local storage that can be used to save data on the VM. Typically, the way you use VMs for your application is as follows:

  • You choose a VM image from the cloud vendor’s catalog. The VM image specifies the OS and its version you want to run on the VM.
  • You create the VM from that image and specify the size of the VM. The size includes the vCPUs, memory, and persistent storage to be used by the VM.
  • You install the additional software you need for your application on the VM.

From this point onward, the VM workload state is saved to the persistent storage attached to the VM. Any changes to the OS (like patches) are also committed to the persistent storage, and next time the VM workload needs to be spun up, those are loaded from there. Here are the things to remember for VM-based workloads:

  • VM image is used only once when the VM workload is created.
  • Changes to the VM workload are saved to the persistent storage; the next time the VM is started, those changes are automatically loaded.
  • If a VM workload is moved to a different hardware, the changes will still be loaded from the persistent storage.

How do containers differ, though?

Whenever a new container workload is started, the container image is used to create the container (similar to the VM). If the container workload is stopped and started on the same VM or hardware, any changes to the container will also be automatically loaded. However, because orchestrators do not know whether the new workload will end up on the same VM or hardware (due to resource constraints), they do not stop the containers but destroy them, and if a new one needs to be spun up, they use the container image again to create it.

That is a major distinction between VMs and containers. While the VM image is used only once when the VM workload is created, the container images are used repeatedly to re-create the container workload when it moves from one place to another or when capacity is increased. Thus, when a VM is patched, the patches are saved to the VM’s persistent storage, while container patches need to be available in the container image for the workloads to always be patched.

The bottom line is, unlike VMs, when you think of how to patch containers, you should target improvements in updating the container images.

A Timeline of a Container Image Patch

For this example, we will assume that we have an internal machine learning team that builds their application image using python:3.10-bullseye as a base image. We will concentrate on the timelines for fixing the OpenSSL vulnerabilities CVE-2022-0778 and CVE-2022-1292. The internal application team’s dependency chain is OpenSSL <– Debian <– Python. Those are all Open Source Software (OSS) projects driven by their respective communities. Here is the timeline of fixes for those vulnerabilities by the OSS community.

2022-03-08: python:3.10.2-bullseye Released

Python publishes the python:3.10.2-bullseye container image. This is the last Python image before the CVE-2022-0778 OpenSSL vulnerability was fixed.

2022-03-15: OpenSSL CVE-2022-0778 Fixed

OpenSSL publishes fix for CVE-2022-0778 impacting versions 1.0.2 – 1.0.2zc, 1.1.1 – 1.1.1m, and 3.0.0 – 3.0.1.

2022-03-16: debian:bullseye-20220316 Released

Debian publishes  debian:bullseye-20220316 container image that includes a fix for CVE-2022-0778.

2022-03-18: python:3.10.3-bullseye Released

Python publishes  python:3.10.3-bullseye container image that includes a fix for CVE-2022-0778.

2022-05-03: OpenSSL CVE-2022-1292 Fixed

OpenSSL publishes fix for CVE-2022-1292 impacting versions 1.0.2 – 1.0.2zd, 1.1.1 – 1.1.1n, and 3.0.0 – 3.0.2.

2022-05-09: debian:bullseye-20220509 Released

Debian publishes the debian:bullseye-20220509 container image, which DOES NOT include a fix for CVE-2022-1292.

2022-05-27: debian:bullseye-20220527 Released

Debian publishes  debian:bullseye-20220527 container image that includes a fix for CVE-2022-1292.

2022-06-02: python:3.10.4-bullseye Released

Python publishes  python:3.10.4-bullseye container image that includes a fix for CVE-2022-1292.

There are a few important things to notice in this timeline:

  • CVE-2022-0778 was fixed in the whole chain within three days only.
  • In comparison, CVE-2022-1292 took 30 days to fix in the whole chain.
  • Also, in the case of CVE-2022-1292, Debian released a container image after the fix from OpenSSL was available, but that image DID NOT contain the fix.

The bottom line is:

  • Timelines for fixes by the OSS communities are unpredictable.
  • The latest releases of container images do not necessarily contain the latest software patches.

SLAs and the Typical Process for Fixing Container Vulnerabilities

The typical process teams use to fix vulnerabilities in container images is waiting for the fixes to appear in the upstream images. In our example, the machine learning team must wait for the fixes to appear in the python:3.10-bullseye image first, then rebuild their application image, test the new image, and re-deploy to their production workloads if the tests pass. Let’s call this process wait-rebuild-test-redeploy (or WRTR if you like acronyms 🙂).

The majority of enterprises have established SLAs for fixing vulnerabilities. For those that have not yet established such SLAs, things will soon change due to the Executive Order for Improving the Nation’s Cybersecurity. Many enterprises model their patching processes on the FedRAMP 30/90/180 rules specified in the FedRAMP Continuous Monitoring Strategy Guide. According to the FedRAMP rules, high severity vulnerabilities must be remediated within 30 days. CISA’s Operational Directive for Reducing the Risk of Known Exploited Vulnerabilities has much more stringent timelines of two weeks for vulnerabilities published in CISA’s Known Exploited Vulnerabilities Catalog.

Let’s see how the timelines for patching the abovementioned OpenSSL vulnerabilities fit into those SLAs for the machine learning team using the typical process for patching containers.

CVE-2022-0778 was published on March 15th, 2022. It is a high severity vulnerability, and according to the FedRAMP guidelines, the machine learning team has till April 14th, 2022, to fix the vulnerability in their application image. Considering that the python:3.10.3-bullseye image was published on March 18th, 2022, the machine learning team has 27 days to rebuild, test, and redeploy the image. This sounds like a reasonable time for those activities. Luckily, CVE-2022-0778 is not in CISA’s catalog, but the team would still have had 11 days for those activities if it were.

The picture with CVE-2022-1292 does not look so good, though. The vulnerability was published on May 3rd, 2022. It is a critical severity vulnerability, and according to the FedRAMP guidelines, the machine learning team has till June 2nd, 2022, to fix the vulnerability. Unfortunately, though, the python:3.10.4-bullseye image was published on June 2nd, 2022. This means that the team needs to do the rebuild, testing, and redeployment on the same day the community published the image. Either the team needs to be very efficient with their processes or work around the clock that day to complete all the activities (after hoping the community publishes a fix for the Python image before the SLA deadline). That is a very unrealistic expectation and also impacts the team’s morale. If, by any chance, the vulnerability had appeared in CISA’s catalog (which luckily it did not), the team would not have been able to fix it within the two-week SLA.

That proves that the wait-rebuild-test-redeploy (WRTR) process is ineffective in meeting the SLAs for fixing vulnerabilities in container images. But, what can you currently do to improve this and take control of the timelines?

Using Multi-Stage Builds to Fix Container Vulnerabilities

Until container technology evolves and a more declarative way to patch container images becomes available, teams can use multi-stage builds to build their application images and fix the base image vulnerabilities. This is easily done in the CI/CD pipeline. This approach also allows teams to control the timelines for vulnerability fixes and meet their SLAs. Here is an example of how you can address the patching issue from the scenario above:

# Stage 1: update the base image with the latest available patches
FROM python:3.10.2-bullseye as baseimage

RUN apt-get update; \
     apt-get upgrade -y

# Create a non-root user to run the application
RUN adduser appuser

# Stage 2: build and run the application on top of the patched base image
FROM baseimage

USER appuser

WORKDIR /app

CMD [ "python", "--version" ]

In the above Dockerfile, the first stage of the build updates the base image with the latest patches. The second stage builds the application and runs it with the appropriate user permissions. Using this approach, you avoid the wait part of the WRTR process above, and you can always meet your SLAs with a simple rebuild of the image.
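To confirm that a rebuild actually picked up the patched packages, you can check the installed OpenSSL version inside the resulting image. A quick sketch, assuming the Dockerfile above is in the current directory and the image is tagged flasksample:v1-patched:

# Rebuild the image and query the installed OpenSSL package version
$ docker build -t flasksample:v1-patched .
$ docker run --rm flasksample:v1-patched dpkg -s openssl | grep Version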

Of course, this approach also has drawbacks. One of its biggest issues is the level of control teams have over what patches are applied. Another one is that some teams do not want to include layers in their images that do not belong to the application (i.e., modifying the base image layers). Those are all topics for another post 🙂


With the recent Solorigate incident, a lot of emphasis is placed on determining the origin of the software running in an enterprise. For Docker container images, this can mean embedding in the image the Dockerfile the image was built from. However, tracking down the origin of software is not trivial. For closed-source software, we blindly trust the vendors and, if we are lucky enough, may get a signed piece of code. For open-source software, we rarely check the SHA signature and never even think of verifying what source code a binary was produced from. In talks with customers, I quite often hear them ask how they can verify what sources a container image was built from. They want to attribute each image with metadata that links to the Dockerfile used to build the image as well as the Git commit and the developer who triggered the build.

There are many articles that discuss this problem. Here are two recent examples. Richard Lander from the Microsoft .NET team writes in his blog post Staying safe with .NET containers about the pedigree and provenance of the software we run and how to think about it. Josh Hendrick in his post Embedding source code version information in Docker images offers one solution to the problem.

Josh Hendrick’s proposal is in the direction I would go, but one problem I have with it is that it requires special handling in the application that runs in the container to obtain this information. I would prefer to have this information readily available without the need to run the container image. Docker images and the Open Container Initiative already specify ways to do that without adding special files to your image. In this post, I will outline another way you can embed this information into your images and easily retrieve it without any changes to your application.

Using Docker Image Labels

The Docker image spec already has built-in functionality to add labels to the image. Labels are intended to be set during build time. They also show up when inspecting the image using docker image inspect, which makes them the right choice for specifying the Dockerfile and the other build origin details. One more argument that makes them the right choice for this information is that the labels are layers in the image, and thus immutable. If you change a label in an image, the resulting image SHA will change.

To demonstrate how labels can be used to embed the Dockerfile and other origin information into the Docker image, I have published a dynamic labels sample on GitHub. The sample uses a base Python image and implements a simple functionality to print the container’s environment variables. Let’s walk through it step by step.

The Dockerfile is quite simple.

FROM python:slim
ARG IMAGE_COMMITTER
ARG IMAGE_DOCKERFILE
ARG IMAGE_COMMIT_SHA
LABEL "build.user"=${IMAGE_COMMITTER}
LABEL "build.sha"=${IMAGE_COMMIT_SHA}
LABEL "build.dockerfile"=${IMAGE_DOCKERFILE}
ADD ./samples/dynamic-labels/source /
CMD ["python", "/show_environment.py"]

Lines 2-4 define the build arguments that need to be set during the build of the image. Lines 5-7 set the three labels build.user, build.sha, and build.dockerfile that we want to embed in the image. build.dockerfile is the URL to the Dockerfile in the GitHub repository, while build.sha is the Git commit that triggered the build. If you build the image locally with some dummy build arguments, you will see that new layers are created for each of lines 5-7.

toddysm@MacBook-Pro ~ % docker build -t test --build-arg IMAGE_COMMITTER=toddysm --build-arg IMAGE_DOCKERFILE=https://test.com --build-arg IMAGE_COMMIT_SHA=12345 -f ./samples/dynamic-labels/Dockerfile .
Sending build context to Docker daemon  376.3kB
Step 1/9 : FROM python:slim
 ---> 8c84baace4b3
Step 2/9 : ARG IMAGE_COMMITTER
 ---> Running in 71ad05f20d20
Removing intermediate container 71ad05f20d20
 ---> fe56c62b9903
Step 3/9 : ARG IMAGE_DOCKERFILE
 ---> Running in fe468c44e9fc
Removing intermediate container fe468c44e9fc
 ---> b776dca57bd7
Step 4/9 : ARG IMAGE_COMMIT_SHA
 ---> Running in 849a82225c31
Removing intermediate container 849a82225c31
 ---> 3a4c6c23a699
Step 5/9 : LABEL "build.user"=${IMAGE_COMMITTER}
 ---> Running in fd4bfb8d5b5b
Removing intermediate container fd4bfb8d5b5b
 ---> 2e9be17c48ff
Step 6/9 : LABEL "build.sha"=${IMAGE_COMMIT_SHA}
 ---> Running in 892323d73495
Removing intermediate container 892323d73495
 ---> b7bc6559629d
Step 7/9 : LABEL "build.dockerfile"=${IMAGE_DOCKERFILE}
 ---> Running in 98687b8dd9fb
Removing intermediate container 98687b8dd9fb
 ---> 35e97d273cbc
Step 8/9 : ADD ./samples/dynamic-labels/source /
 ---> 9e71859892b1
Step 9/9 : CMD ["python", "/show_environment.py"]
 ---> Running in 366b1b6c3bea
Removing intermediate container 366b1b6c3bea
 ---> e7cb39a21c2a
Successfully built e7cb39a21c2a
Successfully tagged test:latest

You can inspect the image and see the labels by issuing the command docker image inspect --format='{{json .Config.Labels}}' <imagename>.

toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' test | jq
{
  "build.dockerfile":"https://test.com",
  "build.sha":"12345",
  "build.user":"toddysm"
}

Now, let’s automate the process with the help of GitHub Actions. I have created one GitHub Action to build and push the image to DockerHub and another to build and push to Azure Container Registry (ACR). Both actions are similar in the steps they use. The first two steps are the same for both actions. They will build the URL to the Dockerfile using the corresponding GitHub Actions variables:

- name: 'Set environment variable for Dockerfile URL for push'
  if: ${{ github.event_name == 'push' }}
  run: echo "DOCKERFILE_URL=${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/blob/${GITHUB_REF#refs/*/}/samples/dynamic-labels/Dockerfile" >> $GITHUB_ENV

- name: 'Set environment variable for Dockerfile URL for pull request'
  if: ${{ github.event_name == 'pull_request' }}
  run: echo "DOCKERFILE_URL=${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/blob/${GITHUB_BASE_REF#refs/*/}/samples/dynamic-labels/Dockerfile" >> $GITHUB_ENV

Then, there will be specific steps to sign into DockerHub or Azure. After that, the build steps are the ones where the labels are set. Here, for example, is the build step that uses buildx and automatically pushes the image to DockerHub:

- name: Build and push
  id: docker_build
  uses: docker/build-push-action@v2
  with:
    context: ./
    file: ./samples/dynamic-labels/Dockerfile
    push: true
    tags: ${{ secrets.DOCKER_HUB_REPONAME }}:build-${{ github.run_number }}
    build-args: |
      IMAGE_COMMITTER=${{ github.actor }}
      IMAGE_DOCKERFILE=${{ env.DOCKERFILE_URL }}
      IMAGE_COMMIT_SHA=${{ github.sha }}

The build step for building the image and pushing to Azure Container Registry uses the traditional docker build approach:

- name: Build and push
  id: docker_build
  uses: azure/docker-login@v1
  with:
    login-server: ${{ secrets.ACR_REGISTRY_LOGIN_SERVER }}
    username: ${{ secrets.ACR_REGISTRY_USERNAME }}
    password: ${{ secrets.ACR_REGISTRY_PASSWORD }}
- run: |
    docker build -f ./samples/dynamic-labels/Dockerfile -t ${{ secrets.ACR_REGISTRY_LOGIN_SERVER }}/${{ secrets.ACR_REPOSITORY_NAME }}:build-${{ github.run_number }} --build-arg IMAGE_COMMITTER=${{ github.actor }} --build-arg IMAGE_DOCKERFILE=${{ env.DOCKERFILE_URL }} --build-arg IMAGE_COMMIT_SHA=${{ github.sha }} .
    docker push ${{ secrets.ACR_REGISTRY_LOGIN_SERVER }}/${{ secrets.ACR_REPOSITORY_NAME }}:build-${{ github.run_number }}

After the actions complete, the images are available in DockerHub and Azure Container Registry. Here is how the image looks in DockerHub:

Docker container image with labels

If you scroll down a little, you will see the labels that appear in the list of layers:

The URL points you to the Dockerfile that was used to create the image, while the commit SHA can be used to identify the latest changes made to the project used to build the image. If you pull the image locally, you can also see the labels using the following command:

toddysm@MacBook-Pro ~ % docker pull toddysm/tmstests:build-36
build-36: Pulling from toddysm/tmstests
45b42c59be33: Already exists
8cd3485318db: Already exists
2f564129f025: Pull complete
cf1573f5a21e: Pull complete
ceec8aed2dab: Pull complete
78b1088f77a0: Pull complete
Digest: sha256:7862c2a31970916fd50d3ab38de0dad74a180374d41625f014341c90c4b55758
Status: Downloaded newer image for toddysm/tmstests:build-36
docker.io/toddysm/tmstests:build-36
toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' toddysm/tmstests:build-36
{
  "build.dockerfile":"https://github.com/CrimsonPinnacle/container-image-inspector/blob/development/samples/dynamic-labels/Dockerfile",
  "build.sha":"e80e6ef86f86a11d6a73aea8d8c41700c4d3d7c5",
  "build.user":"toddysm"
}

To summarize, the benefit of using labels for embedding the Dockerfile and other origin information into the container images is that those are considered immutable layers of the image. Thus, they cannot be changed without changing the image.
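You can see this immutability for yourself by building the sample twice with different build arguments and comparing the image IDs. A quick sketch using the same dummy arguments as above (alice and bob are made-up values):

$ docker build -t test-alice --build-arg IMAGE_COMMITTER=alice --build-arg IMAGE_DOCKERFILE=https://test.com --build-arg IMAGE_COMMIT_SHA=12345 -f ./samples/dynamic-labels/Dockerfile .
$ docker build -t test-bob --build-arg IMAGE_COMMITTER=bob --build-arg IMAGE_DOCKERFILE=https://test.com --build-arg IMAGE_COMMIT_SHA=12345 -f ./samples/dynamic-labels/Dockerfile .
# The two IDs differ because the build.user label value differs
$ docker image inspect --format='{{.Id}}' test-alice test-bob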

Who is Using Docker Image Labels?

Unfortunately, labels are not widely used, if at all 🙁 Checking several popular images from DockerHub yields the following results:

toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' busybox | jq
null
toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' alpine | jq 
null
toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' ubuntu | jq 
null

Tracking down the sources from which the Alpine image is built would require a much higher effort.

What is Next for Checking Docker Image Origins?

There are a couple of community initiatives that will play a role in determining the origin of container images.

  • Notary V2 will allow images to be signed. Having the origin information embedded into the image and adding an official signature to the image will increase the confidence in the legitimacy of the image.
  • The OCI manifest specification allows artifacts (i.e., images) to be annotated with arbitrary metadata. Unfortunately, Docker doesn’t support those annotations yet. Hopefully, in the future, Docker will add support for arbitrary metadata that can be included in the image manifest.
  • An implementation of a metadata service (see the metadata service draft from Steve Lasker) as part of the registry will enable additional capabilities for providing origin information for the images.

Summary

While image metadata is great for annotating images with useful information and enabling search and querying capabilities, that metadata is kept outside of the image layers and can mutate over time. Verifying the authenticity of the metadata and keeping a history of the changes is a harder problem to solve. Docker already provides a way to embed the Dockerfile and other image origin information as immutable layers of the image itself. Using dynamically populated Docker image labels, developers can, right now, provide origin information and increase the supply chain confidence for their images.