For a while, we’ve been exploring the idea of using OCI annotations to track the lifecycle of container images. The problem we are trying to solve is as follows. Container images are immutable and cannot be dynamically patched like virtual machines. To apply the latest updates to a containerized application, teams must produce a new image with the patches. Once the new image is produced, the old one should be considered outdated and vulnerable, and all workloads using the old image should be redeployed with the new one.

The problem is how to mark the old image as outdated. Why? Because teams may have pinned their deployments to a digest or an immutable tag, and we want them to move to the patched version. We also want to create policies that outdated images should not be deployed. Finally, we want an automated way to point the teams to the latest patched image. Unfortunately, using digests and tags prevents us from achieving those goals.

Container Image Lifecycle Example

Here is a concrete example. I will use semantic versioning for the tags to make the example more relatable.

  • First Revision: Build the application image ghcr.io/toddysm/flasksample:1.0
    1.0 in this example is a rolling tag. To differentiate between images, I may want to use an immutable tag; for example, 1.0-20230707, using the YYYYMMDD format for the date. This image has the digest sha256:1234567890.
  • Second Revision: New vulnerabilities are found in ghcr.io/toddysm/flasksample:1.0 and I rebuild the image and tag it using the same 1.0 rolling tag.
    I also tag it with an immutable tag like 1.0-20230710. This image has a different digest, sha256:5647382910.

At this moment, I have two images for the same application from the same lineage 1.0. Here is the relation between tags and digests:

  • Tags 1.0 and 1.0-20230710 point to digest sha256:5647382910
  • Tag 1.0-20230707 points to digest sha256:1234567890

If I have pinned my deployment to the tag 1.0-20230707 or the digest sha256:1234567890, I do not know whether the image is still fresh or has a newer patched version. I could try interpreting the tags, but this will be very specific to my tagging scheme. For example, my tagging scheme 1.0-YYYYMMDD differs from Python’s, NodeJS’, Alpine’s, or Ubuntu’s scheme (well, Ubuntu’s is very similar :)). Also, obtaining the tag from the image digest is not possible.

The idea is to use OCI annotations to add additional information to the images. This information can help us communicate the deprecation of images and preserve vital information lost when retagging. We are not the only ones thinking in this direction – the folks from Ubuntu also want to use annotations to deprecate images, although their goal is a bit different.

OCI Annotations for Image Lifecycle

OCI annotations are key-value pairs that you can add to the manifest of any OCI artifact, including container images. The problem is that annotations cannot be changed once the manifest is created. Hence, if you add the OCI annotations to the image manifest, you cannot update them anymore. The workaround is to use the OCI referrers capability and add a new artifact with annotations that is linked to the image. In essence, whenever you need to update an annotation, you must push a new artifact with the full set of annotations and link it to the image. The consumer of the annotations needs to list all referrer artifacts with the “annotation” type and take the latest one.

The other question that comes to mind is: “What annotations will help you track the image lifecycle?” OCI already specifies some pre-defined keys that you can leverage. There are three important ones that will help you manage the lifecycle:

  • org.opencontainers.image.created can be used to specify the date at which the image was created. For example, when the image is built.
  • org.opencontainers.image.version can be used to specify the version of the software. Think of this as the lineage of the software (i.e. Python 3.10 or Ubuntu Jammy).
  • org.opencontainers.image.revision can be used to specify the patch version of the software. For example, Python 3.10.12 or Ubuntu Jammy 20230624.

One thing that OCI does not specify is an annotation for the end-of-life of the image. For that, you can use a custom annotation like vnd.myorganization.image.end-of-life.

How to Use Annotations for Image Lifecycle?

Let’s look at how the above annotations can be used to manage the lifecycle of a series of images. I will use concrete dates for the example.

First Revision of the Image: Build the application image ghcr.io/toddysm/flasksample:1.0. As part of the build process, add the following annotations to the image:

{
    "org.opencontainers.image.created" : "2023-07-07T00:00:00-08:00",
    "org.opencontainers.image.version" : "1.0",
    "org.opencontainers.image.revision" : "20230707"
}

Second Revision of the Image: Vulnerabilities are discovered in the ghcr.io/toddysm/flasksample:1.0 image. Rebuild the image with the fixes and add the following annotations to it:

{
    "org.opencontainers.image.created" : "2023-07-10T00:00:00-08:00",
    "org.opencontainers.image.version" : "1.0",
    "org.opencontainers.image.revision" : "20230710"
}

Find the previous image in the registry ghcr.io/toddysm/flasksample:1.0-20230707 and update the annotations to the following:

{
    "org.opencontainers.image.created" : "2023-07-07T00:00:00-08:00",
    "org.opencontainers.image.version" : "1.0",
    "org.opencontainers.image.revision" : "20230707",
    "vnd.myorganization.image.end-of-life" : "2023-07-10T00:00:00-08:00"
}

With those annotations in place, you can track the lifecycle of each image. But not only that! You can always determine the latest and most up-to-date image in the lineage by just pulling the 1.0 tag (which is available in the org.opencontainers.image.version annotation for every image in the lineage). Both the mutable and the immutable tags are preserved with the image. New annotations can always be added to the image by adding “empty” referrer artifacts with just the annotations. Also, annotations for the images are available even if the image is pulled by its digest.
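
For illustration, here is a minimal sketch of how a consumer pinned to a digest could check for a newer image in the same lineage. It uses ORAS and jq like the walkthrough below; the pinned digest is a placeholder, and the lineage annotation is assumed to have been set at build time:

export PINNED_DIGEST=sha256:1234567890   # placeholder digest of the pinned image

# Read the lineage from the pinned image's manifest annotations...
export LINEAGE=`oras manifest fetch ghcr.io/toddysm/flasksample@${PINNED_DIGEST} \
  | jq -r '.annotations."org.opencontainers.image.version"'`

# Resolve the digest that the rolling lineage tag currently points to...
export LATEST_DIGEST=`oras manifest fetch --descriptor ghcr.io/toddysm/flasksample:${LINEAGE} \
  | jq -r .digest`

# If the digests differ, a newer patched image exists in the lineage
[ "${PINNED_DIGEST}" != "${LATEST_DIGEST}" ] \
  && echo "Newer image available: ghcr.io/toddysm/flasksample@${LATEST_DIGEST}"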

Here are a couple of scenarios that you can implement with this information.

  • Block deployment of images that are end-of-life
    You can implement a policy to block the deployment of end-of-life images on your Kubernetes clusters. Such a policy can be implemented in admission controllers like Kyverno or Gatekeeper. A minimal sketch of the underlying check follows after this list.
  • Suggest updated image in action items in vulnerability reports
    Current vulnerability reports for container images are hardly actionable because they do not provide an update path for the reported images. Development teams are not interested in how many vulnerabilities were discovered but in the quickest way to fix them. Lifecycle annotations allow the report to point directly to the latest patched image in the lineage.
  • Automate the process for rebuilding dependent images
    Tools like Dependabot can use the image lifecycle information to create pull requests for dependent images. This can speed up the process of fixing vulnerabilities and improving the vulnerability posture for the application.
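
To make the first scenario more concrete, here is a minimal pre-deployment check that could run in CI or behind an admission webhook. This is only a sketch under the assumptions of this post – the end-of-life key is the custom annotation proposed above, GNU date is assumed, and only the image manifest is inspected (the referrer lookup is shown later in the post):

#!/bin/bash
# Deny deployment if the image is marked end-of-life with a date in the past.
IMAGE_REF=$1   # e.g. ghcr.io/toddysm/flasksample@sha256:...

EOL=`oras manifest fetch ${IMAGE_REF} \
  | jq -r '.annotations."vnd.myorganization.image.end-of-life" // empty'`

if [ -n "${EOL}" ] && [ "$(date -u +%s)" -ge "$(date -u -d "${EOL}" +%s)" ]; then
  echo "DENY: ${IMAGE_REF} reached end-of-life on ${EOL}"
  exit 1
fi

echo "ALLOW: ${IMAGE_REF} is not marked end-of-life"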

All that sounds great, but the problem is the tooling support. While OCI specifies how you can store artifacts in registries and defines some standard annotations, very few, if any, tools allow you to easily achieve the above experience. I took it upon myself to give it a try!

Implementing Image Lifecycle Annotations with Existing Tools

Note: The walkthrough below uses Docker buildx, ORAS, and GitHub Container Registry (GHCR). It reflects the behavior of the tools as of July 10th, 2023, and may (or most certainly will) change. Other tools like regctl can also be used instead of ORAS. As always, I will use my cssc-pipeline repository for storing any code for this blog post.

First, I will set some environment variables to avoid retyping and make the commands easier to follow.

export TEMP_LOCATION=temp
export IMAGE_VERSION=1.0
export FIRST_REVISION=20230707
export SECOND_REVISION=20230710
export REGISTRY=ghcr.io/toddysm/cssc-pipeline
export REPOSITORY=flasksample
mkdir -p $TEMP_LOCATION

Building the First Revision of the Image

The first step is to build the first revision of the image. Docker buildx can build the image, but the default option is to save the image in Docker’s proprietary format, which doesn’t allow the use of annotations. To use annotations, you must use the OCI exporter and save the image as a tarball. Here is the command that will allow you to do that:

docker buildx build . -f Dockerfile \
  -t ${REGISTRY}/${REPOSITORY}:${IMAGE_VERSION} \
  -o "type=oci,dest=${TEMP_LOCATION}/flasksample-${IMAGE_VERSION}-${FIRST_REVISION}.tar,annotation.org.opencontainers.image.created=20230707T00:00-08:00,annotation.org.opencontainers.image.version=${IMAGE_VERSION},annotation.org.opencontainers.image.revision=${FIRST_REVISION}" \
  --metadata-file ${TEMP_LOCATION}/${REPOSITORY}-${IMAGE_VERSION}-${FIRST_REVISION}-metadata.json

The command above creates an image in OCI format and saves it as a tarball. We can use the generated metadata file to obtain the image’s digest.

export FIRST_REVISION_DIGEST=`cat ${TEMP_LOCATION}/${REPOSITORY}-${IMAGE_VERSION}-${FIRST_REVISION}-metadata.json \
  | jq -r '."containerimage.descriptor".digest'`
echo $FIRST_REVISION_DIGEST

In my case, the digest is sha256:1446094f076dcbc2b7e7943ae3806bb44003ee9e6c94efd3208b8f04159aa8c0.

Next, I will use the following ORAS command to push the OCI image to the GHCR registry:

oras cp --from-oci-layout ${TEMP_LOCATION}/${REPOSITORY}-${IMAGE_VERSION}-${FIRST_REVISION}.tar:${IMAGE_VERSION} \
  ${REGISTRY}/${REPOSITORY}:${IMAGE_VERSION}

I can verify that the annotations are set on the image by pulling the manifest and checking the annotations field:

oras manifest fetch ${REGISTRY}/${REPOSITORY}:${IMAGE_VERSION} \
  | jq .annotations

At this point, I have the first revision of the image built and published to GHCR under ghcr.io/toddysm/cssc-pipeline/flasksample:1.0.

Building the Second Revision of the Image

A few days later, if vulnerabilities are discovered in the image, I need to update the image with the latest patches. As part of the build process, I also need to update the annotations of the previous image revision.

The first thing I need to do is to obtain the digest of the first revision. Because the first revision is still tagged with 1.0, I can quickly get the digest using the following command:

export OLD_IMAGE_DIGEST=`oras manifest fetch --descriptor ${REGISTRY}/${REPOSITORY}:${IMAGE_VERSION} \
  | jq .digest | tr -d '"'`
echo $OLD_IMAGE_DIGEST

That command returns the same digest as before sha256:1446094f076dcbc2b7e7943ae3806bb44003ee9e6c94efd3208b8f04159aa8c0. Now, I have a unique reference to the first revision of the image. I can build the second revision of the image using the same commands as before.

# Build the second revision of the container image with annotations...
docker buildx build . -f Dockerfile \
  -t ${REGISTRY}/${REPOSITORY}:${IMAGE_VERSION} \
  -o "type=oci,dest=${TEMP_LOCATION}/${REPOSITORY}-${IMAGE_VERSION}-${SECOND_REVISION}.tar,annotation.org.opencontainers.image.created=20230710T00:00-08:00,annotation.org.opencontainers.image.version=${IMAGE_VERSION},annotation.org.opencontainers.image.revision=${SECOND_REVISION}" \
  --metadata-file ${TEMP_LOCATION}/${REPOSITORY}-${IMAGE_VERSION}-${SECOND_REVISION}-metadata.json

# Get the digest for the second revision...
export SECOND_REVISION_DIGEST=`cat ${TEMP_LOCATION}/${REPOSITORY}-${IMAGE_VERSION}-${SECOND_REVISION}-metadata.json \
  | jq -r '."containerimage.descriptor".digest'`
echo $SECOND_REVISION_DIGEST

# Use ORAS to push the second revision to the registry...
oras cp --from-oci-layout ${TEMP_LOCATION}/${REPOSITORY}-${IMAGE_VERSION}-${SECOND_REVISION}.tar:${IMAGE_VERSION} \
  ${REGISTRY}/${REPOSITORY}:${IMAGE_VERSION}

# Use ORAS to verify the annotations are set on the image...
oras manifest fetch ${REGISTRY}/${REPOSITORY}:${IMAGE_VERSION} \
  | jq .annotations

You can check that the second revision digest is different from the first revision using the following commands:

export IMAGE_DIGEST=`oras manifest fetch --descriptor ${REGISTRY}/${REPOSITORY}:${IMAGE_VERSION} \
  | jq .digest | tr -d '"'`
echo $IMAGE_DIGEST

For me, the digest of the second revision (or the most up-to-date image) is sha256:4ee61e3d9d28fe15cffc33854a2b851e2c87929a99f0c71bfc7a689ad372894d.

Updating the Lifecycle Annotations for the First Revision of the Image

This is the most important step of the process – I need to go back and update the lifecycle annotations for the first revision of the image and mark it as end-of-life. This is a slightly trickier process because the manifest of the original image cannot be modified – it is immutable! I need to create a referrer artifact and store the lifecycle annotations in the manifest of this referrer artifact. However, the referrer artifact should be empty (well, you can put a cat picture there, but it is irrelevant 🙂). ORAS already supports pushing an empty artifact. In the future, OCI-compliant registries will support empty layers for artifacts too. Here are the steps for that.

First, I will fetch the annotations for the first revision and update them with the end-of-life annotation.

oras manifest fetch ${REGISTRY}/${REPOSITORY}@${OLD_IMAGE_DIGEST} \
  | jq .annotations \
  | jq '. += {"vnd.myorganization.image.end-of-life":"20230710T00:00-08:00"}' \
  | jq '{"$manifest":.}' \
  > ${TEMP_LOCATION}/annotations.json

Note that ORAS uses a special JSON schema for annotation files. Hence, I needed to convert the annotations that I retrieved from the image manifest to a new JSON object expected by ORAS. Here is what the resulting file looks like:

jq . ${TEMP_LOCATION}/annotations.json                                                                         

{
  "$manifest": {
    "org.opencontainers.image.created": "20230707T00:00-08:00",
    "org.opencontainers.image.revision": "20230707",
    "org.opencontainers.image.version": "1.0",
    "vnd.myorganization.image.end-of-life": "20230710T00:00-08:00"
  }
}

Now, I need to push the empty artifact and have it refer to the first revision of the image. To do that, I will also need to use an artifact type (or mediaType in OCI language) so that I can easily find my lifecycle annotations later on. There is no standard mediaType for that, so I have to invent my own – I will use application/vnd.myorganization.image.lifecycle.metadata. Here is the ORAS command that you can use to update the annotations:

# Create the empty file to use for the artifact's layer...
touch ${TEMP_LOCATION}/empty.layer

oras attach --artifact-type application/vnd.myorganization.image.lifecycle.metadata \
  --annotation-file ${TEMP_LOCATION}/annotations.json \
  ${REGISTRY}/${REPOSITORY}@${OLD_IMAGE_DIGEST} \
  ${TEMP_LOCATION}/empty.layer

OK, I am done with setting the lifecycle annotations for the images.

Fetching the Lifecycle Annotations for Each Image

I can fetch the annotations for each image by simply using the following commands:

oras manifest fetch ${REGISTRY}/${REPOSITORY}@${OLD_IMAGE_DIGEST} | jq .annotations

oras manifest fetch ${REGISTRY}/${REPOSITORY}@${SECOND_REVISION_DIGEST} | jq .annotations

There is a problem, though! Those commands fetch the annotations that are set in the image manifest. For the outdated image (i.e., the one referenced by $OLD_IMAGE_DIGEST), the command will not return the end-of-life annotation. To get that annotation, I need to fetch the referrer with the particular type application/vnd.myorganization.image.lifecycle.metadata. Here is how to do that.

First, I need to get the digest of the referrer artifact.

export ANNOTATIONS_ARTIFACT_DIGEST=`oras discover --artifact-type "application/vnd.myorganization.image.lifecycle.metadata" \
  ${REGISTRY}/${REPOSITORY}@${OLD_IMAGE_DIGEST} -o json \
  | jq '.manifests[0].digest' \
  | tr -d '"'`
echo $ANNOTATIONS_ARTIFACT_DIGEST

And then retrieve the annotations set in the referrer’s artifact manifest.

oras manifest fetch ${REGISTRY}/${REPOSITORY}@${ANNOTATIONS_ARTIFACT_DIGEST} | jq .annotations

As implementation logic, I would always check whether the image has a lifecycle referrer artifact. If it does, I would ignore the lifecycle annotations in the image manifest.
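
Here is a minimal sketch of that lookup logic, reusing the variables, the custom artifact type, and the annotation keys from the walkthrough above:

# Check for a lifecycle referrer artifact first...
export LIFECYCLE_DIGEST=`oras discover --artifact-type "application/vnd.myorganization.image.lifecycle.metadata" \
  ${REGISTRY}/${REPOSITORY}@${OLD_IMAGE_DIGEST} -o json \
  | jq -r '.manifests[0].digest // empty'`

if [ -n "${LIFECYCLE_DIGEST}" ]; then
  # The referrer's annotations supersede the ones baked into the image manifest
  oras manifest fetch ${REGISTRY}/${REPOSITORY}@${LIFECYCLE_DIGEST} | jq .annotations
else
  # Fall back to the annotations set at build time
  oras manifest fetch ${REGISTRY}/${REPOSITORY}@${OLD_IMAGE_DIGEST} | jq .annotations
fi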

Closing Thoughts

Lifecycle annotations enable interesting scenarios for securing container supply chains and improving containerized applications’ vulnerability posture. The tooling, though, can undoubtedly be improved – I had to do a lot of JSON conversions to get it working. The lack of a standard annotation for end-of-life and a standard mediaType for the referrer artifact also makes the above solution very proprietary.

One issue that can arise is if multiple application/vnd.myorganization.image.lifecycle.metadata referrers are created. OCI doesn’t specify how to retrieve the latest artifact of a given type from a registry. If multiple lifecycle annotation artifacts are pushed for the image, the client must pull and inspect each one. This logic can be quite complex and can impact performance on the client’s side.
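
One way to make that choice deterministic – purely a convention I am inventing here, not something OCI defines – is to stamp every annotations artifact with its own update time and have the client pick the newest one. A sketch, assuming consistently formatted timestamps:

# Hypothetical convention: each lifecycle artifact carries the time it was pushed
# under vnd.myorganization.image.annotations-updated. Pick the newest one.
export LATEST_LIFECYCLE_DIGEST=$(oras discover --artifact-type "application/vnd.myorganization.image.lifecycle.metadata" \
  ${REGISTRY}/${REPOSITORY}@${OLD_IMAGE_DIGEST} -o json \
  | jq -r '.manifests[].digest' \
  | while read digest; do
      updated=$(oras manifest fetch ${REGISTRY}/${REPOSITORY}@${digest} \
        | jq -r '.annotations."vnd.myorganization.image.annotations-updated" // "1970-01-01T00:00:00Z"')
      echo "${updated} ${digest}"
    done \
  | sort | tail -n 1 | cut -d' ' -f2)
echo $LATEST_LIFECYCLE_DIGEST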

An improvement that can be made to the process above is to sign each artifact (the image and the lifecycle annotations artifact). This will ensure that the annotations are trustable and not tampered with. Of course, an attacker can always remove the referrer artifact from the registry and leave the client with the impression that the image is still fresh.

Those are all food for thought and good topics for future posts. Here is also a video of the whole experience described above.

[UPDATE: 2023-03-26] When I wrote this post, the expectation was that OCI would release version 1.1 of the specification with the artifact manifest included. This release was supposed to happen by the end of January 2023 or mid-February 2023. Unfortunately, the OCI 1.1 Image Spec PR 999 put a hold on that, and as of today, the spec is not released. Although I promised a Part 2, due to the changes in the spec, continuing the investigation in the original direction may not be fruitful or helpful to anyone. Most of the functionality described below has been removed from many registries, and the steps and the information may be incorrect. The concepts are still relevant, but their actual implementation may not be as described in this post. Consider the information applicable only between Jan 5th, 2023 and Jan 24th, 2023 – the date the above PR was submitted. There will be no other updates to this post or a Part 2 of the series. Instead of Part 2, folks may find the Registry & client support for Image Manifest type artifacts issue relevant to what they are looking for.

If you are deep into containers and software supply chain security, you may have heard of OCI referrers API and OCI artifacts. If not, but you are interested in the containers’ secure supply chain topic, this post will give you enough details to start exploring new registry capabilities that can significantly improve your software supply chain architecture.

This will be a two-part series. In the first part, I will examine the differences between OCI 1.0 and OCI 1.1 and their support across registries. In the second part, I will look at more advanced scenarios like deep hierarchies, deleting artifacts, and migrating content between registries with different support.

But before we start…

What is OCI?

The Open Container Initiative (OCI) is the governance organization responsible for creating open industry standards for container formats and runtimes. OCI develops and maintains three essential specifications:

  1. The OCI Image Format Specification defines the structure and the layout of an image or artifact. If you are interested in reading more about the OCI image layout, I recommend the No More Additional Network Requests – Enter: OCI Image Layout post from @developerguy. It will give you a good background on how the image is structured. I will mainly discuss the OCI Artifact Manifest in this post.
  2. The OCI Distribution Specification defines the APIs that registries should implement to enable the distribution of artifacts. The OCI Referrers API is part of this specification and will be discussed in this post.
  3. The OCI Runtime Specification specifies the configuration, the execution environment, and the lifecycle of a container.  I will not discuss the runtime specification in this post.

One additional note. You may have heard of the term OCI reference types in the past. This was the name of the working group (WG) responsible for driving the changes in the image format and distribution specification. The prototype implementation of reference types was first implemented in ORAS. Its usefulness was the reason it was brought to the attention of the OCI group and resulted in the new changes.

Disclaimer: One last thing I have to mention is that, at the time of this writing, the OCI specifications (OCI 1.1) that support the new artifact manifest changes and the referrers API are in release candidate 2 (RC.2). The release of the OCI 1.1 specifications is planned for February 2023. Keep in mind that not many registries support the new artifact manifest and the referrers API due to this fact. This post aims to describe the scenarios these capabilities enable and discuss the backward compatibility with registries that support the current OCI 1.0 specifications. I will also test several registries and point out their current capabilities.

What Scenarios Do OCI Artifact Manifest and Referrers API Enable?

As always, I would like to start with the scenarios and the benefits of using those new capabilities. As part of the ongoing software secure supply chain efforts, every vendor must produce metadata in addition to the actual software. Vendors need to add human- and machine-readable metadata describing the software, whether this is a binary executable or a container image. The most common metadata discussed nowadays is software bills of materials (SBOMs) and signatures. SBOMs list the packages and binaries used in the individual piece of software (aka the software “ingredients”). The signature is intended to testify to the authenticity of the software and prevent tampering with the bits.

In the past, container registries were intended to store only container images. With the introduction of OCI artifacts, container registries can store other artifacts like SBOMs, signatures, plain text files, and even videos. The OCI referrers API goes even further and allows you to establish relationships between artifacts. This is a compelling functionality that allows you to create structures like this:

+ Container Image
    - Signature of the Container Image
    + SBOM for the Container Image
        - Signature of the SBOM
    + Vulnerability Report for the Container Image
        - Signature of the Vulnerability Report
    + Additional Container Image metadata
        - Signature of the additional metadata
    - ...

Now, the container registry is not just a storage place for images but generic artifact storage that can also define relations between the artifacts. As you may have noticed, the trend in the industry is to refer to those not as container registries anymore but as artifact registries.

There are many benefits that the new capabilities offer in addition to storing various artifacts:

  • Relevant artifacts can be stored and managed together with the subject (or primary) artifact.
    Querying and visualizing the related artifacts is much easier than storing them unrelated. This can result not only in more manageable implementations but also in better performance.
  • Relevant artifacts are easily discoverable.
    Pulling an image from a registry may require additional artifacts for verification. An example is signature verification before allowing deployment. Using the OCI referrers API to get an image’s signature will be a trivial and standardized operation (see the sketch after this list).
  • Relevant artifacts can be copied together between registries.
    Content promotion between registries is a common scenario in container supply chains. Now, the image can be promoted to the target registry with all relevant artifacts instead of making many calls to the registry to discover them before promotion.
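
To make the discoverability point concrete, here is roughly what the call looks like in the OCI 1.1 distribution spec release candidate (a sketch with placeholder registry and repository names; authentication is omitted):

# List artifacts that reference a subject digest, optionally filtered by artifact type.
# The response is an OCI image index whose "manifests" array describes the referrers.
REGISTRY=myregistry.example.com   # placeholder
REPOSITORY=myrepo                 # placeholder
DIGEST=sha256:b89e2098603bead4f07e318e1a4e11b4a4ef1f3614725c88b3fcdd469d55c0e0

curl -s "https://${REGISTRY}/v2/${REPOSITORY}/referrers/${DIGEST}?artifactType=application/spdx%2Bjson" \
  | jq '.manifests[] | {artifactType, digest}'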

Because the capabilities are still new, how to standardize the implementations is still in discussion. You can look at my request for guidance for OCI artifacts for more variations of the above scenario and the possible implementations.

For this post, though, I will concentrate on a straightforward scenario using SBOMs. I want to attach three different SBOMs to an image and test with a few major registries to understand the current capabilities. I will build the following content structure:

+ Container image
  artifactType: "application/vnd.docker.container.image.v1+json"
    - SPDX SBOM in JSON format
      artifactType: "application/spdx+json"
    - SPDX SBOM in TEXT format
      artifactType: "text/spdx"
    - CycloneDX SBOM in JSON format
      artifactType: "application/vnd.cyclonedx+json"

I also chose the following registries to test with:

You may not be familiar with the Zot and ORAS registries listed above, but they are open-source registries that you can run locally. Those registries stay on top of new OCI capabilities and are among the first to implement them, which makes them a good option for testing new OCI capabilities.

Now, let’s dive into the registry capabilities.

Creating the Artifacts

All artifacts and results can be found in my container secure supply chain playground repository on GitHub. I have created the usual flasksample image and generated the SBOMs using Syft. Here are all the commands for that:

# Build and push the image
docker build -t toddysm/flasksample:oci1.1-tests .
docker login -u toddysm
docker push toddysm/flasksample:oci1.1-tests

# Generate the SBOM in various formats
syft packages toddysm/flasksample:oci1.1-tests -o spdx-json > oci1.1-tests.spdx.json
syft packages toddysm/flasksample:oci1.1-tests -o spdx > oci1.1-tests.spdx
syft packages toddysm/flasksample:oci1.1-tests -o cyclonedx-json > oci1.1-tests.cyclonedx.json

I will use the above image and the generated SBOMs to push to various registries and test their behavior. Note that ORAS CLI can handle registries that support the new OCI 1.1 specifications and registries that support only OCI 1.0 specifications. ORAS CLI automatically converts the manifest to the most appropriate manifest based on the registry support.

Referring to Artifacts in Registries with OCI 1.0 Support

Docker Hub recently announced support for OCI Artifacts. Note, though, that this is support for OCI 1.0. Here are the commands to push the SBOMs to Docker Hub and reference the image as a subject:

oras attach --artifact-type "application/spdx+json" --annotation "producer=syft 0.63.0" docker.io/toddysm/flasksample:oci1.1-tests ./oci1.1-tests.spdx.json
# Command response
Uploading e6011f4dd3fa oci1.1-tests.spdx.json
Uploaded  e6011f4dd3fa oci1.1-tests.spdx.json
Attached to docker.io/toddysm/flasksample@sha256:b89e2098603bead4f07e318e1a4e11b4a4ef1f3614725c88b3fcdd469d55c0e0
Digest: sha256:0a1dd8fcdef54eb489aaa99978e19cffd7f6ae11595322ab5af694913da177d4

oras attach --artifact-type "text/spdx" --annotation "producer=syft 0.63.0" docker.io/toddysm/flasksample:oci1.1-tests ./oci1.1-tests.spdx
# Command response
Uploading d9c2135fe4b9 oci1.1-tests.spdx
Uploaded  d9c2135fe4b9 oci1.1-tests.spdx
Error: DELETE "https://registry-1.docker.io/v2/toddysm/flasksample/manifests/sha256:16a58d1ed78402935d61e524f5609087334b164861618373d7b96a7b7c612f1a": response status code 405: unsupported: The operation is unsupported.

oras attach --artifact-type "application/vnd.cyclonedx+json" --annotation "producer=syft 0.63.0" docker.io/toddysm/flasksample:oci1.1-tests ./oci1.1-tests.cyclonedx.json
# Command response
Uploading c0ddc2a5ea78 oci1.1-tests.cyclonedx.json
Uploaded  c0ddc2a5ea78 oci1.1-tests.cyclonedx.json
Error: DELETE "https://registry-1.docker.io/v2/toddysm/flasksample/manifests/sha256:37ebfdebe499bcec8e5a5ce04ae4526d3e560c199c16a85a97f80a91fbf1d2c3": response status code 405: unsupported: The operation is unsupported.

Checking Docker Hub, I can see that the image digest is sha256:b89e2098603bead4f07e318e1a4e11b4a4ef1f3614725c88b3fcdd469d55c0e0 as returned by the ORAS CLI above.

I expected to see another artifact with sha256:0a1dd8fcdef54eb489aaa99978e19cffd7f6ae11595322ab5af694913da177d4 (again returned by the ORAS CLI above). However, such an artifact is not shown in the Docker Hub UI. There is another artifact tagged with the digest of the image.

However, the digest of that artifact (sha256:c8c7d53f0e1ed5553a815c7b5ccf40c09801f7636a3c64940eafeb7bfab728cd) is not the one from the ORAS CLI output.

Of course, the question in my mind is: “What is the digest that ORAS CLI returned above?” The sha256:0a1dd8fcdef54eb489aaa99978e19cffd7f6ae11595322ab5af694913da177d4 one. Using ORAS CLI or crane, I can explore the various manifests.

What Manifests Are Created When Referring Between Artifacts in OCI 1.0 Registries?

The oras discover command helps visualize the hierarchy of artifacts that reference a subject.

oras discover docker.io/toddysm/flasksample:oci1.1-tests -o tree                      
docker.io/toddysm/flasksample:oci1.1-tests
├── application/spdx+json
│   └── sha256:0a1dd8fcdef54eb489aaa99978e19cffd7f6ae11595322ab5af694913da177d4
├── text/spdx
│   └── sha256:6f6c9260247ad876626f742508550665ad20c75ac7e4469782d18e47d40cac67
└── application/vnd.cyclonedx+json
    └── sha256:047054894cbe7c9e57532f4e01d03f631e92c3aec48b4a06485296aee1374b3b

According to the output above, I should be able to see four artifacts. Also, as you can see, the digest sha256:0a1dd8fcdef54eb489aaa99978e19cffd7f6ae11595322ab5af694913da177d4 is the one for the first SBOM I attached to the image. To understand what is happening, let’s look at the different manifests. I will use the oras manifest command to pull the manifests of all artifacts by referencing them by digest:

# Pull the manifest for the image
oras manifest fetch docker.io/toddysm/flasksample@sha256:b89e2098603bead4f07e318e1a4e11b4a4ef1f3614725c88b3fcdd469d55c0e0 > manifest-sha256-b89e2098603bead4f07e318e1a4e11b4a4ef1f3614725c88b3fcdd469d55c0e0.json

# Pull the manifest for the SPDX SBOM in JSON format
oras manifest fetch docker.io/toddysm/flasksample@sha256:0a1dd8fcdef54eb489aaa99978e19cffd7f6ae11595322ab5af694913da177d4 > manifest-sha256-0a1dd8fcdef54eb489aaa99978e19cffd7f6ae11595322ab5af694913da177d4.json

# Pull the manifest for the SPDX SBOM in TEXT format
oras manifest fetch docker.io/toddysm/flasksample@sha256:6f6c9260247ad876626f742508550665ad20c75ac7e4469782d18e47d40cac67 > manifest-sha256-6f6c9260247ad876626f742508550665ad20c75ac7e4469782d18e47d40cac67.json

# Pull the manifest for the CycloneDX SBOM in JSON format
oras manifest fetch docker.io/toddysm/flasksample@sha256:047054894cbe7c9e57532f4e01d03f631e92c3aec48b4a06485296aee1374b3b > manifest-sha256-047054894cbe7c9e57532f4e01d03f631e92c3aec48b4a06485296aee1374b3b.json

# Pull the manifest of the artifact tagged with the image digest
oras manifest fetch docker.io/toddysm/flasksample@sha256:c8c7d53f0e1ed5553a815c7b5ccf40c09801f7636a3c64940eafeb7bfab728cd > manifest-sha256-c8c7d53f0e1ed5553a815c7b5ccf40c09801f7636a3c64940eafeb7bfab728cd.json

All manifests are available in the dockerhub folder in my container secure supply chain playground repository on GitHub. The image manifest is self-explanatory and I will not dig into it. The other four are more interesting. Opening the manifest for the SPDX SBOM in JSON format, I can see that it is an artifact manifest "mediaType": "application/vnd.oci.artifact.manifest.v1+json" of type "artifactType": "application/spdx+json". It has a blob annotated with the name of the file I pushed. It also has a subject field referring to the image. The manifest for the SPDX SBOM in TEXT format and the CycloneDX SBOM in JSON format have the same structure. The hierarchy represented by the oras discover command above shows exactly those manifests. I believe the output of oras discover could be improved to show also the image digest for completeness:

oras discover docker.io/toddysm/flasksample:oci1.1-tests -o tree                      
docker.io/toddysm/flasksample:oci1.1-tests
│   └── sha256:b89e2098603bead4f07e318e1a4e11b4a4ef1f3614725c88b3fcdd469d55c0e0 
├── application/spdx+json
│   └── sha256:0a1dd8fcdef54eb489aaa99978e19cffd7f6ae11595322ab5af694913da177d4
├── text/spdx
│   └── sha256:6f6c9260247ad876626f742508550665ad20c75ac7e4469782d18e47d40cac67
└── application/vnd.cyclonedx+json
    └── sha256:047054894cbe7c9e57532f4e01d03f631e92c3aec48b4a06485296aee1374b3b

The question remains how the manifest tagged with the image digest plays a role here. Looking at it, I can see that it is an index manifest "mediaType": "application/vnd.oci.image.index.v1+json" that lists the three SBOM artifacts I pushed. Remember, this index manifest is tagged with the image digest. This is similar to the structure Sigstore creates that I described in Implementing Containers’ Secure Supply Chain with Sigstore Part 2 – The Magic Behind. Here is a visual of how the manifests are related:

The SBOM artifacts are not visible in the Docker Hub UI because they are not tagged, and Docker Hub has no UI to show untagged artifacts. A few questions remain:

  • What happens if I delete the image?
  • What happens if I delete the index manifest?
  • Can I create deeper hierarchical structures in registries that support OCI 1.0?
  • What happens when I copy related artifacts from OCI 1.0 registry to OCI 1.1 registry?

I will come back to those in the second part of the series. Before that, I would like to examine how registries with OCI 1.1 support storing the manifests for the referred artifacts.

Referring to Artifacts in Registries with OCI 1.1 Support

Azure Container Registry (ACR) just announced support for OCI 1.1. It is in Public Preview and supports the OCI 1.1 RC spec at the moment of this writing. After retagging the image, the commands for pushing the SBOMs are similar to the ones used for Docker Hub.

# Re-tag and push the image
docker image tag toddysm/flasksample:oci1.1-tests tsmacrwcusocitest.azurecr.io/flasksample:oci1.1-tests
docker push tsmacrwcusocitest.azurecr.io/flasksample:oci1.1-tests

oras attach --artifact-type "application/spdx+json" --annotation "producer=syft 0.63.0" tsmacrwcusocitest.azurecr.io/flasksample:oci1.1-tests ./oci1.1-tests.spdx.json
# Command response
Uploading e6011f4dd3fa oci1.1-tests.spdx.json
Uploaded  e6011f4dd3fa oci1.1-tests.spdx.json
Attached to tsmacrwcusocitest.azurecr.io/flasksample@sha256:b89e2098603bead4f07e318e1a4e11b4a4ef1f3614725c88b3fcdd469d55c0e0
Digest: sha256:71e90130cb912fbcff6556c0395878a8e7a0c7244eb8e8ee9001e84f9cba804a

oras attach --artifact-type "text/spdx" --annotation "producer=syft 0.63.0" tsmacrwcusocitest.azurecr.io/flasksample:oci1.1-tests ./oci1.1-tests.spdx
# Command response
Uploading d9c2135fe4b9 oci1.1-tests.spdx
Uploaded  d9c2135fe4b9 oci1.1-tests.spdx
Attached to tsmacrwcusocitest.azurecr.io/flasksample@sha256:b89e2098603bead4f07e318e1a4e11b4a4ef1f3614725c88b3fcdd469d55c0e0
Digest: sha256:0fbd0e611ec9fe620b72ebe130da680de9402e1e241b30c2aa4515610ed2d766

oras attach --artifact-type "application/vnd.cyclonedx+json" --annotation "producer=syft 0.63.0" tsmacrwcusocitest.azurecr.io/flasksample:oci1.1-tests ./oci1.1-tests.cyclonedx.json
# Command response
Uploading c0ddc2a5ea78 oci1.1-tests.cyclonedx.json
Uploaded  c0ddc2a5ea78 oci1.1-tests.cyclonedx.json
Attached to tsmacrwcusocitest.azurecr.io/flasksample@sha256:b89e2098603bead4f07e318e1a4e11b4a4ef1f3614725c88b3fcdd469d55c0e0
Digest: sha256:198e405344b5fafd6127821970eb4a84129ae729402e8c2c71fc1bb80abf0954

Azure portal does not show any additional artifacts and manifests, as shown in this screenshot:

This is a bit confusing, as I would at least expect to see a few more manifests. The distribution specification does not define functionality for listing untagged manifests; figuring out those dependencies without additional information will be hard. One noticeable thing is that no additional index manifest is tagged with the image digest.

The oras discover command returns the following tree:

oras discover tsmacrwcusocitest.azurecr.io/flasksample:oci1.1-tests -o tree
tsmacrwcusocitest.azurecr.io/flasksample:oci1.1-tests
├── application/vnd.cyclonedx+json
│   └── sha256:198e405344b5fafd6127821970eb4a84129ae729402e8c2c71fc1bb80abf0954
├── text/spdx
│   └── sha256:0fbd0e611ec9fe620b72ebe130da680de9402e1e241b30c2aa4515610ed2d766
└── application/spdx+json
    └── sha256:71e90130cb912fbcff6556c0395878a8e7a0c7244eb8e8ee9001e84f9cba804a

This is the same structure I saw above when using the command on the Docker Hub image. Pulling the manifests reveals that they are precisely the same as the ones from Docker Hub.

# Pull the manifest for the image
oras manifest fetch tsmacrwcusocitest.azurecr.io/flasksample@sha256:b89e2098603bead4f07e318e1a4e11b4a4ef1f3614725c88b3fcdd469d55c0e0 > manifest-sha256-b89e2098603bead4f07e318e1a4e11b4a4ef1f3614725c88b3fcdd469d55c0e0.json

# Pull the manifest for the SPDX SBOM in JSON format
oras manifest fetch tsmacrwcusocitest.azurecr.io/flasksample@sha256:71e90130cb912fbcff6556c0395878a8e7a0c7244eb8e8ee9001e84f9cba804a > manifest-sha-71e90130cb912fbcff6556c0395878a8e7a0c7244eb8e8ee9001e84f9cba804a.json

# Pull the manifest for the SPDX SBOM in TEXT format
oras manifest fetch tsmacrwcusocitest.azurecr.io/flasksample@sha256:0fbd0e611ec9fe620b72ebe130da680de9402e1e241b30c2aa4515610ed2d766 > manifest-sha-0fbd0e611ec9fe620b72ebe130da680de9402e1e241b30c2aa4515610ed2d766.json

# Pull the manifest for the CycloneDX SBOM in JSON format
oras manifest fetch tsmacrwcusocitest.azurecr.io/flasksample@sha256:198e405344b5fafd6127821970eb4a84129ae729402e8c2c71fc1bb80abf0954 > manifest-sha256-198e405344b5fafd6127821970eb4a84129ae729402e8c2c71fc1bb80abf0954.json

All manifests are available in the acr folder in my container secure supply chain playground repository on GitHub.

Luckily, ACR has CLI commands to list the manifests. Those commands call ACR’s proprietary APIs to gather the information. There are two ACR CLI commands I can use to list the manifests for a repository: acr manifest list and acr manifest list-metadata. At the time of this writing, acr manifest list had a bug and couldn’t list the OCI artifact manifests. acr manifest list-metadata worked fine, and I could list all manifests in the repository. The output from the acr manifest list-metadata command is available here. From the output, I can see that only four manifests were created. There is no manifest index that points to the three artifacts. Here is a visual of how the manifests are related in an OCI 1.1 compliant registry:

To summarize the differences between the OCI 1.0 and OCI 1.1 referrers’ implementation:

  • In OCI 1.0 compliant registries, you will see an additional index manifest that is tagged with the image digest
  • In OCI 1.0 compliant registries, the index manifest lists the artifacts related to the image
  • In OCI 1.0 compliant registries, the artifact manifests still refer to the image using the subject field

I will look at how this impacts the content in your registry in the second part of this series.

Referrers Support Across Registries

Here is a table that shows the current (as of Jan 5th, 2023) support in registries.

The manifests and the debug logs are available in the corresponding registry folders in the cssc-pipeline repository on GitHub. You can refer to those for details.

Note: The investigation was done using the ORAS tool – the only one I am aware of that can create references between artifacts at the time of this writing. It may be possible to craft the manifests manually and push them to the registries where ORAS fails.

Learnings

In addition to the above, I learned a few more things while experimenting with different registries.

  • As far as I know, OCI does not specify an API to list untagged manifests in a registry. This can be a problem because the attached artifacts do not have tags but only digests. I am pretty sure I ended up with some orphaned artifacts in the registries that do not support artifact referrers. Unfortunately, I cannot be sure due to the lack of such an API.
  • Registries are inconsistent in their responses when the capabilities are not supported. In my opinion, there is a lack of feedback on what capabilities each registry supports, which makes it hard for the clients. An easy way to check the capabilities of a registry would be beneficial.

In the next post of the series, I will go over more advanced scenarios like promotion between registries and building deeper hierarchies.

Photo by Petrebels on Unsplash

In the last post of the series about Sigstore, I will look at the most exciting part of the implementation – ephemeral keys, or what the Sigstore team calls keyless signing. The post will go over the second and third scenarios I outlined in Implementing Containers’ Secure Supply Chain with Sigstore Part 1 – Signing with Existing Keys and go deeper into the experience of validating artifacts and moving artifacts between registries.

Using Sigstore to Sign with Ephemeral Keys

Using Cosign to sign with ephemeral keys is still an experimental feature and will be released in v1.14.0 (see the following PR). Signing with ephemeral keys is relatively easy.

$ COSIGN_EXPERIMENTAL=1 cosign sign 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1
Generating ephemeral keys...
Retrieving signed certificate...
 Note that there may be personally identifiable information associated with this signed artifact.
 This may include the email address associated with the account with which you authenticate.
 This information will be used for signing this artifact and will be stored in public transparency logs and cannot be removed later.
 By typing 'y', you attest that you grant (or have permission to grant) and agree to have this information stored permanently in transparency logs.
Are you sure you want to continue? (y/[N]): y
Your browser will now be opened to:
https://oauth2.sigstore.dev/auth/auth?access_type=online&client_id=sigstore&code_challenge=e16i62r65TuJiklImxYFIr32yEsA74fSlCXYv550DAg&code_challenge_method=S256&nonce=2G9cB5h89SqGwYQG2ey5ODeaxO8&redirect_uri=http%3A%2F%2Flocalhost%3A33791%2Fauth%2Fcallback&response_type=code&scope=openid+email&state=2G9cB7iQ7BSXYQdKKe6xGOY2Rk8
Successfully verified SCT...
Warning: Tag used in reference to identify the image. Consider supplying the digest for immutability.
"562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample" appears to be a private repository, please confirm uploading to the transparency log at "https://rekor.sigstore.dev" [Y/N]: y
tlog entry created with index: 5133131
Pushing signature to: 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample

You are sent to authenticate using OpenID Connect (OIDC) via the browser. I used my GitHub account to authenticate.

Once authenticated, you are redirected back to localhost, where Cosign reads the code query string parameter from the URL and verifies the authentication.

Here is what the redirect URL looks like.

http://localhost:43219/auth/callback?code=z6dghpnzujzxn6xmfltyl6esa&state=2G9dbwwf9zCutX3mNevKWVd87wS

I have also pushed v2 and v3 of the image to the registry and signed them using the approach above. Here is the new state in my registry.

  • Tag v1 (image) – digest sha256:9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4
  • Tag sha256-9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4.sig (signature) – digest sha256:483f2a30b765c3f7c48fcc93a7a6eb86051b590b78029a59b5c2d00e97281241
  • Tag v2 (image) – digest sha256:d4d59b7e1eb7c55b0811c3dfd3571ab386afbe6d46dfcf83e06343e04ae888cb
  • Tag sha256-d4d59b7e1eb7c55b0811c3dfd3571ab386afbe6d46dfcf83e06343e04ae888cb.sig (signature) – digest sha256:8c43d1944b4d0c3f0d7d6505ff4d8c93971ebf38fc60157264f957e3532d8fd7
  • Tag v3 (image) – digest sha256:2e19bd9d9fb13c356c64c02c574241c978199bfa75fd0f46b62748f59fb84f0a
  • Tag sha256-2e19bd9d9fb13c356c64c02c574241c978199bfa75fd0f46b62748f59fb84f0a.sig (signature) – digest sha256:cc2a674776dfe5f3e55f497080e7284a5bd14485cbdcf956ba3cf2b2eebc915f

If you look at the console output, you will also see that one of the lines mentions tlog in it. This is the index in the Rekor transparency log where the signature’s receipt is stored. For the three signatures that I created, the indexes are:

  • 5133131 for v1
  • 5133528 for v2
  • 5133614 for v3

That is it! I have signed my images with ephemeral keys, and I have the tlog entries that correspond to the signatures. It is a fast and easy experience.

Verifying Images Signed With Ephemeral Keys

Verifying the images signed with ephemeral keys is built into the Cosign CLI.

$ COSIGN_EXPERIMENTAL=1 cosign verify 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1 | jq . > flasksample-v1-ephemeral-verification.json
Verification for 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1 --
The following checks were performed on each of these signatures:
- The cosign claims were validated
- Existence of the claims in the transparency log was verified offline
- Any certificates were verified against the Fulcio roots.

The outputs from the verification of flasksample:v1, flasksample:v2, and flasksample:v3 are available on GitHub. A few things to note about the output from the verification:

  • The output JSON contains the logIndex as well as the logID, which I assumed I could use to search for the receipts in Rekor. I have some confusion about the purpose of the logID, but I will go into that a little later!
  • There is a body field that I assume contains the actual signature. This JSON field is not yet documented, and with such a generic name, its purpose is hard to know.
  • The type field seems to be a free text field. I would expect it to be something more structured and the values to come from a list of possible and, most importantly, standardized types.

Search and Explore Rekor

The goal of my second scenario – Sign Container Images with Ephemeral Keys from Fulcio is not only to sign images with ephemeral keys but also to invalidate one of the signed artifacts. Unfortunately, documentation and the help output from the commands are scarce. Also, searching on Google how to invalidate a signature in Rekor yields no results. I decided to start exploring the Rekor logs to see if that may help.

There aren’t many commands that you can use in Rekor. The four things you can do are: get records; search by email, SHA, or artifact; upload entries or artifacts; and verify entries or artifacts. Using the information from the outputs in the previous section, I can get the entries for the three images I signed using the log indexes.

$ rekor-cli get --log-index 5133131 > flasksample-v1-ephemeral-logentry.json
$ rekor-cli get --log-index 5133528 > flasksample-v2-ephemeral-logentry.json
$ rekor-cli get --log-index 5133614 > flasksample-v3-ephemeral-logentry.json

The outputs from the above commands for flasksample:v1, flasksample:v2, and flasksample:v3 are available on GitHub.

I first noted that the log entries are not returned in JSON format by the Rekor CLI. This is different from what Cosign returns and is a bit inconsistent. Second, the log entries outputted by the Rekor CLI are not the same as the verification outputs returned by Cosign. Cosign verification output provides different information than the Rekor log entry. This begs the question: “How does Cosign get this information?” First, though, let’s see what else Rekor can give me.

I can use Rekor search to find all the log entries that I created. This will include the ones for the three images above and, theoretically, everything else I signed.

$ rekor-cli search --email toddysm_dev1@outlook.com
Found matching entries (listed by UUID):
24296fb24b8ad77aaf485c1d70f4ab76518483d5c7b822cf7b0c59e5aef0e032fb5ff4148d936353
24296fb24b8ad77a3f43ac62c8c7bab7c95951d898f2909855d949ca728ffd3426db12ff55390847
24296fb24b8ad77ac2334dfe2759c88459eb450a739f08f6a16f5fd275431fb42c693974af3d5576
24296fb24b8ad77a8f14877c718e228e315c14f3416dfffa8d5d6ef87ecc4f02f6e7ce5b1d5b4e95
24296fb24b8ad77a6828c6f9141b8ad38a3dca4787ab096dca59d0ba68ff881d6019f10cc346b660
24296fb24b8ad77ad54d6e9bb140780477d8beaf9d0134a45cf2ded6d64e4f0d687e5f30e0bb8c65
24296fb24b8ad77a888dc5890ac4f99fc863d3b39d067db651bf3324674b85a62e3be85066776310
24296fb24b8ad77a47fae5af8718673a2ef951aaf8042277a69e808f8f59b598d804757edab6a294
24296fb24b8ad77a7155046f33fdc71ce4e291388ef621d3b945e563cb29c2e3cd6f14b9ba1b3227
24296fb24b8ad77a5fc1952295b69ca8d6f59a0a7cbfbd30163c3a3c3a294c218f9e00c79652d476

Note that the result lists UUIDs that are different from the logID properties in the verification output JSON. You can get log entries using the UUID or the logIndex but not using the logID. The UUIDs are not present in the Cosign output mentioned in the previous section, while the logID is. However, it is unclear what the logID can be used for and why the UUID is not included in the Cosign output.
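
For completeness, any of the UUIDs above can be fed back into the CLI to retrieve the corresponding entry (a quick sketch using the first UUID from the search output):

$ rekor-cli get --uuid 24296fb24b8ad77aaf485c1d70f4ab76518483d5c7b822cf7b0c59e5aef0e032fb5ff4148d936353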

The Rekor search command supposedly allows you to search by artifact and SHA. However, it is not documented what form those need to take. Using the image name or the image SHA yields no results.

$ rekor-cli search --artifact 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample
Error: invalid argument "562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample" for "--artifact" flag: Key: '' Error:Field validation for '' failed on the 'url' tag
$ rekor-cli search --sha sha256:9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4
no matching entries found
$ rekor-cli search --sha 9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4
no matching entries found

I think the above are the core search scenarios for container images (and other artifacts), but it seems they are either not implemented or not documented. Neither the Rekor GitHub repository, the Rekor public documentation, nor the Rekor Swagger have any more details on the search. I filed an issue for Rekor to ask how the artifacts search works.

Coming back to the main goal of invalidating a signed artifact, I couldn’t find any documentation on how to do that. The only apparent options to invalidate the artifacts are either uploading something to Rekor or removing the signature from Rekor. I looked at all options to upload entries or artifacts to Rekor, but the documentation mainly describes how to sign and upload entries using other types like SSH, X509, etc. It does seem to me that there is no capability in Rekor to say: “This artifact is not valid anymore”.

I thought that looking at how Rekor verifies signatures may help me understand the approach.

Verifying Signatures Using Rekor CLI

I decided to explore how the signatures are verified and reverse engineer the process to understand if an artifact signature can be invalidated. Rekor CLI has a verify command. My assumption was that Rekor’s verify command worked the same as the Cosign verify command. Unfortunately, that is not the case.

$ rekor-cli verify --artifact 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1
Error: invalid argument "562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1" for "--artifact" flag: Key: '' Error:Field validation for '' failed on the 'url' tag
$ rekor-cli verify --entry 24296fb24b8ad77a8f14877c718e228e315c14f3416dfffa8d5d6ef87ecc4f02f6e7ce5b1d5b4e95
Error: invalid argument "24296fb24b8ad77a8f14877c718e228e315c14f3416dfffa8d5d6ef87ecc4f02f6e7ce5b1d5b4e95" for "--entry" flag: Key: '' Error:Field validation for '' failed on the 'url' tag

Unfortunately, due to a lack of documentation and examples, I wasn’t able to figure out how this worked without browsing the code. While that kind of digging is always an option, I would expect an easier experience as an end user.

I was made aware of the following blog post, though. It describes how to handle account compromise. To put it in context, if my GitHub account is compromised, this blog post describes the steps I need to take to invalidate the artifacts. I do have two problems with this proposal:

  1. As you remember, in my scenario, I wanted to invalidate only the flasksample:v2 artifact, and not all artifacts signed with my account. If I follow the proposal in the blog post, I will invalidate everything signed with my GitHub account, which may result in outages.
  2. The proposal relies on the consumer of artifacts to constantly monitor the news for what is valid and what is not; which GitHub account is compromised and which one is not. This is unrealistic and puts too much manual burden on the consumer of artifacts. In an ideal scenario, I would expect the technology to solve this with a proactive way to notify the users if something is wrong rather than expect them to learn reactively.

At this point in time, I will call this scenario incomplete. Yes, I am able to sign with ephemeral keys, but this doesn’t seem unique in this situation. The ease around the key generation is what they seem to be calling attention to, and it does make signing much less intimidating to new users, but I could still generate a new SSH or GPG key every time I need to sign something. Trusting Fulcio’s root does not automatically increase my security – I would even argue the opposite. Making it easier for everybody to sign does not increase security, either. Let’s Encrypt already proved that. While Let’s Encrypt made an enormous contribution to our privacy and helped secure every small business site, the ease, and accessibility with which it works means that every malicious site now also has a certificate. The lock in the address bar is no longer a sign of security. We are all excited about the benefits, but I bet very few of us are also excited for this to help the bad guys. We need to think beyond the simple signing and ensure that the whole end-to-end experience is secure.

I will move to the last scenario now.

Promoting Sigstore Signed Images Between Registries

In the last scenario I wanted to test the promotion of images between registries. Let’s create a v4 of the image and sign it using an ephemeral key. Here are the commands with the omitted output.

$ docker build -t 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v4 .
$ docker push 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v4
$ COSIGN_EXPERIMENTAL=1 cosign sign 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v4

The Rekor log index for the signature is 5253114. I can use Crane to copy the image and the signature from AWS ECR into Azure ACR.

$ crane copy 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v4 tsmacrtestcssc.azurecr.io/flasksample:v4
$ crane copy 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:sha256-aa2690ed4a407ac8152d24017eb6955b01cbb0fc44afe170dadedc30da80640a.sig tsmacrtestcssc.azurecr.io/flasksample:sha256-aa2690ed4a407ac8152d24017eb6955b01cbb0fc44afe170dadedc30da80640a.sig

Also, let’s validate the ephemeral key signature using the image in Azure ACR.

$ COSIGN_EXPERIMENTAL=1 cosign verify tsmacrtestcssc.azurecr.io/flasksample:v4 | jq .
Verification for tsmacrtestcssc.azurecr.io/flasksample:v4 --
The following checks were performed on each of these signatures:
 - The cosign claims were validated
 - Existence of the claims in the transparency log was verified offline
 - Any certificates were verified against the Fulcio roots.

Next, I will sign the image with a key stored in Azure Key Vault and verify the signature.

$ cosign sign --key azurekms://tsm-kv-usw3-tst-cssc.vault.azure.net/sigstore-azure-test-key-ec tsmacrtestcssc.azurecr.io/flasksample:v4
Warning: Tag used in reference to identify the image. Consider supplying the digest for immutability.
Pushing signature to: tsmacrtestcssc.azurecr.io/flasksample
$ cosign verify --key azurekms://tsm-kv-usw3-tst-cssc.vault.azure.net/sigstore-azure-test-key-ec tsmacrtestcssc.azurecr.io/flasksample:v4
Verification for tsmacrtestcssc.azurecr.io/flasksample:v4 --
The following checks were performed on each of these signatures:
 - The cosign claims were validated
 - The signatures were verified against the specified public key
[{"critical":{"identity":{"docker-reference":"tsmacrtestcssc.azurecr.io/flasksample"},"image":{"docker-manifest-digest":"sha256:aa2690ed4a407ac8152d24017eb6955b01cbb0fc44afe170dadedc30da80640a"},"type":"cosign container image signature"},"optional":null}]

Everything worked as expected. This scenario was very smooth, and I was able to complete it in less than a minute.

Summary

So far, I have just scratched the surface of what the Sigstore project could accomplish. While going through the scenarios in these posts, I had a bunch of other thoughts, so I wanted to highlight a few below:

  • Sigstore is built on a good idea to leverage ephemeral keys for signing container images (and other software). However, just the ephemeral keys alone do not provide higher security if there is no better process to invalidate the signed artifacts. With traditional X509 certificates, one can use CRL (Certificate Revocation Lists) or OCSP (Online Certificate Status Protocol) to revoke certificates. Although they are critiqued a lot, the process of invalidating artifacts using ephemeral keys and Sigstore does not seem like an improvement at the moment. I look forward to the improvements in this area as further discussions happen.
  • Sigstore, like nearly all open-source projects, would benefit greatly from better documentation and consistency in the implementation. Inconsistent messages, undocumented features, myriad JSON schemas, multiple identifiers used for different purposes, variable naming conventions in JSONs, and unpredictable output from the command line tools are just a few things that can be improved. I understand that some of the implementation was driven by requirements to work with legacy registries but going forward, that can be simplified by using OCI references. The bigger the project grows, the harder it will become to fix those.
  • The experience that Cosign offers is what makes the project successful. Signing and verifying images using the legacy X.509 and the ephemeral keys is easy. Hiding the complexity behind a simple CLI is a great strategy to get adoption.

I tested Sigstore a year ago and asked myself the question: “How do I solve the SolarWinds exploit with Sigstore?” Unfortunately, Sigstore doesn’t make it easier to solve that problem yet. With my experience above in mind, I would expect a lot of changes in the future as Sigstore matures.

Unfortunately, there is no viable alternative to Sigstore on the market today. Notary v1 (or Docker Content Trust) proved not flexible enough. Notary v2 is still in the works and has yet to show what it can do. However, the lack of alternatives does not mean that we should skip the due diligence required for a security product of such importance. Sigstore has had a great start, and this series proves to me that we’ve got a lot of work ahead of us as an industry to solve our software supply chain problems.

In my previous post, Implementing Containers’ Secure Supply Chain with Sigstore Part 1 – Signing with Existing Keys, I went over the Cosign experience of signing images with existing keys. As I concluded there, the signing was easy to achieve, with just a few hiccups here and there. It does seem that Cosign does a lot behind the scenes to make it easy. Though, after looking at the artifacts stored in the registry, I got curious about how the signatures and attestations are saved. Unfortunately, the Cosign specifications are a bit light on details, and it seems they were created after or evolved together with the implementation. Hence, I decided to go with the reverse-engineering approach to understand what is saved in the registry.

At the end of this post, I will validate the signatures using the Cosign CLI and complete my first scenario.

The Mystery Behind Cosign Artifacts

First, to be able to store Cosign artifacts in a registry, you need to use an OCI-compliant registry. When Cosign signs a container image, an OCI artifact is created and pushed to the registry. Every OCI artifact has a manifest and layers. The manifest is standardized, but the layers can be anything that can be packed in a tarball. So, Cosign’s signature should be in the layer pushed to the registry, and the manifest should describe the signature artifact. For image signatures, Cosign tags the signature artifact with a tag that uses the naming convention sha256-<image-digest>.sig.
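
To illustrate, here is a minimal sketch (using the repository from this post) of how the signature tag can be derived from the image digest and the signature manifest fetched with Crane; the variable names are just for illustration:

# Get the digest of the signed image and turn it into Cosign's signature tag
$ DIGEST=$(crane digest 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1)
$ SIG_TAG="${DIGEST/:/-}.sig"

# Fetch the manifest of the signature artifact
$ crane manifest 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:${SIG_TAG} | jq .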

From the examples in my previous post, when Cosign signed the 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1 image, it created a new artifact and tagged it sha256-9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4.sig. Here are the details of the image that was signed.

And here are the details of the signature artifact.

What is Inside Cosign Signature Artifact?

I was curious about what the signature artifact looks like. Using Crane, I can pull the signature artifact. All files are available in my GitHub test repository.

# Sign into the registry
$ aws ecr get-login-password --region us-west-2 | crane auth login --username AWS --password-stdin 562077019569.dkr.ecr.us-west-2.amazonaws.com

# Pull the signature manifest
$ crane manifest 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:sha256-9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4.sig | jq . > flasksample-v1-signature-manifest.json

# Pull the signature artifact as a tarball and unpack it into ./sigstore-signature
$ crane pull 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:sha256-9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4.sig flasksample-v1-signature.tar.gz
$ mkdir sigstore-signature
$ tar -xvf flasksample-v1-signature.tar.gz -C ./sigstore-signature
$ cd sigstore-signature/
$ ls -al
total 20
drwxrwxr-x 2 toddysm toddysm 4096 Oct 14 10:08 .
drwxrwxr-x 3 toddysm toddysm 4096 Oct 14 10:07 ..
-rw-r--r-- 1 toddysm toddysm  272 Dec 31  1969 09b3e371137191b52fdd07bdf115824b2b297a2003882e68d68d66d0d35fe1fc.tar.gz
-rw-r--r-- 1 toddysm toddysm  319 Dec 31  1969 manifest.json
-rw-r--r-- 1 toddysm toddysm  248 Dec 31  1969 sha256:00ce5fed483997c24aa0834081ab1960283ee9b2c9d46912bbccc3f9d18e335d
$ tar -xvf 09b3e371137191b52fdd07bdf115824b2b297a2003882e68d68d66d0d35fe1fc.tar.gz 
tar: This does not look like a tar archive

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now

To my surprise, the inner tarball (09b3e371137191b52fdd07bdf115824b2b297a2003882e68d68d66d0d35fe1fc.tar.gz) does not seem to be a tarball at all, although it has the proper extension. I assumed this was the actual signature blob, but I couldn’t confirm that without knowing how to manipulate the archive. Interestingly, opening the file in a simple text editor reveals that it is a plain JSON file with a .tar.gz extension. Looking into the other two files, manifest.json and sha256:00ce5fed483997c24aa0834081ab1960283ee9b2c9d46912bbccc3f9d18e335d, it looks like all the files are some kind of manifests, but it is not very clear what for. I couldn’t find any specification explaining the content of the layer and the meaning of the files inside it.
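
A quick way to confirm this, shown here as a sketch, is to check the file type and pretty-print the content:

# file should report JSON (or plain text), not gzip compressed data
$ file 09b3e371137191b52fdd07bdf115824b2b297a2003882e68d68d66d0d35fe1fc.tar.gz

# Pretty-print the JSON content of the "tarball"
$ jq . 09b3e371137191b52fdd07bdf115824b2b297a2003882e68d68d66d0d35fe1fc.tar.gz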

Interestingly, Cosign offers a tool to download the signature.

$ cosign download signature 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1 | jq . > flasksample-v1-cosign-signature.json

The resulting signature is available in the GitHub repository. The page linked above claims that you can verify the signature in another tool, but I couldn’t immediately find details on how to do that. I decided to leave this for some other time.

The following stand out from this experience.

  • First (and typically a red flag when I am evaluating software), why does the JSON file have a tarball extension? Disguising a file’s type with a misleading extension is normally a practice associated with malware, which makes it especially concerning in this context. I am sure it will get fixed, and an explanation will be provided now that I have filed an issue for it.
  • Why are there so many JSON files? Trying to look at all the available Cosign documentation, I couldn’t find any architectural or design papers that explain those decisions. It seems to me that those things got hacked on top of each other when a need arose. There may be GitHub issues discussing those design decisions, but I didn’t find any in a quick search.
  • The signature is not in any of the files that I downloaded. The signature is stored as an OCI annotation in the manifest called dev.cosignproject.cosign/signature (see the sketch after this list). So, do I even need the rest of the artifact?
  • Last but not least, it seems that the Cosign tool is the only one that understands how to work with the stored artifacts, which may result in tool lock-in. Although there is a claim that I can verify the signature with a different tool, without a specification, it will be hard to implement such a tool.
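
To illustrate the annotation point from the list above, here is a sketch that reads the signature straight from the manifest with Crane and jq; in this example, the annotation sits on the layer descriptor of the signature manifest:

$ crane manifest 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:sha256-9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4.sig | jq -r '.layers[0].annotations["dev.cosignproject.cosign/signature"]'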

What is Inside Cosign Attestation Artifact?

Knowing how the Cosign signatures work, I would expect something similar for the attestations. The mental hierarchy I have built in my mind is the following:

+ Image
  - Image signature
  + Attestation
    - Attestation signature

Unfortunately, this is not the case. There is no separate artifact for the attestation signature. Here are the steps to download the attestation artifact.

# Pull the attestation manifest
$ crane manifest 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:sha256-9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4.att | jq . > flasksample-v1-attestation-manifest.json

# Pull the attestation artifact as a tarball and unpack it into ./sigstore-attestation
$ crane pull 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:sha256-9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4.att flasksample-v1-attestation.tar.gz
$ mkdir sigstore-attestation
$ tar -xvf flasksample-v1-attestation.tar.gz -C ./sigstore-attestation/
sha256:728e26b36817753d90d7de8420dacf4fa1bcf746da2b54bb8c59cd047a682198
c9880779c90158a29f7a69f29c492551261e7a3936247fc75f225171064d6d32.tar.gz
ff626be9ff3158e9d2118072cd24481d990a5145d10109affec6064423d74cc4.tar.gz
manifest.json

All attestation files are available in my GitHub test repository. Knowing what was done for the signature, the results are more or less what I would have expected. Also, I think I am getting a sense of the design by slowly reverse-engineering it.

The manifest.json file describes the archive. It points to the config sha256:728e26b36817753d90d7de8420dacf4fa1bcf746da2b54bb8c59cd047a682198 file and the two layers c9880779c90158a29f7a69f29c492551261e7a3936247fc75f225171064d6d32.tar.gz and ff626be9ff3158e9d2118072cd24481d990a5145d10109affec6064423d74cc4.tar.gz. I was not sure what the config file sha256:728e26b36817753d90d7de8420dacf4fa1bcf746da2b54bb8c59cd047a682198 is used for, so I decided to ignore it. The two layer JSONs (which were both JSON files, despite the tar.gz extensions) were more interesting, so I decided to dig more into them.

The first thing to note is that the layer JSONs (here and here) for the attestations have a different format from the layer JSON for the signature. While the signature seems to be something proprietary to Cosign, the attestations have a payload type application/vnd.in-toto+json, which hints at something more widely accepted. While this is not an official IANA media type, there is at least an in-toto specification published. The payload looks a lot like a Base64-encoded string, so I gave decoding it a try.

# This decodes the SLSA provenance attestation
$ cat ff626be9ff3158e9d2118072cd24481d990a5145d10109affec6064423d74cc4.tar.gz | jq -r .payload | base64 -d | jq . > ff626be9ff3158e9d2118072cd24481d990a5145d10109affec6064423d74cc4.tar.gz.payload.json

# And this decodes the SPDX SBOM
$ cat c9880779c90158a29f7a69f29c492551261e7a3936247fc75f225171064d6d32.tar.gz | jq -r .payload | base64 -d | jq . > c9880779c90158a29f7a69f29c492551261e7a3936247fc75f225171064d6d32.tar.gz.payload.json

Both files are available here and here. If I want to get the SBOM or the SLSA provenance, I need to get the predicate value from the above JSONs, decode it, and then use it. I didn’t go into that because it was not part of my goals for this experiment.
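
For completeness, here is a sketch of how the predicates could be pulled out of the decoded payloads with jq; the output file names are just illustrative, and the predicate may need further decoding depending on its type:

# Extract the SPDX SBOM predicate
$ jq '.predicate' c9880779c90158a29f7a69f29c492551261e7a3936247fc75f225171064d6d32.tar.gz.payload.json > flasksample-v1-sbom-predicate.json

# Extract the SLSA provenance predicate
$ jq '.predicate' ff626be9ff3158e9d2118072cd24481d990a5145d10109affec6064423d74cc4.tar.gz.payload.json > flasksample-v1-slsa-predicate.json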

Note one thing! As you remember from the beginning of the section, I expected to have signatures for the attestations, and I do! They are just not where I expected them, though. The signatures are part of the layer JSONs (the ones with the strange extensions). If you want to extract the signatures, you need to get the signatures value from them.

# Extract the SBOM attestation signature
$ cat c9880779c90158a29f7a69f29c492551261e7a3936247fc75f225171064d6d32.tar.gz | jq -r .signatures > flasksample-v1-sbom-signatures.json

# Extract the SLSA provenance attestation signature
$ cat ff626be9ff3158e9d2118072cd24481d990a5145d10109affec6064423d74cc4.tar.gz | jq -r .signatures > flasksample-v1-slsa-signature.json

The SBOM signature and the SLSA provenance signature are available on GitHub. For whatever reason, the key ID is left blank.

Here are my takeaways from this experience.

  • Cosign uses a myriad of nonstandard JSON file formats to store signatures and attestations. These still need to be documented and standardized (except for the in-toto one).
  • To get to the data, I need to make several conversions between JSON and Base64-encoded JSON, which increases not only the computation needed but also the probability of errors and bugs, so I would recommend making that simpler.
  • All attestations are stored in a single OCI artifact, and there is no way to retrieve a single attestation based on its type. In my example, if I need to get only the SLSA provenance (785 bytes), I still need to download the SBOM, which is 1.5 MB. That is close to 2,000 times more data than I need. The impact will be on performance and bandwidth cost, and at scale, that would make this solution the wrong one for me.

Verifying Signatures and Attestations with Cosign

Cosign CLI has commands for signature and attestation verification.

# This one verifies the image signature
$ cosign verify --key awskms:///61c124fb-bf47-4f95-a805-65dda7cd08ae 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1 > sigstore-verify-signature-output.json

Verification for 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1 --
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - The signatures were verified against the specified public key

# This one verifies the image attestations
$ cosign verify-attestation --key awskms:///61c124fb-bf47-4f95-a805-65dda7cd08ae 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1 > sigstore-verify-attestation-output.json

Verification for 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1 --
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - The signatures were verified against the specified public key

The output of the signature verification and the output of the attestation verification are available in GitHub. The signature verification is as expected. The attestation verification is more interesting, though. It is another file with non-standard JSON. It seems as though the two JSONs from the blobs were simply concatenated, perhaps unintentionally. Below is a screenshot of the output loaded in Visual Studio Code. Notice that there is no comma between lines 10 and 11. Also, the JSON objects are not wrapped in a JSON array. I’ve opened another issue for the non-standard JSON output from the verification command.
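
Until that is fixed, one possible workaround is jq's slurp mode, which reads the concatenated JSON objects and wraps them into a proper array. A sketch, using the output file from above (the resulting file name is illustrative):

$ jq -s . sigstore-verify-attestation-output.json > sigstore-verify-attestation-output-array.json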

With this, I was able to complete my first scenario – Sign Container Images With Existing Keys Stored in a KMS.

Summary

Here is the summary for the second part of my experience:

  • It seems that the Sigstore implementation grew organically, and some specs were written after the implementation was done. Many pieces are still missing specifications and documentation, and until those are written, it will be hard to develop third-party tooling or even maintain the code easily. The more the project grows, the harder and slower it will be to add new capabilities, and the risk of unintended side effects and even security bugs will grow.
  • There are certain architectural choices that I would question. I have already mentioned the issue with saving all attestations in a single artifact and the numerous proprietary manifests and JSON files. I would also question the lack of separation between the Cosign CLI and the Cosign libraries. If they were separate, it would be easier to use the libraries in third-party tooling to verify or sign artifacts.
  • Finally, the above two tell me that there will be a lot of incompatible changes in the product going forward. I would expect this from an MVP but not from a V1 product that I will use in production. If the team wants to move to a cleaner design and more flexible architecture, a lot of the current data formats will change. This, of course, can be hidden behind the Cosign CLI, but that means that I need to take a hard dependency on it. It will be interesting to understand the plan for verification scenarios and how Cosign can be integrated with various policies and admission controllers. Incompatible changes in the verification scenario can, unfortunately, result in production outages.

My biggest concern so far is the inconsistent approach to the implementation and the lax architectural principles and documentation of the design decisions. On the bright side, verifying the signatures and the attestations using the Cosign CLI was very easy and smooth.

In my next post, I will look at the ephemeral key signing scenario and the capabilities to revoke signed artifacts. I will also look at the last scenario that involves the promotion and re-signing of artifacts.

Today, the security of the software supply chain is top of mind for every CISO and enterprise leader. After the President’s Executive Order (EO), many efforts were spun off to secure the supply chain. One of the most prominent is, of course, Sigstore. I looked at Sigstore more than a year ago and was excited about the idea of ephemeral keys. I thought it might solve some common problems with signing, like, for example, reducing the blast radius if a signing key is compromised or a signing identity is stolen.

Over the past twelve months, I’ve spent a lot of time working on a secure supply chain for containers at Microsoft and gained deep knowledge of the use cases and myriad scenarios. At the same time, Sigstore gained popularity, and more and more companies started using it to secure their container supply chains. I’ve followed the project development and the growth in popularity. In recent weeks, I decided to take another deep look at the technology and evaluate how it will perform against a few core scenarios to secure container images against supply chain attacks.

This will be a three-part series going over the Sigstore experience for signing containers. In the first part, I will look at the experience of signing with existing long-lived keys as well as adding attestations like SBOMs and SLSA provenance documents. In the second part, I will go deeper into the artifacts created during the signing and reverse-engineer their purpose. In the third part, I will look at the signing experience with short-lived keys as well as promoting signatures between registries.

Before that, though, let’s look at some scenarios that I will use to guide my experiment.

Containers’ Supply Chain Scenarios

Every technology implementation (I believe) should start with user scenarios. Signing container images is not a complete scenario but a part of a larger experience. Below are the experiences that I would like to test as part of my experiment. Also, I will do this using the top two cloud vendors – AWS and Azure.

Sign Container Images With Existing Keys Stored in a KMS

In this scenario, I will sign the images with keys that are already stored in my cloud key management systems (AWS KMS or Azure Key Vault). The goal here is to enable enterprises to use existing keys and key management infrastructure for signing. Many enterprises already use this legacy signing scenario for their software, so there is nothing revolutionary here except the additional artifacts.

  1. Build a v1 of a test image
  2. Push the v1 of the test image to a registry
  3. Sign the image with a key stored in a cloud KMS
  4. Generate an SBOM for the container image
  5. Sign the SBOM and push it to the registry
  6. Generate an SLSA provenance attestation
  7. Sign the SLSA provenance attestation and push it to the registry
  8. Pull and validate the SBOM
  9. Pull and validate the SLSA provenance attestation

A note! I will cheat with the SLSA provenance attestations because the SLSA tooling works better in CI/CD pipelines than with manual Docker build commands that I will use for my experiment.

Sign Container Images with Ephemeral Keys from Fulcio

In this scenario, I will test how signing with ephemeral keys (what Sigstore calls keyless signing) improves the security of the containers’ supply chain. The term keyless signing is a bit misleading because keys are still involved in generating the signature. The difference is that the keys are generated on demand and tied to a short-lived certificate issued by Fulcio (valid for about 10 minutes, I believe). I will not generate SBOMs and SLSA provenance attestations for this second scenario, but you can assume that they may also be part of it in a real-life application. Here is what I will do:

  1. Build a v1 of a test image
  2. Push the v1 of the test image to a registry
  3. Sign the image with an ephemeral key
  4. Build a v2 of the test image and repeat steps 2 and 3 for it
  5. Build a v3 of the test image and repeat steps 2 and 3 for it
  6. Invalidate the signature for v2 of the test image

The premise of this scenario is to test a temporary exploit of the pipeline. This is what happened with the SolarWinds supply chain compromise, and I would like to understand how we might be able to use Sigstore to prevent such an attack in the future or how it could reduce the blast radius. I don’t want to invalidate the signatures for v1 and v3 because that would be similar to the traditional signing approach with long-lived keys.

Acquire OSS Container Image and Re-Sign for Internal Use

This is a common scenario that I’ve heard from many customers. They import images from public registries, verify them, scan them, and then want to re-sign them with internal keys before allowing them for use. So, here is what I will do:

  1. Build an image
  2. Push it to the registry
  3. Sign it with an ephemeral key
  4. Import the image and the signature from one registry (ECR) into another (ACR)
    Those steps will simulate importing an image signed with an ephemeral key from an OSS registry like Docker Hub or GitHub Container Registry.
  5. Sign the image with a key from the cloud KMS
  6. Validate the signature with the cloud KMS certificate

Let’s get started with the experience.

Environment Set Up

To run the commands below, you will need to have AWS and Azure accounts. I have already created container registries and set up asymmetric keys for signing in both cloud vendors. I will not go over the steps for setting those up – you can follow the vendor’s documentation for that. I have also set up AWS and Azure CLIs so I can sign into the registries, run other commands against the registries and retrieve the keys from the command line. Once again, you can follow the vendor’s documentation to do that. Now, let’s go over the steps to set up Sigstore tooling.

Installing Sigstore Tooling

To go over the scenarios above, I will need to install the Cosign and Rekor CLIs. Cosign is used to sign the images and also interacts with Fulcio to obtain the ephemeral keys for signing. Rekor is the transparency log that keeps a record of the signatures done by Cosign using ephemeral keys.

When setting up automation for either signing or signature verification, you will only need to install Cosign. If you need to add or retrieve Rekor records that are not related to signing or attestations, you will also need to install the Rekor CLI.

You have several options to install the Cosign CLI; however, the only documented option to install the Rekor CLI is using Golang or building it from source (for which you also need Golang). One note: the installation instructions for all Sigstore tools are geared toward Golang developers.
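
For reference, here is the Golang-based installation path as a sketch; the versions are the ones I used in this post, so check the release pages for the current ones:

# Install the Cosign CLI
$ go install github.com/sigstore/cosign/cmd/cosign@v1.13.0

# Install the Rekor CLI
$ go install github.com/sigstore/rekor/cmd/rekor-cli@v0.12.2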

The next thing is that on the Sigstore documentation site, I couldn’t find information on how to verify that the Cosign binaries I installed were the ones that the Sigstore team produced. The last thing I noticed after installing the CLIs is the level of detail I got about the binaries. Running cosign version and rekor-cli version gives the following output.

$ cosign version
  ______   ______        _______. __    _______ .__   __.
 /      | /  __  \      /       ||  |  /  _____||  \ |  |
|  ,----'|  |  |  |    |   (----`|  | |  |  __  |   \|  |
|  |     |  |  |  |     \   \    |  | |  | |_ | |  . `  |
|  `----.|  `--'  | .----)   |   |  | |  |__| | |  |\   |
 \______| \______/  |_______/    |__|  \______| |__| \__|
cosign: A tool for Container Signing, Verification and Storage in an OCI registry.

GitVersion:    1.13.0
GitCommit:     6b9820a68e861c91d07b1d0414d150411b60111f
GitTreeState:  "clean"
BuildDate:     2022-10-07T04:37:47Z
GoVersion:     go1.19.2
Compiler:      gc
Platform:      linux/amd64
$ rekor-cli version
  ____    _____   _  __   ___    ____             ____   _       ___
 |  _ \  | ____| | |/ /  / _ \  |  _ \           / ___| | |     |_ _|
 | |_) | |  _|   | ' /  | | | | | |_) |  _____  | |     | |      | |
 |  _ <  | |___  | . \  | |_| | |  _ <  |_____| | |___  | |___   | |
 |_| \_\ |_____| |_|\_\  \___/  |_| \_\          \____| |_____| |___|
rekor-cli: Rekor CLI

GitVersion:    v0.12.2
GitCommit:     unknown
GitTreeState:  unknown
BuildDate:     unknown
GoVersion:     go1.18.2
Compiler:      gc
Platform:      linux/amd64

The Cosign CLI provides details about the build of the binary; the Rekor CLI does not. Using the above process to install the binaries may seem insecure, but this seems to be by design, as explained in Sigstore Issue #2300: Verify the binary downloads when installing from .deb (or any other binary release).

Here is the catch, though! I looked at the above experience as a novice user going through the Sigstore documentation. Of course, as with any other technical documentation, this one is incomplete and not updated in step with the implementation. There is no documentation on how to verify the Cosign binary, but there is one describing how to verify Rekor binaries. If you go to the Sigstore GitHub organization, and specifically to the Cosign and Rekor release pages, you will see that they’ve published the signatures and the SBOMs for both tools. You will also find binaries for Rekor that you can download. So you can verify the signature of the release binaries before installing them. Here is what I did for the Rekor CLI version that I had downloaded:

$ COSIGN_EXPERIMENTAL=1 cosign verify-blob \
    --cert https://github.com/sigstore/rekor/releases/download/v0.12.2/rekor-cli-linux-amd64-keyless.pem \
    --signature https://github.com/sigstore/rekor/releases/download/v0.12.2/rekor-cli-linux-amd64-keyless.sig \
    https://github.com/sigstore/rekor/releases/download/v0.12.2/rekor-cli-linux-amd64

tlog entry verified with uuid: 38665ab8dc42600de87ed9374e86c83ac0d7d11f1a3d1eaf709a8ba0d9a7e781 index: 4228293
Verified OK

Verifying the Cosign binary is trickier, though, because you need to have Cosign already installed to verify it. Here is the output if you already have Cosign installed and you want to move to a newer version:

$ COSIGN_EXPERIMENTAL=1 cosign verify-blob \
    --cert https://github.com/sigstore/cosign/releases/download/v1.13.0/cosign-linux-amd64-keyless.pem \
    --signature https://github.com/sigstore/cosign/releases/download/v1.13.0/cosign-linux-amd64-keyless.sig \
    https://github.com/sigstore/cosign/releases/download/v1.13.0/cosign-linux-amd64

tlog entry verified with uuid: 6f1153edcc399b22b016709a218127fc7d5e9fb7071cd4812a9847bf13f65190 index: 4639787
Verified OK

If you are installing Cosign for the first time and downloading the binaries from the release page, you can follow a process similar to the one for verifying Rekor releases. I have submitted an issue to update the Cosign documentation with release verification instructions.

I would rate the installation experience as no worse than that of any other tool geared toward hardcore engineers.

Let’s get into the scenarios.

Using Cosign to Sign Container Images with a KMS Key

Here are the two images that I will use for the first scenario:

$ docker images
REPOSITORY                                                 TAG       IMAGE ID       CREATED         SIZE
562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample   v1        b40ba874cb57   2 minutes ago   138MB
tsmacrtestcssc.azurecr.io/flasksample                      v1        b40ba874cb57   2 minutes ago   138MB

Using Cosign With a Key Stored in AWS KMS

Let’s go over the AWS experience first.

# Sign into the registry
$ aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 562077019569.dkr.ecr.us-west-2.amazonaws.com
Login Succeeded

# And push the image after that
$ docker push 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1

Signing the container image with the AWS key was relatively easy. Though, be careful when you omit the endpoint and make sure you add that third slash (awskms:///); otherwise, you will get errors. Here is what I got on the first attempt, which puzzled me a little.

$ cosign sign --key awskms://61c124fb-bf47-4f95-a805-65dda7cd08ae 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1
Error: signing [562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1]: getting signer: reading key: kms get: kms specification should be in the format awskms://[ENDPOINT]/[ID/ALIAS/ARN] (endpoint optional)
main.go:62: error during command execution: signing [562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1]: getting signer: reading key: kms get: kms specification should be in the format awskms://[ENDPOINT]/[ID/ALIAS/ARN] (endpoint optional)

$ cosign sign --key awskms://arn:aws:kms:us-west-2:562077019569:key/61c124fb-bf47-4f95-a805-65dda7cd08ae 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1
Warning: Tag used in reference to identify the image. Consider supplying the digest for immutability.
Error: signing [562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1]: recursively signing: signing digest: getting fetching default hash function: getting public key: operation error KMS: GetPublicKey, failed to parse endpoint URL: parse "https://arn:aws:kms:us-west-2:562077019569:key": invalid port ":key" after host
main.go:62: error during command execution: signing [562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1]: recursively signing: signing digest: getting fetching default hash function: getting public key: operation error KMS: GetPublicKey, failed to parse endpoint URL: parse "https://arn:aws:kms:us-west-2:562077019569:key": invalid port ":key" after host

Of course, when I typed the URIs correctly, the image was signed, and the signature got pushed to the registry.

$ cosign sign --key awskms:///61c124fb-bf47-4f95-a805-65dda7cd08ae 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1
Warning: Tag used in reference to identify the image. Consider supplying the digest for immutability.
Pushing signature to: 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample

Interestingly, I didn’t get the tag warning when using the Key ID incorrectly. I got it when I used the ARN incorrectly as well as when I used the Key ID correctly. Also, I struggled to interpret the error messages, which made me wonder about the consistency of the implementation, but I will cover more about that in the conclusions.

One nice thing was that I was able to copy the Key ID and the Key ARN and directly paste them into the URI without modification. Unfortunately, this was not the case with Azure Key Vault 🙁 .

Using Cosign to Sign Container Images With Azure Key Vault Key

According to the Cosign documentation, I had to set three environment variables to use keys stored in Azure Key Vault. It looks as if a service principal is the only authentication option that Cosign implemented. So, I created one and gave it all the necessary permissions on the Key Vault. I’ve also set the required environment variables with the service principal credentials.
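
For reference, these are the environment variables that the Azure SDK (and hence Cosign) picks up for service principal authentication; the values below are placeholders:

$ export AZURE_TENANT_ID="<tenant-id>"
$ export AZURE_CLIENT_ID="<service-principal-app-id>"
$ export AZURE_CLIENT_SECRET="<service-principal-secret>"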

As I hinted above, my first attempt to sign with a key stored in Azure Key Vault failed. Unlike the AWS experience, copying the key identifier from the Azure Portal and pasting it into the URI (without the https:// part) won’t do the job.

$ cosign sign --key azurekms://tsm-kv-usw3-tst-cssc.vault.azure.net/keys/sigstore-azure-test-key/91ca3fb133614790a51fc9c04bd96890 tsmacrtestcssc.azurecr.io/flasksample:v1
Error: signing [tsmacrtestcssc.azurecr.io/flasksample:v1]: getting signer: reading key: kms get: kms specification should be in the format azurekms://[VAULT_NAME][VAULT_URL]/[KEY_NAME]
main.go:62: error during command execution: signing [tsmacrtestcssc.azurecr.io/flasksample:v1]: getting signer: reading key: kms get: kms specification should be in the format azurekms://[VAULT_NAME][VAULT_URL]/[KEY_NAME]

If you decipher the help text that you get from the error message: kms specification should be in the format azurekms://[VAULT_NAME][VAULT_URL]/[KEY_NAME], you would assume that there are two ways to construct the URI:

  1. Using the key vault name and the key name like this
    azurekms://tsm-kv-usw3-tst-cssc/sigstore-azure-test-key
    The assumption is that Cosign automatically appends .vault.azure.net at the end.
  2. Using the key vault hostname (not URL or identifier) and the key name like this
    azurekms://tsm-kv-usw3-tst-cssc.vault.azure.net/sigstore-azure-test-key

The first one just hung for minutes and did not complete. I’ve tried it several times, but the behavior was consistent.

$ cosign sign --key azurekms://tsm-kv-usw3-tst-cssc/sigstore-azure-test-key tsmacrtestcssc.azurecr.io/flasksample:v1
Warning: Tag used in reference to identify the image. Consider supplying the digest for immutability.
^C
$

I assume the problem is that it tries to connect to a host named tsm-kv-usw3-tst-cssc, but it never seemed to time out. The hostname option brought me a step further. It seems that the call to Azure Key Vault was made, and I got the following error:

$ cosign sign --key azurekms://tsm-kv-usw3-tst-cssc.vault.azure.net/sigstore-azure-test-key tsmacrtestcssc.azurecr.io/flasksample:v1
Warning: Tag used in reference to identify the image. Consider supplying the digest for immutability.
Error: signing [tsmacrtestcssc.azurecr.io/flasksample:v1]: recursively signing: signing digest: signing the payload: keyvault.BaseClient#Sign: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="Forbidden" Message="The user, group or application 'appid=04b07795-xxxx-xxxx-xxxx-02f9e1bf7b46;oid=f4650a81-f57d-4fb3-870c-e84fe859f68a;numgroups=1;iss=https://sts.windows.net/08c1c649-bfdd-439e-8e5b-5ff31c72ce4e/' does not have keys sign permission on key vault 'tsm-kv-usw3-tst-cssc;location=westus3'. For help resolving this issue, please see https://go.microsoft.com/fwlink/?linkid=2125287" InnerError={"code":"ForbiddenByPolicy"}
main.go:62: error during command execution: signing [tsmacrtestcssc.azurecr.io/flasksample:v1]: recursively signing: signing digest: signing the payload: keyvault.BaseClient#Sign: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="Forbidden" Message="The user, group or application 'appid=04b07795-8ddb-461a-bbee-02f9e1bf7b46;oid=f4650a81-f57d-4fb3-870c-e84fe859f68a;numgroups=1;iss=https://sts.windows.net/08c1c649-bfdd-439e-8e5b-5ff31c72ce4e/' does not have keys sign permission on key vault 'tsm-kv-usw3-tst-cssc;location=westus3'. For help resolving this issue, please see https://go.microsoft.com/fwlink/?linkid=2125287" InnerError={"code":"ForbiddenByPolicy"}

Now, this was a very surprising error, mainly because the AppId from the error message (04b07795-xxxx-xxxx-xxxx-02f9e1bf7b46) didn’t match the AppId (or Client ID) in the environment variable that I had set as per the Cosign documentation.

$ echo $AZURE_CLIENT_ID
a59eaa16-xxxx-xxxx-xxxx-dca100533b89

Note that I masked parts of the IDs for privacy reasons.

My first assumption was that the AppId from the error message was for my user account, with which I signed in using the Azure CLI. This assumption turned out to be true. Not knowing the intended behavior, I filed an issue for the Sigstore team to clarify and document the Azure Key Vault authentication behavior. After restarting the terminal (it seems restarting is the norm in today’s software products 😉 ), I was able to move another step forward. Now, having signed in only with the service principal credentials, I got the following error:

$ cosign sign --key azurekms://tsm-kv-usw3-tst-cssc.vault.azure.net/sigstore-azure-test-key tsmacrtestcssc.azurecr.io/flasksample:v1
Warning: Tag used in reference to identify the image. Consider supplying the digest for immutability.
Error: signing [tsmacrtestcssc.azurecr.io/flasksample:v1]: recursively signing: signing digest: signing the payload: keyvault.BaseClient#Sign: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="BadParameter" Message="Key and signing algorithm are incompatible. Key https://tsm-kv-usw3-tst-cssc.vault.azure.net/keys/sigstore-azure-test-key/91ca3fb133614790a51fc9c04bd96890 is of type 'RSA', and algorithm 'ES256' can only be used with a key of type 'EC' or 'EC-HSM'."
main.go:62: error during command execution: signing [tsmacrtestcssc.azurecr.io/flasksample:v1]: recursively signing: signing digest: signing the payload: keyvault.BaseClient#Sign: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="BadParameter" Message="Key and signing algorithm are incompatible. Key https://tsm-kv-usw3-tst-cssc.vault.azure.net/keys/sigstore-azure-test-key/91ca3fb133614790a51fc9c04bd96890 is of type 'RSA', and algorithm 'ES256' can only be used with a key of type 'EC' or 'EC-HSM'."

Apparently, I had generated an incompatible key! Note that RSA keys are not supported by Cosign, as I documented in the following Sigstore documentation issue. After generating a new key, the signing finally succeeded.
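
For anyone hitting the same issue, here is a sketch of how a compatible EC key can be created with the Azure CLI; the vault and key names are the ones used in this post:

$ az keyvault key create --vault-name tsm-kv-usw3-tst-cssc --name sigstore-azure-test-key-ec --kty EC --curve P-256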

$ cosign sign --key azurekms://tsm-kv-usw3-tst-cssc.vault.azure.net/sigstore-azure-test-key-ec tsmacrtestcssc.azurecr.io/flasksample:v1
Warning: Tag used in reference to identify the image. Consider supplying the digest for immutability.
Pushing signature to: tsmacrtestcssc.azurecr.io/flasksample

OK! I was able to get through the first three steps of Scenario 1: Sign Container Images With Existing Keys from KMS. Next, I will add some other artifacts to the image – aka attestations. I will use only one of the cloud vendors for that because I don’t expect differences in the experience.

Adding SBOM Attestation With Cosign

Using Syft, I can generate an SBOM for the container image that I have built. Then I can use Cosign to sign and push the SBOM to the registry. Keep in mind that you need to be signed into the registry to generate the SBOM. Below are the steps to generate the SBOM (nothing to do with Cosign). The generated SBOM is also available in my GitHub test repo.

# Sign into AWS ECR
$ aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 562077019569.dkr.ecr.us-west-2.amazonaws.com

# Generate the SBOM
$ syft packages 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1 -o spdx-json > flasksample-v1.spdx

The Cosign CLI’s help shows the following message for how to add an attestation to an image using an AWS KMS key.

cosign attest --predicate <FILE> --type <TYPE> --key awskms://[ENDPOINT]/[ID/ALIAS/ARN] <IMAGE>

When I was running this test, there was no explanation of what the --type <TYPE> parameter was. I decided just to give it a try.

$ cosign attest --predicate flasksample-v1.spdx --type sbom --key awskms:///arn:aws:kms:us-west-2:562077019569:key/61c124fb-bf47-4f95-a805-65dda7cd08ae 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1
Error: signing 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1: invalid predicate type: sbom
main.go:62: error during command execution: signing 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1: invalid predicate type: sbom

Trying spdx-json as a type also didn’t work. There were a couple of places, here and here, where the Cosign documentation spoke about custom predicate types, but none of the examples showed how to use the parameter. I decided to give it one last try.

$ cosign attest --predicate flasksample-v1.spdx --type "cosign.sigstore.dev/attestation/v1" --key awskms:///arn:aws:kms:us-west-2:562077019569:key/61c124fb-bf47-4f95-a805-65dda7cd08ae 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1
Error: signing 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1: invalid predicate type: cosign.sigstore.dev/attestation/v1
main.go:62: error during command execution: signing 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1: invalid predicate type: cosign.sigstore.dev/attestation/v1

Obviously, this was not yet documented, and it was not clear what values could be provided for it. Here is the issue asking to clarify the purpose of the --type <TYPE> parameter. From the documentation examples, it seemed that this parameter could be safely omitted. So, I gave it a shot! Running the command without the parameter worked fine and pushed the attestation to the registry.

$ cosign attest --predicate flasksample-v1.spdx --key awskms:///arn:aws:kms:us-west-2:562077019569:key/61c124fb-bf47-4f95-a805-65dda7cd08ae 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1
Using payload from: flasksample-v1.spdx

One thing that I noticed with the attestation experience is that it pushed a single artifact with a tag ending in .att. I will come back to this in the next post. Now, let’s push the SLSA attestation for this image.

Adding SLSA Attestation With Cosign

As I mentioned above, I will cheat with the SLSA attestation because I do all those steps manually and docker build doesn’t generate SLSA provenance. I will use this sample for the SLSA provenance attestation.

$ cosign attest --predicate flasksample-v1.slsa --key awskms:///arn:aws:kms:us-west-2:562077019569:key/61c124fb-bf47-4f95-a805-65dda7cd08ae 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:v1
Using payload from: flasksample-v1.slsa

Cosign did something, as we can see on the console as well as in the registry – the digest of the .att artifact changed.
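
One quick way to observe that without the registry UI is to check the digest of the .att tag with Crane before and after the attest command. A sketch:

$ crane digest 562077019569.dkr.ecr.us-west-2.amazonaws.com/flasksample:sha256-9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4.att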

The question, though, is what exactly happened?

In the next post of the series, I will go into detail about what is happening behind the scenes, where I will look deeper at the artifacts created by Cosign.

Summary

To summarize my experience so far, here is what I think.

  • As I mentioned above, the installation experience for the tools is no worse than that of any other tool targeted at engineers. Improvements in the documentation would be beneficial for the first-use experience, and I filed a few issues to help with that.
  • Signing with a key stored in AWS KMS was easy and smooth. Unfortunately, the Azure Key Vault implementation simply followed the same pattern instead of the conventions of that cloud vendor. I think it would be better to follow the patterns of each specific cloud vendor. There is no expectation that all cloud vendors follow the same naming, URI, and other patterns; forcing them into one mold may result in more errors than benefits for the user.
  • While Cosign hides a lot of the complexity behind the scenes, providing some visibility into what is happening would be good. For example, if you let the Cosign CLI create the key in Azure Key Vault, it will automatically create a key type that it supports. That avoids the issue I encountered with the RSA key, but it may not be the main scenario used in the enterprise.

Next time, I will spend some time looking at the artifacts created by Cosign and understanding their purpose, as well as how to verify those using Cosign and the keys stored in the KMS.

In my last post, Implementing Quarantine Pattern for Container Images, I wrote about how to implement a quarantine pattern for container images and how to use policies to prevent the deployment of an image that doesn’t meet certain criteria. In that post, I also mentioned that the quarantine flag (not to be confused with the quarantine pattern 🙂) has certain disadvantages. Since then, Steve Lasker has convinced me that the quarantine flag could be useful in certain scenarios. Many of those scenarios are new and will play a role in the containers’ secure supply chain improvements. Before we look at the scenarios, let’s revisit how the quarantine flag works.

What is the Container Image Quarantine Flag?

As you remember from the previous post, the quarantine flag is set on an image at the time the image is pushed to the registry. The expected workflow is shown in the flow diagram below.

The quarantine flag stays set on the image until the Quarantine Processor completes the actions and removes the image from quarantine. We will go into detail about what those actions can be later on in the post. The important thing to remember is that, while in quarantine, the image can be pulled only by the Quarantine Processor. Neither the Publisher, the Consumer, nor any other actor should be able to pull the image from the registry while it is in quarantine. The way this is achieved is through special permissions assigned to the Quarantine Processor that the other actors do not have. Such permissions can be quarantine pull and quarantine push, which allow pulling artifacts from and pushing artifacts to the registry while the image is in quarantine.

Inside the registry, you will have a mix of images that are in quarantine and images that are not. The quarantined ones can only be pulled by the Quarantine Processor, while others can be pulled by anybody who has access to the registry.

Quarantining images is a capability that needs to be implemented in the container registry. This is not a standard capability, though, and very few, if any, container registries implement it. Azure Container Registry (ACR) has a quarantine feature that is in preview. As explained in the previous post, the quarantine flag’s limitations are still valid. Mainly, those are:

  • If you need to have more than one Quarantine Processor, you need to figure out a way to synchronize their operations. The Quarantine Processor who completes the last action should remove the quarantine flag.
  • Using asynchronous processing is hard to manage. The Quarantine Processor manages all the actions and changes the flag. If you have an action that requires asynchronous processing, the Quarantine Processor needs to wait for the action to complete to evaluate the result and change the flag.
  • Last, you should not set the quarantine flag again once you have removed it. If you do that, you may break a lot of functionality and bring down your workloads. The problem is that you do not have granular control over who can and cannot pull the image, other than giving them the Quarantine Processor role.

With all that said, though, if you have a single Quarantine Processor, the quarantine flag can be used to prepare the image for use. This can be very helpful in the secure supply chain scenarios for containers, where the CI/CD pipelines not only push the images to the registries but also produce additional artifacts related to the images. Let’s look at a new build scenario for container images that you may want to implement.

Quarantining Images in the CI/CD Pipeline

The one place where the quarantine flag can prove useful is in the CI/CD pipeline used to produce a compliant image. Let’s assume that for an enterprise, a compliant image is one that is signed, has an SBOM that is also signed, and passed a vulnerability scan with no CRITICAL or HIGH severity vulnerabilities. Here is the example pipeline that you may want to implement.

In this case, the CI/CD agent is the one that plays the Quarantine Processor role and manages the quarantine flag. As you can see, the quarantine flag is automatically set in step 4 when the image is pushed. Steps 5, 6, 7, and 8 are the different actions performed on the image while it is in quarantine. While those actions are not complete, the image should not be pullable by any consumer. For example, some of those actions, like the vulnerability scan, may take a long time to complete. You don’t want a developer to accidentally pull the image before the vulnerability scan is done. If one of those actions fails for any reason, the image should stay in quarantine as non-compliant.
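
To make the pipeline more concrete, here is a rough sketch of what the CI/CD agent could run while the image sits in quarantine. The registry, image name, and key are placeholders; the quarantine flag manipulation is registry-specific and is shown only as a comment:

# Step 4: build and push - the registry sets the quarantine flag automatically on push
$ docker build -t registry.example.com/team/app:v1 .
$ docker push registry.example.com/team/app:v1

# Steps 5-8: actions performed while the image is in quarantine
$ syft packages registry.example.com/team/app:v1 -o spdx-json > app-v1.spdx
$ cosign attest --predicate app-v1.spdx --key awskms:///<key-id> registry.example.com/team/app:v1
$ cosign sign --key awskms:///<key-id> registry.example.com/team/app:v1
# ...run the vulnerability scan and fail the pipeline on CRITICAL or HIGH findings...

# Step 9: remove the quarantine flag through the registry-specific API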

Protecting developers from pulling non-compliant images is just one of the scenarios that a quarantine flag can help with. Another one is avoiding triggers for workflows that are known to fail if the image is not compliant.

Using Events to Trigger Image Workflows

Almost every container registry has an eventing mechanism that allows you to trigger workflows based on events in the registry. Typically, you would use the image push event to trigger the deployment of your image for testing or production. In the above case, if your enterprise has a policy for only deploying images with signatures, SBOMs, and vulnerability reports, your deployment will fail if the deployment is triggered right after step 4. The deployment should be triggered after step 9, which will ensure that all the required actions on the image are performed before the deployment starts.

To avoid triggering the deployment prematurely, the image push event should be delayed until after step 9. A separate quarantine push event can be emitted in step 4 and used to trigger actions related to the quarantine of the image. A note of caution here, though! As mentioned previously, synchronizing multiple actors who can act on the quarantine flag can be tricky. If the CI/CD pipeline is your Quarantine Processor, you may feel tempted to use the quarantine push event to trigger some other workflow or long-running action. An example of such an action is asynchronous malware scanning and detonation, which cannot be run as part of the CI/CD pipeline. The things to be aware of are:

  • To be able to pull the image, the malware scanner must also have the Quarantine Processor role assigned. This means that you will have more than one concurrent Quarantine Processor acting on the image.
  • The Quarantine Processor that finishes first will remove the quarantine flag or needs to wait for all other Quarantine Processors to complete. This, of course, adds complexity to managing the concurrency and various race conditions.

I would strongly suggest that you have only one Quarantine Processor and manage all activities from it. Otherwise, you can end up with images in inconsistent states that do not meet your compliance criteria.

When Should Events be Fired?

We already mentioned in the previous section the various events you may need to implement in the registry:

  • A quarantine push event is used to trigger workflows that are related to images in quarantine.
  • An image push event is the standard event triggered when an image is pushed to the registry.

Here is a flow diagram of how those events should be fired.

This flow offers a logical sequence of events that can be used to trigger relevant workflows. The quarantine workflow should be triggered by the quarantine push event, while all other workflows should be triggered by the image push event.

If you look at the current implementation of the quarantine feature in ACR, you will notice that both events are fired if the registry quarantine is not enabled (note that the feature is in preview, and functionality may change in the future). I find this behavior confusing. The reason, albeit philosophical, is simple – if the registry doesn’t support quarantine, then it should not send quarantine push events. The behavior should be consistent with any other registry that doesn’t have quarantine capability, and only the image push event should be fired.

What Data Should the Events Contain?

The consumers of the events should be able to make a decision on how to proceed based on the information in the event. The minimum information that needs to be provided in the event should be:

  • Timestamp
  • Event Type: quarantine or push
  • Repository
  • Image Tag
  • Image SHA
  • Actor

This information will allow the event consumers to subscribe to registry events and properly handle them.
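
As an illustration, an event carrying that minimum information could look something like the following; the field names are illustrative and not an actual registry schema:

{
  "timestamp": "2022-11-01T10:15:00Z",
  "eventType": "quarantine push",
  "repository": "flasksample",
  "tag": "v1",
  "digest": "sha256:9bd049b6b470118cc6a02d58595b86107407c9e288c0d556ce342ea8acbafdf4",
  "actor": "ci-cd-pipeline"
}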

Audit Logging for Quarantined Images

Because we are discussing a secure supply chain for containers, we should also think about traceability. For quarantine-enabled registries, a log message should be added at every point the status of the image is changed. Once again, this is something that needs to be implemented by the registry, and it is not standard behavior. At a minimum, you should log the following information:

  • When the image is put into quarantine (initial push)
    • Timestamp
    • Repository
    • Image Tag
    • Image SHA
    • Actor/Publisher
  • When the image is removed from quarantine (quarantine flag is removed)
    Note: if the image is removed from quarantine, the assumption is that it passed all the quarantine checks.

    • Timestamp
    • Repository
    • Image Tag
    • Image SHA
    • Actor/Quarantine Processor
    • Details
      Details can be free-form or semi-structured data that can be used by other tools in the enterprise.

One question that remains is whether a message should be logged if the quarantine does not pass after all actions are completed by the Quarantine Processor. It would be good to get the complete picture from the registry log and understand why certain images stay in quarantine forever. On the other side, though, the image doesn’t change its state (it stays in quarantine anyway), and the registry needs to provide an API just to log the message. Because the API to remove the quarantine flag is not a standard OCI registry API, a single API can be provided both to remove the quarantine flag and to log the audit message if the quarantine doesn’t pass. The ACR quarantine feature uses the custom ACR API to do both.

Summary

To summarize, if implemented by a registry, the quarantine flag can be useful in preparing the image before allowing its wider use. The quarantine activities on the image should be done by a single Quarantine Processor to avoid concurrency and inconsistencies in the registry. The quarantine flag should be used only during the initial setup of the image before it is released for wider use. Reverting to a quarantine state after the image is published for wider use can be dangerous due to the lack of granularity for actor permissions. Customized policies should continue to be used for images that are published for wider use.

One important step in securing the supply chain for containers is preventing the use of “bad” images. I intentionally use the word “bad” here. For one enterprise, “bad” may mean “vulnerable”; for another, it may mean containing software with an unapproved license; for a third, it may be an image with a questionable signature; possibilities are many. Also, “bad” images may be OK to run in one environment (for example, the local machine of a developer for bug investigation) but not in another (for example, the production cluster). Lastly, the use of “bad” images needs to be verified in many phases of the supply chain – before release for internal use, before build, before deployment, and so on. The decision of whether a container image is “bad” cannot be made in advance and depends on the consumer of the image. One common way to prevent the use of “bad” images is the so-called quarantine pattern. The quarantine pattern prevents an image from being used unless certain conditions are met.

Let’s look at a few scenarios!

Scenarios That Will Benefit from a Quarantine Pattern

Pulling images from public registries and using those in your builds or for deployments is risky. Such public images may have vulnerabilities or malware included. Using them as base images or deploying them to your production clusters bypasses any possible security checks, compromising your containers’ supply chain. For that reason, many enterprises ingest the images from a public registry into an internal registry where they can perform additional checks like vulnerability or malware scans. In the future, they may sign the images with an internal certificate, generate a Software Bill of Materials (SBOM), add provenance data, or something else. Once those checks are done (or additional data about the image is generated), the image is released for internal use. The public images sit in the quarantine registry before they are made available for use in the internal registry with “golden” (or “blessed” 🙂) images.

Another scenario is where an image is used as a base image to build a new application image. Let’s say that two development teams use debian:bullseye-20220228 as a base image for their applications. The first application uses libc-bin, while the second one doesn’t. libc-bin in that image has several critical and high severity vulnerabilities. The first team may not want to allow the use of debian:bullseye-20220228 as a base image for their engineers, while the second one may be OK with it because the libc-bin vulnerabilities may not impact their application. You need to selectively allow the image to be used in the second team’s CI/CD pipeline but not in the first one.

In the deployment scenario, teams may be OK deploying images with the developers’ signatures in the DEV environments, while the PROD ones should only accept images signed with the enterprise keys.

As you can see, deciding whether an image should be allowed for use or not is not a binary decision, and it depends on the intention of its use. In all scenarios above, an image has to be “quarantined” and restricted for certain use but allowed for another.

Options for Implementing the Quarantine Pattern

So, what are the options to implement the quarantine pattern for container images?

Using a Quarantine Flag and RBAC for Controlling the Access to an Image

This is the most basic but least flexible way to implement the quarantine pattern. Here is how it works!

When the image is pushed to the registry, the image is immediately quarantined, i.e. the quarantine flag on the image is set to  TRUE. A separate role like QuarantineReader is created in the registry and assigned to the actor or system allowed to perform tasks on the image while in quarantine. This role allows the actor or system to pull the image to perform the needed tasks. It also allows changing the quarantine flag from TRUE to FALSE when the task is completed.
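To make the flow concrete, here is a minimal sketch in Python with an in-memory stand-in for the registry. The quarantine API itself is hypothetical – registries that implement the pattern (ACR, for example) expose it through their own custom APIs – so treat this only as an illustration of the flag and the QuarantineReader responsibilities, not as a real client.

# Minimal sketch of the quarantine-flag flow. The registry and its quarantine
# API are hypothetical placeholders, not a standard OCI or vendor API.

class FakeRegistry:
    def __init__(self):
        self.quarantine = {}  # (repository, digest) -> quarantine flag

    def push(self, repository, digest):
        # Every newly pushed image starts out quarantined.
        self.quarantine[(repository, digest)] = True

    def is_quarantined(self, repository, digest):
        return self.quarantine[(repository, digest)]

    def release(self, repository, digest):
        # Only an actor holding the QuarantineReader role may flip the flag.
        self.quarantine[(repository, digest)] = False


def quarantine_processor(registry, repository, digest):
    """Pulls the image, runs the pre-release tasks, and clears the flag when they pass."""
    if not registry.is_quarantined(repository, digest):
        return  # already released for wider use
    # Placeholder for the real work: vulnerability scan, malware scan, signing...
    tasks_passed = True
    if tasks_passed:
        registry.release(repository, digest)


registry = FakeRegistry()
registry.push("internal/flasksample", "sha256:abc123")
quarantine_processor(registry, "internal/flasksample", "sha256:abc123")
print(registry.is_quarantined("internal/flasksample", "sha256:abc123"))  # False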

The problem with this approach becomes obvious in the scenarios above. Take, for example, the ingestion of public images scenario. In this scenario, you have more than one actor that needs to modify the quarantine flag: the vulnerability scanner, the malware scanner, the signer, etc., before the images are released for internal use. All those tasks are done outside the registry, and some of them may run on a schedule or take a long time to complete (vulnerability and malware scans, for example). All those systems need to be assigned the QuarantineReader role and allowed to flip the flag when done. The problem, though, is that you need to synchronize between those services and change the quarantine flag from TRUE to FALSE only after all the tasks are completed.

Managing concurrency between tasks is a non-trivial job. It complicates the implementation logic for the registry clients because they need to interact with each other or with an external system that synchronizes all tasks and keeps track of their state – unless you want to implement this logic in the registry itself, which I would not recommend.

One additional issue with this approach is its extensibility. What if you need to add one more task to the list of things that you want to do on the image before being allowed for use? You need to crack open the code and implement the hooks to the new system.

Lastly, some of the scenarios above are not possible at all. If you need to restrict access to the image to one team and not another, the only way to do it is to assign the QuarantineReader role to that team. This is not optimal, though, because the role is meant to be assigned only to systems that perform the tasks needed to take the image out of quarantine, not to grant access for other purposes. Also, if you want to make decisions based on the content of vulnerability reports or SBOMs, this quarantine flag approach is not applicable at all.

Using Declarative Policy Approach

A more flexible approach is to use a declarative policy. The registry can be used to store all necessary information about the image, including vulnerability and malware reports, SBOMs, provenance information, and so on. Well, soon, registries will be able to do that 🙂 If your registry supports ORAS reference types, you can start saving those artifacts right now. In the future, thanks to the Reference Types OCI Working Group, every OCI-compliant registry should be able to do the same. How does that work?

When the image is initially pushed to the registry, no other artifacts are attached to it. Each individual system that needs to perform a task on the image can run on its own schedule. Once it completes the task, it pushes a reference type artifact to the registry with the subject of the image in question. Every time the image is pulled from the registry, the policy evaluates if the required reference artifacts are available; if not, the image is not allowed for use. You can define different policies for different situations as long as the policy engine understands the artifact types. Not only that, but you can even make decisions on the content of the artifacts as long as the policy engine is intelligent enough to interpret those artifacts.

Using the declarative policy approach, the same image will be allowed for use by clients with different requirements. Extending this is as simple as implementing a new policy, which in most cases doesn’t require coding.
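To illustrate, here is a minimal policy-evaluation sketch in Python. The artifact types and the referrer list are made-up placeholders for whatever your registry stores and your policy engine (OPA, for example) would actually consume; the point is only that different consumers can apply different policies to the same image.

# Reference artifacts already attached to the image (artifact type -> content).
# The media types below are illustrative, not a standard.
referrers = {
    "application/vnd.example.vuln-report": {"critical": 0, "high": 2},
    "application/vnd.example.sbom": {"packages": 142},
}

# Two consumers, two policies for the same image.
policies = {
    "dev-cluster": {"required": ["application/vnd.example.vuln-report"]},
    "prod-cluster": {"required": [
        "application/vnd.example.vuln-report",
        "application/vnd.example.sbom",
        "application/vnd.example.signature",
    ]},
}

def allowed(policy, referrers):
    # The image may be used only if every required artifact type is attached.
    return all(artifact_type in referrers for artifact_type in policy["required"])

for consumer, policy in policies.items():
    print(consumer, "->", "allow" if allowed(policy, referrers) else "deny")
# dev-cluster -> allow; prod-cluster -> deny (no signature attached yet)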

Where Should the Policy Engine be Implemented?

Of course, the question that gets raised is where the policy engine should be implemented – as part of the registry or outside of it. I think registries are intended to store the information and not to make policy decisions. Think of a registry as yet another storage system – it has access control implemented, but the only business logic it holds is how to manage the data. Besides that, there are already many policy engines available – OPA is the one that immediately comes to mind – that are flexible enough to enable this functionality relatively easily. Policy engines are already available, and different systems are already integrated with them. Adding one more engine as part of the registry will just increase the overhead of managing policies.

Summary

To summarize, using a declarative, policy-based approach to control who should and shouldn’t be able to pull an artifact from the registry is more flexible and extensible. Adding capabilities to the policy engines to understand the artifact types and act on those will allow enterprises to develop custom controls tailored to their own needs. In the future, when policy engines can understand the content of each artifact, those policies will be able to evaluate SBOMs, vulnerability reports, and other content. This will open new opportunities to define fine-grained controls for the use of registry artifacts.

While working on a process for improving the container secure supply chain, I often need to go over the current challenges of patching container vulnerabilities. With the introduction of Automatic VM Patching, having those conversations is even more challenging because there is always the question: “Why can’t we patch containers the same way we patch VMs?” Really, why can’t we? First, let’s look at how VM and container workloads differ.

How do VM and Container Workloads Differ?

VM-based applications are considered legacy applications, and VMs fall under the category of Infrastructure-as-a-Service (IaaS) compute services. One of the main characteristics of IaaS compute services is the persistent local storage that can be used to save data on the VM. Typically the way you use the VMs for your application is as follows:

  • You choose a VM image from the cloud vendor’s catalog. The VM image specifies the OS and its version you want to run on the VM.
  • You create the VM from that image and specify the size of the VM. The size includes the vCPUs, memory, and persistent storage to be used by the VM.
  • You install the additional software you need for your application on the VM.

From this point onward, the VM workload state is saved to the persistent storage attached to the VM. Any changes to the OS (like patches) are also committed to the persistent storage, and next time the VM workload needs to be spun up, those are loaded from there. Here are the things to remember for VM-based workloads:

  • VM image is used only once when the VM workload is created.
  • Changes to the VM workload are saved to the persistent storage; the next time the VM is started, those changes are automatically loaded.
  • If a VM workload is moved to a different hardware, the changes will still be loaded from the persistent storage.

How do containers differ, though?

Whenever a new container workload is started, the container image is used to create the container (similar to the VM). If the container workload is stopped and started on the same VM or hardware, any changes to the container will also be automatically loaded. However, because orchestrators do not know whether the new workload will end up on the same VM or hardware (due to resource constraints, for example), they do not stop the containers but destroy them, and if a new one needs to be spun up, they use the container image again to create it.

That is a major distinction between VMs and containers. While the VM image is used only once when the VM workload is created, the container images are used repeatedly to re-create the container workload when moving from one place to another and increasing capacity. Thus, when a VM is patched, the patches will be saved to the VM’s persistent storage, while the container patches need to be available in the container image for the workloads to be always patched.

The bottom line is, unlike VMs, when you think of how to patch containers, you should target improvements in updating the container images.

A Timeline of a Container Image Patch

For this example, we will assume that we have an internal machine learning team that builds their application image using python:3.10-bullseye as a base image. We will concentrate on the timelines for fixing the OpenSSL vulnerabilities CVE-2022-0778 and CVE-2022-1292. The internal application team’s dependency chain is OpenSSL ← Debian ← Python. Those are all Open Source Software (OSS) projects driven by their respective communities. Here is the timeline of fixes for those vulnerabilities by the OSS community.

2022-03-08: python:3.10.2-bullseye Released

Python publishes python:3.10.2-bullseye container image. This is the last Python image before the CVE-2022-0778 OpenSSL vulnerability was fixed.

2022-03-15: OpenSSL CVE-2022-0778 Fixed

OpenSSL publishes fix for CVE-2022-0778 impacting versions 1.0.2 - 1.0.2zc, 1.1.1 - 1.1.1m, and 3.0.0 - 3.0.1.

2022-03-16: debian:bullseye-20220316 Released

Debian publishes  debian:bullseye-20220316 container image that includes a fix for CVE-2022-0778.

2022-03-18: python:3.10.3-bullseye Released

Python publishes  python:3.10.3-bullseye container image that includes a fix for CVE-2022-0778.

2022-05-03: OpenSSL CVE-2022-1292 Fixed

OpenSSL publishes fix for CVE-2022-1292 impacting versions 1.0.2 – 1.0.2zd, 1.1.1 – 1.1.1n, and 3.0.0 – 3.0.2.

2022-05-09: debian:bullseye-20220509 Released

Debian publishes debian:bullseye-20220509 container image that DOES NOT include a fix for CVE-2022-1292.

2022-05-27: debian:bullseye-20220527 Released

Debian publishes  debian:bullseye-20220527 container image that includes a fix for CVE-2022-1292.

2022-06-02: python:3.10.4-bullseye Released

Python publishes  python:3.10.4-bullseye container image that includes a fix for CVE-2022-1292.

There are a few important things to notice in this timeline:

  • CVE-2022-0778 was fixed in the whole chain within three days only.
  • In comparison, CVE-2022-1292 took 30 days to fix in the whole chain.
  • Also, in the case of CVE-2022-1292, Debian released a container image after the fix from OpenSSL was available, but that image DID NOT contain the fix.

The bottom line is:

  • Timelines for fixes by the OSS communities are unpredictable.
  • The latest releases of container images do not necessarily contain the latest software patches.

SLAs and the Typical Process for Fixing Container Vulnerabilities

The typical process teams use to fix vulnerabilities in container images is waiting for the fixes to appear in the upstream images. In our example, the machine learning team must wait for the fixes to appear in the python:3.10-bullseye image first, then rebuild their application image, test the new image, and re-deploy to their production workloads if the tests pass. Let’s call this process wait-rebuild-test-redeploy (or WRTR if you like acronyms 🙂).

The majority of enterprises have established SLAs for fixing vulnerabilities. For those that have not established such yet, things will soon change due to the Executive Order for Improving the Nation’s Cybersecurity. Many enterprises model their patching processes based on the FedRAMP 30/90/180 rules specified in the FedRAMP Continuous Monitoring Strategy Guide. According to the FedRAMP rules, high severity vulnerabilities must be remediated within 30 days. CISA’s Operational Directive for Reducing the Risk of Known Exploited Vulnerabilities has much more stringent timelines of two weeks for vulnerabilities published in CISA’s Known Exploited Vulnerabilities Catalog.

Let’s see how the timelines for patching the abovementioned OpenSSL vulnerabilities fit into those SLAs for the machine learning team using the typical process for patching containers.

CVE-2022-0778 was published on March 15th, 2022. It is a high severity vulnerability, and according to the FedRAMP guidelines, the machine learning team has till April 14th, 2022, to fix the vulnerability in their application image. Considering that the python:3.10.3-bullseye image was published on March 18th, 2022, the machine learning team has 27 days to rebuild, test, and redeploy the image. This sounds like a reasonable time for those activities. Luckily, CVE-2022-0778 is not in CISA’s catalog, but the team would still have 11 days for those activities if it were.

The picture with CVE-2022-1292 does not look so good, though. The vulnerability was published on May 3rd, 2022. It is a critical severity vulnerability, and according to the FedRAMP guidelines, the machine learning team has till June 2nd, 2022, to fix it. Unfortunately, the python:3.10.4-bullseye image was published on June 2nd, 2022. This means that the team needs to do the rebuild, testing, and redeployment on the same day the community published the image. Either the team needs to be very efficient with their processes, or they need to work around the clock that day to complete all the activities – and that is after hoping the community would publish a fix for the Python image before the SLA deadline at all. That is a very unrealistic expectation and also impacts the team’s morale. If, by any chance, the vulnerability had appeared in CISA’s catalog (which luckily it did not), the team would not have been able to fix it within the two-week SLA.
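The date arithmetic behind those numbers is simple enough to verify with a few lines of Python (the dates come from the timeline above, and the 30-day and 14-day windows are the FedRAMP and CISA SLAs discussed earlier):

from datetime import date, timedelta

# CVE-2022-1292: published May 3rd; fixed python:3.10.4-bullseye published June 2nd.
fedramp_deadline = date(2022, 5, 3) + timedelta(days=30)
print(fedramp_deadline)                            # 2022-06-02
print((fedramp_deadline - date(2022, 6, 2)).days)  # 0 days left -> same-day rebuild, test, redeploy

# CVE-2022-0778 for comparison: published March 15th; fixed python:3.10.3-bullseye on March 18th.
print((date(2022, 3, 15) + timedelta(days=30) - date(2022, 3, 18)).days)  # 27 days left (FedRAMP)
print((date(2022, 3, 15) + timedelta(days=14) - date(2022, 3, 18)).days)  # 11 days left (CISA)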

That proves that the wait-rebuild-test-redeploy (WRTR) process is ineffective in meeting the SLAs for fixing vulnerabilities in container images. But, what can you currently do to improve this and take control of the timelines?

Using Multi-Stage Builds to Fix Container Vulnerabilities

Until the container technology evolves and a more declarative way for patching container images is available, teams can use multi-stage builds to build their application images and fix the base image vulnerabilities. This is easily done in the CI/CD pipeline. This approach also allows teams to control the timelines for vulnerability fixes and meet their SLAs. Here is an example of how you can solve the patching issue from the example above:

FROM python:3.10.2-bullseye as baseimage

# Pull the latest Debian patches (including the OpenSSL fixes) into the base image
# instead of waiting for a new upstream Python image.
RUN apt-get update; \
    apt-get upgrade -y

# Create an unprivileged user to run the application.
# --disabled-password and --gecos "" keep adduser non-interactive during the build.
RUN adduser --disabled-password --gecos "" appuser

FROM baseimage

USER appuser

WORKDIR /app

CMD [ "python", "--version" ]

In the above Dockerfile, the first stage of the build updates the base image with the latest patches. The second stage builds the application and runs it with the appropriate user permissions. Using this approach, you avoid the wait part of the WRTR process above, and you can always meet your SLAs with a simple rebuild of the image.

Of course, this approach also has drawbacks. One of its biggest issues is the level of control teams have over what patches are applied. Another one is that some teams do not want to include layers in their images that do not belong to the application (i.e. modify the base image layers). Those all are topics for another post 🙂

Photo by Webstacks on Unsplash

In Part 1 of the series Signatures, Key Management, and Trust in Software Supply Chains, I wrote about the basic concepts of identities, signatures, and attestation. In this one, I will expand on the house-buying scenario that I hinted at in Part 1 and will describe a few ways to exploit it in the physical world. Then, I will map this scenario to the digital world and delve into a few possible exploits. Throughout this, I will also suggest a few possible mitigations in both the physical and the digital world. The whole process, as you may already know, is called threat modeling.

Exploiting Signatures Without Attestation in the Offline World

For the purpose of this scenario, we will assume that the parties involved are me and the title company. The document that needs to be signed is the deed (we can also call it the artifact). Here is a visual representation of the scenario:

Here is how the trust is established:

  • The title company has an inherent trust in the government.
  • This means that the title company will trust any government-issued identification like a driving license.
  • In my meeting with the title company, I present my driving license.
  • The title company verifies the driving license is legit and establishes trust in me.
  • Last, the title company trusts the signature that I use to sign the deed in front of them.
  • From here on, the title company trusts the deed to proceed with the transaction.

As we can see, establishing trust between the parties involves two important conditions – implicit trust in a central authority and verification of identity. However, this process is easily exploitable with fake IDs (like a fake driving license), as shown in the picture below.

In this case, an imposter can obtain a fake driving license and impersonate me in the transaction. If the title company can be fooled that the driving license is issued by the government, they can falsely establish trust in the imposter and allow him to sign the deed. From there on, the title company considers the deed trusted and continues with the transaction.

The problem here is with the verification step – the title company does not do a real-time verification if the driving license is legitimate. The verification step is done manually and offline by an employee of the title company and relies on her or his experience to recognize forged driving licenses. If this “gate” is passed, the signature on the deed becomes official and will not be verified anymore in the process.

There is one important step in this process that we didn’t mention yet. When the title company employee verifies the driving license, she or he also takes a photocopy of the driving license and attaches it to the documentation. This photocopy becomes part of the audit trail for the transaction if it is later discovered that the transaction needs to be reverted.

Exploiting Signatures Without Attestation in the Digital World

The above process is easily transferable to the digital world. In the following GitHub project I have an example of signing a simple text file artifact.txt. The example uses self-signed certificates for verifying the identity and the signature.

There are two folders in the repository. The real folder contains the files used to generate a key and X.509 certificate that is tied to my real identity and verified using my real domain name toddysm.com. The fake folder contains the files used to generate a key and X.509 certificate that is tied to an imposter identity that can be verified with a look-alike (or fake) domain. The look-alike domain uses homographs to replace certain characters in my domain name. If the imposter has ownership of the imposter domain, obtaining a trusted certificate with that domain name is easily achievable.
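To get a feeling for how convincing a homograph can be, here is a small Python example. The look-alike string below uses the Cyrillic letter “о” in place of the Latin “o”; it is illustrative only, and the actual characters used in the repository example may differ.

# Two visually identical domain names; the second contains the Cyrillic 'о' (U+043E).
real = "toddysm.com"
fake = "t\u043eddysm.com"

print(real, fake)    # render identically in most fonts
print(real == fake)  # False -- the underlying code points differ

print(hex(ord(real[1])), hex(ord(fake[1])))  # 0x6f vs 0x43e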

The dilemma you are presented with is, which certificate to trust – the one here or the one here. When you verify both certificates using the following commands:

openssl x509 -nameopt lname,utf8 -in [cert-file].crt -text -noout | grep Subject:
openssl x509 -nameopt lname,utf8 -in [cert-file].crt -text -noout | grep Issuer:

they both return visually indistinguishable information:

Subject: countryName=US, stateOrProvinceName=WA, localityName=Seattle, organizationName=Toddy Mladenov, commonName=toddysm.com, emailAddress=me@toddysm.com
Issuer: countryName=US, stateOrProvinceName=WA, localityName=Seattle, organizationName=Toddy Mladenov, commonName=toddysm.com, emailAddress=me@toddysm.com

It is the same as looking at two identical driving licenses, a legitimate one and a forged one, that have no visible differences.

The barrier for this exploit using PGP keys and SSH keys is even lower. While X.509 certificates need to be issued by a trusted certificate authority (CA), PGP and SSH keys can be issued by anybody. Here is a corresponding example of a valid PGP key and an imposter PGP key. Once again, which one would you trust?

Though, compromising CAs is not something that we can ignore. There are numerous examples where forged certificates issued by legitimate CAs have been used in the wild.

Let’s also not forget that the Stuxnet malware was signed with compromised JMicron and Realtek private keys. In the case of a compromised CA, malicious actors don’t even need to use homographs to deceive the public – they can issue the certificate with the real name and domain.

Unlike the physical world, though, the digital one misses the very important step of collecting audit information when the signature is verified. I will come back to that in the next post of the series, where I plan to explore the various controls that can be put in place to increase security.

Based on the above, though, it is obvious that trust, whether in a single entity or a central certificate authority (CA), has greatly diminished in recent years.

Oh, and don’t trust the keys that I published on GitHub! 🙂 Anybody can copy them or generate new ones with my information – unfortunately obtaining that information is quite easy nowadays.

Exploiting Signatures With Attestation in the Offline World

Let’s look at the example I introduced in the previous post where more parties are involved in the process of selling my house. Here is the whole scenario!

Because I am unable to attend the signing of the documents, I need to issue a power of attorney for somebody to represent me. This person will be able to sign the documents on my behalf. First and foremost, I need to trust that person. But my trust in this person doesn’t automatically transfer to the title company that will handle the transaction. For the title company to trust my representative, the power of attorney needs to be attested by a certified notary. Only then will the title company trust the power of attorney document and accept the signature of my representative.

Here is the question: “How does the introduction of the notary increase security?” Note that I used the term “increase security”. While there is no 100% guarantee that this process will not fail…

By adding one more step to the process, we introduce an additional obstacle that reduces the probability for malicious activity to happen, which increases the security.

What the notary can prevent is a scenario where my “representative” forces me to sign the power of attorney. In that scenario, my security is compromised, and my evil representative can use the power of attorney to sell my house to himself for just a dollar. The purpose of the notary is to attest that I willfully signed the document and was present (and in good health) during the signing. Of course, this can easily be exploited if both the representative and the notary are evil, as shown in the below diagram.

As you can see in this scenario, all parties have valid government-issued IDs that the title company trusts. However, the process is compromised if there is collusion between the malicious actor (evil representative) and the notary.

Other ways to exploit this process involve impersonating the notary, my representative, or both. The impersonation is described in the section above – Exploiting Signatures Without Attestation in the Offline World.

Exploiting Signatures With Attestation in the Digital World

There has been a lot of talk recently about implementing attestation systems that save signature receipts in an immutable ledger. This is presented as the silver bullet solution for signing software artifacts (check out the Sigstore project). Similar to the notary example in the previous section, this approach may increase security, but it may also have a negative impact. Because they compare themselves to Let’s Encrypt, let me take a stab at how Let’s Encrypt impacted security on the Web.

Before Let’s Encrypt, only owners who wanted to invest money in valid certificates had HTTPS enabled on their websites. More importantly, though, browsers showed a clear indicator when a site was using the plain HTTP protocol and not the secure one. From a user’s point of view, it was easy to decide that if the browser address bar was red, you should not enter your username, password, or credit card details. Recognizing malicious sites was relatively easy because malicious actors didn’t want to spend the money and time to get a valid certificate.

Let’s Encrypt (and the browser vendors) changed that paradigm. Being free, Let’s Encrypt allows anybody to obtain a valid (and “trusted”??? 🤔) certificate and enable HTTPS for their site. Not only that, but Let’s Encrypt made it so easy that you can get the certificate issued and deployed to your web server using automation within seconds. The only proof you need to provide is the ownership of the domain name for your server. At the same time, Google led the campaign to change the browser indicators to a very mediocre lock icon in the address bar that hardly anybody pays attention to anymore. As a result, every malicious website now has HTTPS enabled, and there is no indication in the browser to tell you that it is malicious. In essence, the lock gives you a false sense of security.

I would argue that Let’s Encrypt (and the browser vendors) in fact decreased the security on the web instead of increasing it. Let me be clear! While I think Let’s Encrypt (and the browser vendors) decreased the security, what they provide had a tremendous impact on privacy. Privacy should not be discounted! Though in marketing messages those two terms are used interchangeably and this is not for the benefit of the users.

In the digital world, the CA can play the role of the notary in the physical world. The CA verifies the identity of the entity that wants to sign artifacts and issues a “trusted” certificate. Similar to a physical world notary, the CA will issue a certificate for both legit as well as malicious actors, and unlike the physical world, the CA has very basic means to verify identities. In the case of Let’s Encrypt this is the domain ownership. In the case of Sigstore that will be a GitHub account. Everyone can easily buy a domain or register a GitHub account and get a valid certificate. This doesn’t mean though that you should trust it.

Summary

The takeaway from this post for you should be that every system can be exploited. We learn and create systems that reduce the opportunities for exploitation, but that doesn’t make them bulletproof. Also, when evaluating technologies, we should not only look at the shortcomings of the previous technology but also at the shortcomings of the shiny new one. Just adding attestation to signatures will not be enough to make signatures more secure.

In the next post, I will look at some techniques that we can employ to make signatures and attestations more secure.

Photo by Erik Mclean on Unsplash

For the past few months, I’ve been working on a project for a secure software supply chain, and one topic that seems to always start passionate discussions is the software signatures. The President’s Executive Order on Improving the Nation’s Cybersecurity (EO) is a pivotal point for the industry. One of the requirements is for vendors to document the supply chain for software artifacts. Proving the provenance of a piece of software is a crucial part of the software supply chain, and signatures play a main role in the process. Though, there are conflicting views on how signatures should work. There is the traditional PKI (Public Key Infrastructure) approach that is well established in the enterprises, but there are other traditional and emerging technologies that are brought up in discussions. These include PGP key signatures, SSH key signatures, and the emerging ephemeral key (or keyless) signatures (here, here, and lately here).

While PKI is well established, the PKI shortcomings were outlined by Bruce Schneier and Carl Ellison more than 20 years ago in their paper. The new approaches are trying to overcome those shortcomings and democratize signatures the same way Let’s Encrypt democratized HTTPS for websites. Though, the question is whether those new technologies improve security over PKI, and if so, how? In a series of posts, I will lay out my view of the problem and the pros and cons of using one or another signing approach, how the trust is established, and how to manage the signing keys. I will start with the basics using simple examples that relate to everyday life and map those to the world of digital signatures.

In this post, I will go over the identity, signature, and attestation concepts and explain why those matter when establishing trust.

What is Identity?

Think about your own experience. Your identity is you! You are identified by your gender, skin color, facial and body characteristics, thumbprint, iris print, hair color, DNA, etc. Unless you have an identical twin, you are unique in the world. Even if you have an identical twin, there are differences like thumbprints and iris prints that make you unique. The same is true for other entities like enterprises, organizations, etc. Organizations have names, tax numbers, government registrations, addresses, etc. As a general rule, changing your identity is hard, if not impossible. You can have plastic surgery, but you cannot change your DNA. The story may be a bit different for organizations that can rename themselves, get bought or sold, change headquarters, etc., but it is still pretty easy to uniquely identify organizations.

All the above points that identities are:

  • unique
  • and impossible (or very hard) to change

In the digital world, identities are an abstract concept. In my opinion, it is wrong to think that identities can be changed in both the physical and the digital world. Although we tend to think that they can be changed, this is not true – what can be changed is the way we prove our identity. We will cover that shortly but before that, let’s talk about trust.

If you are a good friend of mine, you may be willing to trust me, but if you just met me, your level of trust will be pretty low. Trust is established based on historical evidence. The longer you know me, and the longer I behave honestly, the more you will be willing to trust me. Sometimes I may not be completely honest, or I may borrow some money from you and not return it. But I may buy you a beer every time we go out to offset that cost, and you may be willing to forgive me. It is important to note that trust is very subjective, and while you may be very forgiving, another friend of mine may not be. He may decide that I am not worth his trust and never lend me money again.

How do We Prove Our Identity?

In the physical world, we prove our identity using papers like a driving license, a passport, an ID card, etc. Each one of those documents is issued for a purpose:

  • The driving license is mainly used to prove you can drive a motorized vehicle on the US streets. Unless it is an enhanced driving license, you (soon) will not be able to use it to board a domestic flight. However, you cannot cross borders with your driving license and you cannot use it to even rent a car in Europe (unless you have an international driving license).
  • To cross borders you need a passport. The passport is the only document that is recognized by border authorities in other countries that you visit. You cannot use your US driving license to cross the borders in Europe. The interesting part is that you do not need a driving license to get a passport or vice versa.
  • You also have your work badge. Your work badge identifies you as an employee of a particular organization. Despite the fact that you have a driving license and a passport, you cannot enter the buildings without your badge. However, to prove to your employer that you are who you are for them to issue you the badge, you must have a driving license or a passport.

In the digital world, there are similar concepts to prove our identity.

  • You can use a username, password and another factor (2FA/MFA token) to prove your identity to a particular system.
  • App secret that you can generate in a system can also be used to prove your identity.
  • OAuth or SSO (single sign-on) token issued by a third party is another way to prove your identity to a particular system. That system though needs to trust the third party.
  • SSH key can be an alternate way to prove your identity. You can use it in conjunction with username/password combination or separately.
  • You can use PGP key to prove your identity to an email recipient.
  • Or use a TLS certificate to prove the identity of your website.
  • And finally, you can use an X.509 certificate to prove your identity.

As you can see, similar to the physical world, in the digital world you have multiple ways to prove your identity to a system. You can use more than one way for a single system. The example that comes to mind is GitHub – you can use an app secret or an SSH key to push your changes to your repository.

How Does Trust Tie to the Concepts Above?

Let’s say that I am a good developer. My code published on GitHub has a low level of bugs, it is well structured, well documented, easy to use, and updated regularly. You decide that you can trust my GitHub account. However, I also have a DockerHub account that I am negligent with – I don’t update the containers regularly, they have a lot of vulnerabilities, and they are sloppily built. Although you are my friend and you trust my GitHub account, you are not willing to trust my DockerHub account. This example shows that trust is not only subjective but also based on context.

OK, What Are Signatures?

Here is where things become interesting! In the physical world, a signature is a person’s name written in that person’s handwriting. Just the signature does not prove my identity. Wikipedia’s entry for signature defines the traditional function of a signature as follows:

…to permanently affix to a document a person’s uniquely personal, undeniable self-identification as physical evidence of that person’s personal witness and certification of the content of all, or a specified part, of the document.

The keyword above is self-identification. This word in the definition has a lot of implications:

  • First, as a signer, I can have multiple signatures that I would like to use for different purposes. I.e. my identity may use different signatures for different purposes.
  • Second, nobody attests to my signature. This means that the trust is put in a single entity – the signer.
  • Third, a malicious person can impersonate me and use my signature for nefarious purposes.

Interestingly though, we are willing to accept the signature as proof of identity depending on the level of trust we have in the signer. For example, if I borrow $50 from you and give you a receipt with my signature stating that I will pay you back in 30 days, you may be willing to accept it even if you don’t know me well (i.e. your level of trust is relatively low). This is understandable because we decide to lower our level of trust to just self-identification. I can increase your level of trust if I show you my driving license that has my signature printed on it so you can compare both signatures. However, showing you my driver’s license is actually an attestation, which is covered in detail below.

In the digital world, to create a signature, you need a private key and to verify a signature, you need a public key (check the Digital Signature article on Wikipedia). The private and the public key are related and work in tandem – the private key signs the content and the public key verifies the signature. You own both but keep the private secret and publish the public to everybody to use. From the examples I have above, you can use PGP, SSH, and X.509 to sign content. However, they have differences:

  • PGP is a self-generated key pair with additional details, like name and email address, included in the public certificate, which can be used for (pseudo)identification of the entity that signs the content. You can think of it as similar to a physical signature, where, in addition to the signature, you verbally provide your name and home address as part of the signing process.
  • SSH is also a self-generated key pair but has no additional information attached. Think of it as the plain physical signature.
  • With X.509 you have a few options:
    • Self-generated key-pair similar to the PGP approach but you can provide more self-identifying information. When signing with such a private key you can assume that it is similar to the physical signature, where you verbally provide your name, address, and date of birth.
    • Domain Validated (DV) certificate that validates your ownership of a particular domain (this is exactly what Let’s Encrypt does). Think of this as similar to a physical signature where you verbally provide your name, address, and date of birth as well as show a utility bill with your name and address as part of the signing process.
    • Extended Validation (EV) certificate that validates your identity using legal documents. For example, this can be your passport as an individual or your state and tax registrations as an organization.
      Both DV and EV X.509 certificates are issued by Certificate Authorities (CAs), which are trusted authorities on the Internet or within the organization.

Note: X.509 is actually an ITU standard defining the format of public-key certificates and is at the basis of the PKI. The key pair can be generated using different algorithms. Though, the term X.509 is used (maybe incorrectly) as a synonym for the key-pair also.
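To make the sign-with-the-private-key / verify-with-the-public-key relationship concrete, here is a minimal sketch using the Python cryptography package and an Ed25519 key pair. The key type is my choice for brevity; conceptually, the same flow applies to PGP, SSH, and X.509-based signing.

# Minimal sign/verify example: the private key signs, the public key verifies.
# Requires the 'cryptography' package (pip install cryptography).
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()  # keep this secret
public_key = private_key.public_key()       # publish this for verification

artifact = b"content of artifact.txt"
signature = private_key.sign(artifact)

# Verification succeeds for the untouched artifact...
public_key.verify(signature, artifact)

# ...and fails if the content has been tampered with.
try:
    public_key.verify(signature, b"tampered content")
except InvalidSignature:
    print("signature does not match the content")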

Without any other variables in the mix, the level of trust that you may put on the above digital approaches would most probably be the following: (1-Lowest) SSH, (2) PGP and self-signed X.509, (3) DV X.509, and (4-Highest) EV X.509. Keep in mind that DV and EV X.509 are actually based on attestation, which is described next.

So, What is Attestation?

We finally came to it! Attestation, according to the Merriam-Webster dictionary, is an official verification of something as true or authentic. In the physical world, one can increase the level of trust in a signature by having a notary attest to the signature (lower level of trust) or by adding a government apostille (higher level of trust, used internationally). In many states, notaries are required (or highly encouraged) to keep a log for tracking purposes. While you may be OK with having only my signature on a paper for a $50 loan, you certainly would want to have a notary attesting to a contract for selling your house to me for $500K. The level of trust in a signature increases when you add additional parties who attest to the signing process.

In the digital world, attestation is also present. As we’ve mentioned above, CAs act as the digital notaries who verify the identity of the signer and issue digital certificates. This is done for the DV and EV X.509 certificates only though. There is no attestation for PGP, SSH, and self-signed X.509 certificates. For digital signatures, there is one more traditional method of attestation – the Timestamp Authority (TSA). The TSA’s role is to provide an accurate timestamp of the signing to avoid tampering with the time by changing the clock on the computer where the signing occurs. Note that the TSA attests only for the accuracy of the timestamp of signing and not for the identity of the signer. One important thing to remember here is that without attestation you cannot fully trust the signature.

Here is a summary of the signing approaches and the level of trust we discussed so far.

Signing Keys and Trust

Signing Approach       Level of Trust
SSH Key                1 - Lowest
PGP Key                2 - Low
X.509 Self-Signed      2 - Low
X.509 DV               3 - Medium
X.509 EV               4 - High

Now, that we’ve established the basics let’s talk about the validity period and why it matters.

Validity Period and Why it Matters?

Every identification document that you own in the physical world has an expiration date. OK, I lied! I have a German driving license that doesn’t have an expiration date. But this is an exception, and I can claim that I am one of the last who had that privilege – newer driving licenses in Germany have an expiration date. US driving licenses have an expiration date and an issue date. You need to renew your US passport periodically (every ten years for adults). Different factors determine why an identification document may expire. For a driving license, the reason may be that you lost some of your vision and are not capable of driving anymore. For a passport, it may be because you moved to another country, became a citizen, and forfeited your right to be a US citizen.

Now, let’s look at physical signatures. Let’s say that I want to issue a power of attorney to you to represent me in the sale of my house while I am on a business trip for four weeks in Europe. I have two options:

  • Write you a power of attorney without an expiration date and have a notary attest to it (else nobody will believe you that you can represent me).
  • Write you a power of attorney that expires four weeks from today and have a notary attest to it.

Which one do you think is more “secure” for me? Of course the second one! The second power of attorney will give you only a limited period to sell my house. While this does not prevent you from selling it in a completely different transaction than the one I want, you are still given some time constraints. The counterparts in the transaction will check the power of attorney and note the expiration date. If there is a final meeting four weeks and a day from now that requires you to sign the final papers for the transaction, they should not allow you to do so because the power of attorney is no longer valid.

Now, here is an interesting situation that often gets overlooked. Let’s say that I sign the power of attorney on Jan 1st, 2022. The power of attorney is valid till the end of day Jan 28th, 2022. I use my driving license to identify myself to the notary. My driving license has an expiration date of Jan 21st, 2022. Also, the notary’s license expires on Jan 24th, 2022. What is the last date that the power of attorney is valid? I will leave this exploration for one of the subsequent posts.

Time constraints are a basic measure to increase my security and prevent you from selling my house and pocketing the money later in the year. I will expand on this example in my next post, where I will look at different ways to exploit signatures. But the basic lesson here is: the more time you have to exploit something, the higher the probability that you will do so. Also, another lesson is: put an expiration date on all of your powers of attorney!

How does this look in the digital world?

  • SSH keys do not have expiration dates. Unless you provide the expiration date in the signature itself, the signature will be valid forever.
  • PGP keys have expiration dates a few years in the future. I just created a new key and it is set to expire on Jan 8th, 2026. If I sign an artifact with it and don’t provide an expiration date for the signature, it will be considered valid until Jan 8th, 2026.
  • X.509 certificates also have long expiration dates – 3, 12, or 24 months. Let’s Encrypt certificates have 3-month expiration dates. Root CA certificates have even longer expiration dates, which can be dangerous, as we will explore in the future. Let’s Encrypt was the first to reduce the validity period of their certificates – to limit the window of exposure if a certificate is compromised and because domains change hands quite often. Enterprises followed suit because the number of stolen enterprise certificates is growing. (A quick way to check a certificate’s validity window is sketched below.)
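For the X.509 case, here is a quick way to inspect a certificate’s validity window with the Python cryptography package; the file name is just a placeholder for whichever certificate you want to check.

# Print the validity window of a PEM-encoded X.509 certificate.
# Requires the 'cryptography' package; 'cert-file.crt' is a placeholder path.
from cryptography import x509

with open("cert-file.crt", "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read())

print("not before:", cert.not_valid_before)
print("not after: ", cert.not_valid_after)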

Note: In the next post, I will expand a little bit more into the relationships between keys and signatures but for now, you can use them as the example above where I mention the various validity periods for documents used for the power of attorney.

Summary

If nothing else, here are the main takeaways that you should remember from this post:

  • Signatures cannot infer identities. Signatures can be forged even in the digital world.
  • One identity can have many signatures. Those signatures can be used for different purposes.
  • For a period of time, a signature can infer identity if it is attested to. However, the longer time passes, the lower the trust in this signature should be. Also, the period of time is subjective and dependent on the risk level of the signature consumer.
  • To increase security, signatures must expire. The shorter the expiration period, the higher the security (but also other constraints should be put in place).
  • Before trusting a signature, you should verify if the signed asset is still trustable. This is in line with the zero-trust principle for security: “Never trust, always verify!”.

Take a note that in the last bullet point, I intentionally use the term “asset is trustable” and not “signature is valid”. In the next post, I will go into more detail about what that means, how signatures can be exploited, and how context can provide value.

Featured image by StockSnap.