Embed the Origin Dockerfiles into Your Docker Container Images Using Labels
Since the recent Solorigate incident, a lot of emphasis has been put on determining the origin of the software running in an enterprise. For Docker container images, this means embedding in the image the Dockerfile it was built from. However, tracking down the origin of software is not trivial. For closed-source software, we blindly trust the vendors, and if we are lucky, we may get a signed piece of code. For open-source software, we rarely check the SHA checksum and never even think of verifying what source code the binary was produced from. In talks with customers, I quite often hear them ask how they can verify what sources a container image is built from. They want to attribute each image with metadata that links to the Dockerfile used to build the image, as well as the Git commit and the developer who triggered the build.
There are many articles that discuss this problem. Here are two recent examples. Richard Lander from the Microsoft .NET team writes in his blog post Staying safe with .NET containers about the pedigree and provenance of the software we run and how to think about it. Josh Hendrick, in his post Embedding source code version information in Docker images, offers one solution to the problem.
Josh Hendrick’s proposal is in the direction I would go, but one problem I have with it is that the application running in the container needs special handling to obtain this information. I would prefer to have this information readily available without having to run the container image. The Docker image format and the Open Container Initiative (OCI) already specify ways to do that without adding special files to your image. In this post, I will outline another way to embed this information into your images and retrieve it easily, without any changes to your application.
Using Docker Image Labels
The Docker image spec already has built-in functionality for adding labels to an image. Labels are intended to be set at build time. They also show up when inspecting the image using docker image inspect, which makes them the right choice for recording the Dockerfile and the other build origin details. One more argument in their favor is that labels are part of the image itself (they are recorded in the image configuration, from which the image ID is computed), and thus immutable. If you change a label in an image, the resulting image SHA will change.
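To illustrate why changing a label changes the image SHA, here is a minimal Python sketch of content addressing. It is a simplified model, not Docker's implementation: the toy configuration dictionaries and the config_digest helper are my own assumptions, and a real image configuration carries many more fields. The point is only that a digest computed over data that includes the labels cannot stay the same when a label changes.

```python
import hashlib
import json


def config_digest(config: dict) -> str:
    """Return a sha256 digest over a stable JSON encoding of `config`."""
    # Sorting keys and fixing separators gives a stable byte sequence for this
    # sketch; a real builder hashes the exact bytes of the config it wrote.
    encoded = json.dumps(config, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(encoded).hexdigest()


# Toy configurations (hypothetical): they differ only in one label value.
original = {"config": {"Labels": {"build.sha": "12345"}}}
tampered = {"config": {"Labels": {"build.sha": "99999"}}}

print(config_digest(original))
print(config_digest(tampered))
print("digests differ:", config_digest(original) != config_digest(tampered))
```

The two printed digests differ even though only one label value was edited, which is the property that makes labels a reasonable place for origin information.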
To demonstrate how labels can be used to embed the Dockerfile and other origin information into the Docker image, I have published a dynamic labels sample on GitHub. The sample uses a base Python image and implements a simple functionality to print the container’s environment variables. Let’s walk through it step by step.
The Dockerfile is quite simple.
    FROM python:slim
    ARG IMAGE_COMMITTER
    ARG IMAGE_DOCKERFILE
    ARG IMAGE_COMMIT_SHA
    LABEL "build.user"=${IMAGE_COMMITTER}
    LABEL "build.sha"=${IMAGE_COMMIT_SHA}
    LABEL "build.dockerfile"=${IMAGE_DOCKERFILE}
    ADD ./samples/dynamic-labels/source /
    CMD ["python", "/show_environment.py"]
Lines 2-4 define the build arguments that need to be set during the build of the image. Lines 5-7 set the three labels build.user, build.sha, and build.dockerfile that we want to embed in the image. build.dockerfile is the URL to the Dockerfile in the GitHub repository, while build.sha is the Git commit that triggers the build. If you build the image locally with some dummy build arguments, you will see that a new layer is created for each of lines 5-7.
    toddysm@MacBook-Pro ~ % docker build -t test --build-arg IMAGE_COMMITTER=toddysm --build-arg IMAGE_DOCKERFILE=https://test.com --build-arg IMAGE_COMMIT_SHA=12345 -f ./samples/dynamic-labels/Dockerfile .
    Sending build context to Docker daemon 376.3kB
    Step 1/9 : FROM python:slim
     ---> 8c84baace4b3
    Step 2/9 : ARG IMAGE_COMMITTER
     ---> Running in 71ad05f20d20
    Removing intermediate container 71ad05f20d20
     ---> fe56c62b9903
    Step 3/9 : ARG IMAGE_DOCKERFILE
     ---> Running in fe468c44e9fc
    Removing intermediate container fe468c44e9fc
     ---> b776dca57bd7
    Step 4/9 : ARG IMAGE_COMMIT_SHA
     ---> Running in 849a82225c31
    Removing intermediate container 849a82225c31
     ---> 3a4c6c23a699
    Step 5/9 : LABEL "build.user"=${IMAGE_COMMITTER}
     ---> Running in fd4bfb8d5b5b
    Removing intermediate container fd4bfb8d5b5b
     ---> 2e9be17c48ff
    Step 6/9 : LABEL "build.sha"=${IMAGE_COMMIT_SHA}
     ---> Running in 892323d73495
    Removing intermediate container 892323d73495
     ---> b7bc6559629d
    Step 7/9 : LABEL "build.dockerfile"=${IMAGE_DOCKERFILE}
     ---> Running in 98687b8dd9fb
    Removing intermediate container 98687b8dd9fb
     ---> 35e97d273cbc
    Step 8/9 : ADD ./samples/dynamic-labels/source /
     ---> 9e71859892b1
    Step 9/9 : CMD ["python", "/show_environment.py"]
     ---> Running in 366b1b6c3bea
    Removing intermediate container 366b1b6c3bea
     ---> e7cb39a21c2a
    Successfully built e7cb39a21c2a
    Successfully tagged test:latest
You can inspect the image and see the labels by issuing the command docker image inspect --format='{{json .Config.Labels}}' <imagename>.
    toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' test | jq
    {
      "build.dockerfile":"https://test.com",
      "build.sha":"12345",
      "build.user":"toddysm"
    }
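If you want to read the labels from a script instead of the terminal, the same command can be wrapped in a few lines of Python. This is only a sketch under the assumption that the docker CLI is installed and on the PATH; the helper names parse_labels and read_image_labels are mine and are not part of the sample.

```python
import json
import subprocess


def parse_labels(raw: str) -> dict:
    """Parse the output of `docker image inspect --format '{{json .Config.Labels}}'`."""
    labels = json.loads(raw)
    # Images without any labels print the JSON literal `null`.
    return labels or {}


def read_image_labels(image: str) -> dict:
    """Ask the local Docker CLI for the labels of `image` (requires docker on PATH)."""
    result = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{json .Config.Labels}}", image],
        check=True,           # raise if docker reports an error (e.g. unknown image)
        capture_output=True,
        text=True,
    )
    return parse_labels(result.stdout)


if __name__ == "__main__":
    # Parsing a captured output string works without Docker being available.
    sample = '{"build.dockerfile":"https://test.com","build.sha":"12345","build.user":"toddysm"}'
    print(parse_labels(sample)["build.sha"])
```

Calling read_image_labels("test") against the image built above should return the three labels as a dictionary, while an image without labels yields an empty dictionary.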
Now, let’s automate the process with the help of GitHub Actions. I have created one GitHub Action to build and push the image to DockerHub and another to build and push to Azure Container Registry (ACR). Both actions are similar in the steps they use. The first two steps are the same for both actions. They will build the URL to the Dockerfile using the corresponding GitHub Actions variables:
    - name: 'Set environment variable for Dockerfile URL for push'
      if: ${{ github.event_name == 'push' }}
      run: echo "DOCKERFILE_URL=${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/blob/${GITHUB_REF#refs/*/}/samples/dynamic-labels/Dockerfile" >> $GITHUB_ENV
    - name: 'Set environment variable for Dockerfile URL for pull request'
      if: ${{ github.event_name == 'pull_request' }}
      run: echo "DOCKERFILE_URL=${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/blob/${GITHUB_BASE_REF#refs/*/}/samples/dynamic-labels/Dockerfile" >> $GITHUB_ENV
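The shell expansion ${GITHUB_REF#refs/*/} in these steps removes the shortest leading refs/<kind>/ prefix, so refs/heads/development becomes development, while a value without that prefix is left untouched. The following Python sketch mirrors that logic to make the URL construction easier to follow; the function names strip_ref_prefix and dockerfile_url are my own and do not appear in the workflows.

```python
import re


def strip_ref_prefix(ref: str) -> str:
    """Mimic the shell expansion ${REF#refs/*/}: drop the shortest leading 'refs/<kind>/'."""
    # [^/]* stops at the first slash, which corresponds to the shortest match
    # that the single '#' operator performs in the shell.
    return re.sub(r"^refs/[^/]*/", "", ref, count=1)


def dockerfile_url(server_url: str, repository: str, ref: str, path: str) -> str:
    """Build the browsable URL of the Dockerfile for the given ref."""
    return f"{server_url}/{repository}/blob/{strip_ref_prefix(ref)}/{path}"


if __name__ == "__main__":
    print(
        dockerfile_url(
            "https://github.com",
            "CrimsonPinnacle/container-image-inspector",
            "refs/heads/development",
            "samples/dynamic-labels/Dockerfile",
        )
    )
```

For a branch such as refs/heads/feature/x only the refs/heads/ part is removed, so the branch name feature/x is kept intact in the URL.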
Then, there are specific steps to sign in to DockerHub or Azure. After that come the build steps, which are where the labels are set. Here, for example, is the build step that uses buildx and automatically pushes the image to DockerHub:
    - name: Build and push
      id: docker_build
      uses: docker/build-push-action@v2
      with:
        context: ./
        file: ./samples/dynamic-labels/Dockerfile
        push: true
        tags: ${{ secrets.DOCKER_HUB_REPONAME }}:build-${{ github.run_number }}
        build-args: |
          IMAGE_COMMITTER=${{ github.actor }}
          IMAGE_DOCKERFILE=${{ env.DOCKERFILE_URL }}
          IMAGE_COMMIT_SHA=${{ github.sha }}
The build step for building the image and pushing to Azure Container Registry uses the traditional docker build approach:
    - name: Build and push
      id: docker_build
      uses: azure/docker-login@v1
      with:
        login-server: ${{ secrets.ACR_REGISTRY_LOGIN_SERVER }}
        username: ${{ secrets.ACR_REGISTRY_USERNAME }}
        password: ${{ secrets.ACR_REGISTRY_PASSWORD }}
    - run: |
        docker build -f ./samples/dynamic-labels/Dockerfile -t ${{ secrets.ACR_REGISTRY_LOGIN_SERVER }}/${{ secrets.ACR_REPOSITORY_NAME }}:build-${{ github.run_number }} --build-arg IMAGE_COMMITTER=${{ github.actor }} --build-arg IMAGE_DOCKERFILE=${{ env.DOCKERFILE_URL }} --build-arg IMAGE_COMMIT_SHA=${{ github.sha }} .
        docker push ${{ secrets.ACR_REGISTRY_LOGIN_SERVER }}/${{ secrets.ACR_REPOSITORY_NAME }}:build-${{ github.run_number }}
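If you drive the build from your own script rather than from a workflow file, the same three build arguments can be assembled programmatically. Below is a minimal Python sketch, not part of the sample; the function docker_build_command and its parameters are hypothetical. Building the command as a list of arguments, rather than one long string, avoids shell quoting problems when a value contains spaces or special characters.

```python
from typing import List


def docker_build_command(
    dockerfile: str,
    tag: str,
    committer: str,
    dockerfile_url: str,
    commit_sha: str,
    context: str = ".",
) -> List[str]:
    """Assemble a `docker build` argument list that sets the three origin build args."""
    return [
        "docker", "build",
        "-f", dockerfile,
        "-t", tag,
        "--build-arg", f"IMAGE_COMMITTER={committer}",
        "--build-arg", f"IMAGE_DOCKERFILE={dockerfile_url}",
        "--build-arg", f"IMAGE_COMMIT_SHA={commit_sha}",
        context,
    ]


if __name__ == "__main__":
    command = docker_build_command(
        "./samples/dynamic-labels/Dockerfile",
        "test",
        "toddysm",
        "https://test.com",
        "12345",
    )
    # The list can be handed to subprocess.run(command, check=True) to run the build.
    print(" ".join(command))
```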
After the actions complete, the images are available in DockerHub and Azure Container Registry. Here is what the image looks like in DockerHub:
If you scroll down a little, you will see the labels that appear in the list of layers:
The URL points you to the Dockerfile that was used to create the image, while the commit SHA identifies the latest changes made to the project from which the image was built. If you pull the image locally, you can also see the labels using the command:
    toddysm@MacBook-Pro ~ % docker pull toddysm/tmstests:build-36
    build-36: Pulling from toddysm/tmstests
    45b42c59be33: Already exists
    8cd3485318db: Already exists
    2f564129f025: Pull complete
    cf1573f5a21e: Pull complete
    ceec8aed2dab: Pull complete
    78b1088f77a0: Pull complete
    Digest: sha256:7862c2a31970916fd50d3ab38de0dad74a180374d41625f014341c90c4b55758
    Status: Downloaded newer image for toddysm/tmstests:build-36
    docker.io/toddysm/tmstests:build-36
    toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' toddysm/tmstests:build-36
    {
      "build.dockerfile":"https://github.com/CrimsonPinnacle/container-image-inspector/blob/development/samples/dynamic-labels/Dockerfile",
      "build.sha":"e80e6ef86f86a11d6a73aea8d8c41700c4d3d7c5",
      "build.user":"toddysm"
    }
To summarize, the benefit of using labels to embed the Dockerfile and other origin information into container images is that labels are an immutable part of the image. They cannot be changed without changing the image.
Who is Using Docker Image Labels?
Unfortunately, labels are not widely used, if at all. Checking several popular images from DockerHub yields the following results:
    toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' busybox | jq
    null
    toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' alpine | jq
    null
    toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' ubuntu | jq
    null
Tracking down the sources from which the Alpine image is built would require considerably more effort.
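A check like the one above is easy to automate, for example as a gate before an image is admitted to an internal registry. The Python sketch below reports which of the three origin labels used in this post are absent from a label dictionary; the function name missing_origin_labels and the decision to treat empty values as missing are my own assumptions.

```python
from typing import Dict, List, Optional

# The label keys used throughout this post.
REQUIRED_LABELS = ("build.user", "build.sha", "build.dockerfile")


def missing_origin_labels(labels: Optional[Dict[str, str]]) -> List[str]:
    """Return the required origin labels that are absent or empty in `labels`."""
    # `docker image inspect` prints null for images without labels, which
    # becomes None after JSON parsing, so treat None like an empty dictionary.
    labels = labels or {}
    return [key for key in REQUIRED_LABELS if not labels.get(key)]


if __name__ == "__main__":
    # An image without labels, such as the ones inspected above.
    print(missing_origin_labels(None))
    # An image built with the Dockerfile from this post.
    print(missing_origin_labels({
        "build.user": "toddysm",
        "build.sha": "12345",
        "build.dockerfile": "https://test.com",
    }))
```

An empty result means all three labels are present; anything else lists what the image is lacking.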
What is Next for Checking Docker Image Origins?
There are a couple of community initiatives that will play a role in determining the origin of container images.
- Notary V2 will allow images to be signed. Having the origin information embedded into the image and adding an official signature to the image will increase the confidence in the legitimacy of the image.
- OCI manifest specification allows artifacts (i.e. images) to be annotated with arbitrary metadata. Unfortunately, Docker doesn’t support those yet. Hopefully, in the future, Docker images will add support for arbitrary metadata that can be included in the image manifest.
- An implementation of a metadata service (see the metadata service draft from Steve Lasker) as part of the registry will enable additional capabilities to provide origin information for the images.
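Should manifest annotations become usable, the labels from this post could be carried over to the pre-defined annotation keys of the OCI image specification. The Python sketch below shows one possible mapping. The annotation keys themselves come from the OCI specification, but the choice of which build.* label maps to which key is my own assumption, as is the function name to_oci_annotations.

```python
from typing import Dict

# Hypothetical mapping from the labels used in this post to OCI pre-defined
# annotation keys. The pairing is an assumption, not an established convention.
LABEL_TO_ANNOTATION = {
    "build.user": "org.opencontainers.image.authors",
    "build.sha": "org.opencontainers.image.revision",
    "build.dockerfile": "org.opencontainers.image.source",
}


def to_oci_annotations(labels: Dict[str, str]) -> Dict[str, str]:
    """Translate known build.* labels into OCI annotation keys, ignoring other labels."""
    return {
        LABEL_TO_ANNOTATION[key]: value
        for key, value in labels.items()
        if key in LABEL_TO_ANNOTATION
    }


if __name__ == "__main__":
    annotations = to_oci_annotations({
        "build.user": "toddysm",
        "build.sha": "12345",
        "build.dockerfile": "https://test.com",
    })
    for key, value in sorted(annotations.items()):
        print(f"{key}={value}")
```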
Summary
While image metadata is great for annotating images with useful information and enabling search and query capabilities, it is kept outside of the image layers and can mutate over time. Verifying the authenticity of the metadata and keeping a history of the changes will be a harder problem to solve. Docker already provides a way to embed the Dockerfile and other image origin information as an immutable part of the image itself. Using dynamically populated Docker image labels, developers can provide origin information right now and increase supply chain confidence in their images.