
How often does the following happen to you? You write your client code, call an API, and receive a 404 Not found response. You start investigating the issue in your code, change a line here or there, and spend hours troubleshooting just to find out that the issue is on the server side and there is nothing you can do about it. Well, welcome to the microservices world! A common mistake I often see developers make is returning an improper response code or passing through the response code from another service.

Let’s see how we can avoid this. But first, a crash course on modern applications implemented with microservices and HTTP status response codes.

How Do Modern Microservices Applications Work?

I will try to avoid going deep into the philosophical reasons why we need microservices and the benefits (or disadvantages) of using them. This is not the point of this post.

We will start with a simple picture.

Microservices Application

As you can see in the picture, we have a User that interacts with the Client Application, which calls Microservice #1 to retrieve some information from the server (aka the cloud 🙂). The Client Application may need to call multiple (micro)services to retrieve all the information the User needs. Still, the part we will concentrate on is that Microservice #1 itself can call other services (Microservice #2 in this simple example) on the backend to perform its business logic. In a complex application (especially if not well architected), the chain of service calls may go well beyond two. But let’s stick with two for now. Also, let’s assume that Microservice #1 and Microservice #2 use REST, and their responses use the HTTP response status codes.

A basic call flow can look something like this. I also include the appropriate HTTP response status codes where they apply.

  1. The User clicks on a button in the Client Application.
  2. The Client Application makes an HTTP request to Microservice #1.
  3. Microservice #1 needs additional business logic to complete the request and makes an HTTP call to Microservice #2.
  4. Microservice #2 performs the additional business logic and responds to Microservice #1 using a 200 OK response code.
  5. Microservice #1 completes the business logic and responds to the Client Application with a 200 OK response code.
  6. The Client Application performs the action that is attached to the button, and the User is happy.

This is the so-called happy path. Everybody expects the flow to be executed as described above. If everything goes as planned, we don’t need to think any further and can move on to implementing the functionality behind the next button. Unfortunately, things often don’t go as planned.
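
To make the happy path concrete, here is a minimal sketch of Microservice #1 in Python. The stack (Flask and requests) and all names and URLs are my own assumptions for illustration; the post doesn’t prescribe any particular technology.

# Minimal sketch of Microservice #1 (assumed stack: Flask + requests).
# It serves the Client Application and calls Microservice #2 for extra data.
import requests
from flask import Flask, jsonify

app = Flask(__name__)
MICROSERVICE_2_URL = "http://microservice-2.internal/details"  # hypothetical endpoint

@app.route("/orders/<order_id>")
def get_order(order_id):
    # Step 3: call Microservice #2 to complete the business logic.
    response = requests.get(f"{MICROSERVICE_2_URL}/{order_id}")
    details = response.json()  # happy path: assume 200 OK for now
    # Step 5: respond to the Client Application with 200 OK.
    return jsonify({"order": order_id, "details": details}), 200

if __name__ == "__main__":
    app.run(port=8001)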

What Can Go Wrong?

Many things! Or at a minimum, the following:

  1. The Client Application fails because of a bug before it even calls Microservice #1.
  2. The Client Application sends invalid input when calling Microservice #1.
  3. Microservice #1 fails before calling Microservice #2.
  4. Microservice #1 sends invalid input when calling Microservice #2.
  5. Microservice #2 fails while performing its business logic.
  6. Microservice #1 fails after calling Microservice #2.
  7. The Client Application fails after Microservice #1 responds.

For those cases (the non-happy path? or maybe the sad path? 😉) the designers of the HTTP protocol wisely specified two separate sets of response codes:

  • 4xx client error codes
  • 5xx server error codes

The guidance for those is quite simple:

  • Client errors should be returned if the client did something wrong. In such cases, the client can change the parameters of the request and fix the issue. The important thing to remember is that the client can fix the issue without any changes on the server side.
    A typical example is the famous 404 Not found error. If you (the user) mistype the URL path in the browser address bar, the browser (the client application) will request the wrong resource from the server (Microservice #1 in this case). The server (Microservice #1) will respond with a 404 Not found error to the browser (the client application), and the browser will show you an “Oops, we couldn’t find the page” message. Well, in the past the browser just showed you the 404 Not found error, but we learned a long time ago that this is not user-friendly. (You see where I am going with this, right?)
  • Server errors should be returned if the issue occurred on the server side and the client (and the user) cannot do anything to fix it.
    A simple example is a wrong connection string in the service configuration (Microservice #1 in our case). If the connection string used to configure Microservice #1 with the endpoint and credentials for Microservice #2 is wrong, the client application and the user cannot do anything to fix it. The most appropriate error to return in this case is 500 Internal server error. (Both cases are sketched in code right after this list.)
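
Continuing with the same assumed stack (Python and Flask), here is a minimal sketch of the two cases. The resource store and the environment variable name are hypothetical:

# Sketch: choosing between a client error and a server error.
import os
from flask import Flask, jsonify

app = Flask(__name__)
PAGES = {"home": "<h1>Home</h1>"}  # stand-in for the server's resources

@app.route("/pages/<name>")
def get_page(name):
    if name not in PAGES:
        # The client asked for something that doesn't exist. It can fix
        # the request on its own, so a 4xx client error is appropriate.
        return jsonify({"error": f"page '{name}' not found"}), 404
    if "DB_CONNECTION_STRING" not in os.environ:
        # Misconfiguration on our side. The client can do nothing about
        # it, so a 5xx server error is appropriate.
        return jsonify({"error": "internal server error"}), 500
    return PAGES[name], 200

if __name__ == "__main__":
    app.run(port=8000)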

Pretty simple and logical, right? Though, one thing we as engineers often forget is who the client is and who the server is.

So, Who Is the Client and Who Is the Server?

First, the client and server are two system components that interact directly with each other (think, no intermediaries). If we take the picture from above and change the labels of the arrows, it becomes pretty obvious.

Microservices Application Clients and Servers

We have three clients and three servers:

  • The user is a client of the client application, and the client application is a server for the user.
  • The client application is a client of Microservice #1, and Microservice #1 is a server for the client application.
  • Microservice #1 is a client of Microservice #2, and Microservice #2 is a server for Microservice #1.

Having this picture in mind, the engineers implementing each of the microservices should think about the most appropriate response code for their immediate client, using the guidelines above. Let’s use examples to explain what response codes each service should return in different situations.

What HTTP Response Codes Should Microservices Return?

A few days ago I was discussing the following situation with one of our engineers. Our service, Azure Container Registry (ACR), has a security feature allowing customers to encrypt their container images using customer-managed keys (CMK). For this feature to work, customers need to upload a key to Azure Key Vault (AKV). When the Docker client tries to pull an image, ACR retrieves the key from AKV, decrypts the image, and sends it back to the Docker client. (BTW, I know that ACR and AKV are not microservices 🙂 ) Here is a visual:

Docker pull encrypted image from ACR

In the happy-path scenario, everything works as expected. However, a customer submitted a support request complaining that he was not able to pull his images from ACR. When he tried to pull an image using the Docker client, he received a 404 Not found error, but when he checked in the Azure Portal, he was able to see the image in the list.

Because the customer couldn’t figure it out by himself, he submitted a support request. The support engineer was also not able to figure out the issue and had to escalate to the product group. It turned out that the customer had deleted the Key Vault, and ACR was not able to retrieve the key to decrypt the image. However, the implemented flow looked like this:

  1. Docker client requests an image from ACR.
  2. ACR sees that the image is encrypted and requests the key from the Key Vault.
  3. The Azure Key Vault service looks up the key and figures out that the key (or the whole Key Vault) is missing.
  4. Azure Key Vault returns 404 Not found to ACR for the key ACR tries to access.
  5. ACR passes through the 404 Not found error to the Docker client.
  6. Docker client shows a message to the user that the image cannot be found.

The end result: everybody is confused! Why?

Where Does the Confusion Come From?

The investigation chain goes from left to right: Docker client –> ACR –> AKV. Both the customer and the support engineer concentrated on figuring out why the image was missing in ACR. They were looking only at the Docker client –> ACR part of the chain. The customer’s assumption was that the Docker client was doing something wrong, i.e. requesting the wrong image. This would be the correct assumption, because 404 Not found is a client error telling the client that it is requesting something that doesn’t exist. Hence, the customer checked the portal, and when he saw the image in the list, he was puzzled. The next assumption was that something was wrong on the ACR side. This is where the customer decided to submit a support request for somebody to check whether the data in ACR was corrupted. The support engineer checked the ACR backend, and all the data was in sync.

This is a great example of how the wrong HTTP response code can send the whole investigation down a rabbit hole. To avoid that, here is the guidance! Microservices should return response codes that are relevant to the business logic they implement and that help the client take appropriate actions. “Well”, you will say, “isn’t that the whole point of HTTP status response codes?” It is! But for whatever reason, we continue to break this rule. The key words in the above guidance are “the business logic they implement”, not the business logic of the services they call. (By the way, this is the same with exceptions. You don’t catch the generic Exception, you catch a SpecificException. You don’t pass through exceptions; you catch them and wrap them in a way that is useful for the calling code.)
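
Here is that exception analogy as a short Python sketch. All the types and the key-vault client are hypothetical; the point is the catch-specific-and-wrap pattern:

# Sketch: catch the specific downstream error and wrap it in an exception
# that is meaningful to our callers (all types here are hypothetical).
class KeyNotFoundError(Exception):
    """Raised by the (hypothetical) key-vault client when a key is missing."""

class ImageDecryptionError(Exception):
    """Raised when an image cannot be decrypted, whatever the low-level cause."""

def decrypt_image(image, key_client):
    try:
        key = key_client.get_key(image.key_id)
    except KeyNotFoundError as err:  # a SpecificException, not a bare Exception
        # Don't pass the key-vault error through; wrap it in terms of the
        # business logic we implement: images, not keys.
        raise ImageDecryptionError(
            f"cannot decrypt image '{image.name}': encryption key unavailable"
        ) from err
    return key.decrypt(image.payload)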

Business Logic and Friendly HTTP Response Codes

Think about the business logic of each one of the services above!

One way to decide which HTTP response code to return is to think about the resource your microservice is handling. ACR is the service responsible for handling container images, so the business logic that ACR implements should provide status codes relevant to the “business” of images. Azure Key Vault implements business logic that handles keys, secrets, and certificates (not images), so Key Vault should return status codes that are relevant to keys, secrets, and certificates. Azure Key Vault is a downstream service and cannot know what the key is used for; hence, it cannot tell the upstream client (Docker) what the error is. It is the responsibility of ACR to provide the appropriate status code to the upstream client.

Here is how the flow in the above scenario should be implemented:

  1. Docker client requests an image from ACR.
  2. ACR sees that the image is encrypted and requests the key from the Key Vault.
  3. The Azure Key Vault service looks up the key and figures out that the key (or the whole Key Vault) is missing.
  4. Azure Key Vault returns 404 Not found to ACR for the key ACR tries to access.
  5. ACR handles the 404 Not found from Azure Key Vault but wraps it in an error that is relevant to the requested image.
  6. Instead of 404 Not found, ACR returns 500 Internal server error with a message clarifying the issue (a code sketch of this wrapping follows the list).
  7. Docker client shows a message to the user that it cannot pull the image because of an issue on the server.
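
Here is a minimal sketch of what steps 5 and 6 might look like inside an ACR-like registry service. This is illustrative Python only, not ACR’s actual implementation; the Key Vault endpoint, the in-memory image store, and the decrypt helper are all made up:

# Sketch of the corrected flow (illustrative only; not ACR's real code).
import requests
from flask import Flask, jsonify

app = Flask(__name__)
KEY_VAULT_URL = "https://example.vault.azure.net/keys"  # hypothetical endpoint
IMAGES = {"web": {"name": "web", "encrypted": True, "key_id": "k1"}}  # stand-in store

def decrypt(image, key):
    return {**image, "payload": "<decrypted layers>"}  # placeholder for real decryption

@app.route("/images/<name>")
def pull_image(name):
    image = IMAGES.get(name)
    if image is None:
        # The *image* is missing: a 404 for the resource we own is correct.
        return jsonify({"error": f"image '{name}' not found"}), 404
    if image["encrypted"]:
        key_response = requests.get(f"{KEY_VAULT_URL}/{image['key_id']}")
        if key_response.status_code == 404:
            # The *key* is missing. Don't pass the 404 through: the client
            # did nothing wrong and cannot fix this. Wrap it as a 500 with
            # a message that is relevant to the requested image.
            return jsonify({"error": "cannot decrypt the image: encryption key unavailable"}), 500
        image = decrypt(image, key_response.json())
    return jsonify(image), 200

if __name__ == "__main__":
    app.run(port=5000)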

The Q&A Approach

Another way to decide what response code to return is to take the Questions-and-Answers approach and build simple IF-THEN logic (aka a decision tree). Here is how this can work for our example:

  • Docker: Pull image from ACR
    • ACR: Q: Is the image available?
      • A: Yes
        (Note to myself: Requesting the image cannot be a client error anymore.)

        • Q: Is the image encrypted?
          • A: Yes
            • ACR: Request the key from Key Vault
              • AKV: Q: Is the key available?
                • A: Yes
                  • AKV: Return the key to ACR
                • A: No
                  • AKV: Return 404 [key] Not found error
            • ACR: Q: Did I get a key?
              • A: Yes
                • ACR: Decrypt the image
                • ACR: Return 200 OK with the image payload
              • A: No (I got 404 [key] Not found)
                • ACR: I cannot decrypt the image
                  (Note to myself: There is nothing the client did wrong! It is all the server’s fault.)
                • ACR: Return 500 Internal server error “I cannot decrypt the image”
          • A: No (image is not encrypted)
            • ACR: Return 200 OK with the image payload
      • A: No (image does not exist)
        • ACR: Return 404 [image] Not found error

Note that the above flow is simplified. For example, in a real implementation, you may need to check if the client is authenticated and authorized to pull the image. Nevertheless, the concept is the same – you will just need to have more Q&As.
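
The tree above translates almost mechanically into code. Here is a Python sketch; the registry and key-vault objects are hypothetical stand-ins injected by the caller:

# Sketch: the Q&A decision tree as IF-THEN logic (helper objects hypothetical).
def handle_pull(name, registry, key_vault):
    image = registry.find_image(name)             # Q: Is the image available?
    if image is None:                             # A: No (image does not exist)
        return 404, f"image '{name}' not found"
    # From here on, requesting the image cannot be a client error anymore.
    if not image.encrypted:                       # Q: Is the image encrypted?
        return 200, image.payload                 # A: No -> return the image
    key = key_vault.get_key(image.key_id)         # AKV returns the key, or None on its 404
    if key is None:                               # Q: Did I get a key? A: No
        # There is nothing the client did wrong; it is all the server's fault.
        return 500, "I cannot decrypt the image"
    return 200, key.decrypt(image.payload)        # A: Yes -> decrypt and return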


As you can see, it is important to be careful about what HTTP response codes you return from your microservices. If you return the wrong one, you may end up with more work than you expected. Here are the main points that are worth remembering:

  • Return 4xx client errors only if the client can do something to fix the issue. If the client cannot do anything to fix it, 5xx server errors are the only appropriate ones.
  • Do not pass through the response codes you receive from downstream services. Handle each response from a downstream service and wrap it according to the business logic you are implementing.
  • When implementing your services, think about the resource you are handling in those services. Return HTTP status response codes that are relevant to the resource you are handling.
  • Use the Q&A approach to decide on the appropriate response code to return for your service and the resource requested by the client.

By using those guidelines, your microservices will become more friendly and easier to troubleshoot.

Featured image by Nick Page on Unsplash

With the recent Solorigate incident, a lot of emphasis is put on determining the origin of the software running in an enterprise. For Docker container images, this can mean embedding in the image a reference to the Dockerfile the image was built from. However, tracking down the origin of software is not trivial. For closed-source software, we blindly trust the vendors, and if we are lucky, we may get a signed piece of code. For open-source software, we rarely check the SHA signature and never even think of verifying what source code a binary was produced from. In talks with customers, I quite often hear them ask how they can verify what sources a container image is built from. They want to attribute each image with metadata that links to the Dockerfile used to build the image as well as the Git commit and the developer who triggered the build.

There are many articles that discuss this problem. Here are two recent examples. Richard Lander from the Microsoft .NET team writes in his blog post Staying safe with .NET containers about the pedigree and provenance of the software we run and how to think about it. Josh Hendrick in his post Embedding source code version information in Docker images offers one solution to the problem.

Josh Hendrick’s proposal is in the direction I would go, but one problem I have with it is that it requires special handling in the application that runs in the container in order to obtain this information. I would prefer to have this information readily available without the need to run the container image. Docker images and the Open Container Initiative already specify ways to do that without adding special files to your image. In this post, I will outline another way you can embed this information into your images and easily retrieve it without any changes to your application.

Using Docker Image Labels

The Docker image spec already has built-in functionality to add labels to an image. Labels are intended to be set during build time. They also show up when inspecting the image using docker image inspect, which makes them the right choice for recording the Dockerfile and the other build origin details. One more argument that makes them the right choice for this information is that the labels are layers in the image, and thus immutable. If you change a label in an image, the resulting image SHA changes.

To demonstrate how labels can be used to embed the Dockerfile and other origin information into the Docker image, I have published a dynamic labels sample on GitHub. The sample uses a base Python image and implements a simple functionality to print the container’s environment variables. Let’s walk through it step by step.

The Dockerfile is quite simple.

FROM python:slim
ARG IMAGE_COMMITTER
ARG IMAGE_DOCKERFILE
ARG IMAGE_COMMIT_SHA
LABEL "build.user"=${IMAGE_COMMITTER}
LABEL "build.sha"=${IMAGE_COMMIT_SHA}
LABEL "build.dockerfile"=${IMAGE_DOCKERFILE}
ADD ./samples/dynamic-labels/source /
CMD ["python", "/"]

Lines 2-4 define the build arguments that need to be set during the build of the image. Lines 5-7 set the three labels build.user, build.sha, and build.dockerfile that we want to embed in the image. build.dockerfile is the URL to the Dockerfile in the GitHub repository, while build.sha is the Git commit that triggered the build. If you build the image locally with some dummy build arguments, you will see that new layers are created for each of lines 5-7.

toddysm@MacBook-Pro ~ % docker build -t test --build-arg IMAGE_COMMITTER=toddysm --build-arg IMAGE_DOCKERFILE= --build-arg IMAGE_COMMIT_SHA=12345 -f ./samples/dynamic-labels/Dockerfile .
Sending build context to Docker daemon  376.3kB
Step 1/9 : FROM python:slim
 ---> 8c84baace4b3
Step 2/9 : ARG IMAGE_COMMITTER
 ---> Running in 71ad05f20d20
Removing intermediate container 71ad05f20d20
 ---> fe56c62b9903
Step 3/9 : ARG IMAGE_DOCKERFILE
 ---> Running in fe468c44e9fc
Removing intermediate container fe468c44e9fc
 ---> b776dca57bd7
Step 4/9 : ARG IMAGE_COMMIT_SHA
 ---> Running in 849a82225c31
Removing intermediate container 849a82225c31
 ---> 3a4c6c23a699
Step 5/9 : LABEL "build.user"=${IMAGE_COMMITTER}
 ---> Running in fd4bfb8d5b5b
Removing intermediate container fd4bfb8d5b5b
 ---> 2e9be17c48ff
Step 6/9 : LABEL "build.sha"=${IMAGE_COMMIT_SHA}
 ---> Running in 892323d73495
Removing intermediate container 892323d73495
 ---> b7bc6559629d
Step 7/9 : LABEL "build.dockerfile"=${IMAGE_DOCKERFILE}
 ---> Running in 98687b8dd9fb
Removing intermediate container 98687b8dd9fb
 ---> 35e97d273cbc
Step 8/9 : ADD ./samples/dynamic-labels/source /
 ---> 9e71859892b1
Step 9/9 : CMD ["python", "/"]
 ---> Running in 366b1b6c3bea
Removing intermediate container 366b1b6c3bea
 ---> e7cb39a21c2a
Successfully built e7cb39a21c2a
Successfully tagged test:latest

You can inspect the image and see the labels by issuing the command docker image inspect --format='{{json .Config.Labels}}' <imagename>.

toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' test | jq
{
  "build.dockerfile": "",
  "build.sha": "12345",
  "build.user": "toddysm"
}

Now, let’s automate the process with the help of GitHub Actions. I have created one GitHub Action to build and push the image to DockerHub and another to build and push to Azure Container Registry (ACR). Both actions are similar in the steps they use. The first two steps are the same for both actions; they build the URL to the Dockerfile using the corresponding GitHub Actions variables:

- name: 'Set environment variable for Dockerfile URL for push'
  if: ${{ github.event_name == 'push' }}
  run: echo "DOCKERFILE_URL=${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/blob/${GITHUB_REF#refs/*/}/samples/dynamic-labels/Dockerfile" >> $GITHUB_ENV

- name: 'Set environment variable for Dockerfile URL for pull request'
  if: ${{ github.event_name == 'pull_request' }}
  run: echo "DOCKERFILE_URL=${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/blob/${GITHUB_BASE_REF#refs/*/}/samples/dynamic-labels/Dockerfile" >> $GITHUB_ENV

Then there are specific steps to sign in to DockerHub or Azure. After that come the build steps, where the labels are set. Here, for example, is the build step that uses buildx and automatically pushes the image to DockerHub:

- name: Build and push
  id: docker_build
  uses: docker/build-push-action@v2
  with:
    context: ./
    file: ./samples/dynamic-labels/Dockerfile
    push: true
    tags: ${{ secrets.DOCKER_HUB_REPONAME }}:build-${{ github.run_number }}
    # github.actor (the user who triggered the run) is assumed here for the
    # committer; DOCKERFILE_URL is set by the steps above.
    build-args: |
      IMAGE_COMMITTER=${{ github.actor }}
      IMAGE_DOCKERFILE=${{ env.DOCKERFILE_URL }}
      IMAGE_COMMIT_SHA=${{ github.sha }}

The build step for building the image and pushing to Azure Container Registry uses the traditional docker build approach:

- name: Login to Azure Container Registry
  uses: azure/docker-login@v1
  with:
    login-server: ${{ secrets.ACR_REGISTRY_LOGIN_SERVER }}
    username: ${{ secrets.ACR_REGISTRY_USERNAME }}
    password: ${{ secrets.ACR_REGISTRY_PASSWORD }}
- name: Build and push
  id: docker_build
  # github.actor is assumed for the committer, as in the DockerHub action above
  run: |
    docker build -f ./samples/dynamic-labels/Dockerfile -t ${{ secrets.ACR_REGISTRY_LOGIN_SERVER }}/${{ secrets.ACR_REPOSITORY_NAME }}:build-${{ github.run_number }} --build-arg IMAGE_COMMITTER=${{ github.actor }} --build-arg IMAGE_DOCKERFILE=${{ env.DOCKERFILE_URL }} --build-arg IMAGE_COMMIT_SHA=${{ github.sha }} .
    docker push ${{ secrets.ACR_REGISTRY_LOGIN_SERVER }}/${{ secrets.ACR_REPOSITORY_NAME }}:build-${{ github.run_number }}

After the actions complete, the images are available in DockerHub and Azure Container Registry. Here is how the image looks in DockerHub:

Docker container image with labels

If you scroll down a little, you will see the labels that appear in the list of layers:

The URL points you to the Dockerfile that was used to create the image, while the commit SHA can be used to identify the exact state of the project from which the image was built. If you pull the image locally, you can also see the labels using the following command:

toddysm@MacBook-Pro ~ % docker pull toddysm/tmstests:build-36
build-36: Pulling from toddysm/tmstests
45b42c59be33: Already exists
8cd3485318db: Already exists
2f564129f025: Pull complete
cf1573f5a21e: Pull complete
ceec8aed2dab: Pull complete
78b1088f77a0: Pull complete
Digest: sha256:7862c2a31970916fd50d3ab38de0dad74a180374d41625f014341c90c4b55758
Status: Downloaded newer image for toddysm/tmstests:build-36
toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' toddysm/tmstests:build-36
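
If you prefer to read the labels programmatically instead of shelling out to the CLI, the Docker SDK for Python exposes them directly. Here is a small sketch; it assumes pip install docker, a running Docker daemon, and the image pulled above:

# Sketch: reading image labels with the Docker SDK for Python (docker-py).
import docker

client = docker.from_env()                       # connect to the local Docker daemon
image = client.images.get("toddysm/tmstests:build-36")
for key, value in image.labels.items():          # .Config.Labels as a Python dict
    print(f"{key} = {value}")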

To summarize, the benefit of using labels for embedding the Dockerfile and other origin information into container images is that they are considered immutable layers of the image and thus cannot be changed without changing the image.

Who is Using Docker Image Labels?

Unfortunately, labels are not widely used, if used at all 🙁 Checking several popular images from DockerHub yields the following results:

toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' busybox | jq
null
toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' alpine | jq
null
toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' ubuntu | jq
null

Tracking down the sources from which the Alpine image was built would require much more effort.

What is Next for Checking Docker Image Origins?

There are a couple of community initiatives that will play a role in determining the origin of container images.

  • Notary V2 will allow images to be signed. Having the origin information embedded into the image and adding an official signature to the image will increase the confidence in the legitimacy of the image.
  • The OCI manifest specification allows artifacts (i.e. images) to be annotated with arbitrary metadata. Unfortunately, Docker doesn’t support those annotations yet. Hopefully, in the future, Docker will add support for arbitrary metadata that can be included in the image manifest.
  • An implementation of metadata service (see metadata service draft from Steve Lasker) as part of the registry will enable additional capabilities to provide origin information for the images.


While image metadata is great for annotating images with useful information and enabling search and querying capabilities, the metadata is kept outside of the image layers and can mutate over time. Verifying the authenticity of the metadata and keeping a history of the changes will be a harder problem to solve. Docker already provides a way to embed the Dockerfile and other image origin information as immutable layers of the image itself. Using dynamically populated Docker image labels, developers can provide origin information right now and increase the supply-chain confidence in their images.