
How often does the following happen to you? You write your client code, call an API, and receive a 404 Not found response. You start investigating the issue in your code; you change a line here or there; you spend hours troubleshooting just to find out that the issue is on the server side, and you can’t do anything about it. Well, welcome to the microservices world! A common mistake I often see developers make is returning an improper response code or passing through the response code from another service.

Let’s see how we can avoid this. But first, a crash course on modern applications implemented with microservices and HTTP status response codes.

How Do Modern Microservices Applications Work?

I will try to avoid going deep into the philosophical reasons why we need microservices and the benefits (or disadvantages) of using them. This is not the point of this post.

We will start with a simple picture.

Microservices Application

As you can see in the picture, we have a User who interacts with the Client Application, which calls Microservice #1 to retrieve some information from the server (aka the cloud 🙂). The Client Application may need to call multiple (micro)services to retrieve all the information the User needs. Still, the part we will concentrate on is that Microservice #1 itself can call other services (Microservice #2 in this simple example) on the backend to perform its business logic. In a complex application (especially one that is not well architected), the chain of service calls may go well beyond two. But let’s stick with two for now. Also, let’s assume that Microservice #1 and Microservice #2 use REST and that their responses use the HTTP response status codes.

A basic call flow can be something like this. I also include the appropriate HTTP status response codes in each step.

  1. The User clicks on a button in the Client Application.
  2. The Client Application makes an HTTP request to Microservice #1.
  3. Microservice #1 needs additional business logic to complete the request and makes an HTTP call to Microservice #2.
  4. Microservice #2 performs the additional business logic and responds to Microservice #1 using a 200 OK response code.
  5. Microservice #1 completes the business logic and responds to the Client Application with a 200 OK response code.
  6. The Client Application performs the action that is attached to the button, and the User is happy.

This is the so-called happy path. Everybody expects the flow to be executed as described above. If everything goes as planned, we don’t need to think about it any further and can move on to implementing the functionality behind the next button. Unfortunately, things often don’t go as planned.

What Can Go Wrong?

Many things! Or at a minimum, the following:

  1. The Client Application fails because of a bug before it even calls Microservice #1.
  2. The Client Application sends invalid input when calling Microservice #1.
  3. Microservice #1 fails before calling Microservice #2.
  4. Microservice #1 sends invalid input when calling Microservice #2.
  5. Microservice #2 fails while performing its business logic.
  6. Microservice #1 fails after calling Microservice #2.
  7. The Client Application fails after Microservice #1 responds.

For those cases (the non-happy path? or maybe the sad path? 😉), the designers of the HTTP protocol wisely specified two separate sets of response codes:

  • 4xx client error responses (for example, 400 Bad request and 404 Not found)
  • 5xx server error responses (for example, 500 Internal server error)

The guidance for those is quite simple:

  • Client errors should be returned if the client did something wrong. In such cases, the client can change the parameters of the request and fix the issue. The important thing to remember is that the client can fix the issue without any changes on the server-side.
    A typical example is the famous 404 Not found error. If you (the user) mistype the URL path in the browser address bar, the browser (the client application) will request the wrong resource from the server (Microservice #1 in this case). The server (Microservice #1) will respond with a 404 Not found error, and the browser will show you an “Oops, we couldn’t find the page” message. Well, in the past, the browser just showed you the raw 404 Not found error, but we learned a long time ago that this is not user-friendly (you see where I am going with this, right?).
  • Server errors should be returned if the issue occurred on the server side and the client (and the user) cannot do anything to fix it.
    A simple example is a wrong connection string in the service configuration (Microservice #1 in our case). If the connection string used to configure Microservice #1 with the endpoint and credentials for Microservice #2 is wrong, the client application and the user cannot do anything to fix it. The most appropriate error to return in this case is 500 Internal server error (see the sketch after this list).
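
To make the guidance concrete, here is a minimal sketch in Python of how a request handler can choose between the two sets of codes. The order lookup, its parameters, and the configuration check are all illustrative, not a prescribed implementation:

```python
def get_order(order_id: str, config: dict) -> tuple[int, str]:
    """Return an (HTTP status code, message) pair for a hypothetical order lookup."""
    if not order_id.isdigit():
        # The caller sent a malformed ID; they can fix the request and
        # retry, so a 4xx client error is appropriate.
        return 400, "order_id must be numeric"

    if "db_connection_string" not in config:
        # Misconfiguration on our side; the caller cannot do anything
        # about it, so a 5xx server error is the only honest answer.
        return 500, "internal server error"

    return 200, f"order {order_id} found"
```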

Pretty simple and logical, right? Though, one thing we as engineers often forget is who the client is and who the server is.

So, Who Is the Client and Who Is the Server?

First, the client and server are two system components that interact directly with each other (think, no intermediaries). If we take the picture from above and change the labels of the arrows, it becomes pretty obvious.

Microservices Application Clients and Servers

We have three clients and three servers:

  • The user is a client of the client application, and the client application is a server for the user.
  • The client application is a client of Microservice #1, and Microservice #1 is a server for the client application.
  • Microservice #1 is a client of Microservice #2, and Microservice #2 is a server for Microservice #1.

With this picture in mind, the engineers implementing each one of the microservices should think about the most appropriate response code for their immediate client, using the guidelines above. It is best to use examples to explain what response codes each service should return in different situations.

What HTTP Response Codes Should Microservices Return?

A few days ago, I was discussing the following situation with one of our engineers. Our service, Azure Container Registry (ACR), has a security feature allowing customers to encrypt their container images using customer-managed keys (CMK). For this feature to work, customers need to upload a key to Azure Key Vault (AKV). When the Docker client tries to pull an image, ACR retrieves the key from AKV, decrypts the image, and sends it back to the Docker client. (BTW, I know that ACR and AKV are not microservices 🙂 ) Here is a visual:

Docker pull encrypted image from ACR

In the happy-path scenario, everything works as expected. However, a customer submitted a support request complaining that he was not able to pull his images from ACR. When he tried to pull an image using the Docker client, he received a 404 Not found error, but when he checked in the Azure Portal, he was able to see the image in the list.

Because the customer couldn’t figure it out by himself, he submitted a support request. The support engineer was also not able to figure out the issue and had to escalate to the product group. It turned out that the customer had deleted the Key Vault, and ACR was not able to retrieve the key to decrypt the image. However, the implemented flow looked like this:

  1. Docker client requests an image from ACR.
  2. ACR sees that the image is encrypted and requests the key from the Key Vault.
  3. The Azure Key Vault service looks up the key and figures out that the key (or the whole Key Vault) is missing.
  4. Azure Key Vault returns 404 Not found to ACR for the key ACR tries to access.
  5. ACR passes through the 404 Not found error to the Docker client.
  6. Docker client shows a message to the user that the image cannot be found.

The end result: everybody is confused! Why?

Where Does the Confusion Come From?

The investigation chain goes from left to right: Docker client –> ACR –> AKV. Both the customer and the support engineer concentrated on figuring out why the image was missing in ACR. They were looking only at the Docker client –> ACR part of the chain. The customer’s assumption was that the Docker client was doing something wrong, i.e., requesting the wrong image. This would be the correct assumption because 404 Not found is a client error telling the client that it is requesting something that doesn’t exist. Hence, the customer checked the portal, and when he saw the image in the list, he was puzzled. The next assumption was that something was wrong on the ACR side. Here is where the customer decided to submit a support request for somebody to check whether the data in ACR was corrupted. The support engineer checked the ACR backend, and all the data was in sync.

This is a great example of how the wrong HTTP response code can send the whole investigation into a rabbit hole. To avoid that, here is the guidance! Microservices should return response codes that are relevant to the business logic they implement and that help the client take appropriate actions. “Well,” you will say, “isn’t that the whole point of HTTP status response codes?” It is! But for whatever reason, we continue to break this rule. The key words in the above guidance are “the business logic they implement”, not the business logic of the services they call. (By the way, this is the same with exceptions. You don’t catch the generic Exception; you catch a SpecificException. You don’t pass exceptions through; you catch them and wrap them in a way that is useful for the calling code, as the sketch below shows.)
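
Here is that exception-wrapping principle as a minimal Python sketch. The exception type, the key store, and the function names are illustrative, not ACR’s actual implementation:

```python
KEYS: dict[str, bytes] = {}  # stand-in for the key store


class ImageDecryptionError(Exception):
    """Raised when an image cannot be decrypted for a server-side reason."""


def fetch_key(key_id: str) -> bytes:
    return KEYS[key_id]  # raises KeyError when the key is missing


def decrypt_image(image_name: str, key_id: str) -> bytes:
    try:
        key = fetch_key(key_id)
    except KeyError as err:
        # Wrap the low-level KeyError in an exception that is meaningful
        # to the calling code, instead of letting it leak through.
        raise ImageDecryptionError(
            f"cannot decrypt '{image_name}': encryption key unavailable"
        ) from err
    return b"decrypted bytes"  # the decryption itself is out of scope here
```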

Business Logic and Friendly HTTP Response Codes

Think about the business logic of each one of the services above!

One way to decide which HTTP response code to return is to think about the resource your microservice is handling. ACR is the service responsible for handling the container images. The business logic that ACR implements should provide status codes relevant to the “business” of images. Azure Key Vault implements business logic that handles keys, secrets, and certificates (not images). Key Vault should return status codes that are relevant to the keys, secrets, and certificates. Azure Key Vault is a downstream service and cannot know what the key is used for; hence, it cannot tell the upstream client (Docker) what the error means. It is the responsibility of ACR to provide the appropriate status code to the upstream client.

Here is how the flow in the above scenario should be implemented:

  1. Docker client requests an image from ACR.
  2. ACR sees that the image is encrypted and requests the key from the Key Vault.
  3. The Azure Key Vault service looks up the key and figures out that the key (or the whole Key Vault) is missing.
  4. Azure Key Vault returns 404 Not found to ACR for the key ACR tries to access.
  5. ACR handles the 404 Not found from Azure Key Vault but wraps it in an error that is relevant to the requested image.
  6. Instead of 404 Not found, ACR returns 500 Internal server error with a message clarifying the issue (see the sketch after this list).
  7. Docker client shows a message to the user that it cannot pull the image because of an issue on the server.
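
Here is a minimal Python sketch of steps 4–6. The vault lookup is stubbed with a plain function (a real service would make an HTTP call); the names are illustrative, and the status-code handling is the point:

```python
KEY_VAULT: dict[str, bytes] = {}  # the key was deleted along with the vault


def get_key_from_vault(key_id: str) -> tuple[int, bytes | None]:
    """Stand-in for the HTTP call to the key vault."""
    if key_id not in KEY_VAULT:
        return 404, None  # the vault's honest answer: this key does not exist
    return 200, KEY_VAULT[key_id]


def pull_image(image_name: str, key_id: str) -> tuple[int, str]:
    status, key = get_key_from_vault(key_id)
    if status == 404:
        # Do NOT pass the 404 through. From the Docker client's point
        # of view the image exists; it is our key that is missing.
        return 500, f"cannot decrypt image '{image_name}': encryption key unavailable"
    return 200, f"image '{image_name}' payload"


print(pull_image("web-app:v1", "key-1"))
# (500, "cannot decrypt image 'web-app:v1': encryption key unavailable")
```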

The Q&A Approach

Another way to decide what response code to return is to take the Questions-and-Answers approach and build simple IF-THEN logic (aka a decision tree). Here is how this can work for our example:

  • Docker: Pull image from ACR
    • ACR: Q: Is the image available?
      • A: Yes
        (Note to myself: Requesting the image cannot be a client error anymore.)

        • Q: Is the image encrypted?
          • A: Yes
            • ACR: Request the key from Key Vault
              • AKV: Q: Is the key available?
                • A: Yes
                  • AKV: Return the key to ACR
                • A: No
                  • AKV: Return 404 [key] Not found error
            • ACR: Q: Did I get a key?
              • A: Yes
                • ACR: Decrypt the image
                • ACR: Return 200 OK with the image payload
              • A: No (I got 404 [key] Not found)
                • ACR: I cannot decrypt the image
                  (Note to myself: There is nothing the client did wrong! It is all the server’s fault.)
                • ACR: Return 500 Internal server error “I cannot decrypt the image”
          • A: No (image is not encrypted)
            • ACR: Return 200 OK with the image payload
      • A: No (image does not exist)
        • ACR: Return 404 [image] Not found error

Note that the above flow is simplified. For example, in a real implementation, you may need to check if the client is authenticated and authorized to pull the image. Nevertheless, the concept is the same – you will just need to have more Q&As.
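
Translated into code, the decision tree becomes a few nested IF-THEN checks. In this sketch, the image and key stores are in-memory dictionaries standing in for the real backends; only the status-code decisions mirror the tree above:

```python
IMAGES = {"web-app:v1": {"encrypted": True, "key_id": "key-1"}}
KEYS: dict[str, bytes] = {}  # key-1 is gone, as in the support case


def pull(image_name: str) -> tuple[int, str]:
    image = IMAGES.get(image_name)
    if image is None:
        # The client asked for something that does not exist: client error.
        return 404, "image not found"

    if not image["encrypted"]:
        return 200, "image payload"

    key = KEYS.get(image["key_id"])
    if key is None:
        # The image exists, so the client did nothing wrong: server error.
        return 500, "cannot decrypt image: encryption key unavailable"

    return 200, "decrypted image payload"


print(pull("web-app:v1"))   # (500, 'cannot decrypt image: encryption key unavailable')
print(pull("no-such-img"))  # (404, 'image not found')
```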

Summary

As you can see, it is important to be careful what HTTP response codes you return from your microservices. If you return the wrong code, you may end up with more work than you expect. Here are the main points that are worth remembering:

  • Return 4xx client errors only if the client can do something to fix the issue. If the client cannot do anything to fix it, 5xx server errors are the only appropriate ones.
  • Do not pass through the response codes you receive from the services you call. Handle each response from those downstream services and wrap it according to the business logic you are implementing.
  • When implementing your services, think about the resource you are handling in those services. Return HTTP status response codes that are relevant to the resource you are handling.
  • Use the Q&A approach to decide what is the appropriate response code to return for your service and the resource that is requested by the client.

By using those guidelines, your microservices will become more friendly and easier to troubleshoot.

Featured image by Nick Page on Unsplash

In my previous post, What to Desire from a Good Image Annotator?, I wrote about the high-level capabilities of an Image Annotation Tool. In this one, I will go over the requirements for the actual image annotations, or as you may also know it, tagging. I will use two images as examples. The first one is a scanned receipt. The receipt example generalizes to the broader category of scanned documents, whether financial, legal, or others. The second example is of a cityscape. That one generalizes to any other type of image.

Annotating Store Receipt

Let’s start with the receipt. A receipt is a scanned document that contains financial information. Below is just one way that you may want to annotate a receipt.

Annotated receipt

In this example, I have decided to annotate the receipt using the logical grouping of information printed on it. Each region is a rectangle that contains the part of the image that belongs together. Here is the list of regions and their possible annotations:

  • Region ID: 1
    Annotation: Store Logo
    Description: This can be the store logo or just the name printed on the receipt
  • Region ID: 2
    Annotation: Store Details
    Description: This can include information like address, phone number, store number, etc.
  • Region ID: 3
    Annotation: Receipt Metadata
    Description: This can be the date and time, receipt number, as well as other receipt-specific metadata
  • Region ID: 4
    Annotation: Cashier Details
    Description: This is information about the cashier
  • Region ID: 5
    Annotation: Items
    Description: Those are the purchased items, the quantities and the individual item price
  • Region ID: 6
    Annotation: Receipt Summary
    Description: This is the summary of the information for the purchase like subtotal amount, tax and the total amount
  • Region ID: 7
    Annotation: Customer Information
    Description: This is information about the customer and any loyalty programs she or he participates in
  • Region ID: 8
    Annotation: Merchant Details
    Description: This is additional information about the merchant
  • Region ID: 9
    Annotation: Transaction Type
    Description: This is information about the transaction
  • Region ID: 10
    Annotation: Transaction Details
    Description: This contains information about the transaction with the payment card processor. It can include transaction ID, the card type and number, timestamp, authorization code, etc.
  • Region ID: 11
    Annotation: Transaction Amounts
    Description: This summarizes the amounts for the transaction with the payment card processor
  • Region ID: 12
    Annotation: Transaction Status
    Description: This is the status of the transaction – i.e., Approved or Declined
  • Region ID: 13
    Annotation: Transaction Info
    Description: Those are technical details about the transaction
  • Region ID: 14
    Annotation: Copy Owner
    Description: This is information about the ownership of the receipt. Usually, this is Merchant or Customer
  • Region ID: 15, 16, and 17
    Annotation: Additional Details
    Description: Those can be various things like return policies, disclaimers, advertisements, surveys, notes, and so on. In this example, we have 15 as Return Policy, 16 as Survey, and 17 as Additional Notes

When you think about it, the above areas are the ones where your eyes will immediately look to find information. For example, if you want to know what store the receipt is from, you will look directly at the top where the logo should be (Region #1); if you want to know what the total amount is, your eyes will steer towards the receipt summary (Region #6), and so on. The majority of us will follow a similar approach for separating the data because it is something that we do every day in our minds.

A few things to note about the annotations above. First, not every receipt will have all the information from above. Some receipts will have more and some less. Second, annotations evolve. After annotating a certain number of receipts, you start building a pattern and make fewer changes the more you annotate. However, after some time, you may discover that the patterns you developed need to be updated. A straightforward example is a better name for an annotation. If this happens, you need to go back and change the names. Third, there is no standard way to name those annotations. You and I will undoubtedly have different names for the same thing.

Now, let’s write a few requirements from this receipt example.

  1. The first thing we did was to draw the rectangular regions that we want to annotate. And this is our first and simplest requirement.
  2. The second thing we did was to annotate each rectangular region. When we create the annotation, we should be able to add additional information like a description of the annotation.
  3. The third thing we want is to be able to update annotation information retrospectively (see the sketch below).
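
Here is a minimal sketch of a data model that would satisfy these three requirements. The field names are my own; the actual tool may end up with a different schema:

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A rectangular annotated region on an image (requirements 1 and 2)."""
    region_id: int
    x: int       # top-left corner, in pixels
    y: int
    width: int
    height: int
    annotation: str
    description: str = ""


# Requirement 3: annotations can be updated retrospectively.
logo = Region(1, 12, 8, 220, 60, "Store Logo", "Logo or name printed on the receipt")
logo.annotation = "Merchant Logo"  # renamed after the naming pattern evolved
```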

Those are good as a beginning. But to provide more context and back up our requirements, it will be useful to think about how those annotations will be used, i.e., to define our use cases. I kind of hinted at those above.

Use Case #1: Logo Recognition

Let’s say you are developing a classification application that is used to recognize the store the receipt is from. You can easily do this by looking at the store logo only and developing a machine learning algorithm that returns the name of the store by recognizing the logo. For this, the only region you will need is Region 1 with the logo. Thus, you can just cut this region from the receipt and train your algorithm only on the logo. That way, you minimize the noise from the rest of the receipt, and your algorithm can have better accuracy.

Use Case #2: Receipt Amount Extraction

If your application needs to extract the summary amounts from the receipt, you can concentrate on Region 6. That region contains all the information you will need. A few things you can do with this region are:

  • Binarize the area
  • Straighten the text
  • OCR the text
  • Analyze the extracted text (not an image-related task anymore 🙂)

This use case is applicable to any other area you annotated on the receipt. It doesn’t matter whether you want to obtain the credit card number or the timestamp; the approach will be the same.
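
Here is what those steps could look like with Pillow and pytesseract (both assumed to be installed; the file name, coordinates, and threshold are illustrative):

```python
from PIL import Image
import pytesseract

receipt = Image.open("receipt.png")

# Cut out Region 6 (the receipt summary) by its bounding box.
summary = receipt.crop((40, 980, 560, 1120))  # (left, upper, right, lower)

# Binarize: convert to grayscale, then apply a simple threshold.
summary = summary.convert("L").point(lambda p: 255 if p > 160 else 0)

# OCR the region; analyzing the resulting text is not an image task anymore.
text = pytesseract.image_to_string(summary)
print(text)
```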

Nested Annotations

Now, let’s look at another way to annotate the same receipt.

Annotated Receipt

If your application needs to determine what your shopping habits are based on geography, you will need to extract detailed information about the store location. Thus, you will want to annotate the receipt as above to know which part is the street address, which is the city, etc. But those regions are all nested in Region 2 from our first annotation pass. It will be useful to have both types of annotations and use them for different use cases.

So, the requirements for the tool will be:

  • The ability to create nested regions, i.e., regions drawn inside other, larger regions.
  • The ability to keep both the outer and the nested annotations and use each independently, depending on the use case.

That is also very relevant in the next example, where we have areas with buildings but also want to annotate a single building.

Annotating Cityscapes

Annotating landscapes, cityscapes or other images with real objects is very similar to the receipt annotation. However, real objects rarely have regular shapes in pictures. Here is an example from a picture I took in Tokyo some time ago.

Annotated Cityscape

In this example, I have annotated only a few of the objects: two buildings (1 and 2), a crane (3), a soccer field (4), and a tree (5). The requirements for annotating landscapes are not too different from the requirements for annotating documents. There is just one more thing we need to add to the tool to support real-object tagging:

  • The ability to draw free-form (polygon) regions, because real objects rarely have regular shapes in pictures.

There are many use cases that you can develop for real-object recognition, and for that, versatile annotation capabilities will be important in any tool.

Additional Requirements for Annotations

All requirements that I have listed above are specific to the objects or areas in the pictures. However, we need to have the ability to add meta information to the whole picture. Well, you may think we already have a way to do that! We can use the EXIF data. The EXIF data is helpful, and it is automatically populated by the camera or the editing tool. However, it has limited capabilities for free-form meta-information because its fields are standardized.

For example, if you want to capture information about who annotated the image last and at what time, you cannot use the EXIF fields for that. You can repurpose some EXIF fields, but you will lose the original information. What we need is a simple way to create key-value metadata for the image. Of course, having the ability to see the EXIF information would be a helpful feature, although maybe not a high-priority one.
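
For illustration, such key-value metadata could be as simple as a dictionary stored alongside the image; the keys here are hypothetical:

```python
# Free-form metadata for the whole image, kept separate from EXIF.
image_metadata = {
    "last_annotated_by": "jane.doe",
    "last_annotated_at": "2019-07-21T14:03:00Z",
    "annotation_pass": 2,
    "notes": "second pass; nested store-details regions added",
}
```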

With all that, I believe we have enough requirements to start working on the tool design. If you are curious to follow the development or participate in it, you can head over to the Image Annotator Github project. The next thing we need to do is some design work. That includes UI design, back-end design, and the data model.

Recently, I started looking for an image annotation tool that we can use to annotate a few thousand images. There are quite a few that you can choose from, ranging from paid ones like Labelbox and Dataturks to free and open-source ones like coco-annotator, imglab, and labelimg. The paid ones also have a free tier that is limited but can give you a sense of their capabilities.

Each one of those tools can be used to create an image dataset, but without going into details about each one of them, here are the things we were looking for to consider an image annotation tool fit for our needs:

  • First, and foremost – usability. I like a nice UI, but it shouldn’t take too much time and too many clicks to get to the core functionality of the tool – i.e., annotating images. Navigating within the annotator, creating projects, and adding images should be smooth and easy. Users shouldn’t need to be tech-savvy to set up and start using the annotation tool.
  • Second is security. I wouldn’t put this one so high on the list if I didn’t need to deal with sensitive data. Unfortunately, even some of the commercially available annotation tools neglected basic security features like encryption. Annotating healthcare images like x-rays or financial documents on such platforms will be out of the question. Also, detailed privacy and data handling policies are crucial for handling sensitive images.
  • The third is scalability. The stand-alone annotation tools are limited to the resources of the machine they are running on, but the hosted ones lack detailed information about the limits they impose on the various plans. In addition to the number of projects and users, which are most often quoted in the commercial plans, the number of images per dataset or the total amount of storage available would be good to know.
  • Fourth is versatility. And, with versatility, I don’t mean only the ways one annotates images (polygons vs. rectangles for example) but also export formats like COCO, PASCAL VOC, YOLO, etc.; ability to choose different storage backends like files, AWS S3, Azure Storage or Dropbox and so on. Like everything else in technology, there is no single standard, and a good image annotation tool should satisfy the needs of different audiences.
  • The fifth is collaboration. An image annotator should allow many (if not thousands of) people to collaborate on the project. Datasets like PASCAL VOC or COCO consist of hundreds of thousands of images and millions of annotations – work that is beyond the scale of a single person or a small team. Enabling lightweight access for external collaborators is crucial to the success of such a tool.

OK! Let’s be a little bit more concrete about what requirements I have for an image annotation tool.

Image Annotation Tools Usability

Before we go into the usability requirements, let’s look at the typical steps a user must go through to start annotating images:

  1. Select the tool or the service
  2. Install the tool or sign up to the service
  3. Create a project or data set
  4. Choose the data storage location (if applicable)
  5. Determine users and user access (if applicable)
  6. Upload image
  7. Annotate image

To select the tool or the service, users need to be clear about what they are getting. Providing a good description of the features the annotation tool offers is crucial for successful selection. Some developers rely on trials, but I would rather save the time spent on registration and installation if I know upfront that the tool will not work for me.

If the tool is a stand-alone tool, its installation should not be a hassle. A lot of the open-source tools rely on the tech savviness of their users, which can be an obstacle. Besides, I am reluctant to install something on my machine if it may turn out not to be what I need. Trials and easy removal instructions are crucial.

One of the biggest usability issues I saw in the tools is the convoluted flow for creating projects and datasets. Either bugs or unclear flows resulted in a lot of frustration when trying out some of the tools. For me, it is simple – it should follow the ages-old concept of files and folders.

Being clear about where the data is stored and how to access it or get it out of there is important. For some of the tools I have tested, it took a while to understand where the data is; others used proprietary structures and (surprisingly) had no export capabilities.

Adding users and determining access is always a hassle (and not only in image annotation tools). Still, there should be a clear workflow for doing that, as well as for opening the dataset to the public if needed.

Although I may have an existing dataset, I may want to add new images to it – either one by one or in bulk. This should be one of the most prominent flows in the annotation tool. At any point in the UI, I should be able to easily upload a single file or multiple files.

For me, the annotation UI should mimic the UI of traditional image editing software like Adobe Photoshop: menus on the top, tools on the left, working area in the middle, and properties on the right. Well, it may be boring or not modern, but it is familiar and intuitive.

Securing Annotated Images

We deal with scanned financial documents that can contain highly sensitive information like names, addresses, sometimes credit card details, account numbers, or even social security numbers. Some of our customers would like to have a tool that allows them to annotate medical images like x-rays – those images can also contain personal information in their metadata (if, for example, the DICOM format is used).

Unless the annotation tool is a standalone tool that people can install on their local machines, using secure HTTPS is a no-brainer and the least you can do from a security point of view (surprisingly, some of the SaaS services fell short even here). However, security goes far beyond that. Things that should be added are:

  • Encrypting the storage where the annotated images are stored. Hosted or self-managed keys should be allowed.
  • Proper authentication mechanisms should be added. Multi-Factor-Authentication should be used for higher security.
  • Good Role Based Access Control (RBAC) should be implemented. For example some people should be able to just view the annotated images, while others to annotate and edit those.
  • Change logs should be kept as part of the application. For example, it will be important to know who created a certain annotation and whether it was correct or not.

Scalability for Image Annotators

A good dataset can contain hundreds of thousands of images – the COCO dataset for 2017 has 118K images in its training set. Depending on the quality of the images, the storage needed to store them can vary from tens of gigabytes to hundreds of gigabytes, to petabytes and more (at an average of 1 MB per image, the 118K COCO training images alone need over 100 GB). Having the ability to grow the storage is essential to the success of an image annotation tool.

On the user side, a dataset of a hundred thousand images may require hundreds of people to annotate it. Being able to support a large user base without a huge impact on the cost is also important (hence, user-based licensing may not be the best option for such a SaaS offering, because a single user may annotate only 2-3 images from the whole dataset).

The back-end or APIs handling the annotation of the images should also be able to scale to the number of images and users without problems.

Versatile Export Options for Annotated Images

Rarely is the image annotation tool tightly coupled with the machine learning system that will use the images. Also, the same annotations can be used by various teams using different systems to create machine learning models. A clear explanation of the format used to store the annotations is a must-have, but the ability to export the annotations in common formats will also be essential for the success and usefulness of the tool.

The word “export” here may mean different things. It doesn’t always need to mean downloading the images and annotations in the desired format; it can simply mean saving the annotations in that format.

I would start with defining a versatile format for storing the image annotations and then offer different “export” options, whether for download or just conversion in the original storage.

Collaborating While Annotating Images

Having a single person create an image dataset with hundreds of thousands of images is unrealistic. Such a task requires the collaboration of many people who can be spread around the world. Having the ability to not only give them access to annotate the images but also to comment and give suggestions to already existing annotations is a feature that should be high on the priority list.

Annotations, like software, are not free of bugs. Hence, the image annotation tool should allow for collaboration similar to what modern software development tools enable. This may not be a V1 feature but should certainly come soon after.

Now that I have a good idea of what we would like to have from an image annotation tool, it is time to think about how to implement one that incorporates the above-mentioned functionality. In the next post, I will look at what we would like to annotate and how to approach the data model for annotations.