Friendly face on parachute

How often does the following happen to you? You write your client code, call an API, and receive a 404 Not found response. You start investigating the issue in your code, change a line here or there, and spend hours troubleshooting, only to find out that the issue is on the server side and you can’t do anything about it. Well, welcome to the microservices world! A common mistake I often see developers make is returning an improper response code or passing through the response code from another service.

Let’s see how we can avoid this. But first, a crash course on modern applications implemented with microservices and HTTP status response codes.

How Do Modern Microservices Applications Work?

I will try to avoid going deep into the philosophical reasons why we need microservices and the benefits (or disadvantages) of using them. This is not the point of this post.

We will start with a simple picture.

Microservices Application

As you can see in the picture, we have a User that interacts with the Client Application that calls Microservice #1 to retrieve some information from the server (aka the cloud 🙂). The Client Application may need to call multiple (micro)services to retrieve all the information the User needs. Still, the part we will concentrate on is that Microservice #1 itself can call other services (Microservice #2 in this simple example) on the backend to perform its business logic. In a complex application (especially if not well architected), the chain of service calls may go well beyond two. But let’s stick with two for now. Also, let’s assume that Microservice #1 and Microservice #2 use REST, and their responses use the HTTP response status codes.

A basic call flow can be something like this. I also include the appropriate HTTP status response codes in each step.

  1. The User clicks on a button in the Client Application.
  2. The Client Application makes an HTTP request to Microservice #1.
  3. Microservice #1 needs additional business logic to complete the request and makes an HTTP call to Microservice #2.
  4. Microservice #2 performs the additional business logic and responds to Microservice #1 using a 200 OK response code.
  5. Microservice #1 completes the business logic and responds to the Client Application with a 200 OK response code.
  6. The Client Application performs the action that is attached to the button, and the User is happy.

This is the so-called happy path. Everybody expects the flow to be executed as described above. If everything goes as planned, we don’t need to think about it anymore and can move on to implementing the functionality behind the next button. Unfortunately, things often don’t go as planned.

What Can Go Wrong?

Many things! Or at a minimum, the following:

  1. The Client Application fails because of a bug before it even calls Microservice #1.
  2. The Client Application sends invalid input when calling Microservice #1.
  3. Microservice #1 fails before calling Microservice #2.
  4. Microservice #1 sends invalid input when calling Microservice #2.
  5. Microservice #2 fails while performing its business logic.
  6. Microservice #1 fails after calling Microservice #2.
  7. The Client Application fails after Microservice #1 responds.

For those cases (non-happy path? or maybe sad-path? 😉 ) the designers of the HTTP protocol wisely specified two separate sets of response codes:

  • Client error responses (the 4xx codes)
  • Server error responses (the 5xx codes)

The guidance for those is quite simple:

  • Client errors should be returned if the client did something wrong. In such cases, the client can change the parameters of the request and fix the issue. The important thing to remember is that the client can fix the issue without any changes on the server-side.
    A typical example is the famous 404 Not found error. If you (the user) mistype the URL path in the browser address bar, the browser (the client application) will request the wrong resource from the server (Microservice #1 in this case). The server (Microservice #1) will respond with a 404 Not found error to the browser (the client application), and the browser will show you an “Oops, we couldn’t find the page” message. Well, in the past the browser just showed you the 404 Not found error, but we learned a long time ago that this is not user-friendly (You see where I am going with this, right?).
  • Server errors should be returned if the issue occurred on the server-side and the client (and the user) cannot do anything to fix it.
    A simple example is a wrong connection string in the service configuration (Microservice #1 in our case). If the connection string used to configure Microservice #1 with the endpoint and credentials for Microservice #2 is wrong, the client application and the user cannot do anything to fix it. The most appropriate error to return in this case would be 500 Internal server error.

Pretty simple and logical, right? Though, one thing we as engineers often forget is who the client is and who the server is.

So, Who Is the Client and Who Is the Server?

First, the client and server are two system components that interact directly with each other (think, no intermediaries). If we take the picture from above and change the labels of the arrows, it becomes pretty obvious.

Microservices Application Clients and Servers

We have three clients and three servers:

  • The user is a client of the client application, and the client application is a server for the user.
  • The client application is a client of Microservice #1, and Microservice #1 is a server for the client application.
  • Microservice #1 is a client of Microservice #2, and Microservice #2 is a server for Microservice #1.

Having this picture in mind, the engineers implementing each one of the microservices should think about the most appropriate response code for their immediate client using the guidelines above. It is easier to explain with examples what response codes each service should return in different situations.

What HTTP Response Codes Should Microservices Return?

A few days ago, I was discussing the following situation with one of our engineers. Our service, Azure Container Registry (ACR), has a security feature allowing customers to encrypt their container images using customer-managed keys (CMK). For this feature to work, customers need to upload a key in Azure Key Vault (AKV). When the Docker client tries to pull an image, ACR retrieves the key from AKV, decrypts the image, and sends it back to the Docker client. (BTW, I know that ACR and AKV are not microservices 🙂 ) Here is a visual:

Docker pull encrypted image from ACR

In the happy-path scenario, everything works as expected. However, a customer submitted a support request complaining that he was not able to pull his images from ACR. When he tried to pull an image using the Docker client, he received a 404 Not found error, but when he checked the Azure Portal, he was able to see the image in the list.

Because the customer couldn’t figure it out by himself, he submitted a support request. The support engineer was also not able to figure out the issue and had to escalate to the product group. It turned out that the customer had deleted the Key Vault, and ACR was not able to retrieve the key to decrypt the image. However, the implemented flow looked like this:

  1. Docker client requests an image from ACR.
  2. ACR sees that the image is encrypted and requests the key from the Key Vault.
  3. The Azure Key Vault service looks up the key and figures out that the key (or the whole Key Vault) is missing.
  4. Azure Key Vault returns 404 Not found to ACR for the key ACR tries to access.
  5. ACR passes through the 404 Not found error to the Docker client.
  6. Docker client shows a message to the user that the image cannot be found.

The end result: everybody is confused! Why?

Where Does the Confusion Come From?

The investigation chain goes from left to right: Docker client –> ACR –> AKV. Both the customer and the support engineer concentrated on figuring out why the image was missing in ACR. They were looking only at the Docker client –> ACR part of the chain. The customer’s assumption was that the Docker client was doing something wrong, i.e. requesting the wrong image. This would be a reasonable assumption because 404 Not found is a client error telling the client that it is requesting something that doesn’t exist. Hence, the customer checked the portal and, when he saw the image in the list, he was puzzled. The next assumption was that something was wrong on the ACR side. This is where the customer decided to submit a support request for somebody to check whether the data in ACR was corrupted. The support engineer checked the ACR backend, and all the data was in sync.
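
When troubleshooting a case like this, it often helps to check the raw status code the server actually returns instead of relying on the client’s error message. Here is a minimal sketch with curl; the registry URL, image name, and bearer token are hypothetical placeholders:

# Print only the HTTP status code the registry returns for a manifest request.
# A 4xx code points at what the client is asking for; a 5xx code points at the server.
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Authorization: Bearer $TOKEN" \
  "https://myregistry.azurecr.io/v2/my-image/manifests/latest"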

This is a great example of how the wrong HTTP response code can send the whole investigation into a rabbit hole. To avoid that, here is the guidance! Microservices should return response codes that are relevant to the business logic they implement and that help the client take appropriate actions. “Well”, you will say: “Isn’t that the whole point of HTTP status response codes?” It is! But for whatever reason, we continue to break this rule. The key words in the above guidance are “the business logic they implement”, not the business logic of the services they call. (By the way, this is the same with exceptions. You don’t catch the generic Exception, you catch a SpecificException. You don’t pass exceptions through; you catch them and wrap them in a way that is useful for the calling code.)

Business Logic and Friendly HTTP Response Codes

Think about the business logic of each one of the services above!

One way to decide which HTTP response code to return is to think about the resource your microservice is handling. ACR is the service responsible for handling the container images. The business logic that ACR implements should provide status codes relevant to the “business” of images. Azure Key Vault implements business logic that handles keys, secrets, and certificates (not images). Key Vault should return status codes that are relevant to the keys, secrets, and certificates. Azure Key Vault is a downstream service and cannot know what the key is used for; hence, it cannot tell the upstream client (Docker) what the error is. It is the responsibility of ACR to provide the appropriate status code to the upstream client.

Here is how the flow in the above scenario should be implemented:

  1. Docker client requests an image from ACR.
  2. ACR sees that the image is encrypted and requests the key from the Key Vault.
  3. The Azure Key Vault service looks up the key and figures out that the key (or the whole Key Vault) is missing.
  4. Azure Key Vault returns 404 Not found to ACR for the key ACR tries to access.
  5. ACR handles the 404 Not found from Azure Key Vault but wraps it in an error that is relevant to the requested image.
  6. Instead of 404 Not found, ACR returns 500 Internal server error with a message clarifying the issue.
  7. Docker client shows a message to the user that it cannot pull the image because of an issue on the server.

The Q&A Approach

Another way that you can use to decide what response code to return is to take the Questions-and-Answers approach and build a simple IF-THEN logic (aka a decision tree). Here is how this can work for our example:

  • Docker: Pull image from ACR
    • ACR: Q: Is the image available?
      • A: Yes
        (Note to myself: Requesting the image cannot be a client error anymore.)

        • Q: Is the image encrypted?
          • A: Yes
            • ACR: Request the key from Key Vault
              • AKV: Q: Is the key available?
                • A: Yes
                  • AKV: Return the key to ACR
                • A: No
                  • AKV: Return 404 [key] Not found error
            • ACR: Q: Did I get a key?
              • A: Yes
                • ACR: Decrypt the image
                • ACR: Return 200 OK with the image payload
              • A: No (I got 404 [key] Not found)
                • ACR: I cannot decrypt the image
                  (Note to myself: There is nothing the client did wrong! It is all the server’s fault)
                • ACR: Return 500 Internal server error “I cannot decrypt the image”
          • A: No (image is not encrypted)
            • ACR: Return 200 OK with the image payload
      • A: No (image does not exist)
        • ACR: Return 404 [image] Not found error

Note that the above flow is simplified. For example, in a real implementation, you may need to check if the client is authenticated and authorized to pull the image. Nevertheless, the concept is the same – you will just need to have more Q&As.

Summary

As you can see, it is important to be careful about what HTTP response codes you return from your microservices. If you return the wrong code, you may end up with more work than you expect. Here are the main points that are worth remembering:

  • Return 4xx (client) errors only if the client can do something to fix the issue. If the client cannot do anything to fix it, 5xx (server) errors are the only appropriate ones.
  • Do not pass through the response codes you receive from the services you call (your downstream dependencies). Handle each of those responses and wrap it according to the business logic you are implementing.
  • When implementing your services, think about the resource you are handling in those services. Return HTTP status response codes that are relevant to the resource you are handling.
  • Use the Q&A approach to decide what is the appropriate response code to return for your service and the resource that is requested by the client.

By using those guidelines, your microservices will become more friendly and easier to troubleshoot.

Featured image by Nick Page on Unsplash

In the last few months, I started seeing more and more customers using Azure Container Registry (or ACR) for storing their Helm charts. However, many of them are confused about how to properly push and use the charts stored in ACR. So, in this post, I will document a few things that need the most clarifications. Let’s start with some definitions!

Helm 2 and Helm 3 – what are those?

Before we even start!

Helm 2 is NOT supported and you should not use it! Period! If you need more details just read Helm’s blog post Helm 2 and the Charts Project Are Now Unsupported from Fri, Nov 13, 2020.

A nice date to choose for that announcement 🙂

OK, really – what are Helm 2 and Helm 3? When somebody says Helm 2 or Helm 3, they most often mean the version of the Helm CLI (i.e., Command Line Interface). The easiest way to check what version of the Helm CLI you have is to type:

$ helm version

in your Terminal. If the version is v2.x.x, then you have Helm (CLI) 2; if the version is v3.x.x, then you have Helm (CLI) 3. But it is not that simple! You should also consider the API version. The API version is the version that you specify at the top of your chart (i.e. Chart.yaml) – you can read more about it in the Helm Charts documentation. Here is a table that can come in handy for you:

apiVersion     Helm 2 CLI Support     Helm 3 CLI Support
v1             Yes                    Yes
v2             No                     Yes

What this table tells you is that the Helm 2 CLI supports only apiVersion v1, while the Helm 3 CLI supports apiVersion v1 and v2. You should check the Helm Charts documentation linked above if you need more details about the differences, but the important thing to remember here is that the Helm 3 CLI supports old charts, and (once again) there is no reason for you to use Helm 2.
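
For reference, the apiVersion is the first line of the chart’s Chart.yaml. Here is a minimal, hypothetical example of a chart that requires the Helm 3 CLI (the name and versions are placeholders):

apiVersion: v2        # v2 charts require the Helm 3 CLI; v1 charts work with both CLIs
name: my-test-chart   # hypothetical chart name
description: A Helm chart for Kubernetes
version: 0.1.0        # chart version
appVersion: "1.16.0"  # version of the application the chart deploys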

We’ve cleared (I hope) the confusion around Helm 2 and Helm 3. Let’s see how ACR handles the Helm charts. I will walk you step by step through each one of those experiences.

ACR and Helm 2

Azure Container Registry allows you to store Helm charts using the Helm 2 way (What? I will explain in a bit). However:

Helm 2 is NOT supported and you should not use it!

Now that you have been warned (twice! or was it three times?), let’s see how ACR handles Helm 2 charts. To avoid any ambiguity, I will use Helm CLI v2.17.0 for this exercise. At the time of this writing, it is the last published version of Helm 2.

$ helm version
Client: &version.Version{SemVer:"v2.17.0", GitCommit:"a690bad98af45b015bd3da1a41f6218b1a451dbe", GitTreeState:"clean"}

Initializing Helm and the Repository List

If you have a brand new installation of the Helm 2 CLI, you should initialize Helm and add the ACR to your repository list. You start with:

$ helm init
$ helm repo list
NAME     URL
stable   https://charts.helm.sh/stable
local    http://127.0.0.1:8879/charts

to initialize Helm and see the list of available repositories. Then you can add your ACR to the list by typing:

$ helm repo add --username <acr_username> --password <acr_password> <repo_name> https://<acr_login_server>/helm/v1/repo

For me, this looked like this:

$ helm repo add --username <myacr_username> --password <myacr_password> acrrepo https://tsmacrtestwus2acrhelm.azurecr.io/helm/v1/repo

Here is something very important: you must use the /helm/v1/repo path! If you do not specify the path, you will see either a 404: Not found error (for example, if you use the root URL without a path) or a 403: Forbidden error (if you decide that you want to rename the repo part to something else).

I also need to make a side note here because the authentication can be a bit tricky. The following section applies to both Helm 2 and Helm 3.

Signing In To ACR Using the Helm (any version) CLI

Before you push and pull charts from ACR, you need to sign in. There are a few different options that you can use to sign in to ACR using the CLI:

  • Using ACR Admin user (not recommended)
    If you have the ACR Admin user enabled, you can use the Admin user’s username and password to sign in to ACR by simply specifying the --username and --password parameters for the Helm command.
  • Using a Service Principal (SP)
    If you need to push and pull charts using automation, you have most probably already set up a service principal for that. You can authenticate using the SP credentials by passing the app ID in the --username and the client secret in the --password parameters for the Helm command. Make sure you assign the appropriate role to the service principal to allow access to your registry (see the example command after this list).
  • Using your own (user) credentials
    This one is the tricky one, and it is described in the ACR docs in the az acr login with --expose-token section of the Authentication overview article. For this one, you must use the Azure CLI to obtain the token. Here are the steps:

    • Use the Azure CLI to sign in to Azure using your own credentials:
      $ az login

      This will pop up a browser window or give you a URL with a special code to use.

    • Next, sign in to ACR using the Azure CLI and add the --expose-token parameter:
      $ az acr login --name <acr_name_or_login_server> --expose-token

      This will sign you into ACR and will print an access token that you can use to sign in with other tools.

    • Last, you can sign in using the Helm CLI by passing a GUID-like string with zeros only (exactly this string: 00000000-0000-0000-0000-000000000000) in the --username parameter and the access token in the --password parameter. Here is what the command to add the Helm repository will look like:
      $ helm repo add --username "00000000-0000-0000-0000-000000000000" --password "eyJhbGciOiJSUzI1NiIs[...]24V7wA" <repo_name> https://<acr_login_server>/helm/v1/repo
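
For completeness, here is what the repo add command from above could look like when authenticating with a service principal; the app ID and client secret are placeholders for your own SP credentials:

$ helm repo add --username <sp_app_id> --password <sp_client_secret> <repo_name> https://<acr_login_server>/helm/v1/repo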

Creating and Packaging Charts with Helm 2 CLI

Helm 2 doesn’t have an out-of-the-box experience for pushing charts to a remote chart registry. You may assume that the helm-push plugin is the one that does that, but you would be wrong. This plugin only allows you to push charts to ChartMuseum (you can try to use it to push to any repo, but it will fail – a topic for another story). Helm’s guidance on how chart repositories should work is described in the documentation (… and this is the Helm 2 way that I mentioned above):

  • According to the Chart Repositories article in the Helm documentation, the repository is a simple web server that serves the index.yaml file that points to the chart TAR archives. The TAR archives can be served by the same web server or from other locations like Azure Storage.
  • In Store charts in your chart repository, they describe the process to generate the index.yaml file and how to upload the necessary artifacts to static storage to serve them.

Disclaimer: the term Helm 2 way is my own term based on my interpretation of how things work. It allows me to refer to the two different ways charts are stored. It is not an industry term, nor something that Helm refers to or uses.
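
To make the Helm 2 way a bit more concrete, here is a minimal sketch of how such a repository is produced before its contents are uploaded to static storage; the folder name and URL are placeholders:

# Package the chart into a local folder and generate the index.yaml that points to the archives.
$ helm package ./my-chart/ --destination ./chart-repo/
$ helm repo index ./chart-repo/ --url https://example.com/charts

The generated index.yaml and the .tgz archives are then served together by the web server or the static storage location.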

I have created a simple chart called helm-test-chart-v2 on my local machine to test the push. Here is the output from the commands:

$ helm create helm-test-chart-v2
Creating helm-test-chart-v2

$ ls -al ./helm-test-chart-v2/
total 28
drwxr-xr-x 4 azurevmuser azurevmuser 4096 Aug 16 16:44 .
drwxr-xr-x 6 azurevmuser azurevmuser 4096 Aug 17 16:29 ..
-rw-r--r-- 1 azurevmuser azurevmuser 342 Aug 16 16:44 .helmignore
-rw-r--r-- 1 azurevmuser azurevmuser 114 Aug 16 16:44 Chart.yaml
drwxr-xr-x 2 azurevmuser azurevmuser 4096 Aug 16 16:44 charts
drwxr-xr-x 3 azurevmuser azurevmuser 4096 Aug 16 16:44 templates
-rw-r--r-- 1 azurevmuser azurevmuser 1519 Aug 16 16:44 values.yaml

$ helm package ./helm-test-chart-v2/
Successfully packaged chart and saved it to: /home/azurevmuser/helm-test-chart-v2-0.1.0.tgz

$ ls -al
total 48
drwxr-xr-x 6 azurevmuser azurevmuser 4096 Aug 17 16:31 .
drwxr-xr-x 3 root root 4096 Aug 14 14:12 ..
-rw------- 1 azurevmuser azurevmuser 780 Aug 15 22:48 .bash_history
-rw-r--r-- 1 azurevmuser azurevmuser 220 Feb 25 2020 .bash_logout
-rw-r--r-- 1 azurevmuser azurevmuser 3771 Feb 25 2020 .bashrc
drwx------ 2 azurevmuser azurevmuser 4096 Aug 14 14:15 .cache
drwxr-xr-x 6 azurevmuser azurevmuser 4096 Aug 15 21:46 .helm
-rw-r--r-- 1 azurevmuser azurevmuser 807 Feb 25 2020 .profile
drwx------ 2 azurevmuser azurevmuser 4096 Aug 14 14:12 .ssh
-rw-r--r-- 1 azurevmuser azurevmuser 0 Aug 14 14:18 .sudo_as_admin_successful
-rw------- 1 azurevmuser azurevmuser 1559 Aug 14 14:26 .viminfo
drwxr-xr-x 4 azurevmuser azurevmuser 4096 Aug 16 16:44 helm-test-chart-v2
-rw-rw-r-- 1 azurevmuser azurevmuser 3269 Aug 17 16:31 helm-test-chart-v2-0.1.0.tgz

Because Helm 2 doesn’t have a push chart functionality, the implementation is left up to the vendors. ACR has provided a proprietary implementation (already deprecated, which is another reason not to use Helm 2) of the push chart functionality that is built into the ACR CLI.

Pushing and Pulling Charts from ACR Using Azure CLI (Helm 2)

Let’s take a look at how you can push Helm 2 charts to ACR using the ACR CLI. First, you need to sign in to Azure, and then to your ACR. Yes, this is correct; you need to use two different commands to sign in to the ACR. Here is how this looks for my ACR registry:

$ az login
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code AABBCCDDE to authenticate.
[
  {
    "cloudName": "AzureCloud",
    "homeTenantId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
    "id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
    "isDefault": true,
    "managedByTenants": [],
    "name": "ToddySM Sandbox",
    "state": "Enabled",
    "tenantId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
    "user": {
        "name": "toddysm_XXXXXXXX@outlook.com",
        "type": "user"
    }
  }
]
$ az acr login --name tsmacrtestwus2acrhelm.azurecr.io
The login server endpoint suffix '.azurecr.io' is automatically omitted.
You may want to use 'az acr login -n tsmacrtestwus2acrhelm --expose-token' to get an access token, which does not require Docker to be installed.
An error occurred: DOCKER_COMMAND_ERROR
Please verify if Docker client is installed and running.

Well, I do not have Docker running, but this is OK – you don’t need Docker installed to push the Helm chart. It may be confusing, though, because it leaves the impression that you may not be signed in to the ACR registry.

We will push the chart that we packaged already. Pushing it is done with the (deprecated) built-in command in ACR CLI. Here is the output:

$ az acr helm push --name tsmacrtestwus2acrhelm.azurecr.io helm-test-chart-v2-0.1.0.tgz
This command is implicitly deprecated because command group 'acr helm' is deprecated and will be removed in a future release. Use 'helm v3' instead.
The login server endpoint suffix '.azurecr.io' is automatically omitted.
{
  "saved": true
}

This seems to be successful, and I have a Helm chart pushed to ACR using the Helm 2 way (i.e. using the proprietary and deprecated ACR CLI implementation). The problem here is that it is hard to verify that the chart is pushed to the ACR registry. If you go to the portal, you will not see the repository that contains the chart. Here is a screenshot of my registry view in the Azure portal after I pushed the chart:

Azure Portal Not Listing Helm 2 Charts

As you can see, the Helm 2 chart repository doesn’t appear in the list of repositories in the Azure portal, and you will not be able to browse the charts in that repository using the Azure portal. However, if you use the Helm command to search for the chart, the result will include the ACR repository. Here is the output from the command in my environment:

$ helm search helm-test-chart-v2
NAME                           CHART VERSION        APP VERSION        DESCRIPTION
acrrepo/helm-test-chart-v2     0.1.0                1.0                A Helm chart for Kubernetes
local/helm-test-chart-v2       0.1.0                1.0                A Helm chart for Kubernetes

Summary of the ACR and Helm 2 Experience

To summarize the ACR and Helm 2 experience, here are the main takeaways:

  • First, you should not use the Helm 2 CLI and the proprietary ACR CLI implementation for working with Helm charts!
  • There is no push functionality for charts in the Helm 2 client, and each vendor implements its own CLI for pushing charts to remote repositories.
  • When you add an ACR repository using the Helm 2 CLI, you should use the following URL format: https://<acr_login_server>/helm/v1/repo
  • If you push a chart to ACR using the ACR CLI implementation, you will not see the chart in the Azure portal. The only way to verify that the chart is pushed to the ACR repository is to use the helm search command.

ACR and Helm 3

Once again, to avoid any ambiguity, I will use Helm CLI v3.6.2 for this exercise. Here is the complete version string:

PS C:> helm version
version.BuildInfo{Version:"v3.6.2", GitCommit:"ee407bdf364942bcb8e8c665f82e15aa28009b71", GitTreeState:"clean", GoVersion:"go1.16.5"}

Yes, I ran this one in a PowerShell terminal 🙂 And, of course, not in the root folder 😉 You can convert the commands to the corresponding Linux commands and prompts.

Let’s start with the basic thing!

Creating and Packaging Charts with Helm 3 CLI

There is absolutely no difference between the Helm 2 and Helm 3 experience for creating and packaging a chart. Here is the output:

PS C:> helm create helm-test-chart-v3
Creating helm-test-chart-v3

PS C:> ls .\helm-test-chart-v3\

    Directory: C:\Users\memladen\Documents\Development\Local\helm-test-chart-v3

Mode         LastWriteTime         Length         Name
----         -------------         ------         ----
d----    8/17/2021 9:42 PM                        charts
d----    8/17/2021 9:42 PM                        templates
-a---    8/17/2021 9:42 PM            349         .helmignore
-a---    8/17/2021 9:42 PM           1154         Chart.yaml
-a---    8/17/2021 9:42 PM           1885         values.yaml

PS C:> helm package .\helm-test-chart-v3\
Successfully packaged chart and saved it to: C:\Users\memladen\Documents\Development\Local\helm-test-chart-v3-0.1.0.tgz

PS C:> ls helm-test-*

    Directory: C:\Users\memladen\Documents\Development\Local

Mode         LastWriteTime         Length         Name
----         -------------         ------         ----
d----    8/17/2021 9:42 PM                        helm-test-chart-v3
-a---    8/17/2021 9:51 PM           3766         helm-test-chart-v3-0.1.0.tgz

From here on, though, things can get confusing! The reason is that you have two separate options to work with charts using Helm 3.

Using Helm 3 to Push and Pull Charts the Helm 2 Way

You can use Helm 3 to push the charts the same way you do that with Helm 2. First, you add the repo:

PS C:> helm repo add --username <myacr_username> --password <myacr_password> acrrepo https://tsmacrtestwus2acrhelm.azurecr.io/helm/v1/repo
"acrrepo" has been added to your repositories

PS C:> helm repo list
NAME         URL
microsoft    https://microsoft.github.io/charts/repo
acrrepo      https://tsmacrtestwus2acrhelm.azurecr.io/helm/v1/repo

Then, you can update the repositories and search for a chart:

PS C:> helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "microsoft" chart repository
...Successfully got an update from the "acrrepo" chart repository
Update Complete. ⎈Happy Helming!⎈

PS C:> helm search repo helm-test-chart
NAME                          CHART VERSION     APP VERSION     DESCRIPTION
acrrepo/helm-test-chart-v2    0.1.0             1.0             A Helm chart for Kubernetes

Ha, look at that! I can see the chart that I pushed using the ACR CLI in the ACR and Helm 2 section above – notice the chart name and the version. Also, notice that the Helm 3 search command has a slightly different syntax – it wants you to specify what you want to search (repo in our case).

I can use the ACR CLI to push the new chart that I just created using the Helm 3 CLI (after signing in to Azure):

PS C:> az acr helm push --name tsmacrtestwus2acrhelm.azurecr.io .\helm-test-chart-v3-0.1.0.tgz
This command is implicitly deprecated because command group 'acr helm' is deprecated and will be removed in a future release. Use 'helm v3' instead.
The login server endpoint suffix '.azurecr.io' is automatically omitted.
{
  "saved": true
}

By doing this, I have pushed the V3 chart to ACR and can pull it from there, but remember, this is the Helm 2 way, and the following are still true:

  • You will not see the chart in Azure Portal.
  • The only way to verify that the chart is pushed to the ACR repository is to use the helm search command.

Here is the result of the search command after updating the repositories:

PS C:> helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "microsoft" chart repository
...Successfully got an update from the "acrrepo" chart repository
Update Complete. ⎈Happy Helming!⎈

PS C:> helm search repo helm-test-chart
NAME                           CHART VERSION     APP VERSION     DESCRIPTION
acrrepo/helm-test-chart-v2     0.1.0             1.0             A Helm chart for Kubernetes
acrrepo/helm-test-chart-v3     0.1.0             1.16.0          A Helm chart for Kubernetes

You can see that both charts, the one created with Helm 2 and the one created with Helm 3, are available. This is understandable because I pushed both charts the same way – by using the az acr helm command. Remember, though – both charts are stored in ACR using the Helm 2 way.

Using Helm 3 to Push and Pull Charts the OCI Way

Before proceeding, I changed the version property in the Chart.yaml to 0.2.0 to be able to differentiate between the charts I pushed. This is the same chart that I created in the previous section Creating and Packaging Charts with Helm 3 CLI.

You may have noticed that Helm 3 has a new chart command. This command allows you to (from the help text) “push, pull, tag, list, or remove Helm charts”. The subcommands under the chart command are experimental, and you need to set the HELM_EXPERIMENTAL_OCI environment variable to be able to use them. Once you do that, you can save the chart to the local registry cache, with or without a registry FQDN (Fully Qualified Domain Name). Here are the commands:

PS C:> $env:HELM_EXPERIMENTAL_OCI=1

PS C:> helm chart save .\helm-test-chart-v3\ helm-test-chart-v3:0.2.0
ref:     helm-test-chart-v3:0.2.0
digest:  b6954fb0a696e1eb7de8ad95c59132157ebc061396230394523fed260293fb19
size:    3.7 KiB
name:    helm-test-chart-v3
version: 0.2.0
0.2.0: saved
PS C:> helm chart save .\helm-test-chart-v3\ tsmacrtestwus2acrhelm.azurecr.io/charts/helm-test-chart-v3:0.2.0
ref:     tsmacrtestwus2acrhelm.azurecr.io/charts/helm-test-chart-v3:0.2.0
digest:  ff5e9aea6d63d7be4bb53eb8fffacf12550a4bb687213a2edb07a21f6938d16e
size:    3.7 KiB
name:    helm-test-chart-v3
version: 0.2.0
0.2.0: saved

If you list the charts using the new chart command, you will see the following:

PS C:> helm chart list
REF                                                                  NAME                   VERSION     DIGEST     SIZE     CREATED
helm-test-chart-v3:0.2.0                                             helm-test-chart-v3     0.2.0       b6954fb    3.7 KiB  About a minute
tsmacrtestwus2acrhelm.azurecr.io/charts/helm-test-chart-v3:0.2.0     helm-test-chart-v3     0.2.0       ff5e9ae    3.7 KiB  About a minute

A few things to note here:

  • Both charts are saved in the local registry cache. Nothing is pushed yet to a remote registry.
  • You see only charts that are saved the OCI way. The charts saved the Helm 2 way are not listed using the helm chart list command.
  • The REF (or reference) for a chart can be truncated and you may not be able to see the full reference.

Let’s do one more thing! Let’s save the same chart with the ACR FQDN as above but under a different repository. Here are the commands to save and list the charts:

PS C:> helm chart save .\helm-test-chart-v3\ tsmacrtestwus2acrhelm.azurecr.io/my-helm-charts/helm-test-chart-v3:0.2.0
ref: tsmacrtestwus2acrhelm.azurecr.io/my-helm-charts/helm-test-chart-v3:0.2.0
digest: daf106a05ad2fe075851a3ab80f037020565c75c5be06936179b882af1858e6a
size: 3.7 KiB
name: helm-test-chart-v3
version: 0.2.0
0.2.0: saved

PS C:> helm chart list
REF                                                                  NAME                   VERSION     DIGEST     SIZE     CREATED
helm-test-chart-v3:0.2.0                                             helm-test-chart-v3     0.2.0       b6954fb    3.7 KiB  11 minutes
tsmacrtestwus2acrhelm.azurecr.io/charts/helm-test-chart-v3:0.2.0     helm-test-chart-v3     0.2.0       ff5e9ae    3.7 KiB  11 minutes
tsmacrtestwus2acrhelm.azurecr.io/my-helm-charts/helm-test-chart...   helm-test-chart-v3     0.2.0       daf106a    3.7 KiB  About a minute

After doing this, we have three charts in the local registry:

  • helm-test-chart-v3:0.2.0 that is available only locally.
  • tsmacrtestwus2acrhelm.azurecr.io/charts/helm-test-chart-v3:0.2.0 that can be pushed to the remote ACR registry tsmacrtestwus2acrhelm.azurecr.io and saved in the charts repository.
  • and tsmacrtestwus2acrhelm.azurecr.io/my-helm-charts/helm-test-chart-v3:0.2.0 that can be pushed to the remote ACR registry tsmacrtestwus2acrhelm.azurecr.io and saved in the my-helm-charts repository.

Before we can push the charts to the ACR registry, we need to sign in using the following command:

PS C:> helm registry login tsmacrtestwus2acrhelm.azurecr.io --username <myacr_username> --password <myacr_password>

You can use any of the sign-in methods described in the Signing In To ACR Using the Helm (any version) CLI section above. And make sure you use your own ACR registry login server.
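
For example, if you prefer the token-based sign-in from the earlier section, the command could look like this; the all-zeros GUID is the username ACR expects for token sign-in, and the token is the one printed by az acr login --expose-token:

PS C:> helm registry login tsmacrtestwus2acrhelm.azurecr.io --username "00000000-0000-0000-0000-000000000000" --password "eyJhbGciOiJSUzI1NiIs[...]24V7wA"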

If we push the two charts that have the ACR FQDN, we will see them appear in the Azure portal UI. Here are the commands:

PS C:> helm chart push tsmacrtestwus2acrhelm.azurecr.io/charts/helm-test-chart-v3:0.2.0
The push refers to repository [tsmacrtestwus2acrhelm.azurecr.io/charts/helm-test-chart-v3]
ref: tsmacrtestwus2acrhelm.azurecr.io/charts/helm-test-chart-v3:0.2.0
digest: daf106a05ad2fe075851a3ab80f037020565c75c5be06936179b882af1858e6a
size: 3.7 KiB
name: helm-test-chart-v3
version: 0.2.0
0.2.0: pushed to remote (1 layer, 3.7 KiB total)

PS C:> helm chart push tsmacrtestwus2acrhelm.azurecr.io/my-helm-charts/helm-test-chart-v3:0.2.0
The push refers to repository [tsmacrtestwus2acrhelm.azurecr.io/my-helm-charts/helm-test-chart-v3]
ref: tsmacrtestwus2acrhelm.azurecr.io/my-helm-charts/helm-test-chart-v3:0.2.0
digest: daf106a05ad2fe075851a3ab80f037020565c75c5be06936179b882af1858e6a
size: 3.7 KiB
name: helm-test-chart-v3
version: 0.2.0
0.2.0: pushed to remote (1 layer, 3.7 KiB total)

And here is the result:

An important thing to note here is that:

  • Helm charts saved to ACR using the OCI way will appear in the Azure portal.

The approach here is a bit different from the Helm 2 way. You don’t need to package the chart into a TAR – saving the chart to the local registry is enough.
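
The pull side of the OCI way is not shown above. As a rough sketch using the same experimental chart subcommands and the same chart reference, you first pull the chart into the local registry cache and then export it to a folder if you need the chart files:

PS C:> helm chart pull tsmacrtestwus2acrhelm.azurecr.io/charts/helm-test-chart-v3:0.2.0
PS C:> helm chart export tsmacrtestwus2acrhelm.azurecr.io/charts/helm-test-chart-v3:0.2.0 --destination .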

We need to do one last thing, and then we are ready to summarize the experience. Let’s use the helm search command to find our charts (of course using Helm 3). Here is the result of the search:

PS C:> helm search repo helm-test-chart 
NAME                           CHART VERSION     APP VERSION     DESCRIPTION 
acrrepo/helm-test-chart-v2     0.1.0             1.0             A Helm chart for Kubernetes 
acrrepo/helm-test-chart-v3     0.1.0             1.16.0          A Helm chart for Kubernetes

It yields the same result as the one we saw in Using Helm 3 to Push and Pull Charts the Helm 2 Way. The reason is that the helm search command doesn’t work for charts stored the OCI way. This is a limitation that the Helm team is working on fixing; it is documented in the Support for OCI registries in helm search #9983 issue on GitHub.

Summary of the ACR and Helm 3 Experience

To summarize the ACR and Helm 3 experience, here are the main takeaways:

  • First, you can use the Helm 3 CLI in conjunction with the az acr helm command to push and pull charts the Helm 2 way. Those charts will not appear in the Azure portal.
  • You can also use the Helm 3 CLI to (natively) push charts to ACR the OCI way. Those charts will appear in the Azure portal.
  • OCI features are experimental in the Helm 3 client and certain functionalities like helm search and helm repo do not work for charts saved and pushed the OCI way.

Conclusion

To wrap it up, when working with Helm charts and ACR (as well as other OCI compliant registries), you need to be careful which commands you use. As a general rule, always use the Helm 3 CLI and make a conscious decision whether you want to store the charts as OCI artifacts (the OCI way) or using the legacy Helm approach (the Helm 2 way). This should be a transition period and hopefully, at some point in the future, Helm will improve the support for OCI compliant charts and support the same scenarios that are currently enabled for legacy chart repositories.

Here is a summary table that gives a quick overview of what we described in this post.

Functionality                     Helm 2 CLI (legacy)    Helm 3 (legacy)        Helm 3 (OCI)
helm repo add                     Yes                    Yes                    No
helm search                       Yes                    Yes                    No
helm chart push                   No                     No                     Yes
helm chart list                   No                     No                     Yes
az acr helm push                  Yes                    Yes                    No
Chart appears in Azure portal     No                     No                     Yes
Example chart                     helm-test-chart-v2     helm-test-chart-v3     helm-test-chart-v3
Example chart version             0.1.0                  0.1.0                  0.2.0

In the last two posts, Configuring a hierarchy of IoT Edge Devices at Home Part 1 – Configuring the IT Proxy and Configuring a hierarchy of IoT Edge Devices at Home Part 2 – Configuring the Enterprise Network (IT), we set up the proxy and the top layer of the hierarchical IoT Edge network. This post will describe how to set up and configure the first nested layer in the Purdue network architecture for manufacturing – the Business Planning and Logistics (IT) layer. So, let’s get going!

A lot of the steps to configure the IoT Edge runtime on the device are repeated from the previous post. Nevertheless, let’s go over them again.

Configuring the Network for Layer 4 Device

The tricky part with the Layer 4 device is that it should have no Internet access. The problem with that is that you will need Internet access to install the IoT Edge runtime. In a typical scenario, the Layer 4 device will come with the IoT Edge runtime pre-installed, and you will only need to plug in the device and configure it to talk to its parent. For our scenario, though, we need to have the device Internet-enabled while installing IoT Edge and lock it down afterwards. Here is the initial dhcpcd.conf configuration for the Layer 4 device:

interface eth0
static ip_address=10.16.6.4/16
static routers=10.16.8.4
static domain_name_servers=1.1.1.1

The only difference in the network configuration is the IP address of the device. This configuration will allow us to download the necessary software to the device before we restrict its access.

While speaking of necessary software, let’s install netfilter-persistent and iptables-persistent and have them ready for locking down the device networking later on:

sudo DEBIAN_FRONTEND=noninteractive apt install -y netfilter-persistent iptables-persistent

Create the Azure Cloud Resources

We already have created the resource group and the IoT Hub resource in Azure. The only new resource that we need to create is the IoT Edge device for Layer 4. One difference here is that, unlike the Layer 5 device, the Layer 4 device will have a parent, and this parent will be the Layer 5 device. Use the following commands to create the Layer 4 device:

az iot hub device-identity create --device-id L4-edge-pi --edge-enabled --hub-name tsm-ioth-eus-test
az iot hub device-identity parent set --device-id L4-edge-pi --parent-device-id L5-edge-pi --hub-name tsm-ioth-eus-test
az iot hub device-identity connection-string show --device-id L4-edge-pi --hub-name tsm-ioth-eus-test

The first command creates the IoT Edge device for Layer 4. The second command sets the Layer 5 device as the parent for the Layer 4 device. And, the third command returns the connection string for the Layer 4 device that we can use for the IoT Edge runtime configuration.

Installing Azure IoT Edge Runtime on the Layer 4 Device

Surprisingly to me, in the time between posting my last post and starting this one, the Tutorial to Create a Hierarchy of IoT Edge Devices changed. I will still use the steps from my previous post to provide consistency.

Creating Certificates

We will start again with the creation of certificates – this time for the Layer 4 device. We already have the root and the intermediate certificates for the chain. The only certificate that we need to create is the Layer 4 device certificate. We will use the following command for that:

./certGen.sh create_edge_device_ca_certificate "l4_certificate"

You should see the new certificates in the <WORKDIR>/certs folder:

drwxrwxrwx 1 toddysm toddysm 4096 Apr 16 10:46 .
drwxrwxrwx 1 toddysm toddysm 4096 Apr 16 10:46 ..
...
-rwxrwxrwx 1 toddysm toddysm 5891 Apr 16 10:46 iot-edge-device-ca-l4_certificate-full-chain.cert.pem
-r-xr-xr-x 1 toddysm toddysm 1931 Apr 16 10:46 iot-edge-device-ca-l4_certificate.cert.pem
-rwxrwxrwx 1 toddysm toddysm 7240 Apr 16 10:46 iot-edge-device-ca-l4_certificate.cert.pfx
...

The private keys are in the <WORKDIR>/private folder together with the root and intermediate keys:

drwxrwxrwx 1 toddysm toddysm 4096 Apr 16 10:46 .
drwxrwxrwx 1 toddysm toddysm 4096 Apr 16 10:46 ..
-r-xr-xr-x 1 toddysm toddysm 3326 Apr  8 14:54 azure-iot-test-only.intermediate.key.pem
-r-xr-xr-x 1 toddysm toddysm 3326 Apr  8 14:54 azure-iot-test-only.root.ca.key.pem
-r-xr-xr-x 1 toddysm toddysm 3243 Apr 16 10:46 iot-edge-device-ca-l4_certificate.key.pem
...

As before, we need to upload the relevant certificates to the Layer 4 device. From the <WORKDIR> folder on the workstation use the following commands:

scp ./certs/azure-iot-test-only.root.ca.cert.pem pi@pi-l4:.
scp ./certs/iot-edge-device-ca-l4_certificate-full-chain.cert.pem pi@pi-l4:.
scp ./private/iot-edge-device-ca-l4_certificate.key.pem pi@pi-l4:.

Connect to the Layer 4 device with ssh pi@pi-l4 and install the root CA using:

sudo cp ~/azure-iot-test-only.root.ca.cert.pem /usr/local/share/ca-certificates/azure-iot-test-only.root.ca.cert.pem.crt
sudo update-ca-certificates

Verify that the cert is installed with:

ls /etc/ssl/certs/ | grep azure

We will also move the device certificate chain and the private key to the /var/secrets folder:

sudo mkdir /var/secrets
sudo mkdir /var/secrets/aziot
sudo mv iot-edge-device-ca-l4_certificate-full-chain.cert.pem /var/secrets/aziot/
sudo mv iot-edge-device-ca-l4_certificate.key.pem /var/secrets/aziot/

We are all set to install and configure the Azure IoT Edge runtime.

Configuring the IoT Edge Runtime

As mentioned before, the steps are described in Install or uninstall Azure IoT Edge for Linux, and here is the configuration that we need to use in the /etc/aziot/config.toml file:

hostname = "10.16.6.4"

parent_hostname = "10.16.7.4"

trust_bundle_cert = "file:///etc/ssl/certs/azure-iot-test-only.root.ca.cert.pem.pem"

[provisioning]
source = "manual"
connection_string = "HostName=tsm-ioth-eus-test.azure-devices.net;DeviceId=L4-edge-pi;SharedAccessKey=s0m#VeRYcRYpT1c$tR1nG"

[agent]
name = "edgeAgent"
type = "docker"
[agent.config]
image = "10.16.7.4:<port>/azureiotedge-agent:1.2"

[edge_ca]
cert = "file:///var/secrets/aziot/iot-edge-device-ca-l4_certificate-full-chain.cert.pem"
pk = "file:///var/secrets/aziot/iot-edge-device-ca-l4_certificate.key.pem"

Note two differences in this configuration compared to the configuration for the Layer 5 device:

  1. There is a parent_hostname configuration that has the Layer 5 device’s IP address as a value.
  2. The agent image is pulled from the parent registry 10.16.7.4:<port>/azureiotedge-agent:1.2 instead of from mcr.microsoft.com (don’t forget to remove/change the <port> in the configuration depending on how you set up the parent registry).

Also, if the registry running on your Layer 5 device requires authentication, you will need to provide credentials in the [agent.config.auth] section.
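
Here is a rough sketch of what that section could look like for a registry module running on the Layer 5 device; the values are placeholders, and you should verify the exact field names against the config.toml template in the IoT Edge documentation:

[agent.config.auth]
serveraddress = "10.16.7.4:<port>"       # the parent registry endpoint (placeholder port)
username = "<registry_username>"         # placeholder credentials for the registry module
password = "<registry_password>"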

Restricting the Network for the Layer 4 Device

Now that we have the IoT Edge runtime configured, we need to restrict the traffic to and from the Layer 4 device. Here is what we are going to do:

  • We will allow SSH connectivity only from the IP addresses in the Demo/Support network, i.e. where our workstation is. However, we will use a stricter CIDR to allow only 10.16.9.x addresses. Here are the commands that will make that configuration:
    sudo iptables -A INPUT -i eth0 -p tcp --dport 22 -s 10.16.9.0/24 -m state --state NEW,ESTABLISHED -j ACCEPT
    sudo iptables -A OUTPUT -p tcp --sport 22 -d 10.16.9.0/24 -m state --state ESTABLISHED -j ACCEPT
  • We will enable connectivity to the Layer 5 network only (i.e. 10.16.7.x addresses) on the following ports: 8080 (or whatever the registry port is), 8883, 443, and 5671. Here are the commands that will make that configuration:
    sudo iptables -A OUTPUT -p tcp --dport <registry-port> -d 10.16.7.0/24 -m state --state NEW,ESTABLISHED -j ACCEPT
    sudo iptables -A INPUT -i eth0 -p tcp --sport <registry-port> -s 10.16.7.0/24 -m state --state ESTABLISHED -j ACCEPT
    sudo iptables -A OUTPUT -p tcp --dport 8883 -d 10.16.7.0/24 -m state --state NEW,ESTABLISHED -j ACCEPT
    sudo iptables -A INPUT -i eth0 -p tcp --sport 8883 -s 10.16.7.0/24 -m state --state ESTABLISHED -j ACCEPT
    sudo iptables -A OUTPUT -p tcp --dport 5671 -d 10.16.7.0/24 -m state --state NEW,ESTABLISHED -j ACCEPT
    sudo iptables -A INPUT -i eth0 -p tcp --sport 5671 -s 10.16.7.0/24 -m state --state ESTABLISHED -j ACCEPT
    sudo iptables -A OUTPUT -p tcp --dport 443 -d 10.16.7.0/24 -m state --state NEW,ESTABLISHED -j ACCEPT
    sudo iptables -A INPUT -i eth0 -p tcp --sport 443 -s 10.16.7.0/24 -m state --state ESTABLISHED -j ACCEPT
    
  • We will also enable connectivity from the OT Proxy network only (i.e. 10.16.5.x addresses) on the following ports: 8080 (or whatever the registry port is), 8883, 443, and 5671. Here are the commands that will make that configuration:
    sudo iptables -A OUTPUT -p tcp --dport <registry-port> -d 10.16.5.0/24 -m state --state ESTABLISHED -j ACCEPT
    sudo iptables -A INPUT -i eth0 -p tcp --sport <registry-port> -s 10.16.5.0/24 -m state --state NEW,ESTABLISHED -j ACCEPT
    sudo iptables -A OUTPUT -p tcp --dport 8883 -d 10.16.5.0/24 -m state --state ESTABLISHED -j ACCEPT
    sudo iptables -A INPUT -i eth0 -p tcp --sport 8883 -s 10.16.5.0/24 -m state --state NEW,ESTABLISHED -j ACCEPT
    sudo iptables -A OUTPUT -p tcp --dport 5671 -d 10.16.5.0/24 -m state --state ESTABLISHED -j ACCEPT
    sudo iptables -A INPUT -i eth0 -p tcp --sport 5671 -s 10.16.5.0/24 -m state --state NEW,ESTABLISHED -j ACCEPT
    sudo iptables -A OUTPUT -p tcp --dport 443 -d 10.16.5.0/24 -m state --state ESTABLISHED -j ACCEPT
    sudo iptables -A INPUT -i eth0 -p tcp --sport 443 -s 10.16.5.0/24 -m state --state NEW,ESTABLISHED -j ACCEPT
    
  • Last, we need to disable any other traffic to and from the Layer 4 device. Here are the commands for that:
    sudo iptables -P INPUT DROP
    sudo iptables -P OUTPUT DROP
    sudo iptables -P FORWARD DROP
    

Don’t forget to save the configuration and restart the device with:

sudo netfilter-persistent save
sudo systemctl reboot

Deploying Modules on the Layer 4 Device

Similar to the deployment of modules on the Layer 5 device, we need to deploy the standard $edgeAgent and $edgeHub modules as well as a registry module. For this layer, I experimented with the API proxy module; hence, the configurations also include the $IoTEdgeApiProxy module. There is also an API Proxy configuration that you need to set on the $IoTEdgeApiProxy module’s twin – for details, take a look at Configure the API proxy module for your gateway hierarchy scenario. Here are the Gists that you can use for deploying the modules on the Layer 4 device:

By using the API proxy, you can also leverage the certificates deployed on the device to connect to the registry over HTTPS. Thus, you don’t need to configure Docker on the clients to connect to an insecure registry.

Testing the Registry Module

To test the registry module, use the following commands:

docker login -u <sync_token_name> -p <sync_token_password> 10.16.6.4:<port_if_any>
docker pull 10.16.6.4:<port_if_any>/azureiotedge-simulated-temperature-sensor:1.0

Now, we have set up the first nested layer (Layer 4) for our nested IoT Edge infrastructure. In the next post, I will describe the steps to set up the OT Proxy and the second nested layer for IoT Edge (Layer 3).

Image by ejaugsburg from Pixabay

In my last post, I started explaining how to configure the IoT Edge device hierarchy’s IT Proxy. This post will go one layer down and set up the Layer 5 device from the Purdue model for manufacturing networks.

Reconfiguring The Network

While implementing network segregation in the cloud is relatively easy, implementing it with a limited number of devices and a consumer-grade network switch requires a bit more design. In Azure, the routing between subnets within a VNet is automatically configured using the subnet gateways. In my case, though, the biggest challenge was connecting each Raspberry Pi device to two separate subnets using the available interfaces – one WiFi and one Ethernet. In essence, I couldn’t use the WiFi interface because then I would be limited by my Eero’s (lack of) capabilities (well, I have an idea how to make this work, but that will be a topic of a future post :)). Thus, my only option was to put all devices in the same subnet and play with the firewall on each device to restrict the traffic. Here is how the picture looks.

To be able to connect to each individual device from my laptop (i.e. playing the role of the jumpbox from the Azure IoT Edge for Industrial IoT sample), I had to configure a second network interface on it and give it an IP address from the 10.16.0.0/16 network (the Workstation in the picture above). There are multiple ways to do that, with the easiest one being to buy a USB networking dongle and connect it to the switch with the rest of the devices. One more thing that will help speed up the work is to edit the /etc/hosts file on the laptop and add DNS names for each of the devices:

10.16.8.4       pi-itproxy
10.16.7.4       pi-l5
10.16.6.4       pi-l4
10.16.5.4       pi-otproxy
10.16.4.4       pi-l3
10.16.3.4       pi-opcua

The next thing I had to do was to go back to the IT Proxy and change the subnet mask to enable a broader addressing range. Connect to the IT Proxy device that we configured in Configuring a hierarchy of IoT Edge Devices at Home Part 1 – Configuring the IT Proxy using ssh pi@10.16.8.4. Then edit the DHCP configuration file with sudo vi /etc/dhcpcd.conf and change the subnet mask from /24 to /16:

interface eth0
static ip_address=10.16.8.4/16

Now, the IT Proxy is configured to address the broader 10.16.0.0/16 network. To restrict the communication between the devices, we will configure each individual device’s firewall.

Now, getting back to the Layer 5 configuration. Normally, the Layer 5 device is configured to have access to the Internet via the IT Proxy as a gateway. Edit the DHCP configuration file with sudo vi /etc/dhcpcd.conf and add the following at the end:

interface eth0
static ip_address=10.16.7.4/16
static routers=10.16.8.4
static domain_name_servers=1.1.1.1

Note that I added Cloudflare’s DNS server to the list of DNS servers. The reason for that is that the proxy device will not do DNS resolution. You can also configure it with your home network’s DNS server. This should be enough for now, and we can start installing the IoT Edge runtime on the Layer 5 device.

Testing from my laptop, I am able to connect to both devices using their 10.16.0.0/16 network IP addresses:

ssh pi@pi-itproxy

connects me to the IT Proxy, and:

ssh pi@pi-l5

connects me to the Layer 5 IoT Edge device.

Create the Azure Cloud Resources

Before we start installing the Azure IoT Edge runtime on the device, we need to create an Azure IoT Hub and register the L5 device with it. This is well described in Microsoft’s documentation explaining how to deploy code to a Linux device. Here are the Azure CLI commands that will create the resource group and the IoT Hub resources:

az group create --name tsm-rg-eus-test --location eastus
az iot hub create --resource-group tsm-rg-eus-test --name tsm-ioth-eus-test --sku S1 --partition-count 4

Next is to register the IoT Edge device and get the connection string. Here are the commands:

az iot hub device-identity create --device-id L5-edge-pi --edge-enabled --hub-name tsm-ioth-eus-test
az iot hub device-identity connection-string show --device-id L5-edge-pi --hub-name tsm-ioth-eus-test

The last command will return the connection string that we should use to connect the new device to Azure IoT Hub. It has the following format:

{
  "connectionString": "HostName=tsm-ioth-eus-test.azure-devices.net;DeviceId=L5-edge-pi;SharedAccessKey=s0m#VeRYcRYpT1c$tR1nG"
}

Save the connection string because we will need it in the next section to configure the Layer 5 device.

Installing Azure IoT Edge Runtime on the Layer 5 Device

Installing the Azure IoT Edge is described in the Tutorial: Create a hierarchy of IoT Edge devices article. The tutorial describes how to build a hierarchy with two devices only. The important part of the nested configuration is to generate the certificates and transfer them to the devices. So, let’s go over this step by step for the Layer 5 device.

Create Certificates

The first thing we need to do on the workstation, after cloning the IoT Edge GitHub repository with

git clone https://github.com/Azure/iotedge.git

, is to generate the root and the intermediate certificates (check the /tools/CACertificates folder):

./certGen.sh create_root_and_intermediate

Those certificates will be used to generate the individual devices’ certificates. For now, we will only create a certificate for the Layer 5 device.

./certGen.sh create_edge_device_ca_certificate "l5_certificate"

After those two commands, you should have the following in your <WORKDIR>/certs folder:

drwxrwxrwx 1 toddysm toddysm 4096 Apr  8 15:04 .
drwxrwxrwx 1 toddysm toddysm 4096 Apr  8 15:04 ..
-rwxrwxrwx 1 toddysm toddysm 3960 Apr  8 14:54 azure-iot-test-only.intermediate-full-chain.cert.pem
-r-xr-xr-x 1 toddysm toddysm 1976 Apr  8 14:54 azure-iot-test-only.intermediate.cert.pem
-rwxrwxrwx 1 toddysm toddysm 5806 Apr  8 14:54 azure-iot-test-only.intermediate.cert.pfx
-r-xr-xr-x 1 toddysm toddysm 1984 Apr  8 14:54 azure-iot-test-only.root.ca.cert.pem
-rwxrwxrwx 1 toddysm toddysm 5891 Apr  8 15:04 iot-edge-device-ca-l5_certificate-full-chain.cert.pem
-r-xr-xr-x 1 toddysm toddysm 1931 Apr  8 15:04 iot-edge-device-ca-l5_certificate.cert.pem
-rwxrwxrwx 1 toddysm toddysm 7240 Apr  8 15:04 iot-edge-device-ca-l5_certificate.cert.pfx

The content of the <WORKDIR>/private folder should be the following:

drwxrwxrwx 1 toddysm toddysm 4096 Apr  8 15:04 .
drwxrwxrwx 1 toddysm toddysm 4096 Apr  8 15:04 ..
-r-xr-xr-x 1 toddysm toddysm 3326 Apr  8 14:54 azure-iot-test-only.intermediate.key.pem
-r-xr-xr-x 1 toddysm toddysm 3326 Apr  8 14:54 azure-iot-test-only.root.ca.key.pem
-r-xr-xr-x 1 toddysm toddysm 3243 Apr  8 15:04 iot-edge-device-ca-l5_certificate.key.pem

We need to upload the relevant certificates to the Layer 5 device. From the <WORKDIR> folder on the workstation, issue the following commands:

scp ./certs/azure-iot-test-only.root.ca.cert.pem pi@pi-l5:.
scp ./certs/iot-edge-device-ca-l5_certificate-full-chain.cert.pem pi@pi-l5:.

The above two commands upload the root CA certificate and the device certificate chain to the Layer 5 device. The following command uploads the device’s private key:

scp ./private/iot-edge-device-ca-l5_certificate.key.pem pi@pi-l5:.

Connect to the Layer 5 device with ssh pi@pi-l5 and install the root CA using:

sudo cp ~/azure-iot-test-only.root.ca.cert.pem /usr/local/share/ca-certificates/azure-iot-test-only.root.ca.cert.pem.crt
sudo update-ca-certificates

The response from this command should be:

Updating certificates in /etc/ssl/certs...
1 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.

To verify that the cert is installed, you can use:

ls /etc/ssl/certs/ | grep azure

We should also move the device certificate chain and the private key to the /var/secrets folder:

sudo mkdir /var/secrets
sudo mkdir /var/secrets/aziot
sudo mv iot-edge-device-ca-l5_certificate-full-chain.cert.pem /var/secrets/aziot/
sudo mv iot-edge-device-ca-l5_certificate.key.pem /var/secrets/aziot/

Configuring the IoT Edge Runtime

Installation and configuration of the IoT Edge runtime is described in the official Microsoft documentation (see Install or uninstall Azure IoT Edge for Linux), but here are the values you should set for the Layer 5 device when editing the /etc/aziot/config.toml file:

hostname = "10.16.7.4"

trust_bundle_cert = "file:///etc/ssl/certs/azure-iot-test-only.root.ca.cert.pem.pem"

[provisioning]
source = "manual"
connection_string = "HostName=tsm-ioth-eus-test.azure-devices.net;DeviceId=L5-edge-pi;SharedAccessKey=s0m#VeRYcRYpT1c$tR1nG"

[agent.config]
image = "mcr.microsoft.com/azureiotedge-agent:1.2.0-rc4"
[edge_ca]
cert = "file:///var/secrets/aziot/iot-edge-device-ca-l5_certificate-full-chain.cert.pem"
pk = "file:///var/secrets/aziot/iot-edge-device-ca-l5_certificate.key.pem"

Apply the IoT Edge configuration with sudo iotedge config apply and check it with sudo iotedge check. At this point, the IoT Edge runtime should be running on the Layer 5 device. You can check the status with sudo iotedge system status.

Deploying Modules on the Layer 5 Device

The last thing we need to do is to deploy the modules that will support the lower-layer device. In addition to the standard IoT Edge modules $edgeAgent and $edgeHub, we also need to deploy a registry module. The registry module will serve the container images for the Layer 4 device. You should also deploy the API proxy module to enable a single endpoint for all services deployed on the IoT Edge device. Here are several Gists that you can use for deploying the modules on the Layer 5 device:
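
Once you have a deployment manifest based on one of those Gists, you can apply it to the Layer 5 device from the workstation with the Azure CLI. Here is a minimal sketch, assuming the manifest is saved in a file named deployment.layer5.json (the file name is just a placeholder):

az iot edge set-modules --device-id L5-edge-pi --hub-name tsm-ioth-eus-test --content ./deployment.layer5.json

After the deployment is applied, running sudo iotedge list on the Layer 5 device should show $edgeAgent, $edgeHub, the registry module, and the API proxy module.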

Testing the Registry Module

Before setting up the next layer of the hierarchical IoT Edge infrastructure, we need to make sure that the Layer 5 registry module is working properly. Without it, you will not be able to set up the next layer. From your laptop, you should be able to pull Docker images from the registry module. Depending on how you have deployed the registry module (using one of the Gists above), the pull commands should look something like this:

docker login -u <registry_username> -p <registry_password> 10.16.8.4:<port_if_any>
docker pull 10.16.8.4:<port_if_any>/azureiotedge-simulated-temperature-sensor:1.0

Now, we have set up the top layer (Layer 5) for our nested IoT Edge infrastructure. In the next post, I will describe the steps to set up the first nested layer of the hierarchical infrastructure – Layer 4.

Image by falco from Pixabay.

To provide support for the hierarchical Azure IoT Edge scenarios, we started working on a connected registry implementation that will allow extending the Azure Container Registry functionality to on-premises environments. For those of you who are not familiar with what a hierarchical IoT Edge scenario is, take a look at the Purdue network model used in the ISA 95 and ISA 99 standards – TL;DR: it is the network architecture that allows segregation of OT and IT traffic in manufacturing networks. While the Azure IoT team has provided a sample of the hierarchical IoT Edge environment, I wanted to reproduce it using physical ARM-based devices.

The problem with configuring a hierarchy of IoT Edge devices at home is that home networking devices do not allow the advanced network configurations that you would normally expect from an enterprise-grade switch or router. While my Eero is good at its mesh WiFi capabilities, it is a very poor implementation of a switch and doesn’t allow the creation of multiple virtual networks or WiFi SSIDs. The routing capabilities are also quite limited, which, to be honest, impacts the ability to create secure IoT networks at home. But I digress… To implement my configuration, I gathered a bunch of Raspberry Pi 4s, Pi Zeros, and an Nvidia Jetson Nano and got to work.

Here is the big picture of what we are implementing:

Hierarchy of IoT Edge devices - Purdue Networks

 

In this post, I will walk through the configuration of the IT Proxy in the IT DMZ layer shown in the picture above. For that, I decided to use a Raspberry Pi 4 device and configure it with the Squid proxy, similar to the Azure IoT sample linked above. Before that though, let’s look at the device configuration.

Configuring the Raspberry Pi Network Interfaces

Raspberry Pi 4 has two network interfaces – a WiFi one and an Ethernet one. The idea here is to configure the WiFi interface to connect to my home network using the home network IP range 192.168.0.0/24, wire the Ethernet interface to a simple switch, and assign it a static IP address from the 10.16.8.0/24 network. Routing between the two interfaces should also be established so traffic from the 10.16.8.0/24 network can flow to the 192.168.0.0/24 network. Pulling some details from the Setting up a Raspberry Pi as a routed wireless access point article on the Raspberry Pi’s official website, I ended up with the following configuration.

Warning: Be careful here and don’t copy the commands directly from the Pi’s article! We are routing in the reverse direction – from the Ethernet interface to the WiFi interface.

  • Install the netfilter-persistent and the iptables-persistent plugin to be able to persist the routing rules between reboots:
    sudo DEBIAN_FRONTEND=noninteractive apt install -y netfilter-persistent iptables-persistent
  • Configure a static IP address for the Ethernet interface. To follow the Azure IoT sample, I chose 10.16.8.4. Edit the DHCP configuration file with sudo vi /etc/dhcpcd.conf and add the following at the end:
    interface eth0
    static ip_address=10.16.8.4/24
  • Enable routing by creating a new file with sudo vi /etc/sysctl.d/routed-proxy.conf and add the following in it:
    # Enable IPv4 routing
    net.ipv4.ip_forward=1
  • Next, create a firewall rule with the following command (you can verify it after the reboot, as shown below):
    sudo iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE
  • Last, save the configuration and reboot the device:
    sudo netfilter-persistent save
    sudo systemctl reboot
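
After the device comes back up, you can confirm that the NAT rule was persisted by listing the POSTROUTING chain; the output should contain a MASQUERADE rule for wlan0:

sudo iptables -t nat -L POSTROUTING -n -v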

Checking the configuration with ifconfig after reboot should give you something similar to:

pi@raspberrypi:~ $ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.16.8.4  netmask 255.255.255.0  broadcast 10.16.8.255
        inet6 fe80::6816:2aa8:ab8d:9f93  prefixlen 64  scopeid 0x20<link>
        ether dc:a6:32:77:fb:99  txqueuelen 1000  (Ethernet)
        RX packets 5565  bytes 1359978 (1.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 131  bytes 14399 (14.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 22  bytes 1848 (1.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 22  bytes 1848 (1.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.101  netmask 255.255.0.0  broadcast 192.168.255.255
        inet6 fe80::6788:d8e2:4e8a:558  prefixlen 64  scopeid 0x20<link>
        ether dc:a6:32:77:fb:9a  txqueuelen 1000  (Ethernet)
        RX packets 7619  bytes 1671937 (1.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1494  bytes 346279 (338.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

To test the connectivity through the newly configured router, I had to set up my Mac with a wired connection using a USB Ethernet adapter. Here is the configuration:

traceroute yields satisfactory results showing that the traffic goes through the Raspberry Pi:

toddysm@MacBook-Pro ~ % traceroute 192.168.0.101
traceroute to 192.168.0.101 (192.168.0.101), 64 hops max, 52 byte packets
 1  192.168.0.101 (192.168.0.101)  9.146 ms  7.669 ms  10.125 ms
toddysm@MacBook-Pro ~ % traceroute 172.217.3.164
traceroute to 172.217.3.164 (172.217.3.164), 64 hops max, 52 byte packets
 1  10.16.8.4 (10.16.8.4)  2.555 ms  0.661 ms  0.377 ms
 2  * *^C
toddysm@MacBook-Pro ~ %

Installing and Configuring Squid Proxy

Installing Squid proxy is as trivial as typing the following command:

sudo apt install squid

However, there are two configuration changes you need to make so that the proxy can be used from machines on the local network. Those settings are in the Squid configuration file /etc/squid/squid.conf:

  • First, the proxy port is configured to bind to any IP address, but when the proxy starts, it binds to the IPv6 addresses instead of the IPv4 ones. You need to change the following line:
    http_port 3128

    and add the IP address of the Ethernet port like this:

    http_port 10.16.8.4:3128
  • Second, access to the proxy is enabled only from the localhost. Uncomment the following line to enable access from the localnet:
    http_access allow localnet

    Also, make sure that the localnet is defined as:

    acl localnet src 0.0.0.1-0.255.255.255  # RFC 1122 "this" network (LAN)
    acl localnet src 10.0.0.0/8             # RFC 1918 local private network (LAN)
    acl localnet src 100.64.0.0/10          # RFC 6598 shared address space (CGN)
    acl localnet src 169.254.0.0/16         # RFC 3927 link-local (directly plugged) machines
    acl localnet src 172.16.0.0/12          # RFC 1918 local private network (LAN)
    acl localnet src 192.168.0.0/16         # RFC 1918 local private network (LAN)
    acl localnet src fc00::/7               # RFC 4193 local private network range
    acl localnet src fe80::/10              # RFC 4291 link-local (directly plugged) machines

    The latter should already be present in the default configuration file. After making both changes, restart Squid as shown below.
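
A quick way to apply and verify the changes is to restart the Squid service and then issue a request through the proxy from a machine on the 10.16.8.0/24 network (the target URL below is just an example):

sudo systemctl restart squid
curl -x http://10.16.8.4:3128 -I https://www.microsoft.com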

To make sure that the proxy is properly configured, I changed the networking configuration in my Firefox browser on my Mac as follows:

I also turned off my Mac’s WiFi to make sure that the traffic goes through the wired interface and uses the proxy. The test was successful, and now I have the top layer of the hierarchical IoT Edge network configured.

In the next post, I will go over configuring the Layer 5 device of the Purdue network architecture and installing the Azure IoT Edge runtime on it.

With the recent Solorigate incident, a lot of emphasis is put on determining the origin of the software running in an enterprise. For Docker container images, this means embedding in the image information about the Dockerfile the image was built from. However, tracking down the software origin is not so trivial to do. For closed-source software, we blindly trust the vendors, and if we are lucky enough, we may get a signed piece of code. For open-source software, we rarely check the SHA signature and never even think of verifying what source code the binary was produced from. In talks with customers, I quite often hear them asking how they can verify what sources a container image is built from. They want to attribute each image with metadata that links to the Dockerfile used to build the image as well as the Git commit and the developer who triggered the build.

There are many articles that discuss this problem. Here are two recent examples. Richard Lander from the Microsoft .NET team writes in his blog post Staying safe with .NET containers about the pedigree and provenance of the software we run and how to think about it. Josh Hendrick in his post Embedding source code version information in Docker images offers one solution to the problem.

Josh Hendrick’s proposal is in the direction I would go, but one problem I have with it is that it requires special handling in the application that runs in the container to obtain this information. I would prefer to have this information readily available without the need to run the container image. Docker images and the Open Container Initiative already have specified ways to do that without adding special files to your image. In this post, I will outline another way you can embed this information into your images and easily retrieve it without any changes to your application.

Using Docker Image Labels

The Docker image spec already has built-in functionality to add labels to the image. Labels are intended to be set during build time. They also show up when inspecting the image using docker image inspect, which makes them the right choice for specifying the Dockerfile and the other build origin details. One more argument in their favor is that labels are baked into the image at build time and are effectively immutable: if you change a label, the resulting image SHA will change.

To demonstrate how labels can be used to embed the Dockerfile and other origin information into the Docker image, I have published a dynamic labels sample on GitHub. The sample uses a base Python image and implements a simple functionality to print the container’s environment variables. Let’s walk through it step by step.

The Dockerfile is quite simple.

FROM python:slim
ARG IMAGE_COMMITTER
ARG IMAGE_DOCKERFILE
ARG IMAGE_COMMIT_SHA
LABEL "build.user"=${IMAGE_COMMITTER}
LABEL "build.sha"=${IMAGE_COMMIT_SHA}
LABEL "build.dockerfile"=${IMAGE_DOCKERFILE}
ADD ./samples/dynamic-labels/source /
CMD ["python", "/show_environment.py"]

Lines 2-4 define the build arguments that need to be set during the build of the image. Lines 5-7 set the three labels build.user, build.sha, and build.dockerfile that we want to embed in the image. build.dockerfile is the URL to the Dockerfile in the GitHub repository, while build.sha is the Git commit that triggered the build. If you build the image locally with some dummy build arguments, you will see that new layers are created for each of the lines 5-7.

toddysm@MacBook-Pro ~ % docker build -t test --build-arg IMAGE_COMMITTER=toddysm --build-arg IMAGE_DOCKERFILE=https://test.com --build-arg IMAGE_COMMIT_SHA=12345 -f ./samples/dynamic-labels/Dockerfile .
Sending build context to Docker daemon  376.3kB
Step 1/9 : FROM python:slim
 ---> 8c84baace4b3
Step 2/9 : ARG IMAGE_COMMITTER
 ---> Running in 71ad05f20d20
Removing intermediate container 71ad05f20d20
 ---> fe56c62b9903
Step 3/9 : ARG IMAGE_DOCKERFILE
 ---> Running in fe468c44e9fc
Removing intermediate container fe468c44e9fc
 ---> b776dca57bd7
Step 4/9 : ARG IMAGE_COMMIT_SHA
 ---> Running in 849a82225c31
Removing intermediate container 849a82225c31
 ---> 3a4c6c23a699
Step 5/9 : LABEL "build.user"=${IMAGE_COMMITTER}
 ---> Running in fd4bfb8d5b5b
Removing intermediate container fd4bfb8d5b5b
 ---> 2e9be17c48ff
Step 6/9 : LABEL "build.sha"=${IMAGE_COMMIT_SHA}
 ---> Running in 892323d73495
Removing intermediate container 892323d73495
 ---> b7bc6559629d
Step 7/9 : LABEL "build.dockerfile"=${IMAGE_DOCKERFILE}
 ---> Running in 98687b8dd9fb
Removing intermediate container 98687b8dd9fb
 ---> 35e97d273cbc
Step 8/9 : ADD ./samples/dynamic-labels/source /
 ---> 9e71859892b1
Step 9/9 : CMD ["python", "/show_environment.py"]
 ---> Running in 366b1b6c3bea
Removing intermediate container 366b1b6c3bea
 ---> e7cb39a21c2a
Successfully built e7cb39a21c2a
Successfully tagged test:latest

You can inspect the image and see the labels by issuing the command docker image inspect --format='{{json .Config.Labels}}' <imagename>.

toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' test | jq
{
  "build.dockerfile":"https://test.com",
  "build.sha":"12345",
  "build.user":"toddysm"
}

Now, let’s automate the process with the help of GitHub Actions. I have created one GitHub Action to build and push the image to DockerHub and another to build and push to Azure Container Registry (ACR). Both actions are similar in the steps they use. The first two steps are the same for both actions. They will build the URL to the Dockerfile using the corresponding GitHub Actions variables:

- name: 'Set environment variable for Dockerfile URL for push'
  if: ${{ github.event_name == 'push' }}
  run: echo "DOCKERFILE_URL=${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/blob/${GITHUB_REF#refs/*/}/samples/dynamic-labels/Dockerfile" >> $GITHUB_ENV

- name: 'Set environment variable for Dockerfile URL for pull request'
  if: ${{ github.event_name == 'pull_request' }}
  run: echo "DOCKERFILE_URL=${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/blob/${GITHUB_BASE_REF#refs/*/}/samples/dynamic-labels/Dockerfile" >> $GITHUB_ENV

Then, there are specific steps to sign into DockerHub or Azure. After that come the build steps where the labels are set. Here, for example, is the build step that uses buildx and automatically pushes the image to DockerHub:

- name: Build and push
  id: docker_build
  uses: docker/build-push-action@v2
  with:
    context: ./
    file: ./samples/dynamic-labels/Dockerfile
    push: true
    tags: ${{ secrets.DOCKER_HUB_REPONAME }}:build-${{ github.run_number }}
    build-args: |
      IMAGE_COMMITTER=${{ github.actor }}
      IMAGE_DOCKERFILE=${{ env.DOCKERFILE_URL }}
      IMAGE_COMMIT_SHA=${{ github.sha }}

The build step for building the image and pushing to Azure Container Registry uses the traditional docker build approach:

- name: Build and push
  id: docker_build
  uses: azure/docker-login@v1
  with:
    login-server: ${{ secrets.ACR_REGISTRY_LOGIN_SERVER }}
    username: ${{ secrets.ACR_REGISTRY_USERNAME }}
    password: ${{ secrets.ACR_REGISTRY_PASSWORD }}
- run: |
    docker build -f ./samples/dynamic-labels/Dockerfile -t ${{ secrets.ACR_REGISTRY_LOGIN_SERVER }}/${{ secrets.ACR_REPOSITORY_NAME }}:build-${{ github.run_number }} --build-arg IMAGE_COMMITTER=${{ github.actor }} --build-arg IMAGE_DOCKERFILE=${{ env.DOCKERFILE_URL }} --build-arg IMAGE_COMMIT_SHA=${{ github.sha }} .
    docker push ${{ secrets.ACR_REGISTRY_LOGIN_SERVER }}/${{ secrets.ACR_REPOSITORY_NAME }}:build-${{ github.run_number }}

After the actions complete, the images are available in DockerHub and Azure Container Registry. Here is how the image looks in DockerHub:

Docker container image with labels

If you scroll down a little, you will see the labels that appear in the list of layers:

The URL points you to the Dockerfile that was used to create the image, while the commit SHA can be used to identify the exact state of the repository the image was built from. If you pull the image locally, you can also see the labels using the command:

toddysm@MacBook-Pro ~ % docker pull toddysm/tmstests:build-36
build-36: Pulling from toddysm/tmstests
45b42c59be33: Already exists
8cd3485318db: Already exists
2f564129f025: Pull complete
cf1573f5a21e: Pull complete
ceec8aed2dab: Pull complete
78b1088f77a0: Pull complete
Digest: sha256:7862c2a31970916fd50d3ab38de0dad74a180374d41625f014341c90c4b55758
Status: Downloaded newer image for toddysm/tmstests:build-36
docker.io/toddysm/tmstests:build-36
toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' toddysm/tmstests:build-36
{
  "build.dockerfile":"https://github.com/CrimsonPinnacle/container-image-inspector/blob/development/samples/dynamic-labels/Dockerfile",
  "build.sha":"e80e6ef86f86a11d6a73aea8d8c41700c4d3d7c5",
  "build.user":"toddysm"
}

To summarize, the benefit of using labels for embedding the Dockerfile and other origin information into container images is that the labels become part of the image itself and are effectively immutable. Thus, they cannot be changed without changing the image SHA.
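
As a side benefit, the labels can also be used to filter images locally. For example, using the dummy values from the earlier build, something like this should list only the images built from that commit:

docker images --filter "label=build.sha=12345"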

Who is Using Docker Image Labels?

Unfortunately, labels are not widely used, if at all 🙁 Checking several popular images from DockerHub yields the following results:

toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' busybox | jq
null
toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' alpine | jq 
null
toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' ubuntu | jq 
null

Tracking down the sources from which the Alpine image was built would require a much greater effort.

What is Next for Checking Docker Image Origins?

There are a couple of community initiatives that will play a role in determining the origin of container images.

  • Notary V2 will allow images to be signed. Having the origin information embedded into the image and adding an official signature to the image will increase the confidence in the legitimacy of the image.
  • The OCI manifest specification allows artifacts (i.e., images) to be annotated with arbitrary metadata. Unfortunately, Docker doesn’t support such annotations yet. Hopefully, in the future, Docker images will add support for arbitrary metadata that can be included in the image manifest.
  • An implementation of metadata service (see metadata service draft from Steve Lasker) as part of the registry will enable additional capabilities to provide origin information for the images.

Summary

While image metadata is great for annotating images with useful information and enabling search and querying capabilities, that metadata is kept outside of the image layers and can mutate over time. Verifying the authenticity of the metadata and keeping a history of the changes will be a harder problem to solve. Docker already provides a way to embed the Dockerfile and other image origin information into the image itself in a way that cannot change without changing the image. Using dynamically populated Docker image labels, developers can provide origin information right now and increase the supply chain confidence for their images.

In my previous post, Learn More About Your Home Network with Elastic SIEM – Part 1: Setting Up Elastic SIEM, I explained how you could set up Elastic SIEM on a Raspberry Pi[ad]. The next thing you would want to do is to collect the logs from your firewall and analyze them. Before I jump into the technical details, I should warn you that you may… not be able to do the steps below if you rely on consumer products or you use the equipment provided by your ISP.

Let me go on a short rant here! Every self-respecting router vendor should allow firewall logs to be sent to an external system. A common approach is to use the SYSLOG protocol to collect the logs. If your router does not have this capability… well, I would suggest you buy a new, more advanced one.

I personally invested in a tiny Netgate SG-1100 box that runs the open-source pfSense router/firewall. You can, of course, install pfSense on your own hardware if you don’t want to buy a new device. pfSense allows you to configure up to three external log servers. Logstash, which we configured in the previous post, can play the role of a SYSLOG server and forward the events to Elasticsearch. Here is how simple the configuration of the pfSense log shipping looks:

The IP address 192.168.11.72 is the address of the Raspberry Pi where the ELK SIEM is installed, and 5140 is the port that Logstash uses to listen for incoming events. That is all you need to configure pfSense to send the logs to the ELK SIEM.
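
On the Logstash side, this boils down to an input that listens on port 5140. Purely for reference, here is a minimal sketch of such an input (the file name is arbitrary, and the full pipeline with the pfSense-specific filters and the Elasticsearch output comes from the project mentioned below):

sudo tee /etc/logstash/conf.d/01-pfsense-input.conf > /dev/null <<'EOF'
# listen for syslog messages from the firewall on port 5140
input {
  udp {
    port => 5140
    type => "syslog"
  }
  tcp {
    port => 5140
    type => "syslog"
  }
}
EOF
sudo service logstash restart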

Our next step is to configure Logstash to collect the events from PFSense and feed them into an index in Elastic. The following project from Patrick Jennings will help you with the Logstash configuration. If you follow the instructions, you will see the new index show up in Kibana like this:

The last thing we need to do is to create a dashboard in Kibana to show the data collected from the firewall. Patrick Jennings’ project has pre-configured visualizations and a dashboard for the pfSense data. Unfortunately, when you import those, Kibana warns you that they need to be updated. The reason is that they use an old Kibana JSON format, while the latest versions require all objects to be described in the newline-delimited JSON (NDJSON) format (for more details, visit ndjson.org). The updated pfSense dashboard and visualizations are available in my GitHub repository for Home SIEM.

Now, keep in mind that the pfSense logs will not feed into the SIEM functionality of the Elastic Stack because they are not in the Elastic Common Schema (ECS) format. What we have created is just a dashboard that visualizes the firewall log data. Also, the dashboard and the visualizations are built using the pfSense data. If you use a different router/firewall, you will need to update the configuration to visualize the data, and things may not work out of the box. I would be curious to hear feedback on how other routers can send data to ELK.

In subsequent posts, I will describe how you can use Beats to get data from the machines connected to your local network and how you can dig deeper into the collected data.


Last night I had some free time to play with my network, and I ran tcpdump out of curiosity. For a while, I’ve been interested in analyzing what traffic goes through my home network, and the result of my test pushed me to get to work. I have a bunch of Raspberry Pi devices in my drawers, so the simplest thing I could do was to grab one and install Elastic SIEM on it. For those of you who don’t know what SIEM is, it stands for Security Information and Event Management. My hope was that with it, I would be able to get a better understanding of the traffic on my home network.

Installing Elastic SIEM on Raspberry Pi

The first thing I had to do was to install the ELK stack on a Raspberry Pi. There are not too many good articles that explain how to set up Elastic SIEM on your Pi. According to Elastic, Elastic SIEM is generally available in Elastic Stack 7.6. So, installing the Elastic Stack should do the job.

A couple of notes before that:

  1. The first thing to keep in mind is that 8GB is the minimum RAM requirement for the ELK stack. You can get away with a 2GB Pi, but if you want to run the whole stack (Elasticsearch, Logstash, and Kibana) on a single device, make sure that you order Raspberry Pi 4 Model B Quad Core 64 Bit with 4GB[ad]. Even this one is a stretch if you collect a lot of information. A good option would be to split the stack over two devices: one for Elasticsearch and Kibana, and another one for the Logstash service
  2. Elastic has no builds for Raspbian. Hence, in the instructions below, I will use Debian packages and will describe the steps to install those on the Pi. This will require some custom configs and scripts, so be prepared for that. Well, this article is how to hack the installation and no warranties are provided 🙂
  3. You will not be able to use the ML functionality of Elasticsearch because it is not supported on the low-powered Raspbian device
  4. The steps below assume version 7.7.0 of the ELK stack. If you are installing a different version, make sure that you replace it accordingly in the commands below
  5. Last but not least (in the spirit of no warranties), Elasticsearch has a dependency on libc6 that will be ignored and will break future updates. You have to deal with this at your own risk

Here are the steps to follow.

Installing Elasticsearch on Raspberry Pi

  1. Set up your Raspberry Pi first. Here are the steps to set up your Raspberry Pi. The Raspberry Pi Imager makes it even easier to set up the OS. Once again, I would recommend using Raspberry Pi 4 Model B Quad Core 64 Bit with 4GB[ad] and a larger SD card[ad] to save logs for longer.
  2. Make sure all packages are up to date, and your Raspbian OS is fully patched.
    sudo apt-get update
    sudo apt-get upgrade
  3. Install the ZIP utility; we will need it later for the Logstash configuration.
    sudo apt-get install zip
  4. Then, install the Java JRE because the Elastic Stack requires it. You can use the open-source JRE to avoid licensing troubles with Oracle’s.
    sudo apt-get install default-jre
  5. Once this is done, go ahead and download the Debian package for Elasticsearch. Make sure that you download the package with no JDK in it.
    wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.7.0-no-jdk-amd64.deb
  6. Once this is done, go ahead and install Elasticsearch using the package manager.
    sudo dpkg -i --force-all --ignore-depends=libc6 elasticsearch-7.7.0-no-jdk-amd64.deb
  7. Next, we need to configure Elasticsearch to use the installed JDK.
    sudo vi /etc/default/elasticsearch

    Set the JAVA_HOME to the location of the JDK. Normally, this is /usr/lib/jvm/default-java. You can also set JAVA_HOME to the same path in the /etc/environment file, but this is not required.

  8. The last thing you need to do is to disable the ML X-Pack for Elasticsearch. Change the access mode of the /etc/elasticsearch directory first and edit the Elasticsearch configuration file.
    sudo chmod g+w /etc/elasticsearch
    sudo vi /etc/elasticsearch/elasticsearch.yml

    Change the xpack.ml.enabled setting to false as follows:

    xpack.ml.enabled: false

The above steps install and configure the Elasticsearch service on your Raspberry Pi. You can start the service with:

sudo service elasticsearch start

Or check its status with:

sudo service elasticsearch status
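
Once the service reports as running, a quick way to confirm that Elasticsearch answers requests is to query it locally (by default it listens on localhost port 9200):

curl http://localhost:9200/

The response should be a small JSON document containing the node name and the version (7.7.0 in this case).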

Installing Logstash on Raspberry Pi

Installing Logstash on the Raspberry Pi turned out to be a bit more problematic than Elasticsearch. Again, Elastic doesn’t have a Logstash package that targets the ARM architecture, and you need to tweak the installation manually. StackOverflow posts and GitHub issues were particularly helpful for that – I listed the two I used in the References at the end of this article. Here are the steps:

  1. Download the Logstash Debian package from Elastic’s repository.
    wget https://artifacts.elastic.co/downloads/logstash/logstash-7.7.0.deb
  2. Install the downloaded package using the dpkg package installer.
    sudo dpkg -i logstash-7.7.0.deb
  3. If you run Logstash at this point and encounter an error similar to logstash load error: ffi/ffi -- java.lang.NullPointerException: null, get Alexandre Alouit’s fix from GitHub using:
    wget https://gist.githubusercontent.com/toddysm/6b4b9c63f32a3dfc476a725561fc23af/raw/06a2409df3eba5054d7266a8227b991a87837407/fix.sh
  4. Go to /usr/share/logstash/logstash-core/lib/jars and check the version of the jruby-complete-X.X.X.X.jar JAR
  5. Open the downloaded fix.sh, and replace the version of the jruby-complete-X.X.X.X.jar on line 11 with the one from your distribution. In my case, that was jruby-complete-9.2.11.1.jar
  6. Change the permissions of the downloaded fix.sh script, and run it.
    chmod 755 fix.sh
    sudo ./fix.sh
  7. You can run Logstash with:
    sudo service logstash start

You can check the Logstash logs in /var/log/logstash/logstash-plain.log for information on whether Logstash started successfully.

Installing Kibana on Raspberry Pi

Installing Kibana had different challenges. The problem is that the NodeJS runtime bundled with the Debian package is built for x86 and doesn’t run on Raspbian. What you need to do is replace the bundled NodeJS with an ARM build of the version Kibana requires after you install the Debian package. Here are the steps:

  1. Download the Kibana Debian package from Elastic’s repository.
    wget https://artifacts.elastic.co/downloads/kibana/kibana-7.7.0-amd64.deb
  2. Install the downloaded package using the dpkg package installer.
    sudo dpkg -i --force-all kibana-7.7.0-amd64.deb
  3. Move the bundled NodeJS to another folder (or delete it completely) and create a new empty directory node in the Kibana installation directory.
    sudo mv /usr/share/kibana/node /usr/share/kibana/node.OLD
    sudo mkdir /usr/share/kibana/node
  4. Next, download version 10.19.0 of NodeJS. This is the required version of NodeJS for Kibana 7.7.0. If you are installing another version of Kibana, you may want to check what NodeJS version it requires. The best way to do that is to start the Kibana service and it will tell you.
    wget https://nodejs.org/download/release/v10.19.0/node-v10.19.0-linux-armv7l.tar.xz
  5. Unpack the TAR and move the content to the node directory under the Kibana installation directory.
    sudo tar -xJvf node-v10.19.0-linux-armv7l.tar.xz
    sudo mv ./node-v10.19.0-linux-armv7l/* /usr/share/kibana/node
  6. You may also want to create symlinks for the NodeJS executable and its tools.
    sudo ln -s /usr/share/kibana/node/bin/node /usr/bin/node
    sudo ln -s /usr/share/kibana/node/bin/npm /usr/bin/npm
    sudo ln -s /usr/share/kibana/node/bin/npx /usr/bin/npx
  7. Configure Kibana to accept requests on any IP address on the device.
    sudo vi /etc/kibana/kibana.yml

    Set the server.host setting to 0.0.0.0 like this:

    server.host: "0.0.0.0"
  8. You can run Kibana with the command below; a quick way to verify that it started follows these steps.
    sudo service kibana start
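
Kibana can take a minute or two to start on the Pi. To verify that it is up, check the service status and probe port 5601 locally (or simply browse to http://<your-pi-address>:5601 from another machine on your network):

sudo service kibana status
curl -I http://localhost:5601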

Conclusion

Although not supported, you can run the complete ELK stack on a Raspberry Pi 4[ad] device. It is not the most trivial installation, but it is not so hard either. In the following posts, I will explain how you can use the Elastic SIEM to monitor the traffic on your network.

References

Here are some additional links that you may find useful:

For a while, I’ve been planning to build a cybersecurity research environment in the cloud that I can use to experiment with and research malicious cyber activities. Well, yesterday I received the following message on my cell phone:

Hello mate, your FEDEX package with tracking code GB-6412-GH83 is waiting for you to set delivery preferences: <url_goes_here>

My curiosity to follow the link was so tempting that I said: “Now is the time to get this sandbox working!” I will write about the scam above in another blog post, but in this one, I would like to explain what I needed in the cloud and how I set it up.

Cybersecurity Research Needs

What do I (think I) need from this sandbox environment? I know that my requirements will change over time as I get deeper into the various scenarios, but for now, here is what I wanted to have:

  • First, and foremost, a dedicated network where I can click on malicious links and browse dark web sites without the fear that my laptop or local network will get infected. I also need to have the ability to split the network into subnets for different purposes
  • Next, I needed pre-built VM images for various purposes. Here are some examples:
    • A Windows client machine to act as an unsuspicious user. Most probably, this VM will need to have Microsoft Office and other software like Acrobat Reader installed. Other more advanced software that will help track URLs and monitor processes may also be required on this machine. I will go deeper into this in a separate post
    • Linux machine with networking tools that will allow me to better understand and monitor network traffic. Kali Linux may be the right choice, but Ubuntu and CentOS may also be helpful as different agents
    • I may also need some Windows Server and Linux server images to simulate enterprise environments
  • Very important for me was to be able to create the VM I need quickly, do the work I need, and tear it down after that. Automating the process of VM creation and setup was high on the list
  • Also, I wanted to make sure that if I forget the VM running, it will be shut down automatically to save money

With those basic requirements, I got to work setting up the cloud environment. For my experiments, I chose Microsoft Azure because I have a good relationship with the Azure team and also some credits that I can use.

Segregating Network Access

As mentioned in the previous section, I need a separate network for the VMs to avoid any possibility of infecting my laptop. Now, the question that comes to mind is: is a single virtual network with several subnets OK or not? Keeping in mind that I can destroy this environment at any time and recreate it (yes, this is the automation part), I decided to go with a single VNet and the following subnets in it:

  • Sandbox subnet to be used to spin up virtual machines that can simulate user behavior. Those will be either single VMs running a Windows client and Microsoft Office or a set of those if I want to observe lateral movement within the network. I may also have a Linux machine with Wireshark installed on it to watch network traffic to the web sites that host the malicious pages
  • Honeypot subnet to be used to expose vulnerable devices to the internet. Those may be Windows Server Datacenter VMs or Linux servers with outdated patches and weak or compromised passwords
  • Frontend subnet to be used to host exploit frameworks for red team scenarios. One example can be the Social Engineering Toolkit (SET). Also, simple redirection services or other apps can be placed in this subnet
  • Public subnet that is required in Azure if I need to set up any load balancers for the exploit apps

With all this in mind, I need to make sure that traffic between those subnets is not allowed. Hence, I had to set up a Network Security Group for each subnet to disable inbound and outbound VNet traffic.
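
To give an idea of what this looks like, here is a sketch of the Azure CLI commands for one of the subnets. All resource names are placeholders, and the deny rule is intentionally broad; you would repeat the same pattern for the other subnets:

# create an NSG for the sandbox subnet
az network nsg create --resource-group sandbox-rg --name sandbox-nsg

# deny traffic from the subnet to the rest of the virtual network
az network nsg rule create --resource-group sandbox-rg --nsg-name sandbox-nsg \
  --name deny-vnet-outbound --priority 100 --direction Outbound --access Deny \
  --protocol '*' --source-address-prefixes VirtualNetwork \
  --destination-address-prefixes VirtualNetwork --destination-port-ranges '*'

# associate the NSG with the subnet
az network vnet subnet update --resource-group sandbox-rg --vnet-name sandbox-vnet \
  --name sandbox-subnet --network-security-group sandbox-nsg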

Cybersecurity Virtual Machine Images

This can be an ever-growing list depending on the scenarios, but below is the list of the first few I would like to start with. I will give them specific names based on the purpose I plan to use them for:

User VM

The purpose of the User VM is to simulate an office worker (think of a receptionist, an accountant, or an admin). Typically, such machines have a Windows client OS installed as well as other office software like Word, Excel, PowerPoint, Outlook, and Acrobat Reader. Different browsers like Edge, Chrome, and Firefox will also be useful for testing.

At this point, the question that I need to answer is whether I would like to have any other software installed on this machine that will allow me to analyze the memory, reverse engineer binaries, or monitor network traffic on it. I decided to go with a clean user machine to be able to see the exact behavior the user sees and not impact it with the availability of other tools. One other reason I thought this would be a better approach is to avoid malware that checks for the existence of specialized software.

When I started building this VM image, I also had to decide whether I want to have any anti-virus software installed on it. And of course, the question is: “What anti-virus software?” My company is a Sophos Partner, and this would be my obvious choice. I decided to build two VM images: one without anti-virus and one with it.

User Analysis VM

This one is the same as the User VM but with added software for malware analysis. The minimum I need installed is:

  • Wireshark for network scanning
  • Cygwin for the necessary Linux tools
  • HEXDump viewer/editor
  • Decompiler

I am pretty sure the list will grow over time, and I will keep a tab on what software I have installed or prefer to use. My intent is to build a Kali Linux type of VM, but for Windows 🙂

Kali VM

This one is the obvious choice. It can be used not only for offensive capabilities but also to cover your identity when accessing malicious sites. It has all the necessary tools for hackers and it is available from the Microsoft Azure Marketplace.

Tor VM

One last VM type I would like to have is a machine on which I can install the Tor browser for private browsing. Similar to the Kali VM, it can be used for hiding the identity of the machine and the user who accesses the malicious sites. It can also be used to gain access to Dark Web sites and forums for cybersecurity research purposes.

Those are the VM images I plan for now. Later on, I may decide on more.

Automating the Security Research VM Creation

Ideally, I would like to be able to create the whole security research environment with a single script. While this is certainly possible, I will not be able to do it within the hour I want to spend before loading the URL above. I will slowly implement the automation based on my needs going forward. However, one thing that I will do immediately is to save regular snapshots of my VMs once I install new software. I will also need to version those.

I can use those snapshots to quickly spin up a new VM with the required software if the previous one gets infected (and this will certainly happen). So, for now, my automation will be only to create a VM from an Azure Disk snapshot.
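
As a sketch of that automation (the resource names are placeholders and assume the snapshot already exists), creating a VM from a snapshot boils down to two Azure CLI calls:

# create a managed disk from the saved snapshot
az disk create --resource-group sandbox-rg --name user-vm-os-disk --source user-vm-snapshot-v1

# create the VM by attaching the disk as its OS disk
az vm create --resource-group sandbox-rg --name user-vm-01 \
  --attach-os-disk user-vm-os-disk --os-type Windows \
  --vnet-name sandbox-vnet --subnet sandbox-subnet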

Shutting Down the Security Research VMs

My last requirement was to shut down the security research VMs when I don’t need them. Like every other person in the world, I forget things, and if I leave the VMs running, I can incur some expenses that I would not be happy to pay. While I am working on a full-fledged scheduling capability for Azure resources for customers, it is still in the works, and I cannot use it yet. Hence, I will leverage the built-in Azure functionality for Dev/Test workloads and will schedule a daily shutdown at 10:00 PM PST. This way, if I forget to turn off any VM, it will not continue to run all the time.
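
For VMs that are not covered by that schedule, the same can be achieved per VM with the Azure CLI. A sketch, assuming the placeholder names from above (note that the time parameter is interpreted as UTC, so 10:00 PM PST corresponds to 0600 UTC):

az vm auto-shutdown --resource-group sandbox-rg --name user-vm-01 --time 0600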

With all this, my plan is ready and I can move on to build my security research environment.

In my previous post What to Desire from a Good Image Annotator?, I wrote about the high-level capabilities of an Image Annotation Tool. In this one, I will go over the requirements for the actual image annotations or as you may also know it, tagging. I will use two images as examples. The first one is a scanned receipt. The receipt example can be used to generalize the broader category of scanned documents, whether financial, legal, or others. The second example is of a cityscape. That one can be used to generalize any other image.

Annotating Store Receipt

Let’s start with the receipt. A receipt is a scanned document that contains financial information. Below is just one way that you may want to annotate a receipt.

Annotated receipt

In this example, I have decided to annotate the receipt using the logical grouping of information printed on it. Each region is a rectangle that contains the part of the image that belongs together. Here is the list of regions and their possible annotations:

  • Region ID: 1
    Annotation: Store Logo
    Description: This can be the store logo or just the name printed on the receipt
  • Region ID: 2
    Annotation: Store Details
    Description: This can include information like address, phone number, store number, etc.
  • Region ID: 3
    Annotation: Receipt Metadata
    Description: This can be the date and time, receipt number as well as another receipt specific metadata
  • Region ID: 4
    Annotation: Cashier Details
    Description: This is information about the cashier
  • Region ID: 5
    Annotation: Items
    Description: Those are the purchased items, the quantities and the individual item price
  • Region ID: 6
    Annotation: Receipt Summary
    Description: This is the summary of the information for the purchase like subtotal amount, tax and the total amount
  • Region ID: 7
    Annotation: Customer Information
    Description: This is information about the customer and any loyalty programs she or he participates in
  • Region ID: 8
    Annotation: Merchant Details
    Description: This is additional information about the merchant
  • Region ID: 9
    Annotation: Transaction Type
    Description: This is information about the transaction
  • Region ID: 10
    Annotation: Transaction Details
    Description: This contains information about the transaction with the payment card processor. It can include transaction ID, the card type and number, timestamp, authorization code, etc.
  • Region ID: 11
    Annotation: Transaction Amounts
    Description: This summarizes the amounts for the transaction with the payment card processor
  • Region ID: 12
    Annotation: Transaction Status
    Description: This is the status of the transaction – i.e., Approved or Declined
  • Region ID: 13
    Annotation: Transaction Info
    Description: Those are technical details about the transaction
  • Region ID: 14
    Annotation: Copy Owner
    Description: This is information about the ownership of the receipt. Usually, this is Merchant or Customer
  • Region ID: 15, 16, and 17
    Annotation: Additional Details
    Description: Those can be various things like return policies, disclaimers, advertisements, surveys, notes, and so on. In this example, we have 15 as Return Policy, 16 as Survey, and 17 as Additional Notes

When you think about it, the above areas are the ones where your eyes will immediately look to find information. For example, if you want to know what store the receipt is from, you will directly look at the top where the logo should be (Region #1); if you want to know what the total amount is, your eyes will steer towards the receipt summary (Region #6), and so on. The majority of us will follow a similar approach for separating the data because it is something that we do every day in our minds.

A few things to note about the annotations above. First, not every receipt will have all the information from above. Some receipts will have more and some less. Second, annotations evolve. After annotating a certain number of receipts, you start building a pattern and make fewer changes the more you annotate. However, after some time, you may discover that the patterns you developed need to be updated. A straightforward example is a better name for an annotation. If this happens, you need to go back and change the names. Third, there is no standard way to name those annotations. You and I will undoubtedly have different names for the same thing.

Now, let’s write a few requirements from this receipt example.

  1. The first thing we did was to draw the rectangular regions that we want to annotate. This is our first and simplest requirement.
  2. The second thing we did was to annotate each rectangular region. When we create the annotation, we should be able to add additional information like a description of the annotation
  3. The third thing we want is to be able to update annotation information retrospectively.

Those are good as a beginning. But to provide more context and back up our requirements, it will be useful to think about how those annotations will be used, i.e., define our use cases. I kind of hinted at those above.

Use Case #1: Logo Recognition

Let’s say you are developing a classification application that is used to recognize the store the receipt is from. You can easily do this by looking at the store logo only and developing a machine learning algorithm that returns the name of the store by recognizing the logo. For this, the only region you will need is Region 1 with the logo. Thus, you can just cut this region from the receipt and train your algorithm only on the logo. That way, you minimize the noise from the rest of the receipt, and your algorithm can achieve better accuracy.

Use Case #2: Receipt Amount Extraction

If your application needs to extract the summary amounts from the receipt, you can concentrate on Region 6. That region contains all the information you will need. A few things you can do with this region are:

  • Binarize the area
  • Straighten the text
  • OCR the text
  • Analyze the extracted text (not an image-related task anymore 🙂)

This use case is applicable to any other area you annotated on the receipt. It doesn’t matter whether you want to obtain the credit card number or the timestamp; the approach will be the same.

Nested Annotations

Now, let’s look at another way to annotate the same receipt.

Annotated Receipt

If your application needs to determine your shopping habits based on geography, you will need to extract detailed information about the store location. Thus, you will want to annotate the receipt as above to know which part is the street address, which is the city, etc. But those regions are all nested in Region 2 from our first annotation pass. It will be useful to have both types of annotations and use them for different use cases.

So, the requirement for the tool will be to support nested (or overlapping) regions, where one annotated region can contain other, more granular ones.

That is also very relevant in the next example, where we have areas with buildings but also want to annotate a single building.

Annotating Cityscapes

Annotating landscapes, cityscapes or other images with real objects is very similar to the receipt annotation. However, real objects rarely have regular shapes in pictures. Here is an example from a picture I took in Tokyo some time ago.

Annotated Cityscape

In this example, I have annotated only a few of the objects: two buildings (1 and 2), a crane (3), a soccer field (4), and a tree (5). The requirements for annotating landscapes are not too different from the requirements for annotating documents. There is just one more thing we need to add to the tool to support real object tagging: the ability to draw free-form (polygon) regions and not only rectangles.

There are many use cases that you can develop for real-object recognition, and for that, versatile annotation capabilities will be important in any tool.

Additional Requirements for Annotations

All requirements that I have listed above are specific to the objects or areas in the pictures. However, we need to have the ability to add meta information to the whole picture. Well, you may think we already have a way to do that! We can use the EXIF data. The EXIF data is helpful, and it is automatically populated by the camera or the editing tool. However, it has limited capabilities for free-form meta-information because its fields are standardized.

For example, if you want to capture information about who annotated the image last and at what time, you cannot use the EXIF fields for that. You can repurpose some EXIF fields, but you will lose the original information. What we need is a simple way to create key-value metadata for the image. Of course, having the ability to see the EXIF information would be a helpful feature, although maybe not a high-priority one.

With all that, I believe we have enough requirements to start working on the tool design. If you are curious to follow the development or participate in it, you can head over to the Image Annotator GitHub project. The next thing we need to do is some design work. That includes UI design, back-end design, and the data model.