
In Part 1 of the series Signatures, Key Management, and Trust in Software Supply Chains, I wrote about the basic concepts of identities, signatures, and attestation. In this one, I will expand on the house-selling scenario that I hinted at in Part 1 and describe a few ways to exploit it in the physical world. Then, I will map this scenario to the digital world and delve into a few possible exploits there. Throughout, I will also suggest a few possible mitigations in both the physical and the digital world. As you may already know, this whole process is called threat modeling.

Exploiting Signatures Without Attestation in the Offline World

For the purpose of this scenario, we will assume that the parties involved are me and the title company. The document that needs to be signed is the deed (we can also call it the artifact). Here is a visual representation of the scenario:

Here is how the trust is established:

  • The title company has an inherent trust in the government.
  • This means that the title company will trust any government-issued identification like a driving license.
  • In my meeting with the title company, I present my driving license.
  • The title company verifies that the driving license is legitimate and establishes trust in me.
  • Finally, the title company trusts the signature that I use to sign the deed in front of them.
  • From here on, the title company trusts the deed to proceed with the transaction.

As we can see, establishing trust between the parties involves two important conditions – implicit trust in a central authority and verification of identity. However, this process is easily exploited with fake IDs (like a fake driving license), as shown in the picture below.

In this case, an imposter can obtain a fake driving license and impersonate me in the transaction. If the title company can be fooled that the driving license is issued by the government, they can falsely establish trust in the imposter and allow him to sign the deed. From there on, the title company considers the deed trusted and continues with the transaction.

The problem here is with the verification step – the title company does not verify in real time whether the driving license is legitimate. The verification is done manually and offline by a title company employee and relies on her or his experience in recognizing forged driving licenses. Once this “gate” is passed, the signature on the deed becomes official and is not verified again later in the process.

There is one important step in this process that we haven’t mentioned yet. When the title company employee verifies the driving license, she or he also makes a photocopy of it and attaches it to the documentation. This photocopy becomes part of the audit trail for the transaction in case it is later discovered that the transaction needs to be reverted.

Exploiting Signatures Without Attestation in the Digital World

The above process is easily transferable to the digital world. In the following GitHub project I have an example of signing a simple text file artifact.txt. The example uses self-signed certificates for verifying the identity and the signature.

There are two folders in the repository. The real folder contains the files used to generate a key and X.509 certificate that is tied to my real identity and verified using my real domain name toddysm.com. The fake folder contains the files used to generate a key and X.509 certificate that is tied to an imposter identity that can be verified with a look-alike (or fake) domain. The look-alike domain uses homographs to replace certain characters in my domain name. If the imposter has ownership of the imposter domain, obtaining a trusted certificate with that domain name is easily achievable.
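To see why a homograph works, here is a minimal Python sketch. It assumes a hypothetical look-alike in which the Latin “o” in toddysm.com is replaced by the Cyrillic “о” (U+043E). The repository may use a different substitution. Python’s standard unicodedata module and its built-in “idna” codec are enough to expose the difference that the eye misses.

```python
import unicodedata


def non_ascii_chars(domain: str) -> list:
    """List (position, character, Unicode name) for every non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(domain)
        if ord(ch) > 127
    ]


def to_ascii(domain: str) -> str:
    """Convert to the ASCII (Punycode) form using Python's built-in IDNA 2003 codec."""
    return domain.encode("idna").decode("ascii")


real = "toddysm.com"
# Hypothetical look-alike: Cyrillic small letter o (U+043E) in place of the Latin 'o'.
fake = "t\u043eddysm.com"

print(real == fake)           # the strings differ even though they render alike
print(non_ascii_chars(fake))  # reveals the swapped character and its Unicode name
print(to_ascii(real))         # a plain ASCII domain is unchanged
print(to_ascii(fake))         # the look-alike label becomes an xn-- label
```

A person reading the rendered name cannot tell the two apart. A check on code points or on the xn-- form can, but only if somebody actually runs it.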

The dilemma you are presented with is which certificate to trust: the one here or the one here. When you inspect both certificates using the following commands:

openssl x509 -nameopt lname,utf8 -in [cert-file].crt -text -noout | grep Subject:
openssl x509 -nameopt lname,utf8 -in [cert-file].crt -text -noout | grep Issuer:

they both return visually indistinguishable information:

Subject: countryName=US, stateOrProvinceName=WA, localityName=Seattle, organizationName=Toddy Mladenov, commonName=toddysm.com, emailAddress=me@toddysm.com
Issuer: countryName=US, stateOrProvinceName=WA, localityName=Seattle, organizationName=Toddy Mladenov, commonName=toddysm.com, emailAddress=me@toddysm.com

It is the same as looking at two identical driving licenses, a legitimate one and a forged one, that have no visible differences.
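Fingerprints are one way to tell two visually identical certificates apart. A fingerprint is a hash over the exact DER bytes of a certificate rather than over its rendered text, and `openssl x509 -noout -fingerprint -sha256 -in [cert-file].crt` prints it. The Python sketch below computes the same kind of SHA-256 fingerprint with the standard library and compares it with a value pinned out of band. The function names and the assumption that the real fingerprint is published through a separate, trusted channel are illustrative. They are not part of the GitHub example.

```python
import hashlib
import hmac
import ssl


def sha256_fingerprint(pem: str) -> str:
    """SHA-256 over the certificate's DER bytes, formatted like OpenSSL (AA:BB:...)."""
    der = ssl.PEM_cert_to_DER_cert(pem.strip())
    digest = hashlib.sha256(der).digest()
    return ":".join(f"{b:02X}" for b in digest)


def matches_pinned(pem: str, pinned: str) -> bool:
    """Compare against a fingerprint obtained out of band (hypothetical pinning step)."""
    expected = pinned.replace(":", "").upper()
    actual = sha256_fingerprint(pem).replace(":", "")
    return hmac.compare_digest(actual, expected)
```

The two certificates in the example share the same Subject and Issuer text, but their fingerprints will differ. The catch is that the pinned value has to come from a source you already trust, which brings us back to the question of trust.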

The barrier for this exploit is even lower with PGP and SSH keys. While X.509 certificates need to be issued by a trusted certificate authority (CA) in order to be trusted, PGP and SSH keys can be generated by anybody. Here is a corresponding example of a valid PGP key and an imposter PGP key. Once again, which one would you trust?

Compromised CAs are also something we cannot ignore. There are numerous examples of forged certificates issued by legitimate CAs being used.

Let’s also not forget that the Stuxnet malware was signed with private keys stolen from JMicron and Realtek. In the case of a compromised CA, malicious actors don’t even need to use homographs to deceive the public – they can issue the certificate with the real name and domain.

Unlike the physical world, though, the digital one lacks the very important step of collecting audit information when the signature is verified. I will come back to that in the next post of the series, where I plan to explore the various controls that can be put in place to increase security.

Based on the above, it is obvious that trust, whether in a single entity or in a central certificate authority (CA), has greatly diminished in recent years.

Oh, and don’t trust the keys that I published on GitHub! 🙂 Anybody can copy them or generate new ones with my information – unfortunately obtaining that information is quite easy nowadays.

Exploiting Signatures With Attestation in the Offline World

Let’s look at the example I introduced in the previous post where more parties are involved in the process of selling my house. Here is the whole scenario!

Because I am unable to attend the signing of the documents, I need to issue a power of attorney for somebody to represent me. This person will be able to sign the documents on my behalf. First and foremost, I need to trust that person. But my trust in this person doesn’t automatically transfer to the title company that will handle the transaction. For the title company to trust my representative, the power of attorney needs to be attested by a certified notary. Only then will the title company trust the power of attorney document and accept the signature of my representative.

Here is the question: “How does the introduction of the notary increase security?” Note that I used the term “increase security”. While there is no 100% guarantee that this process will not fail…

By adding one more step to the process, we introduce an additional obstacle that reduces the probability of malicious activity, which increases security.

What the notary is meant to prevent is my “representative” forcing me to sign the power of attorney. If that happened, my security would be compromised, and my evil representative could use the power of attorney to sell my house to himself for just a dollar. The purpose of the notary is to attest that I signed the document willingly and was present (and in good health) during the signing. Of course, this can easily be exploited if both the representative and the notary are evil, as shown in the diagram below.

As you can see in this scenario, all parties have valid government-issued IDs that the title company trusts. However, the process is compromised if there is collusion between the malicious actor (evil representative) and the notary.

Another way to exploit this process is to impersonate the notary, my representative, or both. Impersonation is described in the section above, Exploiting Signatures Without Attestation in the Offline World.

Exploiting Signatures With Attestation in the Digital World

There has been a lot of talk recently about implementing attestation systems that save signature receipts in an immutable ledger. This is presented as the silver bullet for signing software artifacts (check out the Sigstore project). Similar to the notary example in the previous section, this approach may increase security, but it may also have a negative impact. Because Sigstore compares itself to Let’s Encrypt, let me take a stab at how Let’s Encrypt impacted security on the Web.

Before Let’s Encrypt, only site owners willing to pay for valid certificates had HTTPS enabled on their websites. More importantly, browsers showed a clear indicator when a site was using the plain HTTP protocol and not the secure one. From a user’s point of view, the decision was easy: if the browser address bar was red, you should not enter your username and password or your credit card. Recognizing malicious sites was relatively easy because malicious actors didn’t want to spend the money and time to get a valid certificate.

Let’s Encrypt (and the browser vendors) changed that paradigm. Being free, Let’s Encrypt allows anybody to obtain a valid (and “trusted”??? 🤔) certificate and enable HTTPS for their site. Not only that, but Let’s Encrypt made it so easy that you can get the certificate issued and deployed to your web server within seconds using automation. The only proof you need to provide is ownership of the domain name for your server. At the same time, Google led the campaign to change the browser indicators to a very mediocre lock icon in the address bar that hardly anybody pays attention to anymore. As a result, every malicious website now has HTTPS enabled, and there is no indication in the browser to tell you that it is malicious. In essence, the lock gives you a false sense of security.

I would argue that Let’s Encrypt (and the browser vendors) in fact decreased security on the web instead of increasing it. Let me be clear! While I think Let’s Encrypt (and the browser vendors) decreased security, what they provide had a tremendous impact on privacy. Privacy should not be discounted! However, marketing messages use those two terms interchangeably, and this does not benefit users.

In the digital world, the CA can play the role of the notary in the physical world. The CA verifies the identity of the entity that wants to sign artifacts and issues a “trusted” certificate. Similar to a physical-world notary, the CA will issue certificates to both legitimate and malicious actors, and unlike the physical world, the CA has only very basic means to verify identities. In the case of Let’s Encrypt, this is domain ownership. In the case of Sigstore, it will be an OpenID Connect identity such as a GitHub account. Everyone can easily buy a domain or register a GitHub account and get a valid certificate. This doesn’t mean, though, that you should trust it.
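To make the “immutable ledger” idea mentioned above concrete, here is a toy append-only log in Python. Every entry commits to the hash of the previous one, so rewriting history is detectable. This is only an illustration of the general technique, a hash chain. Sigstore’s actual transparency log (Rekor) uses a Merkle tree and signed checkpoints, and nothing here reflects its API. The record fields are hypothetical. Notice what the log proves and what it does not: it shows that a receipt was recorded and not altered afterward, but it records a look-alike signer just as faithfully as the real one.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ToyLedger:
    """Toy append-only log: each entry stores the hash of the entry before it."""
    entries: list = field(default_factory=list)

    @staticmethod
    def _hash(record: dict, prev_hash: str) -> str:
        payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, record: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        entry_hash = self._hash(record, prev_hash)
        self.entries.append({"record": record, "prev": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute every hash; an edited or reordered entry breaks the chain."""
        prev_hash = "0" * 64
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if self._hash(entry["record"], prev_hash) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


ledger = ToyLedger()
artifact_hash = hashlib.sha256(b"contents of artifact.txt").hexdigest()
# Hypothetical receipts: the ledger accepts a look-alike signer just as readily.
ledger.append({"artifact_sha256": artifact_hash, "signer": "me@toddysm.com"})
ledger.append({"artifact_sha256": artifact_hash, "signer": "me@t\u043eddysm.com"})
print(ledger.verify())  # the chain is intact

ledger.entries[0]["record"]["signer"] = "someone-else@example.com"
print(ledger.verify())  # the edited entry breaks the chain
```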

Summary

The takeaway from this post for you should be that every system can be exploited. We learn and create systems that reduce the opportunities for exploitation, but that doesn’t make them bulletproof. Also, when evaluating technologies, we should look not only at the shortcomings of the previous technology but also at the shortcomings of the shiny new one. Just adding attestation to the signatures will not be enough to make signatures more secure.

In the next post, I will look at some techniques that we can employ to make signatures and attestations more secure.

Photo by Erik Mclean on Unsplash

 

 

For the past few months, I’ve been working on a project for a secure software supply chain, and one topic that always seems to start passionate discussions is software signatures. The President’s Executive Order on Improving the Nation’s Cybersecurity (EO) is a pivotal point for the industry. One of its requirements is for vendors to document the supply chain for software artifacts. Proving the provenance of a piece of software is a crucial part of the software supply chain, and signatures play a key role in the process. However, there are conflicting views on how signatures should work. There is the traditional PKI (Public Key Infrastructure) approach that is well established in enterprises, but other traditional and emerging technologies also come up in discussions. These include PGP key signatures, SSH key signatures, and the emerging ephemeral key (or keyless) signatures (here, here, and lately here).

While PKI is well established, its shortcomings were outlined by Bruce Schneier and Carl Ellison more than 20 years ago in their paper. The new approaches try to overcome those shortcomings and democratize signatures the same way Let’s Encrypt democratized HTTPS for websites. The question, though, is whether those new technologies improve security over PKI, and if so, how. In a series of posts, I will lay out my view of the problem, the pros and cons of using one signing approach or another, how trust is established, and how to manage the signing keys. I will start with the basics, using simple examples that relate to everyday life, and map those to the world of digital signatures.

In this post, I will go over the identity, signature, and attestation concepts and explain why those matter when establishing trust.

What is Identity?

Think about your own experience. Your identity is you! You are identified by your gender, skin color, facial and body characteristics, thumbprint, iris print, hair color, DNA, etc. Unless you have an identical twin, you are unique in the world. Even identical twins have differences, like thumbprints and iris prints, that make each of them unique. The same is true for other entities like enterprises, organizations, etc. Organizations have names, tax numbers, government registrations, addresses, etc. As a general rule, changing your identity is hard if not impossible. You can have plastic surgery, but you cannot change your DNA. The story may be a bit different for organizations, which can rename themselves, get bought or sold, change headquarters, etc., but it is still pretty easy to uniquely identify organizations.

All of the above shows that identities are:

  • unique
  • and impossible (or very hard) to change

In the digital world, identities are an abstract concept. In my opinion, it is wrong to think that identities can be changed, in either the physical or the digital world. Although we tend to think they can, this is not true – what can be changed is the way we prove our identity. We will cover that shortly, but before that, let’s talk about trust.

If you are a good friend of mine, you may be willing to trust me, but if you just met me, your level of trust will be pretty low. Trust is established based on historical evidence. The longer you know me, and the longer I behave honestly, the more you will be willing to trust me. Sometimes I may not be completely honest, or I may borrow some money from you and not return it. But I may buy you a beer every time we go out to offset that cost, and you may be willing to forgive me. It is important to note that trust is very subjective: while you may be very forgiving, another friend of mine may not be. He may decide that I am not worth his trust and never lend me money again.

How do We Prove Our Identity?

In the physical world, we prove our identity using papers like a driving license, a passport, an ID card, etc. Each one of those documents is issued for a purpose:

  • The driving license is mainly used to prove you can drive a motorized vehicle on US streets. Unless it is an enhanced driving license, you will (soon) not be able to use it to board a domestic flight. However, you cannot cross borders with your driving license, and you cannot even use it to rent a car in Europe (unless you have an international driving license).
  • To cross borders you need a passport. The passport is the only document that is recognized by border authorities in other countries that you visit. You cannot use your US driving license to cross the borders in Europe. The interesting part is that you do not need a driving license to get a passport or vice versa.
  • You also have your work badge. Your work badge identifies you as an employee of a particular organization. Even though you have a driving license and a passport, you cannot enter your employer’s buildings without your badge. However, to prove your identity to your employer so they can issue you the badge, you must have a driving license or a passport.

In the digital world, there are similar concepts to prove our identity.

  • You can use a username, password and another factor (2FA/MFA token) to prove your identity to a particular system.
  • An app secret that you generate in a system can also be used to prove your identity.
  • An OAuth or SSO (single sign-on) token issued by a third party is another way to prove your identity to a particular system. That system, though, needs to trust the third party.
  • An SSH key can be an alternate way to prove your identity. You can use it in conjunction with a username/password combination or separately.
  • You can use a PGP key to prove your identity to an email recipient.
  • Or use a TLS certificate to prove the identity of your website.
  • And finally, you can use an X.509 certificate to prove your identity.

As you can see, similar to the physical world, in the digital world you have multiple ways to prove your identity to a system, and you can use more than one of them for a single system. The example that comes to mind is GitHub – you can use an app secret or an SSH key to push changes to your repository.

How Does Trust Tie to the Concepts Above?

Let’s say that I am a good developer. My code published on GitHub has a low level of bugs, and it is well structured, well documented, easy to use, and updated regularly. You decide that you can trust my GitHub account. However, I also have a Docker Hub account that I am negligent with – I don’t update the containers regularly, they have a lot of vulnerabilities, and they are sloppily built. Although you are my friend and you trust my GitHub account, you are not willing to trust my Docker Hub account. This example shows that trust is not only subjective but also based on context.

OK, What Are Signatures?

Here is where things become interesting! In the physical world, a signature is a person’s name written in that person’s handwriting. Just the signature does not prove my identity. Wikipedia’s entry for signature defines the traditional function of a signature as follows:

…to permanently affix to a document a person’s uniquely personal, undeniable self-identification as physical evidence of that person’s personal witness and certification of the content of all, or a specified part, of the document.

The keyword above is self-identification. This word in the definition has a lot of implications:

  • First, as a signer, I can have multiple signatures that I use for different purposes. In other words, one identity may use different signatures.
  • Second, nobody attests to my signature. This means that the trust is put in a single entity – the signer.
  • Third, a malicious person can impersonate me and use my signature for nefarious purposes.

Interestingly, though, we are willing to accept a signature as proof of identity depending on the level of trust we have in the signer. For example, if I borrow $50 from you and give you a receipt with my signature stating that I will pay you back in 30 days, you may be willing to accept it even if you don’t know me very well (i.e., your level of trust is relatively low). This is understandable because, for that level of risk, we decide that self-identification alone is enough. I can increase your level of trust by showing you my driving license, which has my signature printed on it, so that you can compare both signatures. However, showing you my driving license is actually an attestation, which is covered in detail below.

In the digital world, to create a signature you need a private key, and to verify a signature you need a public key (check the Digital Signature article on Wikipedia). The private and the public key are related and work in tandem: the private key signs the content, and the public key verifies the signature. You own both, but you keep the private key secret and publish the public key for everybody to use. From the examples above, you can use PGP, SSH, and X.509 to sign content. However, they have differences:

  • PGP is a self-generated key-pair with additional details like name and email address included in the public certificate, that can be used for (pseudo)identification of the entity that signs the content. You can think of it as similar to a physical signature, where, in addition to the signature you verbally provide your name and home address as part of the signing process.
  • SSH is also a self-generated key pair but has no additional information attached. Think of it as the plain physical signature.
  • With X.509 you have a few options:
    • Self-generated key-pair similar to the PGP approach but you can provide more self-identifying information. When signing with such a private key you can assume that it is similar to the physical signature, where you verbally provide your name, address, and date of birth.
    • Domain Validated (DV) certificate that validates your ownership of a particular domain (this is exactly what Let’s Encrypt does). Think of this as similar to a physical signature where you verbally provide your name, address, and date of birth as well as show a utility bill with your name and address as part of the signing process.
    • Extended Validation (EV) certificate that validates your identity using legal documents. For example, for an organization, these can be its state and tax registrations.
      Both DV and EV X.509 certificates are issued by Certificate Authorities (CA), which are trusted authorities on the Internet or within the organization.

Note: X.509 is actually an ITU standard defining the format of public-key certificates and is at the basis of PKI. The key pair can be generated using different algorithms. However, the term X.509 is also used (perhaps incorrectly) as a synonym for the key pair.
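The relationship described above, where the private key signs and the public key verifies, can be shown with textbook RSA in a few lines of Python. This is a deliberately insecure toy. The primes are far too small and fixed, there is no padding scheme, and real signing should always go through proper tools such as OpenSSL, GnuPG, or ssh-keygen. It only illustrates the mechanics: anyone with the public key (E, N) can check a signature, but only the holder of D can produce one. Also note that nothing in the key pair says who the signer is.

```python
import hashlib

# Toy key built from two known Mersenne primes (2**61 - 1 and 2**89 - 1).
# INSECURE: real RSA keys use randomly generated primes of 1024+ bits each.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent; needs Python 3.8+


def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce it modulo N so it fits the toy key."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes, d: int = D, n: int = N) -> int:
    """Textbook RSA signature: s = h^d mod n (no padding, illustration only)."""
    return pow(digest_as_int(message), d, n)


def verify(message: bytes, signature: int, e: int = E, n: int = N) -> bool:
    """Anyone holding the public key (e, n) can check that s^e mod n == h."""
    return pow(signature, e, n) == digest_as_int(message)


sig = sign(b"contents of artifact.txt")
print(verify(b"contents of artifact.txt", sig))
print(verify(b"tampered contents", sig))
```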

Without any other variables in the mix, the level of trust that you may place in the above digital approaches would most probably be the following: (1-Lowest) SSH, (2) PGP and self-signed X.509, (3) DV X.509, and (4-Highest) EV X.509. Keep in mind that DV and EV X.509 are actually based on attestation, which is described next.

So, What is Attestation?

We finally got to it! Attestation, according to the Merriam-Webster dictionary, is an official verification of something as true or authentic. In the physical world, one can increase the level of trust in a signature by having a notary attest to the signature (lower level of trust) and adding a government apostille (higher level of trust, used internationally). In many states, notaries are required (or highly encouraged) to keep a log for tracking purposes. While you may be OK with having only my signature on a paper for a $50 loan, you would certainly want a notary attesting to a contract for selling your house to me for $500K. The level of trust in a signature increases when you add additional parties who attest to the signing process.

In the digital world, attestation is also present. As mentioned above, CAs act as digital notaries who verify the identity of the signer and issue digital certificates. This is done only for DV and EV X.509 certificates, though. There is no attestation for PGP, SSH, and self-signed X.509 certificates. For digital signatures, there is one more traditional method of attestation: the Timestamp Authority (TSA). The TSA’s role is to provide an accurate timestamp of the signing, which prevents tampering with the time by changing the clock on the computer where the signing occurs. Note that the TSA attests only to the accuracy of the signing timestamp and not to the identity of the signer. One important thing to remember here is that without attestation you cannot fully trust a signature.
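Here is a minimal Python sketch of why the TSA matters to a verifier. The signing time comes from a timestamp token rather than from the signer’s clock, and the verifier checks that this time falls inside the certificate’s validity window. To stay within the standard library, an HMAC stands in for the TSA’s own signature. Real timestamp authorities (RFC 3161) sign their tokens with a private key. Treat every name here (TSA_KEY, tsa_issue_token, signed_within_validity) and all the dates as hypothetical.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

# Hypothetical TSA secret. A real TSA signs with its private key instead.
TSA_KEY = b"hypothetical-tsa-key"


def _mac(body: dict) -> str:
    return hmac.new(TSA_KEY, json.dumps(body, sort_keys=True).encode(), hashlib.sha256).hexdigest()


def tsa_issue_token(artifact: bytes, now: datetime) -> dict:
    """The TSA binds the artifact hash to its own clock reading."""
    body = {"sha256": hashlib.sha256(artifact).hexdigest(), "time": now.isoformat()}
    return {"body": body, "mac": _mac(body)}


def signed_within_validity(artifact: bytes, token: dict,
                           not_before: datetime, not_after: datetime) -> bool:
    """Accept only if the token is intact, matches the artifact, and its time is in range."""
    body = token["body"]
    if not hmac.compare_digest(_mac(body), token["mac"]):
        return False  # the token was altered, e.g. someone edited the time
    if body["sha256"] != hashlib.sha256(artifact).hexdigest():
        return False  # the token belongs to a different artifact
    signed_at = datetime.fromisoformat(body["time"])
    return not_before <= signed_at <= not_after


cert_from = datetime(2022, 1, 1, tzinfo=timezone.utc)
cert_to = datetime(2022, 1, 21, tzinfo=timezone.utc)
token = tsa_issue_token(b"artifact.txt", datetime(2022, 1, 10, tzinfo=timezone.utc))
print(signed_within_validity(b"artifact.txt", token, cert_from, cert_to))
```

The check says nothing about who signed. It only narrows when the signing could have happened.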

Here is a summary of the signing approaches and the level of trust we discussed so far.

Signing Keys and Trust

Signing Approach     Level of Trust
SSH Key              1 - Lowest
PGP Key              2 - Low
X.509 Self-Signed    2 - Low
X.509 DV             3 - Medium
X.509 EV             4 - High

Now that we’ve established the basics, let’s talk about the validity period and why it matters.

Validity Period and Why It Matters

Every identification document that you own in the physical world has an expiration date. OK, I lied! I have a German driving license that doesn’t have an expiration date. But this is an exception, and I can claim that I am one of the last who had that privilege – newer driving licenses in Germany have an expiration date. US driving licenses have both an issue date and an expiration date. In the US, adults need to renew their passports every ten years. Different factors determine why an identification document may expire. For a driving license, the reason may be that you lost some of your vision and are not capable of driving anymore. For a passport, it may be because you moved to another country, became a citizen there, and gave up your US citizenship.

Now, let’s look at physical signatures. Let’s say that I want to issue a power of attorney to you to represent me in the sale of my house while I am on a business trip for four weeks in Europe. I have two options:

  • Write you a power of attorney without an expiration date and have a notary attest to it (otherwise nobody will believe that you can represent me).
  • Write you a power of attorney that expires four weeks from today and have a notary attest to it.

Which one do you think is more “secure” for me? Of course, the second one! The second power of attorney will give you only a limited period to sell my house. While this does not prevent you from selling it in a completely different transaction than the one I want, you are still given some time constraints. The counterparties in the transaction will check the power of attorney and note the expiration date. If a final meeting four weeks and a day from now requires you to sign the final papers for the transaction, they should not allow you to do so because the power of attorney is no longer valid.

Now, here is an interesting situation that often gets overlooked. Let’s say that I sign the power of attorney on Jan 1st, 2022. The power of attorney is valid until the end of the day on Jan 28th, 2022. I use my driving license to identify myself to the notary. My driving license has an expiration date of Jan 21st, 2022. Also, the notary’s license expires on Jan 24th, 2022. What is the last date that the power of attorney is valid? I will leave this exploration for one of the subsequent posts.

Time constraints are a basic measure to increase my security and prevent you from selling my house and pocketing the money later in the year. I will expand on this example in my next post, where I will look at different ways to exploit signatures. But the basic lesson here is: the more time you have to exploit something, the higher the probability that you will. Another lesson is: put an expiration date on all of your powers of attorney!

How does this look in the digital world?

  • SSH keys do not have expiration dates. Unless you provide the expiration date in the signature itself, the signature will be valid forever.
  • PGP keys have expiration dates a few years in the future. I just created a new key and it is set to expire on Jan 8th, 2026. If I sign an artifact with it and don’t provide an expiration date for the signature, it will be considered valid until Jan 8th, 2026.
  • X.509 certificates also have long expiration dates – 3, 12, or 24 months. Let’s Encrypt certificates expire after 90 days. Root CA certificates have even longer expiration dates, which can be dangerous, as we will explore in the future. Let’s Encrypt was the first to shorten the validity of its certificates, both to limit the impact of a certificate compromise and because domains change hands quite often. Enterprises followed suit because the number of stolen enterprise certificates is growing.
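The rule described in this list can be written down directly. A signature is valid until its own expiration date if it carries one, but never past the expiration date of the key that produced it, and a key with no expiration date (like a plain SSH key) imposes no limit. The Python sketch below encodes only that rule. The function names are hypothetical, and the example dates reuse the PGP key expiration mentioned above.

```python
from datetime import date
from typing import Optional


def signature_valid_until(key_expires: Optional[date],
                          signature_expires: Optional[date]) -> Optional[date]:
    """Return the last valid date, or None if the signature never expires."""
    candidates = [d for d in (key_expires, signature_expires) if d is not None]
    return min(candidates) if candidates else None


def is_valid_on(day: date, key_expires: Optional[date],
                signature_expires: Optional[date]) -> bool:
    """True if `day` is on or before the effective expiration (or there is none)."""
    until = signature_valid_until(key_expires, signature_expires)
    return until is None or day <= until


# SSH-style key with no expiry and no expiry in the signature: valid forever.
print(signature_valid_until(None, None))
# PGP-style key expiring on Jan 8th, 2026, signature without its own expiry.
print(signature_valid_until(date(2026, 1, 8), None))
# Same key, but the signer limits the signature to the end of 2024.
print(signature_valid_until(date(2026, 1, 8), date(2024, 12, 31)))
```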

Note: In the next post, I will expand a little more on the relationships between keys and signatures, but for now, you can think of them like the example above, where I mention the various validity periods of the documents used for the power of attorney.

Summary

If nothing else, here are the main takeaways that you should remember from this post:

  • Signatures alone cannot prove identity. Signatures can be forged, even in the digital world.
  • One identity can have many signatures. Those signatures can be used for different purposes.
  • For a period of time, a signature can imply identity if it is attested to. However, the more time passes, the lower the trust in this signature should be. Also, that period of time is subjective and depends on the risk level of the signature consumer.
  • To increase security, signatures must expire. The shorter the expiration period, the higher the security (but also other constraints should be put in place).
  • Before trusting a signature, you should verify if the signed asset is still trustable. This is in line with the zero-trust principle for security: “Never trust, always verify!”.

Take a note that in the last bullet point, I intentionally use the term “asset is trustable” and not “signature is valid”. In the next post, I will go into more detail about what that means, how signatures can be exploited, and how context can provide value.

Featured image by StockSnap.

If you’ve missed the news lately, cybersecurity is one of the most discussed topics nowadays. From supply chain exploits to data leaks to business email compromise (BEC), there is no break – especially during the pandemic. Many (if not all) of these start with an account compromise. And if you ask any cybersecurity expert, they will tell you that the best way to protect your account is to use two-factor (or multi-factor) authentication. Well, let me tell you a secret – MFA sucks! Ask the Okta guys! Even they think MFA sucks. And they are an identity and access management company. Though, Randall and I have different motives for making that claim.

By the way: 2FA stands for “two-factor authentication” while MFA stands for “multi-factor authentication”. I will use those two acronyms to save on some typing. And one more, TLA means a “three-letter acronym”.

Randall goes on and on in his post about why MFA sucks. Most of his points are valid! It is an annoying and frustrating experience. I don’t know about slow, but I would argue that it is not pointless – it serves a purpose, a very good purpose. Where he is mainly wrong is in thinking that the solution is yet another technology (well, the whole point of his post is to market Okta’s new technology, so he gets a pass for that 😉). This new technology will not address the source of the issue – that people are scared to use MFA. Take a look at the Twitter Account Security survey – why do you think only 2.3% (at the time of this writing) of all Twitter users have MFA enabled? Here is what I think the reasons are:

  • Complexity and lack of understanding of the technology
  • Fear of losing access to the accounts

I believe people are smart enough to grasp the benefits without too many explanations. What they are not clear on is how to set it up and how to make sure they don’t lose access to their accounts. In general, my frustration is with how technology vendors have implemented MFA – without any thought about the user experience. Let me illustrate what I mean with my own experience.

The Problem With Too Many MFAs

I have set up MFA on all my important accounts. The list is long: bank accounts, credit card accounts, stock brokerage accounts, government accounts (like taxes, DMV, etc.), email accounts (like Office 365 and Gmail), GitHub, Twitter, Facebook, you name it. I have been doing this for years and the list continues to grow. I am also required by former and current employers to use MFA for my work accounts – also a long list. Here is a list of MFA methods I use (and I don’t claim this to be a comprehensive list):

  • SMS
  • email (several different emails)
  • authenticator apps (this is where things get crazy)
    • Microsoft Authenticator
    • Google Authenticator (I started with this one)
    • Entrust
    • VIP Access
    • LastPass Authenticator
  • other vendor-specific implementations (I really don’t know what to call those, but they have their own way to do it)
    • Apple
    • WordPress
    • Facebook
  • YubiKey (I have three of those and I will ignore them in this post because the hardware key experience has been the same since the … 90s or so)

You probably think I am crazy, right? Why do I use so many authenticators and MFA methods? I don’t want to, but the problem is that I have to!

First of all, this all evolved over the 7-8 years since I started using MFA. It started with Google Authenticator; then came Microsoft Authenticator and the LastPass one; then a few banks added email and SMS (not very secure, but interestingly some very big financial institutions still don’t offer other options!!!), while others struck deals with specific vendors to use their own authenticators – hence, I had to install one-off apps for those accounts. A couple of years ago, I bought YubiKeys to secure my password managers and some important work and bank accounts.

At some point, I decided to unify on a single method! Or rather, a single authenticator app plus the YubiKey. That turned out to be impossible! Despite the fact that there is a standard, the authenticator vendors do not give you an easy way to transfer the seeds from one app to another. The only way to do that is to go over tens of accounts and register the new authenticator. Who wants to do that? Also, a few of my financial institutions do not offer a way to use a different MFA method than the one they approved. So, I am stuck with the one-offs, unless I want to switch to different financial institutions. But… who wants to do that?
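The standard most authenticator apps follow is TOTP (time-based one-time passwords, RFC 6238), and it shows why the lock-in is a product choice rather than a technical limitation: any app that holds the same seed computes the same codes. Below is a minimal Python sketch of TOTP using only the standard library. It assumes the common defaults (HMAC-SHA1, 30-second steps, 6 digits) and a Base32 seed, which is what most QR codes carry; vendor apps with their own provisioning may never expose the seed at all.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 of an 8-byte big-endian counter, then dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # the low 4 bits of the last byte choose a 4-byte window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF  # drop the sign bit
    return str(code % 10 ** digits).zfill(digits)


def totp(seed_b32: str, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP with the counter set to the number of time steps since the epoch."""
    cleaned = seed_b32.replace(" ", "").upper()
    secret = base64.b32decode(cleaned + "=" * (-len(cleaned) % 8))  # restore stripped padding
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)
```

For example, `totp("GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ", at=59, digits=8)` returns "94287082", the SHA-1 test vector from RFC 6238. Two different apps fed the same seed would show exactly that code at that moment.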

There is one more problem with using so many tools. Very often the systems that are set up with MFA ask you to “enter the code from your MFA app”. The question is: which MFA app did I register for that system? There is a debate about whether revealing which app generates the code compromises security. Some companies provide that information, while others don’t. If I were allowed to use a single authenticator app for all my accounts, I wouldn’t mind not knowing which tool I registered. But in the current situation, giving me a hint is a requirement for usability. Anyway, if my authenticator app is compromised, not telling me which app to use will make absolutely no difference.

Another problem with the authenticator apps is that (and this is prevalent in technology) the app thinks it knows best how to name things. If, for example, I have two GitHub accounts with MFA enabled, the authenticators will show them both as GitHub. What if I have ten? Figuring out which code belongs to which account is really hard.
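Part of the naming problem comes from what the service encodes in the QR code. Most authenticator apps understand the otpauth Key URI format popularized by Google Authenticator, `otpauth://totp/LABEL?secret=...&issuer=...`, where the label can carry both the issuer and the account name. The sketch below builds two such URIs for two hypothetical GitHub accounts with dummy seeds; whether a particular app actually displays the account part of the label is up to that app.

```python
from urllib.parse import quote, urlencode


def otpauth_uri(issuer: str, account: str, seed_b32: str) -> str:
    """Build a TOTP provisioning URI whose label is "issuer:account" (the colon is URL-encoded)."""
    label = quote(f"{issuer}:{account}")
    query = urlencode({"secret": seed_b32, "issuer": issuer})
    return f"otpauth://totp/{label}?{query}"


# Two hypothetical accounts at the same service: the issuer matches, the labels do not.
personal = otpauth_uri("GitHub", "personal-account", "JBSWY3DPEHPK3PXP")
work = otpauth_uri("GitHub", "work-account", "KRSXG5CTMVRXEZLU")
```

If the service leaves the account name out of the label, the app has nothing but the issuer to show, which is how you end up with ten entries all called GitHub.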

The Problem With Switching Phones

Wait! Should I say: “The Problem With Losing Phones?” No, the problem with losing phones is next!

This is where the complexity of the MFA approach starts to show! I don’t think any of the above-mentioned vendors really thought the whole user experience through. I will add Duo to that list because I used it and it also sucks. I will also note here that colleagues recommended Twilio’s Authy to me, but I am already so deep in my current (diverse) ecosystem that I have no desire to try one more of those.

My wife (who is not in technology) has an iPhone 7 and she strongly refuses to change it because of all the trouble she would need to go through to set up her apps on a new phone. I spent a lot of time convincing her to set up at least SMS-based MFA for her bank and credit card accounts, and I think that is the extent she will go to. After I switched from an iPhone 7 to the latest (and greatest) iPhone 13, I completely understand her fear. (As a side note: changing your phone every year is really, really, really bad for the environment. Be environmentally friendly and use your gadgets for a longer time! I have set a goal to use my phones and other gadgets for at least 5 years.) It has been a few months already, and I continue to use some of the authenticators on my old phone because it is such a pain to migrate them to the new one.

Let me quickly go over the experience one by one.

Moving MFA Apps From Phone to Phone

Moving Apple MFA to my new phone was seamless. At the end of the day, they are the kings of user experience, so this was expected. Moving WordPress and Facebook was also relatively simple – as long as you manage to sign in to your WordPress and Facebook accounts on your new phone, you will start getting the prompts there.

Moving the LastPass Authenticator should have been easy, but they really screwed up the flow between the actual LastPass app and the authenticator app. I was clicking like crazy back and forth, going in circles for quite some time until something magical happened and it started working on my new phone. For the accounts where I used Entrust, I had to go and register the new phone. Inconvenient, but at least it was self-service. The problems started when I got to VIP Access – I have to call my financial institution because they are the only ones who can register it. This will mean at least one hour on the phone.

Now, let’s get to the big ones!

Google Authenticator apparently has export functionality that allows you to export the seeds and import them into your new phone. If you know about it, it works like a charm too, but… I only recently learned about it from Dave Bittner and Joe Carrigan from The CyberWire.

Microsoft Authenticator should have been the easiest one (they claim). As long as you are signed in with your Microsoft Account, you should be able to get all the codes on your new phone. Well, kind of! This works for all MFA accounts except Microsoft work and school accounts. With all due respect to my colleagues from Azure Active Directory – the work and school account move sucks! You have to go and register the new phone for each of those accounts. Really disappointing!

“Insecure” MFA Methods When Switching Phones

Now, let me write about the “insecure” ways to use MFA!

As I mentioned above, I also use email and SMS for certain accounts. The email experience also sucks! OK, I will be honest – this is certainly my own problem and few people may have it, but this is my rant, so I will go with it. I have accumulated many email accounts over time (the reasons are a topic for another post). One or another email account is used for MFA depending on which email address I used to register on a particular website. Those emails are synchronized to a single email account but… the synchronization runs on a schedule – about every 30 minutes or so. Meanwhile, an MFA code normally expires within 10 minutes. So either I miss the time window to enter the code, or I need to log in to that particular email account to get the code, or I need to force the sync manually (yeah, I can do that, but it is annoying. Right, Randall?!). Switching my phone has nothing to do with my emails, so there is no impact.

And the last one – SMS! Well, SMS doesn’t suck! You heard me! SMS doesn’t suck! … most of the time. Sometimes you will not get the text message on time due to networking issues but it works perfectly 99.999% of the time. Oh, and if I switch my phone, it continues to work without any additional configurations, QR codes, or calling my telco – like magic 😉

The Problem With Losing Your Phone

Now, here is where things get serious! Or at least serious if you use mobile apps for MFA. If you lose your phone, you are screwed. You will be locked out of all or most of your accounts.

Apple is fine if you are in the Apple ecosystem. I have a few MacBooks, an iPad, and an Apple Watch – at least one of them will get me the MFA code.

With Facebook, I am screwed unless I am signed in to one of my browsers. For a person like me who uses Facebook once in a blue moon, the probability is low, so if I lose my phone, I am screwed. (Maybe that will push me over the edge to finally stop using them 🙂 ). I assume the WordPress story will be the same as with Facebook. Oh, and have you ever tried to get Facebook support on the phone to help you unlock your account? 🙂

About the other ones…

Backup Codes (or The Problem With Backup Codes)

Well, most of the systems allow you to print a bunch of backup codes that you can store “safely” so if you get locked out you can “easily” sign back in. I emphasize the words “safely” and “easily”. Here is why!

Storing Backup Codes Safely

Define “safely”! The experts recommend that you print your backup codes on paper and store them “safely” offline. I assume that they mean putting them in my safe at home, right? Because I will need easy access to them when I get locked out. It is questionable how safe that is, because burglaries are not uncommon. I had a colleague whose house got robbed, and he found his safe in the bushes behind his house – of course, empty! Also, most home safes are not fire- and waterproof. So, you need to buy a fireproof safe and cement it in your basement so no grown person (I mean teenager or older) can take it out with a crowbar.

Or maybe they mean putting them in a safe deposit box at the bank, where there is security and the probability of robbery is minuscule. But then, I need to run to the bank every time I get locked out.

In both cases, this is not easy, from both a logistics and a usability point of view. And what if I am on a trip and lose my phone (which is quite a realistic scenario)?

To avoid all this hassle, most of us find some workarounds. Here are a few for you (and be honest and admit that you do those too):

  • Saving the backup codes as files and putting them in a folder on your laptop.
  • Copying the backup codes and storing them in your password manager (together with your password – how secure is that? 🙂 )
  • Saving the backup codes as files and keeping them on a thumb drive in your drawer.
  • Saving them as files on your Dropbox, OneDrive, Google Drive, or another cloud drive.
  • You see where I am going with those…

To be honest, I didn’t even bother printing/saving the backup codes for some of my accounts (and not all systems offer that option), which I assume many of us do too.

Even if I print them and store them in my safe, I need to write down which account each set belongs to, and if they get stolen, all my accounts will be compromised.

Storing the Seeds in the Cloud

Some authenticators, like Microsoft Authenticator, keep your seeds in the cloud. Authy was recommended to me for the same reason. The idea is that if you lose your phone, you can sign in to your authenticator on your new phone and it will sync your seeds to the new phone. Magical, right? Yes, if you are able to sign in to the authenticator on your new phone… without your MFA code. So, you are caught in a vicious circle: if you lose your phone, you need an MFA code to sign in to your authenticator, but you have no way to get the MFA code.

The Solution (Backed by the Numbers)

What are you left with if you lose your phone? The only two MFA methods that work with a lost phone are email and SMS (because even if I lose my phone, I can easily keep my number). They are the least secure ones but carry the lowest risk of locking you out of your accounts.

I am not promoting the use of SMS and email as the second factor for authentication. But the numbers show that the majority of users who use MFA use SMS instead of an app or a hardware key (see the Twitter report). Let’s run some simple math:

  • Twitter has about 396.5M users.
  • 2.3% (at the time of this writing) use MFA for their Twitter account. This is ~9.12M MFA users.
  • 0.5% of those 9.12M use a hardware key. This is just ~45.6K hardware key users.
  • 30.9% of those 9.12M use an auth app. This is 2.82M auth app users.
  • 79.6% of those 9.12M use SMS. This is 7.26M SMS users. (The percentages add up to more than 100% because some users enable more than one method.)
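These numbers are easy to double-check. Here is a small Python sketch that recomputes the breakdown using only the figures quoted in the list above (the total user count and the shares from the Twitter report at the time of writing); the function name is just for illustration.

```python
def mfa_breakdown(total_users: float, mfa_share: float, method_shares: dict) -> dict:
    """Turn the report's percentages into absolute user counts."""
    mfa_users = total_users * mfa_share
    counts = {"any MFA": mfa_users}
    for method, share in method_shares.items():
        counts[method] = mfa_users * share  # each share is a fraction of MFA users, not all users
    return counts


counts = mfa_breakdown(396.5e6, 0.023, {"hardware key": 0.005, "auth app": 0.309, "SMS": 0.796})
for method, users in counts.items():
    print(f"{method:>12}: {users / 1e6:6.3f}M")
```

The hardware key line comes out at roughly 0.046M, i.e. about 45.6K users, and the three method shares sum to 111%, which only works if some users have more than one method enabled.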

It would be nice if Twitter had a way to break those numbers down by occupation (although that would be a violation of privacy). I am pretty sure they would show that the majority of people who use an auth app or a hardware key work in technology. Regular users who deem their accounts important protect them with SMS because SMS offers the easiest user experience.

One more thing about SMS. Because everybody is scared of locking themselves out of their accounts, people set up their authenticator app as the primary MFA tool but keep SMS as the backup. This way, if they lose their phone, they can still gain access to their account using SMS. But as we know, security is only as strong as its weakest link – in this case, SMS. Setting up an authenticator app just gives an illusion of security.

Summary

Using more than one factor for authentication is a MUST. Using stronger authenticators would be nice, but with the current experience, it will be hard to achieve. To convince more people to do that, companies need to offer a much friendlier experience to their users:

  • Freedom to choose authenticator app
  • Self-service
  • Easy recovery

Without those, adoption will stay at the current Twitter numbers.

MFA is yet another technology developed without the user in mind. Unfortunately, it is also a technology at the core of cybersecurity. It is a shame that security vendors continue to produce all kinds of new technologies (with fancy names like SOAR, SIEM, EDR, XDR, ML, AI) without fixing the basic user experience.


How often does the following happen to you? You write your client code, you call an API, and you receive a 404 Not found response. You start investigating the issue in your code, change a line here or there, and spend hours troubleshooting just to find out that the issue is on the server side and you can’t do anything about it. Well, welcome to the microservices world! A common mistake I often see developers make is returning an improper response code or passing through the response code from another service.

Let’s see how we can avoid this. But first, a crash course on modern applications implemented with microservices and HTTP status response codes.

How Do Modern Microservices Applications Work?

I will try to avoid going deep into the philosophical reasons why we need microservices and the benefits (or disadvantages) of using them. This is not the point of this post.

We will start with a simple picture.

Microservices Application

As you can see in the picture, we have a User that interacts with the Client Application that calls Microservice #1 to retrieve some information from the server (aka the cloud 🙂). The Client Application may need to call multiple (micro)services to retrieve all the information the User needs. Still, the part we will concentrate on is that Microservice #1 itself can call other services (Microservice #2 in this simple example) on the backend to perform its business logic. In a complex application (especially if not well architected), the chain of service calls may go well beyond two. But let’s stick with two for now. Also, let’s assume that Microservice #1 and Microservice #2 use REST, and their responses use the HTTP response status codes.

A basic call flow can be something like this. I also include the appropriate HTTP status response codes in each step.

  1. The User clicks on a button in the Client Application.
  2. The Client Application makes an HTTP request to Microservice #1.
  3. Microservice #1 needs additional business logic to complete the request and makes an HTTP call to Microservice #2.
  4. Microservice #2 performs the additional business logic and responds to Microservice #1 using a 200 OK response code.
  5. Microservice #1 completes the business logic and responds to the Client Application with a 200 OK response code.
  6. The Client Application performs the action that is attached to the button, and the User is happy.

This is the so-called happy path. Everybody expects the flow to be executed as described above. If everything goes as planned, we don’t need to think about it anymore and can implement the functionality behind the next button. Unfortunately, things often don’t go as planned.

What Can Go Wrong?

Many things! Or at a minimum, the following:

  1. The Client Application fails because of a bug before it even calls Microservice #1.
  2. The Client Application sends invalid input when calling Microservice #1.
  3. Microservice #1 fails before calling Microservice #2.
  4. Microservice #1 sends invalid input when calling Microservice #2.
  5. Microservice #2 fails while performing its business logic.
  6. Microservice #1 fails after calling Microservice #2.
  7. The Client Application fails after Microservice #1 responds.

For those cases (non-happy path? or maybe sad-path? 😉), the designers of the HTTP protocol wisely specified two separate sets of response codes:

  • 4xx client errors (status codes 400 to 499)
  • 5xx server errors (status codes 500 to 599)

The guidance for those is quite simple:

  • Client errors should be returned if the client did something wrong. In such cases, the client can change the parameters of the request and fix the issue. The important thing to remember is that the client can fix the issue without any changes on the server-side.
    A typical example is the famous 404 Not found error. If you (the user) mistype the URL path in the browser address bar, the browser (the client application) will request the wrong resource from the server (Microservice #1 in this case). The server (Microservice #1) will respond with a 404 Not found error to the browser (the client application) and the browser will show you “Oops, we couldn’t find the page” message. Well, in the past the browser just showed you the 404 Not found error but we learned a long time ago that this is not user-friendly (You see where I am going with this, right?).
  • Server errors should be returned if the issue occurred on the server-side and the client (and the user) cannot do anything to fix it.
    A simple example is a wrong connection string in the service configuration (Microservice #1 in our case). If the connection string used to configure Microservice #1 with the endpoint and credentials for Microservice #2 is wrong, the client application and the user cannot do anything to fix it. The most appropriate error to return in this case would be 500 Internal server error.
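The rule of thumb boils down to the first digit of the status code. Here is a minimal Python sketch (the function name is mine, for illustration) that answers “who can fix this?” for a given code; the standard library's http.HTTPStatus is used only to look up the reason phrase.

```python
from http import HTTPStatus


def who_can_fix(status: int) -> str:
    """Map a status code to the party that can act on it, following the 4xx/5xx guidance."""
    if 400 <= status <= 499:
        return "client"  # the client can change its request and try again
    if 500 <= status <= 599:
        return "server"  # only the owner of the service can fix it
    return "nobody"  # not an error (1xx, 2xx, 3xx)


for code in (HTTPStatus.NOT_FOUND, HTTPStatus.INTERNAL_SERVER_ERROR):
    print(code.value, code.phrase, "->", who_can_fix(code))
```

This prints "404 Not Found -> client" and "500 Internal Server Error -> server", which matches the mistyped-URL and wrong-connection-string examples.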

Pretty simple and logical, right? However, one thing we as engineers often forget is who the client is and who the server is.

So, Who Is the Client and Who Is the Server?

First, the client and server are two system components that interact directly with each other (think, no intermediaries). If we take the picture from above and change the labels of the arrows, it becomes pretty obvious.

Microservices Application Clients and Servers

We have three clients and three servers:

  • The user is a client of the client application, and the client application is a server for the user.
  • The client application is a client of Microservice #1, and Microservice #1 is a server for the client application.
  • Microservice #1 is a client of Microservice #2, and Microservice #2 is a server for Microservice #1.

Having this picture in mind, the engineers implementing each of the microservices should think about the most appropriate response code for their immediate client using the guidelines above. It is easiest to explain with examples what response codes each service should return in different situations.

What HTTP Response Codes Should Microservices Return?

A few days ago I was discussing the following situation with one of our engineers. Our service, Azure Container Registry (ACR), has a security feature allowing customers to encrypt their container images using customer-managed keys (CMK). For this feature to work, customers need to upload a key to Azure Key Vault (AKV). When the Docker client tries to pull an image, ACR retrieves the key from AKV, decrypts the image, and sends it back to the Docker client. (BTW, I know that ACR and AKV are not microservices 🙂) Here is a visual:

Docker pull encrypted image from ACR

In the happy-path scenario, everything works as expected. However, a customer submitted a support request complaining that he is not able to pull his images from ACR. When he tries to pull an image using the Docker client, he receives a 404 Not found error, but when he checks in the Azure Portal, he is able to see the image in the list.

Because the customer couldn’t figure it out by himself, he submitted a support request. The support engineer was also not able to figure out the issue, and had to escalate to the product group. It turned out that the customer deleted the Key Vault and ACR was not able to retrieve the key to decrypt the image. However, the implemented flow looked like this:

  1. Docker client requests an image from ACR.
  2. ACR sees that the image is encrypted and requests the key from the Key Vault.
  3. The Azure Key Vault service looks up the key and figures out that the key (or the whole Key Vault) is missing.
  4. Azure Key Vault returns 404 Not found to ACR for the key ACR tries to access.
  5. ACR passes through the 404 Not found error to the Docker client.
  6. Docker client shows a message to the user that the image cannot be found.

The end result: everybody is confused! Why?

Where Does the Confusion Come From?

The investigation chain goes from left to right: Docker client –> ACR –> AKV. Both the customer and the support engineer were focused on figuring out why the image was missing in ACR. They were looking only at the Docker client –> ACR part of the chain. The customer’s assumption was that the Docker client was doing something wrong, i.e. requesting the wrong image. This would be the correct assumption because 404 Not found is a client error telling the client that it is requesting something that doesn’t exist. Hence, the customer checked the portal, and when he saw the image in the list, he was puzzled. The next assumption was that something was wrong on the ACR side. This is where the customer decided to submit a support request for somebody to check whether the data in ACR was corrupted. The support engineer checked the ACR backend, and all the data was in sync.

This is a great example of how the wrong HTTP response code can send the whole investigation down a rabbit hole. To avoid that, here is the guidance! Microservices should return response codes that are relevant to the business logic they implement and that help the client take appropriate actions. “Well,” you will say, “isn’t that the whole point of HTTP status response codes?” It is! But for whatever reasons, we continue to break this rule. The key words in the above guidance are “the business logic they implement”, not the business logic of the services they call. (By the way, the same applies to exceptions. You don’t catch the generic Exception, you catch SpecificException. You don’t pass exceptions through; you catch them and wrap them in a way that is useful for the calling code.)
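The exception analogy can be sketched in a few lines of Python. In this minimal example, a hypothetical key-store lookup raises KeyNotFoundError, and the image-facing function catches that specific exception and re-raises an error that talks about images. `raise ... from` keeps the original error attached as `__cause__`, so nothing is lost for troubleshooting. All names here are illustrative, not a real ACR or Key Vault API.

```python
class KeyNotFoundError(Exception):
    """Hypothetical error from a key store: it only knows about keys."""


class ImageUnavailableError(Exception):
    """Hypothetical error for callers that asked for an image, not a key."""


def fetch_key(key_store: dict, key_id: str) -> bytes:
    if key_id not in key_store:
        raise KeyNotFoundError(f"key {key_id!r} not found")
    return key_store[key_id]


def pull_image(key_store: dict, image: str, key_id: str) -> bytes:
    try:
        key = fetch_key(key_store, key_id)
    except KeyNotFoundError as exc:  # catch the specific error, not a generic Exception
        # Re-raise in terms of the caller's resource (the image) and keep the original as the cause.
        raise ImageUnavailableError(f"cannot decrypt image {image!r} on the server side") from exc
    return key  # stand-in: real code would use the key to decrypt the image
```

The caller sees an image-level error, while whoever debugs the service can still walk `__cause__` down to the missing key.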

Business Logic and Friendly HTTP Response Codes

Think about the business logic of each one of the services above!

One way to decide which HTTP response code to return is to think about the resource your microservice is handling. ACR is the service responsible for handling container images. The business logic that ACR implements should provide status codes relevant to the “business” of images. Azure Key Vault implements business logic that handles keys, secrets, and certificates (not images), so Key Vault should return status codes that are relevant to keys, secrets, and certificates. Azure Key Vault is a downstream service and cannot know what the key is used for, hence it cannot tell the upstream client (Docker) what the error means. It is the responsibility of ACR to provide the appropriate status code to the upstream client.

Here is how the flow in the above scenario should be implemented:

  1. Docker client requests an image from ACR.
  2. ACR sees that the image is encrypted and requests the key from the Key Vault.
  3. The Azure Key Vault service looks up the key and figures out that the key (or the whole Key Vault) is missing.
  4. Azure Key Vault returns 404 Not found to ACR for the key ACR tries to access.
  5. ACR handles the 404 Not found from Azure Key Vault but wraps it in an error that is relevant to the requested image.
  6. Instead of 404 Not found, ACR returns 500 Internal server error with a message clarifying the issue.
  7. Docker client shows a message to the user that it cannot pull the image because of an issue on the server.

The Q&A Approach

Another way to decide what response code to return is to take the Questions-and-Answers approach and build a simple IF-THEN logic (aka decision tree). Here is how this can work for our example:

  • Docker: Pull image from ACR
    • ACR: Q: Is the image available?
      • A: Yes
        (Note to myself: Requesting the image cannot be a client error anymore.)

        • Q: Is the image encrypted?
          • A: Yes
            • ACR: Request the key from Key Vault
              • AKV: Q: Is the key available?
                • A: Yes
                  • AKV: Return the key to ACR
                • A: No
                  • AKV: Return 404 [key] Not found error
            • ACR: Q: Did I get a key?
              • A: Yes
                • ACR: Decrypt the image
                • ACR: Return 200 OK with the image payload
              • A: No (I got 404 [key] Not found)
                • ACR: I cannot decrypt the image
                  (Note to myself: There is nothing the client did wrong! It is all the server’s fault.)
                • ACR: Return 500 Internal server error “I cannot decrypt the image”
          • A: No (image is not encrypted)
            • ACR: Return 200 OK with the image payload
      • A: No (image does not exist)
        • ACR: Return 404 [image] Not found error
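The Q&A tree above translates almost line by line into code. Here is a minimal Python sketch under simplifying assumptions: images and keys are hypothetical in-memory dictionaries, an image is a dict with a payload and an optional key_id, “decryption” is just a string, and authentication and authorization checks are left out, as in the tree. The branch that matters is the one that turns the key service's 404 into a 500 for the image client.

```python
from http import HTTPStatus
from typing import Optional, Tuple


def get_key(keys: dict, key_id: str) -> Tuple[HTTPStatus, Optional[str]]:
    """Key service (hypothetical): answers only in terms of keys."""
    if key_id in keys:
        return HTTPStatus.OK, keys[key_id]
    return HTTPStatus.NOT_FOUND, None  # 404 [key] Not found


def pull_image(images: dict, keys: dict, name: str) -> Tuple[HTTPStatus, str]:
    """Registry (hypothetical): answers in terms of images, following the Q&A tree."""
    image = images.get(name)
    if image is None:
        return HTTPStatus.NOT_FOUND, f"image {name!r} not found"  # the client can fix the name
    # The image exists, so from here on nothing is the client's fault.
    key_id = image.get("key_id")
    if key_id is None:
        return HTTPStatus.OK, image["payload"]  # not encrypted
    status, key = get_key(keys, key_id)
    if status != HTTPStatus.OK:
        # Do not pass the key service's 404 through; report a server-side problem instead.
        return HTTPStatus.INTERNAL_SERVER_ERROR, f"cannot decrypt image {name!r}"
    return HTTPStatus.OK, f"{image['payload']} (decrypted with {key})"
```

Each `return` corresponds to one leaf of the tree, which makes it easy to review whether every leaf uses a status code about images rather than about keys.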

Note that the above flow is simplified. For example, in a real implementation, you may need to check if the client is authenticated and authorized to pull the image. Nevertheless, the concept is the same – you will just need to have more Q&As.

Summary

As you can see, it is important to be careful about which HTTP response codes you return from your microservices. If you return the wrong one, you may end up with more work than you expect. Here are the main points worth remembering:

  • Return 4xx errors only if the client can do something to fix the issue. If the client cannot do anything to fix it, 5xx errors are the only appropriate ones.
  • Do not pass through the response codes you receive from downstream services. Handle each response from downstream services and wrap it according to the business logic you are implementing.
  • When implementing your services, think about the resource you are handling in those services. Return HTTP status response codes that are relevant to the resource you are handling.
  • Use the Q&A approach to decide what is the appropriate response code to return for your service and the resource that is requested by the client.

By using those guidelines, your microservices will become more friendly and easier to troubleshoot.

Featured image by Nick Page on Unsplash

In the last few months, I have started seeing more and more customers using Azure Container Registry (or ACR) for storing their Helm charts. However, many of them are confused about how to properly push and use the charts stored in ACR. So, in this post, I will document a few things that need the most clarification. Let’s start with some definitions!

Helm 2 and Helm 3 – what are those?

Before we even start!

Helm 2 is NOT supported and you should not use it! Period! If you need more details just read Helm’s blog post Helm 2 and the Charts Project Are Now Unsupported from Fri, Nov 13, 2020.

A nice date to choose for that announcement 🙂

OK, really – what are Helm 2 and Helm 3? When somebody says Helm 2 or Helm 3, they most often mean the version of the Helm CLI (i.e., Command Line Interface). The easiest way to check what version of the Helm CLI you have is to type:

$ helm version

in your terminal. If the version is v2.x.x, then you have Helm (CLI) 2; if the version is v3.x.x, then you have Helm (CLI) 3. But it is not that simple! You should also consider the chart API version. This is the apiVersion field that you specify at the top of your chart’s Chart.yaml file – you can read more about it in the Helm Charts documentation. Here is a table that can come in handy:

apiVersion   Helm 2 CLI support   Helm 3 CLI support
v1           Yes                  Yes
v2           No                   Yes

What this table tells you is that Helm 2 CLI supports apiVersion v1, while Helm 3 CLI supports apiVersion v1 and v2. You should check the Helm Charts documentation linked above if you need more details about the differences, but the important thing to remember here is that Helm 3 CLI supports old charts, and (once again) there is no reason for you to use Helm 2.
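If you want to check this combination in a script, here is a minimal Python sketch. It assumes you capture the output of `helm version` as text (such as the v2.17.0 output shown later in this post, or any text containing a vX.Y.Z version) and read Chart.yaml as plain text. The apiVersion lookup is a naive line scan rather than a real YAML parser, and all function names are hypothetical.

```python
import re

# Chart apiVersions each Helm CLI major version understands, per the table above.
SUPPORTED_API_VERSIONS = {2: {"v1"}, 3: {"v1", "v2"}}


def helm_cli_major(version_output: str) -> int:
    """Return the major version from text such as 'v3.5.0' or 'SemVer:"v2.17.0"'."""
    match = re.search(r"\bv(\d+)\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError("no vX.Y.Z version found in helm version output")
    return int(match.group(1))


def chart_api_version(chart_yaml: str) -> str:
    """Naively read the top-level apiVersion field from Chart.yaml text."""
    for line in chart_yaml.splitlines():
        if line.startswith("apiVersion:"):
            return line.split(":", 1)[1].strip().strip("'\"")
    raise ValueError("apiVersion not found in Chart.yaml")


def cli_supports_chart(version_output: str, chart_yaml: str) -> bool:
    """True if the Helm CLI that produced version_output can handle the chart's apiVersion."""
    supported = SUPPORTED_API_VERSIONS.get(helm_cli_major(version_output), set())
    return chart_api_version(chart_yaml) in supported
```

With a Helm 2 CLI, a chart declaring `apiVersion: v2` comes back as unsupported, which is one more reason to move to Helm 3.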

We’ve cleared up (I hope) the confusion around Helm 2 and Helm 3. Let’s see how ACR handles Helm charts. I will walk you through each of those experiences step by step.

ACR and Helm 2

Azure Container Registry allows you to store Helm charts using the Helm 2 way (What? I will explain in a bit). However:

Helm 2 is NOT supported and you should not use it!

Now that you have been warned (twice! or was it three times?) let’s see how ACR handles Helm 2 charts. To avoid any ambiguity, I will use Helm CLI v2.17.0 for this exercise. At the time of this writing, it is the last published version of Helm 2.

$ helm version
Client: &version.Version{SemVer:"v2.17.0", GitCommit:"a690bad98af45b015bd3da1a41f6218b1a451dbe", GitTreeState:"clean"}

Initializing Helm and the Repository List

If you have a brand new installation of the Helm 2 CLI, you should initialize Helm and add the ACR to your repository list. You start with:

$ helm init
$ helm repo list
NAME     URL
stable   https://charts.helm.sh/stable
local    http://127.0.0.1:8879/charts

to initialize Helm and see the list of available repositories. Then you can add your ACR to the list by typing:

$ helm repo add --username <acr_username> --password <acr_password> <repo_name> https://<acr_login_server>/helm/v1/repo

For me, this looked like this:

$ helm repo add --username <myacr_username> --password <myacr_password> acrrepo https://tsmacrtestwus2acrhelm.azurecr.io/helm/v1/repo

Here is something very important: you must use the /helm/v1/repo path! If you use a different path, you will see either a 404: Not found error (for example, if you use the root URL without a path) or a 403: Forbidden error (for example, if you rename the repo part to something else).
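To avoid those 404 and 403 surprises in scripts, you can normalize the URL before calling `helm repo add`. A minimal Python sketch, with a hypothetical helper name, that accepts either a bare login server or a URL and always produces the /helm/v1/repo form:

```python
from urllib.parse import urlsplit

ACR_HELM_REPO_PATH = "/helm/v1/repo"  # the path ACR expects for Helm 2-style chart repos


def acr_helm_repo_url(login_server_or_url: str) -> str:
    """Return https://<login_server>/helm/v1/repo for inputs like 'myacr.azurecr.io' or a full URL."""
    value = login_server_or_url.strip()
    host = urlsplit(value).netloc if "://" in value else value.split("/")[0]
    if not host:
        raise ValueError("empty login server")
    return f"https://{host}{ACR_HELM_REPO_PATH}"
```

Any path that was passed in is discarded on purpose, since the repo path is fixed and only the login server varies.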

I also need to make a side note here because the authentication can be a bit tricky. The following section applies to both Helm 2 and Helm 3.

Signing In To ACR Using the Helm (any version) CLI

Before you push and pull charts from ACR, you need to sign in. There are a few different options that you can use to sign in to ACR using the CLI:

  • Using ACR Admin user (not recommended)
    If you have ACR Admin user enabled, you can use the Admin user username and password to sign in to ACR by simply specifying the --username and --password parameters for the Helm command.
  • Using a Service Principal (SP)
    If you need to push and pull charts using automation, you most probably have already set up a service principal for that. You can authenticate with the SP credentials by passing the app ID in the --username parameter and the client secret in the --password parameter for the Helm command. Make sure you assign the appropriate role to the service principal to allow access to your registry.
  • Using your own (user) credentials
    This one is the tricky one; it is described in the az acr login with --expose-token section of the Authentication overview article in the ACR docs. For this one, you must use the Azure CLI to obtain the token. Here are the steps:

    • Use the Azure CLI to sign in to Azure using your own credentials:
      $ az login

      This will pop up a browser window or give you a URL with a special code to use.

    • Next, sign in to ACR using the Azure CLI and add the --expose-token parameter:
      $ az acr login --name <acr_name_or_login_server> --expose-token

      This will sign you into ACR and will print an access token that you can use to sign in with other tools.

    • Last, you can sign in using the Helm CLI by passing a GUID-like string with zeros only (exactly this string 00000000-0000-0000-0000-000000000000) in the --username parameter and the access token in the --password parameter. Here is what the command to add the Helm repository looks like:
      $ helm repo add --username "00000000-0000-0000-0000-000000000000" --password "eyJhbGciOiJSUzI1NiIs[...]24V7wA" <repo_name> https://<acr_login_server>/helm/v1/repo
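The three steps above can be chained in a small shell function so the token never has to be copied by hand. This is a sketch under a few assumptions: you already ran az login, the registry lives in the public Azure cloud (so its login server is <name>.azurecr.io), and `acr_helm_repo_add_with_token` is a hypothetical name. It uses the Azure CLI's standard --query and --output arguments to keep only the accessToken field of the JSON that --expose-token prints:

```shell
# Hypothetical helper: add an ACR chart repo using your own Azure AD identity.
# Assumes `az login` already succeeded and both az and helm are on PATH.
ACR_NULL_GUID="00000000-0000-0000-0000-000000000000"

acr_helm_repo_add_with_token() {
  acr_name="$1"    # registry name, e.g. tsmacrtestwus2acrhelm (not the full login server)
  repo_name="$2"   # local alias for the repository, e.g. acrrepo
  # --expose-token prints JSON; --query/--output reduce it to the bare token
  token="$(az acr login --name "$acr_name" --expose-token \
             --query accessToken --output tsv)" || return 1
  # The all-zeros GUID username tells ACR that the password is an access token
  helm repo add --username "$ACR_NULL_GUID" --password "$token" \
    "$repo_name" "https://${acr_name}.azurecr.io/helm/v1/repo"
}
```

For example, acr_helm_repo_add_with_token tsmacrtestwus2acrhelm acrrepo. The access token is not permanent, so expect to repeat this step once it expires.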

Creating and Packaging Charts with Helm 2 CLI

Helm 2 doesn’t have an out-of-the-box experience for pushing charts to a remote chart registry. You may assume that the helm-push plugin does that, but you would be wrong: this plugin only allows you to push charts to ChartMuseum (you can try to use it to push to any repo, but it will fail; a topic for another story). Helm’s guidance on how chart repositories should work is described in the documentation (… and this is the Helm 2 way that I mentioned above):

  • According to the Chart Repositories article in the Helm documentation, a repository is a simple web server that serves an index.yaml file pointing to the chart TAR archives. The TAR archives can be served by the same web server or from other locations like Azure Storage.
  • The Store charts in your chart repository article describes the process of generating the index.yaml file and uploading the necessary artifacts to static storage to serve them.
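To make the Helm 2 way concrete, here is a minimal sketch that builds such a static repository with the Helm CLI's helm package --destination and helm repo index --url commands. The function name `build_static_chart_repo` is hypothetical, and uploading the resulting folder to a web server or Azure Storage is left out:

```shell
# Hypothetical helper: produce a static chart repository (index.yaml + .tgz files).
# Assumes the helm CLI is on PATH.
build_static_chart_repo() {
  chart_dir="$1"   # unpacked chart, e.g. ./helm-test-chart-v2
  out_dir="$2"     # folder to upload to a web server or blob storage
  base_url="$3"    # public URL where out_dir will be served
  mkdir -p "$out_dir" || return 1
  # Writes <name>-<version>.tgz into out_dir
  helm package "$chart_dir" --destination "$out_dir" || return 1
  # Generates out_dir/index.yaml with chart URLs rooted at base_url
  helm repo index "$out_dir" --url "$base_url"
}
```

For example, build_static_chart_repo ./helm-test-chart-v2 ./repo https://example.com/charts produces ./repo/helm-test-chart-v2-0.1.0.tgz and ./repo/index.yaml, with example.com standing in for wherever you host the files.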

Disclaimer: the term Helm 2 way is my own term based on my interpretation of how things work. It allows me to distinguish between the two different approaches to saving charts. It is not an industry term, nor something that Helm refers to or uses.

I have created a simple chart called helm-test-chart-v2 on my local machine to test the push. Here is the output from the commands:

$ helm create helm-test-chart-v2
Creating helm-test-chart-v2

$ ls -al ./helm-test-chart-v2/
total 28
drwxr-xr-x 4 azurevmuser azurevmuser 4096 Aug 16 16:44 .
drwxr-xr-x 6 azurevmuser azurevmuser 4096 Aug 17 16:29 ..
-rw-r--r-- 1 azurevmuser azurevmuser 342 Aug 16 16:44 .helmignore
-rw-r--r-- 1 azurevmuser azurevmuser 114 Aug 16 16:44 Chart.yaml
drwxr-xr-x 2 azurevmuser azurevmuser 4096 Aug 16 16:44 charts
drwxr-xr-x 3 azurevmuser azurevmuser 4096 Aug 16 16:44 templates
-rw-r--r-- 1 azurevmuser azurevmuser 1519 Aug 16 16:44 values.yaml

$ helm package ./helm-test-chart-v2/
Successfully packaged chart and saved it to: /home/azurevmuser/helm-test-chart-v2-0.1.0.tgz

$ ls -al
total 48
drwxr-xr-x 6 azurevmuser azurevmuser 4096 Aug 17 16:31 .
drwxr-xr-x 3 root root 4096 Aug 14 14:12 ..
-rw------- 1 azurevmuser azurevmuser 780 Aug 15 22:48 .bash_history
-rw-r--r-- 1 azurevmuser azurevmuser 220 Feb 25 2020 .bash_logout
-rw-r--r-- 1 azurevmuser azurevmuser 3771 Feb 25 2020 .bashrc
drwx------ 2 azurevmuser azurevmuser 4096 Aug 14 14:15 .cache
drwxr-xr-x 6 azurevmuser azurevmuser 4096 Aug 15 21:46 .helm
-rw-r--r-- 1 azurevmuser azurevmuser 807 Feb 25 2020 .profile
drwx------ 2 azurevmuser azurevmuser 4096 Aug 14 14:12 .ssh
-rw-r--r-- 1 azurevmuser azurevmuser 0 Aug 14 14:18 .sudo_as_admin_successful
-rw------- 1 azurevmuser azurevmuser 1559 Aug 14 14:26 .viminfo
drwxr-xr-x 4 azurevmuser azurevmuser 4096 Aug 16 16:44 helm-test-chart-v2
-rw-rw-r-- 1 azurevmuser azurevmuser 3269 Aug 17 16:31 helm-test-chart-v2-0.1.0.tgz

Because Helm 2 doesn’t have push functionality for charts, the implementation is left up to the vendors. ACR provides a proprietary implementation of chart push (already deprecated, which is another reason not to use Helm 2) that is built into the ACR CLI.

Pushing and Pulling Charts from ACR Using Azure CLI (Helm 2)

Let’s take a look at how you can push Helm 2 charts to ACR using the ACR CLI. First, you need to sign in to Azure, and then to your ACR. Yes, that’s right: you need two different commands to sign in to ACR. Here is what this looks like for my ACR registry:

$ az login
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code AABBCCDDE to authenticate.
[
  {
    "cloudName": "AzureCloud",
    "homeTenantId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
    "id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
    "isDefault": true,
    "managedByTenants": [],
    "name": "ToddySM Sandbox",
    "state": "Enabled",
    "tenantId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
    "user": {
        "name": "toddysm_XXXXXXXX@outlook.com",
        "type": "user"
    }
  }
]
$ az acr login --name tsmacrtestwus2acrhelm.azurecr.io
The login server endpoint suffix '.azurecr.io' is automatically omitted.
You may want to use 'az acr login -n tsmacrtestwus2acrhelm --expose-token' to get an access token, which does not require Docker to be installed.
An error occurred: DOCKER_COMMAND_ERROR
Please verify if Docker client is installed and running.

Well, I do not have Docker running, but this is OK – you don’t need Docker to push the Helm chart. However, the error may be confusing because it leaves the impression that you are not signed in to the ACR registry.

We will push the chart that we already packaged, using the (deprecated) built-in command in the ACR CLI. Here is the output:

$ az acr helm push --name tsmacrtestwus2acrhelm.azurecr.io helm-test-chart-v2-0.1.0.tgz
This command is implicitly deprecated because command group 'acr helm' is deprecated and will be removed in a future release. Use 'helm v3' instead.
The login server endpoint suffix '.azurecr.io' is automatically omitted.
{
  "saved": true
}

This seems to be successful, and I have a Helm chart pushed to ACR the Helm 2 way (i.e., using the proprietary and deprecated ACR CLI implementation). The problem is that it is hard to verify that the chart was pushed to the ACR registry. If you go to the portal, you will not see the repository that contains the chart. Here is a screenshot of my registry view in the Azure portal after I pushed the chart:

Azure Portal Not Listing Helm 2 Charts

As you can see, the Helm 2 chart repository doesn’t appear in the list of repositories in the Azure portal, and you will not be able to browse the charts in that repository using the Azure portal. However, if you use the Helm command to search for the chart, the result will include the ACR repository. Here is the output from the command in my environment:

$ helm search helm-test-chart-v2
NAME                           CHART VERSION        APP VERSION        DESCRIPTION
acrrepo/helm-test-chart-v2     0.1.0                1.0                A Helm chart for Kubernetes
local/helm-test-chart-v2       0.1.0                1.0                A Helm chart for Kubernetes
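Since helm search is the only verification available here, the check can be scripted. The sketch below reads helm search output on stdin and succeeds only if <repo>/<chart> appears in the NAME column. `chart_listed` is a hypothetical name, and it assumes the header-plus-rows layout shown above (the Helm 3 helm search repo output later in this post has the same layout):

```shell
# Hypothetical helper: succeed (exit 0) if "<repo>/<chart>" is listed in the
# NAME column of `helm search` output read from stdin; fail otherwise.
chart_listed() {
  repo="$1"
  chart="$2"
  # Skip the header row, compare the first column, report via exit status
  awk -v ref="$repo/$chart" 'NR > 1 && $1 == ref { found = 1 } END { exit !found }'
}
```

For example: helm search helm-test-chart-v2 | chart_listed acrrepo helm-test-chart-v2 && echo "chart is in ACR".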

Summary of the ACR and Helm 2 Experience

To summarize the ACR and Helm 2 experience, here are the main takeaways:

  • First, you should not use Helm 2 CLI and the proprietary ACR CLI implementation for working with Helm charts!
  • There is no push functionality for charts in the Helm 2 client, and each vendor implements its own CLI for pushing charts to remote repositories.
  • When you add an ACR repository using the Helm 2 CLI, use the following URL format: https://<acr_login_server>/helm/v1/repo
  • If you push a chart to ACR using the ACR CLI implementation you will not see the chart in Azure Portal. The only way to verify that the chart is pushed to the ACR repository is to use the helm search command.

ACR and Helm 3

Once again, to avoid any ambiguity, I will use Helm CLI v3.6.2 for this exercise. Here is the complete version string:

PS C:> helm version
version.BuildInfo{Version:"v3.6.2", GitCommit:"ee407bdf364942bcb8e8c665f82e15aa28009b71", GitTreeState:"clean", GoVersion:"go1.16.5"}

Yes, I ran this one in a PowerShell terminal 🙂 And, of course, not in the root folder 😉 You can convert the commands to the corresponding Linux commands and prompts.

Let’s start with the basics!

Creating and Packaging Charts with Helm 3 CLI

There is absolutely no difference between the Helm 2 and Helm 3 experience for creating and packaging a chart. Here is the output:

PS C:> helm create helm-test-chart-v3
Creating helm-test-chart-v3

PS C:> ls .\helm-test-chart-v3\

    Directory: C:\Users\memladen\Documents\Development\Local\helm-test-chart-v3

Mode         LastWriteTime         Length         Name
----         -------------         ------         ----
d----    8/17/2021 9:42 PM                        charts
d----    8/17/2021 9:42 PM                        templates
-a---    8/17/2021 9:42 PM            349         .helmignore
-a---    8/17/2021 9:42 PM           1154         Chart.yaml
-a---    8/17/2021 9:42 PM           1885         values.yaml

PS C:> helm package .\helm-test-chart-v3\
Successfully packaged chart and saved it to: C:\Users\memladen\Documents\Development\Local\helm-test-chart-v3-0.1.0.tgz

PS C:> ls helm-test-*

    Directory: C:\Users\memladen\Documents\Development\Local

Mode         LastWriteTime         Length         Name
----         -------------         ------         ----
d----    8/17/2021 9:42 PM                        helm-test-chart-v3
-a---    8/17/2021 9:51 PM           3766         helm-test-chart-v3-0.1.0.tgz

From here on, though, things can get confusing! The reason is that you have two separate options to work with charts using Helm 3.

Using Helm 3 to Push and Pull Charts the Helm 2 Way

You can use Helm 3 to push the charts the same way you do that with Helm 2. First, you add the repo:

PS C:> helm repo add --username <myacr_username> --password <myacr_password> acrrepo https://tsmacrtestwus2acrhelm.azurecr.io/helm/v1/repo
"acrrepo" has been added to your repositories

PS C:> helm repo list
NAME         URL
microsoft    https://microsoft.github.io/charts/repo
acrrepo      https://tsmacrtestwus2acrhelm.azurecr.io/helm/v1/repo

Then, you can update the repositories and search for a chart:

PS C:> helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "microsoft" chart repository
...Successfully got an update from the "acrrepo" chart repository
Update Complete. ⎈Happy Helming!⎈

PS C:> helm search repo helm-test-chart
NAME                          CHART VERSION     APP VERSION     DESCRIPTION
acrrepo/helm-test-chart-v2    0.1.0             1.0             A Helm chart for Kubernetes

Ha, look at that! I can see the chart that I pushed using the ACR CLI in the ACR and Helm 2 section above – notice the chart name and the version. Also, notice that the Helm 3 search command has a slightly different syntax – it requires you to specify what you want to search (repo in our case).

I can use the ACR CLI to push the new chart that I just created using the Helm 3 CLI (after signing in to Azure):

PS C:> az acr helm push --name tsmacrtestwus2acrhelm.azurecr.io .\helm-test-chart-v3-0.1.0.tgz
This command is implicitly deprecated because command group 'acr helm' is deprecated and will be removed in a future release. Use 'helm v3' instead.
The login server endpoint suffix '.azurecr.io' is automatically omitted.
{
  "saved": true
}

By doing this, I have pushed the V3 chart to ACR and can pull it from there, but remember, this is the Helm 2 way, and the following are still true:

  • You will not see the chart in Azure Portal.
  • The only way to verify that the chart is pushed to the ACR repository is to use the helm search command.

Here is the result of the search command after updating the repositories:

PS C:> helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "microsoft" chart repository
...Successfully got an update from the "acrrepo" chart repository
Update Complete. ⎈Happy Helming!⎈

PS C:> helm search repo helm-test-chart
NAME                           CHART VERSION     APP VERSION     DESCRIPTION
acrrepo/helm-test-chart-v2     0.1.0             1.0             A Helm chart for Kubernetes
acrrepo/helm-test-chart-v3     0.1.0             1.16.0          A Helm chart for Kubernetes

You can see both charts, the one created with Helm 2 and the one created with Helm 3. This is expected because I pushed both charts the same way – by using the az acr helm command. Remember, though, that both charts are stored in ACR the Helm 2 way.

Using Helm 3 to Push and Pull Charts the OCI Way

Before proceeding, I changed the version property in the Chart.yaml to 0.2.0 to be able to differentiate between the charts I pushed. This is the same chart that I created in the previous section Creating and Packaging Charts with Helm 3 CLI.

You may have noticed that Helm 3 has a new chart command. This command allows you to (from the help text) “push, pull, tag, list, or remove Helm charts”. The subcommands under the chart command are experimental, and you need to set the HELM_EXPERIMENTAL_OCI environment variable to be able to use them. Once you do that, you can save the chart to the local registry cache with or without a registry FQDN (Fully Qualified Domain Name). Here are the commands:

PS C:> $env:HELM_EXPERIMENTAL_OCI=1

PS C:> helm chart save .\helm-test-chart-v3\ helm-test-chart-v3:0.2.0
ref:     helm-test-chart-v3:0.2.0
digest:  b6954fb0a696e1eb7de8ad95c59132157ebc061396230394523fed260293fb19
size:    3.7 KiB
name:    helm-test-chart-v3
version: 0.2.0
0.2.0: saved

PS C:> helm chart save .\helm-test-chart-v3\ tsmacrtestwus2acrhelm.azurecr.io/charts/helm-test-chart-v3:0.2.0
ref:     tsmacrtestwus2acrhelm.azurecr.io/charts/helm-test-chart-v3:0.2.0
digest:  ff5e9aea6d63d7be4bb53eb8fffacf12550a4bb687213a2edb07a21f6938d16e
size:    3.7 KiB
name:    helm-test-chart-v3
version: 0.2.0
0.2.0: saved

If you list the charts using the new chart command, you will see the following:

PS C:> helm chart list
REF                                                                  NAME                   VERSION     DIGEST     SIZE     CREATED
helm-test-chart-v3:0.2.0                                             helm-test-chart-v3     0.2.0       b6954fb    3.7 KiB  About a minute
tsmacrtestwus2acrhelm.azurecr.io/charts/helm-test-chart-v3:0.2.0     helm-test-chart-v3     0.2.0       ff5e9ae    3.7 KiB  About a minute

A few things to note here:

  • Both charts are saved in the local registry cache. Nothing is pushed yet to a remote registry.
  • You see only charts that are saved the OCI way. The charts saved the Helm 2 way are not listed using the helm chart list command.
  • The REF (reference) column can be truncated in the output, so you may not be able to see the full reference.

Let’s do one more thing! Let’s save the same chart with the same ACR FQDN as above, but under a different repository. Here are the commands to save and list the charts:

PS C:> helm chart save .\helm-test-chart-v3\ tsmacrtestwus2acrhelm.azurecr.io/my-helm-charts/helm-test-chart-v3:0.2.0
ref: tsmacrtestwus2acrhelm.azurecr.io/my-helm-charts/helm-test-chart-v3:0.2.0
digest: daf106a05ad2fe075851a3ab80f037020565c75c5be06936179b882af1858e6a
size: 3.7 KiB
name: helm-test-chart-v3
version: 0.2.0
0.2.0: saved

PS C:> helm chart list
REF                                                                  NAME                   VERSION     DIGEST     SIZE     CREATED
helm-test-chart-v3:0.2.0                                             helm-test-chart-v3     0.2.0       b6954fb    3.7 KiB  11 minutes
tsmacrtestwus2acrhelm.azurecr.io/charts/helm-test-chart-v3:0.2.0     helm-test-chart-v3     0.2.0       ff5e9ae    3.7 KiB  11 minutes
tsmacrtestwus2acrhelm.azurecr.io/my-helm-charts/helm-test-chart...   helm-test-chart-v3     0.2.0       daf106a    3.7 KiB  About a minute

After doing this, we have three charts in the local registry:

  • helm-test-chart-v3:0.2.0 that is available only locally.
  • tsmacrtestwus2acrhelm.azurecr.io/charts/helm-test-chart-v3:0.2.0 that can be pushed to the remote ACR registry tsmacrtestwus2acrhelm.azurecr.io and saved in the charts repository.
  • tsmacrtestwus2acrhelm.azurecr.io/my-helm-charts/helm-test-chart-v3:0.2.0 that can be pushed to the remote ACR registry tsmacrtestwus2acrhelm.azurecr.io and saved in the my-helm-charts repository.
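The mapping from a reference to a registry and repository can be spelled out with a small parser. This sketch assumes every reference ends in a :tag and borrows Docker's convention that the first path segment is a registry host only if it contains a dot or a colon, or is localhost. It approximates, rather than reproduces, Helm's own parsing, and `split_chart_ref` is a hypothetical name:

```shell
# Hypothetical helper: print "<registry> <repository> <tag>" for a chart reference.
# Assumption: the reference always ends in ":<tag>".
split_chart_ref() {
  ref="$1"
  tag="${ref##*:}"      # text after the last ":", e.g. 0.2.0
  name="${ref%:*}"      # everything before the tag
  case "$name" in
    */*) first="${name%%/*}" ;;   # first path segment
    *)   first="" ;;              # no "/" at all: cannot be registry-qualified
  esac
  case "$first" in
    *.*|*:*|localhost)            # looks like a host name (Docker-style rule)
      registry="$first"
      repository="${name#*/}" ;;
    *)
      registry="(local)"          # only usable in the local registry cache
      repository="$name" ;;
  esac
  printf '%s %s %s\n' "$registry" "$repository" "$tag"
}
```

Running it on the three references above prints (local) for the first one, which is why that chart can only stay in the local cache.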

Before we can push the charts to the ACR registry, we need to sign in using the following command:

PS C:> helm registry login tsmacrtestwus2acrhelm.azurecr.io --username <myacr_username> --password <myacr_password>

You can use any of the sign-in methods described in the Signing In To ACR Using the Helm (any version) CLI section above. Also, make sure you use your own registry’s login server.

If we push the two charts that have the ACR FQDN, we will see them appear in the Azure portal UI. Here are the commands:

PS C:> helm chart push tsmacrtestwus2acrhelm.azurecr.io/charts/helm-test-chart-v3:0.2.0
The push refers to repository [tsmacrtestwus2acrhelm.azurecr.io/charts/helm-test-chart-v3]
ref: tsmacrtestwus2acrhelm.azurecr.io/charts/helm-test-chart-v3:0.2.0
digest: daf106a05ad2fe075851a3ab80f037020565c75c5be06936179b882af1858e6a
size: 3.7 KiB
name: helm-test-chart-v3
version: 0.2.0
0.2.0: pushed to remote (1 layer, 3.7 KiB total)

PS C:> helm chart push tsmacrtestwus2acrhelm.azurecr.io/my-helm-charts/helm-test-chart-v3:0.2.0
The push refers to repository [tsmacrtestwus2acrhelm.azurecr.io/my-helm-charts/helm-test-chart-v3]
ref: tsmacrtestwus2acrhelm.azurecr.io/my-helm-charts/helm-test-chart-v3:0.2.0
digest: daf106a05ad2fe075851a3ab80f037020565c75c5be06936179b882af1858e6a
size: 3.7 KiB
name: helm-test-chart-v3
version: 0.2.0
0.2.0: pushed to remote (1 layer, 3.7 KiB total)

And here is the result:

An important thing to note here is that:

  • Helm charts saved to ACR using the OCI way will appear in the Azure portal.

The approach here is a bit different from the Helm 2 way. You don’t need to package the chart into a TAR archive; saving the chart to the local registry cache is enough.
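Put together, the OCI flow for one reference is just two commands with the experimental chart subcommands of Helm v3.6.2. This shell sketch assumes you already signed in with helm registry login. `save_and_push_chart` is a hypothetical name, and export HELM_EXPERIMENTAL_OCI=1 is the Linux equivalent of the PowerShell $env: assignment used earlier:

```shell
# Hypothetical helper: save a chart directory under a registry-qualified
# reference and push it (experimental `helm chart` commands, Helm 3.6.x).
save_and_push_chart() {
  chart_dir="$1"   # e.g. ./helm-test-chart-v3
  ref="$2"         # e.g. tsmacrtestwus2acrhelm.azurecr.io/charts/helm-test-chart-v3:0.2.0
  export HELM_EXPERIMENTAL_OCI=1
  helm chart save "$chart_dir" "$ref" || return 1   # no `helm package` step needed
  helm chart push "$ref"
}
```

For example: save_and_push_chart ./helm-test-chart-v3 tsmacrtestwus2acrhelm.azurecr.io/charts/helm-test-chart-v3:0.2.0.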

We need to do one last thing and we are ready to summarize the experience. Let’s use the helm search command to find our charts (of course using Helm 3). Here is the result of the search:

PS C:> helm search repo helm-test-chart 
NAME                           CHART VERSION     APP VERSION     DESCRIPTION 
acrrepo/helm-test-chart-v2     0.1.0             1.0             A Helm chart for Kubernetes 
acrrepo/helm-test-chart-v3     0.1.0             1.16.0          A Helm chart for Kubernetes

It yields the same result as the one we saw in Using Helm 3 to Push and Pull Charts the Helm 2 Way. The reason is that the helm search command doesn’t work for charts stored the OCI way. This is a known limitation that the Helm team is working on; it is tracked in the Support for OCI registries in helm search #9983 issue on GitHub.

Summary of the ACR and Helm 3 Experience

To summarize the ACR and Helm 3 experience, here are the main takeaways:

  • First, you can use the Helm 3 CLI in conjunction with the az acr helm command to push and pull charts the Helm 2 way. Those charts will not appear in the Azure portal.
  • You can also use the Helm 3 CLI to (natively) push charts to ACR the OCI way. Those charts will appear in the Azure portal.
  • OCI features are experimental in the Helm 3 client and certain functionalities like helm search and helm repo do not work for charts saved and pushed the OCI way.

Conclusion

To wrap it up, when working with Helm charts and ACR (as well as other OCI-compliant registries), you need to be careful which commands you use. As a general rule, always use the Helm 3 CLI and make a conscious decision whether you want to store the charts as OCI artifacts (the OCI way) or using the legacy Helm approach (the Helm 2 way). This should be a transition period, and hopefully, at some point in the future, Helm will improve its support for OCI-compliant charts and support the same scenarios that are currently enabled for legacy chart repositories.

Here is a summary table that gives a quick overview of what we described in this post.

Functionality                  Helm 2 CLI (legacy)   Helm 3 CLI (legacy)   Helm 3 CLI (OCI)
helm repo add                  Yes                   Yes                   No
helm search                    Yes                   Yes                   No
helm chart push                No                    No                    Yes
helm chart list                No                    No                    Yes
az acr helm push               Yes                   Yes                   No
Chart appears in Azure portal  No                    No                    Yes
Example chart                  helm-test-chart-v2    helm-test-chart-v3    helm-test-chart-v3
Example chart version          0.1.0                 0.1.0                 0.2.0

In the last two posts, Configuring a hierarchy of IoT Edge Devices at Home Part 1 – Configuring the IT Proxy and Configuring a hierarchy of IoT Edge Devices at Home Part 2 – Configuring the Enterprise Network (IT), we set up the proxy and the top layer of the hierarchical IoT Edge network. This post describes how to set up and configure the first nested layer in the Purdue network architecture for manufacturing – the Business Planning and Logistics (IT) layer. So, let’s get going!

Many of the steps to configure the IoT Edge runtime on the device repeat from the previous post. Nevertheless, let’s go over them again.

Configuring the Network for Layer 4 Device

The tricky part with the Layer 4 device is that it should have no Internet access. The problem is that you need Internet access to install the IoT Edge runtime. In a typical scenario, the Layer 4 device will come with the IoT Edge runtime pre-installed, and you will only need to plug in the device and configure it to talk to its parent. For our scenario, though, we need to keep the device Internet-enabled while installing IoT Edge and lock it down after that. Here is the initial dhcpcd.conf configuration for the Layer 4 device:

interface eth0
static ip_address=10.16.6.4/16
static routers=10.16.8.4
static domain_name_servers=1.1.1.1

The only difference in the network configuration is the IP address of the device. This configuration will allow us to download the necessary software to the device before we restrict its access.
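Since only the address changes from layer to layer, the stanza can be generated instead of edited by hand. A minimal sketch: `dhcpcd_static_stanza` is a hypothetical name, and its arguments are the interface, the address with prefix length, the gateway, and the DNS server:

```shell
# Hypothetical helper: print a static-address stanza for /etc/dhcpcd.conf.
# Arguments: interface, address/prefix, default gateway, DNS server.
dhcpcd_static_stanza() {
  printf 'interface %s\n' "$1"
  printf 'static ip_address=%s\n' "$2"
  printf 'static routers=%s\n' "$3"
  printf 'static domain_name_servers=%s\n' "$4"
}
```

dhcpcd_static_stanza eth0 10.16.6.4/16 10.16.8.4 1.1.1.1 prints the Layer 4 configuration above; piping it to sudo tee -a /etc/dhcpcd.conf assumes the file has no other eth0 stanza yet.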

Speaking of necessary software, let’s install the netfilter-persistent and iptables-persistent packages now, so they are ready for locking down the device networking later on:

sudo DEBIAN_FRONTEND=noninteractive apt install -y netfilter-persistent iptables-persistent

Create the Azure Cloud Resources

We have already created the resource group and the IoT Hub resource in Azure. The only new resource that we need to create is the IoT Edge device for Layer 4. One difference here is that, unlike the Layer 5 device, the Layer 4 device will have a parent, and this parent will be the Layer 5 device. Use the following commands to create the Layer 4 device:

az iot hub device-identity create --device-id L4-edge-pi --edge-enabled --hub-name tsm-ioth-eus-test
az iot hub device-identity parent set --device-id L4-edge-pi --parent-device-id L5-edge-pi --hub-name tsm-ioth-eus-test
az iot hub device-identity connection-string show --device-id L4-edge-pi --hub-name tsm-ioth-eus-test

The first command creates the IoT Edge device for Layer 4. The second command sets the Layer 5 device as the parent for the Layer 4 device. And, the third command returns the connection string for the Layer 4 device that we can use for the IoT Edge runtime configuration.
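The three commands can be wrapped so the same steps work for the next layers. This sketch assumes the azure-iot extension for the Azure CLI is installed and that connection-string show returns a JSON object with a connectionString field, which --query then extracts. `create_child_edge_device` is a hypothetical name:

```shell
# Hypothetical helper: create a child IoT Edge device, set its parent, and
# print the child's connection string. Requires the azure-iot CLI extension.
create_child_edge_device() {
  hub="$1"      # e.g. tsm-ioth-eus-test
  child="$2"    # e.g. L4-edge-pi
  parent="$3"   # e.g. L5-edge-pi
  az iot hub device-identity create --device-id "$child" --edge-enabled \
    --hub-name "$hub" >/dev/null || return 1
  az iot hub device-identity parent set --device-id "$child" \
    --parent-device-id "$parent" --hub-name "$hub" >/dev/null || return 1
  # --query/--output reduce the JSON result to the bare connection string
  az iot hub device-identity connection-string show --device-id "$child" \
    --hub-name "$hub" --query connectionString --output tsv
}
```

For example: CONN="$(create_child_edge_device tsm-ioth-eus-test L4-edge-pi L5-edge-pi)".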

Installing Azure IoT Edge Runtime on the Layer 4 Device

To my surprise, in the time between publishing my last post and starting this one, the Tutorial to Create a Hierarchy of IoT Edge Devices changed. I will still use the steps from my previous post for consistency.

Creating Certificates

We will start again with the creation of certificates – this time for the Layer 4 device. We already have the root and the intermediate certificates for the chain. The only certificate that we need to create is the Layer 4 device certificate. We will use the following command for that:

./certGen.sh create_edge_device_ca_certificate "l4_certificate"

You should see the new certificates in the <WORKDIR>/certs folder:

drwxrwxrwx 1 toddysm toddysm 4096 Apr 16 10:46 .
drwxrwxrwx 1 toddysm toddysm 4096 Apr 16 10:46 ..
...
-rwxrwxrwx 1 toddysm toddysm 5891 Apr 16 10:46 iot-edge-device-ca-l4_certificate-full-chain.cert.pem
-r-xr-xr-x 1 toddysm toddysm 1931 Apr 16 10:46 iot-edge-device-ca-l4_certificate.cert.pem
-rwxrwxrwx 1 toddysm toddysm 7240 Apr 16 10:46 iot-edge-device-ca-l4_certificate.cert.pfx
...

The private keys are in the <WORKDIR>/private folder together with the root and intermediate keys:

drwxrwxrwx 1 toddysm toddysm 4096 Apr 16 10:46 .
drwxrwxrwx 1 toddysm toddysm 4096 Apr 16 10:46 ..
-r-xr-xr-x 1 toddysm toddysm 3326 Apr  8 14:54 azure-iot-test-only.intermediate.key.pem
-r-xr-xr-x 1 toddysm toddysm 3326 Apr  8 14:54 azure-iot-test-only.root.ca.key.pem
-r-xr-xr-x 1 toddysm toddysm 3243 Apr 16 10:46 iot-edge-device-ca-l4_certificate.key.pem
...

As before, we need to upload the relevant certificates to the Layer 4 device. From the <WORKDIR> folder on the workstation use the following commands:

scp ./certs/azure-iot-test-only.root.ca.cert.pem pi@pi-l4:.
scp ./certs/iot-edge-device-ca-l4_certificate-full-chain.cert.pem pi@pi-l4:.
scp ./private/iot-edge-device-ca-l4_certificate.key.pem pi@pi-l4:.

Connect to the Layer 4 device with ssh pi@pi-l4 and install the root CA using:

sudo cp ~/azure-iot-test-only.root.ca.cert.pem /usr/local/share/ca-certificates/azure-iot-test-only.root.ca.cert.pem.crt
sudo update-ca-certificates

Verify that the cert is installed with:

ls /etc/ssl/certs/ | grep azure

We will also move the device certificate chain and the private key to the /var/secrets folder:

sudo mkdir /var/secrets
sudo mkdir /var/secrets/aziot
sudo mv iot-edge-device-ca-l4_certificate-full-chain.cert.pem /var/secrets/aziot/
sudo mv iot-edge-device-ca-l4_certificate.key.pem /var/secrets/aziot/

We are all set to install and configure the Azure IoT Edge runtime.

Configuring the IoT Edge Runtime

As mentioned before, the steps are described in Install or uninstall Azure IoT Edge for Linux. Here is the configuration that we need to use in the /etc/aziot/config.toml file:

hostname = "10.16.6.4"

parent_hostname = "10.16.7.4"

trust_bundle_cert = "file:///etc/ssl/certs/azure-iot-test-only.root.ca.cert.pem.pem"

[provisioning]
source = "manual"
connection_string = "HostName=tsm-ioth-eus-test.azure-devices.net;DeviceId=L4-edge-pi;SharedAccessKey=s0m#VeRYcRYpT1c$tR1nG"

[agent]
name = "edgeAgent"
type = "docker"
[agent.config]
image = "10.16.7.4:<port>/azureiotedge-agent:1.2"

[edge_ca]
cert = "file:///var/secrets/aziot/iot-edge-device-ca-l4_certificate-full-chain.cert.pem"
pk = "file:///var/secrets/aziot/iot-edge-device-ca-l4_certificate.key.pem"

Note two differences in this configuration compared to the configuration for the Layer 5 device:

  1. There is a parent_hostname configuration that has the Layer 5 device’s IP address as a value.
  2. The agent image is pulled from the parent registry 10.16.7.4:<port>/azureiotedge-agent:1.2 instead of from mcr.microsoft.com (don’t forget to remove or change the <port> in the configuration depending on how you set up the parent registry).

Also, if the registry running on your Layer 5 device requires authentication, you will need to provide credentials in the [agent.config.auth] section.
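Because the Layer 3 device will need an almost identical file, a sketch that renders it from a few parameters may save some editing. `render_child_config` is a hypothetical name; the trust bundle and /var/secrets/aziot paths are the ones used in this post, and the optional [agent.config.auth] section is left out:

```shell
# Hypothetical helper: print /etc/aziot/config.toml for a nested (child) device.
# Arguments: own IP, parent IP, connection string, parent registry host[:port],
# and the device certificate base name used by certGen.sh.
render_child_config() {
  host="$1"; parent="$2"; conn="$3"; registry="$4"; cert_name="$5"
  cat <<EOF
hostname = "$host"

parent_hostname = "$parent"

trust_bundle_cert = "file:///etc/ssl/certs/azure-iot-test-only.root.ca.cert.pem.pem"

[provisioning]
source = "manual"
connection_string = "$conn"

[agent]
name = "edgeAgent"
type = "docker"
[agent.config]
image = "$registry/azureiotedge-agent:1.2"

[edge_ca]
cert = "file:///var/secrets/aziot/${cert_name}-full-chain.cert.pem"
pk = "file:///var/secrets/aziot/${cert_name}.key.pem"
EOF
}
```

For example: render_child_config 10.16.6.4 10.16.7.4 "$CONN" "10.16.7.4:<port>" iot-edge-device-ca-l4_certificate | sudo tee /etc/aziot/config.toml, with <port> replaced as noted above.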

Restricting the Network for the Layer 4 Device

Now that we have the IoT Edge runtime configured, we need to restrict the traffic to and from the Layer 4 device. Here is what we are going to do:

  • We will allow SSH connectivity only from the IP addresses in the Demo/Support network, i.e., where our workstation is. However, we will use a stricter CIDR to allow only 10.16.9.x addresses. Here are the commands that will make that configuration:
    sudo iptables -A INPUT -i eth0 -p tcp --dport 22 -s 10.16.9.0/24 -m state --state NEW,ESTABLISHED -j ACCEPT
    sudo iptables -A OUTPUT -p tcp --sport 22 -d 10.16.9.0/24 -m state --state ESTABLISHED -j ACCEPT
  • We will enable connectivity on ports 8080 (or whatever the registry port is), 8883, 443, and 5671 to the Layer 5 network only, i.e., 10.16.7.x addresses. Here are the commands that will make that configuration:
    sudo iptables -A OUTPUT -p tcp --dport <registry-port> -d 10.16.7.0/24 -m state --state NEW,ESTABLISHED -j ACCEPT
    sudo iptables -A INPUT -i eth0 -p tcp --sport <registry-port> -s 10.16.7.0/24 -m state --state ESTABLISHED -j ACCEPT
    sudo iptables -A OUTPUT -p tcp --dport 8883 -d 10.16.7.0/24 -m state --state NEW,ESTABLISHED -j ACCEPT
    sudo iptables -A INPUT -i eth0 -p tcp --sport 8883 -s 10.16.7.0/24 -m state --state ESTABLISHED -j ACCEPT
    sudo iptables -A OUTPUT -p tcp --dport 5671 -d 10.16.7.0/24 -m state --state NEW,ESTABLISHED -j ACCEPT
    sudo iptables -A INPUT -i eth0 -p tcp --sport 5671 -s 10.16.7.0/24 -m state --state ESTABLISHED -j ACCEPT
    sudo iptables -A OUTPUT -p tcp --dport 443 -d 10.16.7.0/24 -m state --state NEW,ESTABLISHED -j ACCEPT
    sudo iptables -A INPUT -i eth0 -p tcp --sport 443 -s 10.16.7.0/24 -m state --state ESTABLISHED -j ACCEPT
    
  • We will also enable connectivity on ports 8080 (or whatever the registry port is), 8883, 443, and 5671 from the OT Proxy network only, i.e., 10.16.5.x addresses. Because the OT Proxy network initiates these connections, the INPUT rules match the destination port on the Layer 4 device. Here are the commands that will make that configuration:
    sudo iptables -A OUTPUT -p tcp --sport <registry-port> -d 10.16.5.0/24 -m state --state ESTABLISHED -j ACCEPT
    sudo iptables -A INPUT -i eth0 -p tcp --dport <registry-port> -s 10.16.5.0/24 -m state --state NEW,ESTABLISHED -j ACCEPT
    sudo iptables -A OUTPUT -p tcp --sport 8883 -d 10.16.5.0/24 -m state --state ESTABLISHED -j ACCEPT
    sudo iptables -A INPUT -i eth0 -p tcp --dport 8883 -s 10.16.5.0/24 -m state --state NEW,ESTABLISHED -j ACCEPT
    sudo iptables -A OUTPUT -p tcp --sport 5671 -d 10.16.5.0/24 -m state --state ESTABLISHED -j ACCEPT
    sudo iptables -A INPUT -i eth0 -p tcp --dport 5671 -s 10.16.5.0/24 -m state --state NEW,ESTABLISHED -j ACCEPT
    sudo iptables -A OUTPUT -p tcp --sport 443 -d 10.16.5.0/24 -m state --state ESTABLISHED -j ACCEPT
    sudo iptables -A INPUT -i eth0 -p tcp --dport 443 -s 10.16.5.0/24 -m state --state NEW,ESTABLISHED -j ACCEPT
    
  • Last, we need to disable any other traffic to and from the Layer 4 device. Here are the commands for that:
    sudo iptables -P INPUT DROP
    sudo iptables -P OUTPUT DROP
    sudo iptables -P FORWARD DROP
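Most of these rules differ only in port and subnet, so they could be generated. The sketch below prints the commands rather than running them, so you can review them first; the function names are hypothetical. Client rules cover connections this device opens (OUTPUT matches --dport), and server rules cover connections another subnet opens to this device (INPUT matches --dport), which is the shape of the SSH rules:

```shell
# Hypothetical helpers: print (dry run) iptables commands for a subnet and ports.

# Connections this device opens TO <subnet>:<port>, e.g. toward the Layer 5 parent.
iptables_client_rules() {
  subnet="$1"; shift
  for port in "$@"; do
    echo "sudo iptables -A OUTPUT -p tcp --dport $port -d $subnet -m state --state NEW,ESTABLISHED -j ACCEPT"
    echo "sudo iptables -A INPUT -i eth0 -p tcp --sport $port -s $subnet -m state --state ESTABLISHED -j ACCEPT"
  done
}

# Connections <subnet> opens TO this device's <port>, e.g. SSH from the workstation.
iptables_server_rules() {
  subnet="$1"; shift
  for port in "$@"; do
    echo "sudo iptables -A INPUT -i eth0 -p tcp --dport $port -s $subnet -m state --state NEW,ESTABLISHED -j ACCEPT"
    echo "sudo iptables -A OUTPUT -p tcp --sport $port -d $subnet -m state --state ESTABLISHED -j ACCEPT"
  done
}
```

For example, iptables_client_rules 10.16.7.0/24 8080 8883 5671 443 prints the eight Layer 5 rules, with 8080 standing in for your registry port.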
    

Don’t forget to save the configuration and restart the device with:

sudo netfilter-persistent save
sudo systemctl reboot

Deploying Modules on the Layer 4 Device

Similar to the deployment of modules on the Layer 5 device, we need to deploy the standard $edgeAgent and $edgeHub modules as well as a registry module. For this layer, I experimented with the API proxy module. Hence, the configurations also include the $IoTEdgeApiProxy module. There is also an API Proxy configuration that you need to set on the $IoTEdgeApiProxy module’s twin – for details, take a look at Configure the API proxy module for your gateway hierarchy scenario. Here are the Gists that you can use for deploying the modules on the Layer 4 device:

By using the API proxy, you can also leverage the certificates deployed on the device to connect to the registry over HTTPS. Thus, you don’t need to configure Docker on the clients to connect to an insecure registry.

Testing the Registry Module

To test the registry module, use the following commands:

docker login -u <sync_token_name> -p <sync_token_password> 10.16.6.4:<port_if_any>
docker pull 10.16.6.4:<port_if_any>/azureiotedge-simulated-temperature-sensor:1.0
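If the pull fails, it can help to probe the registry endpoint directly. Per the Docker Registry HTTP API V2, GET /v2/ returns 200 when the request is authorized and 401 when authentication is required, so either answer means the registry is reachable. A minimal sketch: the URL and the optional CA file (for example, the test root CA generated for the hierarchy) are assumptions to adapt, and TLS verification may fail if the server certificate does not match the address you use.

```python
import ssl
import urllib.error
import urllib.request
from typing import Optional


def interpret_v2_status(status: int) -> str:
    """Map the HTTP status of GET /v2/ to a human-readable verdict."""
    if status == 200:
        return "registry reachable, request authorized"
    if status == 401:
        return "registry reachable, authentication required"
    if status == 404:
        return "endpoint does not implement the V2 registry API"
    return f"unexpected status {status}"


def probe_registry(base_url: str, cafile: Optional[str] = None, timeout: float = 5.0) -> str:
    """Send GET <base_url>/v2/ and return a verdict string.

    Connection errors (refused, timeout) are not caught and propagate as URLError.
    """
    context = ssl.create_default_context(cafile=cafile) if base_url.startswith("https") else None
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/v2/", timeout=timeout,
                                    context=context) as resp:
            return interpret_v2_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises 401/404 as HTTPError, but the status code is still meaningful.
        return interpret_v2_status(err.code)


if __name__ == "__main__":
    # Hypothetical URL: adjust the scheme and port to match your registry module.
    print(probe_registry("https://10.16.6.4"))
```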

Now, we have set up the first nested layer (Layer 4) for our nested IoT Edge infrastructure. In the next post, I will describe the steps to set up the OT Proxy and the second nested layer for IoT Edge (Layer 3).

Image by ejaugsburg from Pixabay

In my last post, I started explaining how to configure the IoT Edge device hierarchy’s IT Proxy. This post will go one layer down and set up the Layer 5 device from the Purdue model for manufacturing networks.

Reconfiguring The Network

While implementing network segregation in the cloud is relatively easy, implementing it with a limited number of devices and a consumer-grade network switch requires a bit more design. In Azure, routing between subnets within a VNet is configured automatically using the subnet gateways. In my case, though, the biggest challenge was connecting each Raspberry Pi device to two separate subnets using the available interfaces: one WiFi and one Ethernet. In essence, I couldn’t use the WiFi interface because I would then be limited by my Eero’s (lack of) capabilities. (Well, I have an idea how to make this work, but that will be the topic of a future post :)) Thus, my only option was to put all devices in the same subnet and use the firewall on each device to restrict the traffic. Here is what the picture looks like.

To be able to connect to each individual device from my laptop (i.e., playing the role of the jumpbox from the Azure IoT Edge for Industrial IoT sample), I had to configure a second network interface on it and give it an IP address from the 10.16.0.0/16 network (the Workstation in the picture above). There are multiple ways to do that, the easiest being to buy a USB networking dongle and connect it to the switch with the rest of the devices. One more thing that helps speed up the work is to edit the /etc/hosts file on the laptop and add host names for each of the devices:

10.16.8.4       pi-itproxy
10.16.7.4       pi-l5
10.16.6.4       pi-l4
10.16.5.4       pi-otproxy
10.16.4.4       pi-l3
10.16.3.4       pi-opcua

The next thing I had to do was go back to the IT Proxy and change the subnet mask to enable a broader address range. Connect to the IT Proxy device that we configured in Configuring a hierarchy of IoT Edge Devices at Home Part 1 – Configuring the IT Proxy using ssh pi@10.16.8.4. Then edit the DHCP configuration file with sudo vi /etc/dhcpcd.conf and change the subnet mask from /24 to /16:

interface eth0
static ip_address=10.16.8.4/16

Now, the IT Proxy is configured to address the broader 10.16.0.0/16 network. To restrict the communication between the devices, we will configure each individual device’s firewall.
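Why the mask matters: with /24, the IT Proxy treats only 10.16.8.x addresses as being on the local link, while with /16 every 10.16.x.x device from the hosts file above is on-link. Python’s ipaddress module makes this easy to check; the device names and addresses below are copied from the /etc/hosts entries.

```python
import ipaddress

# Addresses taken from the /etc/hosts entries above.
DEVICES = {
    "pi-itproxy": "10.16.8.4",
    "pi-l5": "10.16.7.4",
    "pi-l4": "10.16.6.4",
    "pi-otproxy": "10.16.5.4",
    "pi-l3": "10.16.4.4",
    "pi-opcua": "10.16.3.4",
}


def on_link(interface_cidr: str, devices: dict[str, str]) -> dict[str, bool]:
    """Report which device addresses fall inside the interface's subnet."""
    network = ipaddress.ip_interface(interface_cidr).network
    return {name: ipaddress.ip_address(ip) in network for name, ip in devices.items()}


if __name__ == "__main__":
    for cidr in ("10.16.8.4/24", "10.16.8.4/16"):
        reachable = [name for name, ok in on_link(cidr, DEVICES).items() if ok]
        print(cidr, "->", reachable)
```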

Now, back to the Layer 5 configuration. The Layer 5 device is configured to access the Internet through the IT Proxy as its gateway. Edit the DHCP configuration file with sudo vi /etc/dhcpcd.conf and add the following at the end:

interface eth0
static ip_address=10.16.7.4/16
static routers=10.16.8.4
static domain_name_servers=1.1.1.1

Note that I added Cloudflare’s DNS server (1.1.1.1) to the list of DNS servers. The reason is that the proxy device will not do DNS resolution. You can also configure it with your home network’s DNS server. This should be enough for now, and we can start installing the IoT Edge runtime on the Layer 5 device.

Testing from my laptop, I am able to connect to both devices using their 10.16.0.0/16 network IP addresses:

ssh pi@pi-itproxy

connects me to the IT Proxy, and:

ssh pi@pi-l5

connects me to the Layer 5 IoT Edge device.

Create the Azure Cloud Resources

Before we start installing the Azure IoT Edge runtime on the device, we need to create an Azure IoT Hub and register the L5 device with it. This is well described in Microsoft’s documentation explaining how to deploy code to a Linux device. Here are the Azure CLI commands that will create the resource group and the IoT Hub resources:

az group create --name tsm-rg-eus-test --location eastus
az iot hub create --resource-group tsm-rg-eus-test --name tsm-ioth-eus-test --sku S1 --partition-count 4

Next, register the IoT Edge device and get its connection string. Here are the commands:

az iot hub device-identity create --device-id L5-edge-pi --edge-enabled --hub-name tsm-ioth-eus-test
az iot hub device-identity connection-string show --device-id L5-edge-pi --hub-name tsm-ioth-eus-test

The last command will return the connection string that we should use to connect the new device to Azure IoT Hub. It has the following format:

{
  "connectionString": "HostName=tsm-ioth-eus-test.azure-devices.net;DeviceId=L5-edge-pi;SharedAccessKey=s0m#VeRYcRYpT1c$tR1nG"
}

Save the connection string because we will need it in the next section to configure the Layer 5 device.
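Before pasting the connection string into the device configuration, a quick check that it contains the expected fields can save a round trip. A sketch, assuming the Key=Value;Key=Value layout shown above; the sample key is the placeholder from this post, not a real one, and the function name is hypothetical.

```python
def parse_connection_string(conn_str: str) -> dict[str, str]:
    """Split 'Key1=Value1;Key2=Value2' into a dict.

    Only the first '=' in each segment separates key from value, because
    base64-encoded keys may end with '=' padding.
    """
    fields = {}
    for segment in conn_str.split(";"):
        if not segment:
            continue  # tolerate a trailing ';'
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        fields[key] = value
    return fields


if __name__ == "__main__":
    sample = ("HostName=tsm-ioth-eus-test.azure-devices.net;"
              "DeviceId=L5-edge-pi;SharedAccessKey=s0m#VeRYcRYpT1c$tR1nG")
    parsed = parse_connection_string(sample)
    for required in ("HostName", "DeviceId", "SharedAccessKey"):
        if required not in parsed:
            raise SystemExit(f"missing {required}")
    print(parsed["HostName"], parsed["DeviceId"])
```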

Installing Azure IoT Edge Runtime on the Layer 5 Device

Installing Azure IoT Edge is described in the Tutorial: Create a hierarchy of IoT Edge devices article, which builds a hierarchy of only two devices. The important part of the nested configuration is generating the certificates and transferring them to the devices. So, let’s go over this step by step for the Layer 5 device.

Create Certificates

The first thing we need to do on the workstation is clone the IoT Edge GitHub repository with

git clone https://github.com/Azure/iotedge.git

and then generate the root and the intermediate certificates with the certGen.sh script (check the folder /tools/CACertificates):

./certGen.sh create_root_and_intermediate

Those certificates will be used to generate the individual devices’ certificates. For now, we will only create a certificate for the Layer 5 device.

./certGen.sh create_edge_device_ca_certificate "l5_certificate"

After those two commands, you should have the following in your <WORKDIR>/certs folder:

drwxrwxrwx 1 toddysm toddysm 4096 Apr  8 15:04 .
drwxrwxrwx 1 toddysm toddysm 4096 Apr  8 15:04 ..
-rwxrwxrwx 1 toddysm toddysm 3960 Apr  8 14:54 azure-iot-test-only.intermediate-full-chain.cert.pem
-r-xr-xr-x 1 toddysm toddysm 1976 Apr  8 14:54 azure-iot-test-only.intermediate.cert.pem
-rwxrwxrwx 1 toddysm toddysm 5806 Apr  8 14:54 azure-iot-test-only.intermediate.cert.pfx
-r-xr-xr-x 1 toddysm toddysm 1984 Apr  8 14:54 azure-iot-test-only.root.ca.cert.pem
-rwxrwxrwx 1 toddysm toddysm 5891 Apr  8 15:04 iot-edge-device-ca-l5_certificate-full-chain.cert.pem
-r-xr-xr-x 1 toddysm toddysm 1931 Apr  8 15:04 iot-edge-device-ca-l5_certificate.cert.pem
-rwxrwxrwx 1 toddysm toddysm 7240 Apr  8 15:04 iot-edge-device-ca-l5_certificate.cert.pfx

The content of the <WORKDIR>/private folder should be the following:

drwxrwxrwx 1 toddysm toddysm 4096 Apr  8 15:04 .
drwxrwxrwx 1 toddysm toddysm 4096 Apr  8 15:04 ..
-r-xr-xr-x 1 toddysm toddysm 3326 Apr  8 14:54 azure-iot-test-only.intermediate.key.pem
-r-xr-xr-x 1 toddysm toddysm 3326 Apr  8 14:54 azure-iot-test-only.root.ca.key.pem
-r-xr-xr-x 1 toddysm toddysm 3243 Apr  8 15:04 iot-edge-device-ca-l5_certificate.key.pem

We need to upload the relevant certificates to the Layer 5 device. From the <WORKDIR> folder on the workstation issue the following commands:

scp ./certs/azure-iot-test-only.root.ca.cert.pem pi@pi-l5:.
scp ./certs/iot-edge-device-ca-l5_certificate-full-chain.cert.pem pi@pi-l5:.

The above two commands upload the root CA certificate and the device’s certificate chain (public material only) to the Layer 5 device. The following command uploads the device’s private key:

scp ./private/iot-edge-device-ca-l5_certificate.key.pem pi@pi-l5:.

Connect to the Layer 5 device with ssh pi@pi-l5 and install the root CA using:

sudo cp ~/azure-iot-test-only.root.ca.cert.pem /usr/local/share/ca-certificates/azure-iot-test-only.root.ca.cert.pem.crt
sudo update-ca-certificates

The response from this command should be:

Updating certificates in /etc/ssl/certs...
1 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.

To verify that the cert is installed, you can use:

ls /etc/ssl/certs/ | grep azure
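The name of the installed file may look odd. On Debian-based systems such as Raspberry Pi OS, update-ca-certificates only picks up files ending in .crt, and it links each one into /etc/ssl/certs with the .crt suffix replaced by .pem. That is why the trust bundle path in the IoT Edge configuration ends in .pem.pem. A tiny sketch of that naming rule (the helper is hypothetical, for illustration only):

```python
from pathlib import PurePosixPath


def installed_cert_path(source_name: str, certs_dir: str = "/etc/ssl/certs") -> str:
    """Predict the link update-ca-certificates creates for a .crt file."""
    if not source_name.endswith(".crt"):
        raise ValueError("update-ca-certificates only picks up files ending in .crt")
    # The .crt suffix is swapped for .pem in the certificates directory.
    return str(PurePosixPath(certs_dir) / (source_name[: -len(".crt")] + ".pem"))


if __name__ == "__main__":
    print(installed_cert_path("azure-iot-test-only.root.ca.cert.pem.crt"))
```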

We should also move the device certificate chain and the private key to the /var/secrets folder:

sudo mkdir /var/secrets
sudo mkdir /var/secrets/aziot
sudo mv iot-edge-device-ca-l5_certificate-full-chain.cert.pem /var/secrets/aziot/
sudo mv iot-edge-device-ca-l5_certificate.key.pem /var/secrets/aziot/

Configuring the IoT Edge Runtime

Installation and configuration of the IoT Edge runtime are described in the official Microsoft documentation (see Install or uninstall Azure IoT Edge for Linux), but here are the values you should set for the Layer 5 device when editing the /etc/aziot/config.toml file:

hostname = "10.16.7.4"

trust_bundle_cert = "file:///etc/ssl/certs/azure-iot-test-only.root.ca.cert.pem.pem"

[provisioning]
source = "manual"
connection_string = "HostName=tsm-ioth-eus-test.azure-devices.net;DeviceId=L5-edge-pi;SharedAccessKey=s0m#VeRYcRYpT1c$tR1nG"

[agent.config]
image = "mcr.microsoft.com/azureiotedge-agent:1.2.0-rc4"
[edge_ca]
cert = "file:///var/secrets/aziot/iot-edge-device-ca-l5_certificate-full-chain.cert.pem"
pk = "file:///var/secrets/aziot/iot-edge-device-ca-l5_certificate.key.pem"

Apply the IoT Edge configuration with sudo iotedge config apply and check it with sudo iotedge check. At this point, the IoT Edge runtime should be running on the Layer 5 device. You can check the status with sudo iotedge system status.

Deploying Modules on the Layer 5 Device

The last thing we need to do is deploy the required modules that will support the lower-layer device. In addition to the standard IoT Edge modules $edgeAgent and $edgeHub, we also need to deploy a registry module. The registry module is intended to serve the container images for the Layer 4 device. You should also deploy the API proxy module to provide a single endpoint for all services deployed on the IoT Edge device. Here are several Gists that you can use for deploying the modules on the Layer 5 device:

Testing the Registry Module

Before setting up the next layer of the hierarchical IoT Edge infrastructure, we need to make sure that the Layer 5 registry module is working properly. Without it, you will not be able to set up the next layer. From your laptop, you should be able to pull Docker images from the registry module. Depending on how you have deployed the registry module (using one of the Gists above), the pull commands should look something like this:

docker login -u <registry_username> -p <registry_password> 10.16.7.4:<port_if_any>
docker pull 10.16.7.4:<port_if_any>/azureiotedge-simulated-temperature-sensor:1.0

Now, we have set up the top layer (Layer 5) for our nested IoT Edge infrastructure. In the next post, I will describe the steps to set up the first nested layer of the hierarchical infrastructure – Layer 4.

Image by falco from Pixabay.

To support hierarchical Azure IoT Edge scenarios, we started working on a connected registry implementation that extends Azure Container Registry functionality to on-premises environments. For those of you who are not familiar with hierarchical IoT Edge scenarios, take a look at the Purdue network model used in the ISA-95 and ISA-99 standards. TL;DR: it is a network architecture that segregates OT and IT traffic in manufacturing networks. While the Azure IoT team has provided a sample of the hierarchical IoT Edge environment, I wanted to reproduce it using physical ARM-based devices.

The problem with configuring a hierarchy of IoT Edge devices at home is that home networking devices do not allow the advanced network configurations you would normally expect from an enterprise-grade switch or router. While my Eero is good at mesh WiFi, it is a very poor switch and doesn’t allow the creation of multiple virtual networks or WiFi SSIDs. Its routing capabilities are also quite limited, which, to be honest, impacts the ability to create secure IoT networks at home. But I digress… To implement my configuration, I gathered a bunch of Raspberry Pi 4s, Pi Zeros, and an Nvidia Jetson Nano and got to work.

Here is the big picture of what we are implementing:

Hierarchy of IoT Edge devices - Purdue Networks

 

In this post, I will walk through the configuration of the IT Proxy in the IT DMZ layer shown in the picture above. For that, I decided to use a Raspberry Pi 4 device and configure it with the Squid proxy, similar to the Azure IoT sample linked above. Before that, though, let’s look at the device configuration.

Configuring the Raspberry Pi Network Interfaces

Raspberry Pi 4 has two network interfaces: a WiFi interface and an Ethernet interface. The idea here is to configure the WiFi interface to connect to my home network using the home network IP range 192.168.0.0/24, and to wire the Ethernet interface to a simple switch and assign it a static IP address from the 10.16.8.0/24 network. Routing between the two interfaces should also be established so traffic from the 10.16.8.0/24 network can flow to the 192.168.0.0/24 network. Pulling some details from the Setting up a Raspberry Pi as a routed wireless access point article on the Raspberry Pi’s official website, I ended up with the following configuration.

Warning: Be careful here and don’t copy the commands directly from the Pi’s article! We are routing in the reverse direction – from the Ethernet interface to the WiFi interface.

  • Install the netfilter-persistent and the iptables-persistent plugin to be able to persist the routing rules between reboots:
    sudo DEBIAN_FRONTEND=noninteractive apt install -y netfilter-persistent iptables-persistent
  • Configure a static IP address for the Ethernet interface. To follow the Azure IoT sample, I chose 10.16.8.4. Edit the DHCP configuration file with sudo vi /etc/dhcpcd.conf and add the following at the end:
    interface eth0
    static ip_address=10.16.8.4/24
  • Enable routing by creating a new file with sudo vi /etc/sysctl.d/routed-proxy.conf and adding the following to it:
    # Enable IPv4 routing
    net.ipv4.ip_forward=1
  • Next, create a firewall rule with:
    sudo iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE
  • Last, save the configuration and reboot the device:
    sudo netfilter-persistent save
    sudo systemctl reboot
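After the reboot, it is worth confirming that the sysctl setting took effect, because the MASQUERADE rule alone does not route traffic without it. The kernel exposes the current value in /proc/sys/net/ipv4/ip_forward (the same value sysctl net.ipv4.ip_forward prints). A minimal check to run on the Pi:

```python
from pathlib import Path

FORWARD_FLAG = Path("/proc/sys/net/ipv4/ip_forward")


def forwarding_enabled(raw: str) -> bool:
    """Interpret the contents of /proc/sys/net/ipv4/ip_forward ('0' or '1')."""
    return raw.strip() == "1"


if __name__ == "__main__":
    if FORWARD_FLAG.exists():
        state = forwarding_enabled(FORWARD_FLAG.read_text())
        print("IPv4 forwarding is", "enabled" if state else "disabled")
    else:
        print(f"{FORWARD_FLAG} not found (not a Linux host?)")
```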

Checking the configuration with ifconfig after reboot should give you something similar to:

pi@raspberrypi:~ $ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.16.8.4  netmask 255.255.255.0  broadcast 10.16.8.255
        inet6 fe80::6816:2aa8:ab8d:9f93  prefixlen 64  scopeid 0x20<link>
        ether dc:a6:32:77:fb:99  txqueuelen 1000  (Ethernet)
        RX packets 5565  bytes 1359978 (1.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 131  bytes 14399 (14.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 22  bytes 1848 (1.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 22  bytes 1848 (1.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.101  netmask 255.255.0.0  broadcast 192.168.255.255
        inet6 fe80::6788:d8e2:4e8a:558  prefixlen 64  scopeid 0x20<link>
        ether dc:a6:32:77:fb:9a  txqueuelen 1000  (Ethernet)
        RX packets 7619  bytes 1671937 (1.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1494  bytes 346279 (338.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

To test the connectivity through the newly configured router, I had to set up my Mac with a wired connection using a USB Ethernet adapter. Here is the configuration:

traceroute yields satisfactory results showing that the traffic goes through the Raspberry Pi:

toddysm@MacBook-Pro ~ % traceroute 192.168.0.101
traceroute to 192.168.0.101 (192.168.0.101), 64 hops max, 52 byte packets
 1  192.168.0.101 (192.168.0.101)  9.146 ms  7.669 ms  10.125 ms
toddysm@MacBook-Pro ~ % traceroute 172.217.3.164
traceroute to 172.217.3.164 (172.217.3.164), 64 hops max, 52 byte packets
 1  10.16.8.4 (10.16.8.4)  2.555 ms  0.661 ms  0.377 ms
 2  * *^C
toddysm@MacBook-Pro ~ %

Installing and Configuring Squid Proxy

Installing Squid proxy is as trivial as typing the following command:

sudo apt install squid

However, there are two settings you need to change to make sure that the proxy can be used from machines on the local network. Both are in the Squid configuration file /etc/squid/squid.conf:

  • First, the proxy port is configured to bind to any IP address, but when the proxy starts, it binds to the IPv6 addresses instead of the IPv4 ones. You need to change the following line:
    http_port 3128

    and add the IP address of the Ethernet port like this:

    http_port 10.16.8.4:3128
  • Second, by default, access to the proxy is allowed only from localhost. Uncomment the following line to enable access from the localnet:
    http_access allow localnet

    Also, make sure that the localnet is defined as:

    acl localnet src 0.0.0.1-0.255.255.255  # RFC 1122 "this" network (LAN)
    acl localnet src 10.0.0.0/8             # RFC 1918 local private network (LAN)
    acl localnet src 100.64.0.0/10          # RFC 6598 shared address space (CGN)
    acl localnet src 169.254.0.0/16         # RFC 3927 link-local (directly plugged) machines
    acl localnet src 172.16.0.0/12          # RFC 1918 local private network (LAN)
    acl localnet src 192.168.0.0/16         # RFC 1918 local private network (LAN)
    acl localnet src fc00::/7               # RFC 4193 local private network range
    acl localnet src fe80::/10              # RFC 4291 link-local (directly plugged) machines

    These definitions should already be present in the default configuration file; if they are commented out, uncomment them.
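To double-check which clients the localnet ACL admits, the same ranges can be evaluated with Python’s ipaddress module. This is only a sketch of the address matching: Squid itself evaluates http_access rules in order, which this code does not model.

```python
import ipaddress

# The localnet ACL sources listed above. The first entry is an address range,
# not a CIDR block, so it is expanded into the equivalent CIDR networks.
LOCALNET = list(ipaddress.summarize_address_range(
    ipaddress.ip_address("0.0.0.1"), ipaddress.ip_address("0.255.255.255")))
LOCALNET += [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "100.64.0.0/10", "169.254.0.0/16", "172.16.0.0/12",
    "192.168.0.0/16", "fc00::/7", "fe80::/10")]


def in_localnet(client_ip: str) -> bool:
    """Return True if the client address would match the localnet ACL."""
    addr = ipaddress.ip_address(client_ip)
    # Testing an IPv4 address against an IPv6 network simply yields False.
    return any(addr in net for net in LOCALNET)


if __name__ == "__main__":
    for ip in ("10.16.7.4", "192.168.0.101", "8.8.8.8"):
        print(ip, in_localnet(ip))
```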

To make sure that the proxy is properly configured, I changed the networking configuration in my Firefox browser on my Mac as follows:

I also turned off my Mac’s WiFi to make sure that the traffic goes through the wired interface and uses the proxy. The test was successful, and now I have the top layer of the IoT Edge hierarchy network configured.

In the next post, I will go over configuring the L5 of the Purdue network architecture and installing Azure IoT Edge runtime on it.

With the recent Solorigate incident, a lot of emphasis is being put on determining the origin of the software running in an enterprise. For Docker container images, this means embedding in the image the Dockerfile it was built from. However, tracking down the origin of software is not trivial. For closed-source software, we blindly trust the vendors and, if we are lucky, we may get a signed piece of code. For open-source software, we rarely verify the SHA checksum and never even think of verifying what source code the binary was produced from. In talks with customers, I often hear them ask how they can verify what sources a container image is built from. They want to attribute each image with metadata that links to the Dockerfile used to build the image, as well as the Git commit and the developer who triggered the build.

There are many articles that discuss this problem. Here are two recent examples. Richard Lander from the Microsoft .NET team writes in his blog post Staying safe with .NET containers about the pedigree and provenance of the software we run and how to think about it. Josh Hendrick in his post Embedding source code version information in Docker images offers one solution to the problem.

Josh Hendrick’s proposal is in the direction I would go, but one problem I have with it is that it requires special handling in the application that runs in the container to obtain this information. I would prefer to have this information readily available without the need to run the container image. Docker images and the Open Container Initiative already have specified ways to do that without adding special files to your image. In this post, I will outline another way you can embed this information into your images and easily retrieve it without any changes to your application.

Using Docker Image Labels

The Docker image specification already has built-in support for adding labels to an image. Labels are intended to be set at build time, and they show up when inspecting the image with docker image inspect, which makes them the right choice for recording the Dockerfile and the other build origin details. One more argument for labels is that they are stored in the image configuration, which is content-addressed and thus immutable: if you change a label in an image, the resulting image SHA changes.
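To see why, recall that the image ID is the SHA-256 digest of the image’s configuration object, and the labels live inside that object. The sketch below uses a hypothetical, heavily trimmed config and deterministic JSON serialization (real digests are computed over the exact stored bytes) to show that changing a single label value yields a different digest.

```python
import hashlib
import json


def config_digest(config: dict) -> str:
    """Digest of a serialized image config, in the 'sha256:<hex>' form."""
    # Serialize deterministically so the example is reproducible.
    blob = json.dumps(config, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(blob).hexdigest()


# Hypothetical, trimmed-down image config; real configs carry many more fields.
base = {"architecture": "amd64", "os": "linux",
        "config": {"Labels": {"build.sha": "12345"}}}
relabeled = json.loads(json.dumps(base))  # deep copy
relabeled["config"]["Labels"]["build.sha"] = "67890"

print(config_digest(base))
print(config_digest(relabeled))  # differs from the line above
```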

To demonstrate how labels can be used to embed the Dockerfile and other origin information into the Docker image, I have published a dynamic labels sample on GitHub. The sample uses a base Python image and implements simple functionality to print the container’s environment variables. Let’s walk through it step by step.
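The sample’s show_environment.py is not reproduced here; a hypothetical stand-in that prints the environment might look like the following. Running it in the container also illustrates a useful point: neither the labels nor the build arguments show up as environment variables at run time, which is why the origin information is read with docker image inspect instead.

```python
import os


def format_environment(env: dict[str, str]) -> list[str]:
    """Return 'NAME=value' lines sorted by variable name."""
    return [f"{name}={value}" for name, value in sorted(env.items())]


if __name__ == "__main__":
    for line in format_environment(dict(os.environ)):
        print(line)
```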

The Dockerfile is quite simple.

FROM python:slim
ARG IMAGE_COMMITTER
ARG IMAGE_DOCKERFILE
ARG IMAGE_COMMIT_SHA
LABEL "build.user"=${IMAGE_COMMITTER}
LABEL "build.sha"=${IMAGE_COMMIT_SHA}
LABEL "build.dockerfile"=${IMAGE_DOCKERFILE}
ADD ./samples/dynamic-labels/source /
CMD ["python", "/show_environment.py"]

Lines 2-4 define the build arguments that need to be set during the build of the image. Lines 5-7 set the three labels build.user, build.sha, and build.dockerfile that we want to embed in the image. build.dockerfile is the URL of the Dockerfile in the GitHub repository, while build.sha is the Git commit that triggered the build. If you build the image locally with some dummy build arguments, you will see that a new build step, with its own intermediate image, is created for each of lines 5-7.

toddysm@MacBook-Pro ~ % docker build -t test --build-arg IMAGE_COMMITTER=toddysm --build-arg IMAGE_DOCKERFILE=https://test.com --build-arg IMAGE_COMMIT_SHA=12345 -f ./samples/dynamic-labels/Dockerfile .
Sending build context to Docker daemon  376.3kB
Step 1/9 : FROM python:slim
 ---> 8c84baace4b3
Step 2/9 : ARG IMAGE_COMMITTER
 ---> Running in 71ad05f20d20
Removing intermediate container 71ad05f20d20
 ---> fe56c62b9903
Step 3/9 : ARG IMAGE_DOCKERFILE
 ---> Running in fe468c44e9fc
Removing intermediate container fe468c44e9fc
 ---> b776dca57bd7
Step 4/9 : ARG IMAGE_COMMIT_SHA
 ---> Running in 849a82225c31
Removing intermediate container 849a82225c31
 ---> 3a4c6c23a699
Step 5/9 : LABEL "build.user"=${IMAGE_COMMITTER}
 ---> Running in fd4bfb8d5b5b
Removing intermediate container fd4bfb8d5b5b
 ---> 2e9be17c48ff
Step 6/9 : LABEL "build.sha"=${IMAGE_COMMIT_SHA}
 ---> Running in 892323d73495
Removing intermediate container 892323d73495
 ---> b7bc6559629d
Step 7/9 : LABEL "build.dockerfile"=${IMAGE_DOCKERFILE}
 ---> Running in 98687b8dd9fb
Removing intermediate container 98687b8dd9fb
 ---> 35e97d273cbc
Step 8/9 : ADD ./samples/dynamic-labels/source /
 ---> 9e71859892b1
Step 9/9 : CMD ["python", "/show_environment.py"]
 ---> Running in 366b1b6c3bea
Removing intermediate container 366b1b6c3bea
 ---> e7cb39a21c2a
Successfully built e7cb39a21c2a
Successfully tagged test:latest

You can inspect the image and see the labels by issuing the command docker image inspect --format='{{json .Config.Labels}}' <imagename>.

toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' test | jq
{
  "build.dockerfile":"https://test.com",
  "build.sha":"12345",
  "build.user":"toddysm"
}

Now, let’s automate the process with the help of GitHub Actions. I have created one GitHub Actions workflow to build and push the image to DockerHub and another to build and push it to Azure Container Registry (ACR). Both workflows use similar steps, and the first two steps are identical: they build the URL of the Dockerfile from the corresponding GitHub Actions variables:

- name: 'Set environment variable for Dockerfile URL for push'
  if: ${{ github.event_name == 'push' }}
  run: echo "DOCKERFILE_URL=${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/blob/${GITHUB_REF#refs/*/}/samples/dynamic-labels/Dockerfile" >> $GITHUB_ENV

- name: 'Set environment variable for Dockerfile URL for pull request'
  if: ${{ github.event_name == 'pull_request' }}
  run: echo "DOCKERFILE_URL=${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/blob/${GITHUB_BASE_REF#refs/*/}/samples/dynamic-labels/Dockerfile" >> $GITHUB_ENV
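The ${GITHUB_REF#refs/*/} expansion strips the shortest leading match of refs/*/, so refs/heads/development becomes development. In pull request events, GITHUB_BASE_REF already holds a bare branch name, so the same expansion leaves it unchanged. The Python sketch below mimics that expansion to show the resulting URL; it is an illustration only and not part of the workflows.

```python
import re


def strip_ref_prefix(ref: str) -> str:
    """Mimic the shell expansion ${REF#refs/*/}.

    '#' removes the shortest prefix matching the glob 'refs/*/', which is
    'refs/' plus the first path segment and its trailing slash.
    """
    return re.sub(r"^refs/[^/]*/", "", ref, count=1)


def dockerfile_url(server: str, repo: str, ref: str,
                   path: str = "samples/dynamic-labels/Dockerfile") -> str:
    """Build the same URL the workflow step writes to DOCKERFILE_URL."""
    return f"{server}/{repo}/blob/{strip_ref_prefix(ref)}/{path}"


if __name__ == "__main__":
    print(dockerfile_url("https://github.com",
                         "CrimsonPinnacle/container-image-inspector",
                         "refs/heads/development"))
```

Note that a branch name containing slashes (for example, feature/labels) is kept intact, because only the shortest prefix is removed.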

Then, there are specific steps to sign in to DockerHub or Azure. After that come the build steps, where the labels are set. Here, for example, is the step that builds the image with Buildx (through the docker/build-push-action action) and automatically pushes it to DockerHub:

- name: Build and push
  id: docker_build
  uses: docker/build-push-action@v2
  with:
    context: ./
    file: ./samples/dynamic-labels/Dockerfile
    push: true
    tags: ${{ secrets.DOCKER_HUB_REPONAME }}:build-${{ github.run_number }}
    build-args: |
      IMAGE_COMMITTER=${{ github.actor }}
      IMAGE_DOCKERFILE=${{ env.DOCKERFILE_URL }}
      IMAGE_COMMIT_SHA=${{ github.sha }}

The build step for building the image and pushing to Azure Container Registry uses the traditional docker build approach:

- name: 'Log in to Azure Container Registry'
  uses: azure/docker-login@v1
  with:
    login-server: ${{ secrets.ACR_REGISTRY_LOGIN_SERVER }}
    username: ${{ secrets.ACR_REGISTRY_USERNAME }}
    password: ${{ secrets.ACR_REGISTRY_PASSWORD }}
- name: 'Build and push'
  id: docker_build
  run: |
    docker build -f ./samples/dynamic-labels/Dockerfile -t ${{ secrets.ACR_REGISTRY_LOGIN_SERVER }}/${{ secrets.ACR_REPOSITORY_NAME }}:build-${{ github.run_number }} --build-arg IMAGE_COMMITTER=${{ github.actor }} --build-arg IMAGE_DOCKERFILE=${{ env.DOCKERFILE_URL }} --build-arg IMAGE_COMMIT_SHA=${{ github.sha }} .
    docker push ${{ secrets.ACR_REGISTRY_LOGIN_SERVER }}/${{ secrets.ACR_REPOSITORY_NAME }}:build-${{ github.run_number }}

After the workflows complete, the images are available in DockerHub and Azure Container Registry. Here is what the image looks like in DockerHub:

Docker container image with labels

If you scroll down a little, you will see the labels that appear in the list of layers:

The URL points to the Dockerfile that was used to create the image, while the commit SHA identifies the exact state of the project the image was built from. If you pull the image locally, you can also see the labels using the command:

toddysm@MacBook-Pro ~ % docker pull toddysm/tmstests:build-36
build-36: Pulling from toddysm/tmstests
45b42c59be33: Already exists
8cd3485318db: Already exists
2f564129f025: Pull complete
cf1573f5a21e: Pull complete
ceec8aed2dab: Pull complete
78b1088f77a0: Pull complete
Digest: sha256:7862c2a31970916fd50d3ab38de0dad74a180374d41625f014341c90c4b55758
Status: Downloaded newer image for toddysm/tmstests:build-36
docker.io/toddysm/tmstests:build-36
toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' toddysm/tmstests:build-36
{
  "build.dockerfile":"https://github.com/CrimsonPinnacle/container-image-inspector/blob/development/samples/dynamic-labels/Dockerfile",
  "build.sha":"e80e6ef86f86a11d6a73aea8d8c41700c4d3d7c5",
  "build.user":"toddysm"
}
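Because the labels come back as plain JSON, they are easy to consume from scripts. A sketch, assuming the Docker CLI is installed and that build.dockerfile points at a GitHub URL of the form shown above; the helper names are hypothetical, and the commit URL follows GitHub’s https://github.com/<owner>/<repo>/commit/<sha> pattern.

```python
import json
import subprocess
from typing import Optional
from urllib.parse import urlparse


def read_labels(image: str) -> dict:
    """Return an image's labels via docker image inspect (needs the Docker CLI)."""
    out = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{json .Config.Labels}}", image],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out) or {}  # the CLI prints "null" for images without labels


def commit_url(labels: dict) -> Optional[str]:
    """Build a GitHub commit URL from the build.dockerfile and build.sha labels."""
    dockerfile, sha = labels.get("build.dockerfile"), labels.get("build.sha")
    if not dockerfile or not sha:
        return None
    parsed = urlparse(dockerfile)
    segments = parsed.path.strip("/").split("/")
    if len(segments) < 2 or not all(segments[:2]):
        return None  # not a https://<host>/<owner>/<repo>/... URL
    owner, repo = segments[:2]
    return f"{parsed.scheme}://{parsed.netloc}/{owner}/{repo}/commit/{sha}"


if __name__ == "__main__":
    labels = read_labels("toddysm/tmstests:build-36")
    print(labels.get("build.user"), commit_url(labels))
```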

To summarize, the benefit of using labels to embed the Dockerfile and other origin information into container images is that they are part of the image’s content-addressed configuration. Thus, they cannot be changed without changing the image.

Who is Using Docker Image Labels?

Unfortunately, labels are rarely used, if at all 🙁 Checking several popular images from DockerHub yields the following results:

toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' busybox | jq
null
toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' alpine | jq 
null
toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' ubuntu | jq 
null

Tracking down the sources from which the Alpine image is built would require much more effort.

What is Next for Checking Docker Image Origins?

There are a couple of community initiatives that will play a role in determining the origin of container images.

  • Notary V2 will allow images to be signed. Having the origin information embedded into the image and adding an official signature to the image will increase the confidence in the legitimacy of the image.
  • OCI manifest specification allows artifacts (i.e. images) to be annotated with arbitrary metadata. Unfortunately, Docker doesn’t support those yet. Hopefully, in the future, Docker images will add support for arbitrary metadata that can be included in the image manifest.
  • An implementation of a metadata service (see the metadata service draft from Steve Lasker) as part of the registry will enable additional capabilities for providing origin information for images.

Summary

While image metadata is great for annotating images with useful information and enabling search and querying capabilities, that metadata is kept outside of the image and can change over time. Verifying the authenticity of the metadata and keeping a history of the changes will be a harder problem to solve. Docker already provides a way to embed the Dockerfile and other image origin information as immutable parts of the image itself. Using dynamically populated Docker image labels, developers can provide origin information right now and increase the supply chain confidence for their images.

In my previous post, Learn More About Your Home Network with Elastic SIEM – Part 1: Setting Up Elastic SIEM, I explained how you could set up Elastic SIEM on a Raspberry Pi. The next thing you would want to do is collect the logs from your firewall and analyze them. Before I jump into the technical details, I should warn you that you may… not be able to do the steps below if you rely on consumer products or use the equipment provided by your ISP.

Let me go on a short rant here! Every self-respecting router vendor should allow firewall logs to be sent to an external system. A common approach is to use the SYSLOG protocol to collect the logs. If your router does not have this capability… well, I would suggest you buy a new, more advanced one.

I personally invested in a tiny Netgate SG-1100 box that runs the open-source pfSense router/firewall. You can, of course, install pfSense on your own hardware if you don’t want to buy a new device. pfSense allows you to configure up to three external log servers. Logstash, which we configured in the previous post, can play the role of a SYSLOG server and send the events to Elasticsearch. Here is how simple the configuration of the pfSense log shipping looks:

The IP address 192.168.11.72 is the address of the Raspberry Pi where the ELK SIEM is installed, and 5140 is the port that Logstash listens on for incoming events. That is all you need to configure pfSense to send the logs to the ELK SIEM.
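Before debugging the Logstash pipeline, it can help to confirm that Logstash receives anything at all on port 5140. The sketch below sends a single BSD-style (RFC 3164) syslog line from another machine on the network. It assumes the Logstash input listens on UDP 5140; the hostname and tag values are arbitrary.

```python
import socket
import time

# RFC 3164/5424 numeric codes: facility local0 = 16, severity informational = 6.
FACILITY_LOCAL0 = 16
SEVERITY_INFO = 6


def syslog_message(text: str, hostname: str = "test-host", tag: str = "siem-test",
                   facility: int = FACILITY_LOCAL0, severity: int = SEVERITY_INFO) -> bytes:
    """Build a BSD-style line: <PRI>Mmm dd hh:mm:ss HOST TAG: MSG."""
    pri = facility * 8 + severity
    now = time.localtime()
    # RFC 3164 pads single-digit days with a space, not a zero.
    timestamp = time.strftime("%b ", now) + f"{now.tm_mday:2d}" + time.strftime(" %H:%M:%S", now)
    return f"<{pri}>{timestamp} {hostname} {tag}: {text}".encode("utf-8")


def send_udp(message: bytes, host: str, port: int) -> None:
    """Send a single datagram; UDP gives no delivery confirmation."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))


if __name__ == "__main__":
    # 192.168.11.72:5140 is the Logstash address and port used in this post.
    send_udp(syslog_message("test event for the home SIEM"), "192.168.11.72", 5140)
```

If the event shows up in Kibana, the Logstash side works and any remaining problem is on the firewall side.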

Our next step is to configure Logstash to collect the events from PFSense and feed them into an index in Elastic. The following project from Patrick Jennings will help you with the Logstash configuration. If you follow the instructions, you will see the new index show up in Kibana like this:

The last thing we need to do is create a dashboard in Kibana to show the data collected from the firewall. Patrick Jennings’ project has pre-configured visualizations and a dashboard for the pfSense data. Unfortunately, when you import those, Kibana warns you that they need to be updated. The reason is that they use the old JSON format, and the latest Kibana versions require all saved objects to be described using the newline-delimited JSON (NDJSON) format (for more details, visit ndjson.org). The pfSense dashboard and visualizations are available in my GitHub repository for Home SIEM.
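The framing change itself is simple: NDJSON puts one complete JSON object on each line instead of wrapping them all in an array. The sketch below shows that conversion. The field mapping is an assumption: it presumes the legacy export uses _id/_type/_source keys and that newer Kibana versions expect id/type/attributes, and Kibana may still need to migrate or re-link the imported objects.

```python
import json


def to_ndjson(objects: list[dict]) -> str:
    """Serialize each object on its own line (newline-delimited JSON)."""
    return "".join(json.dumps(obj, separators=(",", ":")) + "\n" for obj in objects)


def legacy_to_saved_object(obj: dict) -> dict:
    """Map a legacy export entry (_id/_type/_source) to id/type/attributes.

    Assumption: the legacy file uses the _id/_type/_source layout.
    """
    return {"id": obj["_id"], "type": obj["_type"], "attributes": obj["_source"]}


if __name__ == "__main__":
    import sys
    legacy = json.load(sys.stdin)  # a JSON array exported by an older Kibana
    sys.stdout.write(to_ndjson([legacy_to_saved_object(o) for o in legacy]))
```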

Now, keep in mind that the pfSense logs will not feed into the SIEM functionality of the Elastic stack because they are not in the Elastic Common Schema (ECS) format. What we have created is just a dashboard that visualizes the firewall log data. Also, the dashboard and the visualizations are built around the pfSense data. If you use a different router/firewall, you will need to update the configuration to visualize the data, and things may not work out of the box. I would be curious to hear how other routers can send data to ELK.

In subsequent posts, I will describe how you can use Beats to get data from the machines connected to your local network and how you can dig deeper into the collected data.