Addressing the Current Challenges of Patching Container Vulnerabilities

While working on improving the container secure supply chain, I often need to go over the current challenges of patching container vulnerabilities. With the introduction of Automatic VM Patching, having those conversations is even more challenging because there is always the question: “Why can’t we patch containers the same way we patch VMs?” Really, why can’t we? First, let’s look at how VM and container workloads differ.

How Do VM and Container Workloads Differ?

VM-based applications are often considered legacy applications, and VMs fall under the category of Infrastructure-as-a-Service (IaaS) compute services. One of the main characteristics of IaaS compute services is the persistent local storage that can be used to save data on the VM. Typically, you use VMs for your application as follows:

You choose a VM image from the cloud vendor’s catalog. The VM image specifies the OS and its version you want to run on the VM.
You create the VM from that image and specify the size of the VM. The size includes the vCPUs, memory, and persistent storage to be used by the VM.
You install the additional software you need for your application on the VM.

From this point onward, the VM workload state is saved to the persistent storage attached to the VM. Any changes to the OS (like patches) are also committed to the persistent storage, and next time the VM workload needs to be spun up, those are loaded from there. Here are the things to remember for VM-based workloads:

The VM image is used only once, when the VM workload is created.
Changes to the VM workload are saved to the persistent storage; the next time the VM is started, those changes are automatically loaded.
If a VM workload is moved to different hardware, the changes will still be loaded from the persistent storage.

How do containers differ, though?

Whenever a new container workload is started, the container image is used to create the container (similar to the VM). If the container workload is stopped and started on the same VM or hardware, any changes to the container will also be automatically loaded. However, orchestrators cannot know whether the new workload will end up on the same VM or hardware (resource constraints may force it elsewhere), so they destroy containers rather than stop them, and if a new one needs to be spun up, they use the container image again to create it.

That is a major distinction between VMs and containers. While the VM image is used only once when the VM workload is created, the container image is used repeatedly to re-create the container workload whenever it moves from one place to another or capacity is increased. Thus, when a VM is patched, the patches are saved to the VM’s persistent storage, while container patches need to be present in the container image for the workloads to stay patched.

The bottom line is, unlike VMs, when you think of how to patch containers, you should target improvements in updating the container images.
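To make the difference concrete, here is a toy model of the two lifecycles. It is only a sketch: the class names, the image name app:1.0, and the idea of tracking patches as a set of CVE IDs are my own simplifications for illustration, not how any hypervisor or orchestrator is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """A template: the set of patches baked in at build time."""
    name: str
    patches: frozenset = frozenset()


@dataclass
class VM:
    """A VM copies its image once; later changes live on its persistent disk."""
    disk: set

    @classmethod
    def create(cls, image: Image) -> "VM":
        return cls(disk=set(image.patches))

    def patch(self, cve: str) -> None:
        self.disk.add(cve)

    def restart(self) -> "VM":
        # The same disk is reattached, so in-place patches survive.
        return self


@dataclass
class Container:
    """A container's writable layer is lost when the container is destroyed."""
    image: Image
    writable_layer: set = field(default_factory=set)

    def patch(self, cve: str) -> None:
        self.writable_layer.add(cve)

    def patched(self) -> set:
        return set(self.image.patches) | self.writable_layer

    def reschedule(self) -> "Container":
        # The old container is destroyed; a new one is created from the image.
        return Container(image=self.image)


image = Image("app:1.0")  # hypothetical image with no patches baked in

vm = VM.create(image)
vm.patch("CVE-2022-0778")
vm_after = vm.restart()

container = Container(image)
container.patch("CVE-2022-0778")
container_after = container.reschedule()

print("VM still patched:", "CVE-2022-0778" in vm_after.disk)
print("Container still patched:", "CVE-2022-0778" in container_after.patched())
```

The patch applied inside the running container disappears on the first reschedule, because the only thing carried over is the image.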

A Timeline of a Container Image Patch

For this example, we will assume that we have an internal machine learning team that builds their application image using python:3.10-bullseye as a base image. We will concentrate on the timelines for fixing the OpenSSL vulnerabilities CVE-2022-0778 and CVE-2022-1292. The internal application team’s dependency chain is OpenSSL <– Debian <– Python. Those are all Open Source Software (OSS) projects driven by their respective communities. Here is the timeline of fixes for those vulnerabilities by the OSS community.

2022-03-08: python:3.10.2-bullseye Released

Python publishes python:3.10.2-bullseye container image. This is the last Python image before the CVE-2022-0778 OpenSSL vulnerability was fixed.

2022-03-15: OpenSSL CVE-2022-0778 Fixed

OpenSSL publishes fix for CVE-2022-0778 impacting versions 1.0.2 – 1.0.2zc, 1.1.1 – 1.1.1m, and 3.0.0 – 3.0.1.

2022-03-16: debian:bullseye-20220316 Released

Debian publishes debian:bullseye-20220316 container image that includes a fix for CVE-2022-0778.

2022-03-18: python:3.10.3-bullseye Released

Python publishes python:3.10.3-bullseye container image that includes a fix for CVE-2022-0778.

2022-05-03: OpenSSL CVE-2022-1292 Fixed

OpenSSL publishes fix for CVE-2022-1292 impacting versions 1.0.2 – 1.0.2zd, 1.1.1 – 1.1.1n, and 3.0.0 – 3.0.2.

2022-05-09: debian:bullseye-20220509 Released

Debian publishes debian:bullseye-20220509 container image that DOES NOT include a fix for CVE-2022-1292.

2022-05-27: debian:bullseye-20220527 Released

Debian publishes debian:bullseye-20220527 container image that includes a fix for CVE-2022-1292.

2022-06-02: python:3.10.4-bullseye Released

Python publishes python:3.10.4-bullseye container image that includes a fix for CVE-2022-1292.

There are a few important things to notice in this timeline:

CVE-2022-0778 was fixed in the whole chain in just three days.
In comparison, CVE-2022-1292 took 30 days to fix in the whole chain.
Also, in the case of CVE-2022-1292, Debian released a container image after the fix from OpenSSL was available, but that image DID NOT contain the fix.

The bottom line is:

Timelines for fixes by the OSS communities are unpredictable.
The latest releases of container images do not necessarily contain the latest software patches.
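If you want to check the arithmetic, the lag between the OpenSSL fix and each downstream image can be computed directly from the dates in the timeline above. This is just a sketch; the dictionary layout and the function name lag_days are my own choices.

```python
from datetime import date

# Dates copied from the timeline above.
upstream_fix = {
    "CVE-2022-0778": date(2022, 3, 15),
    "CVE-2022-1292": date(2022, 5, 3),
}

# First image in each project that contains the fix.
first_fixed_image = {
    "CVE-2022-0778": {"debian": date(2022, 3, 16), "python": date(2022, 3, 18)},
    "CVE-2022-1292": {"debian": date(2022, 5, 27), "python": date(2022, 6, 2)},
}


def lag_days(cve: str) -> dict:
    """Days between the OpenSSL fix and the first fixed image of each project."""
    fixed_on = upstream_fix[cve]
    return {
        project: (released - fixed_on).days
        for project, released in first_fixed_image[cve].items()
    }


for cve in upstream_fix:
    print(cve, lag_days(cve))
```

For CVE-2022-0778 the lags are 1 and 3 days; for CVE-2022-1292 they are 24 and 30 days.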

SLAs and the Typical Process for Fixing Container Vulnerabilities

The typical process teams use to fix vulnerabilities in container images is waiting for the fixes to appear in the upstream images. In our example, the machine learning team must wait for the fixes to appear in the python:3.10-bullseye image first, then rebuild their application image, test the new image, and re-deploy to their production workloads if tests pass. Let’s call this process wait-rebuild-test-redeploy (or WRTR if you like acronyms:)).

The majority of enterprises have established SLAs for fixing vulnerabilities. For those that have not established such yet, things will soon change due to the Executive Order on Improving the Nation’s Cybersecurity. Many enterprises model their patching processes based on the FedRAMP 30/90/180 rules specified in the FedRAMP Continuous Monitoring Strategy Guide. According to the FedRAMP rules, high severity vulnerabilities must be remediated within 30 days. CISA’s Binding Operational Directive 22-01, Reducing the Significant Risk of Known Exploited Vulnerabilities, has much more stringent timelines of two weeks for vulnerabilities published in CISA’s Known Exploited Vulnerabilities Catalog.

Let’s see how the timelines for patching the abovementioned OpenSSL vulnerabilities fit into those SLAs for the machine learning team using the typical process for patching containers.

CVE-2022-0778 was published on March 15th, 2022. It is a high severity vulnerability, and according to the FedRAMP guidelines, the machine learning team has until April 14th, 2022, to fix the vulnerability in their application image. Considering that the python:3.10.3-bullseye image was published on March 18th, 2022, the machine learning team has 27 days to rebuild, test, and redeploy the image. This sounds like a reasonable time for those activities. Luckily, CVE-2022-0778 is not in CISA’s catalog, but the team would still have 11 days for those activities if it were.

The picture with CVE-2022-1292 does not look so good, though. The vulnerability was published on May 3rd, 2022. It is a critical severity vulnerability, and according to the FedRAMP guidelines, the machine learning team has until June 2nd, 2022, to fix the vulnerability. Unfortunately, though, the python:3.10.4-bullseye image was not published until June 2nd, 2022. This means that the team needs to do the rebuild, testing, and redeployment on the same day the community published the image. Either the team needs to be very efficient with their processes or work around the clock that day to complete all the activities (after hoping the community will publish a fix for the python image before the SLA deadline). That is a very unrealistic expectation and also hurts the team’s morale. If, by any chance, the vulnerability had appeared in CISA’s catalog (which luckily it did not), the team would not have been able to fix it within the two-week SLA.
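The numbers in the two paragraphs above come from simple date arithmetic, sketched here. The 30-day and 14-day windows are the ones quoted in this post; the function name slack_days and the constants are my own, and a negative result means the fixed image arrived after the deadline had already passed.

```python
from datetime import date, timedelta

FEDRAMP_HIGH_DAYS = 30  # remediation window for high severity, as quoted above
CISA_KEV_DAYS = 14      # two-week window for catalogued vulnerabilities, as quoted above


def slack_days(published: date, image_released: date, sla_days: int) -> int:
    """Days left for rebuild, test, and redeploy once the fixed image exists."""
    deadline = published + timedelta(days=sla_days)
    return (deadline - image_released).days


# (CVE published, first fixed Python image released), from the timeline.
cases = {
    "CVE-2022-0778": (date(2022, 3, 15), date(2022, 3, 18)),
    "CVE-2022-1292": (date(2022, 5, 3), date(2022, 6, 2)),
}

for cve, (published, released) in cases.items():
    print(
        cve,
        "FedRAMP slack:", slack_days(published, released, FEDRAMP_HIGH_DAYS),
        "CISA slack:", slack_days(published, released, CISA_KEV_DAYS),
    )
```

For CVE-2022-0778 this gives 27 and 11 days. For CVE-2022-1292 it gives 0 days under the 30-day rule and -16 days under the two-week rule.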

That shows that the wait-rebuild-test-redeploy (WRTR) process is ineffective in meeting the SLAs for fixing vulnerabilities in container images. But what can you currently do to improve this and take control of the timelines?

Using Multi-Stage Builds to Fix Container Vulnerabilities

Until container technology evolves and a more declarative way of patching container images is available, teams can use multi-stage builds to build their application images and fix the base image vulnerabilities. This is easily done in the CI/CD pipeline. This approach also allows teams to control the timelines for vulnerability fixes and meet their SLAs. Here is an example of how you can solve the patching issue from the example above:

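# Stage 1: start from the upstream image and apply the latest OS patches.
FROM python:3.10.2-bullseye AS baseimage

# Chain with && so a failed update stops the build instead of being ignored.
RUN apt-get update && \
    apt-get upgrade -y && \
    rm -rf /var/lib/apt/lists/*

# Create an unprivileged user without interactive prompts.
RUN adduser --disabled-password --gecos "" appuser

# Stage 2: build the application on top of the patched base.
FROM baseimage

USER appuser

WORKDIR /app

CMD [ "python", "--version" ]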

In the above Dockerfile, the first stage of the build updates the base image with the latest patches available from the Debian repositories. The second stage builds the application and runs it with the appropriate user permissions. Using this approach, you avoid the wait part of the WRTR process above and can meet your SLAs with a simple rebuild of the image, as long as the distribution has published the patched packages.
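One caveat when wiring this into a pipeline: Docker caches the layer produced by the apt-get step, so a rebuild on a machine with a warm cache may reuse the old, unpatched layer. Here is a small sketch of how a pipeline step could assemble the build command so that the upgrade really runs again. The image name ml-app and the date-based tag are assumptions of mine for illustration; --pull and --no-cache are standard docker build flags.

```python
from datetime import date


def build_command(image: str, build_date: date, context: str = ".") -> list[str]:
    """Return a docker build command that ignores cached layers.

    --pull refreshes the base image referenced in FROM, and --no-cache
    forces every RUN step (including the apt-get upgrade) to execute again.
    """
    tag = f"{image}:{build_date:%Y%m%d}"
    return ["docker", "build", "--pull", "--no-cache", "--tag", tag, context]


if __name__ == "__main__":
    # "ml-app" is a hypothetical image name.
    cmd = build_command("ml-app", date.today())
    print(" ".join(cmd))
    # In a pipeline with Docker available, pass cmd to subprocess.run(cmd, check=True).
```

Tagging each build with its date also makes it easy to tell which patch level a running workload was created from.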

Of course, this approach also has drawbacks. One of its biggest issues is how little control teams have over which patches are applied. Another is that some teams do not want to include layers in their images that do not belong to the application (i.e., modify the base image layers). Those are all topics for another post 🙂

Photo by Webstacks on Unsplash