Static vulnerability analysis tools for Docker containers

You’re developing in Docker, and when you’re done you have a container with a lot of layers. How do you make sure there are no security vulnerabilities hiding somewhere inside, especially when your container depends on something that depends on something that depends on something else?

The first step is to understand your own application, so that you have some sense of how the dependencies you directly control are affected. This is a harder problem than you’d expect, since many packages depend in turn on other packages, and so on down the chain. I’ve used Gemnasium to help with this. It tracks Ruby, Node.js, Python, and PHP dependencies, and generates alerts when a package has a known issue that should trigger an update. The Node Security Project offers a similar service, focused on Node.js, with an integration directly into GitHub so that every commit can be verified and validated against their vulnerability analysis.
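To make that concrete, the Node Security Project also shipped a command-line tool, nsp, that audits a project’s dependencies against their advisory database. The invocation below reflects the tool as I remember it circa 2016 and may have changed since:

```sh
# Install the Node Security Project CLI and audit the current project.
# Run this from the directory containing package.json.
npm install -g nsp
nsp check    # exits non-zero if a dependency matches a known advisory
```

Wiring a check like this into CI means a newly published advisory fails the build instead of lingering unnoticed.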

Next, make certain that your Dockerfiles follow best practices by checking them with Docker Bench for Security, a portable shell script that checks for dozens of common security-relevant practices.
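The project’s README suggests running the script from its published image; the mounts and flags below are from memory of that README and may differ between releases:

```sh
# Run Docker Bench for Security against the local Docker daemon.
# The bind mounts give the script read access to the host's
# configuration, and the socket lets it inspect running containers.
docker run -it --net host --pid host \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /etc:/etc \
    --label docker_bench_security \
    docker/docker-bench-security
```

The output is a checklist of warnings and passes keyed to the CIS Docker benchmark recommendations the script encodes.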

One difficult practice that’s common in Dockerfiles is working around the package manager during installation, to avoid the sometimes hefty overhead of maintaining a package index. When you install a package with

curl | tar && ./configure && make install

instead of with RPM or apk or apt or your distribution’s favorite package manager, it can be very hard to keep track of precisely which version that incantation actually put in place.
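A Dockerfile fragment makes the trade-off concrete. The package name and version pin here are illustrative, not real release numbers:

```dockerfile
FROM alpine:3.4

# Opaque: nothing records which version 'make install' put on the system,
# so a scanner has no way to match it against a vulnerability database.
# RUN curl -sSL https://example.org/tool.tar.gz | tar xz \
#     && cd tool && ./configure && make install

# Auditable: apk records the package and its exact version, so a
# scanner (or a human) can later ask exactly what is installed.
RUN apk add --no-cache 'sometool=1.2.3-r0'
```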

Assuming that your application is now somehow clear of any issues under your direct control, the next step is to track down and understand operating system vulnerabilities. One approach that I’ve read about but not used yet is the open source Clair from CoreOS. Clair does static analysis of containers, using a set of vulnerability databases that are parsed into a Postgres database. There’s a certain amount of complexity in the whole thing, and as a result it’s not trivial to integrate into a continuous integration process (though that task has been done at least once).
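For a rough sense of the moving parts, a minimal Clair setup looks something like the following. The image names and config flag are from memory of the CoreOS documentation, and the config.yaml (database connection, updater settings) is left out:

```sh
# Clair keeps its parsed vulnerability data in Postgres.
docker run -d --name clair-db -e POSTGRES_PASSWORD=clair postgres:9.5

# Clair itself, pointed at a config file that names that database.
docker run -d --name clair --link clair-db:postgres \
    -v "$PWD/clair-config:/config" \
    quay.io/coreos/clair -config=/config/config.yaml
```

From there you still need something to feed container layers into Clair’s API and interpret the results, which is where most of the integration work lives.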

Docker has its own Docker Security Scanning, which is available to paid accounts.

One of the big issues in static analysis of Docker containers is that a popular, small base image, Alpine Linux, is not covered by Clair, in part because Alpine does not have a fully machine-parsable vulnerability database. The Clair project is working on the issue, but they are hampered in part by the small size of the Alpine team and its focus on code over documentation. From a post to alpine-devel in 2016:

we try do more the actual work, than the paperwork ;-)

Doing continuous integration on a complex project with multiple open source dependencies can pose tricky issues, especially when you’re dealing with code that is at or near end of life. If you discover a vulnerability in a component that has been removed from active development, yet you depend on it, you may be forced to either live with a known bug or undertake expensive porting efforts to get back to a good state. For example, Node-RED used the ws package from npm at a version known to have a denial-of-service vulnerability. This issue persists as of late October 2016 because of the project’s desire to continue supporting Raspberry Pi hardware with the default operating system load. The Pi’s base Raspbian system uses Node.js 0.10, which is now obsolete, and the fixes to ws don’t include 0.10 patches.
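When you’re trying to work out whether you’re exposed, the first question is which version of the nested dependency actually got installed. For npm, that’s answerable from the project directory (using ws from the Node-RED example):

```sh
# Print the dependency tree entries for 'ws', showing the resolved
# version(s) and which parent packages pulled them in.
npm ls ws
```

Compare the resolved version against the advisory’s patched range before deciding whether an upgrade is possible or a workaround is needed.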

What exactly are you running, how do you know that it’s free of known issues, and how can you mitigate or fix the problems you find? Dynamic analysis is hard to do at scale, because bugs are likely to slip past your testing environment. Static analysis looks like a reasonable alternative (assuming that the components themselves are well tested), but the increasingly layered nature of modern software development means that this is a hard, hard task.