Docker image optimization

This article is part 0 in a series: Docker



Layers

A Docker image consists of a set of layers. Each instruction in the Dockerfile produces a layer, and each layer can add a bit of extra size to the resulting image (instructions that change the filesystem, such as RUN and ADD, add the most; metadata-only ones like WORKDIR add next to nothing).
Keeping the image size down is a good thing, so the fewer layers you have in the end, the better.
Now, layers are of course not the only thing that makes an image grow, but they are likely the easiest thing to change!

While you write a Dockerfile and test it, the number of layers should be higher.
Why? Well, each layer is cached, and if neither its instruction nor any of the layers before it have changed, it won't have to be rebuilt. This makes the development of images a lot faster, and you won't have to wait for a full rebuild each time you test.

We start with a (kind of easy) example:

FROM busybox

WORKDIR /
RUN mkdir test
WORKDIR /test
RUN touch test.txt
WORKDIR /
RUN mkdir test2

The above sequence of RUN and WORKDIR commands is not hard to squash together, but it's a deliberately simple example to show what is wrong!
Each RUN and each WORKDIR directive adds a layer, every one of them. In the example we have a total of 6 layers on top of the base image. (In this case the layers are tiny and insignificant, but with a few ADD directives and program installations the image would grow a whole lot!) You can check an image's size with docker images <image-name>, and the layers and the size of each layer with docker history <image-name>.
By removing all but one RUN directive and all but one WORKDIR directive, we lower the layer count to 2!

FROM busybox

RUN mkdir /test \
    && touch /test/test.txt \
    && mkdir /test2

WORKDIR /

A better example could be the following Jekyll build script:

FROM ruby:alpine

RUN apk add --no-cache \
        openssh-client \
        ruby-dev \
        build-base

ADD . /app
WORKDIR /app

RUN gem install --no-document  \
        jekyll \
        bundler

RUN bundle install \
    && jekyll build

The above file could be used to build a Jekyll project and let a web server serve the files in the /app directory. But it consists of many layers, and some of them are quite big!
So how do we squash it?
Well, first off we need to know some basics about Jekyll and Ruby. For example, running gem install for global packages - as the second RUN directive does - can be done in any directory.
If we concatenate that with the first RUN, we save a whole layer!
Adding the app directory could also be done before the first RUN directive, and so could the WORKDIR command. Moving those up to the top lets us concatenate the first and last RUN as well.
The image would look something like the following:

FROM ruby:alpine

ADD . /app
WORKDIR /app

RUN apk add --no-cache \
        openssh-client \
        ruby-dev \
        build-base \
    && gem install --no-document  \
        jekyll \
        bundler \
    && bundle install \
    && jekyll build

Counting the layers gives us a much more manageable count of 3.
The difference in image size depends a lot on the size of the Jekyll project, so I won't give numbers here. Rather, experiment with it yourself and let me know how it worked out! ;)

While you are still writing the Dockerfile, the first of the two versions is the better way to go, or even splitting the RUN command of the last version into several.
Each layer is built and cached, so if a later step fails you won't have to rebuild the earlier ones to fix it. That way you develop faster and can squash the image at the end.
If the jekyll build command fails in the last version, the whole RUN directive has to be rebuilt, which takes more time than it needs to!
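As a rough sketch of what such a development variant could look like, here is the last version split back into one step per RUN. Placing ADD after the package installs is my own choice and not part of the versions above; it assumes you edit the project files far more often than the package list, so those edits don't invalidate the cached install layers.

```dockerfile
# Development variant (a sketch): one step per RUN, so each step is cached on its own.
FROM ruby:alpine

# These two layers are rebuilt only when the package lists change.
RUN apk add --no-cache openssh-client ruby-dev build-base
RUN gem install --no-document jekyll bundler

# Assumption: project files change often, so they are added as late as possible.
ADD . /app
WORKDIR /app

# If jekyll build fails, only that last step is re-run on the next build.
RUN bundle install
RUN jekyll build
```

Once everything builds, the RUN directives can be concatenated again as shown earlier.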

Installing and removing

When you wish to lower the size of an image even more, make sure you remove all installed packages that you do not need anymore.
It's easy to just install a bunch of stuff that you like to have in a VM or on a computer, but you will most likely not need it in a container!

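At the end of each file, in the last RUN, remove all packages that you don't need. Do it in the same RUN directive that installed them: files deleted in a later layer still take up space in the earlier one. Most (if not all) package managers allow you to purge or delete packages, just check how for the specific one you use.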
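For a package manager other than Alpine's, the same idea might look something like the following on a Debian-based image. This is only a sketch: the base image tag and the build-essential package are placeholders I picked for illustration, so swap in whatever your build really needs.

```dockerfile
# Sketch for apt-based images; image tag and package name are illustrative assumptions.
FROM debian:bookworm-slim

# Install, use and purge within a single RUN, so the toolchain never ends up in a layer.
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    # build or compile whatever needs the toolchain here
    && apt-get purge -y --auto-remove build-essential \
    && rm -rf /var/lib/apt/lists/*
```

The final rm deletes the downloaded package lists, which is the "deleting the cache files" approach for this distro.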

The package manager APK in the Alpine distro has two quite nifty features when it comes to adding and removing packages.
First off: skip the local cache! (On some distros you do this by deleting the cache files.) In Alpine, all you have to do is add the --no-cache argument to the apk add call.
With --no-cache, apk fetches the package index from the repositories instead of relying on local cache files, so you always get the latest available version of a package (if it exists), and no cache files are left behind in the image.
Secondly: the --virtual argument. With this argument you can add a bunch of packages to a virtual group, which makes the cleanup process a whole lot easier.

RUN apk add --no-cache --virtual .trash gnupg g++ gcc wget make python autoconf \
    && apk add --no-cache git openssh \
    # do something with all the packages installed
    && apk del .trash

As seen in the example above, the first apk add installs a few packages and puts them in the virtual group .trash. At the end of the RUN, when the packages have been used and the image no longer needs them, a call to apk del is made. This call does not list all the packages, just their group, and deletes them all. The second apk add installs the git and openssh packages the usual way, without a virtual group, and they are not removed because they will be used in the container! If you add more packages to the .trash group, they will be uninstalled by apk del as well, which saves you from having to remember to add them in both places!
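Applied to the Jekyll file from earlier, a virtual group could look roughly like this. The group name .build-deps is just an illustrative choice, and the sketch assumes that ruby-dev and build-base are only needed while the gems are installed and the site is built, not afterwards when the generated files are served.

```dockerfile
# Sketch: the squashed Jekyll build, with the build-only packages in a virtual group.
FROM ruby:alpine

ADD . /app
WORKDIR /app

# .build-deps is a hypothetical group name; any name works.
RUN apk add --no-cache --virtual .build-deps ruby-dev build-base \
    && apk add --no-cache openssh-client \
    && gem install --no-document jekyll bundler \
    && bundle install \
    && jekyll build \
    # Assumption: nothing needs the compiler toolchain once the site is built.
    && apk del .build-deps
```

If something in the image still has to run Ruby gems with native extensions later on, check that the libraries they link against survive the apk del before relying on this.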

Conclusion

  • During development, use as many layers as you find useful. The more you have, the lower the risk that you will have to spend time waiting on long downloads or on compilers running for hours!
  • When you are about to push your image, make sure it has as few layers as possible.
  • Make sure you only keep the files and the packages that you really want in the image. Whatever the container won't need, you can throw away.
  • Use small images as bases. For example, don't use an Ubuntu or Debian image as the base for your data container!
