Docker image optimization
Layers
A Docker image consists of a set of layers. Each directive in the Dockerfile produces a layer, and every layer that changes the filesystem adds a bit of extra size to the resulting image.
Keeping the image size down is a good thing, so the fewer layers you have in the end, the better.
Now, layers are of course not the only thing that makes an image grow, but they are likely the easiest thing to change!
While you are still writing and testing a Dockerfile, though, the number of layers should be higher.
Why? Well, each layer is cached, and as long as neither its directive nor any layer before it has changed, it won’t have to be rebuilt. This makes the development of images a lot faster, and you won’t have to wait for a full rebuild each time you test.
We start with a (kind of easy) example:
FROM busybox
WORKDIR /
RUN mkdir test
WORKDIR /test
RUN touch test.txt
WORKDIR /
RUN mkdir test2
Figuring out how to squash the above sequence of RUN and WORKDIR directives might not be that hard, but it’s a deliberately simple example to show what is wrong!
Each RUN and each WORKDIR directive adds a layer, every one of them. In the example we have a total of 6 layers on top of the base image. (In this case the layers are tiny and insignificant, but with a few ADD directives and program installations the image would grow a whole lot!)
You can check an image’s size with docker images <image-name>, and the layers and the size of each layer with docker history <image-name>.
By removing all but one RUN directive and all but one WORKDIR directive, we lower the layer count to 2!
FROM busybox
RUN mkdir /test \
&& touch /test/test.txt \
&& mkdir /test2
WORKDIR /
A better example could be the following Dockerfile for a Jekyll build:
FROM ruby:alpine
RUN apk add --no-cache \
openssh-client \
ruby-dev \
build-base
ADD . /app
WORKDIR /app
RUN gem install --no-document \
jekyll \
bundler
RUN bundle install \
&& jekyll build
The above file could be used to build a Jekyll project and allow a webserver to serve the files in the /app directory. But it consists of many layers, and some of those are quite big!
So how do we squash it?
Well, first off we need to know some basics about Jekyll and Ruby. For example, running gem install for global packages, as the second RUN directive does, can be done in any directory.
If we concatenate that with the first RUN, we are a whole layer smaller!
Adding the app directory could also be done before the first RUN directive, and so could the WORKDIR directive. Moving those up to the top lets us concatenate the first and last RUN as well.
The image would look something like the following:
FROM ruby:alpine
ADD . /app
WORKDIR /app
RUN apk add --no-cache \
openssh-client \
ruby-dev \
build-base \
&& gem install --no-document \
jekyll \
bundler \
&& bundle install \
&& jekyll build
Counting the layers gives us a much more manageable layer count of 3.
The difference in image size depends a lot on the size of the Jekyll project, so I won’t give numbers here. Experiment with it yourself and let me know how it worked out! ;)
While you are writing a Dockerfile, the first of the two versions is the better way to go, or even splitting the RUN commands of the last version into multiple directives.
Each layer is built and cached on its own, so if a later one fails you won’t have to rebuild the earlier ones to fix it. That way you develop faster and can squash the image at the end.
If the jekyll build command fails in the last version, the whole RUN directive has to be rebuilt, including the package and gem installation, and in the end it takes more time than it would have to!
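As a rough sketch of that development-time variant, the squashed Dockerfile from above could be split back up like this. The split points and the ordering are my own choice and not the only sensible ones; the packages and commands are the same as in the examples above.

```dockerfile
# Development variant: one RUN per slow step, so each step is cached separately.
FROM ruby:alpine

# System packages rarely change, so this layer is almost always reused.
RUN apk add --no-cache \
        openssh-client \
        ruby-dev \
        build-base

# Global gems do not depend on the project files either.
RUN gem install --no-document \
        jekyll \
        bundler

# Everything below is rebuilt whenever a file in the project changes.
ADD . /app
WORKDIR /app
RUN bundle install
RUN jekyll build
```

If jekyll build fails here, only that last directive is rerun on the next docker build (as long as the fix does not touch the project files added by ADD); the package and gem layers come straight from the cache.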
Installing and removing
When you wish to lower the size of an image even more, you should make sure you remove all installed packages that you do not need anymore.
It’s easy to just install a bunch of stuff that you like to have in a VM or on a computer, but you will most likely not need it in a container!
At the end of the file, in the same RUN directive that installed them, remove all packages that you don’t need (files deleted in a later layer still take up space in the earlier one). Most (if not all) package managers allow you to purge or delete packages, just check how for the specific one you use.
The package manager apk in the Alpine distro has two quite nifty features when it comes to adding and removing packages.
First off: skip the local cache! (On some distros you do this by deleting the cache files by hand.) In Alpine, all you have to do is add the --no-cache argument to the apk add call.
With --no-cache, apk fetches the current package index from the repositories instead of relying on local cache files, and does not leave that index behind in the image. You always get the latest available package, if it exists, and the cache doesn’t add to the image size.
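For comparison, here is a sketch of what --no-cache saves you from writing. As far as I know, both RUN directives below leave no package index behind in the image; the two packages are simply borrowed from the other examples in this post.

```dockerfile
FROM alpine

# Manual variant: fetch the index, install, then delete the cached index
# in the same layer so it does not end up in the image.
RUN apk update \
    && apk add openssh-client \
    && rm -rf /var/cache/apk/*

# Same idea with --no-cache: the index is fetched for this call only
# and is never stored under /var/cache/apk.
RUN apk add --no-cache git
```

Note that the rm has to sit in the same RUN as the apk update; in a separate directive the cache files would still be part of the earlier layer.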
Secondly: the --virtual argument. With this argument you can add a bunch of packages to a virtual group, which makes the cleanup process a whole lot easier.
RUN apk add --no-cache --virtual .trash gnupg g++ gcc wget make python autoconf \
&& apk add --no-cache git openssh \
# do something with all the packages installed
&& apk del .trash
As seen in the example above, the first apk add installs a few packages and puts them in the virtual group .trash. At the end of the RUN directive, when all packages have been used and are not needed anymore for the image, a call to apk del is made. This call does not list all the packages, but rather just their group, deleting them all. The second apk add command installs the git and openssh packages the usual way, without a virtual group, and they are not removed, because they will be used in the container! If you add more packages to the .trash group, they will be uninstalled by apk del as well, which keeps you from forgetting to add them in both places!
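One detail worth spelling out is that the apk del only pays off when it sits in the same RUN directive as the matching apk add. The sketch below shows both variants; the group name .build-deps is just a hypothetical choice, and gcc --version stands in for whatever real build step would use the packages.

```dockerfile
FROM alpine

# Variant A (image stays large): the packages are stored in the first layer,
# and the second layer only records that they were deleted.
# RUN apk add --no-cache --virtual .build-deps gcc make
# RUN apk del .build-deps

# Variant B (image shrinks): install, use and remove within one layer,
# so the packages never end up in any stored layer.
RUN apk add --no-cache --virtual .build-deps gcc make \
    && gcc --version \
    && apk del .build-deps
```

Building each variant and comparing the output of docker history <image-name> should make the difference visible.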
Conclusion
- During development, use as many layers as you feel is useful. The more you have, the lower the risk that you will have to spend time waiting on long downloads or on compilers running for hours!
- When you are about to push your image, make sure it has as few layers as possible.
- Make sure you only keep the files and packages that you really want in the image. Whatever the container won’t need, you can throw away.
- Use small images as bases. For example, don’t use an Ubuntu or Debian image as the base for your data container!