A Docker image consists of a set of layers. Each instruction in the Dockerfile produces a layer, and each layer adds a bit of extra size to the resulting image.
Keeping the image size down is a good thing, so the fewer layers you have in the end, the better.
Now, layers are of course not the only thing that makes an image grow, but they are likely the easiest thing to change!
While you write a Dockerfile and test it, though, the number of layers should be higher.
Why? Well, each layer is cached, and if neither it nor any of the layers before it has changed, it won’t have to be rebuilt. This makes the development of images a lot faster, since you won’t have to wait for a full rebuild each time you test.
We start with a (kind of easy) example:
WORKDIR /
RUN mkdir test
WORKDIR /test
RUN touch test.txt
WORKDIR /
RUN mkdir test2
The above sequence of RUN and WORKDIR commands might not be hard to squash together, but it’s a deliberately simple example to show what is wrong!
Each RUN and each WORKDIR directive adds a layer, every one of them. In the example we have a total of 6 layers. (In this case the layers are tiny and insignificant, but with a few ADD directives and program installations the image would grow a whole lot!)
You can check an image’s size with docker images <image-name>, and the layers and the size of each layer with docker history <image-name>.
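To see per-layer sizes for yourself, here is a minimal sketch of a throwaway Dockerfile. The alpine:3.20 tag, the layer-demo image name and the file name are assumptions picked for illustration; the commands to build and inspect it are in the comments.

```dockerfile
# Hypothetical demo; the alpine:3.20 tag and /big.bin are assumptions.
# Build and inspect with:
#   docker build -t layer-demo .
#   docker images layer-demo
#   docker history layer-demo
FROM alpine:3.20

# Writes a 10 MiB file; docker history lists this layer at roughly 10 MB.
RUN dd if=/dev/zero of=/big.bin bs=1M count=10

# Deletes the file in a new layer. The layer above keeps its size,
# so the image as a whole does not shrink.
RUN rm /big.bin
```

The history output should list one entry per directive, which makes it easy to spot which step is responsible for most of the size.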
By removing all but one RUN directive and all but one WORKDIR directive, we lower the layer count to 2 layers!
WORKDIR /
RUN mkdir /test \
 && touch /test/test.txt \
 && mkdir /test2
A better example could be the following jekyll build script:
RUN apk add --no-cache \
ADD . /app
RUN gem install --no-document \
RUN bundle install \
&& jekyll build \
The above file could be used to build a jekyll project and allow a webserver to serve the files in the /app directory. But it consists of many layers, and some of those are quite big!
So how do we squash it?
Well, first off we need to know some basics about Jekyll and Ruby. For example, running gem install for global packages, as the second RUN directive does, can be done in any directory.
If we concatenate that with the first RUN, we save a whole layer!
Adding the app directory could also be done before the first RUN directive, and so could the WORKDIR command. Moving those up to the top lets us concatenate the first and last RUN directives.
The image would look something like the following:
ADD . /app
RUN apk add --no-cache \
&& gem install --no-document \
&& bundle install \
&& jekyll build \
Counting the layers will give us a much more manageable layer count of 3.
The difference in image size depends heavily on the size of the Jekyll project, so I won’t add numbers here. Experiment with it yourself and let me know how it worked out! ;)
While you are writing a Dockerfile, the first of the two versions is the better way to go, or you can simply split the RUN command of the last version into several.
Each layer is built and cached, so if a later one fails you won’t have to rebuild the earlier ones to fix it. That way you develop faster and can squash the image at the end.
If the jekyll build command fails in the last version, the whole RUN directive has to be rerun, which in the end takes more time than it has to!
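As a rough sketch of such a development-time layout, the long RUN directive could be split so that each slow step gets its own cached layer. The base image tag and the apk and gem package names below are assumptions for illustration, not the lists from the build script above, and the sketch assumes a Jekyll project with a Gemfile in the build context.

```dockerfile
# Development layout: one step per layer.
# NOTE: base tag, apk packages and gems are illustrative assumptions.
FROM alpine:3.20

# Slow, rarely changing steps first, so their layers stay cached.
RUN apk add --no-cache ruby ruby-dev build-base
RUN gem install --no-document bundler jekyll

# Project files change often, so they are added late.
ADD . /app
WORKDIR /app

# If the jekyll build command itself needs fixing, only the last layer
# is rerun. Editing project files invalidates the cache from ADD onward,
# but the apk and gem layers above are still reused.
RUN bundle install
RUN jekyll build
```

Once everything builds, the RUN directives can be concatenated again for the image you actually push.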
Installing and removing
When you wish to lower the size of an image even more, you should make sure you remove all installed packages that you no longer need.
It’s easy to just install a bunch of stuff that you like to have in a VM or on a computer, but you will most likely not need it in a container!
At the end of each file, in the last RUN, remove all packages that you don’t need. Once everything is squashed into a single RUN, the removal happens in the same layer as the installation, which matters: files deleted in a later layer still take up space in the earlier one. Most (if not all) package managers allow you to purge or delete packages, just check how for the specific one you use.
The package manager APK in the Alpine distro has two quite nifty features when it comes to adding and removing packages.
First off: skip the local cache! (On some distros you can do this by deleting the cache files.) In Alpine, all you have to do is add the --no-cache argument to the apk add call.
With --no-cache, apk fetches the package index from the repositories instead of reading local cache files, and it does not leave an index cache behind in the image. You therefore always get the latest available version of the package you want, if it exists.
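For comparison, a minimal sketch of both approaches side by side. The alpine:3.20 tag is an assumption, and the two RUN directives are kept separate only so they can be compared in docker history.

```dockerfile
# Tag is an assumption; any recent Alpine tag should behave the same.
FROM alpine:3.20

# Without --no-cache: refresh the index, install, then delete the
# index files by hand so they do not end up in the layer.
RUN apk update \
 && apk add git \
 && rm -rf /var/cache/apk/*

# With --no-cache: apk fetches a fresh index and does not keep it,
# so no manual cleanup is needed.
RUN apk add --no-cache openssh
```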
Second: the --virtual argument. With this argument you can add a bunch of packages to a virtual group, which makes the cleanup process a whole lot easier.
RUN apk add --no-cache --virtual .trash gnupg g++ gcc wget make python autoconf \
&& apk add --no-cache git openssh \
# do something with all the packages installed
&& apk del .trash
As seen in the example above, the first apk add installs a few packages and puts them in the virtual group .trash. At the end of the file, when all those packages have been used and are no longer needed in the image, a call to apk del is made. This call does not list all the packages, just their group, and deletes them all. The second apk add command installs the git and openssh packages the usual way, without a virtual group, and they are not removed, because they will be used in the container! If you add more packages to the .trash group, they will be uninstalled by apk del as well, which saves you from forgetting to list them in both places!
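To make the "do something" step concrete, here is a small sketch that compiles a C program and then drops the compiler again. The alpine:3.20 tag, the .build-deps group name and the hello.c file are assumptions made up for this illustration.

```dockerfile
# Sketch only: tag, group name and hello.c are assumptions.
FROM alpine:3.20

# hello.c is a hypothetical source file in the build context.
ADD hello.c /src/hello.c

# Install the compiler into a virtual group, use it, then remove the
# whole group in the same RUN so the tools never end up in a layer.
RUN apk add --no-cache --virtual .build-deps gcc musl-dev \
 && gcc -o /usr/local/bin/hello /src/hello.c \
 && apk del .build-deps
```

The compiled binary stays in the image, while gcc, musl-dev and everything they pulled in are gone by the time the layer is committed.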
- During development, use as many layers as you find useful. The more layers you have, the lower the risk that you will have to spend time waiting on long downloads or on compilers running for hours!
- When you are about to push your image, make sure it has as few layers as possible.
- Make sure you only keep the files and packages that you really want in the image. Whatever the container won’t need, you can throw away.
- Use small images as bases. For example, don’t use an Ubuntu or Debian image as the base for your data container!
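As an illustration of the last point, a data container needs next to nothing from its base image. This is only a sketch: the alpine:3.20 tag and the /data path are assumptions.

```dockerfile
# Hypothetical data container on a small base image.
# The tag and the /data path are assumptions for illustration.
FROM alpine:3.20

# The volume is the only thing this container is there to provide.
VOLUME /data

# Nothing needs to keep running; exit right away.
CMD ["true"]
```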