Multi-stage Dockerfiles

This article is part 0 in a series: Docker



I have written a bit about Docker earlier, but one thing I have not yet had time to go through here is multi-stage builds. Multi-stage builds are quite new (depending on how you define new: they arrived in Docker 17.05) and quite nifty when it comes to building images that don’t take up much disk space.
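So what is a multi-stage build? Simply put, a Dockerfile using multi-stage builds contains several FROM instructions, each starting a new stage, and a later stage can copy files out of an earlier one. With the classic builder the stages are built one after another; the newer BuildKit builder can build stages that don’t depend on each other in parallel, and it skips stages that the final image doesn’t need.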
That way you can use the first stage to build your project, then define a new stage that contains only the output you need. The resulting image does not contain the tools and files that were needed for the build, only the files you explicitly copy from the earlier stage (on top of whatever its own base image ships with).
So even if you use a huge image for building, the resulting image can be very small, depending of course on which base image it derives from.
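The same pattern works outside Jekyll too. As a minimal sketch, assuming a hypothetical Go project with a go.mod and a main package at the root of the build context, the first stage below uses the full golang image while the final stage starts FROM scratch (an empty image) and receives only the compiled binary:

```dockerfile
# Stage 1: full Go toolchain, used only for compiling (hypothetical project).
FROM golang:1.22 AS builder
WORKDIR /src
# Copy the hypothetical project (go.mod + main package) into the stage.
COPY . .
# Disable cgo so the binary is statically linked and needs no libc at runtime.
RUN CGO_ENABLED=0 go build -o /usr/local/bin/app .

# Stage 2: empty base image; only the binary from stage 1 is copied in.
FROM scratch
COPY --from=builder /usr/local/bin/app /app
ENTRYPOINT ["/app"]
```

The golang image is several hundred megabytes, while the final image here is roughly the size of the binary itself.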

To illustrate this, here is a simple example: an image for a web page served by lighttpd, built from a multi-stage Dockerfile.

FROM jitesoft/jekyll as build
ADD . /app
ENV JEKYLL_ENV="production"
RUN bundle update \
    && jekyll build

FROM jitesoft/lighttpd:latest
EXPOSE 80
COPY --from=build /app/_site /var/www/html
CMD ["lighttpd", "-D", "-f", "/etc/lighttpd/lighttpd.conf"]

As you might guess, the Dockerfile above is the one I use to build this page. The first stage uses a Jekyll image and is named build (the "as build" right after the image name). Naming the stage matters because later stages need to reference it when they fetch data from it.
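Naming is a convenience rather than a strict requirement: stages can also be referenced by their zero-based index in the file. As a sketch, the line below would be equivalent to the COPY in the second stage above, since the Jekyll stage is the first FROM (index 0):

```dockerfile
# Same as COPY --from=build ...; 0 refers to the first FROM in the Dockerfile.
COPY --from=0 /app/_site /var/www/html
```

A name is more robust, since an index silently changes meaning if a stage is added above it. A named stage can also be built on its own with docker build --target build ., which is handy when debugging the Jekyll step.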
The whole project is copied in from the local filesystem (the build context). It’s a fairly standard Jekyll site, so it contains a lot of files and data that would never be used when serving the page in production. JEKYLL_ENV is set to production so that Jekyll, and any theme or plugin that checks jekyll.environment, produces a production build (minification and similar optimizations, depending on what the site and its plugins do with it). Finally, bundle update installs the gems and jekyll build builds the site.
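If you sometimes want a non-production build from the same file, one option is to turn the value into a build argument. The sketch below is a variation of the first stage, not the file used for this page: the ARG line is new, and ENV now takes its value from it.

```dockerfile
FROM jitesoft/jekyll AS build
# Build argument with a default; override it with --build-arg at build time.
ARG JEKYLL_ENV="production"
# Keep it as an environment variable in this stage, as the original ENV line did.
ENV JEKYLL_ENV="${JEKYLL_ENV}"
ADD . /app
RUN bundle update \
    && jekyll build
```

A development build would then be docker build --build-arg JEKYLL_ENV=development ., while a plain docker build . keeps the production default.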

Right after the build, a new stage begins. It’s almost like a new Dockerfile, just inside the same file. This stage derives from a lighttpd image (a small image, around 5 MB) and exposes port 80 for the web server.
After the EXPOSE line comes a COPY line, which is special compared to a normal COPY or ADD: it does not copy from the local filesystem but from the previous stage. The --from=build flag says the data comes from the stage named build (the Jekyll stage), and only /app/_site is fetched, since _site is the directory where Jekyll writes the built site.
Finally, CMD starts lighttpd in the foreground (-D) with the configuration file given by -f.
When a container runs this image, all it contains is the lighttpd base image plus the built site from the Jekyll stage; the Ruby toolchain, gems and source files stay behind in the build stage.

So why do this instead of installing everything and removing it once the build is done?
Well, you could of course do that, but every RUN instruction creates a new layer, and files deleted in a later layer still take up space in the layer that added them. With a multi-stage build you can be sure the final image doesn’t carry extra layers or files you forgot to remove, the final image is easier to handle, and you still need only one Dockerfile to build the project!
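The layer problem is easy to see for yourself. The sketch below is a throwaway demo, not part of this site’s build: it creates a 100 MB file with dd on alpine:3.19 (the base image and file size are arbitrary choices). The wasteful stage deletes the file in a separate RUN, while the final stage only copies a small checksum out of a work stage.

```dockerfile
# Variant 1: create a 100 MB file, then delete it in a separate RUN.
# The rm hides the file, but the layer that added it still ships with the image.
FROM alpine:3.19 AS wasteful
RUN dd if=/dev/zero of=/big.bin bs=1048576 count=100
RUN rm /big.bin

# Variant 2: do the heavy work in a throwaway stage...
FROM alpine:3.19 AS work
RUN dd if=/dev/zero of=/big.bin bs=1048576 count=100 \
    && sha256sum /big.bin > /result.txt

# ...and copy only the small result into the final image.
FROM alpine:3.19
COPY --from=work /result.txt /result.txt
```

Build both with docker build --target wasteful -t demo:wasteful . and docker build -t demo:multistage ., then compare them with docker image ls: the wasteful image should be roughly 100 MB larger, and docker history demo:wasteful shows the layer the file still lives in.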
