Containers 101: Containerfiles 🗒
Changelog
Date | Changes
---|---
2023-03-26 |
2023-03-29 |
2023-03-30 |
2023-04-02 |
2023-04-03 |
2023-04-09 |
Motivation
In the previous article of this series, we explored containers as a concept, how they differ from virtual machines, and how they are realized on a high level. In this article, we will see how container images are defined.
Preparation
If you want to follow along, you need a means to build container images from containerfiles. I recommend using either Docker (docs.docker.com) or Podman (podman.io). If you are using Docker, you also need to install buildx (docs.docker.com).
Terminology
You may notice that this article is named "containerfiles", and you may have heard the term "dockerfile". Right now, they are synonyms. I prefer using the term "containerfile" since it is provider-independent.
Where does the term "dockerfile" 🐋 come from?
Docker provides a container engine, but it is far from the only one. Other popular examples are runC (github.com), containerd (containerd.io), and cri-o (crio.io). Interoperability is governed by the Open Container Initiative (short: OCI, opencontainers.org). Docker was (and most probably still is) the most popular container engine in a development setting. In a cluster setting, however, the picture is less clear. Kubernetes (kubernetes.io), for example, deprecated its built-in Docker support (the so-called dockershim) in 2020 and removed it in 2022 (kubernetes.io). In any case, Docker's dominant position in the early days gave rise to the term "dockerfile".
Lifecycle of a container 🗄️
To understand what a containerfile actually is, we need to understand the lifecycle of containers. Figure 1 depicts the lifecycle of a container.
The central piece is the container image. Container images are stored in registries. The container engine pulls images from registries and executes them. A containerfile is one way to produce a container image, but there are others. Only when a container image has been created and pushed to a registry can the container engine pull the image from the registry and start a container from it.
Now let us take a look at how to create a container image from a containerfile.
The containerfile 🗎
A typical containerfile looks as follows:
# Start from Ubuntu
FROM ubuntu:22.04
LABEL \
org.opencontainers.image.authors="Marco Bungart <mail@example.com>" \
org.opencontainers.image.licenses=Apache-2.0 \
purpose="Learning containers"
# Create a new user
RUN useradd \
--uid 1000 \
--home-dir /app \
--create-home \
--shell /bin/bash \
runner
# Switch to the new user
USER 1000
# Also possible:
# USER runner
# All instructions after this line are executed relative to the WORKDIR
WORKDIR /app
# Copy the script, change owner and permissions so that it is executable
COPY \
--chown=1000:1000 \
--chmod=700 \
hello-world.sh .
# Start the script on container start
ENTRYPOINT [ "/bin/bash", "-c", "./hello-world.sh" ]
Before we look at what every single line does, we take a look at the anatomy of each line.
First off, a line starting with a # is a comment. Every instruction starts with a keyword, written in uppercase by convention (the keywords themselves are case-insensitive). A full list of instructions can be found on the corresponding Docker documentation page (docs.docker.com). The keyword is followed by the arguments of the instruction; an instruction can have multiple arguments. A backslash (\) at the end of a line continues an instruction on the next line, as in the LABEL, RUN and COPY instructions above.
The base image
Almost[1] every containerfile starts with a FROM instruction (docs.docker.com). It defines the image we base our image on[2], the so-called base image. The image reference is divided into three parts:
- the name of the image (allowed characters are [a-z0-9._/-]; uppercase letters are not allowed)
- an optional separator, either : or @
- a tag or digest, if a separator is given (a tag may contain [a-zA-Z0-9_.-], must not start with . or -, and is at most 128 characters long; the form of a digest is described below)
If the separator : is used, the part afterwards is interpreted as a tag. Image tags resemble tags in git: they reference a specific image, but unlike git tags, image tags are often moved to a different image over time. For example, the tag 22.04 of the image ubuntu might change as new security patches become available.
If the separator @ is used, the part afterwards is interpreted as a digest. A digest takes the form <algorithm>:<hex>, e.g. sha256:<hex>. Since it is a hash of the image's content (more precisely, of its manifest), an image digest always references the exact same image.
If no separator is given, the tag latest is assumed. Apart from being this default, latest is not special: it points to whatever image was last tagged latest, either explicitly or by omitting the tag. This is not necessarily the newest build of that image (we will discuss tags later).
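To make the three variants concrete, here is a small sketch. Only the first FROM is active. The alternatives are commented out, because several FROM instructions in one file would start a so-called multi-stage build, which is a different topic. <digest> is a placeholder, not a real value.

```dockerfile
# Variant 1: tag. Readable, but 22.04 may point to a newer
# (e.g. patched) image in the future.
FROM ubuntu:22.04

# Variant 2: digest. Always resolves to exactly the same image.
# <digest> is a placeholder; the real value can be looked up, e.g. in
# the RepoDigests field of `[docker|podman] image inspect ubuntu:22.04`.
# FROM ubuntu@sha256:<digest>

# Variant 3: no separator. Treated exactly like ubuntu:latest.
# FROM ubuntu
```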
Labels
Looking at our containerfile, the next instruction is LABEL (docs.docker.com). Labels are a way to add metadata to an image, e.g. the name and contact information of the maintainer. They do not change the behaviour of the image itself. Each label is a key=value pair, and we can add multiple labels at once. There are standardized labels for certain purposes (github.com), e.g. org.opencontainers.image.authors and org.opencontainers.image.licenses. We can, of course, also add custom labels. We also see that we can omit the quotation marks around a value if the value does not contain spaces.
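The following sketch adds a few more standardized keys and one custom key. The keys title, version and source are defined in the OCI image specification alongside authors and licenses. The values are only examples.

```dockerfile
FROM ubuntu:22.04

# Standardized keys from the OCI image specification; values are examples
LABEL \
  org.opencontainers.image.title=hello-world \
  org.opencontainers.image.version=1.0 \
  org.opencontainers.image.source="https://github.com/turing85/article-2023-03-26-containerfiles"

# A custom key; the quotation marks are needed because the value contains spaces
LABEL purpose="Learning containers"
```

After a build, the labels appear in the output of [docker|podman] image inspect.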
The RUN instruction
The RUN instruction (docs.docker.com) is one of the most common instructions within a containerfile. It runs a command inside the container while the image is being built (we call this the container build time). In our containerfile, we create a new user through the useradd command (die.net).
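The following sketch shows the difference between build time and container start. The file name /build-time.txt is made up for this example. The RUN instruction executes once, during the build, and its result is stored in the image. The ENTRYPOINT, in contrast, executes every time a container starts.

```dockerfile
FROM ubuntu:22.04

# Build time: executed once while the image is built.
# The created file becomes part of the image.
RUN date > /build-time.txt

# Container start: executed every time a container is started.
# Prints the (fixed) build time, followed by the current time.
ENTRYPOINT [ "/bin/bash", "-c", "cat /build-time.txt; date" ]
```

If we start several containers from this image, the first line of output stays the same every time, while the second line changes.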
We will skip the description of the USER instruction (docs.docker.com).
Setting the current work directory
This is another common instruction. The WORKDIR instruction (docs.docker.com) is simple to use, but understanding it is essential when reading a containerfile. All subsequent instructions, such as RUN, COPY and ENTRYPOINT, are executed relative to the most recent WORKDIR. If the directory does not exist yet, it is created. Notice that a WORKDIR can also be a relative path, which is then resolved relative to the previous WORKDIR.
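Here is a minimal sketch of how consecutive WORKDIR instructions combine. The directory names are arbitrary.

```dockerfile
FROM ubuntu:22.04

# Absolute path; the directory is created if it does not exist
WORKDIR /app

# Relative path: resolved against the previous WORKDIR, i.e. /app/data
WORKDIR data

# Prints /app/data during the build
RUN pwd

# Absolute path again: replaces the previous WORKDIR entirely
WORKDIR /tmp
```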
Getting files into the container
There are several ways to get files into a container. One of the most common ones is the COPY instruction (docs.docker.com). It is one of the more complex instructions. We can change the owner and the permissions of the copied files through --chown=… and --chmod=…, respectively. We can define multiple sources to copy. Each source can be a file or a directory. If the source is a directory, then the content of the directory is copied, not the directory itself.
On top of this, we can use wildcards (? for a single character, * for any number of characters).
For the target, we can either define a (path to a) file name or a directory. A relative target, like the . in our containerfile, is resolved relative to the WORKDIR. If we define more than one source (directly or through a wildcard), the target must be a directory and must end with a /.
The full semantics can only be explained when we take a look at how to build images, which we will do shortly, so keep on reading.
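The following sketch summarizes these rules. Only hello-world.sh exists in the example project. The file other.sh and the directory config/ are hypothetical and would have to exist in the build context for this build to succeed.

```dockerfile
FROM ubuntu:22.04
WORKDIR /app

# One source, target is a file name: stored as /app/run.sh
# (a relative target is resolved against the WORKDIR)
COPY hello-world.sh run.sh

# Several sources: the target must be a directory ending in "/"
COPY hello-world.sh other.sh /app/bin/

# Wildcard: copies every .sh file at the top level of the build context
COPY *.sh /app/scripts/

# Directory source: the content of config/ ends up in /app/config/,
# not in a nested /app/config/config/
COPY config/ /app/config/
```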
The ENTRYPOINT, or what to do when the container starts
It is not strictly necessary to define what a container should do when it starts. In fact, the base image we are using (docker.io/ubuntu:22.04) does not start any particular service; it merely starts a shell (/bin/bash). Most containers, however, fulfill a specific purpose. They might provide a database or a webserver. Thus, it is a good idea to start this service automatically when the container starts. This is normally achieved through an ENTRYPOINT (docs.docker.com), a CMD (docs.docker.com), or both. The specifics of ENTRYPOINT and CMD, as well as how the two interact, will be discussed in a separate article. For now, we just look at a single ENTRYPOINT.
The first thing we see is that the ENTRYPOINT is written as an array (the so-called exec form). The second thing we see is that we do not simply call the script we want to start, but run it through /bin/bash -c …. This means bash executes the first argument after -c as a command, and any further arguments are passed to that command as positional parameters, starting with $0[3]. The third thing we notice is that each argument is a separate array entry.
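The following sketch shows what -c does with the remaining array entries. The arguments hello and world are made up for this example. Note that bash assigns the first argument after the command string to $0, not $1.

```dockerfile
FROM ubuntu:22.04

# Exec form: every argument is a separate array entry.
# "-c" makes bash execute the next entry as a command string;
# the entries after it become the positional parameters $0, $1, ...
# Running a container from this image prints: greeting=hello name=world
ENTRYPOINT [ "/bin/bash", "-c", "echo greeting=$0 name=$1", "hello", "world" ]
```

For comparison, in the shell form (ENTRYPOINT ./hello-world.sh, without brackets), the engine wraps the command in /bin/sh -c itself.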
Building images 👷
Now that we have taken a look at the containerfile, we will build an image from it. The sources can be found on GitHub (github.com). We clone the project and switch into its root directory:
git clone https://github.com/turing85/article-2023-03-26-containerfiles.git
cd article-2023-03-26-containerfiles
To build the image, we execute:
docker build --file Containerfile .
podman build --file Containerfile .
Before we take a look at the output of the command, we will analyze the command itself.
The --file parameter (short form: -f) tells the engine which containerfile to use. The . at the end sets the context directory. The context directory has a major impact on the COPY instruction: all source files and paths of COPY instructions are resolved relative to the context directory. Furthermore, the build process cannot escape from the context directory. For example, we cannot use .. (the parent directory) as a source of a COPY instruction.
Now, let us look at the output produced by the command.
I will show the output of podman build …; if you are using docker build … instead, your output may vary slightly.
$ podman build --file Containerfile .
STEP 1/7: FROM docker.io/ubuntu:22.04
STEP 2/7: LABEL org.opencontainers.image.authors="Marco Bungart <mail@example.com>" org.opencontainers.image.licenses=Apache-2.0 purpose="Learning containers"
--> 94c21b5a2c8
STEP 3/7: RUN useradd --uid 1000 --home-dir /app --create-home --shell /bin/bash runner
--> 6d666a5c558
STEP 4/7: USER 1000
--> 983bc2adb88
STEP 5/7: WORKDIR /app
--> 2adac29e00d
STEP 6/7: COPY --chown=1000:1000 --chmod=700 hello-world.sh .
--> 49b68cab2c8
STEP 7/7: ENTRYPOINT [ "/bin/bash", "-c", "./hello-world.sh" ]
COMMIT
--> e1526e6c6b6
e1526e6c6b6ea123e535b3a5145736c5eda542bc7b164834fa60e809be10509e
Layered Goodness
We see that seven steps are executed, and each step corresponds to one instruction of our containerfile. Furthermore, after each step (except the first one), a (truncated) SHA value is shown. Those are the layers of our image. Strictly speaking, only instructions that change the file system, such as RUN and COPY, add content; the others only change the image's metadata. An image consists of file-system modifications that are stacked on top of each other to form the final image. Those modifications are organized in said layers. A layer is similar to a commit in git: it is based on a previous layer and applies its changes on top of that layer. This has an important implication. Suppose we copy a large file into the container in one step and delete it in the next step (e.g. through RUN rm large-file). The resulting image will be unexpectedly large. This is due to the layering in a container image: the file is still there, but inaccessible. Just like we cannot really delete a file from git (once it is committed, it will always be in the commit history), the large file still counts towards the final image size.
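The effect can be reproduced with a sketch like the following. The path /large-file and its 100 MiB size are arbitrary. Listing the layers with [docker|podman] image history should show a layer of roughly 100 MB for the first RUN, even though the file is gone from the final file system.

```dockerfile
FROM ubuntu:22.04

# Layer 1: adds a 100 MiB file
RUN head -c 100M /dev/zero > /large-file

# Layer 2: hides the file, but layer 1 still contains it,
# so the image does not get smaller
RUN rm /large-file

# Better: create, use and delete the file within a single RUN,
# so that it never ends up in any layer:
# RUN head -c 100M /dev/zero > /large-file && rm /large-file
```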
You might ask why containers use this layering technique. The answer is performance, in particular transfer speed. Take a look at Figure 2. If we first pulled Image 1, we would pull the base image, as well as layers a, b, c, x1, y1 and z1. If we then pulled Image 2, we would only need to pull layer x2, since the base image and the common layers a, b and c were already pulled previously.
How to control the layers
We did not control the creation of layers; they were created automatically for us. In some cases, it might be desirable to have more fine-grained control over how layers are generated. There are tools for this purpose, like Buildah (buildah.io) or Bazel’s container image rules (github.com).
Where is my image?
To get some details about our image, we can run:
docker image ls
podman image ls
to get the following (or similar) output:
$ podman image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
...
<none> <none> e1526e6c6b6e 47 minutes ago 80.7 MB
...
We see that the IMAGE ID is consistent with the last line of the output we got when building the image. But neither the REPOSITORY nor the TAG is set. We can rectify this by re-building the image, but this time with a tag:
docker build --file Containerfile --tag hello-world .
podman build --file Containerfile --tag hello-world .
When we now list the images, we see that the REPOSITORY is set:
podman image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
...
localhost/hello-world latest e1526e6c6b6e 51 minutes ago 80.7 MB
...
We also see that the tag is set to latest, as we have discussed before. We can set the tag explicitly to a value of our choice by appending a colon (:) and the desired tag to the argument of --tag, e.g.:
docker build --file Containerfile --tag hello-world:1.0 .
podman build --file Containerfile --tag hello-world:1.0 .
which will result in
podman image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
...
localhost/hello-world latest e1526e6c6b6e 56 minutes ago 80.7 MB
localhost/hello-world 1.0 e1526e6c6b6e 56 minutes ago 80.7 MB
...
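We see that podman prefixes the REPOSITORY with localhost/ (docker does not add this prefix). Furthermore, the tag latest is still present. This is because the previous build had already tagged the image as latest. Since the rebuild produced the same image ID, the image simply gained the additional tag 1.0. If we remove the image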
docker image rm --force e1526e6c6b6e
podman image rm --force e1526e6c6b6e
rebuild, and list the images again, we get the following (or similar) output:
REPOSITORY TAG IMAGE ID CREATED SIZE
...
localhost/hello-world 1.0 9886f26ecc33 About a minute ago 80.7 MB
...
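We can also see that the image ID changed, although we did not change the containerfile or any files involved in the build. One reason is that an image records timestamps, such as its creation time, so a rebuild from scratch produces a different image.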
Re-tagging
Instead of re-building the image over and over again to change or add tags, we can re-tag the image. For example, let us assume we want to add the latest tag to the hello-world image. We can achieve this by executing
docker image tag hello-world:1.0 hello-world:latest
podman image tag hello-world:1.0 hello-world:latest
And, in fact, if we now list the images, we see that the latest tag has been added:
podman image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
...
localhost/hello-world latest 9886f26ecc33 7 minutes ago 80.7 MB
localhost/hello-world 1.0 9886f26ecc33 7 minutes ago 80.7 MB
...
If we want to remove a tag from an image, we can use
docker image rm hello-world:latest
podman image rm hello-world:latest
And indeed, the latest tag is gone:
podman image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
...
localhost/hello-world 1.0 9886f26ecc33 12 minutes ago 80.7 MB
...
Podman also offers a podman untag … command (docker has no equivalent). If no tag is given after the image, it does not remove just one tag, but all names (repository and tag) from the image.
Tagging for pushing
So far, we have played around with the image locally. In fact, we used the local image storage that comes with every container engine. Remembering Figure 1, we see that images (almost[4]) never stand on their own; they are stored in registries. To transfer an image to an external registry, for example Docker Hub (hub.docker.com), we re-tag the image. We prefix its name with the registry host (docker.io for Docker Hub) and, usually, a namespace such as our user name, e.g. docker.io/<user>/hello-world:1.0. For more information, please see the corresponding documentation at docs.docker.com. Finally, to push an image, we can run
docker push <image-name-with-tag>
podman push <image-name-with-tag>
To push images, we have to authenticate against the registry first. We can do so by running [docker|podman] login <registry-host>.
Conclusion 💡
In this article, we discussed the lifecycle of a container. We talked about how a containerfile is structured and how containerfile instructions are structured, and took a detailed look at some common containerfile instructions. In the process, we learned what container build time is. We built an image and explored the concept of layers, their implications and their benefits. For image management, we took a look at tagging images with repositories and tags. We learned that we can re-tag images without rebuilding them. This can be particularly useful to add tag(s) to an existing image or to prepare images for a push to remote registries. Likewise, we saw how we can remove tags from images. We closed by briefly discussing how images can be pushed to remote registries.
[1] … FROM in a later tutorial.
[2] … FROM scratch, but this is an advanced topic we will not discuss here.
[3] "-c" is not strictly necessary in our example, but commonly used. Thus, I decided to include it here.
[4] … tar.gz format (docs.docker.com) and later imported (docs.docker.com). This is, however, seldom used explicitly.