Containers 101: Containerfiles 🗒

Author:

Marco Bungart

Created:

2023-03-26

Last modified:

2023-04-09

Keywords:

containers

Changelog

Date	Changes
2023-03-26	Added a sentence in section Layered Goodness to draw a comparison between image layers and git commits; replaced Table 1, which represented images in a containerfile, with a PlantUML diagram (Figure 2)
2023-03-29	Fixed typo
2023-03-30	Changed git clone url to HTTPS protocol; added Docker’s `buildx` as prerequisite
2023-04-02	Flipped arrows in Figure 2 to indicate that a layer does not determine its successor, but its predecessor (just like commits, where a commit defines its parent-commit)
2023-04-03	Reworded a sentence; prefixed all images from dockerhub with `docker.io/`
2023-04-09	Fixed some formatting; replaced all `[source, docker]` with `[source, dockerfile]`

Motivation

In the previous article of this series, we explored containers as a concept, how they differ from virtual machines and how they are realized on a high level. In this article we will see how containers are defined.

Preparation

If you want to follow along, you need a means to build container images from containerfiles. I recommend using either Docker (docs.docker.com) or Podman (podman.io). If you are using Docker, you also need to install buildx (docs.docker.com).

Terminology

You may notice that this article is named "containerfiles", and you may have heard the term "dockerfile". Right now, they are synonyms. I prefer using the term "containerfile" since it is provider-independent.

Where does the term "dockerfile" 🐋 come from?

As mentioned above, Docker provides a container engine, but it is far from the only one. Other popular examples are runC (github.com), containerd (containerd.io), and cri-o (crio.io). Interoperability is governed by the Open Container Initiative (short: OCI, opencontainers.org). Docker was (and most probably still is) the most popular container engine in a development setting. For a cluster setting, however, it is not so clear. Kubernetes (kubernetes.io), for example, announced the deprecation of Docker as a container runtime back in 2020 (kubernetes.io) and removed the corresponding dockershim component in version 1.24. In any case, Docker's dominant position in the early days gave rise to the term "dockerfile".

Lifecycle of a container 🗄️

To understand what a containerfile actually is, we need to understand the lifecycle of containers. Figure 1 depicts the lifecycle of a container.

Figure 1. Lifecycle of a container

The central piece is the container image. The container image is stored in a registry. The container engine pulls images from registries and executes them. A containerfile is one way to produce a container image; there are other ways. Only when a container image has been created and pushed to a registry can the container engine pull the image from the registry and start a container from it.

Now let us take a look at how to create a container image from a containerfile.

The containerfile 🗎

A typical containerfile looks as follows:

Listing 1. A typical containerfile

# Start from Ubuntu
FROM ubuntu:22.04
LABEL \
  org.opencontainers.image.authors="Marco Bungart <mail@example.com>" \
  org.opencontainers.image.licenses=Apache-2.0 \
  purpose="Learning containers"

# Create a new user
RUN useradd \
  --uid 1000 \
  --home-dir /app \
  --create-home \
  --shell /bin/bash \
  runner

# Switch to the new user
USER 1000
# Also possible:
# USER runner

# All commands after this line will be executed relative to the WORKDIR
WORKDIR /app

# Copy the script, change owner and permissions so that it is executable
COPY \
  --chown=1000:1000 \
  --chmod=700 \
  hello-world.sh .

# Start the script on container start
ENTRYPOINT [ "/bin/bash", "-c", "./hello-world.sh" ]

Before we look at what every single line does, we take a look at the anatomy of each line.

First off, a line starting with a # is a comment. Every other line starts with an instruction (or continues the previous line, if that line ends with a \). Instructions are written in uppercase by convention. A full list of instructions can be found at the corresponding docker documentation page (docs.docker.com). The instruction is followed by the arguments for the instruction. Instructions can have multiple arguments.

The base image

Almost^[1] every containerfile starts with a FROM (docs.docker.com). This is the image we base our own image on^[2]. The image reference is divided into three parts:

the name of the image (lowercase letters, digits and the separators ., _ and -, with / separating path components, e.g. docker.io/ubuntu)
an optional separator, either : or @
a tag or digest, if a separator is given (a tag may contain the characters [a-zA-Z0-9_.-])

If the separator : is used, the part afterwards will be interpreted as a tag. Image tags are similar to tags in git: they reference a specific image at one point in time, but they might reference different images over time. For example, the tag 22.04 of image ubuntu might change as new security patches become available.

If the separator @ is used, the part afterwards is interpreted as a digest. A digest takes the form <algorithm>:<hash>, typically sha256:<hash>. An image digest will always reference the exact same image.

If no separator is given, then the special tag latest is assumed. For every image, the tag latest points to the latest build of that image that has no explicit tag (we will discuss tags later).
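
The three ways of referencing a base image can be sketched as follows. This is only an illustration and not part of the article's repository; the digest is deliberately left as a placeholder, since the real value depends on the image that was actually pulled.

```dockerfile
# Tag: may point to a different image after the next security update
FROM docker.io/ubuntu:22.04

# No separator: the engine assumes the tag "latest"
# FROM docker.io/ubuntu

# Digest: always the exact same image.
# <hash> is a placeholder, replace it with a real value before building.
# FROM docker.io/ubuntu@sha256:<hash>
```

One way to look up the digests of locally available images should be the --digests option of [docker|podman] image ls.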

Labels

Looking at our containerfile, the next instruction is LABEL (docs.docker.com). Labels are a way to add meta information (e.g. the name and contact information of the maintainer). They do not change the behaviour of the actual image. Each label is a key=value pair, and we can add multiple labels at once. There are standardized labels for certain purposes (github.com), e.g. org.opencontainers.image.authors and org.opencontainers.image.licenses. We can, of course, add custom labels. We also see that we can omit the quotation marks around the value if the value does not include a space.

The `RUN` instruction

The RUN instruction (docs.docker.com) is one of the most common instructions we see within a containerfile. It allows us to run a command while the image is being built (we call this the container build time). In our containerfile, we create a new user through the useradd command (die.net).

We will skip the description of the USER instruction (docs.docker.com).

Setting the current work directory

This is another common instruction. While the WORKDIR instruction (docs.docker.com) is simple in its use, understanding it is essential when reading a containerfile. All subsequent commands are executed relative to the last WORKDIR. Notice that a WORKDIR can also be set to a relative path, which is then resolved relative to the previous WORKDIR.
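
A minimal sketch of a relative WORKDIR; the directory names are chosen for illustration only:

```dockerfile
FROM docker.io/ubuntu:22.04

WORKDIR /app
# Relative path: resolved against the previous WORKDIR, so this is /app/data
WORKDIR data
# Prints /app/data during the build
RUN pwd
```

If the directory does not exist yet, WORKDIR creates it.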

Getting files into the container

There are several ways to get files into a container. One of the most common ones is the COPY instruction (docs.docker.com). It is one of the more complex instructions. We can change the owner and the file permissions through --chown=… and --chmod=…. We can define multiple sources to copy. Each source can be a file or a directory. If the source is a directory, then the content of the directory will be copied, not the directory itself.

On top of this, we can use wildcards (? for a single character, * for any number of characters).

For the target, we can either define a (path to a) file name, or a directory. If we define more than one source, the target must be a directory.
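
The source and target rules can be sketched like this. The target names start.sh and scripts/ as well as the directory config/ are made up for this illustration; only hello-world.sh is taken from the article's project.

```dockerfile
FROM docker.io/ubuntu:22.04
WORKDIR /app

# One source, target is a file name: stored as /app/start.sh
COPY hello-world.sh start.sh

# Several sources (here through a wildcard): the target must be a
# directory, written with a trailing slash. Result: /app/scripts/*.sh
COPY *.sh scripts/

# Directory as source: the content of config/ is copied, not config/ itself.
# config/ is hypothetical, so the line is commented out.
# COPY config/ /app/config/
```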

The full semantics can only be explained when we take a look at how to build images, which we will do shortly, so keep on reading.

The `ENTRYPOINT`, or what to do when the container starts

It is not strictly necessary to define what a container should do when it starts. In fact, the base image we are using (docker.io/ubuntu:22.04) does not define an ENTRYPOINT; it only sets a default CMD that starts a shell. Most containers, however, fulfill a specific purpose. They might provide a database, or a webserver. Thus, it is a good idea to start this service automatically when the container starts. This is normally achieved through either an ENTRYPOINT (docs.docker.com), a CMD (docs.docker.com) or both. The specifics of ENTRYPOINT, CMD, as well as the interaction of both will be discussed in a separate article. For now, we just look at a single ENTRYPOINT.

The first thing we see is that the ENTRYPOINT is an array. The second thing we see is that we do not simply call the script we want to start, but run it through /bin/bash -c …, i.e. the first argument after -c is read as the command to execute, and all further arguments are passed to that command as parameters^[3]. The third thing we notice is that each argument is a separate array entry.
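
To make the role of -c and of the additional array entries more tangible, here is a small sketch. The echo command and the values "one" and "two" are made up for this illustration and do not appear in the article's project.

```dockerfile
FROM docker.io/ubuntu:22.04

# Exec form: a JSON array with one entry per argument.
# bash reads the string after -c as the command; "one" becomes $0 and
# "two" becomes $1 inside that command, so the container prints: one two
ENTRYPOINT [ "/bin/bash", "-c", "echo \"$0 $1\"", "one", "two" ]

# Without -c, bash treats its first argument as a script file to run:
# ENTRYPOINT [ "/bin/bash", "./hello-world.sh" ]
```

Only the last ENTRYPOINT of a containerfile takes effect, which is why the alternative is commented out.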

Building images 👷

Now that we have taken a look at the containerfile, we will build an image from it. We can find the sources here (github.com). We clone the project and switch into its root directory:

Listing 2. Cloning the project

git clone https://github.com/turing85/article-2023-03-26-containerfiles.git
cd article-2023-03-26-containerfiles

To build the image, we execute:

docker build --file Containerfile .

podman build --file Containerfile .

Before we take a look at the output of the command, we will analyze the command itself.

The --file parameter (the short form is -f) tells the engine which containerfile to use. The . at the end sets the context directory. The context directory has a major impact on the COPY instruction. All src-files and -paths of COPY instructions are always resolved relative to the context directory. Furthermore, the build process cannot escape from the context directory. As an example, we cannot use .. (the parent directory) as src of a COPY instruction.
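
The effect of the context directory on COPY can be sketched as follows, assuming the image is built from the project root with the build command shown above:

```dockerfile
FROM docker.io/ubuntu:22.04
WORKDIR /app

# hello-world.sh is looked up in the context directory (the "." of the
# build command), no matter where the containerfile itself is stored.
COPY hello-world.sh .

# Does not work: the parent of the context directory is out of reach.
# COPY ../hello-world.sh .
```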

Now, let us look at the output produced by the command.

I will show the output of podman build …. If you are using docker build … instead, your output may vary slightly.

Listing 3. Output of podman build --file Containerfile .

$ podman build --file Containerfile .
STEP 1/7: FROM docker.io/ubuntu:22.04
STEP 2/7: LABEL   org.opencontainers.image.authors="Marco Bungart <mail@example.com>"   org.opencontainers.image.licenses=Apache-2.0   purpose="Learning containers"
--> 94c21b5a2c8
STEP 3/7: RUN useradd   --uid 1000   --home-dir /app   --create-home   --shell /bin/bash   runner
--> 6d666a5c558
STEP 4/7: USER 1000
--> 983bc2adb88
STEP 5/7: WORKDIR /app
--> 2adac29e00d
STEP 6/7: COPY   --chown=1000:1000   --chmod=700   hello-world.sh .
--> 49b68cab2c8
STEP 7/7: ENTRYPOINT [ "/bin/bash", "-c", "./hello-world.sh" ]
COMMIT
--> e1526e6c6b6
e1526e6c6b6ea123e535b3a5145736c5eda542bc7b164834fa60e809be10509e

Layered Goodness

We see that seven steps are executed, and each step corresponds to one instruction of our containerfile. Furthermore, after each step (except the first one), a (truncated) SHA-value is shown. Those are the layers of our image. An image consists of modifications of the file system that are stacked on top of each other to form the final result, i.e. the image. Those modifications are organized in said layers. A layer is similar to a commit in git: it is based on a previous layer and applies its changes to that layer. This has an important implication: If we were to copy a large file into the container in one step, and delete this file from the container in the next step (through, e.g. RUN rm large-file), then the size of the image would be unexpectedly large. This is due to the layering in a container image: the file is still there, but inaccessible. Just like we cannot really delete a file from git (once it is checked in, it will always be in the commit history), the large file still impacts the final image size.
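
The following sketch illustrates this implication. Instead of copying a large file, it generates one with dd so that the example does not depend on any file in the build context; file name and size are arbitrary.

```dockerfile
FROM docker.io/ubuntu:22.04

# Variant A: two instructions, two layers. The 100 MiB file stays in the
# first layer; the second layer only marks it as deleted.
RUN dd if=/dev/zero of=/large-file bs=1M count=100
RUN rm /large-file

# Variant B: one instruction, one layer. The file is created and removed
# before the layer is committed, so it never becomes part of the image.
# RUN dd if=/dev/zero of=/large-file bs=1M count=100 && rm /large-file
```

Building once with variant A and once with variant B (and variant A commented out) should show a size difference of roughly 100 MB in the image list. The size of each individual layer can be inspected with [docker|podman] image history <image>.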

Figure 2. Two Images with common layers

You might ask why containers use this layering technique. The answer is: performance, in particular transfer speed. Take a look at Figure 2. If we were to first pull Image 1, we would pull the base image, as well as layers a, b, c, x¹, y¹ and z¹. If we then were to pull Image 2, we would only need to pull layer x² since the base and (common) layers a, b and c were already pulled previously.

How to control the layers

We did not control the creation of layers; they were created automatically for us. In some cases, it might be desirable to have more fine-grained control over how layers are generated. There are tools that provide this control, like Buildah (buildah.io) or Bazel’s container image rules (github.com).

Where is my image?

To get some details about our image, we can run:

docker image ls

podman image ls

to get the following (or similar) output:

$ podman image ls
REPOSITORY                                    TAG                 IMAGE ID      CREATED         SIZE
...
<none>                                        <none>              e1526e6c6b6e  47 minutes ago  80.7 MB
...

We see that the IMAGE ID is consistent with the last line of the output we got when building the image. But neither the REPOSITORY nor the TAG is set. We can rectify this by re-building the image, but this time with a tag:

docker build --file Containerfile --tag hello-world .

podman build --file Containerfile --tag hello-world .

When we now list the images, we see that the REPOSITORY is set:

podman image ls
REPOSITORY                                    TAG                 IMAGE ID      CREATED         SIZE
...
localhost/hello-world                         latest              e1526e6c6b6e  51 minutes ago  80.7 MB
...

We also see that the tag is set to latest, as we have discussed before. We can set the tag explicitly to a value of our choice by appending a colon (:) and the desired tag to the value of --tag, e.g.:

docker build --file Containerfile --tag hello-world:1.0 .

podman build --file Containerfile --tag hello-world:1.0 .

which will result in

podman image ls
REPOSITORY                                    TAG                 IMAGE ID      CREATED         SIZE
...
localhost/hello-world                         latest              e1526e6c6b6e  56 minutes ago  80.7 MB
localhost/hello-world                         1.0                 e1526e6c6b6e  56 minutes ago  80.7 MB
...

We see that the prefix localhost/ was added to the REPOSITORY. Furthermore, the tag latest is still present. This is due to the fact that it was present before. If we remove the image

docker image rm --force e1526e6c6b6e

podman image rm --force e1526e6c6b6e

rebuild, and list the images again, we get the following (or similar) output:

REPOSITORY                                    TAG                 IMAGE ID      CREATED             SIZE
...
localhost/hello-world                         1.0                 9886f26ecc33  About a minute ago  80.7 MB
...

We can also see that the image ID changed, although we did not change the containerfile or any files involved in the build.

Re-tagging

Instead of re-building the image over and over again to change or add tags, we can re-tag the image. For example, let us assume we want to add the latest tag to the hello-world image. We can achieve this by executing

docker image tag hello-world:1.0 hello-world:latest

podman image tag hello-world:1.0 hello-world:latest

And, in fact, if we now list the image, we see that the latest tag has been added:

podman image ls
REPOSITORY                                    TAG                 IMAGE ID      CREATED        SIZE
...
localhost/hello-world                         latest              9886f26ecc33  7 minutes ago  80.7 MB
localhost/hello-world                         1.0                 9886f26ecc33  7 minutes ago  80.7 MB
...

If we want to remove a tag from an image, we can use

docker image rm hello-world:latest

podman image rm hello-world:latest

And, indeed, the latest tag is gone:

podman image ls
REPOSITORY                                    TAG                 IMAGE ID      CREATED         SIZE
...
localhost/hello-world                         1.0                 9886f26ecc33  12 minutes ago  80.7 MB
...

Podman additionally provides the podman untag … command. Called with only an image, it removes not just a single tag, but all tags that point to this image ID. Docker has no corresponding command.

Tagging for pushing

So far, we have played around with the image locally. In fact, we used the local image storage that comes with every container engine. Remembering Figure 1, we see that images (almost^[4]) never stand on their own; they are stored in registries. To transfer an image to an external registry, for example hub.docker.com, we just re-tag the image, prefixing its name with the registry host. For more information, please see the corresponding documentation at docs.docker.com. Finally, to push an image, we can run

docker push <image-name-with-tag>

podman push <image-name-with-tag>

To push images, we have to authenticate against the registry first. We can do so by running [docker|podman] login <registry-host>.

Conclusion 💡

In this article, we discussed the lifecycle of a container. We talked about how a containerfile is structured, how containerfile instructions are structured, and took a detailed look at some common containerfile instructions. In the process, we learned what container build time is. We built an image and explored the concept of layers, its implications and benefits. For image management, we took a look at tagging images with repositories and tags. We learned that we can re-tag images without rebuilding them. This can be particularly useful to add tag(s) to an existing image or to prepare images for a push to remote registries. Likewise, we saw how we can remove tags from images. We closed by briefly discussing how images can be pushed to remote registries.

1. We will see cases where a containerfile does not start with a FROM in a later tutorial.

2. There is the possibility to start from an empty image. In this case, we would use FROM scratch, but this is an advanced topic we will not discuss here.

3. The "-c" is not strictly necessary in our example, but commonly used. Thus, I decided to include it here.

4. It is actually possible to export images as tar archives (docs.docker.com) and import them later (docs.docker.com). This is, however, seldom used explicitly.