Building a Docker image
In the previous section we downloaded a Docker image of Ubuntu and noticed that it was based on layers, each identified by a unique hash. An image in Docker is built from a number of read-only layers, where each layer contains the differences from the previous layers. If you know how Git works, this might remind you of how a Git commit contains the differences from the previous commit. The great thing about this is that we can start from one base layer, say containing an operating system and some utility programs, and then generate many new images based on it, say 10 different project-specific images.
Docker provides a convenient way to describe how to go from a base image to the image we want by using a Dockerfile. This is a simple text file containing the instructions for how to generate each layer. Docker images are typically quite large, often several GBs, while Dockerfiles are small and serve as blueprints for the images. It is therefore good practice to have your Dockerfile in your project Git repository, since it allows other users to exactly replicate your project environment.
Understanding Dockerfiles
We will now go through the example Dockerfile below and discuss the different steps and what they do.
See full Dockerfile
FROM ubuntu:24.04
LABEL description="Minimal image for the Container training."
LABEL maintainer="my.self@institute.fr"
# Use bash as shell
SHELL ["/bin/bash", "-c"]
# Set workdir
WORKDIR /course
# Install necessary tools
RUN apt-get update && \
apt-get install -y --no-install-recommends bzip2 \
ca-certificates \
curl \
fontconfig \
git \
language-pack-en \
tzdata \
vim \
unzip \
wget \
&& apt-get clean
# Install Miniforge and add to PATH
RUN curl -L https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh -O && \
bash Miniforge3-Linux-x86_64.sh -bf -p /usr/miniforge3/ && \
rm Miniforge3-Linux-x86_64.sh && \
/usr/miniforge3/bin/conda clean -tipy && \
ln -s /usr/miniforge3/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
echo ". /usr/miniforge3/etc/profile.d/conda.sh" >> ~/.bashrc && \
echo "conda activate base" >> ~/.bashrc
# Add conda to PATH and set locale
ENV PATH="/usr/miniforge3/bin:${PATH}"
ENV LC_ALL=en_US.UTF-8
ENV LC_LANG=en_US.UTF-8
# Configure Conda channels and clean up
RUN conda config --add channels bioconda \
&& conda config --add channels conda-forge \
&& conda config --set channel_priority strict \
&& conda clean --all
# Open port for running Jupyter Notebook
EXPOSE 8888
# Start Bash shell by default
CMD /bin/bash
Each instruction in the Dockerfile will typically result in one layer in the resulting image. The format for Dockerfiles is INSTRUCTION arguments. A full specification of the format, together with best practices, can be found in the official Docker documentation.
Here are the first few lines:
FROM ubuntu:24.04
LABEL description="Minimal image for the Container training."
LABEL maintainer="my.self@institute.fr"
Here we use the instructions FROM and LABEL.
- FROM is the most important: it specifies the base image our image should start from. In this case we want it to be Ubuntu 24.04, which is one of the official repositories. There are many roads to Rome when it comes to choosing the best image to start from. Say you want to run RStudio in a Conda environment through a Jupyter notebook. You could then start from one of the rocker images for R, a Miniforge image, or a Jupyter image. Or you could just start from one of the low-level official images and set up everything from scratch.
- LABEL is just a way to provide metadata.
Let's take a look at the next section of the Dockerfile.
# Use bash as shell
SHELL ["/bin/bash", "-c"]
# Set workdir
WORKDIR /course
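- SHELL sets which shell is used to run the commands of the instructions that follow (here Bash instead of the default /bin/sh).
- WORKDIR sets the working directory for the instructions that follow, and determines the directory the container should start in.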
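As a minimal sketch of what these two instructions change, the fragment below writes a file using a relative path. It is not part of the course Dockerfile, and the file name where_am_i.txt is made up for illustration. Because of WORKDIR, the file should end up in /course, and because of SHELL, the RUN line is interpreted by Bash.

```dockerfile
FROM ubuntu:24.04
# Run shell-form instructions with Bash instead of the default /bin/sh
SHELL ["/bin/bash", "-c"]
# Create /course if it does not exist and make it the working directory
WORKDIR /course
# Relative paths are resolved against /course,
# so this writes /course/where_am_i.txt (hypothetical file name)
RUN pwd > where_am_i.txt
```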
The next few lines introduce the important RUN instruction, which is used for executing shell commands:
# Install necessary tools
RUN apt-get update && \
apt-get install -y --no-install-recommends bzip2 \
ca-certificates \
curl \
fontconfig \
git \
language-pack-en \
tzdata \
vim \
unzip \
wget \
&& apt-get clean
- This RUN command will update the apt-get package lists, install a list of packages (vim, unzip, wget, etc.) and finally clean up local cache files created by apt-get.
# Install Miniforge and add to PATH
RUN curl -L https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh -O && \
bash Miniforge3-Linux-x86_64.sh -bf -p /usr/miniforge3/ && \
rm Miniforge3-Linux-x86_64.sh && \
/usr/miniforge3/bin/conda clean -tipy && \
ln -s /usr/miniforge3/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
echo ". /usr/miniforge3/etc/profile.d/conda.sh" >> ~/.bashrc && \
echo "conda activate base" >> ~/.bashrc
- This second RUN command performs several steps, which can be broken down as follows:
# Download Miniforge3
curl -L https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh -O && \
# Install it
bash Miniforge3-Linux-x86_64.sh -bf -p /usr/miniforge3/ && \
# Remove the downloaded installation file
rm Miniforge3-Linux-x86_64.sh && \
# Remove unused packages and caches
/usr/miniforge3/bin/conda clean -tipy && \
# Permanently enable the Conda command
ln -s /usr/miniforge3/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
echo ". /usr/miniforge3/etc/profile.d/conda.sh" >> ~/.bashrc && \
# Add the base environment permanently to PATH
echo "conda activate base" >> ~/.bashrc
Tip
As a general rule, each layer in a Docker image should represent a single, logical task. For example, if you’re installing a program, the RUN command should handle downloading, installing, and cleaning up in one step.
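This approach is crucial for keeping the image small: files that are removed in a later layer still take up space in the layer that created them, so cleaning up only helps if it happens in the same RUN step.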
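A minimal sketch of this pattern, using unzip as an arbitrary example package (an assumption for illustration; any package would do). The final rm -rf of the apt lists is an extra cleanup step commonly seen in Dockerfiles and is not part of the course Dockerfile.

```dockerfile
FROM ubuntu:24.04
# Update the package lists, install, and clean up in one RUN step,
# so the temporary files are never stored in any layer of the image
RUN apt-get update && \
    apt-get install -y --no-install-recommends unzip && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
```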
If you had these commands in separate RUN lines:
RUN apt-get update
RUN apt-get install -y <package>
then the result of apt-get update would be stored in a layer of its own, which is unnecessary. The same remark applies to the conda RUN step.
Let's take a look at the next section of the Dockerfile.
# Add conda to PATH and set locale
ENV PATH="/usr/miniforge3/bin:${PATH}"
ENV LC_ALL=en_US.UTF-8
ENV LC_LANG=en_US.UTF-8
The ENV instruction is used to set environment variables.
The first command adds conda to the path, so we can write conda install instead of /usr/miniforge3/bin/conda install. The next two commands set a UTF-8 character encoding so that we can use non-ASCII characters (and a bunch of other things).
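To illustrate that variables set with ENV are available both in later build steps and in containers started from the image, here is a small sketch. The variable GREETING and the file /greeting.txt are hypothetical and not part of the course Dockerfile.

```dockerfile
FROM ubuntu:24.04
# Hypothetical variable, used only for illustration
ENV GREETING="hello"
# The variable is visible to the shell in later build steps ...
RUN echo "${GREETING} from the build" > /greeting.txt
# ... and in the environment of containers started from the image,
# where Bash expands it at run time
CMD ["/bin/bash", "-c", "echo ${GREETING} from the container"]
```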
# Configure Conda channels and clean up
RUN conda config --add channels bioconda \
&& conda config --add channels conda-forge \
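&& conda config --set channel_priority strict \
&& conda clean --all
Here we just configure Conda to make sure the channels are set up correctly: the bioconda and conda-forge channels are added (conda-forge is added last and therefore gets the highest priority), strict channel priority is enabled, and the Conda caches are cleaned up.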
# Open port for running Jupyter Notebook
EXPOSE 8888
# Start Bash shell by default
CMD /bin/bash
- EXPOSE documents that the container listens on port 8888, so that we can for example later run a Jupyter Notebook server on that port. It does not by itself make the port reachable from your computer; for that the port also has to be published when the container is started (the -p flag of docker run).
- CMD is an interesting instruction. It sets what a container should run when nothing else is specified. It can be used for example for printing some information on how to use the image or, as here, to start a shell for the user. If the purpose of your image is to accompany a publication then CMD could be to run the workflow that generates the paper figures from raw data.
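CMD can be written in two forms, and the sketch below contrasts them. Only the last CMD in a Dockerfile takes effect, so the alternatives are commented out. The script name run_workflow.sh is a hypothetical placeholder, not a file from this course.

```dockerfile
FROM ubuntu:24.04
# Shell form, as in the example above: the command is passed to a shell
# (/bin/sh -c by default, or whatever SHELL was set to)
# CMD /bin/bash

# Exec form (JSON array): the program is started directly,
# without a wrapping shell
CMD ["/bin/bash"]

# Hypothetical default for an image that accompanies a publication
# CMD ["/bin/bash", "run_workflow.sh"]
```

Whatever is given after the image name in docker run replaces this default command.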
Building from Dockerfiles
OK, so now we understand how a Dockerfile works. Constructing the image from the Dockerfile is really simple. Try it out now: copy and paste the full Dockerfile above into a file called Dockerfile_test and then run:
docker build -f Dockerfile_test -t my_first_docker_image .
This should result in something similar to this:
[+] Building 0.0s (9/9) FINISHED docker:desktop-linux
=> [internal] load build definition from Dockerfile_test 0.0s
=> => transferring dockerfile: 1.78kB 0.0s
=> [internal] load metadata for docker.io/library/ubuntu:24.04 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [1/5] FROM docker.io/library/ubuntu:24.04 0.0s
=> CACHED [2/5] WORKDIR /course 0.0s
=> CACHED [3/5] RUN apt-get update && apt-get install -y --no-install-recommends bzip2 ca-certificates 0.0s
=> CACHED [4/5] RUN curl -L https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh -O && bash Miniforge3-Linux-x8 0.0s
=> CACHED [5/5] RUN conda config --add channels bioconda && conda config --add channels conda-forge && conda config --set channel_priority strict 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:948b49deccbe4a4f22cb3cfebe2617081c370650f859715f8dc751e53e7a7fe5 0.0s
=> => naming to docker.io/library/my_first_docker_image 0.0s
- The -f flag sets which Dockerfile to use.
- The -t flag tags the image with a name. This name is how you will refer to the image later.
- Lastly, the . is the path to the directory where the image should be built, the so-called build context (. means the current directory). This had no real impact in this case, but matters if you want to import files.
Now validate by checking you can see your new image:
docker image ls
Creating your own Dockerfile (Optional)
Now it's time to make a Dockerfile on your own, following some instructions:
1. Create a Dockerfile called Dockerfile_training.
2. Set FROM to the miniforge image.
3. Install package(s) of your choice with Conda. The packages will be installed to the default environment named base inside the container.
4. Add a file from your computer to the image by using the COPY instruction. The syntax is COPY source target, so in our case simply COPY source . to copy to the work directory in the image.
5. Set a default command for the image using the CMD instruction.
If it seems overwhelming you can take a look at the example below:
Solution
First I create a file for point 4:
echo "This is/was a file from my computer" >> local_file.txt
Then I create the Dockerfile_training with the following:
FROM condaforge/miniforge3:24.7.1-2
RUN conda config --add channels bioconda
RUN conda install -n base hisat2=2.2.1 samtools=1.21
COPY local_file.txt .
CMD ["hisat2", "--help"]
Warning
Does not work for you?
You are probably using a computer with an ARM (AArch64) platform, in which case the default miniforge3 image that gets downloaded is specific to that platform.
If you look up hisat2 on Conda you will see that it is not available for this platform (compare with samtools).
The solution is to force Docker to use an x86/amd64 platform miniforge image as base image:
FROM --platform=linux/amd64 condaforge/miniforge3:24.7.1-2
Build the image and tag it my_docker_training:
docker build -t my_docker_training -f Dockerfile_training .
Verify that the image was built using:
docker image ls
Run the image to see if the default container instruction (CMD) works:
docker run my_docker_training
Now let's run the container interactively to check that the file you copied is present in the container:
docker run -it my_docker_training /bin/bash
ls -l
Quick recap
In this section we've learned:
- How the keywords FROM, LABEL, RUN, ENV, SHELL, WORKDIR, and CMD can be used when writing a Dockerfile.
- The importance of letting each layer in the Dockerfile be a "logical unit".
- How to use docker build to construct and tag an image from a Dockerfile.
- How to create your own Dockerfile from the miniforge base image.