The basics
We're almost ready to start, just one last note on nomenclature. You might have noticed that we sometimes refer to "Docker images" and sometimes to "Docker containers". A container is simply an instance of an image. To use a programming metaphor, if an image is a class, then a container is an instance of that class — a runtime object. You can have an image containing, say, a certain Linux distribution, and then start multiple containers running that same OS.
Warning
If you don't have root privileges, you have to prepend all Docker commands with sudo.
Where to find containers
Container images can be found in registries such as Docker Hub, which hosts a wide variety of ready-to-use images for different software and environments.
Question
Can you name other registries where containers can be found?
Click to show the solution
- Biocontainers.pro (partners: Bioconda, Nextflow, ELIXIR, Galaxy...)
- Quay.io (Red Hat)
- GitLab (Container Registry) - images are linked to the repositories where their corresponding Dockerfiles are maintained
- GitHub (Container Registry) - images are linked to the repositories where their corresponding Dockerfiles are maintained
- Let's look together at the Ubuntu image on Docker Hub.
Downloading containers
Docker containers typically run Linux, so let's start by downloading an image containing Ubuntu (a popular Linux distribution that is based on only open-source tools) through the command line.
docker pull ubuntu
Question
Which version of Ubuntu will be downloaded?
Click to show the solution
By default, Docker downloads the image tagged latest, which for the official ubuntu image points to the most recent LTS release.
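To make the default explicit, the sketch below mimics how a bare image name is completed with the latest tag. The helper name with_default_tag is hypothetical, and the logic is a simplification that only illustrates the rule; it is not how Docker itself is implemented.

```shell
# Hypothetical helper: append ":latest" when an image reference has no tag.
# Simplified sketch; references pinned by digest (name@sha256:...) are not handled.
with_default_tag() {
    ref=$1
    last=${ref##*/}    # part after the last "/", e.g. "bowtie2:2.5.0"
    case $last in
        *:*) printf '%s\n' "$ref" ;;          # a tag is already present
        *)   printf '%s:latest\n' "$ref" ;;   # no tag: "latest" is assumed
    esac
}

with_default_tag ubuntu          # prints ubuntu:latest
with_default_tag ubuntu:22.04    # prints ubuntu:22.04
```

To pin a specific release rather than whatever latest currently points to, give the tag explicitly, for example docker pull ubuntu:22.04.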
You will notice that it downloads several layers with weird hashes as names. This reflects a fundamental property of Docker images that we'll get back to in just a little while. The process should end with a status message confirming that the image has been downloaded.
Let's take a look at our new and growing collection of Docker images:
docker image ls
The Ubuntu image should show up in this list, looking something like this:
REPOSITORY   TAG      IMAGE ID       CREATED       SIZE
ubuntu       latest   d70eaf7277ea   3 weeks ago   72.9MB
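When the list grows, it can be handy to reduce it to plain REPOSITORY:TAG references. The sketch below does that with awk; the helper name list_image_refs is hypothetical, and the example feeds it the table shown above so that it runs even without Docker.

```shell
# Hypothetical helper: turn the table printed by `docker image ls`
# into one REPOSITORY:TAG reference per line (reads the table on stdin).
list_image_refs() {
    awk 'NR > 1 { print $1 ":" $2 }'   # NR > 1 skips the header row
}

# Example using the table shown above:
printf '%s\n' \
    'REPOSITORY TAG IMAGE ID CREATED SIZE' \
    'ubuntu latest d70eaf7277ea 3 weeks ago 72.9MB' | list_image_refs
# prints ubuntu:latest
```

On a machine with Docker installed, the same helper can be used as docker image ls | list_image_refs. Docker can also produce this directly with docker image ls --format '{{.Repository}}:{{.Tag}}'.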
Running containers
We can now start a container running our image. We can refer to the image either by "REPOSITORY:TAG" ("latest" is the default so we can omit it) or "IMAGE ID".
The syntax for docker run is:
docker run [OPTIONS] IMAGE [COMMAND] [ARG...]
- [OPTIONS] The OPTIONS part holds the options for Docker itself. To see the available options, run docker run --help.
- [COMMAND] The COMMAND part is any command that you want to run inside the container. It can be a script that you have written yourself, a command-line tool, or a complete workflow.
- [ARG] The ARG part is where you put optional arguments that the command will use.
Let's run the command uname -a to get some info about the operating system.
First run it on your own system (use systeminfo if you are on Windows):
uname -a
This prints the kernel name, host name, kernel release and hardware architecture of your machine. In my case the output starts with Darwin, the kernel that macOS is built on. Then run it in the Ubuntu Docker container:
docker run ubuntu uname -a
Here the output starts with Linux: the command ran on Linux, inside the container! Try the same thing with whoami.
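The comparison can also be scripted. The sketch below captures the kernel name on the host and inside a container; the --rm option, which removes the container once the command has finished, is an addition not used elsewhere in this section, and the Docker call is only attempted when the docker client is installed.

```shell
# Compare the kernel name reported by the host and by a container.
host_kernel=$(uname -s)
echo "host kernel:      $host_kernel"

# Only attempt the container run when the docker client is installed.
if command -v docker >/dev/null 2>&1; then
    # --rm removes the container once the command has finished.
    container_kernel=$(docker run --rm ubuntu uname -s)
    echo "container kernel: $container_kernel"
else
    echo "docker not found; skipping the container run"
fi
```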
Running interactively
So, it seems we can execute arbitrary commands on Linux. That is useful, but a bit limited. We can also get an interactive terminal with the flags -it.
docker run -it ubuntu
This should put you at a terminal prompt inside a container running Ubuntu. Your prompt should now look similar to:
root@1f339e929fa9:/#
Here you can do whatever you like: install, run, or remove things. It all stays within the container and never affects your host system.
- create a file
touch my_test_file.txt
- check the file exists
ls -l
- exit the container with
exit
- reopen the container
docker run -it ubuntu
- check the file exists
ls -l
Question
Is the file still here? Why?
Click to show the solution
No. A container instance is ephemeral. Each docker run starts a new container from the image, which is immutable, so the file created in the previous container is not present in the new one.
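The same experiment can be run without an interactive session. The sketch below is one possible way to do it: the function name demo_ephemeral is hypothetical, and --rm (remove the container after it exits) is an addition to the commands used above. It assumes the Docker daemon is running and the ubuntu image can be pulled.

```shell
# Non-interactive version of the experiment above.
demo_ephemeral() {
    # Container 1: create the file and list it inside the same container.
    docker run --rm ubuntu sh -c 'touch /my_test_file.txt && ls /my_test_file.txt'

    # Container 2: a fresh container from the same image; ls fails if the file is absent.
    if docker run --rm ubuntu ls /my_test_file.txt >/dev/null 2>&1; then
        echo "file is still there"
    else
        echo "file is gone"
    fi
}

# Only run the demo when the docker client is installed.
if command -v docker >/dev/null 2>&1; then
    demo_ephemeral
else
    echo "docker not found; skipping"
fi
```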
Tip
In some cases, to run a container interactively you will have to use docker run -it <image_name> /bin/bash instead of docker run -it <image_name>. This is related to the default command or entrypoint (we will learn about them later) of the container image. If the container's default entrypoint or command is something other than a shell (e.g., a script or a service), you will need to specify a shell explicitly, /bin/bash or /bin/sh depending on which one is available in the image, to interact with it manually.
Mount data into containers (Bind mounts)
There are obviously some advantages to isolating and running your data analysis in containers, but at some point you need to be able to interact with the rest of the host system (e.g. your laptop) to actually deliver the results. This is done via bind mounts. When you use a bind mount, a file or directory on the host machine is mounted into a container. That way, when the container generates a file in such a directory it will appear in the mounted directory on your host system.
Tip
Docker also has a more advanced mechanism for data storage called volumes. Volumes provide added flexibility and do not depend on the host machine's file system having a specific directory structure. They are particularly useful when you want to share data between containers.
- First of all, let's check some of your Linux knowledge.
Question
How would you specify the current directory?
1 providing an absolute path
2 providing a relative path
3 $PWD
4 $(pwd)
5 .
Click to show the solution
All of them ^^
Let's now look at a practical case of mounting data into a container. In this exercise we will build an index of a FASTA file with Bowtie2.
- First of all, let's create a folder and put some data in it:
mkdir -p ~/containers-training/analysis
cd ~/containers-training/analysis
curl -o NCTC8325.fa.gz ftp://ftp.ensemblgenomes.org/pub/bacteria/release-37/fasta/bacteria_18_collection/staphylococcus_aureus_subsp_aureus_nctc_8325/dna/Staphylococcus_aureus_subsp_aureus_nctc_8325.ASM1342v1.dna_rm.toplevel.fa.gz
gunzip -c NCTC8325.fa.gz > NCTC8325.fa
- Check your current directory with
pwd
- Check the content of your current directory with
ls -l
- Now try running the following Bash code:
docker run -v .:/analysis_in_container quay.io/biocontainers/bowtie2:2.5.0--py310h8d7afc0_0 bowtie2-build /analysis_in_container/NCTC8325.fa /analysis_in_container/NCTC8325
Command explanation
Docker will automatically download the container image for Bowtie2 version 2.5.0 from the remote repository https://quay.io/repository/biocontainers/bowtie2 and subsequently run the command!
This is the docker run [OPTIONS] IMAGE [COMMAND] [ARG...] syntax, just like before.
In this case quay.io/biocontainers/bowtie2:2.5.0--py310h8d7afc0_0 is the IMAGE, but instead of first downloading and then running it we point to its remote location directly, which causes Docker to download it on the fly.
The bowtie2-build part is the COMMAND, followed by the ARG values (the input FASTA file and the prefix of the output index).
The -v .:/analysis_in_container part is the OPTIONS, which we use to mount the current directory inside the container in order to make the local analysis folder available to Bowtie2.
Tip
The forms -v $(pwd):/analysis_in_container and -v ${PWD}:/analysis_in_container are also commonly used. Note that the pwd command has to be evaluated by the shell before the rest of the command line is run. This is done with the $(<command>) syntax, known as command substitution.
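As an illustration of command substitution, the sketch below builds the value passed to -v from a host directory and a container path, resolving the host side to an absolute path first. The helper name bind_mount_arg is hypothetical and only meant to show the mechanism.

```shell
# Hypothetical helper: build the value for -v from a host directory and a
# container path, resolving the host side to an absolute path first.
bind_mount_arg() {
    # The command substitution runs in a subshell, so the caller's
    # working directory is left unchanged.
    host_dir=$(cd "$1" && pwd) || return 1
    printf '%s:%s\n' "$host_dir" "$2"
}

bind_mount_arg . /analysis_in_container
# prints <absolute path of the current directory>:/analysis_in_container
```

It could then be used as docker run -v "$(bind_mount_arg . /analysis_in_container)" followed by the image and command. The double quotes keep a path containing spaces together as a single argument.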
- Now check the content of your current directory with
ls -l
You should observe the output produced in the container, which is now present locally.
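A quick way to confirm that the index is complete is to check for the files bowtie2-build normally writes for a small genome: six files ending in .1.bt2, .2.bt2, .3.bt2, .4.bt2, .rev.1.bt2 and .rev.2.bt2. The sketch below does this for the NCTC8325 prefix used above; the helper name missing_index_files is hypothetical, and very large genomes, which use a different file extension, are not covered.

```shell
# Hypothetical helper: report which of the six files that bowtie2-build
# normally writes for a small genome are missing for a given index prefix.
missing_index_files() {
    prefix=$1
    for suffix in 1 2 3 4 rev.1 rev.2; do
        file="$prefix.$suffix.bt2"
        [ -e "$file" ] || printf '%s\n' "$file"
    done
}

# Check the index built above, in the current directory.
missing=$(missing_index_files NCTC8325)
if [ -z "$missing" ]; then
    echo "index complete"
else
    echo "missing index files:"
    echo "$missing"
fi
```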
Tip
We've been discussing Docker in the context of running tools. Another application is as a kind of very powerful environment manager, similar to Conda or Pixi. If you've organized your work into projects, then you can mount the whole project directory in a container and use the container as the terminal for running things while still using your normal OS for editing files and so on.
Quick recap
In this section we've learned:
- Where to find containers.
- How to use docker pull for downloading images from a central registry.
- How to use docker image ls for getting information about the images we have on our system.
- How to use docker run for starting a container from an image.
- How to use the -it flag for running in interactive mode.
- How to use bind mounts to share data between the container and the host system with -v pathComputer:pathInTheContainer.