
The basics

We're almost ready to start, just one last note on nomenclature. You might have noticed that we sometimes refer to "Docker images" and sometimes to "Docker containers". A container is simply an instance of an image. To use a programming metaphor, if an image is a class, then a container is an instance of that class — a runtime object. You can have an image containing, say, a certain Linux distribution, and then start multiple containers running that same OS.

Warning

If your user is not allowed to talk to the Docker daemon (for example, on Linux, when you are not a member of the docker group), you have to prepend all Docker commands with sudo.

Where to find containers

Containers can be found in registries such as DockerHub, which hosts a wide variety of ready-to-use images for different software and environments.

Question

Can you name other registries where containers can be found?

Click to show the solution
  • Biocontainers.pro (partners: Bioconda, Nextflow, ELIXIR, Galaxy...)
  • Quay.io (Red Hat)
  • GitLab (Container Registry) - images are linked to the repositories where their corresponding Dockerfiles are maintained
  • GitHub (Container Registry) - images are linked to the repositories where their corresponding Dockerfiles are maintained
  • Let's look at the ubuntu container on DockerHub together.

Downloading containers

Docker containers typically run Linux, so let's start by downloading an image containing Ubuntu (a popular Linux distribution that is based on only open-source tools) through the command line.

docker pull ubuntu

Question

Which version of Ubuntu will be downloaded?

Click to show the solution

By default it will download the image tagged latest.

You will notice that it downloads different layers with weird hashes as names. This represents a very fundamental property of Docker images that we'll get back to in just a little while. The process should end with something along the lines of:

Status: Downloaded newer image for ubuntu:latest
docker.io/library/ubuntu:latest

Let's take a look at our new and growing collection of Docker images:

docker image ls

The Ubuntu image should show up in this list, looking something like this:

REPOSITORY       TAG              IMAGE ID            CREATED             SIZE
ubuntu           latest           d70eaf7277ea        3 weeks ago         72.9MB
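
Because latest is only a default tag, the version you get can change over time. The following is a minimal sketch of pinning a specific release instead. The tag 22.04 is an assumption chosen for illustration (check the ubuntu page on DockerHub for the tags that exist), and the variable names are hypothetical.

```shell
# Build an explicit image reference instead of relying on the default tag.
image="ubuntu"
tag="22.04"    # assumption: example tag, verify it on DockerHub
ref="${image}:${tag}"
echo "Image reference: ${ref}"

# Only talk to Docker when the daemon is reachable.
if docker info >/dev/null 2>&1; then
    docker pull "${ref}"          # download the pinned version
    docker image ls "${image}"    # list the local tags of the ubuntu repository
fi
```

Passing a repository name to docker image ls restricts the listing to that repository, which is handy once your collection of images grows.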

Running containers

We can now start a container running our image. We can refer to the image either by "REPOSITORY:TAG" ("latest" is the default so we can omit it) or "IMAGE ID".
The syntax for docker run is:

docker run [OPTIONS] IMAGE [COMMAND] [ARG...]
  • [OPTIONS] Options for Docker itself. To see the available options run docker run --help.
  • [COMMAND] Any command that you want to run inside the container. It can be a script that you have written yourself, a command line tool or a complete workflow.
  • [ARG] Optional arguments that are passed to the command.
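
To make the mapping between the syntax and a real call concrete, here is a small sketch that assembles a docker run command from its four parts. The variable names are hypothetical, and the chosen option (--rm) and command (cat /etc/os-release) are just one possible illustration.

```shell
options="--rm"            # OPTIONS: delete the container once it exits
image="ubuntu"            # IMAGE: same as ubuntu:latest
cmd="cat"                 # COMMAND: program executed inside the container
args="/etc/os-release"    # ARG: argument handed to the command

full_command="docker run ${options} ${image} ${cmd} ${args}"
echo "${full_command}"

# Only run it when the Docker daemon is reachable.
if docker info >/dev/null 2>&1; then
    # Left unquoted on purpose so the shell splits the string into words.
    ${full_command}
fi
```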

Let's run the command uname -a to get some info about the operating system.
First run it on your own system (use systeminfo if you are on Windows):

uname -a

This should print something like this to your command line:

Darwin liv433l.lan 15.6.0 Darwin Kernel Version 15.6.0: Mon Oct 2 22:20:08 PDT 2017; root:xnu-3248.71.4~1/RELEASE_X86_64 x86_64

Seems like I'm running macOS (Darwin is the name of its underlying operating system). Then run it in the Ubuntu Docker container:

docker run ubuntu uname -a

Here I get the following result:

Linux 24d063b5d877 5.4.39-linuxkit #1 SMP Fri May 8 23:03:06 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

And now I'm running on Linux! Try the same thing with whoami.

Running interactively

So, it seems we can execute arbitrary commands on Linux. That is useful, but maybe a bit limited. We can also get an interactive terminal with the flags -it.

docker run -it ubuntu

This should put you at a terminal prompt inside a container running Ubuntu. Your prompt should now look similar to:

root@1f339e929fa9:/#

Here you can do whatever you like: install, run or remove stuff. It all happens within the container and never affects your host system.

  • create a file with touch my_test_file.txt
  • check that the file exists with ls -l
  • exit the container with exit
  • reopen the container with docker run -it ubuntu
  • check whether the file exists with ls -l

Question

Is the file still here? Why?

Click to show the solution

No. A container instance is ephemeral. Each docker run creates a new container from the image, which is immutable, so changes made in a previous container are not carried over.
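
The same experiment can be sketched non-interactively. The file name matches the exercise above; the --rm option, which deletes each container once it exits, is an addition made here to keep things tidy.

```shell
test_file="/my_test_file.txt"    # same file as in the exercise, created in /

# Only run when the Docker daemon is reachable.
if docker info >/dev/null 2>&1; then
    # First container: create the file and confirm that it exists.
    docker run --rm ubuntu sh -c "touch ${test_file} && ls -l ${test_file}"

    # Second container: started fresh from the image, so ls is expected to fail.
    if docker run --rm ubuntu ls -l "${test_file}"; then
        echo "The file survived"
    else
        echo "The file is gone, as expected"
    fi
fi
```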

Tip

In some cases, to run a container interactively you will have to use docker run -it <image_name> /bin/bash instead of docker run -it <image_name>. This is related to the default command or entrypoint (we will learn about them later) of the container image. If the container’s default entrypoint or command is something other than a shell (e.g., a script or a service), you will need to specify the shell via /bin/bash or /bin/sh, depending on which shell is available, to interact with it manually.
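
If you are unsure what an image runs by default, you can ask Docker before starting it. The sketch below assumes the ubuntu image is already present locally (it was pulled earlier); the variable names are hypothetical.

```shell
image="ubuntu"
# Go template that prints the entrypoint and the default command of an image.
format='Entrypoint: {{.Config.Entrypoint}}  Cmd: {{.Config.Cmd}}'

# Only run when the Docker daemon is reachable.
if docker info >/dev/null 2>&1; then
    docker image inspect --format "${format}" "${image}"
fi
```

When the image defines an entrypoint that is not a shell, docker run -it --entrypoint /bin/sh <image_name> replaces it for that one run.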

Mount data into containers (Bind mounts)

There are obviously some advantages to isolating and running your data analysis in containers, but at some point you need to be able to interact with the rest of the host system (e.g. your laptop) to actually deliver the results. This is done via bind mounts. When you use a bind mount, a file or directory on the host machine is mounted into a container. That way, when the container generates a file in such a directory it will appear in the mounted directory on your host system.

Tip

Docker also has a more advanced way of storing data called volumes. Volumes provide added flexibility and do not depend on the host machine’s file system having a specific directory structure. They are particularly useful when you want to share data between containers.
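
As a rough illustration of sharing data between containers through a volume, consider the sketch below. The volume name training_data and the path /data are hypothetical and only serve the example.

```shell
volume="training_data"           # hypothetical volume name
mount_spec="${volume}:/data"     # volume name on the left, path in the container on the right

# Only run when the Docker daemon is reachable.
if docker info >/dev/null 2>&1; then
    docker volume create "${volume}"
    # The first container writes a file into the volume.
    docker run --rm -v "${mount_spec}" ubuntu sh -c 'echo "hello from container 1" > /data/note.txt'
    # The second container reads the same file back.
    docker run --rm -v "${mount_spec}" ubuntu cat /data/note.txt
    # Remove the volume when it is no longer needed.
    docker volume rm "${volume}"
fi
```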

  • First of all, let's check some of your Linux knowledge.

Question

How would you specify the current directory?
1. providing an absolute path
2. providing a relative path
3. PWD (the $PWD environment variable)
4. pwd (through command substitution, $(pwd))
5. .

Click to show the solution

All of them ^^

Let's now work through a practical case of mounting data into a container. In this exercise we will build an index of a FASTA file with Bowtie2.

  • First of all let's create a folder and put some data in it:

    mkdir -p ~/containers-training/analysis
    cd ~/containers-training/analysis
    curl -o NCTC8325.fa.gz ftp://ftp.ensemblgenomes.org/pub/bacteria/release-37/fasta/bacteria_18_collection/staphylococcus_aureus_subsp_aureus_nctc_8325/dna/Staphylococcus_aureus_subsp_aureus_nctc_8325.ASM1342v1.dna_rm.toplevel.fa.gz
    gunzip -c NCTC8325.fa.gz > NCTC8325.fa
    

  • check your current directory with pwd

  • check the content of your current directory with ls -l

  • Now try running the following Bash code:

docker run -v .:/analysis_in_container quay.io/biocontainers/bowtie2:2.5.0--py310h8d7afc0_0 bowtie2-build /analysis_in_container/NCTC8325.fa /analysis_in_container/NCTC8325

Command explanation

Docker will automatically download the container image for Bowtie2 version 2.5.0 from the remote repository https://quay.io/repository/biocontainers/bowtie2 and subsequently run the command!
This is the docker run [OPTIONS] IMAGE [COMMAND] [ARG...] syntax just like before.
In this case quay.io/biocontainers/bowtie2:2.5.0--py310h8d7afc0_0 is the IMAGE, but instead of first downloading and then running it we point to its remote location directly, which will cause Docker to download it on the fly.
The bowtie2-build part is the COMMAND, followed by the ARG (the input FASTA file and the prefix of the output index).

The -v .:/analysis_in_container part is the OPTIONS which we use to mount the current directory inside the container in order to make the local analysis folder available to Bowtie2.

Tip

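The -v $(pwd):/analysis_in_container or -v $PWD:/analysis_in_container form is also commonly used, and it also works with older Docker versions that only accept an absolute host path. Note that the pwd command has to be evaluated before the rest of the command line runs. This is done using the $(<command>) syntax (command substitution). $PWD is a shell variable that holds the same path.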
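
A short sketch of what the shell does with these forms before Docker ever sees the command line. The variable names are hypothetical; the container path is the one used in this exercise.

```shell
# Command substitution: the shell runs pwd first and inserts its output.
host_dir="$(pwd)"
container_dir="/analysis_in_container"
mount_spec="${host_dir}:${container_dir}"
echo "Bind mount option: -v ${mount_spec}"

# $PWD is a variable maintained by the shell and normally holds the same path.
if [ "${host_dir}" = "${PWD}" ]; then
    echo "\$(pwd) and \$PWD agree: ${host_dir}"
fi
```

Either way, Docker receives an absolute host path on the left of the colon and the path inside the container on the right.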

  • Now check the content of your current directory with ls -l.

You should see the index files that were produced inside the container and are now present locally.
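
If you prefer a scripted check over reading the ls -l listing, the following sketch counts the index files for a given prefix. The function name count_index_files is hypothetical. For a small genome like this one, bowtie2-build typically writes six files ending in .bt2.

```shell
# Count the Bowtie2 index files (<prefix>.*.bt2) in the current directory.
count_index_files() {
    prefix="$1"
    count=0
    for f in "${prefix}".*.bt2; do
        # When nothing matches, the pattern stays unexpanded, hence the -e check.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "${count}"
}

n=$(count_index_files NCTC8325)
echo "Found ${n} index file(s) for NCTC8325"
```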

Tip

We've been discussing Docker in the context of running tools. Another application is as a kind of very powerful environment manager, similar to Conda or Pixi. If you've organized your work into projects, then you can mount the whole project directory in a container and use the container as the terminal for running stuff while still using your normal OS for editing files and so on.

Quick recap

In this section we've learned:
- Where to find containers.
- How to use docker pull for downloading images from a central registry.
- How to use docker image ls for getting information about the images we have on our system.
- How to use docker run for starting a container from an image.
- How to use the -it flag for running in interactive mode.
- How to use bind mounts to share data between the container and the host system with -v <path_on_host>:<path_in_container>.