Using Docker (and more!)
If you have questions about this guide or you want to make modifications please contact me @Gordon Zhong (Unlicensed) (if I’m still around) or simply just add to it.
Headnotes
This guide will often return to the concept of “development” and “production” containers. This is a common pattern in industry but isn’t a hard and fast rule. The idea is to develop in a fully featured environment in a “development container” and strip out everything you don’t need in a “production container”.
You can do this by creating a Dockerfile for each stage, or by leveraging multi-stage builds to build the production environment and then add development dependencies on top in a different stage. Note: multi-stage builds are for stages of the SAME service - still one Dockerfile per service. If none of this made any sense, read the rest of the guide first and then return here.
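As a rough sketch of that second approach (assuming a Node service like the one used later in this guide - the "dev" script here is hypothetical):

# Production stage: only what the service needs to run
FROM node:latest AS production
WORKDIR /server
COPY package.json package-lock.json ./
RUN npm install --production
COPY . .
CMD ["npm", "run", "start:production"]

# Development stage: builds on the production stage and adds dev dependencies
FROM production AS development
RUN npm install
CMD ["npm", "run", "dev"]

You can then build the stage you want with `docker build --target production` or `docker build --target development`.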
Why use Docker?
You can 100% skip this section if you aren’t interested in the why of the technology.
Two of the major issues with development and operations have been:

- Making a consistent development environment (Dev)
- Making a consistent production environment (Ops)
There has traditionally been a technology known as a virtual machine (VM) used to fill these gaps. A VM emulates[1] the entire operating system of a machine. This is a fairly robust way of creating a standard environment. However, it is expensive in both computation and time - requiring more costly developer machines and production servers, especially when we need to run many VMs at the same time.
Docker popularised a new technology called containerisation that encapsulates each application in a lightweight “container” managed by the Docker engine. These containers bundle only the libraries the application needs instead of emulating the whole operating system. This is significantly faster and cheaper than running multiple virtual machines, and has allowed Dev and Ops tasks to overlap more closely into a DevOps workflow.
Docker Products
At the time of writing, Docker offers 5 products.
| Product | What does it do? | Do I need it? |
|---|---|---|
| Docker Engine | Docker Engine is a tool that provisions and manages Docker containers on a single machine. This is the core of all things Docker. | This is bundled into other tools such as Toolbox and Desktop. |
| Docker Toolbox | This is a legacy bundle of tools that is useful only on machines that don’t support hardware virtualisation (typically either Intel VT-x or AMD-V). It is much slower and less reliable than the newer products, so avoid it if possible. | You only need to install this if your machine doesn’t support hardware virtualisation. This sometimes happens on older machines, especially laptops. |
| Docker Desktop | Docker Desktop is a bundle of tools used on “desktop” operating systems - i.e. MacOS and Windows. | Install this if you are on MacOS or Windows. It includes the Docker Engine and Docker Compose (and some other stuff too). |
| Docker Compose | Docker Compose is an orchestration tool. It is useful for managing multiple services that need to interact on a single host (more on that below). | You don’t need to install it if you install Desktop. If you are a Linux user, install Docker Engine first and then install Compose separately. |
| Docker Hub | This is an online hosted image repository where you can store your baked images. | You can’t install this, but you may need to use it to store your images (more on that below). |
Writing Dockerfiles
Your best friend for this section is the docs: https://docs.docker.com/engine/reference/builder/. Please note the syntax might have changed since this guide was written so be sure to cross-check with that.
Dockerfiles are instructions to the Docker engine. Each Dockerfile should be associated with one service, e.g. one for the server, one for a database, etc. We will call the environment inside the container the “container environment”, and the environment Docker is running in the “host environment”.
Dockerfiles are typically called `Dockerfile` or `service.Dockerfile`.
Let’s go through a simple one for a Node server:
# Grab the latest Node base image
FROM node:latest
First we pull a `node:latest` image using the `FROM` command. An image is a built Dockerfile. In particular, most popular environments have a corresponding image uploaded to http://dockerhub.com/. This one is an official `node` image. The `latest` part is a tag that typically corresponds to a version.
Production Dockerfiles differ from development Dockerfiles here. You normally want to pin your image to a specific version, since the latest image could introduce changes that break the application. For development this is less important. You can see the list of all the available tags for `node` here: https://hub.docker.com/_/node
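For example, a pinned base image might look like this (the exact tag is up to you - check the link above for what is currently available):

# Pin to a known-good major version instead of latest
FROM node:14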
This image acts as a “base” to build your Dockerfile on top of. In particular it will do things like install dependencies for you, e.g. `node`, `npm`, `yarn`, etc. It also determines which operating system libraries the Dockerfile is using, e.g. Debian, Alpine, Windows Server Core, etc.
Then we set a working directory using the `WORKDIR` command.
# Set the current working directory inside the container
WORKDIR /server
This creates a new directory called `/server` in the container (if it doesn’t already exist). All subsequent commands in the Dockerfile will now run in this directory.

Next we want to install the Node dependencies using our favoured Node package manager. In this case we’re using the one bundled with Node, called `npm`.
# Copy package.json and package-lock.json into the container
COPY package.json package-lock.json ./
# Install node modules inside the container using the copied package.json
RUN npm install
The `COPY` line tells the Docker Engine to copy the listed files from the host computer to the directory in the Docker container specified by the last argument (`./` - remember this now refers to `/server` since we used the `WORKDIR` command above).

The `RUN` command executes a shell command (using the default shell) in the Docker container. In this case we are installing all the node modules.
Production images differ from development images here. We would only want to install “production” dependencies in a production Docker image to save space and time. For `npm` this is `npm install --production`.
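One way to handle both cases from a single Dockerfile is a build argument - this is just a sketch of the pattern, not the only approach:

# Defaults to a production install; override with --build-arg NODE_ENV=development
ARG NODE_ENV=production
# Install devDependencies only when we are not building for production
RUN if [ "$NODE_ENV" = "production" ]; then npm install --production; else npm install; fi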
We are taking advantage of one of Docker’s most powerful features here. Each command in a Dockerfile is cached as a layer. These layers reflect the change that was made on that line. For commands such as `COPY` and `ADD`, the caching algorithm checks the checksum of each file to decide when to invalidate the cache. Once a layer is invalidated, Docker will re-run ALL the following layers (since they build on top of each other).
You can read more about it here: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#leverage-build-cache
Now we want to copy all the relevant project files into the container.
# Copy the entire project into the container
COPY . .
This `COPY` command copies every file in the working directory of the build context on the host (see below) to the working directory (`/server`) of the container.
We are copying every file from our host to the container. This is often not desirable since it can increase the size of the Docker image, which makes it take longer to pull and push. We can create a file called `.dockerignore` to ignore files we are already regenerating in the container (e.g. `node_modules`) or things we don’t want passed into the build context at all (e.g. locally compiled file caches or secrets).
Note our images are public. Everything in your build context is discoverable later so ENSURE YOU DO NOT INCLUDE ANY SECRETS IN YOUR IMAGE. You can inject them later during run-time.
You can read more about it here: https://docs.docker.com/engine/reference/builder/#dockerignore-file
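A minimal `.dockerignore` for a Node project might look something like this (adjust it to your own project):

# Dependencies get reinstalled inside the container anyway
node_modules
# Local secrets and environment files must never enter the build context
.env
# Version control history and logs just bloat the image
.git
*.log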
Note that we copied `package.json` and `package-lock.json` in the step before. This is because we will often make changes to the project files without touching the dependencies. Since the dependency manifests haven’t changed, Docker will keep using the cached installation step, saving a lot of time re-installing everything.
Very important note: Production images copy the files once and once only. In development images it is often beneficial to be able to edit files on the host and see the changes reflected in the container, and vice versa. We will discuss this shortly below.
We `EXPOSE` a port so we can access the service listening on that port from outside the container. Note that `EXPOSE` by itself is mostly documentation - the port still has to be published at run-time (e.g. with `docker run -p` or the `ports` block in Compose, both shown later) before anything outside the container can reach it.
# Expose the port to the outside world
EXPOSE 3001
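For example, to actually publish that port when running the container (host port first, container port second):

# Publish container port 3001 on host port 8080
docker run -p 8080:3001 csesoc/notangles-server:latest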
Finally, we set a `CMD` that will automatically run a shell command when we start a new container from this image. The first argument is the program and the rest are arguments for the program.
# Run the server
CMD ["npm", "run", "start:production"]
Side note: This can be overridden (easily) at run-time, unlike `ENTRYPOINT`. This is sometimes useful when we want to execute other shell commands, such as `/bin/bash`, to open a shell and make changes in the container.

The container will stop if the main process inside exits (in this case `npm run start:production`). Typically processes like servers shouldn’t ever stop, so this isn’t problematic. For processes which do end early, like shells, we can take advantage of some flags described below (namely interactive mode and attaching a tty).
All together we have:
# Grab the latest Node base image
FROM node:latest
# Set the current working directory inside the container
WORKDIR /server
# Copy package.json and package-lock.json into the container
COPY package.json package-lock.json ./
# Install node modules inside the container using the copied package.json
RUN npm install
# Copy the entire project into the container
COPY . .
# Expose the port to the outside world
EXPOSE 3001
# Run the server
CMD ["npm", "run", "start:production"]
Dockerfiles v.s. Images v.s. Containers
tldr; Dockerfiles → Images → Containers
We can build the Dockerfile above into an image using the following shell command:
docker build -t csesoc/notangles-server:latest -f Dockerfile .
Let’s break that down:
- `docker build`: Tells the Docker Engine we want to build a Dockerfile into an image. An image is a built (executed) version of a Dockerfile. This is somewhat akin to the process of compiling code into a program.
- `-t csesoc/notangles-server:latest`: This tells the Docker Engine to tag the image with the name `csesoc/notangles-server` and the tag `latest`. The part before the slash is the Docker Hub namespace (user or organisation) and the part after is the image name. The tag can be anything you want, but most people include the semantic version (or `latest`) along with the base OS. It is good practice to tag all your images so you can find them later.
- `-f Dockerfile`: Tells the Docker Engine which Dockerfile to build. This argument can be omitted and defaults to a file named `Dockerfile` in the build context (specified by the next argument).
- `.`: Specifies the location of the build context. In this case `.` means the current working directory. All files in the build context are available to `ADD` or `COPY` from into the container (except the ones ignored by `.dockerignore`).
A container can be run from an image. This is akin to executing the program we mentioned previously. We can start a container like so:
docker run -d csesoc/notangles-server:latest
- `docker run`: Tells the Docker Engine we want to create a new container from the following image.
- `-d`: Tells Docker to run in detached mode - i.e. in the background. This is useful for containers such as databases and for containers in production.
- `csesoc/notangles-server:latest` is the image name and tag. If the image exists locally Docker uses that first, otherwise it will attempt to pull the image from its list of image repositories (by default this is Docker Hub), e.g. `mongo:latest` will pull the latest MongoDB image and run it.
You might find the image description is sometimes written without a slash, e.g. `mongo:latest`. This could mean the image is purely local and not being pushed to a repository. It could also mean that the image is in the Docker Official Images repository on Docker Hub, which can be pulled without the namespace part.
Sometimes we want to run a shell in the container so we can make local changes:
docker run -it csesoc/notangles-server:latest /bin/bash
We have two short flags:

- `-i` and `-t` (note for most shells we can join the flags - this might not apply in places like Windows). These tell the Docker Engine we want to run in interactive mode and to open a tty for a terminal, respectively.
- Adding `/bin/bash` on the end of the command overrides the `CMD` instruction in the Dockerfile and executes that instead (i.e. opens a `bash` shell).
In standard Docker setups you can use `docker push` to push to Docker Hub. We use Docker Hub’s autobuild process to build the `master` branch of your GitHub repository.
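If you ever need to push manually (e.g. to test something), it looks roughly like this, assuming you are logged in and have access to the `csesoc` organisation on Docker Hub:

# Log in to Docker Hub (only needed once per machine)
docker login
# Build, tag and push the image
docker build -t csesoc/notangles-server:latest .
docker push csesoc/notangles-server:latest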
You can read more here: https://docs.docker.com/engine/reference/commandline/push/ and here: https://docs.docker.com/docker-hub/builds/
You can talk to your Technical Director to get that set up for you.
Furthermore, if you want to execute a command in an already running container (e.g. a web server) you can use `docker exec`. This can be useful for running test commands as you’re making code changes.
docker exec -it <container_name> /bin/bash
You can find the `<container_name>` by using `docker ps` on the host machine.
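For example, assuming the project has an npm `test` script (both the script and the container name here are hypothetical):

# Run the test suite inside the already-running server container
docker exec -it notangles-server npm test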
What if I need more than one container?
For modern apps you will often need more than one service. However, it quickly becomes unmaintainable if we start each container manually (remember the one-Dockerfile-per-service model). A few tools have stepped in to fill that gap. One of these orchestration tools is Docker Compose, which is bundled with most installations of Docker.
It allows us to define multiple services in an infrastructure-as-code way. Docker Compose can help build, create containers for and generally manage these services on a single host computer. This is great for development purposes e.g. when you need a local testing database.
Let’s take a look at a sample `docker-compose.yml` file. These are written in YAML (“YAML Ain’t Markup Language”), which is sort of like a human-readable JSON (a serialisation language).
Like above, you should reference the docs: https://docs.docker.com/compose/ since this guide might be out-of-date. The Compose file reference https://docs.docker.com/compose/compose-file/ explains all the available keys.
version: "3.8"
services:
web:
build:
context: .
dockerfile: Dockerfile
ports:
- target: 3001
published: 8080
volumes:
- type: bind
source: .
target: /server
depends_on:
- db
- cache
db:
image: mongo:latest
restart: unless-stopped
cache:
image: redis:latest
volumes:
- type: volume
source: rediscache
target: /tmp/rediscache
volumes:
rediscache:
First we have the version of the Compose file. This is very important because it determines which features are available to you.
version: "3.8"
Next we define a service called web
. This will be our web server:
services:
web:
build:
context: .
dockerfile: Dockerfile
The first key, `build`, tells Compose to build the specified Dockerfile (just like `docker build`).
Next we publish a port on the server container. This binds port 8080 on the host to port 3001 in the container. This means we can access the service listening on port 3001 in the container via something like http://localhost:8080 on the host.
Note this `ports` block isn’t necessary if the only thing you want is private networking, e.g. between a server and a database container. This is explained below.
ports:
- target: 3001
published: 8080
protocol: tcp
Note you need to bind the server to 0.0.0.0 (normally the wildcard address) in the container if you want to expose it. The localhost of the container is only accessible from inside the container.
We are defining a volume here. A volume is a place where data is stored that persists even after the container is removed. In particular, this volume is a bind mount. This means if I make a change in the `source` directory on the host (i.e. the current working directory), those changes are reflected in the `target` directory (in the container), and vice versa.
volumes:
- type: bind
source: .
target: /server
Note: This kind of pattern is very common in development containers, but don’t do it for production containers. This is because we shouldn’t be changing files in production!

Even if the change is small? Yes, even if the change is small. It’s not a good idea unless you want to ruin a perfectly good weekend.
Finally, the `depends_on` block tells Compose to start the `db` and `cache` services before the `web` service.
depends_on:
- db
- cache
We then define a service called `db`:
db:
image: mongo:latest
restart: unless-stopped
The `image` key tells Compose to pull an image (i.e. one that has already been built). In this case it is pulling the latest MongoDB image.
The `restart` key tells Compose to restart this service unless we manually stop it using `docker stop <container_name>`.
Finally we define a service called `cache`:
cache:
image: redis:latest
volumes:
- type: volume
source: rediscache
target: /tmp/rediscache
This pulls the Redis image and defines a named volume. Note this is different to bind-mounting the data. A named volume is managed by Docker and can’t be (easily) accessed from the host. It does not reflect any (user) host directory. This is useful for data you want to store but you don’t want others to edit/access directly from the host e.g. database data or cache data.
The source `rediscache` is defined at the bottom of the `docker-compose.yml` like so:
volumes:
rediscache:
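If you ever need to see what Docker is doing with a named volume, these commands are handy (note Compose usually prefixes the volume name with the project/directory name, so check the list first):

# List all named volumes Docker is managing
docker volume ls
# See where a volume's data actually lives on the host
docker volume inspect <volume_name>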
We can then tell Compose to start all these services in the background by using:
docker compose up -d
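Some other Compose commands you will probably reach for:

# Follow the logs of every service (add a service name to filter)
docker compose logs -f
# Rebuild the images after changing a Dockerfile
docker compose build
# Stop and remove the containers and the default network
docker compose down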
Networking using Docker Compose
You can read more here: https://docs.docker.com/compose/networking/
By default Compose creates a network shared between all your containers. This means services can talk to each other without the need for a `ports` block. Every container can reach the others using their service names.
e.g. a connection URL for the database from inside the `web` service could be mongodb://db:27017. We are using the service name `db` as the hostname in this case.
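For example, you could pass that URL to the server through an environment variable in the Compose file (the variable name here is just an illustration):

services:
  web:
    environment:
      - MONGODB_URI=mongodb://db:27017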
Compose v.s. Swarm v.s. Kubernetes v.s. Rancher
You may have heard of Docker Swarm or Kubernetes (k8s) and know they are related to containers somehow. They are also orchestration tools just like Docker Compose. The primary difference is they are intended for use across multiple hosts. This is very common in larger scale projects which run on multiple servers for performance and reliability reasons.
Docker Swarm is Docker’s offering and Kubernetes is maintained by the Cloud Native Computing Foundation - Kubernetes as of writing is much more popular (even Docker itself has finally built in proper native support).
For our production server (Wheatley (server)) we use Rancher to manage our containers. Rancher is a visual interface for Kubernetes and also provides additional management features on top of that. You can read about it here: How to deploy a project on Wheatley.
Other stuff you might find useful
- `docker ps -a` is useful to see all the currently running and stopped containers.
- `docker logs -f <container_name>` lets you see the logs of a currently running container, e.g. a web server’s access logs.
- `docker system prune` is a particularly destructive and fun way to clean up space by pruning away old images and containers.
You should take a read of this: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/. It’s an excellent read on how to make your Dockerfiles and Docker images clearer, faster and smaller than I could ever explain.
On Windows, use the (currently new) WSL 2 integration if you can. It cleans up many of the residual problems with running Docker on Windows - file permissions and other weird low-level stuff - by running against a real Linux kernel (remember the Docker Engine part of Docker only strictly emulates the libraries - Docker Desktop helps create a light-weight VM of the operating system for all the containers to run on).
If you use VSCode you can use https://code.visualstudio.com/docs/remote/containers to run VSCode seamlessly within your container. It makes making changes and testing in the container a breeze. YMMV especially on Macs (might not be compatible)
Footnotes
[1] Strictly, emulation (software) differs from virtualisation (hardware) in a technical sense, but here I’m using the common sense of the term for simplicity.