Using Docker (and more!)
If you have questions about this guide or you want to make modifications please contact me @Gordon Zhong (Unlicensed) (if I’m still around) or simply just add to it.
Headnotes
This guide will often return to the concept of “development” and “production” containers. This is a common pattern in industry but isn’t a hard and fast rule. The idea is to develop in a fully featured environment in a “development container” and strip out everything you don’t need in a “production container”.
You can do this by creating a Dockerfile for each stage, or by leveraging multi-stage builds to build the production environment and then add development dependencies on top in a different stage. Note: multi-stage builds are for stages of the SAME service - still one Dockerfile per service. If none of this made any sense, read the rest of the guide first and then return here.
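As a rough sketch of that second approach (assuming a Node service like the one used later in this guide - the "dev" script here is hypothetical):

# Production stage: only what the service needs to run
FROM node:latest AS production
WORKDIR /server
COPY package.json package-lock.json ./
RUN npm install --production
COPY . .
CMD ["npm", "run", "start:production"]

# Development stage: builds on the production stage and adds dev dependencies
FROM production AS development
RUN npm install
CMD ["npm", "run", "dev"]

You can then build the stage you want with `docker build --target production` or `docker build --target development`.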
Why use Docker?
You can 100% skip this section if you aren’t interested in the why of the technology.
Two of the major issues with development and operations have been:

- Making a consistent development environment (Dev)
- Making a consistent production environment (Ops)
There has traditionally been a technology known as a virtual machine (VM) used to fill these gaps. A VM emulates[1] the entire operating system of a machine. This is a fairly robust way of creating a standard environment. However, it is expensive in both computation and time - requiring more costly developer machines and production servers, especially when we need to run many VMs at the same time.
Docker popularised a new technology called containerisation that encapsulates each application in a lightweight “container” managed by the Docker engine. These containers bundle only the libraries the application needs instead of emulating the whole operating system. This is significantly faster and cheaper than running multiple virtual machines, and has allowed Dev and Ops tasks to overlap more closely into a DevOps workflow.
Docker Products
At the time of writing, Docker offers 5 products.
| Product | What does it do? | Do I need it? |
|---|---|---|
| Docker Engine | Docker Engine is a tool that provisions and manages Docker containers on a single machine. This is the core of all things Docker. | This is bundled into other tools such as Toolbox and Desktop. |
| Docker Toolbox | This is a legacy bundle of tools that is useful only on machines that don’t support hardware virtualisation (typically either Intel VT-x or AMD-V). It is much slower and less reliable than the newer products, so avoid it if possible. | You only need to install this if your machine doesn’t support hardware virtualisation. This sometimes happens on older machines, especially laptops. |
| Docker Desktop | Docker Desktop is a bundle of tools used on “desktop” operating systems - i.e. MacOS and Windows. | Install this if you are on MacOS or Windows. It includes the Docker Engine and Docker Compose (and some other stuff too). |
| Docker Compose | Docker Compose is an orchestration tool. It is useful for managing multiple services that need to interact on a single host (more on that below). | You don’t need to install it if you install Desktop. If you are a Linux user, install Docker Engine first and then install Compose separately. |
| Docker Hub | This is an online hosted image repository where you can store your baked images. | You can’t install this, but you may need to use it to store your images (more on that below). |
Writing Dockerfiles
Your best friend for this section is the docs: https://docs.docker.com/engine/reference/builder/. Please note the syntax might have changed since this guide was written so be sure to cross-check with that.
Dockerfiles are instructions to the Docker engine. Each Dockerfile should be associated with one service, e.g. one for the server, one for a database, etc. We will call the environment inside the container the “container environment”, and the environment Docker is running in the “host environment”.
Dockerfiles are typically called `Dockerfile` or `service.Dockerfile`.
Let’s go through a simple one for a Node server:
# Grab the latest Node base image
FROM node:latest
First we pull a `node:latest` image using the `FROM` command. An image is a built Dockerfile. In particular, most popular environments have a corresponding image uploaded to http://dockerhub.com/. This one is an official `node` image. The `latest` part is a tag that typically corresponds to a version.
Production Dockerfiles differ from development Dockerfiles here. You normally want to pin your image to a specific version, since the latest image could introduce changes that break the application. For development this is less important. You can see the list of all the available tags for `node` here: https://hub.docker.com/_/node
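For example, a pinned base image might look like this (the exact tag is up to you - check the link above for what is currently available):

# Pin to a known-good major version instead of latest
FROM node:14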
This image acts as a “base” to build your Dockerfile on top of. In particular it will do things like install dependencies for you, e.g. `node`, `npm`, `yarn`, etc. It also determines which operating system libraries the Dockerfile is using, e.g. Debian, Alpine, Windows Server Core, etc.
Then we set a working directory using the `WORKDIR` command.
# Set the current working directory inside the container
WORKDIR /server
This creates a new directory called `/server` in the container (if it doesn’t already exist). All subsequent commands in the Dockerfile will now run in this directory.

Next we want to install the Node dependencies using our favoured Node package manager. In this case we’re using the one bundled with Node, called `npm`.
# Copy package.json and package-lock.json into the container
COPY package.json package-lock.json ./
# Install node modules inside the container using the copied package.json
RUN npm install
The `COPY` line tells the Docker Engine to copy the listed files from the host computer to the directory in the Docker container specified by the last argument (`./` - remember this now refers to `/server` since we used the `WORKDIR` command above).

The `RUN` command executes a shell command (using the default shell) in the Docker container. In this case we are installing all the node modules.
Production images differ from development images here. We would only want to install “production” dependencies in a production Docker image to save space and time. For `npm` this is `npm install --production`.
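One way to handle both cases from a single Dockerfile is a build argument - this is just a sketch of the pattern, not the only approach:

# Defaults to a production install; override with --build-arg NODE_ENV=development
ARG NODE_ENV=production
# Install devDependencies only when we are not building for production
RUN if [ "$NODE_ENV" = "production" ]; then npm install --production; else npm install; fi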
We are taking advantage of one of Docker’s most powerful features here. Each command in a Dockerfile is cached as a layer. These layers reflect the change that was made on that line. For commands such as `COPY` and `ADD`, the caching algorithm checks the checksum of each file to decide when to invalidate the cache. Once a layer is invalidated, Docker will re-run ALL the following layers (since they build on top of each other).
You can read more about it here: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#leverage-build-cache
Now we want to copy all the relevant project files into the container.
# Copy the entire project into the container
COPY . .
This `COPY` command copies every file in the working directory of the build context on the host (see below) to the working directory (`/server`) of the container.
We are copying every file from our host to the container. This is often not desirable since it can increase the size of the Docker image, which makes it take longer to pull and push. We can create a file called `.dockerignore` to ignore files we are already regenerating in the container (e.g. `node_modules`) or things we don’t want passed into the build context at all (e.g. locally compiled file caches or secrets).
Note our images are public. Everything in your build context is discoverable later so ENSURE YOU DO NOT INCLUDE ANY SECRETS IN YOUR IMAGE. You can inject them later during run-time.
You can read more about it here: https://docs.docker.com/engine/reference/builder/#dockerignore-file
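A minimal `.dockerignore` for a Node project might look something like this (adjust it to your own project):

# Dependencies get reinstalled inside the container anyway
node_modules
# Local secrets and environment files must never enter the build context
.env
# Version control history and logs just bloat the image
.git
*.log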
Note that we copied `package.json` and `package-lock.json` in the step before. This is because we will often make changes to the project files without touching the dependencies. Since the dependency manifests haven’t changed, Docker will keep using the cached installation step, saving a lot of time re-installing everything.
Very important note: Production images copy the files once and once only. In development images it is often beneficial to be able to edit files on the host and see the changes reflected in the container, and vice versa. We will discuss this shortly below.
We `EXPOSE` a port so we can access the service listening on that port from outside the container. Note that `EXPOSE` by itself is mostly documentation - the port still has to be published at run-time (e.g. with `docker run -p` or the `ports` block in Compose, both shown later) before anything outside the container can reach it.
# Expose the port to the outside world
EXPOSE 3001
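For example, to actually publish that port when running the container (host port first, container port second):

# Publish container port 3001 on host port 8080
docker run -p 8080:3001 csesoc/notangles-server:latest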
Finally, we set a `CMD` that will automatically run a shell command when we start a new container from this image. The first argument is the program and the rest are arguments for the program.
# Run the server
CMD ["npm", "run", "start:production"]
Side note: This can be overridden (easily) at run-time, unlike `ENTRYPOINT`. This is sometimes useful when we want to execute other shell commands, such as `/bin/bash`, to open a shell and make changes in the container.

The container will stop if the main process inside exits (in this case `npm run start:production`). Typically processes like servers shouldn’t ever stop, so this isn’t problematic. For processes which do end early, like shells, we can take advantage of some flags described below (namely interactive mode and attaching a tty).
All together we have:
# Grab the latest Node base image
FROM node:latest
# Set the current working directory inside the container
WORKDIR /server
# Copy package.json and package-lock.json into the container
COPY package.json package-lock.json ./
# Install node modules inside the container using the copied package.json
RUN npm install
# Copy the entire project into the container
COPY . .
# Expose the port to the outside world
EXPOSE 3001
# Run the server
CMD ["npm", "run", "start:production"]
Dockerfiles v.s. Images v.s. Containers
tldr; Dockerfiles → Images → Containers
We can build the Dockerfile above into an image using the following shell command:
docker build -t csesoc/notangles-server:latest -f Dockerfile .
Let’s break that down:
- `docker build`: Tells the Docker Engine we want to build a Dockerfile into an image. An image is a built (executed) version of a Dockerfile. This is somewhat akin to the process of compiling code into a program.
- `-t csesoc/notangles-server:latest`: This tells the Docker Engine to tag the image with the name `csesoc/notangles-server` and the tag `latest`. The part before the slash is the Docker Hub namespace (user or organisation) and the part after is the image name. The tag can be anything you want, but most people include the semantic version (or `latest`) along with the base OS. It is good practice to tag all your images so you can find them later.
- `-f Dockerfile`: Tells the Docker Engine which Dockerfile to build. This argument can be omitted and defaults to a file named `Dockerfile` in the build context (specified by the next argument).
- `.`: Specifies the location of the build context. In this case `.` means the current working directory. All files in the build context are available to `ADD` or `COPY` from into the container (except the ones ignored by `.dockerignore`).
A container can be run from an image. This is akin to executing the program we mentioned previously. We can start a container like so:
docker run -d csesoc/notangles-server:latest
- `docker run`: Tells the Docker Engine we want to create a new container from the following image.
- `-d`: Tells Docker to run in detached mode - i.e. in the background. This is useful for containers such as databases and for containers in production.
- `csesoc/notangles-server:latest` is the image name and tag. If the image exists locally Docker uses that first, otherwise it will attempt to pull the image from its list of image repositories (by default this is Docker Hub), e.g. `mongo:latest` will pull the latest MongoDB image and run it.
You might find the image description is sometimes written without a slash, e.g. `mongo:latest`. This could mean the image is purely local and not being pushed to a repository. It could also mean that the image is in the Docker Official Images repository on Docker Hub, which can be pulled without the namespace part.
Sometimes we want to run a shell in the container so we can make local changes:
docker run -it csesoc/notangles-server:latest /bin/bash
We have two short flags:

- `-i` and `-t` (note for most shells we can join the flags - this might not apply in places like Windows). These tell the Docker Engine we want to run in interactive mode and to open a tty for a terminal, respectively.
- Adding `/bin/bash` on the end of the command overrides the `CMD` instruction in the Dockerfile and executes that instead (i.e. opens a `bash` shell).
In standard Docker setups you can use `docker push` to push to Docker Hub. We use Docker Hub’s autobuild process to build the `master` branch of your GitHub repository.
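If you ever need to push manually (e.g. to test something), it looks roughly like this, assuming you are logged in and have access to the `csesoc` organisation on Docker Hub:

# Log in to Docker Hub (only needed once per machine)
docker login
# Build, tag and push the image
docker build -t csesoc/notangles-server:latest .
docker push csesoc/notangles-server:latest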
You can read more here: https://docs.docker.com/engine/reference/commandline/push/ and here: https://docs.docker.com/docker-hub/builds/
You can talk to your Technical Director to get that set up for you.
Furthermore, if you want to execute a command in an already running container (e.g. a web server) you can use `docker exec`. This can be useful for running test commands as you’re making code changes.
docker exec -it <container_name> /bin/bash
You can find the `<container_name>` by using `docker ps` on the host machine.
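For example, assuming the project has an npm `test` script (both the script and the container name here are hypothetical):

# Run the test suite inside the already-running server container
docker exec -it notangles-server npm test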
What if I need more than one container?
For modern apps you will often need more than one service. However, it quickly becomes unmaintainable if we start each container manually (remember the one-Dockerfile-per-service model). A few tools have stepped in to fill that gap. One of these orchestration tools is Docker Compose, which is bundled with most installations of Docker.
It allows us to define multiple services in an infrastructure-as-code way. Docker Compose can help build, create containers for and generally manage these services on a single host computer. This is great for development purposes e.g. when you need a local testing database.
Let’s take a look at a sample `docker-compose.yml` file. These are written in YAML (“YAML Ain’t Markup Language”), which is sort of like a human-readable JSON (a serialisation language).
Like above, you should reference the docs: https://docs.docker.com/compose/ since this guide might be out-of-date. The Compose file reference https://docs.docker.com/compose/compose-file/ explains all the available keys.
version: "3.8"
services:
web:
build:
context: .
dockerfile: Dockerfile
ports:
- target: 3001
published: 8080
volumes:
- type: bind
source: .
target: /server
depends_on:
- db
- cache
db:
image: mongo:latest
restart: unless-stopped
cache:
image: redis:latest
volumes:
- type: volume
source: rediscache
target: /tmp/rediscache
volumes:
rediscache:
First we have the version of the Compose file. This is very important because it determines which features are available to you.
version: "3.8"
Next we define a service called web
. This will be our web server:
services:
web:
build:
context: .
dockerfile: Dockerfile
The first key, `build`, tells Compose to build the specified Dockerfile (just like `docker build`).
Next we publish a port on the server container. This binds port 8080 on the host to port 3001 in the container. This means we can access the service listening on port 3001 in the container via something like http://localhost:8080 on the host.
Note this `ports` block isn’t necessary if the only thing you want is private networking, e.g. between a server and a database container. This is explained below.
ports:
- target: 3001
published: 8080
protocol: tcp
Note you need to bind the server to 0.0.0.0 (normally the wildcard address) in the container if you want to expose it. The localhost of the container is only accessible from inside the container.
We are defining a volume here. A volume is a place where data is stored that persists even after the container is removed. In particular, this volume is a bind mount. This means if I make a change in the `source` directory on the host (i.e. the current working directory), those changes are reflected in the `target` directory (in the container), and vice versa.
volumes:
- type: bind
source: .
target: /server
Note: This kind of pattern is very common in development containers, but don’t do it for production containers. This is because we shouldn’t be changing files in production!

Even if the change is small? Yes, even if the change is small. It’s not a good idea unless you want to ruin a perfectly good weekend.
Finally, the `depends_on` block tells Compose to start the `db` and `cache` services before the `web` service.
depends_on:
- db
- cache
We then define a service called `db`:
db:
image: mongo:latest
restart: unless-stopped
The `image` key tells Compose to pull an image (i.e. one that has already been built). In this case it is pulling the latest MongoDB image.
The `restart` key tells Compose to restart this service unless we manually stop it using `docker stop <container_name>`.
Finally we define a service called `cache`:
cache:
image: redis:latest
volumes:
- type: volume
source: rediscache
target: /tmp/rediscache
This pulls the Redis image and defines a named volume. Note this is different to bind-mounting the data. A named volume is managed by Docker and can’t be (easily) accessed from the host. It does not reflect any (user) host directory. This is useful for data you want to store but you don’t want others to edit/access directly from the host e.g. database data or cache data.
The source `rediscache` is defined at the bottom of the `docker-compose.yml` like so:
volumes:
rediscache:
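If you ever need to see what Docker is doing with a named volume, these commands are handy (note Compose usually prefixes the volume name with the project/directory name, so check the list first):

# List all named volumes Docker is managing
docker volume ls
# See where a volume's data actually lives on the host
docker volume inspect <volume_name>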
We can then tell Compose to start all these services in the background by using:
docker compose up -d
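Some other Compose commands you will probably reach for:

# Follow the logs of every service (add a service name to filter)
docker compose logs -f
# Rebuild the images after changing a Dockerfile
docker compose build
# Stop and remove the containers and the default network
docker compose down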
Networking using Docker Compose
You can read more here: https://docs.docker.com/compose/networking/
By default Compose creates a network shared between all your containers. This means services can talk to each other without the need for a `ports` block. Every container can reach the others using their service names.
e.g. a connection URL for the database from inside the `web` service could be mongodb://db:27017. We are using the service name `db` as the hostname in this case.
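For example, you could pass that URL to the server through an environment variable in the Compose file (the variable name here is just an illustration):

services:
  web:
    environment:
      - MONGODB_URI=mongodb://db:27017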
Compose v.s. Swarm v.s. Kubernetes v.s. Rancher
You may have heard of Docker Swarm or Kubernetes (k8s) and know they are related to containers somehow. They are also orchestration tools just like Docker Compose. The primary difference is they are intended for use across multiple hosts. This is very common in larger scale projects which run on multiple servers for performance and reliability reasons.
Docker Swarm is Docker’s offering and Kubernetes is maintained by the Cloud Native Computing Foundation - Kubernetes as of writing is much more popular (even Docker itself has finally built in proper native support).
For our production server (Wheatley (server)) we use Rancher to manage our containers. Rancher is a visual interface for Kubernetes and also provides additional management features on top of that. You can read about it here: How to deploy a project on Wheatley.
Other stuff you might find useful
- `docker ps -a` is useful to see all the currently running and stopped containers.
- `docker logs -f <container_name>` lets you see the logs of a currently running container, e.g. a web server’s access logs.
- `docker system prune` is a particularly destructive and fun way to clean up space by pruning away old images and containers.
You should take a read of this: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/. It’s an excellent read on how to make your Dockerfiles and Docker images clearer, faster and smaller than I could ever explain.
On Windows, use the (currently new) WSL 2 integration if you can. It cleans up many of the residual problems with running Docker on Windows - file permissions and other weird low-level stuff - by running against a real Linux kernel (remember the Docker Engine part of Docker only strictly emulates the libraries - Docker Desktop helps create a light-weight VM of the operating system for all the containers to run on).
If you use VSCode you can use https://code.visualstudio.com/docs/remote/containers to run VSCode seamlessly within your container. It makes making changes and testing in the container a breeze. YMMV especially on Macs (might not be compatible)
Footnotes
[1] Strictly, emulation (software) differs from virtualisation (hardware) in a technical sense, but here I’m using the common sense of the term for simplicity.