Docker Swarm is a container orchestration tool that makes it easy to manage and scale your existing Docker infrastructure. It consists of a pool of Docker hosts that run in Swarm mode with some nodes acting as managers, workers, or both. Using Docker Swarm mode to manage your Docker containers brings the following benefits:

  • It allows you to incrementally apply updates with zero downtime.
  • It increases application resilience to outages by reconciling any differences between the actual state and your expressed desired state.
  • It eases the process of scaling your applications since you only need to define the desired number of replicas in the cluster.
  • It is built into the docker CLI, so you don't need additional software to get up and running.
  • It enables multi-host networking such that containers deployed on different nodes can communicate with each other easily.

In this tutorial, you will learn key concepts in Docker Swarm and set up a highly available Swarm cluster that is resilient to failures. You will also learn some best practices and recommendations to ensure that your Swarm setup is fault tolerant.

Prerequisites

Before proceeding with this tutorial, ensure that you have access to five Ubuntu 22.04 servers. This is necessary to demonstrate a highly available setup, although it is also possible to run Docker Swarm on a single machine. You also need to configure each server with a user that has administrative privileges.

The following ports must also be available on each server for communication purposes between the nodes. On Ubuntu 22.04, they are open by default:

  • TCP port 2377 for cluster management communications,
  • TCP and UDP port 7946 for communication among nodes,
  • UDP port 4789 for overlay network traffic.
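
If a firewall is active on your servers, these ports have to be opened explicitly. The sketch below assumes that ufw is the firewall in use (an assumption, since this tutorial does not configure one) and only prints the rules so that you can review them before applying anything. The swarm_ufw_rules name is illustrative.

```shell
# Print one "ufw allow" command per port/protocol pair that Swarm needs.
# Nothing is changed on the system; the function only writes to stdout.
# Assumption: ufw is the firewall in use on the server.
swarm_ufw_rules() {
  # 2377/tcp: cluster management
  # 7946/tcp and 7946/udp: communication among nodes
  # 4789/udp: overlay network (VXLAN) traffic
  for rule in 2377/tcp 7946/tcp 7946/udp 4789/udp; do
    echo "ufw allow $rule"
  done
}

swarm_ufw_rules
```

After reviewing the output, one way to apply the rules is to pipe it to a root shell on each server with swarm_ufw_rules | sudo sh.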

Explaining Docker Swarm terminology

Before proceeding, let's define the Docker Swarm terms that are used in this article and in other Docker Swarm resources.

  • Node: refers to an instance of the Docker engine in the Swarm cluster.
  • Manager nodes: handle orchestration and cluster management functions, and dispatch incoming tasks to worker nodes. They can also act as worker nodes unless placed in Drain mode (recommended).
  • Leader: the manager node elected by all the manager nodes in the cluster, using the Raft Consensus Algorithm, to perform orchestration tasks and management/maintenance operations.
  • Worker nodes: Docker instances whose sole purpose is to receive and execute Swarm tasks from manager nodes.
  • Swarm task: refers to a Docker container and the commands that run inside the container. Once a task is assigned to a node, it can run or fail but it cannot be transferred to a different node.
  • Swarm service: this is the mechanism for defining tasks that should be executed on a node. It involves specifying the container image and commands that should run inside the container.
  • Drain: means that new tasks are no longer assigned to a node, and existing tasks are reassigned to other available nodes.
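
In CLI terms, Drain is an availability setting on a node. A minimal sketch of how it could be set and reverted from a manager follows. The set_availability wrapper is a hypothetical helper added for illustration; docker node update --availability is the underlying CLI call.

```shell
# Hypothetical wrapper around `docker node update --availability`.
# $1: availability (active, pause, or drain); $2: node hostname or ID.
# Must be run on a manager node.
set_availability() {
  case "$1" in
    active|pause|drain)
      docker node update --availability "$1" "$2"
      ;;
    *)
      echo "unknown availability: $1" >&2
      return 1
      ;;
  esac
}

# Example usage (manager-1 is one of the node names used in this tutorial):
#   set_availability drain manager-1    # stop scheduling tasks on manager-1
#   set_availability active manager-1   # allow tasks on it again
```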

Docker Swarm requirements for high availability

A highly available Docker Swarm setup ensures that if a node fails, services on the failed node are re-provisioned and assigned to other available nodes in the cluster. A Docker Swarm setup that consists of one or two manager nodes is not considered highly available because the loss of a single manager leaves the cluster without a majority of managers (a quorum), and cluster management operations are interrupted until it is restored. Therefore the minimum number of manager nodes in a highly available Swarm cluster should be three.

The table below shows the number of failures a Swarm cluster can tolerate depending on the number of manager nodes in the cluster:

Manager nodes    Failures tolerated
1                0
2                0
3                1
4                1
5                2
6                2
7                3

As you can see, having an even number of manager nodes does not help with failure tolerance, so you should always maintain an odd number of manager nodes. Fault tolerance improves as you add more manager nodes, but Docker recommends no more than seven managers so that performance is not negatively impacted since each node must acknowledge proposals to update the state of the cluster.
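
The figures above follow from the Raft majority rule: a cluster with N managers needs floor(N/2) + 1 of them to agree (the quorum), so it can tolerate floor((N - 1) / 2) failures. A short sketch that reproduces the numbers, with illustrative function names:

```shell
# Smallest number of managers that forms a majority of $1 managers.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of managers that can fail while a majority still remains.
tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 1 2 3 4 5 6 7; do
  echo "$n managers: quorum $(quorum "$n"), tolerates $(tolerated_failures "$n")"
done
```

Going from three to four managers raises the quorum from two to three without raising the number of tolerated failures, which is why even counts add cost but no resilience.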

You should also distribute your manager nodes in separate locations so they are not affected by the same outage. If they run on the same server, a hardware problem could cause them all to go down. The highly available Swarm cluster that you will set up in this tutorial will therefore exhibit the following characteristics:

  • 5 total nodes (2 workers and 3 managers) with each one running on a separate server.
  • 2 worker nodes (worker-1 and worker-2).
  • 3 manager nodes (manager-1, manager-2, and manager-3).

Step 1 — Installing Docker

In this step, you will install Docker on all five Ubuntu servers. Therefore, execute all the commands below (and in step 2) on all five servers. If your host offers a snapshot feature, you may be able to run the commands on a single server and use that server as a base for the other four instances.

Let's start by installing the latest version of the Docker Engine (20.10.18 at the time of writing). Go ahead and update the package information list from all configured sources on your system:

sudo apt update

Afterward, install the following packages to allow apt to use packages over HTTPS:

sudo apt install apt-transport-https ca-certificates curl software-properties-common

Next, add the GPG key for the official Docker repository to the server:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

Once the GPG key is added, include the official Docker repository in the server's apt sources list:

echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

Finally, update apt once again and install the Docker Engine:

sudo apt update
sudo apt install docker-ce

Once the relevant packages are installed, you can check the status of the docker service using the command below:

sudo systemctl status docker

If everything goes well, you should observe that the container engine is active and running on your server.
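
Since the installation is repeated on five servers, a scripted check can be quicker than reading the systemctl status output each time. The following is a minimal sketch; check_docker is a hypothetical helper name, and it assumes a systemd-based host such as Ubuntu 22.04.

```shell
# Report whether the docker CLI is installed and the docker service is active.
# Returns 0 and prints the CLI version only when both checks pass.
check_docker() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker CLI not found" >&2
    return 1
  fi
  if ! systemctl is-active --quiet docker; then
    echo "docker service is not active" >&2
    return 1
  fi
  docker --version
}

# Example usage:
#   check_docker && echo "this server is ready"
```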

Step 2 — Executing the Docker command without sudo

By default, the docker command can only be executed by the root user or any user in the docker group (auto created on installation). If you execute a docker command without prefixing it with sudo or running it through a user that belongs to the docker group, you will get a permission error that looks like this:

Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/json": dial unix /var/run/docker.sock: connect: permission denied

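Prefixing every docker command with sudo is inconvenient, so the usual solution to the above error is to add the relevant user to the docker group. Keep in mind that members of this group effectively have root-level access to the host, so only add users you trust. This can be achieved through the command below: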

sudo usermod -aG docker $USER

Next, run the following command and enter the user's password when prompted for the changes to take effect:

su - $USER

You should now be able to run docker commands without prefixing them with sudo. For example, when you run the command docker ps, you should see a list of running containers (only the column headers if none are running) instead of the permission error.
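
If the permission error persists, the current shell session may not have picked up the new group yet. A small sketch for checking this, where session_in_group is a hypothetical helper name:

```shell
# Succeeds when the current shell session carries group $1.
# `id -nG` without a user name reports the groups of the running process,
# so a group added with usermod shows up only after a fresh login.
session_in_group() {
  id -nG | tr ' ' '\n' | grep -qx "$1"
}

# Example usage:
#   session_in_group docker && echo "docker group is active in this session"
```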

Before proceeding to the next step, ensure that all the commands in step 1 and step 2 have been executed on all five servers.

Step 3 — Initializing the Swarm Cluster

At this point, each of your five Docker instances is acting as a separate host and not as part of a Swarm cluster. Therefore, in this step, we will initialize the Swarm cluster on the manager-1 server and add the hosts to the cluster accordingly.

Start by logging into one of the Ubuntu servers (manager-1) and retrieve the private IP address of the machine using the following command:

hostname -I | awk '{print $1}'
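
The first address that hostname -I reports is not guaranteed to be the private one. The sketch below picks the first IPv4 address from the RFC 1918 private ranges instead. It assumes your servers use such addresses, and first_private_ip is a hypothetical helper name.

```shell
# Print the first address reported by `hostname -I` that falls inside
# 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16.
# Caveat: Docker's own bridge interface usually has an address in one of
# these ranges too, so confirm the result against your server's network
# settings before using it.
first_private_ip() {
  hostname -I | tr ' ' '\n' |
    grep -E '^(10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)' |
    head -n 1
}

# Example usage:
#   first_private_ip
```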

Copy the IP address to your clipboard and replace the <manager_1_server_ip> placeholder in the command below to initialize Swarm mode:

docker swarm init --advertise-addr <manager_1_server_ip>

If the command is successful, you will see output indicating that the Swarm has been initialized and that the current node is now a manager. It will also provide a command to join worker nodes to the cluster. Copy the command for later use.
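
To confirm the result from a script rather than from the command's message, you could query the daemon's Swarm state. This is a sketch that relies on the Swarm fields exposed by docker info; swarm_status is a hypothetical helper name.

```shell
# Print this node's Swarm state (for example "active" or "inactive") and
# whether the node can accept management commands, i.e. is a manager.
swarm_status() {
  docker info --format '{{.Swarm.LocalNodeState}} manager={{.Swarm.ControlAvailable}}'
}

# Example usage on manager-1 after initialization:
#   swarm_status
```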

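Next, SSH into worker-1 and worker-2 and run the command you copied earlier to join them to the Swarm cluster as workers. That command carries the worker token, so running it on manager-2 and manager-3 would add them as workers too. For those two servers, first run docker swarm join-token manager on manager-1 to print the manager join command, then run that command on each of them. In both cases, the command should look like this: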

docker swarm join --token <token> <manager_1_server_ip>:<port>
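
If you no longer have the join command at hand, it can be rebuilt on manager-1 for either role. The sketch below assumes the default cluster management port 2377 listed in the prerequisites; join_command is a hypothetical helper name and the address in the usage example is a placeholder.

```shell
# Print the join command for the given role.
# $1: role ("worker" or "manager"); $2: IP address of manager-1.
# Must be run on a manager node. Port 2377 is assumed (the default).
join_command() {
  role=$1
  manager_ip=$2
  # -q prints only the token instead of the full instructions.
  token=$(docker swarm join-token -q "$role") || return 1
  echo "docker swarm join --token $token $manager_ip:2377"
}

# Example usage (10.0.0.5 is a placeholder address):
#   join_command manager 10.0.0.5   # run the output on manager-2 and manager-3
#   join_command worker 10.0.0.5    # run the output on worker-1 and worker-2
```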

After running the command on each server, you should see output indicating that the node has joined the Swarm as either a manager or a worker. To verify the status of the Swarm cluster, you can run the command docker node ls on any manager node, such as manager-1:

docker node ls

You should see a list of all the nodes in the Swarm cluster, including their IDs, hostname, status, availability, and whether they are a manager or a worker.
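
Because the high availability requirements depend on the number of managers, it can be worth checking that number explicitly. A minimal sketch, with check_manager_count as a hypothetical helper name:

```shell
# Count the manager nodes and warn when the count is even or below three.
# Must be run on a manager node.
check_manager_count() {
  count=$(docker node ls --filter role=manager --format '{{.Hostname}}' | wc -l | tr -d ' ')
  echo "managers: $count"
  if [ "$count" -lt 3 ] || [ $((count % 2)) -eq 0 ]; then
    echo "warning: use an odd number of managers, three or more" >&2
    return 1
  fi
}

# Example usage:
#   check_manager_count
```

For the cluster described in this tutorial, the function should report three managers and return successfully.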

Step 4 — Deploying the Application Stack

Now that you have a functioning Docker Swarm cluster, you can deploy your application stack. In this tutorial, we will use a simple example of a web application stack consisting of a front-end service and a back-end service.

Start by creating a new directory for your application stack on the manager node:

mkdir app-stack
cd app-stack

Next, create a file called docker-compose.yml in the app-stack directory and open it in a text editor:

nano docker-compose.yml

Copy and paste the following YAML code into the docker-compose.yml file:

version: '3.8'
services:
  frontend:
    image: nginx:latest
    ports:
      - 80:80
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
  backend:
    image: httpd:latest
    ports:
      - 8080:80
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure

This Docker Compose file defines two services: frontend and backend. The frontend service uses the nginx:latest image and publishes port 80 of the container on port 80 of the Swarm nodes. It is configured to have 2 replicas and to restart on failure. The backend service uses the httpd:latest image and publishes port 80 of the container on port 8080 of the Swarm nodes. It is also configured to have 2 replicas and to restart on failure.

Save and close the docker-compose.yml file.

To deploy the application stack, run the following command:

docker stack deploy -c docker-compose.yml app-stack

If the command is successful, you should see output indicating that the services are being deployed. You can check the status of the services by running the command docker service ls:

docker service ls

You should see a list of the services in the stack, including their names, mode, replicas, and ports.
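
The replicas column shows running and desired counts as a pair such as 2/2. The following sketch checks that every service has reached its desired count; all_replicas_running is a hypothetical helper name. Services deployed from a stack are named <stack>_<service>, for example app-stack_frontend.

```shell
# Succeeds when every service reports as many running replicas as desired.
# Prints the name of each service that has not converged yet.
# Must be run on a manager node.
all_replicas_running() {
  docker service ls --format '{{.Name}} {{.Replicas}}' |
    awk '{
      split($2, r, "/")   # "2/2" becomes r[1] = running, r[2] = desired
      if (r[1] != r[2]) { print "not ready: " $1; bad = 1 }
    }
    END { exit bad + 0 }'
}

# Example usage:
#   all_replicas_running && echo "all services are fully deployed"
```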

Conclusion

In this tutorial, you learned how to set up a highly available Docker Swarm cluster and deploy a simple application stack. This setup provides fault tolerance and load balancing for your applications, allowing you to scale them easily as your needs grow.

Next steps:

  • Explore more Docker Swarm features, such as rolling service updates and rollbacks.
  • Deploy your own application stack using Docker Compose.
  • Learn about Docker networking and how to create overlay networks.