My way of effectively recovering from a Docker Swarm meltdown

After storage or network issues, DHCP problems or a system crash, a swarm cluster may not recover from the failure on its own.

Identify the failing manager node (e.g. manager3).

SSH into a working manager node (e.g. manager2) and list the nodes:

docker node ls
ID                            HOSTNAME   STATUS    AVAILABILITY   MANAGER STATUS
0rkceljwlpfolz5ubdjv6nger     node3      Unknown   Active
0z9o13vn3hh3n14mt3xwshisv *   manager2   Ready     Active         Leader
146msc2qe88w3vtkgx32lnhrz     manager3   Unknown   Active         Unreachable
3f7f3ha8f0v7dbexgamkz1ge4     node4      Unknown   Active
570dabhxakxdfp78sv1wy2aro     node7      Ready     Active
ck9mie2hk6punfckvgzyr9vw2     node8      Unknown   Active
cvcgmxlzqtfnx2m3vh1ma1s28     manager1   Unknown   Active         Reachable
dh1j68lohfn3h7patwei3iifx     node6      Unknown   Active
e83bzmu6sqr024ltvx05rw48i     node2      Unknown   Active
euelt0nz9ql7v3dlp1rdwy64r     node1      Ready     Active
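
On a bigger cluster I find it easier to let the CLI point at the broken manager. A minimal sketch, assuming bash and awk: docker node ls accepts --filter role=manager and a Go template via --format, and the hypothetical helper find_unreachable_managers below only keeps the managers reported as Unreachable. It reads from stdin, so it can be tried without a swarm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the hostname of every manager whose
# MANAGER STATUS is "Unreachable". Expects "<hostname> <manager status>"
# lines on stdin, e.g. the output of:
#   docker node ls --filter role=manager --format '{{.Hostname}} {{.ManagerStatus}}'
find_unreachable_managers() {
  awk '$2 == "Unreachable" { print $1 }'
}
```

Usage on manager2: docker node ls --filter role=manager --format '{{.Hostname}} {{.ManagerStatus}}' | find_unreachable_managers, which for the listing above should print manager3.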

Then demote the failing manager (manager3):

docker node demote manager3
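
Demoting (and later removing) a manager only works while the remaining managers still hold a Raft quorum, i.e. a strict majority of all managers is reachable. With three managers, as here, one can be lost; if two are gone the swarm can no longer be managed and recovery goes through docker swarm init --force-new-cluster instead. A small check, as a sketch: has_quorum is a hypothetical helper that reads one MANAGER STATUS value per line and counts Leader and Reachable as up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads one MANAGER STATUS value per line, e.g. from
#   docker node ls --filter role=manager --format '{{.ManagerStatus}}'
# prints a summary and succeeds only if a strict majority of the managers
# is Leader or Reachable (the Raft quorum).
has_quorum() {
  awk '
    { total++ }
    $1 == "Leader" || $1 == "Reachable" { up++ }
    END {
      printf "%d of %d managers reachable\n", up, total
      if (total > 0 && up > total / 2) exit 0
      exit 1
    }'
}
```

For the listing above it should report 2 of 3 managers reachable and succeed, so demoting manager3 is safe.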

Then remove manager3 from the swarm:

docker node rm --force manager3
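
Both steps can be chained so that the forced removal only happens once the demotion went through. A sketch: evict_manager is a hypothetical name, and DOCKER is an assumed override variable of this script (set it to echo for a dry run), not something Docker itself reads.

```shell
#!/usr/bin/env bash
# Hypothetical helper, run on a healthy manager: demote the node first and
# only force-remove it if the demotion succeeded.
# DOCKER can be overridden (e.g. DOCKER=echo) to print the commands instead.
evict_manager() {
  local node="$1"
  [ -n "$node" ] || { echo "usage: evict_manager NODE" >&2; return 2; }
  "${DOCKER:-docker}" node demote "$node" || return 1
  "${DOCKER:-docker}" node rm --force "$node"
}
```

Usage: evict_manager manager3, or DOCKER=echo evict_manager manager3 to see what would run.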

Then obtain the manager join token (without -q, Docker prints the complete docker swarm join command instead of only the token):

docker swarm join-token -q manager
SWMTKN-1-3883ut...d-7xv4hpz8hbk9h0h49nkarhuqd

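SSH into the removed manager (manager3 in this case) and make it leave the swarm:

docker swarm leave

If Docker refuses because the node still considers itself a manager, run docker swarm leave --force instead.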

Then join it again as a manager, using the token obtained in the previous step:

docker swarm join \
--token SWMTKN-1-3883utwtrmo00o0ugkosmued-7xv4049nkarhuqd \
IP_ADDRESS_OF_ACTIVE_MANAGER
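
The leave and rejoin steps on manager3 can be wrapped the same way. Again a sketch with hypothetical names: it uses leave --force, which a node that still believes it is a manager usually needs, and expects the healthy manager's address with the swarm port (2377 by default). DOCKER is the same assumed dry-run override as before.

```shell
#!/usr/bin/env bash
# Hypothetical helper, run on the removed manager itself.
#   $1: manager join token, $2: address of a healthy manager, e.g. 10.0.0.2:2377
# DOCKER can be overridden (e.g. DOCKER=echo) to print the commands instead.
rejoin_as_manager() {
  local token="$1" addr="$2"
  if [ -z "$token" ] || [ -z "$addr" ]; then
    echo "usage: rejoin_as_manager TOKEN MANAGER_ADDRESS" >&2
    return 2
  fi
  # --force: a plain leave is refused while the node still acts as a manager
  "${DOCKER:-docker}" swarm leave --force || return 1
  "${DOCKER:-docker}" swarm join --token "$token" "$addr"
}
```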

Back on manager3, list the nodes again to check that it rejoined as a manager:

docker node ls
ID                            HOSTNAME   STATUS    AVAILABILITY   MANAGER STATUS
0rkceljwlpfolz5ubdjv6nger     node3      Ready     Active
0z9o13vn3hh3n14mt3xwshisv     manager2   Ready     Active         Leader
3f7f3ha8f0v7dbexgamkz1ge4     node4      Ready     Active
570dabhxakxdfp78sv1wy2aro     node7      Ready     Active
ck9mie2hk6punfckvgzyr9vw2     node8      Ready     Active
cvcgmxlzqtfnx2m3vh1ma1s28     manager1   Ready     Active         Reachable
dh1j68lohfn3h7patwei3iifx     node6      Ready     Active
e83bzmu6sqr024ltvx05rw48i     node2      Ready     Active
ephh8d24htl70a79886ehr7lr *   manager3   Ready     Active         Reachable
euelt0nz9ql7v3dlp1rdwy64r     node1      Ready     Active

Often, nodes that were failing to join or were unstable recover at this point as well, and the cluster heals itself.
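
To confirm the cluster really healed, rather than eyeballing the table, the listing can be checked for nodes that are not Ready. A minimal sketch: report_not_ready is a hypothetical helper reading docker node ls --format '{{.Hostname}} {{.Status}}' output from stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "<hostname> <status>" lines, e.g. from
#   docker node ls --format '{{.Hostname}} {{.Status}}'
# prints every node that is not Ready and fails if there is at least one.
report_not_ready() {
  awk '
    $2 != "Ready" { print $1 " is " $2; bad++ }
    END { if (bad > 0) exit 1 }'
}
```

Usage: docker node ls --format '{{.Hostname}} {{.Status}}' | report_not_ready && echo "all nodes Ready".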

You might need to re-scale your services to redistribute the load, because Swarm does not automatically move running tasks onto nodes that have rejoined.
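
An alternative to scaling down and up, not the exact method described here, is docker service update --force: it forces a rolling update even without any change, so the scheduler places the tasks again across the available nodes. A sketch with a hypothetical helper name and the same assumed DOCKER dry-run override:

```shell
#!/usr/bin/env bash
# Hypothetical helper: forces a rolling update of each service passed as an
# argument so the scheduler re-places its tasks, including on nodes that
# just came back. DOCKER can be overridden (e.g. DOCKER=echo) for a dry run.
rebalance_services() {
  local svc
  for svc in "$@"; do
    "${DOCKER:-docker}" service update --force "$svc" || return 1
  done
}
```

For example rebalance_services $(docker service ls -q). Every task gets restarted according to the service's update settings (parallelism, delay), so I'd pick a quiet moment for this.
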
A tool like manomarks/visualizer can also give a quick visual overview of your cluster.
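
To run the visualizer itself, something along the lines of the project's README should work: a service pinned to a manager, with the Docker socket mounted so it can read the swarm state, published on port 8080. The flags and port mirror that README as I know it, so check them against the image version you pull; the image defaults to the name above (the project has since been published as dockersamples/visualizer), and deploy_visualizer is a hypothetical wrapper with the same assumed DOCKER dry-run override.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: deploys the swarm visualizer as a service on a
# manager node, reading the swarm state through the mounted Docker socket.
#   $1: optional image name, defaults to manomarks/visualizer
# DOCKER can be overridden (e.g. DOCKER=echo) to print the command instead.
deploy_visualizer() {
  local image="${1:-manomarks/visualizer}"
  "${DOCKER:-docker}" service create \
    --name=viz \
    --publish=8080:8080/tcp \
    --constraint=node.role==manager \
    --mount=type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
    "$image"
}
```

After deploying, browse to http://MANAGER_IP:8080 (placeholder for one of your managers).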


BruCON co-founder, OWASP supporter, Application Delivery and Web Application Security, Kubernetes and container, pentesting enthusiast, BBQ & cocktails !!