[ClusterLabs] Queries about a Cluster setup inside a docker

Narayanan, Devarajan Devarajan.Narayanan at dell.com
Fri Mar 28 08:29:24 UTC 2025


Hi Klaus,

Answer below.


how you are tackling the - quite strict - requirement
of sbd for a watchdog device (even if it is just softdog, which isn't available for your setup either) to guarantee reliable self-fencing?
<Deva>
We have modified sbd to use a similar pipe-based approach to communicate with the base Linux (the watchdog tickle is modified).
If the sbd daemon has a problem, the tickle stops; a daemon in the base Linux identifies this and takes action, such as restarting the docker container.
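
For illustration, a minimal sketch of the base-Linux side of such a tickle monitor (not our actual code; the FIFO path, container name, timeout and the use of "docker restart" are placeholders):

#!/bin/bash
# Sketch of a base-Linux monitor: the container-side sbd writes a "tickle"
# line into a FIFO that is bind-mounted into the container; if no tickle
# arrives within TIMEOUT seconds, assume sbd is unhealthy and restart the
# container. All names and values below are illustrative.
FIFO=/run/sbd-tickle/app-1.fifo        # shared with the container via -v
CONTAINER=app-1-on-node-1              # illustrative container name
TIMEOUT=15                             # seconds without a tickle

mkdir -p "$(dirname "$FIFO")"
[ -p "$FIFO" ] || mkfifo "$FIFO"
exec 3<>"$FIFO"                        # open read/write so the open never blocks

while true; do
    if read -r -t "$TIMEOUT" -u 3 tick; then
        :                              # tickle arrived in time, keep waiting
    else
        logger "no sbd tickle for ${TIMEOUT}s, restarting $CONTAINER"
        docker restart "$CONTAINER"
    fi
done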

Regards,
Deva.




From: Klaus Wenninger <kwenning at redhat.com>
Sent: 28 March 2025 13:38
To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Cc: Narayanan, Devarajan <Devarajan.Narayanan at dell.com>
Subject: Re: [ClusterLabs] Queries about a Cluster setup inside a docker




On Tue, Mar 25, 2025 at 3:37 PM Narayanan, Devarajan via Users <users at clusterlabs.org> wrote:
Hi,

I have a setup with multiple docker instances on a base Linux host (node1).
Inside each docker instance runs a cluster stack that pairs with a similar setup on another node (node2) to form a 2-node cluster.
See Pic1 below.

In this setup, the cluster state etc. resides in the docker overlay file system, I presume.
<Query1> Is there a clear list of files which hold the cluster state (basically the data of the corosync, pacemaker, sbd and crmsh processes, I think)?

<Query2> In this setup, if I want the cluster data to persist across a “remove and re-run” of the docker instance, what can I do?

Presuming the cluster data lives under the /var, /etc and /usr folders, I tried the following solution.
I created volumes for var, etc and usr and then, during docker run, used options like “-v var_vol:/var -v etc_vol:/etc -v usr_vol" (a fuller sketch follows after Query3).
With this, some of it worked, but I also saw some weird behaviour.
<Query3> Is this the correct way of making the cluster data persistent? Have I missed mapping any folder?
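
For reference, the full invocation I had in mind looks roughly like this (the image name is a placeholder, and I am assuming the third mapping was meant to be usr_vol:/usr):

# Create named volumes once, then reuse them across remove/re-run cycles.
docker volume create var_vol
docker volume create etc_vol
docker volume create usr_vol

# Placeholder image name and options; only the volume mappings matter here.
docker run -d --name app-1-on-node-1 \
    -v var_vol:/var \
    -v etc_vol:/etc \
    -v usr_vol:/usr \
    my-cluster-image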

FYI, I have given details below of the experiment I tried to verify whether the cluster data persists (see "Experiment").
Let me know if this makes sense.

Slightly off topic regarding your questions ...
Just out of curiosity, and without really having investigated what docker offers to help here, I wanted to ask how you are tackling the - quite strict - requirement
of sbd for a watchdog device (even if it is just softdog, which isn't available for your setup either) to guarantee reliable self-fencing?
sbd in containers would be an interesting field to investigate, and thus I'm interested in what is there and what we could do to improve the
situation. Many years back, for instance, I implemented a pipe device offering a similar interface to the kernel watchdog drivers, provided by
each instance of the per-container lxc process (iirc the lxc processes then had an integration with watchdog-daemon, with an instance of the
watchdog daemon interacting with the pipe running inside each container - everything way before sbd appeared on the horizon - just as an example).

Regards,
Klaus


Pic1: [diagram of the setup - image001.png, archived at https://lists.clusterlabs.org/pipermail/users/attachments/20250328/882c7349/attachment-0001.png]


Experiment
I tried the following experiment. Please let me know if this makes sense.
1) In a properly working cluster, stopped the app-1-on-node-2 container on node2 to get the following crm status in app-1-on-node-1:
Node List:
  * Online: [ app-1-on-node-1 ]
  * OFFLINE: [ app-1-on-node-2 ]
2) Stopped and started app-1-on-node-1 and checked the crm status. It remained the same as before:
Node List:
  * Online: [ app-1-on-node-1 ]
  * OFFLINE: [ app-1-on-node-2 ]
3) Removed the container app-1-on-node-1, ran it afresh and then checked the crm status.
  Now the status had changed and no longer showed app-1-on-node-2 (I presume the reason is that the old cluster data is no longer available):
Node List:
  * Online: [ app-1-on-node-1 ]
4) Repeated step 1 and observed the crm status (this time I used “-v var_vol:/var -v etc_vol:/etc -v usr_vol" during docker run):
Node List:
  * Online: [ app-1-on-node-1 ]
  * OFFLINE: [ app-1-on-node-2 ]
5) Removed the container app-1-on-node-1 and ran it afresh (with “-v var_vol:/var -v etc_vol:/etc -v usr_vol" during docker run).

6) Checked the crm status again:
Node List:
  * Online: [ app-1-on-node-1 ]
  * OFFLINE: [ app-1-on-node-2 ]

Regards,
Deva.


_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/