[ClusterLabs] Re: [EXT] fencing configuration
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Tue Jun 7 06:24:56 EDT 2022
>>> Zoran Bošnjak <zoran.bosnjak at via.si> wrote on 07.06.2022 at 10:26 in
message <1951254459.265.1654590407828.JavaMail.zimbra at via.si>:
> Hi, I need some help with a correct fencing configuration in a 5-node cluster.
>
> The specific issue is that there are 3 rooms where, in addition to the node
> failure scenario, each room can fail too (for example in case of a room power
> failure or room network failure).
>
> room0: [ node0 ]
> roomA: [ node1, node2 ]
> roomB: [ node3, node4 ]
First, it's good that even after a complete room fails you will still have
quorum (losing a two-node room leaves 3 of 5 votes; losing room0 leaves 4).
>
> - ipmi board is present on each node
> - watchdog timer is available
> - shared storage is not available
The last one sounds adventurous to me, but I'll read on...
>
> Please advise what would be a proper fencing configuration in this case.
sbd using shared storage ;-)
>
> The intention is to configure ipmi fencing (using the "fence_idrac" agent) plus
> watchdog timer as a fallback. In other words, I would like to tell the
> pacemaker: "If fencing is required, try to fence via ipmi. In case of ipmi
> fence failure, after some timeout assume the watchdog has rebooted the node, so
> it is safe to proceed, as if the (self-)fencing had succeeded."
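As far as I know, pacemaker only honors that fallback if sbd is running in
watchdog-only (diskless) mode and stonith-watchdog-timeout is set. A minimal
sketch, assuming Debian paths and a hardware watchdog at /dev/watchdog
(untested; adjust the timeouts to your hardware):

---
# /etc/default/sbd -- watchdog-only mode, i.e. no SBD_DEVICE at all
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=10

# rule of thumb: stonith-watchdog-timeout >= 2 * SBD_WATCHDOG_TIMEOUT
sudo pcs property set stonith-watchdog-timeout=20
---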
An interesting question would be how to reach any node in a room if that room
failed.
A perfect solution would be to have shared storage in every room and
configure 3-way sbd disks.
In addition you could use three-way mirroring of your data, just to be
paranoid ;-)
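If you ever do get a small LUN per room, the three-way sbd setup would look
something like this (sketch; the device paths are placeholders):

---
# /etc/default/sbd -- three independent sbd devices, one per room
SBD_DEVICE="/dev/disk/by-id/sbd-room0;/dev/disk/by-id/sbd-roomA;/dev/disk/by-id/sbd-roomB"

# initialize the devices once (this overwrites any existing sbd metadata!)
sudo sbd -d /dev/disk/by-id/sbd-room0 \
         -d /dev/disk/by-id/sbd-roomA \
         -d /dev/disk/by-id/sbd-roomB create
---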
>
> From the documentation it is not clear to me whether this would be:
> a) multiple-level fencing, where ipmi would be the first level and sbd a
> second level (where sbd always succeeds)
> b) or this is considered single-level fencing with a timeout
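For what it's worth, option a) would be expressed with explicit fence levels,
roughly like this (sketch, untested; names taken from your setup):

---
# level 1: try ipmi first; a second level is only tried if level 1 fails
sudo pcs stonith level add 1 node1 fence_ipmi_node1
---

But with watchdog self-fencing there is no second device to register at
level 2; the stonith-watchdog-timeout property itself provides the fallback,
so as far as I understand your case is really option b).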
>
> I have tried to follow option b) and create a stonith resource for each node
> and set up the stonith-watchdog-timeout, like this:
>
> ---
> # for each node... [0..4]
> export name=...
> export ip=...
> export password=...
> sudo pcs stonith create "fence_ipmi_$name" fence_idrac \
> lanplus=1 ip="$ip" \
> username="admin" password="$password" \
> pcmk_host_list="$name" op monitor interval=10m timeout=10s
>
> sudo pcs property set stonith-watchdog-timeout=20
>
> # start dummy resource
> sudo pcs resource create dummy ocf:heartbeat:Dummy op monitor interval=30s
> ---
>
> I am not sure if additional location constraints have to be specified for
> stonith resources. For example: I have noticed that pacemaker will start a
> stonith resource on the same node as the fencing target. Is this OK?
>
> Should there be any location constraints regarding fencing and rooms?
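As far as I know pacemaker avoids asking a node to execute its own fence
device, so that placement is usually harmless, but you can make it explicit
per node if you prefer (sketch):

---
# keep each fence device off the node it is meant to fence
sudo pcs constraint location fence_ipmi_node1 avoids node1
---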
>
> 'sbd' is running, properties are as follows:
>
> ---
> $ sudo pcs property show
> Cluster Properties:
> cluster-infrastructure: corosync
> cluster-name: debian
> dc-version: 2.0.3-4b1f869f0f
> have-watchdog: true
> last-lrm-refresh: 1654583431
> stonith-enabled: true
> stonith-watchdog-timeout: 20
> ---
>
> Ipmi fencing (when the ipmi connection is alive) works correctly for each
> node. The watchdog timer also seems to be working correctly. The problem is
> that the dummy resource is not restarted as expected.
My favourite here is "crm_mon -1Arfj" ;-)
>
> In the test scenario, the dummy resource is currently running on node1. I
> have simulated node failure by unplugging the ipmi AND host network
> interfaces from node1. The result was that node1 gets rebooted (by
> watchdog),
> but the rest of the pacemaker cluster was unable to fence node1 (this is
> expected, since node1's ipmi is not accessible). The problem is that the
> dummy resource remains stopped and node1 unclean. I was expecting that
"unclean" means fencing is either in progress, or did not succeed (like when
you have no fencing at all).
> stonith-watchdog-timeout kicks in, so that the dummy resource gets restarted on
> some other node which has quorum.
So that actually does the fencing. Logs could be interesting to read, too.
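Something like this should show what the fencer actually did (assuming
pacemaker 2.x and systemd):

---
# fencing history as pacemaker recorded it
sudo stonith_admin --history '*'

# pacemaker/corosync logs around the test
sudo journalctl -u pacemaker -u corosync --since "1 hour ago"
---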
>
> Obviously there is something wrong with my configuration, since this seems
> to be a reasonably simple scenario for pacemaker. I'd appreciate your help.
See above.
Regards,
Ulrich