[ClusterLabs] fencing configuration
Andrei Borzenkov
arvidjaar at gmail.com
Tue Jun 7 10:51:36 EDT 2022
On 07.06.2022 11:50, Klaus Wenninger wrote:
>>
>> From the documentation it is not clear to me whether this would be:
>> a) multiple fencing where ipmi would be first level and sbd would be a second level fencing (where sbd always succeeds)
>> b) or this is considered a single level fencing with a timeout
>
> With b) falling back to watchdog-fencing wouldn't work properly
> although I remember
> some recent change that might make it fall back without issues.
b) works here:
Jun 07 17:35:50 ha2 pacemaker-controld[7069]: notice: Requesting fencing (reboot) of node qnetd
Jun 07 17:35:50 ha2 pacemaker-fenced[7065]: notice: Client pacemaker-controld.7069 wants to fence (reboot) qnetd using any device
Jun 07 17:35:50 ha2 pacemaker-fenced[7065]: notice: Requesting peer fencing (reboot) targeting qnetd
Jun 07 17:35:50 ha2 pacemaker-fenced[7065]: notice: watchdog is not eligible to fence (reboot) qnetd: static-list
Jun 07 17:35:50 ha2 pacemaker-schedulerd[7068]: warning: Calculated transition 14 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-95.bz2
Jun 07 17:35:50 ha2 pacemaker-fenced[7065]: notice: Requesting that ha1 perform 'reboot' action targeting qnetd
Jun 07 17:35:53 ha2 pacemaker-fenced[7065]: notice: Requesting that ha2 perform 'reboot' action targeting qnetd
Jun 07 17:35:53 ha2 pacemaker-fenced[7065]: notice: watchdog is not eligible to fence (reboot) qnetd: static-list
Jun 07 17:35:55 ha2 stonith[11138]: external_reset_req: '_dummy reset' for host qnetd failed with rc 1
Jun 07 17:35:57 ha2 stonith[11142]: external_reset_req: '_dummy reset' for host qnetd failed with rc 1
Jun 07 17:35:57 ha2 pacemaker-fenced[7065]: error: Operation 'reboot' [11141] targeting qnetd using dummy_stonith returned 1
Jun 07 17:35:57 ha2 pacemaker-fenced[7065]: warning: dummy_stonith[11141] [ Performing: stonith -t external/_dummy -E -T reset qnetd ]
Jun 07 17:35:57 ha2 pacemaker-fenced[7065]: warning: dummy_stonith[11141] [ failed: qnetd 5 ]
Jun 07 17:35:57 ha2 pacemaker-fenced[7065]: notice: Couldn't find anyone to fence (reboot) qnetd using any device
Jun 07 17:35:57 ha2 pacemaker-fenced[7065]: notice: Waiting 10s for qnetd to self-fence (reboot) for client pacemaker-controld.7069
Jun 07 17:36:07 ha2 pacemaker-fenced[7065]: notice: Self-fencing (reboot) by qnetd for pacemaker-controld.7069 assumed complete
Jun 07 17:36:07 ha2 pacemaker-fenced[7065]: notice: Operation 'reboot' targeting qnetd by ha2 for pacemaker-controld.7069@ha2: OK (complete)
Jun 07 17:36:07 ha2 pacemaker-controld[7069]: notice: Fence operation 7 for qnetd passed
Jun 07 17:36:07 ha2 pacemaker-controld[7069]: notice: Transition 14 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-95.bz2): Complete
Jun 07 17:36:07 ha2 pacemaker-controld[7069]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Jun 07 17:36:07 ha2 pacemaker-controld[7069]: notice: Peer qnetd was terminated (reboot) by ha2 on behalf of pacemaker-controld.7069@ha2: OK
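To check the fallback timing without reading the timestamps by eye, here is a minimal sketch that parses the three relevant lines and compares the wait announced by pacemaker-fenced with the time that actually elapsed before self-fencing was assumed complete. The function names and the regular expressions are my own, written against the message wording in this particular log. Syslog timestamps carry no year, so the year is supplied as an assumption.

```python
import re
from datetime import datetime

# Key lines copied from the log above, with the mail line wrapping undone.
LOG = """\
Jun 07 17:35:57 ha2 pacemaker-fenced[7065]: notice: Couldn't find anyone to fence (reboot) qnetd using any device
Jun 07 17:35:57 ha2 pacemaker-fenced[7065]: notice: Waiting 10s for qnetd to self-fence (reboot) for client pacemaker-controld.7069
Jun 07 17:36:07 ha2 pacemaker-fenced[7065]: notice: Self-fencing (reboot) by qnetd for pacemaker-controld.7069 assumed complete
"""

# timestamp, host, daemon[pid], severity, message
LINE_RE = re.compile(
    r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ ([\w-]+)\[\d+\]: (\w+): (.*)$"
)


def parse(log, year=2022):
    """Return (timestamp, daemon, severity, message) tuples.

    The year is an assumption passed in by the caller, because the
    syslog timestamp format used here does not include one.
    """
    events = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((stamp, m.group(2), m.group(3), m.group(4)))
    return events


def self_fence_wait(events):
    """Return (announced_seconds, observed_seconds), or None if not found."""
    announced = started = None
    for stamp, _daemon, _severity, msg in events:
        m = re.match(r"Waiting (\d+)s for \S+ to self-fence", msg)
        if m:
            announced, started = int(m.group(1)), stamp
        elif msg.startswith("Self-fencing") and started is not None:
            return announced, (stamp - started).total_seconds()
    return None


print(self_fence_wait(parse(LOG)))
```

For the lines above, the announced and the observed wait are both 10 seconds.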
The only gotcha is this stray error, logged after everything has already completed:
Jun 07 17:37:05 ha2 pacemaker-fenced[7065]: notice: Peer's 'reboot' action targeting qnetd for client pacemaker-controld.7069 timed out
Jun 07 17:37:05 ha2 pacemaker-fenced[7065]: notice: Couldn't find anyone to fence (reboot) qnetd using any device
Jun 07 17:37:05 ha2 pacemaker-fenced[7065]: error: request_peer_fencing: Triggered fatal assertion at fenced_remote.c:1799 : op->state < st_done
> I would try to go for a) as with a reasonably current
> pacemaker-version (iirc 2.1.0 and above)
> you should be able to make the watchdog-fencing-device visible as with
> other fencing-devices
Yep.
dummy_stonith
watchdog
2 fence devices found
> (just use fence_watchdog as the fence-agent - still implemented inside
> pacemaker
> fence-watchdog-binary actually just provides the meta-data).
> Like this you can limit watchdog-fencing to certain-nodes that do
> actually provide a proper
> hardware-watchdog and you can add it to a topology.
>
Well, as can be seen above, even though "watchdog" is reported as not
eligible, pacemaker still falls back to it. So I am not sure it will work.