[ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

Wed Nov 22 23:09:03 EST 2017

22.11.2017 22:45, Klaus Wenninger wrote:
> On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:
>> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with
>> VM on VSphere using shared VMDK as SBD. During basic tests by killing
>> corosync and forcing STONITH pacemaker was not started after reboot.
>> In logs I see during boot
> Using a two node cluster with a single shared disk might
> be dangerous if using sbd before 1.3.1. (if pacemaker-watcher
> is enabled a loss of the virtual-disk will make the node
> fall back to quorum  - which doesn't really tell much in case
> of two node clusters - so your disk will possibly become a
> single point of failure - even worse you will get corruption
> if the disk is lost - the side that is still able to write to the
> disk will think it has fenced the other while that doesn't see
> the poison-pill but is still happy having quorum due to the
> two node corosync feature)
>>

Given a single external shared storage array, is there much advantage
in adding more devices? I just followed the SUSE best practices paper and
documentation:

One Device
The most simple implementation. It is appropriate for clusters where all
of your data is on the same shared storage.

https://www.suse.com/docrep/documents/crfn7g3wji/sap_hana_sr_cost_optimized_scenario_12_sp1.pdf

(cluster is configured basically as in the latter link, names adjusted).

I suppose VSphere adds another possible source of corruption, so having
several devices across different datastores may be worth considering.
Unfortunately I got no response to my general question about SBD in
virtual environments, so it is probably not that common ... :)

>> Nov 22 16:04:56 sapprod01s crmd[3151]:     crit: We were allegedly
>> just fenced by sapprod01p for sapprod01p
>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd
>> process (3151) can no longer be respawned,
>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down Pacemaker
>>
>> SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
>> stonith with SBD always takes msgwait (at least, visually host is not
>> declared as OFFLINE until 120s passed). But VM reboots lightning fast
>> and is up and running long before timeout expires.
>>
>> I think I have seen similar report already. Is it something that can
>> be fixed by SBD/pacemaker tuning?
> Don't know it from sbd but have seen where fencing using
> the cycle-method with machines that boot quickly leads to
> strange behavior.
> If you configure sbd to not clear the disk-slot on startup
> (SBD_STARTMODE=clean) it should be left to the other
> side to do that which should prevent the other node from
> coming up while the one fencing is still waiting. 

That is what already happens, and it is what I would like to (be able to) avoid.
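
To make the timing concrete, here is a rough sketch of the arithmetic as I
understand it. It assumes the fencing side waits the full msgwait (120s
here) after writing the poison pill, and that the fenced VM starts
rebooting right away. The function names and the boot times are made up
for illustration; nothing here is part of sbd or pacemaker.

```python
# Illustrative arithmetic only; nothing here calls sbd or Pacemaker.

MSGWAIT_S = 120  # msgwait timeout from the report, in seconds


def rejoins_too_early(boot_s: float, msgwait_s: float = MSGWAIT_S) -> bool:
    """True if the fenced node is back up before the fencing side has
    finished waiting msgwait (the race described in this thread)."""
    return boot_s < msgwait_s


def extra_delay_needed(boot_s: float, msgwait_s: float = MSGWAIT_S) -> float:
    """Seconds the rebooted node would have to hold off starting the
    cluster stack so that it comes up only after msgwait has expired."""
    return max(0.0, msgwait_s - boot_s)


if __name__ == "__main__":
    for boot_s in (20, 90, 150):  # hypothetical boot times in seconds
        print(boot_s, rejoins_too_early(boot_s), extra_delay_needed(boot_s))
```

Any boot time shorter than msgwait leaves a window in which the rebooted
node tries to join while the other side still considers the fencing
operation to be in progress.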

> You might
> set the method from cycle to off/on to make the fencing
> side clean the slot.
> 

Hmm ... but what would power on a system that has powered itself off via SBD?

Also, this is not clear from the SBD documentation - does it behave
differently when stonith is set to reboot or power cycle?

>>
>> I can provide full logs tomorrow if needed.
> Yes would be interesting to see more ...
> 

OK, today I set up another cluster; I will see whether I get the same
behavior and collect logs then.

> If what I'm writing doesn't make too much sense
> to you this might be due to me not really knowing
> how sbd is configured with SLES ;-)
> 

It makes all sorts of sense; I'm just not that deep into this stuff.