[ClusterLabs] Re: Growing a cluster from 1 node without fencing

Klaus Wenninger kwenning at redhat.com
Mon Aug 14 12:46:27 UTC 2017


On 08/14/2017 12:20 PM, Ulrich Windl wrote:
> Hi!
>
> Have you tried studying the logs? Usually you get useful information from
> there (to share!).
>
> Regards,
> Ulrich
>
>>>> Edwin Török <edvin.torok at citrix.com> wrote on 14.08.2017 at 11:51 in
> message <3d1653ad-50b5-07e3-9392-92d7d651324e at citrix.com>:
>> Hi,
>>
>>
>> When setting up a cluster with just 1 node with auto-tie-breaker and
>> DLM, and then incrementally adding more, I got some unexpected fencing
>> if the 2nd node doesn't join the cluster soon enough.
>>
>> What I also found surprising is that if the cluster has ever seen 2 
>> nodes, then turning off the 2nd node works fine and doesn't cause 
>> fencing (using auto-tie-breaker).
>>
>>
>> I have a hardware watchdog, and can reproduce the problem with these (or
>> older) versions and the following sequence of steps:
>>
>> corosync-2.4.0-9.el7.x86_64
>> pacemaker-1.1.16-12.el7.x86_64
>> sbd-1.3.0-3.el7.x86_64
>> pcs-0.9.158-6.el7.x86_64
>>
>> pcs cluster destroy
>> rm /var/lib/corosync/* -f
>> pcs cluster auth -u hacluster cluster1 cluster2
>> pcs cluster setup --name cluster cluster1 --auto_tie_breaker=1
>> pcs stonith sbd enable

What does your /etc/sysconfig/sbd look like?
With just that pcs command you get a default configuration with
watchdog-only support.
Unless the cluster property stonith-watchdog-timeout is set to a
value matching the watchdog timeout configured in /etc/sysconfig/sbd
(default = 5s; twice that is a good choice), a node will never
assume the unseen partner has been fenced.
Anyway, watchdog-only sbd is of very limited use in 2-node
scenarios: it essentially limits availability to that of the node
that would win the tie-breaker. But it might still be useful in
certain scenarios of course (like load-sharing ...).
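
To make that concrete, a minimal watchdog-only setup along those
lines could look roughly like the sketch below (device name and
timeout values are illustrative, not taken from your machines):

  # /etc/sysconfig/sbd -- watchdog-only, no shared disk
  SBD_DEVICE=""                   # empty: no shared block device
  SBD_WATCHDOG_DEV=/dev/watchdog  # hardware watchdog device
  SBD_WATCHDOG_TIMEOUT=5          # seconds (sbd default)
  SBD_PACEMAKER=yes

  # matching cluster property, roughly twice the watchdog timeout
  pcs property set stonith-watchdog-timeout=10s

With that property set, a node that loses sight of its peer waits
stonith-watchdog-timeout and then treats the peer as fenced,
trusting the peer's watchdog to have fired.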

>> pcs cluster start --all
>> pcs property set no-quorum-policy=ignore
>> # or pcs property set no-quorum-policy=freeze
>> # or pcs property set no-quorum-policy=suicide
>> pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s 
>> on-fail=fence clone interleave=true ordered=true
>> while ! dlm_tool join testls; do sleep 1; done
>> crm_mon -1
>> pcs cluster node add cluster2&
>> journalctl --follow
>>
>>
>> What am I doing wrong, and how can I avoid fencing?
>> I thought that setting no-quorum-policy to ignore would prevent this (if 

That just prevents self-fencing in case of lost quorum.
Other reasons for self-fencing are still possible, e.g. dlm failing
in your case, or a node becoming unclean.
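
If in doubt, you can check what the cluster actually has configured;
with the pcs/pacemaker versions you listed, something along these
lines should show it:

  # properties that decide the fencing behaviour discussed here
  pcs property list --all | grep -E 'stonith-watchdog-timeout|no-quorum-policy|stonith-enabled'

  # or query a single property directly
  crm_attribute --type crm_config --query --name stonith-watchdog-timeout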

Regards,
Klaus

>> I have just 1 node I don't really need fencing until the 2nd node is 
>> actually up), but if there are any active DLM lockspaces that doesn't 
>> seem to be the case.
>>
>> Thanks,
>> --Edwin
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org 
>> http://lists.clusterlabs.org/mailman/listinfo/users 
>>
>> Project Home: http://www.clusterlabs.org 
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>> Bugs: http://bugs.clusterlabs.org 
>
>
>




