[ClusterLabs] no-quorum-policy=ignore is (Deprecated) and replaced with other options but not an effective solution

Andrei Borzenkov arvidjaar at gmail.com
Tue Jun 27 11:03:31 EDT 2023


On 27.06.2023 07:21, Priyanka Balotra wrote:
> Hi Andrei,
> After this state the system went through several more rounds of fencing, and
> we saw the following state:
> 
> :~ # crm status
> Cluster Summary:
>    * Stack: corosync
>    * Current DC: FILE-2 (version
> 2.1.2+20211124.ada5c3b36-150400.2.43-2.1.2+20211124.ada5c3b36) - partition
> with quorum

It says "partition with quorum" so what exactly is the problem?

>    * Last updated: Mon Jun 26 12:44:15 2023
>    * Last change:  Mon Jun 26 12:41:12 2023 by root via cibadmin on FILE-2
>    * 4 nodes configured
>    * 11 resource instances configured
> 
> Node List:
>    * Node FILE-1: UNCLEAN (offline)
>    * Node FILE-4: UNCLEAN (offline)
>    * Online: [ FILE-2 ]
>    * Online: [ FILE-3 ]
> 
> At this stage FILE-1 and FILE-4 were continuously getting fenced (we have
> device-based stonith configured, but the stonith resource was not up).
> Two nodes were online and two were offline, so quorum was not attained
> again.
> 1)  For such a scenario we need help to be able to keep one cluster
> partition live.
> 2)  And in cases where only one node of the cluster is up and the others are
> down, we need the cluster and its resources to stay up.
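A quick way to see what votequorum itself thinks in that state is to run
corosync-quorumtool on one of the surviving nodes; a minimal check, assuming
the standard corosync command-line tools are installed, would be:

    # show quorum status: node count, expected votes, total votes, quorum and flags
    corosync-quorumtool -s

If that output still reports the full expected votes while only two nodes are
contributing votes, the partition cannot become quorate on its own, regardless
of any Pacemaker-level settings.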
> 
> Thanks
> Priyanka
> 
> On Tue, Jun 27, 2023 at 12:25 AM Andrei Borzenkov <arvidjaar at gmail.com>
> wrote:
> 
>> On 26.06.2023 21:14, Priyanka Balotra wrote:
>>> Hi All,
>>> We are seeing an issue: we replaced no-quorum-policy=ignore with other
>>> options in corosync.conf in order to simulate the same behaviour:
>>>
>>>
>>>      wait_for_all: 0
>>>      last_man_standing: 1
>>>      last_man_standing_window: 20000
>>>
>>> We also tried another property (auto-tie-breaker), but we could not
>>> configure it, as crm did not recognise that property.
>>>
>>> But even after using these options, we are seeing that the system is not
>>> quorate when half or more of the nodes are down.
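For reference, these are votequorum options that belong in the quorum section
of corosync.conf rather than in the CIB; auto_tie_breaker is likewise a
corosync option and not a Pacemaker property, which would explain why crm did
not recognise it. A minimal sketch of such a quorum section (values purely
illustrative, assuming corosync with the votequorum provider) looks like:

    quorum {
            provider: corosync_votequorum
            # normally derived from the nodelist on a 4-node cluster
            expected_votes: 4
            wait_for_all: 0
            last_man_standing: 1
            last_man_standing_window: 20000
            auto_tie_breaker: 1
            # lowest node id wins the tie; a fixed node id can be given instead
            auto_tie_breaker_node: lowest
    }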
>>>
>>> Some properties from crm config are as follows:
>>>
>>>
>>>
>>> primitive stonith-sbd stonith:external/sbd \
>>>         params pcmk_delay_base=5s
>>> ...
>>> property cib-bootstrap-options: \
>>>         have-watchdog=true \
>>>         dc-version="2.1.2+20211124.ada5c3b36-150400.2.43-2.1.2+20211124.ada5c3b36" \
>>>         cluster-infrastructure=corosync \
>>>         cluster-name=FILE \
>>>         stonith-enabled=true \
>>>         stonith-timeout=172 \
>>>         stonith-action=reboot \
>>>         stop-all-resources=false \
>>>         no-quorum-policy=ignore
>>> rsc_defaults build-resource-defaults: \
>>>         resource-stickiness=1
>>> rsc_defaults rsc-options: \
>>>         resource-stickiness=100 \
>>>         migration-threshold=3 \
>>>         failure-timeout=1m \
>>>         cluster-recheck-interval=10min
>>> op_defaults op-options: \
>>>         timeout=600 \
>>>         record-pending=true
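For what it is worth, individual entries in a property block like the one
above can be inspected and changed with the crm shell; a small example (the
value shown is only illustrative, not a recommendation for this cluster)
would be:

    # show the current cluster properties
    crm configure show cib-bootstrap-options
    # change a single property, e.g. the quorum policy
    crm configure property no-quorum-policy=stop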
>>>
>>> On a 4-node setup, when the whole cluster is brought up together, we see
>>> error logs like:
>>>
>>> 2023-06-26T11:35:17.231104+00:00 FILE-1 pacemaker-schedulerd[26359]:
>>> warning: Fencing and resource management disabled due to lack of quorum
>>>
>>> 2023-06-26T11:35:17.231338+00:00 FILE-1 pacemaker-schedulerd[26359]:
>>> warning: Ignoring malformed node_state entry without uname
>>>
>>> 2023-06-26T11:35:17.233771+00:00 FILE-1 pacemaker-schedulerd[26359]:
>>> warning: Node FILE-2 is unclean!
>>>
>>> 2023-06-26T11:35:17.233857+00:00 FILE-1 pacemaker-schedulerd[26359]:
>>> warning: Node FILE-3 is unclean!
>>>
>>> 2023-06-26T11:35:17.233957+00:00 FILE-1 pacemaker-schedulerd[26359]:
>>> warning: Node FILE-4 is unclean!
>>>
>>
>> According to this output, FILE-1 lost connection to the three other nodes,
>> in which case it cannot be quorate.
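To spell out the arithmetic behind that (standard majority quorum, so a sketch
rather than anything specific to this build): each of the 4 nodes carries 1
vote, so

    quorum = floor(expected_votes / 2) + 1 = floor(4 / 2) + 1 = 3

and a partition holding only 1 or 2 votes cannot be quorate. last_man_standing
only lowers expected_votes step by step, after last_man_standing_window
expires, while the surviving partition is still quorate; it does not grant
quorum to a minority partition that forms when the cluster first comes up, and
it is normally combined with wait_for_all enabled rather than disabled.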
>>
>>>
>>> Kindly help correct the configuration to make the system function normally
>>> with all resources up, even if there is just one node up.
>>>
>>> Please let me know if any more info is needed.
>>>
>>> Thanks
>>> Priyanka
>>>
>>>



More information about the Users mailing list