[ClusterLabs] Antw: [EXT] Re: Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Mon Mar 1 02:10:11 EST 2021


>>> Digimer <lists at alteeve.ca> wrote on 26.02.2021 at 17:34 in message
<699432c7-89a6-41bf-c805-f4a7a0a4a8cd at alteeve.ca>:
> On 2021-02-26 11:19 a.m., Eric Robinson wrote:
>> At 5:16 am Pacific time Monday, one of our cluster nodes failed and its
>> mysql services went down. The cluster did not automatically recover.
>> 
>> We’re trying to figure out:
>> 
>>  1. Why did it fail?
>>  2. Why did it not automatically recover?
>> 
>> The cluster did not recover until we manually executed…
>> 
>> # pcs resource cleanup p_mysql_622
>> 
>> OS: CentOS Linux release 7.5.1804 (Core)
>> 
>> Cluster version:
>> 
>> corosync.x86_64                  2.4.5-4.el7                     @base
>> corosync-qdevice.x86_64          2.4.5-4.el7                     @base
>> pacemaker.x86_64                 1.1.21-4.el7                    @base
>> 
>> Two nodes: 001db01a, 001db01b
>> 
>> The following log snippet is from node 001db01a:
>> 
>> [root at 001db01a cluster]# grep "Feb 22 05:1[67]" corosync.log-20210223
> 
> <snip>
> 
>> Feb 22 05:16:30 [91682] 001db01a    pengine:  warning: cluster_status: Fencing and resource management disabled due to lack of quorum
> 
> Seems like there was no quorum from this node's perspective, so it won't
> do anything. What do the other node's logs say?
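
The quorum behavior discussed above depends on how the qdevice is wired into corosync.conf. For a 2-node cluster with a separate qdevice, the quorum section typically looks roughly like the sketch below; the qnetd host name and the algorithm choice here are assumptions, not taken from the poster's configuration (note that `two_node` is normally left unset when a qdevice provides the tie-breaking vote):

```
quorum {
    provider: corosync_votequorum
    device {
        model: net
        votes: 1
        net {
            # assumed qnetd server name, for illustration only
            host: qnetd.example.com
            algorithm: ffsplit
        }
    }
}
```

With `algorithm: ffsplit`, the qnetd server grants its vote to exactly one partition in a split, so a surviving node should retain quorum when its peer fails.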

@Digimer: The other node's log was included ;-)

Regards,
Ulrich




