[ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

Eric Robinson eric.robinson at psmnv.com
Fri Feb 26 12:23:47 EST 2021


> -----Original Message-----
> From: Digimer <lists at alteeve.ca>
> Sent: Friday, February 26, 2021 10:35 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> <users at clusterlabs.org>; Eric Robinson <eric.robinson at psmnv.com>
> Subject: Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went
> Down Anyway?
>
> On 2021-02-26 11:19 a.m., Eric Robinson wrote:
> > At 5:16 am Pacific time Monday, one of our cluster nodes failed and
> > its mysql services went down. The cluster did not automatically recover.
> >
> > We're trying to figure out:
> >
> >  1. Why did it fail?
> >  2. Why did it not automatically recover?
> >
> > The cluster did not recover until we manually executed...
> >
> > # pcs resource cleanup p_mysql_622
> >
> > OS: CentOS Linux release 7.5.1804 (Core)
> >
> > Cluster version:
> >
> > corosync.x86_64                  2.4.5-4.el7                     @base
> > corosync-qdevice.x86_64          2.4.5-4.el7                     @base
> > pacemaker.x86_64                 1.1.21-4.el7                    @base
> >
> > Two nodes: 001db01a, 001db01b
> >
> > The following log snippet is from node 001db01a:
> >
> > [root@001db01a cluster]# grep "Feb 22 05:1[67]" corosync.log-20210223
>
> <snip>
>
> > Feb 22 05:16:30 [91682] 001db01a    pengine:  warning: cluster_status:
> Fencing and resource management disabled due to lack of quorum
>
> Seems like there was no quorum from this node's perspective, so it won't do
> anything. What do the other node's logs say?
>

The logs from the other node are at the bottom of the original email.

> What is the cluster configuration? Do you have stonith (fencing) configured?

2-node with a separate qdevice. No fencing.
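
As a rough sketch of the vote arithmetic for this kind of layout: the numbers below are assumptions (one vote per node plus one vote from the qdevice), not values read from our corosync.conf, and the helper name quorum_threshold is made up for illustration. The only rule it encodes is a simple majority of the expected votes.

```shell
#!/bin/sh
# Simple majority: more than half of the expected votes.
# quorum_threshold is a hypothetical helper, not a corosync tool.
quorum_threshold() {
    expected_votes=$1
    echo $(( expected_votes / 2 + 1 ))
}

# Assumed vote layout (NOT taken from the actual corosync.conf):
node_votes=2      # 001db01a + 001db01b, one vote each
qdevice_votes=1   # the separate qdevice

expected=$(( node_votes + qdevice_votes ))
needed=$(quorum_threshold "$expected")
echo "expected_votes=$expected quorum=$needed"

# A node holding only its own vote vs. a node that also holds the qdevice vote.
for have in 1 2; do
    if [ "$have" -ge "$needed" ]; then
        echo "$have vote(s): quorate"
    else
        echo "$have vote(s): NOT quorate"
    fi
done
```

If that layout is right, a single node stays quorate only while it also holds the qdevice vote, so a "lack of quorum" message on one node would suggest it was counting only its own vote at that moment.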

> Quorum is a useful tool when things are working properly, but it doesn't help
> when things enter an undefined / unexpected state.
> When that happens, stonith saves you. So said another way, you must have
> stonith for a stable cluster, quorum is optional.
>

In this case, if fencing had been enabled, which node would have fenced the other? Would they have gotten into a STONITH war?

More importantly, why did the failure of resource p_mysql_622 keep the whole cluster from recovering? As soon as I did 'pcs resource cleanup p_mysql_622' all the other resources recovered, but none of them are dependent on that resource.
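
For anyone wanting to look at the same things on a similar setup, here is a minimal sketch of read-only checks, assuming the pcs 0.9.x command set that ships with CentOS 7 (subcommand names differ in newer pcs releases). The wrapper name inspect_resource is hypothetical, and nothing in it changes cluster state or explains the failure by itself.

```shell
#!/bin/sh
# Read-only checks around a failed resource; assumes pcs 0.9.x syntax.
# inspect_resource is a hypothetical wrapper name, not a pcs command.
inspect_resource() {
    res=$1
    if ! command -v pcs >/dev/null 2>&1; then
        echo "pcs not found; run this on a cluster node" >&2
        return 1
    fi
    pcs status --full                    # resource state and failed actions
    pcs resource failcount show "$res"   # per-node fail count for the resource
    pcs constraint --full                # ordering/colocation/location constraints
    pcs quorum status                    # vote counts as seen from this node
    pcs property show stonith-enabled    # whether fencing is enabled
}

# Usage on a cluster node: inspect_resource p_mysql_622
```

The constraint listing is there to double-check the assumption that nothing depends on p_mysql_622, and the quorum output can be compared with the "lack of quorum" warning in the log.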

> --
> Digimer
> Papers and Projects: https://alteeve.com/w/ "I am, somehow, less
> interested in the weight and convolutions of Einstein's brain than in the near
> certainty that people of equal talent have lived and died in cotton fields and
> sweatshops." - Stephen Jay Gould

