[Pacemaker] Problems when quorum lost for a short period of time

Andrew Beekhof andrew at beekhof.net
Wed Oct 2 00:55:49 EDT 2013


On 02/10/2013, at 6:26 AM, Lev Sidorenko <levs at securemedia.co.nz> wrote:

> Hello All!
> 
> I have a 4-node cluster setup.
> 
> It is actually 2 nodes for main+standby and another two nodes just to
> provide quorum.

One extra node would have been enough: with three nodes, any two still form a majority.

> 
> So, all resources run on the main node but only DRBD-slave runs on the
> standby node.
> 
> I have no-quorum-policy="stop"
> 
> So, sometimes the main node loses its connection to the cluster and
> reports "quorum lost", but after 1-2 seconds the connection is
> re-established and it reports "quorum retained".
> This causes a big problem: as soon as the main node loses quorum, it
> starts to stop all resources. At the same time, the second node starts
> to start resources. After a couple of seconds the main node rejoins the
> cluster but has still not managed to stop all resources, and some of the
> resources have already started on the second node. So, I have lots of
> conflicts between resources on these two nodes.
> 
> I tried setting no-quorum-policy="suicide", hoping that as soon as the
> main node lost its connection to the cluster it would reboot itself,
> which would give the second node enough time to start all of the
> processes and become the main one.
> But with no-quorum-policy="suicide" the main node just tries to STONITH
> all of the other nodes and does not reboot itself.

It will reboot itself last, IIRC.

> 
> So the question is: how can I set things up to instantly reboot a node
> when the node detects that quorum is lost?

Why don't you tweak the timings in corosync.conf (a guess; you don't say what you're using) to be more tolerant of these blips instead?
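
As a rough sketch of what that could look like, assuming corosync really is the messaging layer here (the original post does not say): the totem "token" timeout controls how long corosync waits for the token before declaring it lost and starting a membership change. The value below is purely illustrative, not a recommendation; pick one comfortably longer than the 1-2 second blips described above.

```
# /etc/corosync/corosync.conf (fragment, illustrative values only)
totem {
    version: 2

    # Time in milliseconds to wait for the token before declaring it
    # lost. 10000 is an assumed example value chosen to ride out
    # short 1-2 second interruptions.
    token: 10000

    # "consensus" is left unset on purpose: when not specified,
    # corosync calculates it as 1.2 * token.
}
```

The trade-off is that a longer token timeout also delays detection of a real node failure by the same amount, so failover gets slower. The same totem settings need to be present on every node, and corosync has to pick up the new configuration before they take effect.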

> 
> Thank you in advance!
> 
> With the best regards,
> Lev Sidorenko.
