[Pacemaker] Recovery after lost quorum

Denis Witt denis.witt at concepts-and-training.de
Tue Jun 4 20:43:30 EDT 2013


On 05.06.2013 at 02:15, Andrew Beekhof <andrew at beekhof.net> wrote:

>> Jun  5 01:11:06 test4 pengine: [18625]: WARN: cluster_status: We do not have quorum - fencing and resource management disabled
>> Jun  5 01:11:06 test4 pengine: [18625]: notice: LogActions: Start   pingtest:0#011(test4 - blocked)
>> Jun  5 01:11:06 test4 pengine: [18625]: notice: LogActions: Start   drbd:0#011(test4 - blocked)
> 
> Here's your reason.  We didn't get quorum until:

>> Jun  5 01:11:11 test4 crmd: [18626]: notice: ais_dispatch_message: Membership 128: quorum acquired

Hi Andrew,

I thought this means that there is a quorum. Anyway, crm status says:

root@test4:~# crm status
============
Last updated: Wed Jun  5 02:36:20 2013
Last change: Tue Jun  4 17:55:28 2013 via crm_attribute on backup3
Stack: openais
Current DC: test4 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
3 Nodes configured, 3 expected votes
8 Resources configured.
============

Online: [ test4 backup3 ]
OFFLINE: [ test3 ]

But no resources are started, so I suspect there really is quorum. Anyway, I noticed that if I start pacemaker on the backup3 node, the services are restarted, even if it sometimes takes a while. So I might have to live with the "not installed" messages and start the backup3 node in standby mode as long as no one comes up with a better solution. Maybe I'll fake the status of the monitors on this node and add some location rules to prevent resources from being moved to this node.
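
As a sanity check on the quorum question, here is a minimal sketch of the arithmetic, assuming the stack applies a plain majority rule over the expected votes (that rule is my assumption here, not something taken from the logs). The function name has_quorum is made up for illustration. With 3 expected votes, the two online nodes (test4 and backup3) form a majority, while a single node on its own would not:

```python
def has_quorum(online_votes: int, expected_votes: int) -> bool:
    """Assumed majority rule: strictly more than half of the expected votes."""
    return 2 * online_votes > expected_votes


# Values from the crm status output: 3 expected votes, 2 nodes online.
print(has_quorum(2, 3))

# A single node on its own, for comparison.
print(has_quorum(1, 3))
```

Under that assumption the "partition with quorum" line in crm status is consistent with two of three nodes being online.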

Thanks for your help.

Best regards,
Denis Witt




