[Pacemaker] DRBD Split-brain (recovered), but still showing "Failed Actions"

Thu Apr 12 17:13:14 EDT 2012

On 04/11/2012 07:48 PM, Reid, Mike wrote:
> Thanks Andreas -- I'm not familiar with "maintenance-mode", very good to
> know.
> 
> 
> Before I go research your suggestion, is the basic idea that you can
> enable maintenance mode from ANY node (or just the node with the failed
> action?), restart pacemaker/corosync services on ALL nodes (or, again,
> just the one with the failed action?) -- all without any cluster service
> interruption -- and then disable maintenance mode once the "Failed
> Actions" have been cleaned up?

It's a cluster-wide option, so enabling it from any node is fine. And yes:
stop pacemaker/corosync on all nodes (while in maintenance-mode), then
start pacemaker/corosync again on all nodes. The old cluster status is
then gone. Check your cluster and disable maintenance-mode; no service
interruption is expected.
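
As a rough sketch only, the sequence could look like the script below. The
node names are taken from the crm_mon output quoted further down; ssh
access between nodes and the "service pacemaker"/"service corosync"
init-script commands are assumptions and will differ per distribution
(with the corosync plugin setup there may be no separate pacemaker
service). It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch: clear stale "Failed actions" by restarting the cluster stack
# while maintenance-mode leaves the running resources untouched.
# ASSUMPTIONS (not from this thread): ssh access to each node and the
# init-script service names used below.

NODES="${NODES:-hostname1 hostname2}"  # node names as shown by crm_mon
DRY_RUN="${DRY_RUN:-1}"                # 1 = print only, 0 = execute

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

restart_stack_in_maintenance() {
    # Cluster-wide property, so setting it from any one node is enough.
    run crm configure property maintenance-mode=true

    # Stop pacemaker, then corosync, on every node.
    for n in $NODES; do
        run ssh "$n" "service pacemaker stop; service corosync stop"
    done

    # Start corosync, then pacemaker, on every node.
    for n in $NODES; do
        run ssh "$n" "service corosync start; service pacemaker start"
    done

    # Show the cluster status once so it can be checked by hand.
    run crm_mon -1
}

leave_maintenance() {
    # Call this only after the status above looks clean.
    run crm configure property maintenance-mode=false
}

restart_stack_in_maintenance
```

Once the printed plan looks right, run it again with DRY_RUN=0, check the
crm_mon output, and only then call leave_maintenance. Disabling
maintenance-mode is kept as a separate step on purpose, so that it is
not switched off before the cluster has been checked.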

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> 
>>
>> Message: 3
>> Date: Wed, 11 Apr 2012 00:12:10 +0200
>> From: Andreas Kurz <andreas at hastexo.com>
>> To: pacemaker at oss.clusterlabs.org
>> Subject: Re: [Pacemaker] DRBD Split-brain (recovered), but still
>> 	showing "Failed Actions"
>> Message-ID: <4F84B03A.4030004 at hastexo.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> On 04/10/2012 05:43 PM, Reid, Mike wrote:
>>> Thank you for the suggestion, Andreas. Unfortunately, that does not
>>> appear to have cleaned up the Failed Actions either:
>>>
>>>> crm resource cleanup msDRBD
>>>
>>> Cleaning up resDRBD:0 on hostname2
>>> Cleaning up resDRBD:1 on hostname2
>>> Cleaning up resDRBD:0 on hostname1
>>> Cleaning up resDRBD:1 on hostname1
>>>
>>>> crm_mon -1
>>>
>>> [...]
>>> Failed actions:
>>>     resDRBD:1_promote_0 (node=hostname2, call=530, rc=-2, status=Timed Out): unknown exec error
>>>
>>>
>>> Are there any other options that do not involve a failover + restart?
>>
>> If you switch your cluster into maintenance mode ...
>>
>> crm configure property maintenance-mode=true
>>
>> ... you can stop pacemaker and even corosync without interrupting your
>> services ... don't forget to disable it again after restart.
>>
>> Regards,
>> Andreas
>>
> 
> 
