[Pacemaker] DRBD Split-brain (recovered), but still showing "Failed Actions"

Wed Apr 11 13:48:25 EDT 2012

Thanks Andreas -- I'm not familiar with "maintenance-mode", very good to
know.

Before I go research your suggestion, is the basic idea that you can
enable maintenance mode from ANY node (or just the node with the failed
action?), restart pacemaker/corosync services on ALL nodes (or, again,
just the one with the failed action?) -- all without any cluster service
interruption -- and then disable maintenance mode once the "Failed
Actions" have been cleared?

>On Wed, 11 Apr 2012 00:12:10 +0200, Andreas Kurz <andreas at hastexo.com>
>wrote to pacemaker at oss.clusterlabs.org:
>
>On 04/10/2012 05:43 PM, Reid, Mike wrote:
>> Thank you for the suggestion, Andreas. Unfortunately, that does not
>> appear to have cleaned up the Failed Actions either:
>> 
>>> crm resource cleanup msDRBD
>> 
>> Cleaning up resDRBD:0 on hostname2
>> Cleaning up resDRBD:1 on hostname2
>> Cleaning up resDRBD:0 on hostname1
>> Cleaning up resDRBD:1 on hostname1
>> 
>>> crm_mon -1
>> 
>> [...]
>> Failed actions:
>>     resDRBD:1_promote_0 (node=hostname2, call=530, rc=-2, status=Timed Out): unknown exec error
>> 
>> 
>> Are there any other options that do not involve a failover + restart?
>
>If you switch your cluster into maintenance mode ...
>
>crm configure property maintenance-mode=true
>
>... you can stop pacemaker and even corosync without interrupting your
>services ... don't forget to disable it again after restart.
>
>Regards,
>Andreas
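
For reference, the sequence Andreas describes could be sketched roughly as
below. This is only a sketch under assumptions, not a tested procedure: the
`service` invocations and the existence of a separate `pacemaker` service are
assumptions (on setups where corosync starts pacemaker as a plugin there is no
separate pacemaker service), the helper names `run` and
`restart_stack_in_maintenance` are made up for illustration, and the thread
does not say whether the restart is needed on one node or on all of them.
Only the `crm configure property maintenance-mode=...` and `crm_mon -1`
commands come from the thread itself.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1 only print each command, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the steps from the thread. The chain stops at
# the first failing step, which leaves the cluster in maintenance mode so it
# can be inspected by hand before resources are managed again.
restart_stack_in_maintenance() {
    # maintenance-mode is a cluster property: resources keep running unmanaged
    run crm configure property maintenance-mode=true &&
    # assumed init-script names; adjust to the local distribution
    run service pacemaker stop &&
    run service corosync stop &&
    run service corosync start &&
    run service pacemaker start &&
    # one-shot status, to check whether "Failed actions" is still listed
    run crm_mon -1 &&
    # "don't forget to disable it again after restart"
    run crm configure property maintenance-mode=false
}
```

Calling `DRY_RUN=1 restart_stack_in_maintenance` prints the commands without
touching the cluster, which makes it easy to review the order first. In a real
run it is probably safer to check the `crm_mon -1` output by hand before
switching maintenance mode off again.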