[Pacemaker] resource is too active problem in a 2-node cluster

Thu Feb 13 21:40:45 UTC 2014

Hello,

I am still looking for help with these "unknown error (1)" messages from pengine. Is my assessment correct that these error messages are the root cause of pacemaker thinking that the resource is active on both nodes? Any help will be much appreciated.

Ajay
________________________________________
From: Ajay Aggarwal [aaggarwal at verizon.com]
Sent: Tuesday, February 11, 2014 9:39 AM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] resource is too active problem in a 2-node cluster

Yes, we have cman (version: cman-3.0.12.1-49). We use manual fencing (I
know it is not recommended). There is an external monitoring and
fencing service that we use (our own).

Perhaps the subject line "resource is too active problem in a 2-node
cluster" was misleading. The real problem is that the resource is *NOT*
too active, but pacemaker thinks it is, which leads to an undesirable
recovery procedure. See the log lines below:

Feb 04 11:27:38 [45167] gol-5-7-0    pengine:  warning: unpack_rsc_op:     Processing failed op monitor for GOL-HA on gol-5-7-0: unknown error (1)
Feb 04 11:27:38 [45167] gol-5-7-0    pengine:  warning: unpack_rsc_op:     Processing failed op monitor for GOL-HA on gol-5-7-6: unknown error (1)
Feb 04 11:27:38 [45167] gol-5-7-0    pengine:    error: native_create_actions:     Resource GOL-HA (ocf::script.sh) is active on 2 nodes attempting recovery
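
One possible reading of the log lines above, not confirmed anywhere in this thread: "unknown error (1)" is the OCF generic error code, and the OCF convention is that monitor returns 7 (OCF_NOT_RUNNING) for a cleanly stopped resource. A monitor that returns 1 on a node where the service is simply not running may be treated by pacemaker as a failed, possibly active instance there. Below is a minimal sketch of a monitor function that keeps the two cases apart. The function name, the pid file path, and the pid-file approach itself are assumptions for illustration; the real script.sh is not shown in this thread.

```shell
#!/bin/sh
# OCF return codes used here (values defined by the OCF resource agent API).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical pid file location; not taken from the original post.
PIDFILE="${PIDFILE:-/var/run/gol-ha.pid}"

# Hypothetical monitor action for the GOL-HA resource.
gol_ha_monitor() {
    # No pid file: the service is cleanly stopped on this node.
    [ -f "$PIDFILE" ] || return $OCF_NOT_RUNNING

    pid=$(cat "$PIDFILE" 2>/dev/null)
    # Empty or unreadable pid file: a genuine fault, so report a generic error.
    [ -n "$pid" ] || return $OCF_ERR_GENERIC

    # Signal 0 only checks that the process exists; nothing is delivered.
    if kill -0 "$pid" 2>/dev/null; then
        return $OCF_SUCCESS
    fi

    # Stale pid file: the process has gone away, so report "not running".
    return $OCF_NOT_RUNNING
}
```

Running the existing agent's monitor action by hand on the standby node and checking `$?` would show whether it returns 1 or 7 there.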

On 02/10/2014 09:43 PM, Digimer wrote:
> On 10/02/14 09:13 PM, Aggarwal, Ajay wrote:
>> I have a 2-node cluster with no-quorum-policy=ignore. I call these
>> nodes node-0 and node-1. In addition, I have two cluster resources
>> in a group: an IP address and an OCF script.
>
> Turning off quorum on a 2-node cluster is fine; in fact, it's
> required. However, that makes stonith all the more important. Without
> stonith, in any cluster but in particular on two-node clusters, things
> will not work right.
>
> First and foremost: configure stonith and test it to make sure it works.
>
>>     Pacemaker version: 1.1.10
>>     Corosync version: 1-4.1-15
>>     OS: CentOS 6.4
>
> With CentOS/RHEL 6, you need cman as well. Please be sure to also
> configure fence_pcmk in cluster.conf to "hook" it into pacemaker's
> real fencing.
>
>> What am I doing wrong?
> <snip>
>>          <nvpair id="cib-bootstrap-options-stonith-enabled"
>> name="stonith-enabled" value="false"/>
>
> That. :)
>
> Once you have stonith working, see if the problem remains.
>
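
The fence_pcmk hook mentioned in the quoted reply can be sketched with the `ccs` tool on CentOS/RHEL 6. The function below only prints the commands, so they can be reviewed before anything is changed. The function name is hypothetical, the cluster.conf path is the usual default and is assumed here, and the node names are taken from the log lines earlier in this message.

```shell
#!/bin/sh
# Path to cluster.conf; the default location is assumed here.
CONF="${CONF:-/etc/cluster/cluster.conf}"

# Print (do not run) the ccs commands that redirect cman fencing
# requests to pacemaker through the fence_pcmk agent.
print_fence_pcmk_cmds() {
    # One fence device backed by fence_pcmk, shared by all nodes.
    echo "ccs -f $CONF --addfencedev pcmk agent=fence_pcmk"
    for node in "$@"; do
        # A fence method per node, plus an instance pointing at that node.
        echo "ccs -f $CONF --addmethod pcmk-redirect $node"
        echo "ccs -f $CONF --addfenceinst pcmk $node pcmk-redirect port=$node"
    done
}

print_fence_pcmk_cmds gol-5-7-0 gol-5-7-6
```

fence_pcmk only forwards fencing requests from cman to pacemaker. A working stonith device still has to be configured in pacemaker, with stonith-enabled set to true, for the redirect to have any effect.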
