[Pacemaker] Issues with Pacemaker / Corosync

Andrew Beekhof andrew at beekhof.net
Wed Jan 4 20:15:02 EST 2012

On Sat, Dec 24, 2011 at 9:44 AM, Arnold Krille <arnold at arnoldarts.de> wrote:
> Hi,
> On Friday 23 December 2011 16:03:37 Aravind M D wrote:
>>   I am facing some problems with my corosync and pacemaker implementation.
>> I have configured a cluster on Debian squeeze; the packages for corosync and
>> pacemaker are installed from backports.
>>   I am configuring a two-node cluster and I have configured one resource
>> as well. Below is my configuration.
>>   root at nagt02a:~# crm configure show
>>   node nagt02
>>   node nagt02a
>>   primitive icinga lsb:icinga \
>>           op start interval="0" timeout="30s" \
>>           op stop interval="0" timeout="30s" \
>>           op monitor interval="30s" \
>>           meta multiple-active="stop_start"
>>   location prefer-nagt02 icinga 10: nagt02
>>   property $id="cib-bootstrap-options" \
>>           dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
>>           cluster-infrastructure="openais" \
>>           expected-quorum-votes="2" \
>>           stonith-enabled="false" \
>>           no-quorum-policy="ignore"
>>   Problem 1: When the service is active on nagt02, and if I manually start
>> the service on cgnagt02a, the service is not stopped on nagt02a.
> I found that it will be stopped, but not as fast as you think it will.

No, it won't.
At least not with this configuration (and possibly not this version).

You'd need to explicitly monitor the 'stopped' role to force the
cluster into this level of paranoia.
By default we assume people with root access are not trying to corrupt
their data by manually starting cluster services.
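A sketch of what that would look like (untested; the second monitor interval and timeout here are illustrative, and the two monitor operations must use different intervals):

```shell
# Redefine the primitive with an extra monitor on the "Stopped" role, so
# nodes where the resource is supposed to be stopped are also probed:
crm configure primitive icinga lsb:icinga \
        op start interval="0" timeout="30s" \
        op stop interval="0" timeout="30s" \
        op monitor interval="30s" \
        op monitor interval="45s" role="Stopped" timeout="30s" \
        meta multiple-active="stop_start"
```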

> The
> monitoring action only runs on the active resource. But every now and then (I
> think every five to ten minutes, though that is configurable) the cluster checks
> the whole status and therefore also detects services running where they
> shouldn't. With this you will probably find that once pacemaker sees the second
> icinga, it will stop it on both nodes to be safe and then restart it on one node.
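For reference, that periodic full recheck is governed by the cluster-recheck-interval property (a sketch; the value shown is illustrative, and the default in recent versions is on the order of 15 minutes):

```shell
# Shorten the interval at which the cluster re-evaluates the whole
# configuration and state, so stray instances are noticed sooner:
crm configure property cluster-recheck-interval="5min"
```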
>>   Problem 2: For testing, I stopped the service on nagt02 and made some
>> changes to the configuration files so the service won't start again on
>> nagt02. What I am testing is: when the node comes back from a failover and
>> the service was not able to start on nagt02, it should start on nagt02a.
>> But I am getting the error below.
>>   root at cgnagt02:~# crm_mon --one-shot
>>   Online: [ cgnagt02 cgnagt02a ]
>>    icinga (lsb:icinga):   Started cgnagt02 (unmanaged) FAILED
>>   Failed actions:
>>       icinga_monitor_30000 (node=cgnagt02, call=4, rc=6, status=complete):
>> not configured
>>       icinga_stop_0 (node=cgnagt02, call=5, rc=6, status=complete): not
>> configured
> Looks as if making the service "not start" also made the service "not
> stop". And pacemaker won't start a service on one node when it can't
> definitely shut it down on the other node. Unless you configure fencing and
> the failed host gets killed by that, I guess.

Correct. You can also use 'crm resource cleanup icinga' to have the
cluster redetect the current resource state.
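In full, that looks something like this (run after fixing whatever broke the stop action, so the re-probe reflects the real state):

```shell
# Clear the failed-action history for the resource; the cluster then
# re-probes the resource on all nodes and recomputes placement:
crm resource cleanup icinga

# Verify the result:
crm_mon --one-shot
```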

> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
