[ClusterLabs] In N+1 cluster, adding/deleting one resource causes resources on other nodes to restart

Ken Gaillot kgaillot at redhat.com
Mon May 15 16:41:44 CEST 2017


On 05/15/2017 06:59 AM, Klaus Wenninger wrote:
> On 05/15/2017 12:25 PM, Anu Pillai wrote:
>> Hi Klaus,
>>
>> Please find attached cib.xml as well as corosync.conf.

Maybe you're only setting this while testing, but having
stonith-enabled=false and no-quorum-policy=ignore is highly dangerous in
any kind of network split.
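
For a production cluster, I'd configure a fence device and re-enable
both. As a rough sketch with the pcs shell (the fence agent and its
options below are only placeholders; substitute whatever matches your
hardware):

    # example fence device; agent and parameters are placeholders
    pcs stonith create my-fence fence_ipmilan ipaddr=... login=... passwd=...
    pcs property set stonith-enabled=true
    pcs property set no-quorum-policy=stop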

FYI, default-action-timeout is deprecated in favor of setting a timeout
in op_defaults, but it doesn't hurt anything.
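
With pcs, that would look something like this (120s is just an example
value; pick whatever suits your resource agents):

    pcs resource op defaults timeout=120s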

> Why wouldn't you keep placement-strategy at its default, to keep
> things simple? You aren't using any load balancing anyway, as far
> as I understood it.

It looks like the intent is to use placement-strategy to limit each node
to 1 resource. The configuration looks good for that.
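
For the archives, the usual shape of that configuration with pcs looks
something like the following (the attribute name "capacity" and the
node/resource names are arbitrary examples, not from the posted cib.xml):

    pcs property set placement-strategy=utilization
    # give each node room for exactly one resource
    pcs node utilization node1 capacity=1
    # ... repeat for each node ...
    # and have each resource consume one unit
    pcs resource utilization rsc1 capacity=1
    # ... repeat for each resource ...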

> I haven't used resource-stickiness=INF, so I have no idea what
> strange behavior that might trigger. Try setting it just higher than
> what the other scores might sum up to.

Either way would be fine. Using INFINITY ensures that no other
combination of scores will override it.
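
If you want it cluster-wide rather than per resource, setting it as a
resource default works too, e.g. with pcs:

    pcs resource defaults resource-stickiness=INFINITY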

> I might have overlooked something in your scores, but otherwise
> there is nothing obvious to me.
> 
> Regards,
> Klaus

I don't see anything obvious either. If you have logs around the time of
the incident, that might help.
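
A crm_report archive covering the relevant window would be ideal,
something like (adjust the times and destination to your incident):

    crm_report -f "2017-05-15 10:00:00" -t "2017-05-15 12:00:00" /tmp/restart-issue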

>> Regards,
>> Aswathi
>>
>> On Mon, May 15, 2017 at 2:46 PM, Klaus Wenninger
>> <kwenning at redhat.com> wrote:
>>
>>     On 05/15/2017 09:36 AM, Anu Pillai wrote:
>>     > Hi,
>>     >
>>     > We are running a Pacemaker cluster to manage our resources.
>>     > We have 6 systems running 5 resources, and one is acting as
>>     > standby. We have a restriction that only one resource can run
>>     > on a node. But our observation is that whenever we add or
>>     > delete a resource from the cluster, all the remaining
>>     > resources in the cluster are stopped and started again.
>>     >
>>     > Can you please guide us on whether this is normal behavior,
>>     > or whether we are missing some configuration that is leading
>>     > to this issue.
>>
>>     It should definitely be possible to prevent this behavior.
>>     If you share your config with us, we might be able to
>>     track that down.
>>
>>     Regards,
>>     Klaus
>>
>>     >
>>     > Regards
>>     > Aswathi


