[ClusterLabs] In N+1 cluster, add/delete of one resource result in other node resources to restart
Ken Gaillot
kgaillot at redhat.com
Fri May 19 17:53:52 CEST 2017
On 05/19/2017 04:14 AM, Anu Pillai wrote:
> Hi Ken,
>
> Did you get any chance to go through the logs?
Sorry, not yet.
> Do you need any more details?
>
> Regards,
> Aswathi
>
> On Tue, May 16, 2017 at 3:04 PM, Anu Pillai
> <anu.pillai.subscrib at gmail.com> wrote:
>
> Hi,
>
> Please find attached debug logs for the stated problem as well as
> crm_mon command outputs.
> In this case, we are trying to remove/delete res3 and the system/node
> (0005B94238BC) from the cluster.
>
> *_Test reproduction steps_*
>
> Current Configuration of the cluster:
> 0005B9423910 - res2
> 0005B9427C5A - res1
> 0005B94238BC - res3
>
> *crm_mon output:*
>
> Defaulting to one-shot mode
> You need to have curses available at compile time to enable console mode
> Stack: corosync
> Current DC: 0005B9423910 (version 1.1.14-5a6cdd1) - partition with quorum
> Last updated: Tue May 16 12:21:23 2017
> Last change: Tue May 16 12:13:40 2017 by root via crm_attribute on 0005B9423910
>
> 3 nodes and 3 resources configured
>
> Online: [ 0005B94238BC 0005B9423910 0005B9427C5A ]
>
> res2 (ocf::redundancy:RedundancyRA): Started 0005B9423910
> res1 (ocf::redundancy:RedundancyRA): Started 0005B9427C5A
> res3 (ocf::redundancy:RedundancyRA): Started 0005B94238BC
>
>
> Trigger the delete operation for res3 and node 0005B94238BC.
>
> The following commands were applied from node 0005B94238BC:
> $ pcs resource delete res3 --force
> $ crm_resource -C res3
> $ pcs cluster stop --force
>
> The following command was applied from the DC (0005B9423910):
> $ crm_node -R 0005B94238BC --force
>
>
> *crm_mon output:*
> Defaulting to one-shot mode
> You need to have curses available at compile time to enable console mode
> Stack: corosync
> Current DC: 0005B9423910 (version 1.1.14-5a6cdd1) - partition with quorum
> Last updated: Tue May 16 12:21:27 2017
> Last change: Tue May 16 12:21:26 2017 by root via cibadmin on 0005B94238BC
>
> 3 nodes and 2 resources configured
>
> Online: [ 0005B94238BC 0005B9423910 0005B9427C5A ]
>
>
> The observation is that the remaining two resources, res2 and res1,
> were stopped and started.
>
>
> Regards,
> Aswathi
>
> On Mon, May 15, 2017 at 8:11 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
>
> On 05/15/2017 06:59 AM, Klaus Wenninger wrote:
> > On 05/15/2017 12:25 PM, Anu Pillai wrote:
> >> Hi Klaus,
> >>
> >> Please find attached cib.xml as well as corosync.conf.
>
> Maybe you're only setting this while testing, but having
> stonith-enabled=false and no-quorum-policy=ignore is highly dangerous
> in any kind of network split.
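>
> A minimal sketch of turning those protections back on with pcs
> (assuming a fence device suitable for your hardware has been
> configured separately, which is not shown here):
>
> $ pcs property set stonith-enabled=true
> $ pcs property set no-quorum-policy=stop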
>
> FYI, default-action-timeout is deprecated in favor of setting a
> timeout in op_defaults, but it doesn't hurt anything.
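>
> For example (a sketch only; the 60-second value is arbitrary and just
> for illustration), the op_defaults equivalent with pcs would be
> something like:
>
> $ pcs resource op defaults timeout=60s
>
> Per-operation timeouts configured on individual resources still
> override this default.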
>
> > Why wouldn't you keep placement-strategy at its default, to keep
> > things simple? You aren't using any load-balancing anyway, as far
> > as I understood it.
>
> It looks like the intent is to use placement-strategy to limit each
> node to 1 resource. The configuration looks good for that.
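>
> As a rough sketch of that pattern with pcs (the utilization attribute
> name "capacity" is just an example, and your cib.xml may already
> define its own equivalent; exact syntax depends on the pcs version):
>
> $ pcs property set placement-strategy=utilization
> $ pcs node utilization 0005B9423910 capacity=1
> $ pcs resource utilization res2 capacity=1
>
> With every node capped at capacity=1 and every resource consuming
> capacity=1, no node can run more than one resource.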
>
> > I haven't used resource-stickiness=INF, so I have no idea what
> > strange behavior that triggers. Try to have it just higher than
> > what the other scores might sum up to.
>
> Either way would be fine. Using INFINITY ensures that no other
> combination of scores will override it.
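>
> For instance (a sketch, assuming pcs), a cluster-wide default could
> be set with either of:
>
> $ pcs resource defaults resource-stickiness=INFINITY
> $ pcs resource defaults resource-stickiness=200
>
> where 200 stands in for any finite value comfortably above the other
> scores in the configuration.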
>
> > I might have overlooked something in your scores, but otherwise
> > there is nothing obvious to me.
> >
> > Regards,
> > Klaus
>
> I don't see anything obvious either. If you have logs around the
> time of the incident, that might help.
>
> >> Regards,
> >> Aswathi
> >>
> >> On Mon, May 15, 2017 at 2:46 PM, Klaus Wenninger <kwenning at redhat.com> wrote:
> >>
> >> On 05/15/2017 09:36 AM, Anu Pillai wrote:
> >> > Hi,
> >> >
> >> > We are running a Pacemaker cluster to manage our resources. We have 6
> >> > systems running 5 resources, with one acting as standby. We have a
> >> > restriction that only one resource can run on any one node. But our
> >> > observation is that whenever we add or delete a resource from the
> >> > cluster, all the remaining resources in the cluster are stopped and
> >> > started back up.
> >> >
> >> > Can you please guide us on whether this is normal behavior, or whether
> >> > we are missing any configuration that is leading to this issue?
> >>
> >> It should definitely be possible to prevent this behavior.
> >> If you share your config with us, we might be able to
> >> track that down.
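> >>
> >> For example, either of these (assuming the pcs and pacemaker CLI
> >> tools are available) would capture the configuration in a
> >> shareable form:
> >>
> >> $ pcs config
> >> $ cibadmin --query > cib.xml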
> >>
> >> Regards,
> >> Klaus
> >>
> >> >
> >> > Regards
> >> > Aswathi
>