[ClusterLabs] In N+1 cluster, add/delete of one resource result in other node resources to restart
Anu Pillai
anu.pillai.subscrib at gmail.com
Wed May 24 01:58:26 EDT 2017
Blank response so that the thread appears in my mailbox; please ignore.
On Tue, May 23, 2017 at 4:21 AM, Ken Gaillot <kgaillot at redhat.com> wrote:
> On 05/16/2017 04:34 AM, Anu Pillai wrote:
> > Hi,
> >
> > Please find attached debug logs for the stated problem as well as
> > crm_mon command outputs.
> > In this case we are trying to remove/delete res3 and system/node
> > (0005B94238BC) from the cluster.
> >
> > *_Test reproduction steps_*
> >
> > Current Configuration of the cluster:
> > 0005B9423910 - res2
> > 0005B9427C5A - res1
> > 0005B94238BC - res3
> >
> > *crm_mon output:*
> >
> > Defaulting to one-shot mode
> > You need to have curses available at compile time to enable console mode
> > Stack: corosync
> > Current DC: 0005B9423910 (version 1.1.14-5a6cdd1) - partition with quorum
> > Last updated: Tue May 16 12:21:23 2017 Last change: Tue May 16
> > 12:13:40 2017 by root via crm_attribute on 0005B9423910
> >
> > 3 nodes and 3 resources configured
> >
> > Online: [ 0005B94238BC 0005B9423910 0005B9427C5A ]
> >
> > res2 (ocf::redundancy:RedundancyRA): Started 0005B9423910
> > res1 (ocf::redundancy:RedundancyRA): Started 0005B9427C5A
> > res3 (ocf::redundancy:RedundancyRA): Started 0005B94238BC
> >
> >
> > Trigger the delete operation for res3 and node 0005B94238BC.
> >
> > The following commands were applied from node 0005B94238BC:
> > $ pcs resource delete res3 --force
> > $ crm_resource -C res3
> > $ pcs cluster stop --force
>
> I don't think "pcs resource delete" or "pcs cluster stop" does anything
> with the --force option. In any case, --force shouldn't be needed here.
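>
> For reference, the plain forms (a sketch, run from 0005B94238BC as in
> your steps) would simply be:
>
> $ pcs resource delete res3
> $ pcs cluster stop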
>
> The crm_mon output you see is actually not what it appears. It starts with:
>
> May 16 12:21:27 [4661] 0005B9423910 crmd: notice: do_lrm_invoke:
> Forcing the status of all resources to be redetected
>
> This is usually the result of a "cleanup all" command. It works by
> erasing the resource history, causing pacemaker to re-probe all nodes to
> get the current state. The history erasure makes it appear to crm_mon
> that the resources are stopped, but they actually are not.
>
> In this case, I'm not sure why it's doing a "cleanup all", since you
> only asked it to cleanup res3. Maybe in this particular instance, you
> actually did "crm_resource -C"?
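>
> For comparison, with the 1.1.x crm_resource syntax (sketch):
>
> $ crm_resource -C -r res3   # clean up history for res3 only
> $ crm_resource -C           # no -r: forces re-detection of all resources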
>
> > The following command was applied from the DC (0005B9423910):
> > $ crm_node -R 0005B94238BC --force
>
> This can cause problems. This command shouldn't be run unless the node
> is removed from both pacemaker's and corosync's configuration. If you
> actually are trying to remove the node completely, a better alternative
> would be "pcs cluster node remove 0005B94238BC", which will handle all
> of that for you. If you're not trying to remove the node completely,
> then you shouldn't need this command at all.
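>
> A sketch of the complete removal, run from one of the remaining nodes
> (assuming pcs manages corosync.conf for you):
>
> $ pcs cluster node remove 0005B94238BC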
>
> >
> >
> > *crm_mon output:*
> > Defaulting to one-shot mode
> > You need to have curses available at compile time to enable console mode
> > Stack: corosync
> > Current DC: 0005B9423910 (version 1.1.14-5a6cdd1) - partition with quorum
> > Last updated: Tue May 16 12:21:27 2017 Last change: Tue May 16
> > 12:21:26 2017 by root via cibadmin on 0005B94238BC
> >
> > 3 nodes and 2 resources configured
> >
> > Online: [ 0005B94238BC 0005B9423910 0005B9427C5A ]
> >
> >
> > The observation is that the remaining two resources, res2 and res1,
> > were stopped and started.
> >
> >
> > Regards,
> > Aswathi
> >
> > On Mon, May 15, 2017 at 8:11 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
> >
> > On 05/15/2017 06:59 AM, Klaus Wenninger wrote:
> > > On 05/15/2017 12:25 PM, Anu Pillai wrote:
> > >> Hi Klaus,
> > >>
> > >> Please find attached cib.xml as well as corosync.conf.
> >
> > Maybe you're only setting this while testing, but having
> > stonith-enabled=false and no-quorum-policy=ignore is highly dangerous
> > in any kind of network split.
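> >
> > If those settings are only for testing, the production-safe direction
> > would be something like configuring a fence device and then (sketch):
> >
> > $ pcs property set stonith-enabled=true
> > $ pcs property set no-quorum-policy=stop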
> >
> > FYI, default-action-timeout is deprecated in favor of setting a
> > timeout in op_defaults, but it doesn't hurt anything.
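> >
> > For example (pcs 0.9 syntax; the 120s value is just a placeholder):
> >
> > $ pcs resource op defaults timeout=120s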
> >
> > > Why wouldn't you keep placement-strategy with default
> > > to keep things simple. You aren't using any load-balancing
> > > anyway as far as I understood it.
> >
> > It looks like the intent is to use placement-strategy to limit each
> > node to 1 resource. The configuration looks good for that.
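> >
> > For reference, that pattern is usually expressed with a placement
> > strategy plus utilization attributes, roughly (sketch; the "capacity"
> > attribute name is arbitrary, and this repeats per node/resource):
> >
> > $ pcs property set placement-strategy=utilization
> > $ crm_attribute --node 0005B9423910 --utilization --name capacity --update 1
> > $ crm_resource --resource res2 --utilization --set-parameter capacity --parameter-value 1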
> >
> > > Haven't used resource-stickiness=INF. No idea which strange
> > > behavior that triggers. Try to have it just higher than what
> > > the other scores might sum up to.
> >
> > Either way would be fine. Using INFINITY ensures that no other
> > combination of scores will override it.
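> >
> > E.g., as a resource default (sketch):
> >
> > $ pcs resource defaults resource-stickiness=INFINITY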
> >
> > > I might have overlooked something in your scores but otherwise
> > > there is nothing obvious to me.
> > >
> > > Regards,
> > > Klaus
> >
> > I don't see anything obvious either. If you have logs around the
> > time of the incident, that might help.
> >
> > >> Regards,
> > >> Aswathi
> > >>
> > >> On Mon, May 15, 2017 at 2:46 PM, Klaus Wenninger
> > >> <kwenning at redhat.com> wrote:
> > >>
> > >> On 05/15/2017 09:36 AM, Anu Pillai wrote:
> > >> > Hi,
> > >> >
> > >> > We are running a pacemaker cluster for managing our resources.
> > >> > We have 6 systems running 5 resources, and one is acting as
> > >> > standby. We have a restriction that only one resource can run
> > >> > on one node. But our observation is that whenever we add or
> > >> > delete a resource from the cluster, all the remaining resources
> > >> > in the cluster are stopped and started back.
> > >> >
> > >> > Can you please guide us on whether this is normal behavior or
> > >> > whether we are missing any configuration that is leading to
> > >> > this issue.
> > >>
> > >> It should definitely be possible to prevent this behavior.
> > >> If you share your config with us we might be able to
> > >> track that down.
> > >>
> > >> Regards,
> > >> Klaus
> > >>
> > >> >
> > >> > Regards
> > >> > Aswathi
>