[ClusterLabs] All clones are stopped when one of them fails

Thu Dec 10 04:13:54 EST 2020

On Thu, Dec 10, 2020 at 1:08 AM Reid Wahl <nwahl at redhat.com> wrote:
>
> Thanks. I see it's only reproducible with stonith-enabled=false.
> That's the step I was skipping previously, as I always have stonith
> enabled in my clusters.
>
> I'm not sure whether that's expected behavior for some reason when
> stonith is disabled. Maybe someone else (e.g., Ken) can weigh in.

Never mind. This was a mistake on my part: I didn't re-add the stonith
**device** configuration when I re-enabled stonith.

So the behavior is the same regardless of whether stonith is enabled
or not. I attribute it to the OCF_ERR_CONFIGURED error.

Why exactly is this behavior unexpected, from your point of view?

Ref:
  - https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Administration/#_how_are_ocf_return_codes_interpreted
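
To illustrate the distinction the linked section draws, here is a minimal
shell sketch that maps an OCF error code to the recovery type Pacemaker
applies. It is only an illustration: ocf_recovery_type is a hypothetical
helper, not something Pacemaker ships, and the mapping covers only error
codes 1 through 6 as that documentation describes them.

```sh
#!/bin/sh
# Hypothetical helper (not part of Pacemaker): classify an OCF error code
# by recovery type, following the "How are OCF return codes interpreted"
# table. Only codes 1-6 are covered; anything else prints "other".
ocf_recovery_type() {
    case "$1" in
        1)       echo "soft"  ;;  # OCF_ERR_GENERIC: recover here or move elsewhere
        2|3|4|5) echo "hard"  ;;  # ERR_ARGS..ERR_INSTALLED: do not retry on this node
        6)       echo "fatal" ;;  # OCF_ERR_CONFIGURED: do not start on any node
        *)       echo "other" ;;
    esac
}

ocf_recovery_type 6   # prints "fatal"
```

With rc=6 on the start operation, the "fatal" type would account for every
clone instance being stopped rather than only the one on the failed node.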

> I also noticed that the state4.xml file has a return code of 6 for the
> resource's start operation. That's an OCF_ERR_CONFIGURED, which is a
> fatal error. At least for primitive resources, this type of error
> prevents the resource from starting anywhere. So I'm somewhat
> surprised that the clone instances don't stop on all nodes even when
> fencing **is** enabled.
>
>
> Without stonith:
>
> Allocation scores:
> pcmk__clone_allocate: vg.bv_sanlock-clone allocation score on node1: -INFINITY
> pcmk__clone_allocate: vg.bv_sanlock-clone allocation score on node2: -INFINITY
> pcmk__clone_allocate: vg.bv_sanlock:0 allocation score on node1: -INFINITY
> pcmk__clone_allocate: vg.bv_sanlock:0 allocation score on node2: -INFINITY
> pcmk__clone_allocate: vg.bv_sanlock:1 allocation score on node1: -INFINITY
> pcmk__clone_allocate: vg.bv_sanlock:1 allocation score on node2: -INFINITY
> pcmk__native_allocate: vg.bv_sanlock:0 allocation score on node1: -INFINITY
> pcmk__native_allocate: vg.bv_sanlock:0 allocation score on node2: -INFINITY
> pcmk__native_allocate: vg.bv_sanlock:1 allocation score on node1: -INFINITY
> pcmk__native_allocate: vg.bv_sanlock:1 allocation score on node2: -INFINITY
>
> Transition Summary:
>  * Stop       vg.bv_sanlock:0     ( node2 )   due to node availability
>  * Stop       vg.bv_sanlock:1     ( node1 )   due to node availability
>
> Executing cluster transition:
>  * Pseudo action:   vg.bv_sanlock-clone_stop_0
>  * Resource action: vg.bv_sanlock   stop on node2
>  * Resource action: vg.bv_sanlock   stop on node1
>  * Pseudo action:   vg.bv_sanlock-clone_stopped_0
>
>
>
> With stonith:
>
> Allocation scores:
> pcmk__clone_allocate: vg.bv_sanlock-clone allocation score on node1: -INFINITY
> pcmk__clone_allocate: vg.bv_sanlock-clone allocation score on node2: -INFINITY
> pcmk__clone_allocate: vg.bv_sanlock:0 allocation score on node1: -INFINITY
> pcmk__clone_allocate: vg.bv_sanlock:0 allocation score on node2: -INFINITY
> pcmk__clone_allocate: vg.bv_sanlock:1 allocation score on node1: -INFINITY
> pcmk__clone_allocate: vg.bv_sanlock:1 allocation score on node2: -INFINITY
> pcmk__native_allocate: vg.bv_sanlock:0 allocation score on node1: -INFINITY
> pcmk__native_allocate: vg.bv_sanlock:0 allocation score on node2: -INFINITY
> pcmk__native_allocate: vg.bv_sanlock:1 allocation score on node1: -INFINITY
> pcmk__native_allocate: vg.bv_sanlock:1 allocation score on node2: -INFINITY
>
> Transition Summary:
>
> Executing cluster transition:
>
> On Wed, Dec 9, 2020 at 10:33 PM Pavel Levshin <lpk at 581.spb.su> wrote:
> >
> >
> > See the file attached. This one has been produced and tested with
> > pacemaker 1.1.16 (RHEL 7).
> >
> >
> > --
> >
> > Pavel
> >
> >
> > 08.12.2020 10:14, Reid Wahl :
> > > Can you provide the state4.xml file that you're using? I'm unable to
> > > reproduce this issue by causing the clone instance to fail on one node.
> > >
> > > Might need some logs as well.
> > >
> > > On Mon, Dec 7, 2020 at 10:40 PM Pavel Levshin <lpk at 581.spb.su> wrote:
> > >> Hello.
> > >>
> > >>
> > >> Despite many years of Pacemaker use, it never stops fooling me...
> > >>
> > >>
> > >> This time, I have faced a trivial problem. In my new setup, the cluster consists of several identical nodes. A clone resource (vg.sanlock) is started on every node, ensuring it has access to SAN storage. Almost all other resources are colocated and ordered after vg.sanlock.
> > >>
> > >>
> > >> Today I started a node, and vg.sanlock failed to start. The cluster then decided to stop all the clone instances "due to node availability", taking down all other resources through their dependencies. This seems illogical to me. When a clone instance fails, I would prefer to see it stop on that one node only. How do I do this properly?
> > >>
> > >>
> > >> I've tried this config with Pacemaker 2.0.3 and 1.1.16; the behaviour is the same.
> > >>
> > >>
> > >> Reduced test config here:
> > >>
> > >>
> > >> pcs cluster auth test-pcmk0 test-pcmk1 <>/dev/tty
> > >> pcs cluster setup --name test-pcmk test-pcmk0 test-pcmk1 --transport udpu \
> > >>    --auto_tie_breaker 1
> > >> pcs cluster start --all --wait=60
> > >> pcs cluster cib tmp-cib.xml
> > >> cp tmp-cib.xml tmp-cib.xml.deltasrc
> > >> pcs -f tmp-cib.xml property set stonith-enabled=false
> > >> pcs -f tmp-cib.xml resource defaults resource-stickiness=100
> > >> pcs -f tmp-cib.xml resource create vg.sanlock ocf:pacemaker:Dummy \
> > >>    op monitor interval=10 timeout=20 start interval=0s stop interval=0s \
> > >>    timeout=20
> > >> pcs -f tmp-cib.xml resource clone vg.sanlock interleave=true
> > >> pcs cluster cib-push tmp-cib.xml diff-against=tmp-cib.xml.deltasrc
> > >>
> > >>
> > >>
> > >> And here goes cluster reaction to the failure:
> > >>
> > >>
> > >> # crm_simulate -x state4.xml -S
> > >>
> > >> Current cluster status:
> > >> Online: [ test-pcmk0 test-pcmk1 ]
> > >>
> > >> Clone Set: vg.sanlock-clone [vg.sanlock]
> > >>       vg.sanlock      (ocf::pacemaker:Dummy): FAILED test-pcmk0
> > >>       Started: [ test-pcmk1 ]
> > >>
> > >> Transition Summary:
> > >> * Stop       vg.sanlock:0     ( test-pcmk1 )   due to node availability
> > >> * Stop       vg.sanlock:1     ( test-pcmk0 )   due to node availability
> > >>
> > >> Executing cluster transition:
> > >> * Pseudo action:   vg.sanlock-clone_stop_0
> > >> * Resource action: vg.sanlock   stop on test-pcmk1
> > >> * Resource action: vg.sanlock   stop on test-pcmk0
> > >> * Pseudo action:   vg.sanlock-clone_stopped_0
> > >> * Pseudo action:   all_stopped
> > >>
> > >> Revised cluster status:
> > >> Online: [ test-pcmk0 test-pcmk1 ]
> > >>
> > >> Clone Set: vg.sanlock-clone [vg.sanlock]
> > >>       Stopped: [ test-pcmk0 test-pcmk1 ]
> > >>
> > >>
> > >> As a side note, if I make those clones globally-unique, they seem to behave properly, but I have found no reference to this solution anywhere. In general, globally-unique clones are mentioned only where resource agents distinguish between clone instances, which is not the case here.
> > >>
> > >>
> > >> --
> > >>
> > >> Thanks,
> > >>
> > >> Pavel
> > >>
> > >>
> > >>
> > >> _______________________________________________
> > >> Manage your subscription:
> > >> https://lists.clusterlabs.org/mailman/listinfo/users
> > >>
> > >> ClusterLabs home: https://www.clusterlabs.org/
> > >
> > >

-- 
Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA