[ClusterLabs] crmsh resource failcount does not appear to work

Wed Dec 27 11:03:57 UTC 2017

On Wed, Dec 27, 2017 at 11:40 AM, Kristoffer Grönlund
<deceiver.g at gmail.com> wrote:
>
> Andrei Borzenkov <arvidjaar at gmail.com> writes:
>
> > As far as I can tell, pacemaker acts on failcount attributes qualified
> > by operation name, while crm sets/queries an unqualified attribute; I do
> > not see any syntax to set fail-count for a specific operation in crmsh.
>
> crmsh uses crm_attribute to get the failcount. It could be that this
> usage has stopped working as of 1.1.17.
>

There is probably a misunderstanding. The problem is which attribute is
used, not how it is set. crmsh sets (and, as far as I can tell, has
always set) an attribute named fail-count-<resource>, while pacemaker
internally sets and queries attributes named
fail-count-<resource>#<operation>.
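
To illustrate the two naming schemes, here is a minimal Python sketch that
splits a fail-count attribute name into its parts. The function name
parse_failcount_name is hypothetical, and the "<operation>_<interval>"
layout of the qualified form is inferred only from the cibadmin output
in the quoted transcript, not from the pacemaker source.

```python
import re

# Assumption: attribute names take one of the two forms visible in the
# quoted cibadmin output:
#   fail-count-<resource>
#   fail-count-<resource>#<operation>_<interval>
_FAILCOUNT_RE = re.compile(
    r"^fail-count-(?P<rsc>[^#]+)(?:#(?P<op>.+)_(?P<interval>\d+))?$"
)


def parse_failcount_name(name):
    """Return (resource, operation, interval), or None if name does not match.

    operation and interval are None for the unqualified form.
    """
    m = _FAILCOUNT_RE.match(name)
    if m is None:
        return None
    interval = m.group("interval")
    return (
        m.group("rsc"),
        m.group("op"),
        int(interval) if interval is not None else None,
    )


# The unqualified name that crmsh writes, and the qualified name that
# pacemaker writes, both taken from the transcript.
print(parse_failcount_name("fail-count-rsc_Stateful_1"))
print(parse_failcount_name("fail-count-rsc_Stateful_1#monitor_10000"))
```

The two names differ for the same resource, so a tool that reads or
writes only the first form never touches the second.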

It is possible, of course, that this changed in recent pacemaker
versions ... yep, here is the crm_failcount commit that implemented the
new (per-operation) failcounts. This means that "crm resource failcount
set" without qualifying by operation is simply not valid ... actually,
crm_failcount will refuse to set a failcount at all (it can only clear
it).

https://github.com/ClusterLabs/pacemaker/commit/8323616179dc3f8038c6a69e7323757bd1feacb1#diff-6e58482648938fd488a920b9902daac4
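
To make the mismatch in the quoted transcript concrete, here is a rough
Python sketch that reads the two nvpair elements from the cibadmin output
and reports the per-operation total separately from the unqualified
attribute. It assumes that the effective failcount is the sum of the
per-operation attributes, which is consistent with crm_failcount -G
reporting 1 in the transcript but is not verified against the pacemaker
source. The failcounts helper and the <attrs> wrapper element are
hypothetical.

```python
import xml.etree.ElementTree as ET

# The two nvpair elements from the "cibadmin -Q | grep fail-count" output,
# wrapped in a hypothetical <attrs> root so they parse as one document.
SAMPLE = """<attrs>
  <nvpair id="status-1084752129-fail-count-rsc_Stateful_1.monitor_10000"
          name="fail-count-rsc_Stateful_1#monitor_10000" value="1"/>
  <nvpair id="status-1084752129-fail-count-rsc_Stateful_1"
          name="fail-count-rsc_Stateful_1" value="4"/>
</attrs>"""


def failcounts(xml_text, resource):
    """Return (per_operation_total, unqualified_value) for one resource.

    Assumption: values are plain integers; a value such as INFINITY is
    not handled in this sketch.
    """
    prefix = "fail-count-" + resource
    per_op_total = 0
    unqualified = None
    for nv in ET.fromstring(xml_text).iter("nvpair"):
        name = nv.get("name", "")
        value = nv.get("value", "0")
        if name == prefix:
            # The attribute that "crm resource failcount" sets and shows.
            unqualified = int(value)
        elif name.startswith(prefix + "#"):
            # A per-operation attribute, as written by pacemaker itself.
            per_op_total += int(value)
    return per_op_total, unqualified


per_op, unqualified = failcounts(SAMPLE, "rsc_Stateful_1")
print("per-operation total:", per_op)
print("unqualified attribute:", unqualified)
```

The two numbers are stored under different names, which is why setting
one through crmsh leaves the other unchanged.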

>
> Cheers,
> Kristoffer
>
> >
> > ha1:~ # rpm -q crmsh
> > crmsh-4.0.0+git.1511604050.816cb0f5-1.1.noarch
> > ha1:~ # crm_mon -1rf
> > Stack: corosync
> > Current DC: ha2 (version 1.1.17-3.3-36d2962a8) - partition with quorum
> > Last updated: Sun Dec 24 10:55:54 2017
> > Last change: Sun Dec 24 10:55:47 2017 by hacluster via crmd on ha2
> >
> > 2 nodes configured
> > 4 resources configured
> >
> > Online: [ ha1 ha2 ]
> >
> > Full list of resources:
> >
> >  stonith-sbd  (stonith:external/sbd): Started ha1
> >  rsc_dummy_1  (ocf::pacemaker:Dummy): Started ha2
> >  Master/Slave Set: ms_Stateful_1 [rsc_Stateful_1]
> >      Masters: [ ha1 ]
> >      Slaves: [ ha2 ]
> >
> > Migration Summary:
> > * Node ha2:
> > * Node ha1:
> > ha1:~ # echo xxx > /run/Stateful-rsc_Stateful_1.state
> > ha1:~ # crm_failcount -G -r rsc_Stateful_1
> > scope=status  name=fail-count-rsc_Stateful_1 value=1
> > ha1:~ # crm resource failcount rsc_Stateful_1 show ha1
> > scope=status  name=fail-count-rsc_Stateful_1 value=0
> > ha1:~ # crm resource failcount rsc_Stateful_1 set ha1 4
> > ha1:~ # crm_failcount -G -r rsc_Stateful_1
> > scope=status  name=fail-count-rsc_Stateful_1 value=1
> > ha1:~ # crm resource failcount rsc_Stateful_1 show ha1
> > scope=status  name=fail-count-rsc_Stateful_1 value=4
> > ha1:~ # cibadmin -Q | grep fail-count
> >           <nvpair
> > id="status-1084752129-fail-count-rsc_Stateful_1.monitor_10000"
> > name="fail-count-rsc_Stateful_1#monitor_10000" value="1"/>
> >           <nvpair id="status-1084752129-fail-count-rsc_Stateful_1"
> > name="fail-count-rsc_Stateful_1" value="4"/>
> > ha1:~ #
> >
>
> --
> // Kristoffer Grönlund
> // kgronlund at suse.com