[ClusterLabs] Antw: Re: Antw: Re: Constant stop/start of resource in spite of interval=0

Kadlecsik József kadlecsik.jozsef at wigner.mta.hu
Tue May 21 07:17:26 EDT 2019


Hi,

On Tue, 21 May 2019, Ulrich Windl wrote:

> So maybe the original defective RA would be valuable for debugging the 
> issue. I guess the RA was invalid in some way that wasn't detected or 
> handled properly...

With the attached skeleton RA and the setting

primitive testid-testid0 ocf:local:testid \
        params name=testid0 id=0 foo=foo0 \
        op monitor timeout=30s interval=30s \
        meta target-role=Started

I can reproduce it easily. Maybe it's required that the RA and the 
instance be reloadable.
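
To check other agents for the same pattern, the following is a minimal
sketch that scans an RA's meta-data output for a parameter named "id".
The function name has_id_param is made up for illustration and is not
part of pacemaker or resource-agents. The pattern assumes the attribute
is written as name="id" with double quotes, as in the attached skeleton.

```shell
#!/bin/sh
# Hypothetical helper (not part of pacemaker or resource-agents):
# reads RA meta-data XML on stdin and returns 0 if it declares a
# <parameter> whose name attribute is exactly "id", 1 otherwise.
has_id_param() {
    grep -Eq '<parameter([[:space:]][^>]*)?[[:space:]]name="id"'
}
```

Assuming the agent is installed as /usr/lib/ocf/resource.d/local/testid
and OCF_ROOT is /usr/lib/ocf (both are assumptions about the local
layout), it could be used like this:

    OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/local/testid meta-data \
        | has_id_param && echo "agent declares a parameter named id"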

Best regards,
Jozsef

> >>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 21.05.2019 at 09:13 in
> message <bd253405-e98c-251e-e908-1431d6d65bde at gmail.com>:
> > 21.05.2019 0:46, Ken Gaillot wrote:
> >>>  
> >>>> From what's described here, the op-restart-digest is changing every
> >>>> time, which means something is going wrong in the hash comparison
> >>>> (since the definition is not really changing).
> >>>>
> >>>> The log that stands out to me is:
> >>>>
> >>>> trace   May 18 23:02:49 calculate_xml_digest_v1(83):0:
> >>>> digest:source   <parameters id="0"/>
> >>>>
> >>>> The id is the resource name, which isn't "0". That leads me to:
> >>>>
> >>>> trace   May 18 23:02:49 svc_read_output(87):0: Got 499 chars:
> >>>> <parameter name="id" unique="1" required="1">
> >>>>
> >>>> which is the likely source of the problem. "id" is a pacemaker
> >>>> property, not an OCF resource parameter. It shouldn't be in the
> >>>> resource agent meta-data. Remove that, and I bet it will be OK.
> >>>
> >>> I renamed the parameter to "tunnel_id", redefined the resources and 
> >>> started them again.
> >>>  
> >>>> BTW the "every 15 minutes" would be the cluster-recheck-interval
> >>>> cluster property.
> >>>
> >>> I have waited more than half an hour and there has been no more
> >>> stopping/starting of the resources. :-) I hadn't thought that "id"
> >>> was reserved as a parameter name.
> >> 
> >> It isn't, by the OCF standard. :) This could be considered a pacemaker
> >> bug; pacemaker should be able to distinguish its own "id" from an OCF
> >> parameter "id", but it currently can't.
> >> 
> > 
> > 
> > I'm really baffled by this explanation. I tried to create a resource
> > with a unique "id" instance property and I do not observe this problem.
> > No restarts.
> > 
> > As none of the traces provided captures the moment of the
> > restart-digest mismatch, I am also not sure where to look. I do not see
> > "id" being treated specially in any way in the code.
> > 
> > Somewhat interestingly, the restart digest source differs between the
> > two traces:
> > 
> > bor@bor-Latitude-E5450:~$ grep -w 'restart digest' /tmp/trace.log*
> > /tmp/trace.log:trace   May 18 23:02:49 append_restart_list(694):0:
> > restart digest source   <parameters id="0"/>
> > /tmp/trace.log:trace   May 18 23:02:50 append_restart_list(694):0:
> > restart digest source   <parameters id="1"/>
> > /tmp/trace.log.2:trace   May 20 13:56:16 append_restart_list(694):0:
> > restart digest source   <parameters name="eduroam IPv4 tunnel" id="0"/>
> > /tmp/trace.log.2:trace   May 20 13:56:17 append_restart_list(694):0:
> > restart digest source   <parameters name="eduroam IPv4 tunnel" id="0"/>
> > /tmp/trace.log.2:trace   May 20 13:56:18 append_restart_list(694):0:
> > restart digest source   <parameters name="Wigner guest IPv4 tunnel"
> > id="1"/>
> > bor@bor-Latitude-E5450:~$
> > 
> > In one case it does not include the "name" parameter. Whether the
> > configuration was changed in between is unknown; we have never seen the
> > full RA metadata or the full resource definition for either case, so ...
> > 
> > My hunch is that "id" is a red herring and something else changed when
> > the resource definition was edited. If I'm wrong, I would appreciate a
> > pointer to the code where "id" is mishandled.
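
The comparison of restart digest sources above can be repeated with a
small filter. This is only a sketch: the function name digest_sources is
made up, and it assumes that in the real trace file the marker text
"restart digest source" and its XML payload sit on one line (the line
breaks in the quoted output come from mail wrapping).

```shell
#!/bin/sh
# Hypothetical helper: read pacemaker trace output on stdin and print
# each distinct restart digest source with its number of occurrences.
# Assumes marker text and XML payload share one log line.
digest_sources() {
    sed -n 's/.*restart digest source[[:space:]]*//p' | sort | uniq -c
}
```

For example: cat /tmp/trace.log /tmp/trace.log.2 | digest_sources
More than one distinct source for the same resource would suggest that
the parameter set seen by pacemaker differed between the runs.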
> > _______________________________________________
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users 
> > 
> > ClusterLabs home: https://www.clusterlabs.org/ 
> 

--
E-mail : kadlecsik.jozsef at wigner.mta.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: Wigner Research Centre for Physics, Hungarian Academy of Sciences
         H-1525 Budapest 114, POB. 49, Hungary
-------------- next part --------------
#!/bin/sh
#
#	testid OCF RA. Skeleton agent that creates and removes a marker file in /tmp
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# Further, this software is distributed without any warranty that it is
# free of the rightful claim of any third person regarding infringement
# or the like.  Any license provided herein, whether implied or
# otherwise, applies only to this software file.  Patent licenses, if
# any, provided herein do not apply to combinations of this program with
# other software, or any other product whatsoever.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
#

#######################################################################
# Initialization:

: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs

# Defaults
OCF_RESKEY_name_default="testid"
OCF_RESKEY_id_default="0"
OCF_RESKEY_foo_default=""
OCF_RESKEY_bar_default=""

: ${OCF_RESKEY_name=${OCF_RESKEY_name_default}}
: ${OCF_RESKEY_id=${OCF_RESKEY_id_default}}
: ${OCF_RESKEY_foo=${OCF_RESKEY_foo_default}}
: ${OCF_RESKEY_bar=${OCF_RESKEY_bar_default}}

#######################################################################

meta_data() {
	cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="testid">
<version>1.0</version>

<longdesc lang="en">
Test id

</longdesc>
<shortdesc lang="en">Test id</shortdesc>

<parameters>

<parameter name="name" unique="1" required="1">
<longdesc lang="en">
Filename for testid.
</longdesc>
<shortdesc lang="en">Filename for testid</shortdesc>
<content type="string" default="$OCF_RESKEY_name_default"/>
</parameter>

<parameter name="id" unique="1" required="1">
<longdesc lang="en">
Unique identifier number of the testid.
</longdesc>
<shortdesc lang="en">Unique identifier number of the testid.</shortdesc>
<content type="string" default="$OCF_RESKEY_id_default"/>
</parameter>

<parameter name="foo" unique="0" required="1">
<longdesc lang="en">
Not unique, required parameter
</longdesc>
<shortdesc lang="en">Not unique, required parameter</shortdesc>
<content type="string" default="$OCF_RESKEY_foo_default" />
</parameter>

<parameter name="bar" unique="0" required="0">
<longdesc lang="en">
Optional parameter
</longdesc>
<shortdesc lang="en">Optional parameter</shortdesc>
<content type="string" default="$OCF_RESKEY_bar_default" />
</parameter>

</parameters>

<actions>
<action name="start"        timeout="20" />
<action name="stop"         timeout="20" />
<action name="monitor"      timeout="20" interval="10" 
                            depth="0"/>
<action name="reload"       timeout="20" />
<action name="meta-data"    timeout="5" />
<action name="validate-all" timeout="20" />
</actions>
</resource-agent>
END
}

#######################################################################

testid_usage() {
	cat <<END
usage: $0 {start|stop|status|monitor|reload|validate-all|meta-data}

Expects to have a fully populated OCF RA-compliant environment set.
END
}

testid_start() {
    # Quote the path: the name parameter may contain spaces
    touch "/tmp/${OCF_RESKEY_name}.running"
    return $OCF_SUCCESS
}

testid_stop() {
    rm -f "/tmp/${OCF_RESKEY_name}.running"
    return $OCF_SUCCESS
}

testid_status() {
    if [ -f "/tmp/${OCF_RESKEY_name}.running" ]; then
        return $OCF_SUCCESS
    else
        return $OCF_NOT_RUNNING
    fi
}

testid_validate() {
    return $OCF_SUCCESS
}

# These two actions must always succeed
case $__OCF_ACTION in
meta-data)	meta_data
		# OCF variables are not set when querying meta-data
		exit 0
		;;
usage|help)	testid_usage
		exit $OCF_SUCCESS
		;;
esac

testid_validate || exit $?

case $__OCF_ACTION in
start)		testid_start;;
stop)		testid_stop;;
status|monitor)	testid_status;;
reload)		ocf_log info "Reloading..."
	        testid_stop
	        testid_start
		;;
validate-all)	;;
*)		testid_usage
		exit $OCF_ERR_UNIMPLEMENTED
		;;
esac
rc=$?
ocf_log debug "${OCF_RESOURCE_INSTANCE} $__OCF_ACTION returned $rc"
exit $rc

