[ClusterLabs] Antw: Re: Antw: Re: Constant stop/start of resource in spite of interval=0
Kadlecsik József
kadlecsik.jozsef at wigner.mta.hu
Tue May 21 07:17:26 EDT 2019
Hi,
On Tue, 21 May 2019, Ulrich Windl wrote:
> So maybe the original defective RA would be valuable for debugging the
> issue. I guess the RA was invalid in some way that wasn't detected or
> handled properly...
With the attached skeleton RA and the following resource definition I can
reproduce it easily:
primitive testid-testid0 ocf:local:testid \
    params name=testid0 id=0 foo=foo0 \
    op monitor timeout=30s interval=30s \
    meta target-role=Started
It may also be required that both the RA and the instance be reloadable.
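In case it helps anyone checking their own agents, here is a minimal sketch
that tests whether RA meta-data read from stdin declares a parameter
literally named "id", as the attached RA does. It assumes that such a
parameter is the trigger, which is still under discussion in this thread.
The helper name has_id_param is made up for this example.

```shell
#!/bin/sh
# Sketch: succeed if the RA meta-data on stdin declares a parameter
# named "id". has_id_param is a hypothetical helper name.
has_id_param() {
    grep -Eq '<parameter[[:space:]][^>]*name="id"'
}

# Example input: a shortened copy of the attached RA's meta-data.
sample='<parameters>
<parameter name="name" unique="1" required="1"/>
<parameter name="id" unique="1" required="1"/>
</parameters>'

if printf '%s\n' "$sample" | has_id_param; then
    echo "meta-data declares a parameter named id"
else
    echo "no parameter named id"
fi
```

Against a real agent the input would be the output of its meta-data action,
for example (the path and OCF_ROOT value are assumptions about a typical
installation): OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/local/testid meta-data | has_id_param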
Best regards,
Jozsef
> >>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 21.05.2019 at 09:13 in
> message <bd253405-e98c-251e-e908-1431d6d65bde at gmail.com>:
> > 21.05.2019 0:46, Ken Gaillot wrote:
> >>>
> >>>> From what's described here, the op-restart-digest is changing every
> >>>> time, which means something is going wrong in the hash comparison
> >>>> (since the definition is not really changing).
> >>>>
> >>>> The log that stands out to me is:
> >>>>
> >>>> trace May 18 23:02:49 calculate_xml_digest_v1(83):0:
> >>>> digest:source <parameters id="0"/>
> >>>>
> >>>> The id is the resource name, which isn't "0". That leads me to:
> >>>>
> >>>> trace May 18 23:02:49 svc_read_output(87):0: Got 499 chars:
> >>>> <parameter name="id" unique="1" required="1">
> >>>>
> >>>> which is the likely source of the problem. "id" is a pacemaker
> >>>> property, not an OCF resource parameter. It shouldn't be in the
> >>>> resource agent meta-data. Remove that, and I bet it will be OK.
> >>>
> >>> I renamed the parameter to "tunnel_id", redefined the resources and
> >>> started them again.
> >>>
> >>>> BTW the "every 15 minutes" would be the cluster-recheck-interval
> >>>> cluster property.
> >>>
> >>> I have waited more than half an hour and the resources are no longer
> >>> being stopped and started. :-) I hadn't thought that "id" was
> >>> reserved as a parameter name.
> >>
> >> It isn't, by the OCF standard. :) This could be considered a pacemaker
> >> bug; pacemaker should be able to distinguish its own "id" from an OCF
> >> parameter "id", but it currently can't.
> >>
> >
> >
> > I'm really baffled by this explanation. I tried to create a resource with
> > a unique instance property "id" and I do not observe this problem. No
> > restarts.
> >
> > As none of the traces provided captures the moment of the restart-digest
> > mismatch, I am also not sure where to look. I do not see "id" being
> > treated specially in any way in the code.
> >
> > Somewhat interesting is that restart digest source in two traces is
> > different:
> >
> > bor at bor-Latitude-E5450:~$ grep -w 'restart digest' /tmp/trace.log*
> > /tmp/trace.log:trace May 18 23:02:49 append_restart_list(694):0:
> > restart digest source <parameters id="0"/>
> > /tmp/trace.log:trace May 18 23:02:50 append_restart_list(694):0:
> > restart digest source <parameters id="1"/>
> > /tmp/trace.log.2:trace May 20 13:56:16 append_restart_list(694):0:
> > restart digest source <parameters name="eduroam IPv4 tunnel" id="0"/>
> > /tmp/trace.log.2:trace May 20 13:56:17 append_restart_list(694):0:
> > restart digest source <parameters name="eduroam IPv4 tunnel" id="0"/>
> > /tmp/trace.log.2:trace May 20 13:56:18 append_restart_list(694):0:
> > restart digest source <parameters name="Wigner guest IPv4 tunnel" id="1"/>
> > bor at bor-Latitude-E5450:~$
> >
> > In one case it does not include the "name" parameter. Whether the
> > configuration was changed in between is unknown; we have never seen the
> > full RA metadata in each case nor the full resource definition, so ...
> >
> > My hunch is that "id" is a red herring and something else changed when
> > the resource definition was edited. If I'm wrong I would appreciate a
> > pointer to the code where "id" is mishandled.
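To make that comparison easier to repeat, here is a rough sketch that counts
the distinct "restart digest source" payloads in trace output read from
stdin. It assumes each trace entry sits on a single line in the real log
(the line breaks in the grep output above look like mail wrapping), and the
helper name digest_sources is made up for this example.

```shell
#!/bin/sh
# Sketch: print each distinct restart digest source with its count.
# digest_sources is a hypothetical helper name; input is trace log text.
digest_sources() {
    sed -n 's/.*restart digest source[[:space:]]*\(<parameters.*\/>\).*/\1/p' |
        sort | uniq -c
}
```

It could be run as: cat /tmp/trace.log* | digest_sources
A source that shows up with a different attribute set between runs (for
example with and without "name") would point at a changed definition rather
than at "id" itself.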
> > _______________________________________________
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
--
E-mail : kadlecsik.jozsef at wigner.mta.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: Wigner Research Centre for Physics, Hungarian Academy of Sciences
H-1525 Budapest 114, POB. 49, Hungary
-------------- next part --------------
#!/bin/sh
#
# Test OCF RA (skeleton). Creates and removes a marker file under /tmp.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# Further, this software is distributed without any warranty that it is
# free of the rightful claim of any third person regarding infringement
# or the like. Any license provided herein, whether implied or
# otherwise, applies only to this software file. Patent licenses, if
# any, provided herein do not apply to combinations of this program with
# other software, or any other product whatsoever.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
#
#######################################################################
# Initialization:
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
# Defaults
OCF_RESKEY_name_default="testid"
OCF_RESKEY_id_default="0"
OCF_RESKEY_foo_default=""
OCF_RESKEY_bar_default=""
: ${OCF_RESKEY_name=${OCF_RESKEY_name_default}}
: ${OCF_RESKEY_id=${OCF_RESKEY_id_default}}
: ${OCF_RESKEY_foo=${OCF_RESKEY_foo_default}}
: ${OCF_RESKEY_bar=${OCF_RESKEY_bar_default}}
#######################################################################
meta_data() {
cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="testid">
<version>1.0</version>
<longdesc lang="en">
Test id
</longdesc>
<shortdesc lang="en">Test id</shortdesc>
<parameters>
<parameter name="name" unique="1" required="1">
<longdesc lang="en">
Filename for testid.
</longdesc>
<shortdesc lang="en">Filename for testid</shortdesc>
<content type="string" default="$OCF_RESKEY_name_default"/>
</parameter>
<parameter name="id" unique="1" required="1">
<longdesc lang="en">
Unique identifier number of the testid.
</longdesc>
<shortdesc lang="en">Unique identifier number of the testid.</shortdesc>
<content type="string" default="$OCF_RESKEY_id_default"/>
</parameter>
<parameter name="foo" unique="0" required="1">
<longdesc lang="en">
Not unique, required parameter
</longdesc>
<shortdesc lang="en">Not unique, required parameter</shortdesc>
<content type="string" default="$OCF_RESKEY_foo_default" />
</parameter>
<parameter name="bar" unique="0" required="0">
<longdesc lang="en">
Optional parameter
</longdesc>
<shortdesc lang="en">Optional parameter</shortdesc>
<content type="string" default="$OCF_RESKEY_bar_default" />
</parameter>
</parameters>
<actions>
<action name="start" timeout="20" />
<action name="stop" timeout="20" />
<action name="monitor" timeout="20" interval="10" depth="0"/>
<action name="reload" timeout="20" />
<action name="meta-data" timeout="5" />
<action name="validate-all" timeout="20" />
</actions>
</resource-agent>
END
}
#######################################################################
testid_usage() {
    cat <<END
usage: $0 {start|stop|status|monitor|reload|validate-all|meta-data}
Expects to have a fully populated OCF RA-compliant environment set.
END
}
# The resource counts as running while /tmp/<name>.running exists.
testid_start() {
    touch "/tmp/${OCF_RESKEY_name}.running"
    return $OCF_SUCCESS
}
testid_stop() {
    rm -f "/tmp/${OCF_RESKEY_name}.running"
    return $OCF_SUCCESS
}
testid_status() {
    if [ -f "/tmp/${OCF_RESKEY_name}.running" ]; then
        return $OCF_SUCCESS
    else
        return $OCF_NOT_RUNNING
    fi
}
# Skeleton only: no parameter validation is performed.
testid_validate() {
    return $OCF_SUCCESS
}
# These two actions must always succeed
case $__OCF_ACTION in
    meta-data)  meta_data
                # OCF variables are not set when querying meta-data
                exit 0
                ;;
    usage|help) testid_usage
                exit $OCF_SUCCESS
                ;;
esac
testid_validate || exit $?
case $__OCF_ACTION in
    start)          testid_start;;
    stop)           testid_stop;;
    status|monitor) testid_status;;
    reload)         ocf_log info "Reloading..."
                    testid_stop
                    testid_start
                    ;;
    validate-all)   ;;
    *)              testid_usage
                    exit $OCF_ERR_UNIMPLEMENTED
                    ;;
esac
rc=$?
ocf_log debug "${OCF_RESOURCE_INSTANCE} $__OCF_ACTION returned $rc"
exit $rc