[ClusterLabs] Pacemaker shows false status of a resource and doesn't react on OCF_NOT_RUNNING rc.

Dejan Muhamedagic dejanmm at fastmail.fm
Wed Jan 20 03:12:05 EST 2016


Hi,

On Tue, Jan 19, 2016 at 07:02:52PM +0200, Kostiantyn Ponomarenko wrote:
> Just in case, this is the monitor function from the resource agent:
> ra_monitor() {
> #   ocf_log info "$RA: [monitor]"
>     systemctl status ${service}

"status" is meant for human-readable output. For a scripted check you
want "is-active", which exits 0 only when the unit is active.

Thanks,

Dejan

>     rc=$?
>     if [ "$rc" -eq "0" ]; then
>         return $OCF_SUCCESS
>     fi
> 
>     ocf_log warn "$RA: [monitor] : got rc=$rc"
>     return $OCF_NOT_RUNNING
> }
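
For illustration, the monitor function could be rewritten around
"is-active" roughly as below. This is only a sketch, not the poster's
actual agent: it assumes $service and $RA are set elsewhere in the agent,
as in the quoted code, and that ocf-shellfuncs has normally been sourced.
The default values for the OCF return codes (0 and 7) and the fallback
ocf_log are added here only so that the snippet can be tried on its own.

```shell
#!/bin/sh
# Standalone defaults (assumption: normally provided by ocf-shellfuncs).
: "${OCF_SUCCESS:=0}"
: "${OCF_NOT_RUNNING:=7}"

# Fallback logger for standalone use only; a real agent gets ocf_log
# from ocf-shellfuncs. It drops the level argument and prints to stderr.
type ocf_log >/dev/null 2>&1 || ocf_log() { shift; echo "$*" >&2; }

ra_monitor() {
    # --quiet suppresses the printed state; only the exit code is used.
    systemctl is-active --quiet "${service}"
    rc=$?
    if [ "$rc" -eq 0 ]; then
        return "$OCF_SUCCESS"
    fi

    ocf_log warn "$RA: [monitor] : got rc=$rc"
    return "$OCF_NOT_RUNNING"
}
```

The exit code is captured right after the systemctl call, before any
other command can overwrite $?.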
> 
> Thank you,
> Kostia
> 
> On Tue, Jan 19, 2016 at 6:30 PM, Kostiantyn Ponomarenko <
> konstantin.ponomarenko at gmail.com> wrote:
> 
> > The resource that wasn't running, but was reported as running, is
> > "adminServer".
> >
> > Here is a brief chronological description:
> >
> > [Jan 19 23:42:16] The first time Pacemaker triggers its monitor function
> > at line #1107. (those lines are from its Resource Agent)
> > [Jan 19 23:42:16] Then Pacemaker starts the resource - line #1191.
> > [Jan 19 11:42:53] The first failure is reported by monitor operation at
> > line #1543.
> > [Jan 19 11:42:53] The fail-count is set, but I don't see any attempt from
> > Pacemaker to "start" the resource - the start function is not called (from
> > the logs) - line #1553.
> > [Jan 19 12:27:56] Then adminServer's monitor operation keeps returning
> > $OCF_NOT_RUNNING - starts at line #1860.
> > [Jan 19 12:57:53] Then the expired failcount is cleared at line #1969.
> > [Jan 19 12:57:53] Another call of the monitor function happens at line
> > #2038.
> > [Jan 19 12:57:53] I assume that the line #2046 means "not running" (?).
> > [Jan 19 12:57:53] The "stop" function is called - line #2150
> > [Jan 19 12:57:53] The "start" function is called and the resource is
> > successfully started - line #2164
> >
> >
> > The time change occurred while the cluster was starting; I see this from
> > "journalctl --since="2016-01-19" --until="2016-01-20"":
> >
> > Jan 19 23:10:39 A2-2U12-302-LS ntpd[2210]: 0.0.0.0 c61c 0c clock_step -43193.793349 s
> > Jan 19 11:10:45 A2-2U12-302-LS ntpd[2210]: 0.0.0.0 c614 04 freq_mode
> > Jan 19 11:10:45 A2-2U12-302-LS systemd[1]: Time has been changed
> >
> > I am attaching corosync.log.
> >
> > Thank you,
> > Kostia
> >
> > On Tue, Jan 19, 2016 at 5:17 PM, Bogdan Dobrelya <bdobrelia at mirantis.com>
> > wrote:
> >
> >> On 19.01.2016 16:13, Ken Gaillot wrote:
> >> > On 01/19/2016 06:49 AM, Kostiantyn Ponomarenko wrote:
> >> >> One of the resources in my cluster is not actually running, but "crm_mon"
> >> >> shows it with the "Started" status.
> >> >> Its resource agent's monitor function returns "$OCF_NOT_RUNNING", but
> >> >> Pacemaker doesn't react to this in any way - crm_mon shows the resource as
> >> >> Started.
> >> >> I couldn't find an explanation for this behavior, so I suppose it is a
> >> >> bug, is it?
> >> >
> >> > That is unexpected. Can you post the configuration and logs from around
> >> > the time of the issue?
> >> >
> >>
> >> Oh, sorry, I forgot to mention the related thread [0]. That is exactly
> >> the case I reported there. It looks the same, so I thought you had just
> >> updated my thread :)
> >>
> >> Perhaps these could be merged.
> >>
> >> [0] http://clusterlabs.org/pipermail/users/2016-January/002035.html
> >>
> >> >
> >> > _______________________________________________
> >> > Users mailing list: Users at clusterlabs.org
> >> > http://clusterlabs.org/mailman/listinfo/users
> >> >
> >> > Project Home: http://www.clusterlabs.org
> >> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >> > Bugs: http://bugs.clusterlabs.org
> >> >
> >>
> >>
> >> --
> >> Best regards,
> >> Bogdan Dobrelya,
> >> Irc #bogdando
> >>
> >
> >
