[Pacemaker] PE ignores monitor failure of stonith:external/rackpdu

Pavlos Parissis pavlos.parissis at gmail.com
Tue Nov 2 07:13:43 EDT 2010


On 2 November 2010 11:22, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:

> Hi,
>
> On Fri, Oct 29, 2010 at 08:37:04AM +0200, Pavlos Parissis wrote:
> > Hi,
> >
> > I wanted to check what happens when the monitor of a fencing agent
> > fails, so I disconnected the PDU from the network, reduced the monitor
> > interval and put debug statements in the fencing script.
> >
> > Here are the debug statements in the status code:
> > status)
> >         # no PDU address configured: report failure
> >         if [ -z "$pduip" ]; then
> >             exit 1
> >         fi
> >         # debug: log every monitor invocation
> >         date >> /tmp/pdu.monitor
> >         if ping -w1 -c1 "$pduip" >/dev/null 2>&1; then
> >             exit 0
> >         else
> >             echo "failed" >> /tmp/pdu.monitor
> >             exit 1
> >         fi
> >         ;;
> >
> >
> > Here is the debug output, which shows that the monitor failed:
> > [root@node-03 tmp]# cat pdu.monitor
> > Fri Oct 29 08:29:20 CEST 2010
> > Fri Oct 29 08:31:05 CEST 2010
> > failed
> > Fri Oct 29 08:32:50 CEST 2010
> > failed
> >
> > but Pacemaker thinks it is fine:
> > [root@node-03 tmp]# crm status|grep pdu
> >  pdu    (stonith:external/rackpdu):     Started node-03
> > [root@node-03 tmp]#
> >
> >
> > And here is the resource:
> > primitive pdu stonith:external/rackpdu \
> >         params community="empisteftiko" \
> >         names_oid=".1.3.6.1.4.1.318.1.1.4.4.2.1.4" \
> >         oid=".1.3.6.1.4.1.318.1.1.4.4.2.1.3" hostlist="AUTO" \
> >         pduip="192.168.100.100" stonith-timeout="30" \
> >         op monitor interval="1m" timeout="60s"
> >
> > Is this the expected behaviour?
>
> Definitely not. If you run the monitor action from the command
> line, does it also return the unexpected exit code?
>

From the code I pasted, you can see that it returned 1.
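To confirm which exit code the status action really produces, the check can be reproduced outside Pacemaker. This is a minimal sketch, not the real external/rackpdu plugin: the function name `rackpdu_status` is hypothetical, and it assumes, as the quoted status branch does, that the plugin receives its `pduip` parameter through an environment variable of the same name.

```shell
#!/bin/sh
# Hypothetical stand-in for the "status)" branch quoted above; it is NOT
# the real external/rackpdu plugin. Like that branch, it reads the PDU
# address from the environment variable pduip.
rackpdu_status() {
    if [ -z "$pduip" ]; then
        return 1            # no PDU address configured: fail
    fi
    if ping -w1 -c1 "$pduip" >/dev/null 2>&1; then
        return 0            # PDU answered one ping within 1 second
    else
        return 1            # PDU unreachable: the monitor should fail
    fi
}

# Run the check and show the exit code a caller would see.
# With pduip unset, this prints: status exit code: 1
rackpdu_status
echo "status exit code: $?"
```

Against the real plugin, the equivalent would be something like `pduip=192.168.100.100 <plugin-dir>/external/rackpdu status; echo $?`; the plugin directory (often `/usr/lib/stonith/plugins`) varies by distribution and is an assumption here.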

>
> # stonith -t external/rackpdu community="empisteftiko" \
>     names_oid=".1.3.6.1.4.1.318.1.1.4.4.2.1.4" ... -lS
>
> Which Pacemaker release are you running? I couldn't reproduce this
> with a recent Pacemaker.
>

That was on 1.1.3; now I am running 1.0.9.
Do you want me to run the test on 1.0.9?


>
> Thanks,
>
> Dejan
>
> > Cheers,
> > Pavlos
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>