[Pacemaker] PE ignores monitor failure of stonith:external/rackpdu

Dejan Muhamedagic dejanmm at fastmail.fm
Tue Nov 2 08:02:43 EDT 2010


Hi,

On Tue, Nov 02, 2010 at 12:13:43PM +0100, Pavlos Parissis wrote:
> On 2 November 2010 11:22, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> 
> > Hi,
> >
> > On Fri, Oct 29, 2010 at 08:37:04AM +0200, Pavlos Parissis wrote:
> > > Hi,
> > >
> > > I wanted to check what happens when the monitor of a fencing agent
> > > fails, so I disconnected the PDU from the network, reduced the monitor
> > > interval and put debug statements in the fencing script.
> > >
> > > Here are the debug statements in the status code:
> > > status)
> > >         if [ -z "$pduip" ]; then
> > >             exit 1
> > >         fi
> > >         date >> /tmp/pdu.monitor
> > >         if ping -w1 -c1 $pduip >/dev/null 2>&1; then
> > >             exit 0
> > >         else
> > >             echo "failed" >> /tmp/pdu.monitor
> > >             exit 1
> > >         fi
> > >         ;;
> > >
> > >
> > > Here is the debug output, which shows that the monitor failed:
> > > [root at node-03 tmp]# cat pdu.monitor
> > > Fri Oct 29 08:29:20 CEST 2010
> > > Fri Oct 29 08:31:05 CEST 2010
> > > failed
> > > Fri Oct 29 08:32:50 CEST 2010
> > > failed
> > >
> > > but Pacemaker thinks it is fine:
> > > [root at node-03 tmp]# crm status|grep pdu
> > >  pdu    (stonith:external/rackpdu):     Started node-03
> > > [root at node-03 tmp]#
> > >
> > >
> > > And here is the resource:
> > > primitive pdu stonith:external/rackpdu \
> > >         params community="empisteftiko" \
> > >         names_oid=".1.3.6.1.4.1.318.1.1.4.4.2.1.4" \
> > >         oid=".1.3.6.1.4.1.318.1.1.4.4.2.1.3" hostlist="AUTO" \
> > >         pduip="192.168.100.100" stonith-timeout="30" \
> > >         op monitor interval="1m" timeout="60s"
> > >
> > > Is this the expected behaviour?
> >
> > Definitely not. If you run the monitor action from the command
> > line, does it also return the unexpected exit code?
> >
> 
> From the code I pasted, you can see that it returned 1.
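For reference, the status branch quoted above can be isolated as a shell function and exercised without a real PDU. This is a minimal sketch, not the shipped rackpdu plugin: the function name rackpdu_status and the PDU_MONITOR_LOG variable are hypothetical, and the ping flags (-w deadline, -c count) assume Linux iputils ping, as in the original snippet. It returns 1 both for a missing pduip and for an unreachable PDU, which is the failure the monitor is expected to report.

```shell
#!/bin/sh
# Hypothetical stand-alone version of the plugin's "status" branch.
# PDU_MONITOR_LOG is an assumed name for the debug trace file; it
# defaults to /tmp/pdu.monitor as in the original debugging code.
PDU_MONITOR_LOG="${PDU_MONITOR_LOG:-/tmp/pdu.monitor}"

rackpdu_status() {
    _pdu_ip="$1"
    # No PDU address configured: nothing to check, report failure.
    if [ -z "$_pdu_ip" ]; then
        return 1
    fi
    # Timestamp every check so that each monitor run is visible.
    date >> "$PDU_MONITOR_LOG"
    # One echo request with a one-second deadline (Linux iputils ping).
    # The address is quoted so an odd value cannot split into extra args.
    if ping -w1 -c1 "$_pdu_ip" >/dev/null 2>&1; then
        return 0
    fi
    echo "failed" >> "$PDU_MONITOR_LOG"
    return 1
}
```

Inside the plugin, the status) branch would then reduce to: rackpdu_status "$pduip"; exit $?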

There is a difference. stonith-ng (stonithd) is a daemon that
runs a Perl script (fencing_legacy), which invokes the stonith
command, which in turn invokes the plugin. A problem can occur in
any of these components, so it's important to find out where.
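To follow that chain from the bottom up, one can call the plugin directly, then go through the stonith command, and compare the exit codes at each step. A minimal sketch, assuming that external plugins take the action as their first argument and read their parameters from environment variables (which is how the rackpdu script reads $pduip). The function names check_plugin_status and check_stonith_cli are hypothetical, and the plugin path in the usage note is an assumption that varies between distributions (e.g. lib vs lib64).

```shell
#!/bin/sh
# Layer 1 (hypothetical helper): run the external plugin script itself
# with the "status" action. Parameters such as pduip must already be
# exported in the environment.
check_plugin_status() {
    plugin="$1"
    if [ ! -x "$plugin" ]; then
        echo "not an executable plugin: $plugin" >&2
        return 2
    fi
    rc=0
    "$plugin" status || rc=$?
    echo "plugin status -> exit code $rc"
    return "$rc"
}

# Layer 2 (hypothetical helper): go through the stonith command, the
# same invocation Dejan suggests. -l lists the hosts the device can
# fence, -S reports the device status; device parameters are passed
# as name=value arguments.
check_stonith_cli() {
    rc=0
    stonith -t external/rackpdu "$@" -lS || rc=$?
    echo "stonith -lS -> exit code $rc"
    return "$rc"
}
```

For example, with values from the primitive definition: export pduip=192.168.100.100 community=empisteftiko, then run check_plugin_status /usr/lib/stonith/plugins/external/rackpdu. If both layers return 1 while the cluster still shows the resource as Started, the problem lies above the stonith command.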

> > # stonith -t external/rackpdu community="empisteftiko" \
> > names_oid=".1.3.6.1.4.1.318.1.1.4.4.2.1.4" ... -lS
> >
> > Which Pacemaker release are you running? I couldn't reproduce
> > this with a recent Pacemaker.
> >
> 
> That was on 1.1.3; now I run 1.0.9.
> Do you want me to run the test on 1.0.9?

Yes, please. 1.0.9 still runs the old, well-tested stonithd, so
the result could be different.

Thanks,

Dejan

> > Thanks,
> >
> > Dejan
> >
> > > Cheers,
> > > Pavlos
> > >
> > > _______________________________________________
> > > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > >
> > > Project Home: http://www.clusterlabs.org
> > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker