[Pacemaker] Re: crm_mon shows nothing about stonith 'reset' failure

Takenaka Kazuhiro takenaka.kazuhiro at oss.ntt.co.jp
Tue Sep 16 05:48:03 EDT 2008


Hi, Andrew

 > Nope.
 > This is not stored anywhere since there is nowhere it can be
 > reconstructed from (like the lrmd for resource operations) when
 > rebuilding the status section.

Why does the current cib.xml definition have no place to record
stonith 'reset' failures? Is it simply not implemented, or is
there some other reason?

 > And if your stonith resources are failing, a) you have bigger
 > problems, and b) you'll get nice big ERROR messages in the logs.

a) I saw that 'dummy' did not fail over. Is this one of the "bigger problems"?

b) So the only way to detect stonith 'reset' failures is to watch
    the logs. Do I understand correctly?
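If watching the logs is the only way, a minimal shell sketch of such a check might look like this. It assumes the log file is /var/log/ha-log, as in my test, and that stonith failures show up as lines containing both "ERROR" and "stonith". The function name find_stonith_errors is hypothetical, and the exact log wording depends on the Heartbeat/Pacemaker version, so the matches should still be checked by hand.

```shell
#!/bin/sh
# Hypothetical helper: print ERROR-level log lines that mention stonith.
# Assumptions: the log defaults to /var/log/ha-log, and stonith failures
# are logged with "ERROR" and "stonith" (case-insensitive) in the line.
find_stonith_errors() {
    logfile="${1:-/var/log/ha-log}"
    # First keep only ERROR lines, then keep those mentioning stonith.
    grep 'ERROR' "$logfile" | grep -i 'stonith'
}
```

For example, running `find_stonith_errors /var/log/ha-log` after the test would list any matching lines. Like grep, it exits non-zero when nothing matches, so it could also be called from a periodic check script.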

> On Tue, Sep 16, 2008 at 03:11, Takenaka Kazuhiro
> <takenaka.kazuhiro at oss.ntt.co.jp> wrote:
>> > Hi All,
>> >
>> > I ran a test to see what would happen when stonith 'reset' failed.
>> > Before the test, I thought 'crm_mon' should show something about the
>> > failure.
> 
> Nope.
> This is not stored anywhere since there is nowhere it can be
> reconstructed from (like the lrmd for resource operations) when
> rebuilding the status section.
> 
> And if your stonith resources are failing, a) you have bigger
> problems, and b) you'll get nice big ERROR messages in the logs.
> 
>> > But 'crm_mon' didn't show anything.
>> >
>> > What I did is the following.
>> >
>> > 1. I started a stonith-enabled two-node cluster. The nodes were
>> >   named 'node01' and 'node02'. See the configuration files
>> >   in the attached 'hb_reports.tgz' for more details.
>> >
>> >   I made a few modifications to 'ssh' for the test and renamed it
>> >   'sshTEST'. I have also attached 'sshTEST'; the differences are
>> >   described in it.
>> >
>> > 2. I ran the following command.
>> >
>> >   # iptables -A INPUT -i eth3 -p tcp --dport 22 -j REJECT
>> >
>> >   'eth3' is connected to the network used by 'sshTEST'.
>> >
>> > 3. I deleted the state file of 'dummy' at 'node01'.
>> >
>> >   # rm -f /var/run/heartbeat/rsctmp/Dummy-dummy.state
>> >
>> > Soon the failure of 'dummy' was logged to /var/log/ha-log,
>> > and 'crm_mon' also displayed it.
>> >
>> > After a while, the failure of the 'reset' performed by 'sshTEST'
>> > was also logged, but 'crm_mon' did not display it.
>> >
>> > Did I make a misconfiguration or an operational mistake that
>> > caused 'crm_mon' to behave incorrectly?
>> >
>> > Or does 'crm_mon' really not show anything about stonith 'reset'
>> > failures?
>> >
>> > I used Heartbeat(e8154a602bf4) + Pacemaker(d4a14f276c28)
>> > for this test.
>> >
>> > Best regards.
>> > --
>> > Takenaka Kazuhiro <takenaka.kazuhiro at oss.ntt.co.jp>


-- 
Takenaka Kazuhiro <takenaka.kazuhiro at oss.ntt.co.jp>



