[Pacemaker] stonith reboot behavior

Fri Jun 19 20:40:10 UTC 2009

A "reboot" should never fail. That is, it should always guarantee that the system actually went down entirely. It does not need to guarantee that the system comes back up automatically. If it gets stuck in the boot-up process, you can intervene manually and fix that when you are able to, and once it eventually comes back up, everything should be golden.

Now, if for some reason you cannot reach your remote reboot device to force the reboot, or if that device fails to reboot the node, the node issuing the reboot should alert the rest of the cluster that it could not reboot the target, and one of the other nodes in the cluster should make the attempt. If everything fails and the target node stays running, then you could indeed end up with very bad things happening. However, this should not be a common occurrence, and if your hardware is broken so that things don't work and you don't replace the hardware, that's not something an HA system can account for.
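
As a rough illustration of the escalation described above, here is a minimal Python sketch. It is not how Pacemaker implements fencing; fence_with_escalation and the per-node attempt callables are hypothetical names made up for this example. Each attempt returns True only when the target is confirmed down (it does not have to come back up), and an unreachable reboot device, modeled here as an exception, counts as a failed attempt so that the next node gets its turn.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical type: a fencing attempt takes the target node's name and
# returns True only if the target is confirmed to have gone down.
FenceAttempt = Callable[[str], bool]


def fence_with_escalation(
    target: str,
    attempts: Iterable[Tuple[str, FenceAttempt]],
) -> Optional[str]:
    """Try each (node_name, attempt) pair in order.

    Returns the name of the node whose attempt confirmed the target
    down, or None if every attempt failed.
    """
    for node_name, attempt in attempts:
        try:
            if attempt(target):
                return node_name
        except Exception:
            # Assumption: an unreachable reboot device raises; treat that
            # as a failed attempt and hand over to the next node.
            pass
    return None
```

If the function returns None, every node failed to fence the target. That is the "very bad things" case, and it needs an operator rather than more retries.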

-----Original Message-----
From: Dan Urist [mailto:durist at ucar.edu]
Sent: Friday, June 19, 2009 4:15 PM
To: pacemaker at oss.clusterlabs.org
Subject: [Pacemaker] stonith reboot behavior

My apologies if this is documented somewhere-- I've looked and haven't
found it.

What happens if a stonith reboot fails? Does it retry, and if so how
many times and with what timeout, and is that configurable?

I have some hardware that has a buggy raid card that occasionally can't
find its boot disk, but works fine after a reset.
--
Dan Urist
durist at ucar.edu
303-497-2459 (office)
303-961-2675 (cell)
