[ClusterLabs] Antw: stonith continues to reboot server once fencing occurs

Fri May 11 13:02:21 EDT 2018

I have attached the /var/log/cluster/corosync.log here.

The fenced node continues to be rebooted even after the stonith timeout.  The only way I have found to stop the reboot cycle is to stop the cluster completely on the remaining node.

Stonith should be able to detect that the fenced node was successfully rebooted and stop trying to fence it.  I have tried both the cycle method and the onoff method, with the same result.
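
For anyone who wants to compare the two methods by hand, here is a minimal sketch in Python that only builds the fence_ipmilan command lines; it does not run them. The BMC address and credentials are placeholders, and the long option names (--ip, --username, --password, --lanplus, --method, --action) are the ones I believe the fence-agents package provides, so please check "fence_ipmilan --help" on your version first.

```python
import shlex


def build_fence_cmd(ip, username, password, method="onoff",
                    action="reboot", lanplus=True):
    """Return the argv list for a manual fence_ipmilan call.

    Nothing is executed here; the caller decides whether to run it.
    """
    if method not in ("onoff", "cycle"):
        raise ValueError(f"unknown method: {method!r}")
    argv = [
        "fence_ipmilan",
        f"--ip={ip}",              # BMC address (placeholder value below)
        f"--username={username}",  # IPMI login (placeholder)
        f"--password={password}",  # IPMI password (placeholder)
        f"--method={method}",      # "onoff" or "cycle"
        f"--action={action}",      # e.g. "reboot" or "status"
    ]
    if lanplus:
        argv.append("--lanplus")
    return argv


if __name__ == "__main__":
    # Print one command line per method so they can be pasted into a shell.
    for method in ("onoff", "cycle"):
        print(shlex.join(
            build_fence_cmd("192.0.2.10", "admin", "secret", method=method)))
```

Running each printed command with --action=status before and after a reboot is one way to see whether the agent itself reports the power state correctly, independent of the cluster.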

Chuck Dickerson
Jacobs
JSC - EG3
(281) 244-5895

-----Original Message-----
From: Users [mailto:users-bounces at clusterlabs.org] On Behalf Of Ulrich Windl
Sent: Friday, May 11, 2018 8:47 AM
To: users at clusterlabs.org
Subject: [ClusterLabs] Antw: stonith continues to reboot server once fencing occurs

Hi!

Could it be that the node reboots faster than the stonith timeout? So the node will unexpectedly come up...

Without logs it's hard to say.

Regards,
Ulrich

>>> "Dickerson, Charles Chuck (JSC-EG)[Jacobs Technology, Inc.]"
<charles.e.dickerson at nasa.gov> wrote on 11.05.2018 at 15:32 in message
<0C5150D42E2B3F43B83EC3F62B3B8EE421D5F259 at NDJSMBX201.ndc.nasa.gov>:
> I have a 2-node cluster. Once fencing occurs, the fenced node is
> rebooted every time it comes back up.
> 
> Configuration:  2 identical nodes - CentOS 7.4, pacemaker 1.1.18, pcs
> 0.9.162, fencing configured using fence_ipmilan.  The cluster is set to
> ignore quorum and stonith is enabled.  Firewalld has been disabled.
> 
> I can manually issue the fence_ipmilan command and the specified node 
> is rebooted, comes back up and fence_ipmilan sees this and reports success.
> 
> If fencing is initiated via the "pcs stonith fence" command,
> stonith_admin command, or by disrupting the communication between the
> nodes, the proper node is rebooted, but the stonith_admin command
> times out and never sees the node as rebooted.  The node is then
> rebooted every time it comes back up on the network.  The status
> remains UNCLEAN in pcs status.
> 
> Chuck Dickerson
> Jacobs
> JSC ‑ EG3
> (281) 244‑5895
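
The UNCLEAN state described in the quoted message can also be checked from a script. The sketch below is only an illustration: it scans saved "pcs status" text for nodes marked UNCLEAN. The sample text is hypothetical, and the "Node <name>: UNCLEAN" line format is an assumption about how Pacemaker 1.1 prints such nodes, so adjust the pattern to match your own output.

```python
import re

# Hypothetical excerpt of "pcs status" output; the node names and the exact
# line layout are assumptions made for this example.
SAMPLE_STATUS = """\
Node node2: UNCLEAN (offline)
Online: [ node1 ]
"""

# Capture the node name from lines of the form "Node <name>: UNCLEAN ...".
UNCLEAN_RE = re.compile(r"^\s*Node\s+(\S+):\s+UNCLEAN\b", re.MULTILINE)


def unclean_nodes(status_text):
    """Return the names of all nodes reported as UNCLEAN in status_text."""
    return UNCLEAN_RE.findall(status_text)


if __name__ == "__main__":
    print(unclean_nodes(SAMPLE_STATUS))
```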

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: corosync.txt
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20180511/7282f29d/attachment-0002.txt>