[ClusterLabs] Antw: Re: Antw: stonith continues to reboot server once fencing occurs

Mon May 14 06:23:16 UTC 2018

Hi!

I'm wondering about this:
vmhost1-fsl.bcn is shutting down

That doesn't read like a STONITH, but like a regular shutdown (which may
hang).

The other thing that looks strange for a two-node cluster is this:
[11130] vmhost0-fsl.jsc.nasa.gov corosyncnotice  [TOTEM ] A new membership
(192.168.1.140:184) was formed. Members left: 2

This sounds odd, too:
[11130] vmhost0-fsl.jsc.nasa.gov corosyncwarning [MAIN  ] Totem is unable to
form a cluster because of an operating system or network fault. The most common
cause of this message is that the local firewall is configured improperly.
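
To see how often these two messages occur, and in what order, one could scan
the log with something like the following. This is only a rough sketch: the
function names are made up for illustration, the regular expression is
modelled on nothing more than the two lines quoted above, and other corosync
versions may format these messages differently.

```python
import re

# Modelled on the TOTEM notice quoted above, e.g.
#   "... [TOTEM ] A new membership (192.168.1.140:184) was formed. Members left: 2"
# Assumption: "Members joined:" / "Members left:" are optional and are
# followed by space-separated numeric node IDs.
MEMBERSHIP_RE = re.compile(
    r"\[TOTEM\s*\]\s+A new membership \((?P<addr>[\d.]+):(?P<seq>\d+)\) was formed\."
    r"(?:\s+Members joined:(?P<joined>(?:\s+\d+)+))?"
    r"(?:\s+Members left:(?P<left>(?:\s+\d+)+))?"
)

TOTEM_FAILURE_TEXT = "Totem is unable to form a cluster"


def membership_changes(lines):
    """Yield (sequence_number, joined_ids, left_ids) for each membership notice."""
    for line in lines:
        match = MEMBERSHIP_RE.search(line)
        if match is None:
            continue
        joined = [int(n) for n in (match.group("joined") or "").split()]
        left = [int(n) for n in (match.group("left") or "").split()]
        yield int(match.group("seq")), joined, left


def count_totem_failures(lines):
    """Count lines carrying the 'unable to form a cluster' warning."""
    return sum(1 for line in lines if TOTEM_FAILURE_TEXT in line)
```

To run it against the attached log, the lines would be read unwrapped from the
file, for example with open("/var/log/cluster/corosync.log") and passing the
file object to both functions (re-opening it or reading it into a list first,
since a file object can only be iterated once).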

Regards,
Ulrich

>>> "Dickerson, Charles Chuck (JSC-EG)[Jacobs Technology, Inc.]"
<charles.e.dickerson at nasa.gov> wrote on 11.05.2018 at 19:02 in message
<0C5150D42E2B3F43B83EC3F62B3B8EE421D5F575 at NDJSMBX201.ndc.nasa.gov>:
> I have attached the /var/log/cluster/corosync.log here.
> 
> The fenced node continues to be rebooted even after the stonith timeout.  
> The only way I have of stopping the reboot cycle is to completely stop the 
> cluster on the remaining node.
> 
> Stonith should be able to detect that the fenced node was successfully 
> rebooted and stop trying to fence it.  I have done this using both the cycle
> method and the onoff method; both methods have the same result.
> 
> Chuck Dickerson
> Jacobs
> JSC - EG3
> (281) 244-5895
> 
> -----Original Message-----
> From: Users [mailto:users-bounces at clusterlabs.org] On Behalf Of Ulrich Windl
> Sent: Friday, May 11, 2018 8:47 AM
> To: users at clusterlabs.org 
> Subject: [ClusterLabs] Antw: stonith continues to reboot server once fencing occurs
> 
> Hi!
> 
> Could it be that the node reboots faster than the stonith timeout? So the 
> node will unexpectedly come up...
> 
> Without logs it's hard to say.
> 
> Regards,
> Ulrich
> 
>>>> "Dickerson, Charles Chuck (JSC-EG)[Jacobs Technology, Inc.]"
> <charles.e.dickerson at nasa.gov> wrote on 11.05.2018 at 15:32 in message
> <0C5150D42E2B3F43B83EC3F62B3B8EE421D5F259 at NDJSMBX201.ndc.nasa.gov>:
>> I have a 2-node cluster.  Once fencing occurs, the fenced node is 
>> continually rebooted every time it comes up.
>> 
>> Configuration:  2 identical nodes - CentOS 7.4, pacemaker 1.1.18, pcs 
>> 0.9.162, fencing configured using fence_ipmilan.  The cluster is set to 
>> ignore quorum and stonith is enabled.  Firewalld has been disabled.
>> 
>> I can manually issue the fence_ipmilan command and the specified node 
>> is rebooted, comes back up and fence_ipmilan sees this and reports
>> success.
>> 
>> If fencing is initiated via the "pcs stonith fence" command, 
>> stonith_admin command, or by disrupting the communication between the 
>> nodes, the proper node is rebooted, but the stonith_admin command 
>> times out and never sees the node as rebooted.  The node is then 
>> rebooted every time it comes back up on the network.  The status 
>> remains UNCLEAN in pcs status.
>> 
>> Chuck Dickerson
>> Jacobs
>> JSC - EG3
>> (281) 244-5895