[ClusterLabs] Antw: Re: Antw: stonith continues to reboot server once fencing occurs
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Mon May 14 02:23:16 EDT 2018
Hi!
I'm wondering about this:
vmhost1-fsl.bcn is shutting down
That doesn't read like a STONITH, but like a regular shutdown (which may
hang).
The other thing that looks strange for a two-node cluster is this:
[11130] vmhost0-fsl.jsc.nasa.gov corosync notice [TOTEM ] A new membership
(192.168.1.140:184) was formed. Members left: 2
This sounds odd, too:
[11130] vmhost0-fsl.jsc.nasa.gov corosync warning [MAIN ] Totem is unable to
form a cluster because of an operating system or network fault. The most common
cause of this message is that the local firewall is configured improperly.
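As a rough way to see how often these two messages show up in /var/log/cluster/corosync.log, a minimal Python sketch might look like the following. The function name is made up for illustration, and it assumes each message sits on a single line in the real log file (unlike the wrapped excerpts above) with the same wording as quoted here.

```python
import re

# Patterns are taken from the log excerpts quoted in this thread.
# Assumption: in the real log each message is on one line.
MEMBERS_LEFT = re.compile(
    r"A new membership \((?P<ring>[^)]+)\) was formed\. "
    r"Members left: (?P<left>[\d ]+)"
)
TOTEM_FAULT = "Totem is unable to form a cluster"


def summarize_corosync_log(lines):
    """Collect 'Members left' events and count Totem fault warnings.

    `lines` is any iterable of log lines (hypothetical helper, not part
    of corosync or pacemaker).
    """
    left_events = []
    totem_faults = 0
    for line in lines:
        match = MEMBERS_LEFT.search(line)
        if match:
            # Keep the ring id and the list of node ids that left.
            left_events.append((match.group("ring"), match.group("left").split()))
        if TOTEM_FAULT in line:
            totem_faults += 1
    return left_events, totem_faults
```

It could be fed an open file object for the log, for example `summarize_corosync_log(open("/var/log/cluster/corosync.log"))`. Many Totem faults close to each "Members left" event would point at a network problem rather than at the fencing agent.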
Regards,
Ulrich
>>> "Dickerson, Charles Chuck (JSC-EG)[Jacobs Technology, Inc.]"
<charles.e.dickerson at nasa.gov> wrote on 11.05.2018 at 19:02 in message
<0C5150D42E2B3F43B83EC3F62B3B8EE421D5F575 at NDJSMBX201.ndc.nasa.gov>:
> I have attached the /var/log/cluster/corosync.log here.
>
> The fenced node continues to be rebooted even after the stonith timeout.
> The only way I have of stopping the reboot cycle is to completely stop the
> cluster on the remaining node.
>
> Stonith should be able to detect that the fenced node was successfully
> rebooted and stop trying to fence it. I have tried this using both the cycle
> method and the onoff method; both methods have the same result.
>
> Chuck Dickerson
> Jacobs
> JSC - EG3
> (281) 244-5895
>
> -----Original Message-----
> From: Users [mailto:users-bounces at clusterlabs.org] On Behalf Of Ulrich Windl
> Sent: Friday, May 11, 2018 8:47 AM
> To: users at clusterlabs.org
> Subject: [ClusterLabs] Antw: stonith continues to reboot server once fencing
> occurs
>
> Hi!
>
> Could it be that the node reboots faster than the stonith timeout? So the
> node will unexpectedly come up...
>
> Without logs it's hard to say.
>
> Regards,
> Ulrich
>
>>>> "Dickerson, Charles Chuck (JSC-EG)[Jacobs Technology, Inc.]"
> <charles.e.dickerson at nasa.gov> wrote on 11.05.2018 at 15:32 in message
> <0C5150D42E2B3F43B83EC3F62B3B8EE421D5F259 at NDJSMBX201.ndc.nasa.gov>:
>> I have a 2-node cluster. Once fencing occurs, the fenced node is
>> continually rebooted every time it comes up.
>>
>> Configuration: 2 identical nodes - CentOS 7.4, pacemaker 1.1.18, pcs
>> 0.9.162, fencing configured using fence_ipmilan. The cluster is set to
>> ignore quorum and stonith is enabled. Firewalld has been disabled.
>>
>> I can manually issue the fence_ipmilan command and the specified node
>> is rebooted, comes back up and fence_ipmilan sees this and reports
>> success.
>>
>> If fencing is initiated via the "pcs stonith fence" command,
>> stonith_admin command, or by disrupting the communication between the
>> nodes, the proper node is rebooted, but the stonith_admin command
>> times out and never sees the node as rebooted. The node is then
>> rebooted every time it comes back up on the network. The status
>> remains UNCLEAN in pcs status.
>>
>> Chuck Dickerson
>> Jacobs
>> JSC - EG3
>> (281) 244-5895
>
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org