[ClusterLabs] Three node cluster becomes completely fenced if one node leaves

Fri Mar 31 02:32:48 EDT 2017

> The original message has the logs from nodes 1 and 3. Node 2, the one that
> got fenced in this test, doesn't really show much. Here are the logs from
> it:
>
> Mar 24 16:35:10 b014 ntpd[2318]: Deleting interface #5 enp6s0f0,
> 192.168.100.14#123, interface stats: received=0, sent=0, dropped=0,
> active_time=3253 secs
> Mar 24 16:35:10 b014 ntpd[2318]: Deleting interface #7 enp6s0f0,
> fe80::a236:9fff:fe8a:6500%6#123, interface stats: received=0, sent=0,
> dropped=0, active_time=3253 secs
> Mar 24 16:35:13 b014 corosync[2166]: notice  [TOTEM ] A processor failed,
> forming new configuration.
> Mar 24 16:35:13 b014 corosync[2166]:  [TOTEM ] A processor failed, forming
> new configuration.
> Mar 24 16:35:13 b014 corosync[2166]: notice  [TOTEM ] The network interface
> is down.

This is the problem. Corosync handles ifdown really badly. If the
interface going down was not intentional, it may have been caused by
NetworkManager. In that case, please install the equivalent of the
NetworkManager-config-server package (it is actually a single file
called 00-server.conf, so you can extract it from, for example, the
Fedora package
https://www.rpmfind.net/linux/RPM/fedora/devel/rawhide/x86_64/n/NetworkManager-config-server-1.8.0-0.1.fc27.noarch.html)
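
For reference, a sketch of what that file contains, as far as I recall
(please compare it against the actual package before relying on it). The
drop-in location /etc/NetworkManager/conf.d/ is an assumption about your
distribution's layout; the Fedora package itself ships the file under
/usr/lib/NetworkManager/conf.d/.

```ini
# /etc/NetworkManager/conf.d/00-server.conf  (assumed location)
[main]
# Do not auto-create default connections for new ethernet devices.
no-auto-default=*
# Keep the interface configured even when the link (carrier) is lost,
# so the address corosync is bound to does not disappear.
ignore-carrier=*
```

NetworkManager has to be restarted to pick up the change. The setting
that matters for corosync should be ignore-carrier, since it is what
keeps NetworkManager from deconfiguring the interface on link loss.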

Regards,
   Honza

> Mar 24 16:35:13 b014 corosync[2166]: notice  [TOTEM ] adding new UDPU
> member {192.168.100.13}
> Mar 24 16:35:13 b014 corosync[2166]: notice  [TOTEM ] adding new UDPU
> member {192.168.100.14}
> Mar 24 16:35:13 b014 corosync[2166]: notice  [TOTEM ] adding new UDPU
> member {192.168.100.15}
> Mar 24 16:35:13 b014 corosync[2166]:  [TOTEM ] The network interface is
> down.
> Mar 24 16:35:13 b014 corosync[2166]:  [TOTEM ] adding new UDPU member
> {192.168.100.13}
> Mar 24 16:35:13 b014 corosync[2166]:  [TOTEM ] adding new UDPU member
> {192.168.100.14}
> Mar 24 16:35:13 b014 corosync[2166]:  [TOTEM ] adding new UDPU member
> {192.168.100.15}
>
> -------
> Seth Reid
>
>
>
> On Wed, Mar 29, 2017 at 7:17 AM, Bob Peterson <rpeterso at redhat.com> wrote:
>
>> ----- Original Message -----
>> | I will try to install updated packages from ubuntu 16.10 or newer. It
>> | can't get worse than not working.
>> |
>> | Can you think of any logs that might help? I've enabled debug on corosync
>> | log, but it really doesn't show anything else other than corosync
>> | exiting. Any diagnostic tools you can recommend?
>> |
>> | -------
>> | Seth Reid
>>
>>
>> Hi Seth,
>>
>> Can you post the pertinent messages from the consoles of all nodes in the
>> cluster? Hopefully you were monitoring them.
>>
>> Regards,
>>
>> Bob Peterson
>> Red Hat File Systems
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>
>
>