[ClusterLabs] Warning: handle_startup_fencing: Blind faith: not fencing unseen nodes

Thu Dec 15 12:05:32 EST 2016

Thanks for the detailed explanation. It was very helpful.

I removed the option

startup-fencing: false

and now the warning message is gone.
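For anyone finding this thread later: the option lives in the `crm_config` section of the Pacemaker CIB. The fragment below is a sketch of how it typically appears. The `id` values follow the usual `cib-bootstrap-options-<name>` naming that pcs generates, which is an assumption here and may differ in a given cluster. On a pcs-managed cluster the property can be removed with `pcs property unset startup-fencing`, or with `crm_attribute --type crm_config --name startup-fencing --delete`.

```xml
<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <!-- This nvpair triggers "Blind faith: not fencing unseen nodes".
         Deleting it restores the default, startup-fencing=true.
         The id value is assumed (pcs-style naming). -->
    <nvpair id="cib-bootstrap-options-startup-fencing" name="startup-fencing" value="false"/>
  </cluster_property_set>
</crm_config>
```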

On 14/12/16 20:12, Ken Gaillot wrote:
> On 12/14/2016 11:14 AM, Denis Gribkov wrote:
>> Hi Everyone,
>>
>> Our company has a 15-node asynchronous cluster without any FENCING/STONITH
>> features actually configured (as far as I can tell).
>>
>> The DC node log is getting tons of messages like the one in the subject:
>>
>> pengine:  warning: handle_startup_fencing:  Blind faith: not fencing
>> unseen nodes
> This is logged because you have set "startup-fencing: false".
>
> It's logged as a warning because that setting is potentially dangerous:
> a node that hasn't been seen by the cluster is not necessarily down --
> it could be up and accessing shared resources, but unable to communicate
> with the cluster. The only safe action is for the cluster to fence the node.
>
> As of Pacemaker 1.1.16, the message will be logged only once. Before
> Pacemaker 1.1.16, it is logged once per node, every time Pacemaker
> checks the cluster state.
>
> Of course, having "stonith-enabled: false" is also dangerous, because
> fencing is the only way to recover from certain error conditions.
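A minimal sketch of what re-enabling fencing could look like in the CIB, assuming IPMI-capable hardware. The `fence_ipmilan` agent choice, the placeholder address and credentials, and all `id` values are hypothetical; one such device would be needed per node. Because this cluster uses `symmetric-cluster: false` (an opt-in cluster), the device would probably also need a location constraint allowing it to run on some node, which is not shown here.

```xml
<configuration>
  <crm_config>
    <cluster_property_set id="cib-bootstrap-options">
      <!-- Turn fencing back on (id assumed, pcs-style naming). -->
      <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>
    </cluster_property_set>
  </crm_config>
  <resources>
    <!-- Hypothetical IPMI fence device for Node1; values are placeholders. -->
    <primitive id="fence-Node1" class="stonith" type="fence_ipmilan">
      <instance_attributes id="fence-Node1-instance_attributes">
        <!-- Which cluster node this device is able to fence. -->
        <nvpair id="fence-Node1-pcmk_host_list" name="pcmk_host_list" value="Node1"/>
        <nvpair id="fence-Node1-ipaddr" name="ipaddr" value="IPMI_ADDR_NODE1"/>
        <nvpair id="fence-Node1-login" name="login" value="IPMI_USER"/>
        <nvpair id="fence-Node1-passwd" name="passwd" value="IPMI_PASSWORD"/>
      </instance_attributes>
      <operations>
        <op id="fence-Node1-monitor-interval-60s" name="monitor" interval="60s"/>
      </operations>
    </primitive>
  </resources>
</configuration>
```

With CMAN, the existing `fence_pcmk` redirect in cluster.conf already hands fence requests to Pacemaker, so a device defined this way is what would actually carry them out.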
>
>> The message is repeated 15 times before the next 15 messages:
>>
>> pengine:     info: determine_online_status: Node Node1 is online
>>
>> ...
>>
>> pengine:     info: determine_online_status: Node Node15 is online
>>
>>
>> The issue looks similar to:
>>
>> http://oss.clusterlabs.org/pipermail/pacemaker/2014-June/021995.html
>>
>> but with some differences of its own.
>>
>>
>> Our environment:
>>
>> Oracle Linux Server release 6.8
>>
>> Pacemaker 1.1.14-8.el6
>>
>> Corosync Cluster Engine, version '1.4.7'
>>
>> CMAN 3.0.12.1
>>
>>
>> Cluster Properties:
>>   cluster-infrastructure: cman
>>   cluster-recheck-interval: 1
>>   dc-version: 1.1.14-8.el6-70404b0
>>   expected-quorum-votes: 3
>>   have-watchdog: false
>>   last-lrm-refresh: 1481444797
>>   maintenance-mode: false
>>   no-quorum-policy: ignore
>>   startup-fencing: false
>>   stonith-enabled: false
>>   symmetric-cluster: false
>>
>>
>> Example of /etc/cluster/cluster.conf:
>>
>> <cluster config_version="96" name="cluster">
>>    <fence_daemon/>
>>
>>    <clusternodes>
>>      <clusternode name="Node1" nodeid="1">
>>        <fence>
>>          <method name="pcmk-redirect">
>>            <device name="pcmk" port="Node1"/>
>>          </method>
>>        </fence>
>>        <altname name="Node1.name"/>
>>      </clusternode>
>>
>> <...>
>>
>>      <clusternode name="Node2" nodeid="15">
>>        <fence>
>>          <method name="pcmk-redirect">
>>            <device name="pcmk" port="Node2"/>
>>          </method>
>>        </fence>
>>        <altname name="Node2.name"/>
>>      </clusternode>
>>    </clusternodes>
>>    <cman expected_votes="2" two_node="0"/>
>>    <fencedevices>
>>      <fencedevice name="pcmk" agent="fence_pcmk"/>
>>    </fencedevices>
>>    <rm>
>>      <failoverdomains/>
>>      <resources/>
>>    </rm>
>>    <logging debug="off"/>
>> </cluster>
>>
>> Example of /etc/corosync/corosync.conf:
>>
>> compatibility: whitetank
>>
>> totem {
>>          version: 2
>>          secauth: on
>>          threads: 4
>>          rrp_mode: active
>>
>>          interface {
>>
>>                  member {
>>                          memberaddr: PRIVATE_IP_1
>>                  }
>>
>> ...
>>
>>                  member {
>>                          memberaddr: PRIVATE_IP_15
>>                  }
>>
>>                  ringnumber: 0
>>                  bindnetaddr: PRIVATE_NET_ADDR
>>                  mcastaddr: 226.0.0.1
>>                  mcastport: 5407
>>                  ttl: 1
>>          }
>>
>>         interface {
>>
>>                  member {
>>                          memberaddr: PUBLIC_IP_1
>>                  }
>> ...
>>
>>                  member {
>>                          memberaddr: PUBLIC_IP_15
>>                  }
>>
>>                  ringnumber: 1
>>                  bindnetaddr: PUBLIC_NET_ADDR
>>                  mcastaddr: 224.0.0.1
>>                  mcastport: 5405
>>                  ttl: 1
>>          }
>>
>>          transport: udpu
>> }
>>
>> logging {
>>          fileline: off
>>          to_stderr: no
>>          to_logfile: yes
>>          logfile: /var/log/cluster/corosync.log
>>          logfile_priority: warning
>>          to_syslog: no
>>          debug: off
>>          timestamp: on
>>          logger_subsys {
>>             subsys: AMF
>>             debug: off
>>          }
>> }
>>
>>
>> Please let me know if you will need any other information.
>>
>> Thank you.
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org