[ClusterLabs] Warning: handle_startup_fencing: Blind faith: not fencing unseen nodes

Wed Dec 14 19:12:21 CET 2016

On 12/14/2016 11:14 AM, Denis Gribkov wrote:
> Hi Everyone,
> 
> Our company has a 15-node asynchronous cluster without FENCING/STONITH
> actually configured (I think).
> 
> The DC node's log is getting tons of messages like the one in the subject:
> 
> pengine:  warning: handle_startup_fencing:  Blind faith: not fencing
> unseen nodes

This is logged because you have set "startup-fencing: false".

It's logged as a warning because that setting is potentially dangerous:
a node that hasn't been seen by the cluster is not necessarily down --
it could be up and accessing shared resources, but unable to communicate
with the cluster. The only safe action is for the cluster to fence the node.

As of Pacemaker 1.1.16, the message will be logged only once. Before
Pacemaker 1.1.16, it is logged once per node, every time Pacemaker
checks the cluster state.
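
To gauge how noisy this is on a given DC, something like the following
sketch could count the warning in a set of log lines and check whether a
version string is at or past 1.1.16. The function names are hypothetical
(not part of Pacemaker), and the pattern assumes the message text appears
exactly as quoted in the original report.

```python
import re

# Message text as quoted in the report; whitespace after the function name varies.
WARNING_RE = re.compile(
    r"handle_startup_fencing:\s+Blind faith: not fencing unseen nodes"
)


def count_blind_faith_warnings(lines):
    """Count log lines that contain the startup-fencing warning."""
    return sum(1 for line in lines if WARNING_RE.search(line))


def logs_warning_once(version):
    """Return True if a Pacemaker version string is 1.1.16 or later.

    Only the leading dotted numbers are compared; a packaging suffix
    such as "-8.el6" is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.groups()) >= (1, 1, 16)
```

For the version in the report, logs_warning_once("1.1.14-8.el6") is False,
so the per-node repetition described above is what one would expect to see.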

Of course, having "stonith-enabled: false" is also dangerous, because
fencing is the only way to recover from certain error conditions.
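
As a rough illustration, the two settings mentioned above could be flagged
from a "name: value" property listing like the one in the original message.
This is only a sketch: the helper names are made up for the example, and it
checks nothing beyond the two properties discussed here.

```python
# Properties the reply calls out as dangerous when set to "false".
RISKY_WHEN_FALSE = ("startup-fencing", "stonith-enabled")


def parse_properties(text):
    """Parse 'name: value' lines into a dict, skipping lines with no value."""
    props = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and value.strip():
            props[name.strip()] = value.strip()
    return props


def risky_settings(props):
    """Return the names of the risky properties that are set to false."""
    return [
        name
        for name in RISKY_WHEN_FALSE
        if props.get(name, "").lower() == "false"
    ]
```

Fed the listing from the original message, risky_settings() would return
both property names; a property that is absent from the listing is not
flagged, since the sketch makes no assumption about defaults.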

> The message is repeated 15 times before the next 15 messages:
> 
> pengine:     info: determine_online_status: Node Node1 is online
> 
> ...
> 
> pengine:     info: determine_online_status: Node Node15 is online
> 
> 
> The issue looks similar to:
> 
> http://oss.clusterlabs.org/pipermail/pacemaker/2014-June/021995.html
> 
> but with its own peculiarities.
> 
> 
> Our variables:
> 
> Oracle Linux Server release 6.8
> 
> Pacemaker 1.1.14-8.el6
> 
> Corosync Cluster Engine, version '1.4.7'
> 
> CMAN 3.0.12.1
> 
> 
> Cluster Properties:
>  cluster-infrastructure: cman
>  cluster-recheck-interval: 1
>  dc-version: 1.1.14-8.el6-70404b0
>  expected-quorum-votes: 3
>  have-watchdog: false
>  last-lrm-refresh: 1481444797
>  maintenance-mode: false
>  no-quorum-policy: ignore
>  startup-fencing: false
>  stonith-enabled: false
>  symmetric-cluster: false
> 
> 
> Example of /etc/cluster/cluster.conf:
> 
> <cluster config_version="96" name="cluster">
>   <fence_daemon/>
> 
>   <clusternodes>
>     <clusternode name="Node1" nodeid="1">
>       <fence>
>         <method name="pcmk-redirect">
>           <device name="pcmk" port="Node1"/>
>         </method>
>       </fence>
>       <altname name="Node1.name"/>
>     </clusternode>
> 
> <...>
> 
>     <clusternode name="Node2" nodeid="15">
>       <fence>
>         <method name="pcmk-redirect">
>           <device name="pcmk" port="Node2"/>
>         </method>
>       </fence>
>       <altname name="Node2.name"/>
>     </clusternode>
>   </clusternodes>
>   <cman expected_votes="2" two_node="0"/>
>   <fencedevices>
>     <fencedevice name="pcmk" agent="fence_pcmk"/>
>   </fencedevices>
>   <rm>
>     <failoverdomains/>
>     <resources/>
>   </rm>
>   <logging debug="off"/>
> </cluster>
> 
> Example of /etc/corosync/corosync.conf:
> 
> compatibility: whitetank
> 
> totem {
>         version: 2
>         secauth: on
>         threads: 4
>         rrp_mode: active
> 
>         interface {
> 
>                 member {
>                         memberaddr: PRIVATE_IP_1
>                 }
> 
> ...
> 
>                 member {
>                         memberaddr: PRIVATE_IP_15
>                 }
> 
>                 ringnumber: 0
>                 bindnetaddr: PRIVATE_NET_ADDR
>                 mcastaddr: 226.0.0.1
>                 mcastport: 5407
>                 ttl: 1
>         }
> 
>         interface {
> 
>                 member {
>                         memberaddr: PUBLIC_IP_1
>                 }
> ...
> 
>                 member {
>                         memberaddr: PUBLIC_IP_15
>                 }
> 
>                 ringnumber: 1
>                 bindnetaddr: PUBLIC_NET_ADDR
>                 mcastaddr: 224.0.0.1
>                 mcastport: 5405
>                 ttl: 1
>         }
> 
>         transport: udpu
> }
> 
> logging {
>         fileline: off
>         to_stderr: no
>         to_logfile: yes
>         logfile: /var/log/cluster/corosync.log
>         logfile_priority: warning
>         to_syslog: no
>         debug: off
>         timestamp: on
>         logger_subsys {
>            subsys: AMF
>            debug: off
>         }
> }
> 
> 
> Please let me know if you need any other information.
> 
> Thank you.