[Pacemaker] Blind Faith still fencing unseen nodes

Fri Jun 27 02:31:57 EDT 2014

On 13 Jun 2014, at 9:21 pm, Jason Hendry <jhendry at mintel.com> wrote:

> 
> Hi Everyone,
> 
> This is my first post, please let me know if I am missing any standard/essential information to help with debugging...
> 
> I have a 2-node cluster with node-level fencing.  The cluster appears to be configured with "Blind Faith", but my nodes still kill each other when the host is up but the cluster is not running on it. To reproduce this, I:
> 
> Power-on both nodes
> Stop the cluster on both nodes [pcs cluster stop]
> Start the cluster on a single node  [pcs cluster start]
> 
> After starting the cluster I get these messages in the cluster logs:
> 
> Jun 13 09:59:48 [15756] dev-drbd01.london.mintel.ad    pengine:  warning: unpack_nodes: Blind faith: not fencing unseen nodes
> Jun 13 09:59:48 [15756] dev-drbd01.london.mintel.ad    pengine:     info: determine_online_status_fencing: Node ha-nfs1 is active
> Jun 13 09:59:48 [15756] dev-drbd01.london.mintel.ad    pengine:     info: determine_online_status: Node ha-nfs1 is online
> Jun 13 09:59:48 [15756] dev-drbd01.london.mintel.ad    pengine:  warning: pe_fence_node: Node ha-nfs2 will be fenced because the peer has not been seen by the cluster
> 
> Am I misunderstanding the meaning of "Blind faith", or is something misconfigured?

Looks like you might have found a bug.
"Blind faith" is a particularly dangerous option to turn on, so it doesn't get tested very often.

A few lines further down in your logs should be a message from pengine that looks something like:

Jun 13 09:59:48 [15756] dev-drbd01.london.mintel.ad    pengine:  warning: process_pe_message: Calculated Transition ${X}: /var/lib/pacemaker/pengine/pe-warn-${Y}.bz2

If you can send us that file I'll make sure it gets fixed. 
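
To locate that file, something along these lines may help. It is only a sketch: the helper name pe_files is made up for illustration, and the log path in the usage comment is an assumption (use whatever file your pengine messages actually end up in).

```shell
# Hypothetical helper: print the policy-engine file path from every
# "Calculated Transition" line read on stdin, one path per line.
pe_files() {
    sed -n 's|.*process_pe_message: Calculated Transition [0-9][0-9]*: \(/[^ ]*\.bz2\).*|\1|p'
}

# Usage (log location is an assumption, adjust for your setup):
#   pe_files < /var/log/cluster/corosync.log | tail -n 1
```

The last path printed is the most recent transition. If I remember the options correctly, the same file can be replayed locally with "crm_simulate -S -x <file>" to see the fencing decision again.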

>  Both my nodes are:
> 
> Centos 6.5 (Final)  (uname -a:  Linux dev-drbd01.london.mintel.ad 2.6.32-431.17.1.el6.x86_64 #1 SMP Wed May 7 23:32:49 UTC 2014 x86_64 x86_64 x86_64 GNU/Linu
> pacemakerd --version  (  Pacemaker 1.1.10-14.el6_5.3  )
> 
> Here is my cluster configuration:
> 
> 
> pcs resource create nfsDRBD       ocf:linbit:drbd           drbd_resource=nfs op monitor interval=8s meta migration-thresholds=0
> pcs resource create nfsLVM        ocf:heartbeat:LVM         volgrpname="vg_drbd" op monitor interval=7s meta migration-thresholds=0
> pcs resource create nfsDir        ocf:heartbeat:Filesystem  device=/dev/vg_drbd/lv_nfs_home directory=/data/nfs/home fstype=ext4 run_fsck=force op monitor interval=6s meta migration-thresholds=0
> pcs resource create nfsService    lsb:nfs                   op monitor interval=5s meta migration-thresholds=0
> pcs resource create nfsIP         ocf:heartbeat:IPaddr2     ip=a.b.c.d cidr_netmask=32 op monitor interval=9s meta migration-thresholds=0
> pcs resource create network_ping  ocf:pacemaker:ping        name=network_ping multiplier=5 host_list="a.b.c.d w.x.y.z" attempts=3 timeout=1 failure_score=10 op monitor interval=4s
> pcs resource clone  network_ping                            op meta interleave=true
> 
> pcs resource master nfsDRBD_ms nfsDRBD master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Started is-managed=true
> pcs resource group add nfsGroup nfsLVM nfsDir nfsService nfsIP
> 
> pcs constraint order promote nfsDRBD_ms then start nfsGroup kind=Mandatory symmetrical=false
> pcs constraint order stop nfsGroup then demote nfsDRBD_ms kind=Optional symmetrical=false
> pcs constraint colocation add nfsGroup with master nfsDRBD_ms INFINITY
> 
> pcs property set no-quorum-policy=ignore
> pcs property set expected-quorum-votes=1
> pcs property set stonith-enabled=true
> pcs property set default-resource-stickiness=200
> pcs property set batch-limit=1
> pcs property set startup-fencing=false
> 
> pcs stonith create ha-nfs1_poweroff fence_virsh action=off ipaddr=a.b.c.d login=stonith secure=yes identity_file=/data/stonith_id_rsa port=dev-drbd01.london pcmk_host_map="ha-nfs1:dev-drbd01.london" op meta priority=200
> pcs stonith create ha-nfs2_poweroff fence_virsh action=off ipaddr=w.x.y.z login=stonith secure=yes identity_file=/data/stonith_id_rsa port=dev-drbd02.london pcmk_host_map="ha-nfs2:dev-drbd02.london" op meta priority=200
> 
> pcs stonith level add 1 ha-nfs1 ha-nfs1_poweroff
> pcs stonith level add 1 ha-nfs2 ha-nfs2_poweroff
> 
> pcs constraint location ha-nfs1_poweroff prefers ha-nfs1=-INFINITY
> pcs constraint location ha-nfs2_poweroff prefers ha-nfs2=-INFINITY
> pcs constraint location nfsDRBD rule role=Master defined network_ping
> 
> Jason H
> 
