<div dir="ltr"><br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class="gmail-">On 24/03/17 04:44 PM, Seth Reid wrote:<br>
> I have a three node Pacemaker/GFS2 cluster on Ubuntu 16.04. It's not<br>
> production yet because I'm having a problem during fencing. When I<br>
> disable the network interface of any one machine, the disabled machine<br>
> is properly fenced, leaving me, briefly, with a two node cluster. A<br>
> second node is then fenced off immediately, and the remaining node<br>
> appears to try to fence itself off. This leaves two nodes with<br>
> corosync/pacemaker stopped, and the remaining machine still in the<br>
> cluster but showing an offline node and an UNCLEAN node. What could be<br>
> causing this behavior?<br>
<br>
</span>It looks like the fence attempt failed, leaving the cluster hung. When<br>
you say all nodes were fenced, did all nodes actually reboot? Or did the<br>
two surviving nodes just lock up? If the latter, then that is the proper<br>
response to a failed fence (DLM stays blocked).<br></blockquote><div><br></div><div>The action is "off", so we aren't rebooting. The logs still say reboot, though. In terms of actual fencing, only node 2 gets fenced, in that its keys get removed from the shared volume. Node 1's keys don't get removed, so that is the failed fence; node 2's fence succeeds.</div><div><br></div><div>Of the remaining nodes, node 1 is offline in that corosync and pacemaker are no longer running, so it can't access cluster resources. Node 3 shows node 1 as online but in an UNCLEAN state. Neither node 1 nor node 3 can write to the cluster, but node 3 still has corosync and pacemaker running.</div><div><br></div><div>Here are the commands I used to build the cluster; I meant to include these in the original post.</div><div><br></div><div><div>(single machine)$> pcs property set no-quorum-policy=freeze</div><div>(single machine)$> pcs property set stonith-enabled=true</div><div>(single machine)$> pcs property set symmetric-cluster=true</div><div>(single machine)$> pcs cluster enable --all </div><div>(single machine)$> pcs stonith create fence_wh fence_scsi debug="/var/log/cluster/fence-debug.log" vgs_path="/sbin/vgs" sg_persist_path="/usr/bin/sg_persist" sg_turs_path="/usr/bin/sg_turs" pcmk_reboot_action="off" pcmk_host_list="b013-cl b014-cl b015-cl" pcmk_monitor_action="metadata" meta provides="unfencing" --force</div><div>(single machine)$> pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s on-fail=fence clone interleave=true ordered=true</div><div>(single machine)$> pcs resource create clvmd ocf:heartbeat:clvm op monitor interval=30s on-fail=fence clone interleave=true ordered=true</div><div>(single machine)$> pcs constraint order start dlm-clone then clvmd-clone</div><div>(single machine)$> pcs constraint colocation add clvmd-clone with dlm-clone</div><div>(single machine)$> mkfs.gfs2 -p lock_dlm -t webhosts:share_data -j 3 /dev/mapper/share-data</div><div>(single machine)$> 
pcs resource create gfs2share Filesystem device="/dev/mapper/share-data" directory="/share" fstype="gfs2" options="noatime,nodiratime" op monitor interval=10s on-fail=fence clone interleave=true </div><div>(single machine)$> pcs constraint order start clvmd-clone then gfs2share-clone</div><div>(single machine)$> pcs constraint colocation add gfs2share-clone with clvmd-clone</div></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
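One thing worth noting about the no-quorum-policy=freeze setting above: with three votes, quorum is floor(3/2) + 1 = 2, so the moment the second node drops out of membership the lone survivor loses quorum and can only freeze, which is what b015 ends up doing in the logs below. A plain-shell sketch of that arithmetic, using our node count (nothing here talks to the cluster):

```shell
#!/bin/sh
# Quorum for an n-vote cluster is floor(n/2) + 1.
nodes=3
quorum=$(( nodes / 2 + 1 ))          # 3 votes -> quorum of 2
echo "expected votes: $nodes, quorum: $quorum"

# After b014 is fenced and b013's corosync dies, one vote remains.
remaining=1
if [ "$remaining" -lt "$quorum" ]; then
    # This is the point where pengine logs "We do not have quorum -
    # fencing and resource management disabled" and DLM stays blocked.
    echo "quorum lost: $remaining of $quorum votes"
fi
```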
<span class="gmail-"><br>
> Each machine has a dedicated network interface for the cluster, and<br>
> there is a vlan on the switch devoted to just these interfaces.<br>
> In the following, I disabled the interface on node id 2 (b014). Node 1<br>
> (b013) is fenced as well. Node 2 (b015) is still up.<br>
><br>
> Logs from b013:<br>
> Mar 24 16:35:01 b013 CRON[19133]: (root) CMD (command -v debian-sa1 ><br>
> /dev/null && debian-sa1 1 1)<br>
> Mar 24 16:35:13 b013 corosync[2134]: notice [TOTEM ] A processor<br>
> failed, forming new configuration.<br>
> Mar 24 16:35:13 b013 corosync[2134]: [TOTEM ] A processor failed,<br>
> forming new configuration.<br>
> Mar 24 16:35:17 b013 corosync[2134]: notice [TOTEM ] A new membership<br>
</span>> (192.168.100.13:576) was formed. Members left: 2<br>
<span class="gmail-">> Mar 24 16:35:17 b013 corosync[2134]: notice [TOTEM ] Failed to receive<br>
> the leave message. failed: 2<br>
> Mar 24 16:35:17 b013 corosync[2134]: [TOTEM ] A new membership<br>
</span>> (192.168.100.13:576) was formed. Members left: 2<br>
<div><div class="gmail-h5">> Mar 24 16:35:17 b013 corosync[2134]: [TOTEM ] Failed to receive the<br>
> leave message. failed: 2<br>
> Mar 24 16:35:17 b013 attrd[2223]: notice: crm_update_peer_proc: Node<br>
> b014-cl[2] - state is now lost (was member)<br>
> Mar 24 16:35:17 b013 cib[2220]: notice: crm_update_peer_proc: Node<br>
> b014-cl[2] - state is now lost (was member)<br>
> Mar 24 16:35:17 b013 cib[2220]: notice: Removing b014-cl/2 from the<br>
> membership list<br>
> Mar 24 16:35:17 b013 cib[2220]: notice: Purged 1 peers with id=2<br>
> and/or uname=b014-cl from the membership cache<br>
> Mar 24 16:35:17 b013 pacemakerd[2187]: notice: crm_reap_unseen_nodes:<br>
> Node b014-cl[2] - state is now lost (was member)<br>
> Mar 24 16:35:17 b013 attrd[2223]: notice: Removing b014-cl/2 from the<br>
> membership list<br>
> Mar 24 16:35:17 b013 attrd[2223]: notice: Purged 1 peers with id=2<br>
> and/or uname=b014-cl from the membership cache<br>
> Mar 24 16:35:17 b013 stonith-ng[2221]: notice: crm_update_peer_proc:<br>
> Node b014-cl[2] - state is now lost (was member)<br>
> Mar 24 16:35:17 b013 stonith-ng[2221]: notice: Removing b014-cl/2 from<br>
> the membership list<br>
> Mar 24 16:35:17 b013 stonith-ng[2221]: notice: Purged 1 peers with<br>
> id=2 and/or uname=b014-cl from the membership cache<br>
> Mar 24 16:35:17 b013 dlm_controld[2727]: 3091 fence request 2 pid 19223<br>
> nodedown time 1490387717 fence_all dlm_stonith<br>
> Mar 24 16:35:17 b013 kernel: [ 3091.800118] dlm: closing connection to<br>
> node 2<br>
> Mar 24 16:35:17 b013 crmd[2227]: notice: crm_reap_unseen_nodes: Node<br>
> b014-cl[2] - state is now lost (was member)<br>
> Mar 24 16:35:17 b013 dlm_stonith: stonith_api_time: Found 0 entries for<br>
> 2/(null): 0 in progress, 0 completed<br>
> Mar 24 16:35:18 b013 stonith-ng[2221]: notice: Operation reboot of<br>
> b014-cl by b015-cl for stonith-api.19223@b013-cl.<wbr>7aeb2ffb: OK<br>
> Mar 24 16:35:18 b013 stonith-api[19223]: stonith_api_kick: Node 2/(null)<br>
> kicked: reboot<br>
> Mar 24 16:35:18 b013 kernel: [ 3092.421495] dlm: closing connection to<br>
> node 3<br>
> Mar 24 16:35:18 b013 kernel: [ 3092.422246] dlm: closing connection to<br>
> node 1<br>
> Mar 24 16:35:18 b013 dlm_controld[2727]: 3092 abandoned lockspace share_data<br>
> Mar 24 16:35:18 b013 dlm_controld[2727]: 3092 abandoned lockspace clvmd<br>
> Mar 24 16:35:18 b013 kernel: [ 3092.426545] dlm: dlm user daemon left 2<br>
> lockspaces<br>
> Mar 24 16:35:18 b013 systemd[1]: corosync.service: Main process exited,<br>
> code=exited, status=255/n/a<br>
> Mar 24 16:35:18 b013 cib[2220]: error: Connection to the CPG API<br>
> failed: Library error (2)<br>
> Mar 24 16:35:18 b013 systemd[1]: corosync.service: Unit entered failed<br>
> state.<br>
> Mar 24 16:35:18 b013 attrd[2223]: error: Connection to cib_rw failed<br>
> Mar 24 16:35:18 b013 systemd[1]: corosync.service: Failed with result<br>
> 'exit-code'.<br>
> Mar 24 16:35:18 b013 attrd[2223]: error: Connection to<br>
> cib_rw[0x560754147990] closed (I/O condition=17)<br>
> Mar 24 16:35:18 b013 systemd[1]: pacemaker.service: Main process exited,<br>
> code=exited, status=107/n/a<br>
> Mar 24 16:35:18 b013 pacemakerd[2187]: error: Connection to the CPG<br>
> API failed: Library error (2)<br>
> Mar 24 16:35:18 b013 systemd[1]: pacemaker.service: Unit entered failed<br>
> state.<br>
> Mar 24 16:35:18 b013 attrd[2223]: notice: Disconnecting client<br>
> 0x560754149000, pid=2227...<br>
> Mar 24 16:35:18 b013 systemd[1]: pacemaker.service: Failed with result<br>
> 'exit-code'.<br>
> Mar 24 16:35:18 b013 lrmd[2222]: warning: new_event_notification<br>
> (2222-2227-8): Bad file descriptor (9)<br>
> Mar 24 16:35:18 b013 stonith-ng[2221]: error: Connection to cib_rw failed<br>
> Mar 24 16:35:18 b013 stonith-ng[2221]: error: Connection to<br>
> cib_rw[0x5579c03ecdd0] closed (I/O condition=17)<br>
> Mar 24 16:35:18 b013 lrmd[2222]: error: Connection to stonith-ng failed<br>
> Mar 24 16:35:18 b013 lrmd[2222]: error: Connection to<br>
> stonith-ng[0x55888c8ef820] closed (I/O condition=17)<br>
> Mar 24 16:37:02 b013 kernel: [ 3196.469475] dlm: node 0: socket error<br>
> sending to node 2, port 21064, sk_err=113/113<br>
> Mar 24 16:37:02 b013 kernel: [ 3196.470675] dlm: node 0: socket error<br>
> sending to node 2, port 21064, sk_err=113/113<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.833544] INFO: task gfs2_quotad:3054<br>
> blocked for more than 120 seconds.<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.834565] Not tainted<br>
> 4.4.0-66-generic #87-Ubuntu<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.835413] "echo 0 ><br>
> /proc/sys/kernel/hung_task_<wbr>timeout_secs" disables this message.<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836656] gfs2_quotad D<br>
> ffff880fd747fa38 0 3054 2 0x00000000<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836663] ffff880fd747fa38<br>
> 00000001d8144018 ffff880fd975f2c0 ffff880fd7a972c0<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836666] ffff880fd7480000<br>
> ffff887fd81447b8 ffff887fd81447d0 ffff881fd7af00b0<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836669] 0000000000000004<br>
> ffff880fd747fa50 ffffffff818384d5 ffff880fd7a972c0<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836672] Call Trace:<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836688] [<ffffffff818384d5>]<br>
> schedule+0x35/0x80<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836695] [<ffffffff8183b380>]<br>
> rwsem_down_read_failed+0xe0/<wbr>0x140<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836701] [<ffffffff81406574>]<br>
> call_rwsem_down_read_failed+<wbr>0x14/0x30<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836704] [<ffffffff8183a920>] ?<br>
> down_read+0x20/0x30<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836726] [<ffffffffc0583324>]<br>
> dlm_lock+0x84/0x1f0 [dlm]<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836731] [<ffffffff810b57e3>] ?<br>
> check_preempt_wakeup+0x193/<wbr>0x220<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836755] [<ffffffffc06a5da0>] ?<br>
> gdlm_recovery_result+0x130/<wbr>0x130 [gfs2]<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836764] [<ffffffffc06a5050>] ?<br>
> gdlm_cancel+0x30/0x30 [gfs2]<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836769] [<ffffffff810ab579>] ?<br>
> ttwu_do_wakeup+0x19/0xe0<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836779] [<ffffffffc06a5499>]<br>
> gdlm_lock+0x1d9/0x300 [gfs2]<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836788] [<ffffffffc06a5050>] ?<br>
> gdlm_cancel+0x30/0x30 [gfs2]<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836798] [<ffffffffc06a5da0>] ?<br>
> gdlm_recovery_result+0x130/<wbr>0x130 [gfs2]<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836807] [<ffffffffc0686e5f>]<br>
> do_xmote+0x16f/0x290 [gfs2]<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836816] [<ffffffffc068705c>]<br>
> run_queue+0xdc/0x2d0 [gfs2]<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836824] [<ffffffffc06875ef>]<br>
> gfs2_glock_nq+0x20f/0x410 [gfs2]<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836834] [<ffffffffc06a2006>]<br>
> gfs2_statfs_sync+0x76/0x1c0 [gfs2]<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836841] [<ffffffff810ed018>] ?<br>
> del_timer_sync+0x48/0x50<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836851] [<ffffffffc06a1ffc>] ?<br>
> gfs2_statfs_sync+0x6c/0x1c0 [gfs2]<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836861] [<ffffffffc0697fe3>]<br>
> quotad_check_timeo.part.18+<wbr>0x23/0x80 [gfs2]<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836871] [<ffffffffc069ad01>]<br>
> gfs2_quotad+0x241/0x2d0 [gfs2]<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836876] [<ffffffff810c41e0>] ?<br>
> wake_atomic_t_function+0x60/<wbr>0x60<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836886] [<ffffffffc069aac0>] ?<br>
> gfs2_wake_up_statfs+0x40/0x40 [gfs2]<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836890] [<ffffffff810a0ba8>]<br>
> kthread+0xd8/0xf0<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836893] [<ffffffff810a0ad0>] ?<br>
> kthread_create_on_node+0x1e0/<wbr>0x1e0<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836897] [<ffffffff8183c98f>]<br>
> ret_from_fork+0x3f/0x70<br>
> Mar 24 16:37:46 b013 kernel: [ 3240.836900] [<ffffffff810a0ad0>] ?<br>
> kthread_create_on_node+0x1e0/<wbr>0x1e0<br>
><br>
> Logs from b015:<br>
> Mar 24 16:35:01 b015 CRON[19781]: (root) CMD (command -v debian-sa1 ><br>
> /dev/null && debian-sa1 1 1)<br>
> Mar 24 16:35:13 b015 corosync[2105]: notice [TOTEM ] A processor<br>
> failed, forming new configuration.<br>
> Mar 24 16:35:13 b015 corosync[2105]: [TOTEM ] A processor failed,<br>
> forming new configuration.<br>
> Mar 24 16:35:17 b015 corosync[2105]: notice [TOTEM ] A new membership<br>
</div></div>> (192.168.100.13:576) was formed. Members left: 2<br>
<span class="gmail-">> Mar 24 16:35:17 b015 corosync[2105]: notice [TOTEM ] Failed to receive<br>
> the leave message. failed: 2<br>
> Mar 24 16:35:17 b015 corosync[2105]: [TOTEM ] A new membership<br>
</span>> (192.168.100.13:576) was formed. Members left: 2<br>
<div><div class="gmail-h5">> Mar 24 16:35:17 b015 corosync[2105]: [TOTEM ] Failed to receive the<br>
> leave message. failed: 2<br>
> Mar 24 16:35:17 b015 attrd[2253]: notice: crm_update_peer_proc: Node<br>
> b014-cl[2] - state is now lost (was member)<br>
> Mar 24 16:35:17 b015 attrd[2253]: notice: Removing b014-cl/2 from the<br>
> membership list<br>
> Mar 24 16:35:17 b015 attrd[2253]: notice: Purged 1 peers with id=2<br>
> and/or uname=b014-cl from the membership cache<br>
> Mar 24 16:35:17 b015 stonith-ng[2251]: notice: crm_update_peer_proc:<br>
> Node b014-cl[2] - state is now lost (was member)<br>
> Mar 24 16:35:17 b015 stonith-ng[2251]: notice: Removing b014-cl/2 from<br>
> the membership list<br>
> Mar 24 16:35:17 b015 cib[2249]: notice: crm_update_peer_proc: Node<br>
> b014-cl[2] - state is now lost (was member)<br>
> Mar 24 16:35:17 b015 crmd[2255]: notice: State transition S_IDLE -><br>
> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL<br>
> origin=abort_transition_graph ]<br>
> Mar 24 16:35:17 b015 kernel: [ 3478.622093] dlm: closing connection to<br>
> node 2<br>
> Mar 24 16:35:17 b015 stonith-ng[2251]: notice: Purged 1 peers with<br>
> id=2 and/or uname=b014-cl from the membership cache<br>
> Mar 24 16:35:17 b015 cib[2249]: notice: Removing b014-cl/2 from the<br>
> membership list<br>
> Mar 24 16:35:17 b015 cib[2249]: notice: Purged 1 peers with id=2<br>
> and/or uname=b014-cl from the membership cache<br>
> Mar 24 16:35:17 b015 crmd[2255]: notice: crm_reap_unseen_nodes: Node<br>
> b014-cl[2] - state is now lost (was member)<br>
> Mar 24 16:35:17 b015 pacemakerd[2159]: notice: crm_reap_unseen_nodes:<br>
> Node b014-cl[2] - state is now lost (was member)<br>
> Mar 24 16:35:18 b015 systemd[1]:<br>
> dev-disk-by\x2did-scsi\<wbr>x2d36782bcb0007085a70000081958<wbr>aee1ff.device: Dev<br>
> dev-disk-by\x2did-scsi\<wbr>x2d36782bcb0007085a70000081958<wbr>aee1ff.device<br>
> appeared twice with different sysfs paths<br>
> /sys/devices/pci0000:00/0000:<wbr>00:03.0/0000:08:00.0/host7/<wbr>port-7:0/end_device-7:0/<wbr>target7:0:0/7:0:0:0/block/sdc<br>
> and /sys/devices/virtual/block/dm-<wbr>0<br>
> Mar 24 16:35:18 b015 systemd[1]:<br>
> dev-disk-by\x2did-wwn\<wbr>x2d0x6782bcb0007085a7000008195<wbr>8aee1ff.device: Dev<br>
> dev-disk-by\x2did-wwn\<wbr>x2d0x6782bcb0007085a7000008195<wbr>8aee1ff.device<br>
> appeared twice with different sysfs paths<br>
> /sys/devices/pci0000:00/0000:<wbr>00:03.0/0000:08:00.0/host7/<wbr>port-7:0/end_device-7:0/<wbr>target7:0:0/7:0:0:0/block/sdc<br>
> and /sys/devices/virtual/block/dm-<wbr>0<br>
> Mar 24 16:35:18 b015 systemd[1]:<br>
> dev-disk-by\x2did-scsi\<wbr>x2d36782bcb0007085a70000081958<wbr>aee1ff.device: Dev<br>
> dev-disk-by\x2did-scsi\<wbr>x2d36782bcb0007085a70000081958<wbr>aee1ff.device<br>
> appeared twice with different sysfs paths<br>
> /sys/devices/pci0000:00/0000:<wbr>00:03.0/0000:08:00.0/host7/<wbr>port-7:0/end_device-7:0/<wbr>target7:0:0/7:0:0:0/block/sdc<br>
> and /sys/devices/virtual/block/dm-<wbr>0<br>
> Mar 24 16:35:18 b015 systemd[1]:<br>
> dev-disk-by\x2did-wwn\<wbr>x2d0x6782bcb0007085a7000008195<wbr>8aee1ff.device: Dev<br>
> dev-disk-by\x2did-wwn\<wbr>x2d0x6782bcb0007085a7000008195<wbr>8aee1ff.device<br>
> appeared twice with different sysfs paths<br>
> /sys/devices/pci0000:00/0000:<wbr>00:03.0/0000:08:00.0/host7/<wbr>port-7:0/end_device-7:0/<wbr>target7:0:0/7:0:0:0/block/sdc<br>
> and /sys/devices/virtual/block/dm-<wbr>0<br>
> Mar 24 16:35:18 b015 stonith-ng[2251]: warning: fence_scsi[19818]<br>
> stderr: [ WARNING:root:Parse error: Ignoring unknown option 'port=b014-cl' ]<br>
> Mar 24 16:35:18 b015 stonith-ng[2251]: warning: fence_scsi[19818]<br>
> stderr: [ ]<br>
> Mar 24 16:35:18 b015 stonith-ng[2251]: notice: Operation 'reboot'<br>
> [19818] (call 2 from stonith-api.19223) for host 'b014-cl' with device<br>
> 'fence_wh' returned: 0 (OK)<br>
> Mar 24 16:35:18 b015 stonith-ng[2251]: notice: Operation reboot of<br>
> b014-cl by b015-cl for stonith-api.19223@b013-cl.<wbr>7aeb2ffb: OK<br>
> Mar 24 16:35:18 b015 dlm_controld[2656]: 3479 fence request 2 pid 19880<br>
> nodedown time 1490387717 fence_all dlm_stonith<br>
> Mar 24 16:35:18 b015 dlm_controld[2656]: 3479 tell corosync to remove<br>
> nodeid 1 from cluster<br>
> Mar 24 16:35:18 b015 systemd[1]:<br>
> dev-disk-by\x2did-scsi\<wbr>x2d36782bcb0007085a70000081958<wbr>aee1ff.device: Dev<br>
> dev-disk-by\x2did-scsi\<wbr>x2d36782bcb0007085a70000081958<wbr>aee1ff.device<br>
> appeared twice with different sysfs paths<br>
> /sys/devices/pci0000:00/0000:<wbr>00:03.0/0000:08:00.0/host7/<wbr>port-7:0/end_device-7:0/<wbr>target7:0:0/7:0:0:0/block/sdc<br>
> and /sys/devices/virtual/block/dm-<wbr>0<br>
> Mar 24 16:35:18 b015 systemd[1]:<br>
> dev-disk-by\x2did-wwn\<wbr>x2d0x6782bcb0007085a7000008195<wbr>8aee1ff.device: Dev<br>
> dev-disk-by\x2did-wwn\<wbr>x2d0x6782bcb0007085a7000008195<wbr>8aee1ff.device<br>
> appeared twice with different sysfs paths<br>
> /sys/devices/pci0000:00/0000:<wbr>00:03.0/0000:08:00.0/host7/<wbr>port-7:0/end_device-7:0/<wbr>target7:0:0/7:0:0:0/block/sdc<br>
> and /sys/devices/virtual/block/dm-<wbr>0<br>
> Mar 24 16:35:18 b015 dlm_controld[2656]: 3479 tell corosync to remove<br>
> nodeid 1 from cluster<br>
> Mar 24 16:35:18 b015 dlm_stonith: stonith_api_time: Found 2 entries for<br>
> 2/(null): 0 in progress, 2 completed<br>
> Mar 24 16:35:18 b015 dlm_stonith: stonith_api_time: Node 2/(null) last<br>
> kicked at: 1490387718<br>
> Mar 24 16:35:18 b015 kernel: [ 3479.266118] dlm: closing connection to<br>
> node 1<br>
> Mar 24 16:35:18 b015 kernel: [ 3479.266270] dlm: closing connection to<br>
> node 3<br>
> Mar 24 16:35:18 b015 dlm_controld[2656]: 3479 abandoned lockspace share_data<br>
> Mar 24 16:35:18 b015 dlm_controld[2656]: 3479 abandoned lockspace clvmd<br>
> Mar 24 16:35:18 b015 kernel: [ 3479.268325] dlm: dlm user daemon left 2<br>
> lockspaces<br>
> Mar 24 16:35:21 b015 corosync[2105]: notice [TOTEM ] A processor<br>
> failed, forming new configuration.<br>
> Mar 24 16:35:21 b015 corosync[2105]: [TOTEM ] A processor failed,<br>
> forming new configuration.<br>
> Mar 24 16:35:26 b015 corosync[2105]: notice [TOTEM ] A new membership<br>
</div></div>> (192.168.100.15:580) was formed. Members left: 1<br>
<span class="gmail-">> Mar 24 16:35:26 b015 corosync[2105]: notice [TOTEM ] Failed to receive<br>
> the leave message. failed: 1<br>
> Mar 24 16:35:26 b015 corosync[2105]: [TOTEM ] A new membership<br>
</span>> (192.168.100.15:580) was formed. Members left: 1<br>
<div><div class="gmail-h5">> Mar 24 16:35:26 b015 corosync[2105]: [TOTEM ] Failed to receive the<br>
> leave message. failed: 1<br>
> Mar 24 16:35:26 b015 attrd[2253]: notice: crm_update_peer_proc: Node<br>
> b013-cl[1] - state is now lost (was member)<br>
> Mar 24 16:35:26 b015 attrd[2253]: notice: Removing b013-cl/1 from the<br>
> membership list<br>
> Mar 24 16:35:26 b015 stonith-ng[2251]: notice: crm_update_peer_proc:<br>
> Node b013-cl[1] - state is now lost (was member)<br>
> Mar 24 16:35:26 b015 attrd[2253]: notice: Purged 1 peers with id=1<br>
> and/or uname=b013-cl from the membership cache<br>
> Mar 24 16:35:26 b015 stonith-ng[2251]: notice: Removing b013-cl/1 from<br>
> the membership list<br>
> Mar 24 16:35:26 b015 pacemakerd[2159]: notice: Membership 580: quorum<br>
> lost (1)<br>
> Mar 24 16:35:26 b015 cib[2249]: notice: crm_update_peer_proc: Node<br>
> b013-cl[1] - state is now lost (was member)<br>
> Mar 24 16:35:26 b015 stonith-ng[2251]: notice: Purged 1 peers with<br>
> id=1 and/or uname=b013-cl from the membership cache<br>
> Mar 24 16:35:26 b015 pacemakerd[2159]: notice: crm_reap_unseen_nodes:<br>
> Node b013-cl[1] - state is now lost (was member)<br>
> Mar 24 16:35:26 b015 cib[2249]: notice: Removing b013-cl/1 from the<br>
> membership list<br>
> Mar 24 16:35:26 b015 cib[2249]: notice: Purged 1 peers with id=1<br>
> and/or uname=b013-cl from the membership cache<br>
> Mar 24 16:35:26 b015 crmd[2255]: notice: Membership 580: quorum lost (1)<br>
> Mar 24 16:35:26 b015 crmd[2255]: notice: crm_reap_unseen_nodes: Node<br>
> b013-cl[1] - state is now lost (was member)<br>
> Mar 24 16:35:26 b015 pengine[2254]: notice: We do not have quorum -<br>
> fencing and resource management disabled<br>
> Mar 24 16:35:26 b015 pengine[2254]: warning: Node b013-cl is unclean<br>
> because the node is no longer part of the cluster<br>
> Mar 24 16:35:26 b015 pengine[2254]: warning: Node b013-cl is unclean<br>
> Mar 24 16:35:26 b015 pengine[2254]: warning: Action dlm:1_stop_0 on<br>
> b013-cl is unrunnable (offline)<br>
> Mar 24 16:35:26 b015 pengine[2254]: warning: Action dlm:1_stop_0 on<br>
> b013-cl is unrunnable (offline)<br>
> Mar 24 16:35:26 b015 pengine[2254]: warning: Action clvmd:1_stop_0 on<br>
> b013-cl is unrunnable (offline)<br>
> Mar 24 16:35:26 b015 pengine[2254]: warning: Action clvmd:1_stop_0 on<br>
> b013-cl is unrunnable (offline)<br>
> Mar 24 16:35:26 b015 pengine[2254]: warning: Action gfs2share:1_stop_0<br>
> on b013-cl is unrunnable (offline)<br>
> Mar 24 16:35:26 b015 pengine[2254]: warning: Action gfs2share:1_stop_0<br>
> on b013-cl is unrunnable (offline)<br>
> Mar 24 16:35:26 b015 pengine[2254]: warning: Node b013-cl is unclean!<br>
> Mar 24 16:35:26 b015 pengine[2254]: notice: Cannot fence unclean nodes<br>
> until quorum is attained (or no-quorum-policy is set to ignore)<br>
> Mar 24 16:35:26 b015 pengine[2254]: notice: Start<br>
> fence_wh#011(b015-cl - blocked)<br>
> Mar 24 16:35:26 b015 pengine[2254]: notice: Stop dlm:1#011(b013-cl<br>
> - blocked)<br>
> Mar 24 16:35:26 b015 pengine[2254]: notice: Stop<br>
> clvmd:1#011(b013-cl - blocked)<br>
> Mar 24 16:35:26 b015 pengine[2254]: notice: Stop<br>
> gfs2share:1#011(b013-cl - blocked)<br>
> Mar 24 16:35:26 b015 pengine[2254]: warning: Calculated Transition 9:<br>
> /var/lib/pacemaker/pengine/pe-<wbr>warn-2669.bz2<br>
> Mar 24 16:35:26 b015 crmd[2255]: notice: Transition 9 (Complete=6,<br>
> Pending=0, Fired=0, Skipped=0, Incomplete=0,<br>
> Source=/var/lib/pacemaker/<wbr>pengine/pe-warn-2669.bz2): Complete<br>
> Mar 24 16:35:26 b015 crmd[2255]: notice: State transition<br>
> S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL<br>
> origin=notify_crmd ]<br>
> Mar 24 16:35:31 b015 controld(dlm)[20000]: ERROR: Uncontrolled lockspace<br>
> exists, system must reboot. Executing suicide fencing<br>
> Mar 24 16:35:31 b015 fence_scsi: Failed: keys cannot be same. You can<br>
> not fence yourself.<br>
> Mar 24 16:35:31 b015 fence_scsi: Please use '-h' for usage<br>
> Mar 24 16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020]<br>
> stderr: [ WARNING:root:Parse error: Ignoring unknown option 'port=b015-cl' ]<br>
> Mar 24 16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020]<br>
> stderr: [ ]<br>
> Mar 24 16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020]<br>
> stderr: [ ERROR:root:Failed: keys cannot be same. You can not fence<br>
> yourself. ]<br>
> Mar 24 16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020]<br>
> stderr: [ ]<br>
> Mar 24 16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020]<br>
> stderr: [ Failed: keys cannot be same. You can not fence yourself. ]<br>
> Mar 24 16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020]<br>
> stderr: [ ]<br>
> Mar 24 16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020]<br>
> stderr: [ ERROR:root:Please use '-h' for usage ]<br>
> Mar 24 16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020]<br>
> stderr: [ ]<br>
> Mar 24 16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020]<br>
> stderr: [ Please use '-h' for usage ]<br>
> Mar 24 16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020]<br>
> stderr: [ ]<br>
><br>
><br>
><br>
> Software versions:<br>
> corosync 2.3.5-3ubuntu1<br>
> pacemaker-common 1.1.14-2ubuntu1.1<br>
> pcs 0.9.149-1ubuntu1<br>
> libqb0:amd64 1.0-1ubuntu1<br>
> gfs2-utils 3.1.6-0ubuntu3<br></div></div>
<br>
--<br>
Digimer<br>
Papers and Projects: <a href="https://alteeve.com/w/" rel="noreferrer" target="_blank">https://alteeve.com/w/</a><br>
"I am, somehow, less interested in the weight and convolutions of<br>
Einstein’s brain than in the near certainty that people of equal talent<br>
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould<br>
<br>
______________________________<wbr>_________________<br>
Users mailing list: <a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>
<a href="http://lists.clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://lists.clusterlabs.org/<wbr>mailman/listinfo/users</a><br>
<br>
Project Home: <a href="http://www.clusterlabs.org" rel="noreferrer" target="_blank">http://www.clusterlabs.org</a><br>
Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" rel="noreferrer" target="_blank">http://www.clusterlabs.org/<wbr>doc/Cluster_from_Scratch.pdf</a><br>
Bugs: <a href="http://bugs.clusterlabs.org" rel="noreferrer" target="_blank">http://bugs.clusterlabs.org</a><br>
</blockquote></div><br></div></div>