<div dir="ltr">Hi,<br><div><div class="gmail_extra"><br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><pre style="white-space:pre-wrap;color:rgb(0,0,0);font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px">><i> As can be seen below from uptime, the node-1 is not shutdown by `pcs
</i>><i> cluster stop node-1` executed on itself.
</i>><i> I found some discussions on <a href="http://clusterlabs.org/mailman/listinfo/users">users at clusterlabs.org</a> about whether a node
</i>><i> running SBD resource can fence itself,
</i>><i> but the conclusion was not clear to me.
</i>><i>
</i>
> I am not familiar with pcs, but stopping the pacemaker services manually
> makes the node leave the cluster in a controlled manner and does not
> result in fencing, at least in my experience.
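For reference, I read "stopping pacemaker services manually" as something
like the following, run on the node itself (my reading of the CentOS 7
systemd units, not taken from your mail):

[root@node-1 ~]# systemctl stop pacemaker
[root@node-1 ~]# systemctl stop corosync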
I can confirm that killing corosync on node-1 results in fencing of
node-1, but with a reboot instead of my desired shutdown:

[root@node-1 ~]# killall -15 corosync

Broadcast message from systemd-journald@node-1 (Sat 2016-06-25 21:55:07 EDT):

sbd[4761]: /dev/sdb1: emerg: do_exit: Rebooting system: off

So the next is question 6: how do I set up fence_sbd so that the fenced
node shuts down? Both action=off and mode=onoff action=off, passed to
fence_sbd when creating the MyStonith resource, result in a reboot.

[root@node-2 ~]# pcs stonith show MyStonith
 Resource: MyStonith (class=stonith type=fence_sbd)
  Attributes: devices=/dev/sdb1 power_timeout=21 action=off
  Operations: monitor interval=60s (MyStonith-monitor-interval-60s)

Another question (question 4 from my first post): the cluster is now in
the state listed below.

[root@node-2 ~]# pcs status
Cluster name: mycluster
Last updated: Sat Jun 25 22:06:51 2016   Last change: Sat Jun 25 15:41:09 2016 by root via cibadmin on node-1
Stack: corosync
Current DC: node-2 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
3 nodes and 1 resource configured

Online: [ node-2 node-3 ]
OFFLINE: [ node-1 ]

Full list of resources:

 MyStonith   (stonith:fence_sbd):   Started node-2

PCSD Status:
  node-1: Online
  node-2: Online
  node-3: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

[root@node-2 ~]# sbd -d /dev/sdb1 list
0   node-3   clear
1   node-2   clear
2   node-1   off   node-2

What is the proper way of operating a cluster with SBD?

I found that executing `sbd watch` on node-1 clears node-1's slot:

[root@node-1 ~]# sbd -d /dev/sdb1 watch
[root@node-1 ~]# sbd -d /dev/sdb1 list
0   node-3   clear
1   node-2   clear
2   node-1   clear

After making sure that sbd is not running on node-1 (which I can do
because node-1 is currently not part of the cluster)

[root@node-1 ~]# killall -15 sbd

I can join node-1 back to the cluster from node-2:

[root@node-2 ~]# pcs cluster start node-1
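A side note on clearing the slot: judging by the sbd man page, a pending
message can also be cleared explicitly from any node that can access the
device, without starting a watcher (I have only tried the `sbd watch`
approach above, so this is unverified on my side):

[root@node-2 ~]# sbd -d /dev/sdb1 message node-1 clear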
>> Question 3:
>> Neither node-1 is fenced by `stonith_admin -F node-1` executed on node-2,
>> despite the fact that
>> /var/log/messages on node-2 (the one currently running MyStonith) reports:
>> ...
>> notice: Operation 'off' [3309] (call 2 from stonith_admin.3288) for host
>> 'node-1' with device 'MyStonith' returned: 0 (OK)
>> ...
>> What is happening here?
>>
> Do you have the sbd daemon running? SBD is based on self-fencing - the
> only thing the fence agent does is place a request for another node to
> kill itself. It is expected that sbd running on the other node will
> respond to this request by committing suicide.
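That clarifies the mechanism. If I understand the sbd man page correctly,
the same request path can be exercised harmlessly with a 'test' message,
which the sbd daemon on the target merely acknowledges in its log (I have
not actually tried this):

[root@node-2 ~]# sbd -d /dev/sdb1 message node-1 test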
It looks to me that, as expected, sbd is integrated with corosync, and by
doing `pcs cluster stop node-1` I also stopped sbd on node-1, so node-1
did not respond to the fence request from node-2.

Now, back to question 6: with sbd running on node-1 and node-1 being part
of the cluster,

[root@node-2 ~]# stonith_admin -F node-1

results in a reboot of node-1 instead of a shutdown.

node-2:/var/log/messages after the last command shows "reboot":
...
Jun 25 22:36:46 localhost stonith-ng[3102]: notice: Client crmd.3106.b61d09b8 wants to fence (reboot) 'node-1' with device '(any)'
Jun 25 22:36:46 localhost stonith-ng[3102]: notice: Initiating remote operation reboot for node-1: f29ba740-4929-4755-a3f5-3aca9ff3c3ff (0)
Jun 25 22:36:46 localhost stonith-ng[3102]: notice: MyStonith can fence (reboot) node-1: dynamic-list
Jun 25 22:36:46 localhost stonith-ng[3102]: notice: watchdog can not fence (reboot) node-1: static-list
Jun 25 22:36:46 localhost stonith-ng[3102]: notice: MyStonith can fence (reboot) node-1: dynamic-list
Jun 25 22:36:46 localhost stonith-ng[3102]: notice: watchdog can not fence (reboot) node-1: static-list
Jun 25 22:36:59 localhost stonith-ng[3102]: notice: Operation 'off' [10653] (call 2 from stonith_admin.10640) for host 'node-1' with device 'MyStonith' returned: 0 (OK)
Jun 25 22:36:59 localhost stonith-ng[3102]: notice: Operation off of node-1 by node-2 for stonith_admin.10640@node-2.05923fc7: OK
Jun 25 22:37:00 localhost stonith-ng[3102]: notice: Operation 'reboot' [10693] (call 4 from crmd.3106) for host 'node-1' with device 'MyStonith' returned: 0 (OK)
Jun 25 22:37:00 localhost stonith-ng[3102]: notice: Operation reboot of node-1 by node-2 for crmd.3106@node-2.f29ba740: OK
...

If I read the log right, there are actually two operations here: the
'off' I requested via stonith_admin (call 2), which succeeded, and a
separate 'reboot' initiated by crmd (call 4) a second later.

This may seem strange, but when sbd is not running on node-1 I
consistently get "(off)" instead of "(reboot)" in node-2:/var/log/messages
after issuing:

[root@node-2 ~]# stonith_admin -F node-1

and in this case there is of course no response from node-1 to the
fencing request.

Cheers,

Marcin
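P.S. Regarding question 6: from the Pacemaker documentation it looks like
it is the cluster, not the fence agent, that chooses the fence action, and
it requests 'reboot' by default - which would also explain the
crmd-initiated 'reboot' in the log above. If that reading is correct, one
of the following should make the cluster request 'off' instead (untested
on my side):

[root@node-2 ~]# pcs property set stonith-action=off

or, remapping the 'reboot' request on the device itself:

[root@node-2 ~]# pcs stonith update MyStonith pcmk_reboot_action=off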