[ClusterLabs] Recovering from node failure
Gabriele Bulfon
gbulfon at sonicle.com
Fri Dec 11 05:23:29 EST 2020
Hi, I finally managed to get stonith working with IPMI in my 2-node XStreamOS/illumos storage cluster.
I have the NFS IPs and the shared storage zpool moving from one node to the other, and stonith controlling IPMI to power a node off when something is not clear.
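For reference, the relevant primitives are configured more or less like this (the IPMI address, credentials, netmask and pool name below are placeholders, not the real values):
# crm configure primitive xstha1-stonith stonith:external/ipmi \
    params hostname=xstha1 ipaddr=192.168.1.101 userid=admin passwd=secret interface=lanplus \
    op monitor interval=60s
# crm configure primitive xstha1_san0_IP ocf:heartbeat:IPaddr \
    params ip=192.168.10.1 cidr_netmask=24 nic=san0
# crm configure primitive zpool_data ocf:heartbeat:ZFS \
    params pool=data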
What happens now is that if I shut down the 2nd node, I see the OFFLINE status from node 1 and everything is up and running, which is fine:
Online: [ xstha1 ]
OFFLINE: [ xstha2 ]
Full list of resources:
xstha1_san0_IP (ocf::heartbeat:IPaddr): Started xstha1
xstha2_san0_IP (ocf::heartbeat:IPaddr): Started xstha1
xstha1-stonith (stonith:external/ipmi): Started xstha1
xstha2-stonith (stonith:external/ipmi): Started xstha1
zpool_data (ocf::heartbeat:ZFS): Started xstha1
But if I then also reboot the 1st node, it comes up with node 2 in the UNCLEAN state and nothing running, so I clear node 2's state (the command I used is shown below the status), but the resources are not started:
Online: [ xstha1 ]
OFFLINE: [ xstha2 ]
Full list of resources:
xstha1_san0_IP (ocf::heartbeat:IPaddr): Stopped
xstha2_san0_IP (ocf::heartbeat:IPaddr): Stopped
xstha1-stonith (stonith:external/ipmi): Stopped
xstha2-stonith (stonith:external/ipmi): Stopped
zpool_data (ocf::heartbeat:ZFS): Stopped
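For completeness, this is how I cleared node 2's state (from memory, so the exact syntax may be slightly off):
# crm node clearstate xstha2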
I tried starting zpool_data and the other resources:
# crm resource start zpool_data
but nothing happens!
How can I recover from this state? Node 2 needs to stay down, but I want node 1 to keep working.
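Would something along these lines be the right way to tell the cluster that node 2 is really down for good, so node 1 can take over? (Just guessing, I haven't tried these yet:)
# stonith_admin --confirm=xstha2
# crm resource cleanup zpool_data
# crm resource start zpool_data
Or is this a quorum issue on a 2-node cluster, so that I'm missing something like:
# crm configure property no-quorum-policy=ignore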
Thanks!
Gabriele
Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets