[ClusterLabs] Recovering from node failure
Gabriele Bulfon
gbulfon at sonicle.com
Fri Dec 11 05:23:29 EST 2020
Hi, I finally managed to get stonith working with IPMI in my 2-node XStreamOS/illumos storage cluster.
I have the NFS IPs and the shared storage zpool moving from one node to the other, and stonith controlling IPMI to power a node off when something is not clear.
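For reference, the relevant primitives are configured more or less like this (the IPMI address, credentials, netmask and pool name below are placeholders, not the real values):
# crm configure primitive xstha1-stonith stonith:external/ipmi \
    params hostname=xstha1 ipaddr=192.168.1.101 userid=admin passwd=secret interface=lanplus \
    op monitor interval=60s
# crm configure primitive xstha1_san0_IP ocf:heartbeat:IPaddr \
    params ip=192.168.10.1 cidr_netmask=24 nic=san0
# crm configure primitive zpool_data ocf:heartbeat:ZFS \
    params pool=data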
What happens now is that if I shut down the 2nd node, I see the OFFLINE status from node 1 and everything is up and running, which is fine:
Online: [ xstha1 ]
OFFLINE: [ xstha2 ]
Full list of resources:
xstha1_san0_IP (ocf::heartbeat:IPaddr): Started xstha1
xstha2_san0_IP (ocf::heartbeat:IPaddr): Started xstha1
xstha1-stonith (stonith:external/ipmi): Started xstha1
xstha2-stonith (stonith:external/ipmi): Started xstha1
zpool_data (ocf::heartbeat:ZFS): Started xstha1
But if I then also reboot the 1st node, it comes up with node 2 in the UNCLEAN state and nothing running, so I clear node 2's state (the command I used is shown below the status), but the resources are not started:
Online: [ xstha1 ]
OFFLINE: [ xstha2 ]
Full list of resources:
xstha1_san0_IP (ocf::heartbeat:IPaddr): Stopped
xstha2_san0_IP (ocf::heartbeat:IPaddr): Stopped
xstha1-stonith (stonith:external/ipmi): Stopped
xstha2-stonith (stonith:external/ipmi): Stopped
zpool_data (ocf::heartbeat:ZFS): Stopped
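For completeness, this is how I cleared node 2's state (from memory, so the exact syntax may be slightly off):
# crm node clearstate xstha2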
I tried starting zpool_data and the other resources:
# crm resource start zpool_data
but nothing happens!
How can I recover from this state? Node 2 needs to stay down, but I want node 1 to keep working.
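Would something along these lines be the right way to tell the cluster that node 2 is really down for good, so node 1 can take over? (Just guessing, I haven't tried these yet:)
# stonith_admin --confirm=xstha2
# crm resource cleanup zpool_data
# crm resource start zpool_data
Or is this a quorum issue on a 2-node cluster, so that I'm missing something like:
# crm configure property no-quorum-policy=ignore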
Thanks!
Gabriele
Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets