[ClusterLabs] Antw: [EXT] Recovering from node failure
Gabriele Bulfon
gbulfon at sonicle.com
Fri Dec 11 05:53:05 EST 2020
That's what I suspect:
sonicle at xstorage1:/sonicle/home$ pfexec crm_mon -1Arfj
Stack: corosync
Current DC: xstha1 (version 1.1.15-e174ec8) - partition WITHOUT quorum
Last updated: Fri Dec 11 11:49:50 2020 Last change: Fri Dec 11 11:00:38 2020 by hacluster via cibadmin on xstha1
2 nodes and 5 resources configured
Online: [ xstha1 ]
OFFLINE: [ xstha2 ]
Full list of resources:
xstha1_san0_IP (ocf::heartbeat:IPaddr): Stopped
xstha2_san0_IP (ocf::heartbeat:IPaddr): Stopped
xstha1-stonith (stonith:external/ipmi): Stopped
xstha2-stonith (stonith:external/ipmi): Stopped
zpool_data (ocf::heartbeat:ZFS): Stopped
Node Attributes:
* Node xstha1:
Migration Summary:
* Node xstha1:
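The "partition WITHOUT quorum" in the output above is the key symptom: without quorum, Pacemaker will not start resources. On a corosync stack the vote state can be inspected directly (a sketch; the exact flags shown depend on the corosync version and on whether votequorum's two_node option is configured):

```
# Print the current votequorum state. With "two_node: 1" configured,
# the Flags line should include "2Node" and "Quorate" even when only
# one of the two nodes is online.
corosync-quorumtool -s
```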
Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
----------------------------------------------------------------------------------
From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
To: users at clusterlabs.org
Date: 11 December 2020 11:35:44 CET
Subject: [ClusterLabs] Antw: [EXT] Recovering from node failure
Hi!
Did you take care of the special "two node" settings (I mean quorum)?
When I use "crm_mon -1Arfj", I see something like
" * Current DC: h19 (version 2.0.4+20200616.2deceaa3a-3.3.1-2.0.4+20200616.2deceaa3a) - partition with quorum"
What do you see?
Regards,
Ulrich
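For reference, the "two node" settings in question are corosync's votequorum options (a sketch of the quorum section of /etc/corosync/corosync.conf, assuming corosync 2.x with votequorum; older stacks may instead have to rely on Pacemaker's no-quorum-policy):

```
quorum {
    provider: corosync_votequorum
    # Treat the two-node cluster as quorate with a single node up.
    two_node: 1
    # Note: two_node implicitly enables wait_for_all, so after a cold
    # start both nodes must be seen once before quorum is granted.
}
```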
>>> Gabriele Bulfon <gbulfon at sonicle.com> wrote on 11.12.2020 at 11:23 in
message <350849824.6300.1607682209284 at www>:
> Hi, I finally managed to get stonith working with IPMI in my 2-node
> XStreamOS/illumos storage cluster.
> I have the NFS IPs and the shared storage zpool moving from one node to the
> other, and stonith controlling IPMI to power a node off when its state is
> unclear.
>
> What happens now is that if I shut down the 2nd node, I see the OFFLINE
> status from node 1 and everything stays up and running, which is fine:
>
> Online: [ xstha1 ]
> OFFLINE: [ xstha2 ]
> Full list of resources:
> xstha1_san0_IP (ocf::heartbeat:IPaddr): Started xstha1
> xstha2_san0_IP (ocf::heartbeat:IPaddr): Started xstha1
> xstha1-stonith (stonith:external/ipmi): Started xstha1
> xstha2-stonith (stonith:external/ipmi): Started xstha1
> zpool_data (ocf::heartbeat:ZFS): Started xstha1
> But if I also reboot the 1st node, it comes up with node 2 in the UNCLEAN
> state and nothing running; I then clear the state of node 2, but the
> resources are still not started:
>
> Online: [ xstha1 ]
> OFFLINE: [ xstha2 ]
> Full list of resources:
> xstha1_san0_IP (ocf::heartbeat:IPaddr): Stopped
> xstha2_san0_IP (ocf::heartbeat:IPaddr): Stopped
> xstha1-stonith (stonith:external/ipmi): Stopped
> xstha2-stonith (stonith:external/ipmi): Stopped
> zpool_data (ocf::heartbeat:ZFS): Stopped
> I tried to start zpool_data and the other resources:
> # crm resource start zpool_data
> but nothing happens!
> How can I recover from this state? Node2 needs to stay down, but I want
> node1 to work.
> Thanks!
> Gabriele
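A common way out of this state on a two-node cluster (a sketch, assuming the crm shell used above; reasonable only because stonith is configured, since ignoring quorum without working fencing risks split-brain):

```
# Tell Pacemaker to keep running resources even without quorum --
# the classic setting for two-node clusters whose corosync layer
# does not provide two_node votequorum semantics.
crm configure property no-quorum-policy=ignore

# Then re-check: resources should start on the surviving node.
crm_mon -1rf
```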
>
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/