[ClusterLabs] Antw: [EXT] Recovering from node failure
Gabriele Bulfon
gbulfon at sonicle.com
Fri Dec 11 05:53:05 EST 2020
That's what I suspect:
sonicle at xstorage1:/sonicle/home$ pfexec crm_mon -1Arfj
Stack: corosync
Current DC: xstha1 (version 1.1.15-e174ec8) - partition WITHOUT quorum
Last updated: Fri Dec 11 11:49:50 2020 Last change: Fri Dec 11 11:00:38 2020 by hacluster via cibadmin on xstha1
2 nodes and 5 resources configured
Online: [ xstha1 ]
OFFLINE: [ xstha2 ]
Full list of resources:
xstha1_san0_IP (ocf::heartbeat:IPaddr): Stopped
xstha2_san0_IP (ocf::heartbeat:IPaddr): Stopped
xstha1-stonith (stonith:external/ipmi): Stopped
xstha2-stonith (stonith:external/ipmi): Stopped
zpool_data (ocf::heartbeat:ZFS): Stopped
Node Attributes:
* Node xstha1:
Migration Summary:
* Node xstha1:
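The "partition WITHOUT quorum" in the output above is the key symptom: without quorum, Pacemaker will not start resources. On a corosync stack the vote state can be inspected directly (a sketch; the exact flags shown depend on the corosync version and on whether votequorum's two_node option is configured):

```
# Print the current votequorum state. With "two_node: 1" configured,
# the Flags line should include "2Node" and "Quorate" even when only
# one of the two nodes is online.
corosync-quorumtool -s
```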
Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
----------------------------------------------------------------------------------
From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
To: users at clusterlabs.org
Date: 11 December 2020 11:35:44 CET
Subject: [ClusterLabs] Antw: [EXT] Recovering from node failure
Hi!
Did you take care of the special "two node" settings (I mean quorum)?
When I use "crm_mon -1Arfj", I see something like
" * Current DC: h19 (version 2.0.4+20200616.2deceaa3a-3.3.1-2.0.4+20200616.2deceaa3a) - partition with quorum"
What do you see?
Regards,
Ulrich
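For reference, the "two node" settings in question are corosync's votequorum options (a sketch of the quorum section of /etc/corosync/corosync.conf, assuming corosync 2.x with votequorum; older stacks may instead have to rely on Pacemaker's no-quorum-policy):

```
quorum {
    provider: corosync_votequorum
    # Treat the two-node cluster as quorate with a single node up.
    two_node: 1
    # Note: two_node implicitly enables wait_for_all, so after a cold
    # start both nodes must be seen once before quorum is granted.
}
```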
>>> Gabriele Bulfon <gbulfon at sonicle.com> wrote on 11.12.2020 at 11:23 in
message <350849824.6300.1607682209284 at www>:
> Hi, I finally managed to get stonith working with IPMI in my 2-node
> XStreamOS/illumos storage cluster.
> I have the NFS IPs and the shared storage zpool moving from one node to the
> other, and stonith controlling IPMI to power a node off when its state is
> unclear.
>
> What happens now is that if I shut down the 2nd node, I see the OFFLINE
> status from node 1 and everything stays up and running, which is fine:
>
> Online: [ xstha1 ]
> OFFLINE: [ xstha2 ]
> Full list of resources:
> xstha1_san0_IP (ocf::heartbeat:IPaddr): Started xstha1
> xstha2_san0_IP (ocf::heartbeat:IPaddr): Started xstha1
> xstha1-stonith (stonith:external/ipmi): Started xstha1
> xstha2-stonith (stonith:external/ipmi): Started xstha1
> zpool_data (ocf::heartbeat:ZFS): Started xstha1
> But if I also reboot the 1st node, it comes up with node 2 in the UNCLEAN
> state and nothing running; I then clear the state of node 2, but the
> resources are still not started:
>
> Online: [ xstha1 ]
> OFFLINE: [ xstha2 ]
> Full list of resources:
> xstha1_san0_IP (ocf::heartbeat:IPaddr): Stopped
> xstha2_san0_IP (ocf::heartbeat:IPaddr): Stopped
> xstha1-stonith (stonith:external/ipmi): Stopped
> xstha2-stonith (stonith:external/ipmi): Stopped
> zpool_data (ocf::heartbeat:ZFS): Stopped
> I tried to start zpool_data and the other resources:
> # crm resource start zpool_data
> but nothing happens!
> How can I recover from this state? Node2 needs to stay down, but I want
> node1 to work.
> Thanks!
> Gabriele
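A common way out of this state on a two-node cluster (a sketch, assuming the crm shell used above; reasonable only because stonith is configured, since ignoring quorum without working fencing risks split-brain):

```
# Tell Pacemaker to keep running resources even without quorum --
# the classic setting for two-node clusters whose corosync layer
# does not provide two_node votequorum semantics.
crm configure property no-quorum-policy=ignore

# Then re-check: resources should start on the surviving node.
crm_mon -1rf
```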
>
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/