[ClusterLabs] Antw: [EXT] Recovering from node failure

Gabriele Bulfon gbulfon at sonicle.com
Fri Dec 11 10:37:59 EST 2020


I found I can do this temporarily:
 
crm config property cib-bootstrap-options: no-quorum-policy=ignore
 
then once node 2 is up again:
 
crm config property cib-bootstrap-options: no-quorum-policy=stop
 
so that I make sure the nodes will not both mount the pool in some other strange situation.
 
Is there any better way? (such as ignoring quorum until everything is back to normal, then considering stop again)
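 
For now I am playing with a tiny script to put the policy back automatically (only a sketch, assuming crmsh and crm_mon are available and that the peer node is named xstha2 as in the status output below; the grep on the crm_mon output is my own quick check, nothing official):
 
#!/bin/sh
# temporarily ignore loss of quorum so the surviving node keeps running resources
crm configure property no-quorum-policy=ignore
# poll until the peer node shows up as online again
while ! crm_mon -1 | grep -q 'Online:.*xstha2'; do
    sleep 30
done
# peer is back: restore the safe default so a lone node stops resources again
crm configure property no-quorum-policy=stop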
 
Gabriele
 
 
Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
 

 


From: Gabriele Bulfon <gbulfon at sonicle.com>
To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Date: 11 December 2020 15:51:28 CET
Subject: Re: [ClusterLabs] Antw: [EXT] Recovering from node failure



 
I cannot "use wait_for_all: 0", cause this would move automatically a powered off node from UNCLEAN to OFFLINE and mount the ZFS pool (total risk!): I want to manually move from UNCLEAN to OFFLINE, when I know that 2nd node is actually off!
 
Actually, with wait_for_all at its default (1) that was the case, so node1 would wait for my intervention when booting while node2 is down.
So what I think I need is some way to manually override quorum in such a case (node 2 down for maintenance, node 1 rebooted): I would manually turn node2 from UNCLEAN to OFFLINE, manually override quorum, and have the zpool mounted and the NFS IP up.
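 
Just to make it concrete, the rough manual sequence I have in mind would be something like this (only a sketch, assuming crmsh; xstha2 being the node I know is powered off):
 
# confirm that the powered-off node really is down, moving it from UNCLEAN to OFFLINE
crm node clearstate xstha2
# temporarily allow the single remaining node to run resources without quorum
crm configure property no-quorum-policy=ignore
# ... zpool_data and the NFS IPs should now start on node1 ...
# once xstha2 is back and has rejoined, restore the safe default
crm configure property no-quorum-policy=stop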
 
Any idea?
 
 
Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
 




----------------------------------------------------------------------------------

From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
To: users at clusterlabs.org 
Date: 11 December 2020 11:35:44 CET
Subject: [ClusterLabs] Antw: [EXT] Recovering from node failure


Hi!

Did you take care of the special "two node" settings (quorum, I mean)?
When I use "crm_mon -1Arfj", I see something like
" * Current DC: h19 (version 2.0.4+20200616.2deceaa3a-3.3.1-2.0.4+20200616.2deceaa3a) - partition with quorum"

What do you see?
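
For reference, the quorum section of corosync.conf on a two-node cluster usually looks roughly like this (just an example, adjust to your setup; note that two_node: 1 implicitly enables wait_for_all unless you override it):

quorum {
    provider: corosync_votequorum
    two_node: 1
    # implied by two_node, shown here only for clarity
    wait_for_all: 1
}

You can also check the votequorum view with "corosync-quorumtool -s".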

Regards,
Ulrich

>>> Gabriele Bulfon <gbulfon at sonicle.com> wrote on 11.12.2020 at 11:23 in
message <350849824.6300.1607682209284 at www>:
> Hi, I finally got stonith with IPMI working in my 2-node XStreamOS/illumos 
> storage cluster.
> I have NFS IPs and the shared storage zpool moving from one node to the other, 
> and stonith controlling IPMI power-off when something is not clear.
> 
> What happens now is that if I shut down the 2nd node, I see the OFFLINE status 
> from node 1 and everything is up and running, and this is ok:
> 
> Online: [ xstha1 ]
> OFFLINE: [ xstha2 ]
> Full list of resources:
> xstha1_san0_IP (ocf::heartbeat:IPaddr): Started xstha1
> xstha2_san0_IP (ocf::heartbeat:IPaddr): Started xstha1
> xstha1-stonith (stonith:external/ipmi): Started xstha1
> xstha2-stonith (stonith:external/ipmi): Started xstha1
> zpool_data (ocf::heartbeat:ZFS): Started xstha1
> But if I also reboot the 1st node, it comes up with node 2 in the UNCLEAN state and nothing is 
> running, so I clear the state of node 2, but the resources are not started:
> 
> Online: [ xstha1 ]
> OFFLINE: [ xstha2 ]
> Full list of resources:
> xstha1_san0_IP (ocf::heartbeat:IPaddr): Stopped
> xstha2_san0_IP (ocf::heartbeat:IPaddr): Stopped
> xstha1-stonith (stonith:external/ipmi): Stopped
> xstha2-stonith (stonith:external/ipmi): Stopped
> zpool_data (ocf::heartbeat:ZFS): Stopped
> I tried restarting zpool_data and other resources:
> # crm resource start zpool_data
> but nothing happens!
> How can I recover from this state? Node2 needs to stay down, but I want 
> node1 to work.
> Thanks!
> Gabriele 
> 
> 
> Sonicle S.r.l. : http://www.sonicle.com 
> Music: http://www.gabrielebulfon.com 
> eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets 
> 




_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



