[ClusterLabs] SuSE12SP3 HAE SBD Communication Issue

Klaus Wenninger kwenning at redhat.com
Mon Feb 11 10:51:36 UTC 2019


On 02/11/2019 09:49 AM, Fulong Wang wrote:
> Thanks Yan,
>
> You gave me more valuable hints on the SBD operation!
> Now I can see the verbose output after the service restart.
>
>
> >Be aware that pacemaker integration (-P) is enabled by default, which 
> >means that despite the sbd failure, if the node itself is clean and 
> >"healthy" from pacemaker's point of view and it's in the cluster 
> >partition with quorum, it won't self-fence -- meaning a node just 
> >being unable to fence doesn't necessarily need to be fenced.
>
> >As described in sbd man page, "this allows sbd to survive temporary 
> >outages of the majority of devices. However, while the cluster is in 
> >such a degraded state, it can neither successfully fence nor be shutdown 
> >cleanly (as taking the cluster below the quorum threshold will 
> >immediately cause all remaining nodes to self-fence). In short, it will 
> >not tolerate any further faults.  Please repair the system before 
> >continuing."
>
> Yes, I can see the "pacemaker integration" was enabled in my sbd
> config file by default.
> So you mean in some sbd failure cases, if the node is considered
> "healthy" from pacemaker's point of view, it still wouldn't self-fence.
>
> Honestly speaking, I didn't get you at this point. I have
> "no-quorum-policy=ignore" set in my setup and it's a two-node
> cluster.
> Can you show me a sample situation for this?

When using sbd with two-node clusters and pacemaker integration, you might
want to check whether
https://github.com/ClusterLabs/sbd/commit/4bd0a66da3ac9c9afaeb8a2468cdd3ed51ad3377
is included in your sbd version.
This is relevant when two_node is configured in corosync.
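
For reference, a minimal corosync.conf quorum section for such a setup might
look like this (an illustrative sketch, not taken from your cluster):

    quorum {
        # two_node: 1 also implies wait_for_all by default
        provider: corosync_votequorum
        two_node: 1
    }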

Regards,
Klaus

>
> Many Thanks!!!
>
> Regards
> Fulong
>
>
>
> ------------------------------------------------------------------------
> *From:* Gao,Yan <ygao at suse.com>
> *Sent:* Thursday, January 3, 2019 20:43
> *To:* Fulong Wang; Cluster Labs - All topics related to open-source
> clustering welcomed
> *Subject:* Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue
>  
> On 12/24/18 7:10 AM, Fulong Wang wrote:
> > Yan, Klaus and everyone,
> >
> >
> >   Merry Christmas!!!
> >
> >
> >
> > Many thanks for your advice!
> > I added the "-v" param in "SBD_OPTS", but didn't see any apparent change
> > in the system message log. Am I looking at a wrong place?
> Did you restart all cluster services, for example by "crm cluster stop"
> and then "crm cluster start"? Basically sbd.service needs to be
> restarted. Be aware "systemctl restart pacemaker" only restarts pacemaker.
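>
> For example, on the node in question:
>
>     # restart the full cluster stack so that sbd is restarted as well
>     crm cluster stop
>     crm cluster start
>     systemctl status sbd    # check that the sbd daemon came up with the new options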
>
> SBD daemons log into syslog. When an sbd watcher receives a "test"
> command, there should be a syslog line like this showing up:
>
> "servant: Received command test from ..."
>
> sbd won't actually do anything about a "test" command other than logging
> a message.
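>
> For example (the device path and node name are placeholders):
>
>     # write a "test" message into the peer node's slot on the shared disk
>     sbd -d /dev/disk/by-id/<shared-sbd-disk> message <peer-node> test
>
> and then watch the syslog on that peer node for the line above.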
>
> If you are not running a late version of sbd (maintenance update) yet, a
> single "-v" will make sbd too verbose already. But of course you could
> use grep.
>
> >
> > By the way, we want to test that when the disk access paths (multipath
> > devices) are lost, sbd can fence the node automatically.
> Be aware that pacemaker integration (-P) is enabled by default, which
> means that despite the sbd failure, if the node itself is clean and
> "healthy" from pacemaker's point of view and it's in the cluster
> partition with quorum, it won't self-fence -- meaning a node just
> being unable to fence doesn't necessarily need to be fenced.
>
> As described in sbd man page, "this allows sbd to survive temporary
> outages of the majority of devices. However, while the cluster is in
> such a degraded state, it can neither successfully fence nor be shutdown
> cleanly (as taking the cluster below the quorum threshold will
> immediately cause all remaining nodes to self-fence). In short, it will
> not tolerate any further faults.  Please repair the system before
> continuing."
>
> Regards,
>    Yan
>
>
> > what's your recommendation for this scenario?
> >
> > The "crm node fence" did the work.
> >
> > Regards
> > Fulong
> >
> > ------------------------------------------------------------------------
> > *From:* Gao,Yan <ygao at suse.com>
> > *Sent:* Friday, December 21, 2018 20:43
> > *To:* kwenning at redhat.com; Cluster Labs - All topics related to
> > open-source clustering welcomed; Fulong Wang
> > *Subject:* Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue
> > First thanks for your reply, Klaus!
> >
> > On 2018/12/21 10:09, Klaus Wenninger wrote:
> >> On 12/21/2018 08:15 AM, Fulong Wang wrote:
> >>> Hello Experts,
> >>>
> >>> I'm new to this mailing list.
> >>> Please kindly forgive me if this mail has disturbed you!
> >>>
> >>> Our company is currently evaluating the usage of SuSE HAE on the x86
> >>> platform.
> >>> When simulating a storage disaster fail-over, I found that the SBD
> >>> communication functioned normally on SuSE11 SP4 but abnormally on
> >>> SuSE12 SP3.
> >>
> >> I have no experience with SBD on SLES but I know that handling of the
> >> logging verbosity-levels has changed recently in the upstream-repo.
> >> Given that it was done by Yan Gao iirc I'd assume it went into SLES.
> >> So changing the verbosity of the sbd-daemon might get you back
> >> these logs.
> > Yes, I think it's the issue. Could you please retrieve the latest
> > maintenance update for SLE12SP3 and try? Otherwise of course you could
> > temporarily enable verbose/debug logging by adding a couple of "-v" into
> >    "SBD_OPTS" in /etc/sysconfig/sbd.
> >
> > But frankly, it makes more sense to manually trigger fencing for example
> > by "crm node fence" and see if it indeed works correctly.
> >
> >> And of course you can use the list command on the other node
> >> to verify as well.
> > The "test" message in the slot might get overwritten soon by a "clear"
> > if the sbd daemon is running.
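> >
> > For example, to look at the slots from the other node (device path is a
> > placeholder):
> >
> >     sbd -d /dev/disk/by-id/<shared-sbd-disk> list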
> >
> > Regards,
> >     Yan
> >
> >
> >>
> >> Klaus
> >>
> >>> The SBD device was added during the initialization of the first
> >>> cluster node.
> >>>
> >>> I have requested help from the SuSE guys, but they haven't given me any
> >>> valuable feedback yet!
> >>>
> >>>
> >>> Below are some screenshots to explain what I have encountered.
> >>>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>>
> >>> On a SuSE11 SP4 HAE cluster, I ran the sbd test command as below:
> >>>
> >>> [screenshot]
> >>>
> >>> Then some information showed up in the local system message log:
> >>>
> >>> [screenshot]
> >>>
> >>> On the second node, we can see that the communication is normal, as
> >>> shown below:
> >>>
> >>> [screenshot]
> >>>
> >>> But when I turned to a SuSE12 SP3 HAE cluster and ran the same command
> >>> as above:
> >>>
> >>> [screenshot]
> >>>
> >>> I didn't get any response in the system message log.
> >>>
> >>> "systemctl status sbd" also doesn't give me any clue on this.
> >>>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>>
> >>> What could be the reason for this abnormal behavior? Are there any
> >>> problems with my setup?
> >>> Any suggestions are appreciated!
> >>>
> >>> Thanks!
> >>>
> >>>
> >>> Regards
> >>> FuLong
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
