[Pacemaker] OCFS2 fencing regulated by Pacemaker?

Dejan Muhamedagic dejanmm at fastmail.fm
Thu Feb 11 12:22:25 EST 2010


Hi,

On Thu, Feb 11, 2010 at 03:35:53PM -0000, Darren.Mansell at opengi.co.uk wrote:
> Once again, I apologise for the top-posting. I wish I could use a real
> mail client but nothing apart from Outlook works properly with Exchange
> :(.
> 
> Anyway - Yes, we've had a really hard time with our 3-node SAN based
> cluster. We implemented OCFS2 on top of a shared disk using o2cb and
> dlm clones. It seemed to work in the test environment but then when live
> it's been a real nightmare. It seems if you even breathe on it it will
> start a shootout, but as it's now a production system I can't do much
> about it.
> 
> Some mornings we arrive in and see that all 3 servers got STONITHd
> overnight but we can't see any reason why. We would disable STONITH to
> see what state the cluster gets in before fencing but the worst that
> happens is we get 10 mins of service unavailability, which is a lot
> better than 12 hours.
> 
> To complicate matters further, the apps we are using on the cluster /
> shared storage are Tomcat based and allegedly don't work too well with
> other file locking mechanisms. This is developer hearsay though, I can't
> substantiate it. The only leads I have are that the dlm seems to lose
> quorum and sets the fencing ops off. The logs never seem to tie up

Not being a native English speaker, I wonder what "logs don't tie up"
means.

> though, so it's very difficult to fault find.
> 
> With all this in mind, I haven't been able to file any bugs or make
> support requests to Novell due to not knowing exactly what is causing
> the issue. At the moment, if we leave well alone it performs well. If I
> were to reboot a node, I would expect the others to get fenced
> afterwards.

I'd just prepare everything using hb_report and that other OS
support utility, open a bugzilla titled something like "ocfs2: node
reboot results in other nodes being fenced", and let the
engineers figure out what's going on. If the information's not
sufficient, it's up to them to figure out why, etc. Please do
file a bugzilla.

Cheers,

Dejan

> Thanks for the help
> Darren
> 
> -----Original Message-----
> From: Dejan Muhamedagic [mailto:dejanmm at fastmail.fm] 
> Sent: 11 February 2010 14:12
> To: pacemaker at oss.clusterlabs.org; mail at sandervanvugt.nl
> Subject: Re: [Pacemaker] OCFS2 fencing regulated by Pacemaker?
> 
> Hi,
> 
> On Thu, Feb 11, 2010 at 01:16:20PM +0100, Sander van Vugt wrote:
> > On Thu, 2010-02-11 at 13:03 +0100, Dejan Muhamedagic wrote:
> > > Hi,
> > > 
> > > > On Thu, Feb 11, 2010 at 10:11:33AM -0000, Darren.Mansell at opengi.co.uk wrote:
> > > > Hello.
> > > > 
> > > > Yes, we get the same kind of thing. SLES11 HAE 64-bit.
> > > 
> > > Is there a bugzilla for this?
> > > 
> > Nope. Before filing a bug, I'd first like to be as sure as possible that
> > it really is a bug and not a problem behind the keyboard.
> 
> If you have strong doubts, closing a bugzilla is easy :) BTW,
> this was meant for Darren actually, as it seemed like he was
> having a really hard time dealing with his cluster.
> 
> > BTW: I don't see where on bugzilla.novell.com I should enter a bug for
> > something that is in the SLES HAE (and the Bugzilla FAQ didn't help
> > me). 
> 
> Use "SUSE Linux Enterprise High Availability Extension" for the
> product line.
> 
> Thanks,
> 
> Dejan
> 
> > Thanks,
> > Sander
> > 
> > 
> > 
> > _______________________________________________
> > Pacemaker mailing list
> > Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 



