[Pacemaker] node1 fencing itself after node2 being fenced

Tue Feb 11 10:26:39 EST 2014

> -----Original Message-----
> From: Vladislav Bogdanov [mailto:bubble at hoster-ok.com]
> Sent: 11 February 2014 03:44
> To: pacemaker at oss.clusterlabs.org
> Subject: Re: [Pacemaker] node1 fencing itself after node2 being fenced
> 
> Nope, it's CentOS 6. In short, it is probably safer for you to stay with
> cman, especially if you need GFS2. gfs_controld is not officially ported to
> corosync2 and is obsolete in EL7 because communication between
> gfs2 and dlm has moved to kernel space there.
> 

OK, thanks for the info. I may look into how to compile corosync2 on
CentOS 6 for a different cluster I need to set up, one that does not have
the GFS2 requirement.

> 
> You need to fix that for sure.
> 

I ended up rebuilding all my nodes and adding a third one to see whether
quorum was the issue, but the symptoms are still the same. I then straced
clvmd, and it looks like it tries to write to /dev/misc/dlm_clvmd, which
doesn't exist on the "failed" node.
I attached the trace to an existing bug in the CentOS bug tracker:
http://bugs.centos.org/view.php?id=6853
This looks like something to do with clvmd and its locks, but dlm appears to
be operating fine for me; I don't see any kern_stop flags for clvmd at all
when the node is being fenced. It is a strange one: if I shut down and
reboot any of the nodes cleanly, everything comes back up OK. The issue only
appears when I simulate a failure.
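
For anyone who wants to check for the same missing device on their own
nodes, here is a minimal Python sketch. The only path taken from my trace is
/dev/misc/dlm_clvmd; the function name missing_char_devices is just
something made up for illustration, and the check assumes the node should
be a character device.

```python
import os
import stat
import sys


def missing_char_devices(paths):
    """Return the paths that are absent or are not character devices."""
    missing = []
    for path in paths:
        try:
            mode = os.stat(path).st_mode
        except OSError:
            # Path does not exist (or cannot be stat'ed) on this node.
            missing.append(path)
            continue
        if not stat.S_ISCHR(mode):
            # Something is there, but it is not a character device.
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Default to the path seen in the strace; other paths can be passed
    # on the command line.
    targets = sys.argv[1:] or ["/dev/misc/dlm_clvmd"]
    for path in missing_char_devices(targets):
        print("missing or not a character device: " + path)
```

Running it on each node after a simulated failure should show whether the
device is absent only on the "failed" node, which is what I saw.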

> 
> Strange message, looks like something is bound to that port already.
> You may want to try dlm in tcp mode btw.
> 

I was unable to run dlm in tcp mode because I have dual-homed interfaces, so
dlm won't run in tcp mode in this case :) Thanks for the recommendation
though.