[Pacemaker] node1 fencing itself after node2 being fenced
Asgaroth
lists@blueface.com
Mon Feb 17 18:52:37 UTC 2014
> -----Original Message-----
> From: Andrew Beekhof [mailto:andrew@beekhof.net]
> Sent: 17 February 2014 00:55
> To: lists@blueface.com; The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] node1 fencing itself after node2 being fenced
>
>
> If you have configured cman to use fence_pcmk, then all cman/dlm/clvmd
> fencing operations are sent to Pacemaker.
> If you aren't running pacemaker, then you have a big problem as no-one can
> perform fencing.
I have configured Pacemaker as the resource manager, and the cluster
services are enabled to start on boot as follows:
chkconfig cman on
chkconfig clvmd on
chkconfig pacemaker on
>
> I don't know if you are testing without pacemaker running, but if so you
> would need to configure cman with real fencing devices.
>
I have been testing with Pacemaker running, and fencing itself appears to
operate fine. The issue is that clvmd is unable to re-acquire its locks when
the fenced node attempts to rejoin the cluster, so clvmd simply hangs when
the startup script fires it off on boot-up. While the third node is stuck in
this state, the other two nodes are unable to obtain locks from it. For
example, this is what happens when the third node is hung at the clvmd
startup phase after Pacemaker has issued a fence operation (running pvs on
node1):
[root@test01 ~]# pvs
Error locking on node test03: Command timed out
Unable to obtain global lock.
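(As an aside, while a remote clvmd is wedged like this, LVM commands on the
healthy nodes can block for a long time before reporting the lock failure. A
general technique, not specific to this cluster, is to wrap ad-hoc
diagnostics such as pvs/vgs/lvs in coreutils `timeout` so an interactive
shell never hangs indefinitely; `sleep 60` below merely stands in for the
blocking command.)

```shell
#!/bin/sh
# Sketch: run a potentially blocking diagnostic with a hard time limit.
# coreutils 'timeout' exits with status 124 when it has to kill the command.
# 'sleep 60' stands in for a command like 'pvs' that can block on a wedged
# clvmd; substitute the real command on a live node.
if timeout 2 sleep 60; then
    echo "command completed"
elif [ $? -eq 124 ]; then
    echo "command timed out"       # prints: command timed out
else
    echo "command failed"
fi
```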
The dlm elements look fine to me here too:
[root@test01 ~]# dlm_tool ls
dlm lockspaces
name cdr
id 0xa8054052
flags 0x00000008 fs_reg
change member 2 joined 0 remove 1 failed 1 seq 2,2
members 1 2
name clvmd
id 0x4104eefa
flags 0x00000000
change member 3 joined 1 remove 0 failed 0 seq 3,3
members 1 2 3
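(For what it's worth, the failed-member counts in that output can be pulled
out mechanically. The sketch below is purely illustrative: the here-doc
reproduces the `dlm_tool ls` capture above, and on a live node you would pipe
`dlm_tool ls` in instead. It flags any lockspace whose last change recorded
failed members, which is how the cdr lockspace stands out from clvmd here.)

```shell
#!/bin/sh
# Illustrative only: flag dlm lockspaces whose most recent change had
# failed members. In 'change member N joined N remove N failed N seq ...'
# lines, the failed count is field 9.
awk '
/^name /   { name = $2 }
/^change / { if ($9 > 0) print name ": failed=" $9 }
' <<'EOF'
dlm lockspaces
name cdr
id 0xa8054052
flags 0x00000008 fs_reg
change member 2 joined 0 remove 1 failed 1 seq 2,2
members 1 2
name clvmd
id 0x4104eefa
flags 0x00000000
change member 3 joined 1 remove 0 failed 0 seq 3,3
members 1 2 3
EOF
# prints: cdr: failed=1
```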
So cman/dlm appear to be operating properly; however, clvmd hangs and never
exits, so Pacemaker never starts on the third node, which is left in the
"pending" state while clvmd is hung:
[root@test02 ~]# crm_mon -Afr -1
Last updated: Mon Feb 17 15:52:28 2014
Last change: Mon Feb 17 15:43:16 2014 via cibadmin on test01
Stack: cman
Current DC: test02 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
3 Nodes configured
15 Resources configured
Node test03: pending
Online: [ test01 test02 ]
Full list of resources:
 fence_test01   (stonith:fence_vmware_soap):    Started test01
 fence_test02   (stonith:fence_vmware_soap):    Started test02
 fence_test03   (stonith:fence_vmware_soap):    Started test01
 Clone Set: fs_cdr-clone [fs_cdr]
     Started: [ test01 test02 ]
     Stopped: [ test03 ]
 Resource Group: sftp01-vip
     vip-001    (ocf::heartbeat:IPaddr2):       Started test01
     vip-002    (ocf::heartbeat:IPaddr2):       Started test01
 Resource Group: sftp02-vip
     vip-003    (ocf::heartbeat:IPaddr2):       Started test02
     vip-004    (ocf::heartbeat:IPaddr2):       Started test02
 Resource Group: sftp03-vip
     vip-005    (ocf::heartbeat:IPaddr2):       Started test02
     vip-006    (ocf::heartbeat:IPaddr2):       Started test02
 sftp01 (lsb:sftp01):   Started test01
 sftp02 (lsb:sftp02):   Started test02
 sftp03 (lsb:sftp03):   Started test02
Node Attributes:
* Node test01:
* Node test02:
* Node test03:
Migration summary:
* Node test03:
* Node test02:
* Node test01:
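(If you want to alert while a node sits stuck like this, the "pending" state
can be extracted from one-shot crm_mon output with a simple filter. The
sketch below is illustrative: the here-doc reproduces part of the capture
above, and on a live cluster you would pipe `crm_mon -1` in instead.)

```shell
#!/bin/sh
# Illustrative sketch: list nodes that crm_mon reports as "pending".
awk '/^Node / && /: pending/ { sub(":", "", $2); print $2 }' <<'EOF'
Node test03: pending
Online: [ test01 test02 ]
EOF
# prints: test03
```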