[Pacemaker] node1 fencing itself after node2 being fenced

Digimer lists at alteeve.ca
Wed Feb 5 16:12:05 UTC 2014


You say it's working now? If so, excellent. If you have any troubles 
though, please share your cluster.conf and the output of 'pcs config show'.

Cheers!

On 05/02/14 10:53 AM, Asgaroth wrote:
> On 05/02/2014 13:44, Nikita Staroverov wrote:
>> Your setup is completely wrong, sorry. You must use the RHEL6
>> documentation, not RHEL7.
>> In short, you should create a cman cluster according to the RHEL6 docs,
>> but use pacemaker instead of rgmanager, with fence_pcmk as the fence
>> agent for cman.
>
> Thanks for the info. However, I am already using cman for cluster
> management and pacemaker as the resource manager; this is how I created
> the cluster, and it appears to be working OK. Please let me know if this
> is not the correct method for CentOS/RHEL 6.5:
>
> ---
> ccs -f /etc/cluster/cluster.conf --createcluster sftp-cluster
> ccs -f /etc/cluster/cluster.conf --addnode test01
> ccs -f /etc/cluster/cluster.conf --addalt test01 test01-alt
> ccs -f /etc/cluster/cluster.conf --addnode test02
> ccs -f /etc/cluster/cluster.conf --addalt test02 test02-alt
> ccs -f /etc/cluster/cluster.conf --addfencedev pcmk agent=fence_pcmk
> ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect test01
> ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect test02
> ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk test01 \
>     pcmk-redirect port=test01
> ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk test02 \
>     pcmk-redirect port=test02
> ccs -f /etc/cluster/cluster.conf --setcman \
>     keyfile="/etc/corosync/authkey" transport="udpu" port="5405"
> ccs -f /etc/cluster/cluster.conf --settotem rrp_mode="active"
> sed -i.bak "s/.*CMAN_QUORUM_TIMEOUT=.*/CMAN_QUORUM_TIMEOUT=0/g" \
>     /etc/sysconfig/cman
>
> pcs stonith create fence_test01 fence_vmware_soap login="user" \
>     passwd="password" action="reboot" ipaddr="vcenter_host" port="TEST01" \
>     ssl="1" pcmk_host_list="test01" delay="15"
> pcs stonith create fence_test02 fence_vmware_soap login="user" \
>     passwd="password" action="reboot" ipaddr="vcenter_host" port="TEST02" \
>     ssl="1" pcmk_host_list="test02"
>
> pcs property set no-quorum-policy="ignore"
> pcs property set stonith-enabled="true"
> ---
>
> The above is taken directly from the Pacemaker RHEL 6 two-node cluster
> quick start guide (except for the fence agent definitions).
>
> At this point the cluster comes up, cman_tool sees the two hosts as
> joined, and the cluster is communicating over the two rings defined. I
> couldn't find the equivalent "pcs" syntax to perform the above
> configuration; looking at the pcs man page, I couldn't track down how
> to, for example, set the security key file using pcs syntax.
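>
> For reference, the --setcman line above should just end up as attributes
> on the <cman> element in cluster.conf, roughly like the following
> (surrounding XML omitted; this snippet is illustrative):
>
> ---
> <cman keyfile="/etc/corosync/authkey" transport="udpu" port="5405"/>
> ---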
>
> The DLM/CLVMD/GFS2 configuration was taken from the RHEL 7
> documentation, as it illustrates how to set it up using pcs syntax. The
> configuration commands appear to work fine and the services appear to
> be configured correctly: pacemaker starts the services properly, and
> the cluster behaves correctly if I enable/disable the services using
> pcs syntax, or if I manually stop/start the pacemaker service or
> perform a clean shutdown/restart of the second node. The issue comes in
> when I test a crash of the second node, which is where I find the
> particular issue with fencing.
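>
> For reference, the dlm/clvmd resources were created with pcs commands
> roughly along these lines, following the RHEL 7 pattern (the resource
> names and options shown here are illustrative):
>
> ---
> pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s \
>     on-fail=fence clone interleave=true ordered=true
> pcs resource create clvmd ocf:heartbeat:clvm op monitor interval=30s \
>     on-fail=fence clone interleave=true ordered=true
> pcs constraint order start dlm-clone then clvmd-clone
> pcs constraint colocation add clvmd-clone with dlm-clone
> ---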
>
> Reading some archives of this mailing list, there seem to be
> suggestions that dlm may be waiting on pacemaker to fence a node, which
> then causes a temporary "freeze" of the clvmd/gfs2 configuration; I
> understand this is by design. However, when I crash the 2nd node by
> hand with "echo c > /proc/sysrq-trigger", I can see that stonithd
> begins fencing procedures against node2. At this point, according to
> crm_mon, the dlm service is stopped on node2 and started on node1, and
> clvmd then goes into a failed state, I presume because of a possible
> timeout (I could be wrong) or, potentially, because it cannot
> communicate with clvmd on node2. When clvmd goes into a failed state,
> that is when stonithd attempts to fence node1, which it does
> successfully by shutting it down.
>
> Some archive messages seem to suggest that clvmd should be started
> outside of the cluster at system boot (cman -> clvmd -> pacemaker);
> however, my personal preference would be to have these services managed
> by the cluster infrastructure, which is why I am attempting to set it
> up in this manner.
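>
> For comparison, that alternative would just mean enabling the init
> scripts at boot on CentOS/RHEL 6, roughly as follows (the exact service
> list here is illustrative):
>
> ---
> chkconfig cman on
> chkconfig clvmd on
> chkconfig gfs2 on
> chkconfig pacemaker on
> ---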
>
> Is there anyone else out there who may be running a similar
> configuration (dlm/clvmd/[gfs/gfs2/ocfs]) under pacemaker control?
>
> Again, thanks for the info; I will do some more reading to ensure that
> I am using the correct syntax for pcs to configure these services.
>
> Thanks
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?



