[Pacemaker] node1 fencing itself after node2 being fenced

Wed Feb 5 11:38:57 UTC 2014

Hi All,

First of all, thanks for the brilliant documentation at clusterlabs and 
the alteeve.ca tutorials! They helped me out a lot.

I am relatively new to pacemaker but have cluster experience from a 
Solaris background, and I am now trying to get on board with pacemaker.

I have set up a 2-node cluster with a shared LUN using pacemaker, cman, 
dlm, clvmd and gfs. I have configured 2 stonith devices each to fence 
either node.

The issue I have is that when I test an unclean shutdown of the 2nd 
node, pacemaker goes ahead and fences the second node as expected, but 
clvmd then goes into a failed state on node 1, and node 1 then fences 
itself (shuts down).

I suspect it has something to do with me setting on-fail=fence for the 
dlm/clvmd services/RAs. DLM appears to be fine, but clvmd is the one 
that goes into a failed state. I suspect I have an issue with timeouts 
here but, being new to pacemaker, I cannot see where. I am hoping a new 
pair of eyes can spot where I am going wrong.

I am running CentOS 6.5 in VMware, using the fence_vmware_soap stonith 
agent. Pacemaker is at version 1.1.10-14, CMAN is at version 3.0.12.1-59.

I used the following tutorial to assist me in setting up dlm/clvmd/gfs2 
on CentOS 6.5 (if it helps in the debugging):

https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/7-Beta/html/Global_File_System_2/ch-clustsetup-GFS2.html 

Any assistance, tips, tricks, comments and criticisms are all welcome.

I have attached my cluster.conf in case it is required; some node name 
obfuscation has been done. If you need any additional info, please 
don't hesitate to ask.
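
For anyone who wants to see the per-node fencing entries without reading 
the XML by hand, here is a minimal Python sketch. It only parses the 
clusternodes section of the attached cluster.conf (copied inline so the 
snippet is self-contained) and lists each node's fence method, device, 
port and delay attribute. The names fence_summary and CLUSTER_CONF are 
made up for illustration and are not part of pacemaker or cman.

```python
import xml.etree.ElementTree as ET

# Inline copy of the clusternodes section of the attached cluster.conf.
# (Illustrative only; on a real system you would read /etc/cluster/cluster.conf.)
CLUSTER_CONF = """
<cluster config_version="12" name="sftp-cluster">
  <clusternodes>
    <clusternode name="test01" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device delay="15" name="pcmk" port="test01"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="test02" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="test02"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
</cluster>
"""


def fence_summary(xml_text):
    """Map each node name to a list of (method, device, port, delay) tuples.

    delay is the raw attribute string, or None when the attribute is absent.
    """
    root = ET.fromstring(xml_text)
    summary = {}
    for node in root.iter("clusternode"):
        entries = []
        for method in node.findall("fence/method"):
            for device in method.findall("device"):
                entries.append((
                    method.get("name"),
                    device.get("name"),
                    device.get("port"),
                    device.get("delay"),
                ))
        summary[node.get("name")] = entries
    return summary


if __name__ == "__main__":
    for name, entries in fence_summary(CLUSTER_CONF).items():
        for method, device, port, delay in entries:
            print(f"{name}: method={method} device={device} port={port} delay={delay}")
```

The point of the listing is just to make it obvious that only test01 
carries a delay attribute on its fence device, while test02 has none.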

Thanks
-------------- next part --------------
<cluster config_version="12" name="sftp-cluster">
  <fence_daemon/>
  <clusternodes>
    <clusternode name="test01" nodeid="1">
      <altname name="test01-alt"/>
      <fence>
        <method name="pcmk-redirect">
          <device delay="15" name="pcmk" port="test01"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="test02" nodeid="2">
      <altname name="test02-alt"/>
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="test02"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman keyfile="/etc/corosync/authkey" port="5405" transport="udpu"/>
  <fencedevices>
    <fencedevice agent="fence_pcmk" name="pcmk"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
  <totem rrp_mode="active"/>
</cluster>