[Pacemaker] [DRBD-user] DRBD active/passive on Pacemaker+CMAN cluster unexpectedly performs STONITH when promoting

Giuseppe Ragusa giuseppe.ragusa at hotmail.com
Fri Jul 4 16:04:12 UTC 2014


> > The setup "almost" works (all seems ok with: "pcs status", "crm_mon
> > -Arf1", "corosync-cfgtool -s", "corosync-objctl | grep member") , but
> > every time it needs a resource promotion (to Master, i.e. becoming
> > primary) it either fails or fences the other node (the one supposed to
> > become Slave i.e. secondary) and only then succeeds.
> >
> > It happens, for example both on initial resource definition (when
> > attempting first start) and on node entering standby (when trying to
> > automatically move the resources by stopping then starting them).
> > 
> > I collected a full "pcs cluster report" and I can provide a CIB dump,
> > but I will initially paste here an excerpt from my configuration just
> > in case it happens to be a simple configuration error that someone can
> > spot on the fly ;> (hoping...)
> > 
> > Keep in mind that the setup has separated redundant network
> > connections for LAN (1 Gib/s LACP to switches), Corosync (1 Gib/s
> > roundrobin back-to-back) and DRBD (10 Gib/s roundrobin back-to-back)
> > and that FQDNs are correctly resolved through /etc/hosts
> 
> Make sure your DRBD resources are "Connected UpToDate/UpToDate"
> before you let the cluster take over control of who is master.

Thanks for your important reminder.

Actually they had been "Connected UpToDate/UpToDate"; I then manually demoted them all to secondary
and brought them down, before finally stopping the (manually started) DRBD service.

Only at the end did I start/configure the cluster.
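
A minimal sketch of how the "Connected UpToDate/UpToDate" precondition could be checked before handing control to the cluster, assuming the DRBD 8.x /proc/drbd layout (one "N: cs:... ro:... ds:..." line per device). The function names are hypothetical and not part of any DRBD or Pacemaker tool.

```python
import re

# Assumed DRBD 8.x /proc/drbd device line, for example:
#  0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
DEVICE_LINE = re.compile(r"^\s*(\d+):\s+cs:(\S+)(?:\s+ro:\S+)?(?:\s+ds:(\S+))?")


def parse_devices(proc_drbd_text):
    """Return a list of (minor, connection_state, disk_states) tuples."""
    devices = []
    for line in proc_drbd_text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            minor, cstate, dstate = match.groups()
            devices.append((int(minor), cstate, dstate))
    return devices


def ready_for_cluster(proc_drbd_text):
    """True only if at least one device exists and every device is
    Connected with both disks UpToDate."""
    devices = parse_devices(proc_drbd_text)
    return bool(devices) and all(
        cstate == "Connected" and dstate == "UpToDate/UpToDate"
        for _minor, cstate, dstate in devices
    )
```

On a node with DRBD loaded, the text to check would come from reading /proc/drbd, e.g. ready_for_cluster(open("/proc/drbd").read()).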

The problem is now resolved: my improper use of rhcs_fence as the fence-peer handler seems to have been the culprit (I have now switched to crm-fence-peer.sh). I still do not understand why rhcs_fence was called at all in the first place, though, since the DRBD docs clearly state that a communication disruption must be involved for fence-peer to be called into action. Once called, it may well have caused unforeseen consequences, I admit.
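
For reference, a minimal sketch of a DRBD resource fragment that uses crm-fence-peer.sh as the fence-peer handler. The resource name r0, the fencing policy, and the script paths are assumptions (the paths are the usual DRBD 8.x install location); none of them is taken from the configuration discussed in this thread.

```
resource r0 {                      # hypothetical resource name
  disk {
    # assumed policy; the thread does not say which one was in use
    fencing resource-and-stonith;
  }
  handlers {
    # adds a Pacemaker constraint against promoting the outdated peer
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    # removes that constraint once resync has completed
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```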

Many thanks again.

Regards,
Giuseppe

 		 	   		  