[Pacemaker] Pull the plug on one node, the resource doesn't promote the other node to master

David Coulson david at davidcoulson.net
Fri Jul 20 21:25:00 EDT 2012


Use the ping resource agent to set a node attribute, then add a location constraint with a rule on that attribute.

primitive re-ping-core ocf:pacemaker:ping \
         meta failure-timeout="60" \
         params name="at-ping-core" host_list="10.250.52.1" multiplier="100" attempts="5" \
         op monitor interval="20" timeout="15" \
         op start interval="0" timeout="5" \
         op stop interval="0" timeout="5"
location lo-named-ping cl-named \
         rule $id="cl-named-ping-rule" -inf: not_defined at-ping-core or at-ping-core lte 0
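
Adapted to the configuration quoted below, the same idea might look roughly like this. It is an untested sketch, not a drop-in fix: it assumes GatewayStatus keeps the ping agent's default attribute name (pingd, since no name= parameter is set), and the constraint and rule ids are made up for illustration.

```
# Hypothetical ids; "pingd" is assumed to be the attribute GatewayStatus writes.
# Bans rseries_ms from any node that has no ping attribute or cannot reach the gateway.
location lo-rseries-ping rseries_ms \
         rule $id="lo-rseries-ping-rule" -inf: not_defined pingd or pingd lte 0
```

Since ResourceCustom is already colocated with rseries_ms:Master, it should follow the master, so GatewayStatusClone would not need to appear in the colocation itself.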


On 7/20/12 7:45 PM, Matteo Bignotti wrote:
> Thank you, man,
>
> I realized what it was. We're running pacemaker held up by a third 
> item (r-series), which controls the cluster through a USB connection 
> and a resource agent. When I unplug the network cable, it doesn't 
> notice that the node is disconnected, because the USB link is still 
> up. If I unplug both, it works.
>
> Now, let me try and ask...
>
> in my pacemaker I have this configuration
>
> --------------------------------------------------------------------------
>
> primitive GatewayStatus ocf:pacemaker:ping \
>     params host_list="10.10.0.1" multiplier="100" \
>     op monitor interval="5" timeout="10"
> primitive ResourceCustom ocf:Company:resourcecustom \
>     op monitor interval="10" timeout="20" \
>     op stop interval="0" timeout="15" on-fail="standby" \
>     meta migration-threshold="2" failure-timeout="30s" target-role="Started"
> primitive rseries ocf:Company:rseries \
>     params tty="/dev/rseries0" \
>     op monitor interval="10" role="Master" timeout="30s" \
>     op monitor interval="60" role="Slave" timeout="30s"
> ms rseries_ms rseries \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" target-role="Master"
> clone GatewayStatusClone GatewayStatus
>
> colocation rsc-like-master inf: ResourceCustom rseries_ms:Master
> order rsc-after-rseries inf: rseries_ms:promote ResourceCustom:start
>
> -------------------------------------------------------------------------------------------------
>
> ============
> Last updated: Fri Jul 20 16:43:50 2012
> Last change: Fri Jul 20 16:11:15 2012 via cibadmin on cluster_01
> Stack: openais
> Current DC: cluster_01 - partition with quorum
> Version: 1.1.6.1-3.el5-0c7312c689715e096b716419e2ebc12b57962052
> 2 Nodes configured, 2 expected votes
> 5 Resources configured.
> ============
>
> Online: [ cluster_01 cluster_02 ]
>
>  ResourceCustom    (ocf::Company:resourcecustom):    Started cluster_01
>  Master/Slave Set: rseries_ms [rseries]
>      Masters: [ cluster_01 ]
>      Slaves: [ cluster_02 ]
>  Clone Set: GatewayStatusClone [GatewayStatus]
>      Started: [ cluster_01 cluster_02 ]
>
>
> ------------------------------------------------------------------------------------------
>
>
> Now, since I have GatewayStatus, can I configure the cluster so that 
> if the network fails, ResourceCustom, rseries:Master and the gateway 
> check stay together? I ask because when I tried putting 
> GatewayStatusClone in the colocation, havoc ensued and nothing 
> worked. Did I do something wrong, or is that not the right way to 
> configure it?
>
> thanks
>
> On 07/20/2012 02:14 PM, David Vossel wrote:
>> ----- Original Message -----
>>> From: "Matteo Bignotti"<mbignotti at switchvox.com>
>>> To: pacemaker at oss.clusterlabs.org
>>> Sent: Friday, July 20, 2012 4:01:26 PM
>>> Subject: [Pacemaker] Pull the plug on one node, the resource doesn't promote the other node to master
>>>
>>> Hi guys,
>>>
>>> I am having trouble understanding why this happens:
>>>
>>> I have a cluster with 2 nodes. When I put one node in standby or
>>> crash the resource, it correctly promotes the second machine to
>>> master. But if I UNPLUG the first machine from the network, the
>>> resource won't promote the other one to master.
>>>
>>> I am using corosync/pacemaker.
>>>
>>> Why is that?
>> Are you using stonith?  If node1 disappears, pacemaker has to have some sort of guarantee from a fencing device that it is in fact gone before the promotion happens.
>>
>> This situation (unplugging the node) differs from standby and the rsc crash because pacemaker still knows the state of node1 after the standby/rsc crash.  If the state of node1 can't be verified (unplugged from network), it needs to be fenced before the cluster can continue.
>>
>> If you are still having trouble, upload a crm_report and we'll have a look.
>>
>> -- Vossel
>>
>>> thanks
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>>
