[Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart?

Mon Aug 27 17:06:26 EDT 2012

Andrew, 

Snipped off the old stuff as this is getting quite large for those that read via mobile devices ;-) 

To answer your question - no, not that I am aware of. Maybe someone else on the list with a better option will speak up. 

However, if you're really only looking to make sure the node running the resources can ping at least one of the IPs, and don't actually care whether it can ping both of them (are they connected via the same network infrastructure?), then I have a solution, which is how I use ping myself. I want to make sure the node running resources has connectivity to the client network. I don't care if it can't ping one of the IPs (I use 2) since they are connected via the same infrastructure; only if the node can't ping either IP do I assume the node is isolated and move resources off of it to the other node. 

I also have the requirement to never move a resource that's running unless absolutely necessary, hence I only move if isolated - ymmv for your use case. If what I said seems like it would be good enough for you, this is the location statement I would recommend: 

location loc_run_on_most_connected g_vm \ 
rule $id="loc_run_on_connected-rule" -inf: not_defined pingd or pingd lte 0 

This will make sure your resources can't run on a node that doesn't have a pingd value of at least 1 - if a node loses ping to both/all IPs then the resources migrate. Since the rule doesn't use the value of pingd as a score, it gives no preference between 1000 and 2000 in your case, and therefore causes no movement when a single ping destination is lost. 
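
For context, here is a minimal sketch of how that rule might sit next to the ping resource itself. The resource ids p_ping and cl_ping, the dampen value and the monitor interval are made up for illustration; the multiplier of 1000 is only inferred from the 1000/2000 values mentioned above, and the host list uses the two addresses from your mail. Adjust all of these to your actual configuration.

```
# Hypothetical ping primitive: each reachable host adds "multiplier" to pingd,
# so with two hosts pingd is 2000 (both up), 1000 (one up) or 0 (none up).
primitive p_ping ocf:pacemaker:ping \
        params name="pingd" host_list="192.168.0.128 192.168.0.129" \
               multiplier="1000" dampen="5s" \
        op monitor interval="10s"

# Run the ping check on every node so each node gets its own pingd attribute.
clone cl_ping p_ping

# Ban g_vm only from nodes where pingd is missing or 0; 1000 and 2000 are
# treated the same, so losing one ping target should not move anything.
location loc_run_on_most_connected g_vm \
        rule $id="loc_run_on_connected-rule" -inf: not_defined pingd or pingd lte 0
```

Running "crm_mon -A -1" should print the node attributes once, including pingd, which makes it easy to confirm what value each node currently has before and after taking one of the ping targets down.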

HTH 

Jake 
----- Original Message -----

From: "Andrew Martin" <amartin at xes-inc.com> 
To: "Jake Smith" <jsmith at argotec.com> 
Cc: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org> 
Sent: Monday, August 27, 2012 4:44:51 PM 
Subject: Re: [Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart? 

Hi Jake, 

Thank you for the detailed analysis of this problem. The original reason I was utilizing ocf:pacemaker:ping was to ensure that the node with the best network connectivity (network connectivity being judged by the ability to communicate with 192.168.0.128 and 192.168.0.129) would be the one running the resources. However, it is possible that either of these IPs could be down for maintenance or because of a hardware failure, and the cluster should not be affected by this. It seems that a synchronous ping check from all of the nodes would ensure this behavior without this unfortunate side-effect. 

Is there another way to achieve the same network connectivity check instead of using ocf:pacemaker:ping? I know the other *ping* resource agents are deprecated. 

Thanks, 

Andrew 
