[Pacemaker] resource stickiness and preventing stonith on failback

Wed Aug 24 11:32:14 EDT 2011

Hello Brian,

On 08/23/2011 10:56 PM, Brian J. Murrell wrote:
> Hi All,
>
> I am trying to configure pacemaker (1.0.10) to make a single filesystem
> highly available by two nodes (please don't be distracted by the dangers
> of multiply mounted filesystems and clustering filesystems, etc., as I
> am absolutely clear about that -- consider that I am using a filesystem
> resource as just an example if you wish).  Here is my filesystem
> resource description:
>
> node foo1
> node foo2 \
> 	attributes standby="off"
> primitive OST1 ocf:heartbeat:Filesystem \
> 	meta target-role="Started" \
> 	operations $id="BAR1-operations" \
> 	op monitor interval="120" timeout="60" \
> 	op start interval="0" timeout="300" \
> 	op stop interval="0" timeout="300" \
> 	params device="/dev/disk/by-uuid/8c500092-5de6-43d7-b59a-ef91fa9667b9" \
> 	directory="/mnt/bar1" fstype="ext3"
> primitive st-pm stonith:external/powerman \
> 	params serverhost="192.168.122.1:10101" poweroff="0"
> clone fencing st-pm
> property $id="cib-bootstrap-options" \
> 	dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
> 	cluster-infrastructure="openais" \
> 	expected-quorum-votes="1" \
> 	no-quorum-policy="ignore" \
> 	last-lrm-refresh="1306783242" \
> 	default-resource-stickiness="1000"
> rsc_defaults $id="rsc-options" \
> 	resource-stickiness="100"
>
> The two problems I have run into are:
>
> 1. preventing the resource from failing back to the node it was
>     previously on after it has failed over and the previous node has
>     been restored.  Basically what's documented at
>
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch05s03s02.html
>
> 2. preventing the active node from being STONITHed when the resource
>     is moved back to its failed-and-restored node after a failover.
>     IOW: BAR1 is available on foo1, which fails and the resource is moved
>     to foo2.  foo1 returns and the resource is failed back to foo1, but
>     in doing that foo2 is STONITHed.
>
> For #1, as you can see, I tried setting the default resource stickiness
> to 100.  That didn't seem to work.  When I stopped corosync on the
> active node, the service failed over but it promptly failed back when I
> started corosync again, contrary to the example on the referenced URL.
>
> Subsequently I (think I) tried adding the specific resource stickiness
> of 1000.  That didn't seem to help either.

I had the same question some time ago; see here for it and Andrew's response:

http://www.gossamer-threads.com/lists/linuxha/pacemaker/59471

So basically you should check the current scores and then raise the
stickiness above them. I'm surprised that 1000 does not seem to help,
though.
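
A minimal sketch of the score check, assuming Pacemaker 1.0's "ptest -sL"
prints one line per resource and node in the shape shown in the comment
below. The helper name scores_for and the value 2000 are made up for
illustration and are not taken from the thread.

```shell
#!/bin/sh
# Sketch: list the per-node allocation scores of one resource.
# Assumption: score lines look like
#   native_color: OST1 allocation score on foo1: 100
# (the shape "ptest -sL" is assumed to print on Pacemaker 1.0).

scores_for() {
    # $1 = resource id; reads score lines on stdin, prints "node score"
    awk -v rsc="$1" '
        $2 == rsc && $3 == "allocation" && $4 == "score" {
            node = $6
            sub(/:$/, "", node)   # strip the colon after the node name
            print node, $7
        }'
}

# On a live cluster (not run here):
#   ptest -sL | scores_for OST1
# If the restored node's score is higher than the stickiness, raise the
# stickiness above it, for example (2000 is only illustrative):
#   crm configure rsc_defaults resource-stickiness="2000"
```

Comparing the two printed scores right after the failed node rejoins
should show whether the stickiness is actually large enough to keep the
resource where it is.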

>
> As for #2, the issue with STONITHing foo2 when failing back to foo1 is
> that foo1 and foo2 are an active/active pair of servers.  STONITHing
> foo2 just to restore foo1's services puts foo2's services out of service.
>
> I do want a node that is believed to be dead to be STONITHed before its
> resource(s) are failed over though.
>
> Any hints on what I am doing wrong?

Basically, a STONITH will only happen if the stop action fails, as
Pacemaker then does not know whether the resource really stopped. To
bring the system back into a known state, it simply kills the node that
failed to stop the resource. There was also a stop bug in several
Pacemaker releases; I don't remember anymore whether it was already
fixed in 1.0.10 or whether I simply backported the patches (right now
I'm not doing anything with Pacemaker anymore...). You will need to
check your logs to find out why it does so.

As you might have noticed, Pacemaker logs often also contain quite a lot
of debug messages (reminds me of Lustre ;) ), and for DDN systems I
therefore set up rather complex syslog-ng filter rules. One of the last
things I did for DDN was to send several ha-logd patches upstream, which
then also got integrated, so log filtering should be easier now.
Additionally, the Lustre server RA I then wrote, based on a
stripped-down Filesystem RA, also logs much more about what actually
fails. So if you use that one (an older version is in lustre-2.0, I
think), change the type to ext3 and remove the lustre_health check, you
should get a better idea of what is actually going on. Somewhere on my
desktop system at home I should also still have the syslog-ng rules.
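
As a starting point for the log check, here is a small sketch that pulls
out every line mentioning the stop operation of one resource. It relies
only on Pacemaker naming operations <resource>_<action>_<interval>, so
the stop of OST1 shows up as OST1_stop_0. The helper name stop_events and
the log path are assumptions; use whatever file your syslog setup
actually writes to.

```shell
#!/bin/sh
# Sketch: show log lines that mention the stop operation of a resource.

stop_events() {
    # $1 = resource id, $2 = log file to search
    grep -e "${1}_stop_0" "$2"
}

# Example call (the log path is an assumption):
#   stop_events OST1 /var/log/messages
```

A stop that returned an error or timed out shortly before the STONITH
would be consistent with the behaviour described above.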

Cheers,
Bernd