[Pacemaker] Upgrading to Pacemaker 1.1.7. Issue: sticky resources failing back after reboot
Parshvi
parshvi.17 at gmail.com
Mon Sep 10 04:06:51 EDT 2012
David Vossel <dvossel at ...> writes:
> > Hi,
> > We have upgraded pacemaker from version 1.0.12 to 1.1.7.
> > The upgrade was done because resources failed to recover after a
> > timeout (monitor|stop[unmanaged]). The logs observed were:
> >
> > WARN: print_graph: Synapse 6 is pending (priority: 0)
> > Sep 03 16:55:18 CSS-FU-2 crmd: [25200]: WARN: print_elem: [Action 103]: Pending (id: SnmpAgent_monitor_5000, loc: CSS-FU-2, priority: 0)
> > Sep 03 16:55:18 CSS-FU-2 crmd: [25200]: WARN: print_elem: * [Input 102]: Pending (id: SnmpAgent_start_0, loc: CSS-FU-2, priority: 0)
> > Sep 03 16:55:18 CSS-FU-2 crmd: [25200]: WARN: print_graph: Synapse 7 is pending (priority: 0)
> >
> > From reading the mailing list, we inferred that this issue is fixed in
> > 1.1.7.
> >
> > Platform OS: OEL 5.8
> > Pacemaker Version: 1.1.7
> > Corosync version: 1.4.3
> >
> > Pacemaker and all its dependent packages were built from source
> > (tarball: github).
> > glib version used for build: 2.32.2
> >
> > The following issue is observed in Pacemaker 1.1.7:
> > 1) There is a two-node cluster.
> > 2) When the primary node is rebooted or pacemaker is restarted, the
> >    resources fail over to the secondary.
> > 3) There are 4 groups of services:
> >    2 groups are not sticky.
> >    1 group is a master/slave multi-state resource.
> >    1 group is STICKY.
> > 4) When the primary node comes back online, even the sticky resources
> >    fail back to the primary node (Issue).
> > 5) Now, if the secondary node is rebooted, the resources fail over to
> >    the primary node.
> > 6) Once the secondary node is up, only the non-sticky resources fail
> >    back. The sticky resources remain on the primary node.
> >
> > 7) Even if the location preference of the sticky resources is set to
> >    Node-2 (the secondary node), the sticky resources still fail back
> >    to Node-1.
> >
> > We're using pacemaker 1.0.12 in production, where monitor operations
> > for IPaddr and other resources time out and pacemaker does not recover
> > from the timeout (logs shared above).
> >
> > Any help is welcome.
> >
> > PS: Please let me know if any logs or configuration need to be shared.
>
> My guess is that this is an issue with node scores for the resources in
> question. Stickiness and location constraints work in a similar way. You
> could really think of resource stickiness as a temporary location
> constraint on a resource that changes depending on what node it is on.
>
> If you have a resource with stickiness enabled and you want the resource
> to stay put, the stickiness score has to outweigh all the location
> constraints for that resource on other nodes. If you are using colocation
> constraints, this becomes increasingly complicated, as a resource's
> per-node location score could change based on the location of another
> resource.
>
> For specific advice on your scenario, there is little we can offer
> without seeing your exact configuration.
>
Hi David,
Thanks for the quick response.
I have shared the configuration at the following URL:
https://dl.dropbox.com/u/20096935/cib.txt
The issue has been observed for the following group:
1) Rsc_Ms1
2) Rsc_S
3) Rsc_T
4) Rsc_TGroupClusterIP
Colocation: Resources 1), 2) and 3) are colocated with resource 4).
Location preference: Resource 4) prefers one of the nodes in the cluster.
Ordering: Resources 1), 2) and 3) are started (with no sequential ordering
among them) once resource 4) has been started.
Thanks,
Parshvi
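
For readers following the thread: David's point that stickiness must outweigh
the location scores on other nodes can be illustrated with a small sketch. This
is a deliberately simplified model and not Pacemaker's actual allocation code.
The function name preferred_node, the node names, and all score values are
hypothetical, and colocation effects (which David notes can shift per-node
scores further) are not modelled.

```python
# Simplified model of per-node scoring; NOT Pacemaker's actual algorithm.

def preferred_node(location_scores, current_node, stickiness):
    """Return the node with the highest combined score.

    location_scores: dict mapping node name -> location constraint score
    current_node:    node the resource is running on now (or None)
    stickiness:      bonus that applies only to the current node
    """
    totals = {}
    for node, score in location_scores.items():
        bonus = stickiness if node == current_node else 0
        totals[node] = score + bonus
    # Iterating in sorted order only makes tie-breaking deterministic here.
    return max(sorted(totals), key=lambda n: totals[n])


# Hypothetical numbers: the resource prefers node1 with score 100 and is
# currently running on node2 after a failover.
scores = {"node1": 100, "node2": 0}

# Stickiness 50: node1 totals 100, node2 totals 50, so the resource fails back.
print(preferred_node(scores, "node2", stickiness=50))

# Stickiness 200: node1 totals 100, node2 totals 200, so the resource stays put.
print(preferred_node(scores, "node2", stickiness=200))
```

In this model a sticky resource fails back whenever the location score on the
other node exceeds the stickiness bonus on the current node, so comparing the
configured stickiness against every location score touching the resource is a
reasonable first check.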