[ClusterLabs] Resource-stickiness is not working

Ken Gaillot kgaillot at redhat.com
Wed Jun 6 01:36:11 UTC 2018


On Wed, 2018-06-06 at 07:47 +0800, Confidential Company wrote:
> On Sat, 2018-06-02 at 22:14 +0800, Confidential Company wrote:
> > On Fri, 2018-06-01 at 22:58 +0800, Confidential Company wrote:
> > > Hi,
> > >
> > > I have a two-node active/passive setup. My goal is to fail over a
> > > resource once a node goes down, with as little downtime as
> > > possible. Based on my testing, when Node1 goes down, the resource
> > > fails over to Node2. If Node1 comes back up after the link is
> > > reconnected (physical cable plugged back in), the resource fails
> > > back to Node1 even though I configured resource-stickiness. Is
> > > there something wrong with the configuration below?
> > >
> > > #service firewalld stop
> > > #vi /etc/hosts --> 192.168.10.121 (Node1) / 192.168.10.122 (Node2) --------------- Private Network (Direct connect)
> > > #systemctl start pcsd.service
> > > #systemctl enable pcsd.service
> > > #passwd hacluster --> define pw
> > > #pcs cluster auth Node1 Node2
> > > #pcs cluster setup --name Cluster Node1 Node2
> > > #pcs cluster start --all
> > > #pcs property set stonith-enabled=false
> > > #pcs resource create ClusterIP ocf:heartbeat:IPaddr2
> > > ip=192.168.10.123 cidr_netmask=32 op monitor interval=30s
> > > #pcs resource defaults resource-stickiness=100
> > >
> > > Regards,
> > > imnotarobot
> >
> > Your configuration is correct, but keep in mind scores of all kinds
> > will be added together to determine where the final placement is.
> >
> > In this case, I'd check that you don't have any constraints with a
> > higher score preferring the other node. For example, if you
> > previously did a "move" or "ban" from the command line, that adds a
> > constraint that has to be removed manually if you no longer want it.
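> >
> > For example, a constraint left behind by a "move" or "ban" of the
> > ClusterIP resource in your config can be removed with:
> >
> >   pcs resource clear ClusterIP
> >
> > (pcs resource clear only removes the constraints created by
> > move/ban.)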
> > -- 
> > Ken Gaillot <kgaillot at redhat.com>
> >>>>>>>>>>
> > I'm confused. A constraint, from what I understand, means there's a
> > preferred node. But if I want my resources not to have a preferred
> > node, is that possible?
> >
> > Regards,
> > imnotarobot
> 
> Yes, that's one type of constraint -- but you may not have realized
> you added one if you ran something like "pcs resource move", which is
> a way of saying there's a preferred node.
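> 
> For instance, with the ClusterIP resource from your config, a move
> like this:
> 
>   pcs resource move ClusterIP Node2
> 
> leaves behind a location constraint (with an ID like
> cli-prefer-ClusterIP) that keeps preferring Node2 until it is
> cleared.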
> 
> There are a variety of other constraints. For example, as you add
> more resources, you might say that resource A can't run on the same
> node as resource B, and if that constraint's score is higher than the
> stickiness, A might move if B starts on its node.
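> 
> A sketch of such an anti-colocation, using hypothetical resources A
> and B (a score of -200 here would outweigh a stickiness of 100):
> 
>   pcs constraint colocation add A with B score=-200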
> 
> To see your existing constraints using pcs, run "pcs constraint
> show". If there are any you don't want, you can remove them with
> various pcs commands.
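> 
> For example (the constraint ID will vary on your cluster):
> 
>   pcs constraint show --full
>   pcs constraint remove cli-prefer-ClusterIP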
> -- 
> Ken Gaillot <kgaillot at redhat.com>
> 
> 
> >>>>>>>>>>
> Correct me if I'm wrong. So the resource-stickiness policy cannot be
> used alone. A constraint has to be configured in order to make it
> work, and the result will also depend on the relative scores set up
> between the two. Can you suggest what type of constraint
> configuration I should set to achieve the simple goal above?

Not quite -- stickiness can be used alone. However, scores from all
sources are combined and compared when placing resources, so anything
else in the configuration that generates a score (like constraints)
will have an effect, if present.
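
If you want to see how the scores actually add up on your cluster,
crm_simulate can dump the allocation scores from the live CIB:

  crm_simulate -sL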

Looking at your test scenario again, I see the problem is the lack of
stonith, and has nothing to do with stickiness.

When you pull the cable, neither node can see the other. The isolated
node is still running the IP address, even though it can't do anything
with it. The failover node thinks it is the only node remaining, and
brings up the IP address there as well. This is a split-brain
situation.

When you reconnect the cable, the nodes can see each other again, and
*both are already running the IP*. The cluster detects this, and stops
the IP on both nodes, and brings it up again on one node. Since the IP
is not running at that point, stickiness doesn't come into play.

If stonith were configured, one of the two nodes would kill the other,
so only one would be running the IP at any time. If the dead node came
back up and rejoined, it would not be running the IP, and stickiness
would keep the IP where it was.
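
For example, with one IPMI-style fence device per node (a sketch; the
device addresses and credentials below are placeholders):

  pcs stonith create fence-node1 fence_ipmilan pcmk_host_list=Node1 \
      ipaddr=192.168.10.131 login=admin passwd=secret lanplus=1
  pcs stonith create fence-node2 fence_ipmilan pcmk_host_list=Node2 \
      ipaddr=192.168.10.132 login=admin passwd=secret lanplus=1
  pcs property set stonith-enabled=true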

Which node kills the other is a bit tricky in a two-node situation. If
you're interested mainly in IP availability, you can use
fence_heuristics_ping to keep a node with a nonfunctioning network from
killing the other. Another possibility is to use qdevice on a third
node as a tie-breaker.
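
For the qdevice route, the third host runs corosync-qnetd and the
cluster nodes add it as a quorum device, roughly like this (the host
name is a placeholder):

  pcs qdevice setup model net --enable --start                        # on the third host
  pcs quorum device add model net host=qnetd-host algorithm=ffsplit   # on a cluster node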

In any case, stonith is how to avoid a split-brain situation.

> 
> Regards,
> imnotarobot
> 
-- 
Ken Gaillot <kgaillot at redhat.com>

