[Pacemaker] Avoid one node from being a target for resources migration

Dmitry Koterov dmitry.koterov at gmail.com
Wed Jan 14 13:43:20 UTC 2015


Sorry!

Pacemaker 1.1.10
Corosync 2.3.30

BTW, I removed quorum.two_node: 1 from corosync.conf, and it helped! Now the
isolated node stops its services in the 3-node cluster. Was that the right
solution?
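
For reference, a three-node votequorum section would then look roughly like
this (a sketch based on the corosync 2.x votequorum documentation; with a
nodelist configured, expected_votes can usually be omitted):

```
quorum {
    provider: corosync_votequorum
    # two_node: 1 is meant only for two-node clusters; with three nodes it
    # distorts the quorum calculation, so it must not be set here.
    expected_votes: 3
}
```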

On Wednesday, January 14, 2015, Andrew Beekhof <andrew at beekhof.net> wrote:

>
> > On 14 Jan 2015, at 12:06 am, Dmitry Koterov <dmitry.koterov at gmail.com>
> wrote:
> >
> >
> > > Then I see that, although node2 clearly knows it's isolated (it
> doesn't see other 2 nodes and does not have quorum)
> >
> > we don't know that - there are several algorithms for calculating quorum
> and the information isn't included in your output.
> > are you using cman, or corosync underneath pacemaker? corosync version?
> pacemaker version? have you set no-quorum-policy?
> >
> > no-quorum-policy is not set, so, according to
> http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-cluster-options.html
> , it is "stop - stop all resources in the affected cluster partition". I
> suppose this is the right option, but why are the resources not stopped on
> the node when this one node of three becomes isolated and clearly sees the
> other nodes as offline (so it knows it's isolated)? What should I
> configure in addition?
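> >
> > For what it's worth, the quorum state can be inspected directly, and the
> > policy can be pinned explicitly (a sketch, assuming corosync 2.x with
> > votequorum and the crm shell; "stop" is already the documented default):
> >
> > ```
> > # show this node's vote/quorum status (corosync 2.x)
> > corosync-quorumtool -s
> >
> > # make the default behavior explicit: stop all resources in a
> > # partition that has lost quorum
> > crm configure property no-quorum-policy=stop
> > ```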
> >
> > I'm using corosync+pacemaker, no cman. Below (in quotes) is the output of
> "crm configure show". Versions are from Ubuntu 14.04, so fairly recent.
>
> I don't have Ubuntu installed.  You'll have to be more specific as to what
> package versions you have.
>
> >
> >
> > > , it does not stop its services:
> > >
> > > root at node2:~# crm status
> > > Online: [ node2 ]
> > > OFFLINE: [ node1 node3 ]
> > > Master/Slave Set: ms_drbd [drbd]
> > >      Masters: [ node2 ]
> > >      Stopped: [ node1 node3 ]
> > > Resource Group: server
> > >      fs       (ocf::heartbeat:Filesystem):    Started node2
> > >      postgresql       (lsb:postgresql):       Started node2
> > >      bind9    (lsb:bind9):    Started node2
> > >      nginx    (lsb:nginx):    Started node2
> > >
> > > So is there a way to tell pacemaker to shut down a node's services when
> it becomes isolated?
> > >
> > >
> > >
> > > On Mon, Jan 12, 2015 at 8:25 PM, David Vossel <dvossel at redhat.com>
> wrote:
> > >
> > >
> > > ----- Original Message -----
> > > > Hello.
> > > >
> > > > I have 3-node cluster managed by corosync+pacemaker+crm. Node1 and
> Node2 are
> > > > DRBD master-slave, also they have a number of other services
> installed
> > > > (postgresql, nginx, ...). Node3 is just a corosync node (for
> quorum), no
> > > > DRBD/postgresql/... are installed at it, only corosync+pacemaker.
> > > >
> > > > But when I add resources to the cluster, a part of them are somehow
> moved to
> > > > node3 and since then fail. Note that I have a "colocation" directive
> to
> > > > place these resources to the DRBD master only and "location" with
> -inf for
> > > > node3, but this does not help - why? How to make pacemaker not run
> anything
> > > > at node3?
> > > >
> > > > All the resources are added in a single transaction: "cat config.txt
> | crm -w
> > > > -f- configure" where config.txt contains directives and "commit"
> statement
> > > > at the end.
> > > >
> > > > Below are "crm status" (error messages) and "crm configure show"
> outputs.
> > > >
> > > >
> > > > root at node3:~# crm status
> > > > Current DC: node2 (1017525950) - partition with quorum
> > > > 3 Nodes configured
> > > > 6 Resources configured
> > > > Online: [ node1 node2 node3 ]
> > > > Master/Slave Set: ms_drbd [drbd]
> > > > Masters: [ node1 ]
> > > > Slaves: [ node2 ]
> > > > Resource Group: server
> > > > fs (ocf::heartbeat:Filesystem): Started node1
> > > > postgresql (lsb:postgresql): Started node3 FAILED
> > > > bind9 (lsb:bind9): Started node3 FAILED
> > > > nginx (lsb:nginx): Started node3 (unmanaged) FAILED
> > > > Failed actions:
> > > > drbd_monitor_0 (node=node3, call=744, rc=5, status=complete,
> > > > last-rc-change=Mon Jan 12 11:16:43 2015, queued=2ms, exec=0ms): not
> > > > installed
> > > > postgresql_monitor_0 (node=node3, call=753, rc=1, status=complete,
> > > > last-rc-change=Mon Jan 12 11:16:43 2015, queued=8ms, exec=0ms):
> unknown
> > > > error
> > > > bind9_monitor_0 (node=node3, call=757, rc=1, status=complete,
> > > > last-rc-change=Mon Jan 12 11:16:43 2015, queued=11ms, exec=0ms):
> unknown
> > > > error
> > > > nginx_stop_0 (node=node3, call=767, rc=5, status=complete,
> last-rc-change=Mon
> > > > Jan 12 11:16:44 2015, queued=1ms, exec=0ms): not installed
> > >
> > > Here's what is going on. Even when you say "never run this resource on
> node3",
> > > pacemaker will still probe for the resource on node3, just to verify
> > > that the resource isn't running.
> > >
> > > The "monitor_0 failed" errors you are seeing indicate that pacemaker
> could not
> > > verify whether the resources are running on node3, because the related
> > > packages for those resources are not installed. Given pacemaker's default
> > > behavior, I'd expect this.
> > >
> > > You have two options.
> > >
> > > 1. Install the resource-related packages on node3 even though you
> never want
> > > them to run there. This will allow the resource agents to verify that
> the resource
> > > is in fact inactive.
> > >
> > > 2. If you are using the current master branch of pacemaker, there's a
> new
> > > location constraint option called
> 'resource-discovery=always|never|exclusive'.
> > > If you add the 'resource-discovery=never' option to your location
> constraint
> > > that attempts to keep resources from node3, you'll avoid having
> pacemaker
> > > perform the 'monitor_0' actions on node3 as well.
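> > >
> > > For example (a sketch; the syntax assumes a crm shell recent enough to
> > > accept the resource-discovery attribute — otherwise it can be set as
> > > the resource-discovery attribute on the rsc_location element in XML):
> > >
> > > ```
> > > location loc_server server resource-discovery=never -inf: node3
> > > ```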
> > >
> > > -- Vossel
> > >
> > > >
> > > > root at node3:~# crm configure show | cat
> > > > node $id="1017525950" node2
> > > > node $id="13071578" node3
> > > > node $id="1760315215" node1
> > > > primitive drbd ocf:linbit:drbd \
> > > > params drbd_resource="vlv" \
> > > > op start interval="0" timeout="240" \
> > > > op stop interval="0" timeout="120"
> > > > primitive fs ocf:heartbeat:Filesystem \
> > > > params device="/dev/drbd0" directory="/var/lib/vlv.drbd/root"
> > > > options="noatime,nodiratime" fstype="xfs" \
> > > > op start interval="0" timeout="300" \
> > > > op stop interval="0" timeout="300"
> > > > primitive postgresql lsb:postgresql \
> > > > op monitor interval="10" timeout="60" \
> > > > op start interval="0" timeout="60" \
> > > > op stop interval="0" timeout="60"
> > > > primitive bind9 lsb:bind9 \
> > > > op monitor interval="10" timeout="60" \
> > > > op start interval="0" timeout="60" \
> > > > op stop interval="0" timeout="60"
> > > > primitive nginx lsb:nginx \
> > > > op monitor interval="10" timeout="60" \
> > > > op start interval="0" timeout="60" \
> > > > op stop interval="0" timeout="60"
> > > > group server fs postgresql bind9 nginx
> > > > ms ms_drbd drbd meta master-max="1" master-node-max="1" clone-max="2"
> > > > clone-node-max="1" notify="true"
> > > > location loc_server server rule $id="loc_server-rule" -inf: #uname
> eq node3
> > > > colocation col_server inf: server ms_drbd:Master
> > > > order ord_server inf: ms_drbd:promote server:start
> > > > property $id="cib-bootstrap-options" \
> > > > stonith-enabled="false" \
> > > > last-lrm-refresh="1421079189" \
> > > > maintenance-mode="false"
> > > >
> > > > _______________________________________________
> > > > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > > >
> > > > Project Home: http://www.clusterlabs.org
> > > > Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > > Bugs: http://bugs.clusterlabs.org
> > > >
> > >
> >
> >
>
>
>

