[Pacemaker] Avoid one node from being a target for resources migration

Mon Jan 12 21:22:02 EST 2015

> On 13 Jan 2015, at 7:56 am, Dmitry Koterov <dmitry.koterov at gmail.com> wrote:
> 
> 1. install the resource related packages on node3 even though you never want
> them to run there. This will allow the resource-agents to verify the resource
> is in fact inactive.
> 
> Thanks, your advice helped: I installed all the services on node3 as well (including DRBD, but without its configs) and stopped+disabled them. Then I added the following line to my configuration:
> 
> location loc_drbd drbd rule -inf: #uname eq node3
> 
> So node3 is never a target for DRBD, and this helped: "crm node standby node1" doesn't try to use node3 anymore.
> 
> But I have another (related) issue. If some node (e.g. node1) becomes isolated from the other 2 nodes, how do I force it to shut down its services? I cannot use IPMI-based fencing/stonith, because there are no reliable connections between the nodes at all (the nodes are in geo-distributed datacenters), and an IPMI call to shut down a node from another node is impossible.
> 
> E.g. initially I have the following:
> 
> # crm status
> Online: [ node1 node2 node3 ]
> Master/Slave Set: ms_drbd [drbd]
>      Masters: [ node2 ]
>      Slaves: [ node1 ]
> Resource Group: server
>      fs (ocf::heartbeat:Filesystem):    Started node2
>      postgresql (lsb:postgresql):       Started node2
>      bind9      (lsb:bind9):    Started node2
>      nginx      (lsb:nginx):    Started node2
> 
> Then I turn on firewall on node2 to isolate it from the outside internet:
> 
> root@node2:~# iptables -A INPUT -p tcp --dport 22 -j ACCEPT
> root@node2:~# iptables -A OUTPUT -p tcp --sport 22 -j ACCEPT
> root@node2:~# iptables -A INPUT -i lo -j ACCEPT
> root@node2:~# iptables -A OUTPUT -o lo -j ACCEPT
> root@node2:~# iptables -P INPUT DROP; iptables -P OUTPUT DROP
> 
> Then I see that, although node2 clearly knows it's isolated (it doesn't see the other 2 nodes and does not have quorum)

we don't know that - there are several algorithms for calculating quorum and the information isn't included in your output.
are you using cman, or corosync underneath pacemaker? corosync version? pacemaker version? have you set no-quorum-policy?
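
A sketch of how that information could be collected on node2, assuming corosync 2.x with votequorum (on a cman stack, "cman_tool status" would be the rough equivalent of corosync-quorumtool). The run helper is only an illustration so that a missing tool is reported instead of producing a shell error:

```shell
#!/bin/sh
# Hypothetical helper: run a command only if its binary is installed,
# otherwise say that it was skipped.
run() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipped: $1 not installed"
    fi
}

# Versions of the messaging layer and of pacemaker itself.
run corosync -v
run pacemakerd --version

# Quorum state as the local corosync sees it (corosync 2.x assumed).
run corosync-quorumtool -s

# Query the cluster-wide no-quorum-policy; this reports an error
# if the property was never set explicitly.
run crm_attribute --type crm_config --name no-quorum-policy --query
```

If no-quorum-policy turns out to be unset, pacemaker falls back to its default of "stop"; it could also be set explicitly with "crm configure property no-quorum-policy=stop".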

> , it does not stop its services:
> 
> root@node2:~# crm status
> Online: [ node2 ]
> OFFLINE: [ node1 node3 ]
> Master/Slave Set: ms_drbd [drbd]
>      Masters: [ node2 ]
>      Stopped: [ node1 node3 ]
> Resource Group: server
>      fs	(ocf::heartbeat:Filesystem):	Started node2
>      postgresql	(lsb:postgresql):	Started node2
>      bind9	(lsb:bind9):	Started node2
>      nginx	(lsb:nginx):	Started node2
> 
> So is there a way to tell pacemaker to shut down a node's services when that node becomes isolated?
> 
> 
> 
> On Mon, Jan 12, 2015 at 8:25 PM, David Vossel <dvossel at redhat.com> wrote:
> 
> 
> ----- Original Message -----
> > Hello.
> >
> > I have a 3-node cluster managed by corosync+pacemaker+crm. Node1 and Node2 are
> > DRBD master-slave, also they have a number of other services installed
> > (postgresql, nginx, ...). Node3 is just a corosync node (for quorum), no
> > DRBD/postgresql/... are installed at it, only corosync+pacemaker.
> >
> > But when I add resources to the cluster, some of them are somehow moved to
> > node3 and then fail. Note that I have a "colocation" directive to
> > place these resources on the DRBD master only and a "location" with -inf for
> > node3, but this does not help - why? How do I make pacemaker not run anything
> > on node3?
> >
> > All the resources are added in a single transaction: "cat config.txt | crm -w
> > -f- configure" where config.txt contains directives and "commit" statement
> > at the end.
> >
> > Below are "crm status" (error messages) and "crm configure show" outputs.
> >
> >
> > root@node3:~# crm status
> > Current DC: node2 (1017525950) - partition with quorum
> > 3 Nodes configured
> > 6 Resources configured
> > Online: [ node1 node2 node3 ]
> > Master/Slave Set: ms_drbd [drbd]
> > Masters: [ node1 ]
> > Slaves: [ node2 ]
> > Resource Group: server
> > fs (ocf::heartbeat:Filesystem): Started node1
> > postgresql (lsb:postgresql): Started node3 FAILED
> > bind9 (lsb:bind9): Started node3 FAILED
> > nginx (lsb:nginx): Started node3 (unmanaged) FAILED
> > Failed actions:
> > drbd_monitor_0 (node=node3, call=744, rc=5, status=complete, last-rc-change=Mon Jan 12 11:16:43 2015, queued=2ms, exec=0ms): not installed
> > postgresql_monitor_0 (node=node3, call=753, rc=1, status=complete, last-rc-change=Mon Jan 12 11:16:43 2015, queued=8ms, exec=0ms): unknown error
> > bind9_monitor_0 (node=node3, call=757, rc=1, status=complete, last-rc-change=Mon Jan 12 11:16:43 2015, queued=11ms, exec=0ms): unknown error
> > nginx_stop_0 (node=node3, call=767, rc=5, status=complete, last-rc-change=Mon Jan 12 11:16:44 2015, queued=1ms, exec=0ms): not installed
> 
> Here's what is going on. Even when you say "never run this resource on node3"
> pacemaker is going to probe for the resource regardless on node3 just to verify
> the resource isn't running.
> 
> The failures you are seeing ("monitor_0 failed") indicate that pacemaker was
> unable to verify whether the resources are running on node3, because the related
> packages for the resources are not installed. Given pacemaker's default
> behavior I'd expect this.
> 
> You have two options.
> 
> 1. install the resource related packages on node3 even though you never want
> them to run there. This will allow the resource-agents to verify the resource
> is in fact inactive.
> 
> 2. If you are using the current master branch of pacemaker, there's a new
> location constraint option called 'resource-discovery=always|never|exclusive'.
> If you add the 'resource-discovery=never' option to your location constraint
> that attempts to keep resources from node3, you'll avoid having pacemaker
> perform the 'monitor_0' actions on node3 as well.
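
For option 2, a constraint along these lines might work, assuming a pacemaker build that already has resource-discovery and a crm shell new enough to accept the attribute. The id loc_server_nodisc is made up for this example; it would take the place of the existing loc_server constraint:

```shell
# Assumption: pacemaker and the crm shell both understand resource-discovery.
# Drop the old rule-based constraint first so the two do not overlap.
crm configure delete loc_server

# Ban the "server" group from node3 and skip the monitor_0 probes there.
crm configure location loc_server_nodisc server resource-discovery=never -inf: node3

# Equivalent raw CIB XML, in case the crm shell in use rejects the attribute:
# cibadmin -C -o constraints -X '<rsc_location id="loc_server_nodisc" rsc="server" node="node3" score="-INFINITY" resource-discovery="never"/>'
```

The DRBD master/slave set would presumably need a similar constraint of its own, since this one only covers the members of the server group.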
> 
> -- Vossel
> 
> >
> > root@node3:~# crm configure show | cat
> > node $id="1017525950" node2
> > node $id="13071578" node3
> > node $id="1760315215" node1
> > primitive drbd ocf:linbit:drbd \
> > params drbd_resource="vlv" \
> > op start interval="0" timeout="240" \
> > op stop interval="0" timeout="120"
> > primitive fs ocf:heartbeat:Filesystem \
> > params device="/dev/drbd0" directory="/var/lib/vlv.drbd/root" options="noatime,nodiratime" fstype="xfs" \
> > op start interval="0" timeout="300" \
> > op stop interval="0" timeout="300"
> > primitive postgresql lsb:postgresql \
> > op monitor interval="10" timeout="60" \
> > op start interval="0" timeout="60" \
> > op stop interval="0" timeout="60"
> > primitive bind9 lsb:bind9 \
> > op monitor interval="10" timeout="60" \
> > op start interval="0" timeout="60" \
> > op stop interval="0" timeout="60"
> > primitive nginx lsb:nginx \
> > op monitor interval="10" timeout="60" \
> > op start interval="0" timeout="60" \
> > op stop interval="0" timeout="60"
> > group server fs postgresql bind9 nginx
> > ms ms_drbd drbd meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> > location loc_server server rule $id="loc_server-rule" -inf: #uname eq node3
> > colocation col_server inf: server ms_drbd:Master
> > order ord_server inf: ms_drbd:promote server:start
> > property $id="cib-bootstrap-options" \
> > stonith-enabled="false" \
> > last-lrm-refresh="1421079189" \
> > maintenance-mode="false"
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> >
> 