[ClusterLabs] Colocation constraint for grouping all master-mode stateful resources with important stateless resources

Sam Gardner SGardner at trustwave.com
Fri Mar 23 17:42:41 UTC 2018


Thanks, Ken.

I just want all master-mode resources to be running wherever DRBDFS is running (essentially). If the cluster detects that any of the master-mode resources can't run on the current node (but can run on the other per ethmon), all other master-mode resources as well as DRBDFS should move over to the other node.

The current set of constraints I have will let DRBDFS move to the standby node and "take" the Master-mode resources with it, but a Master-mode resource failing over to the other node won't take the other Master-mode resources or DRBDFS with it.

As a side note, there are other resources I have in play (some active/passive like DRBDFS, some Master/Slave like the ship resources) that are related, but not shown here - I'm just having a hard time reasoning about the generalized form that my constraints should take to make this sort of thing work.
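
To put it concretely, the relationship I'm after would look roughly like the following if written as plain pairwise pcs constraints (just a sketch using my existing resource names - I haven't verified that this actually gives the symmetric failover I'm describing):

    pcs constraint colocation add drbdfs with master drbd.master INFINITY
    pcs constraint colocation add master inside-interface-sameip.master with drbdfs INFINITY
    pcs constraint colocation add master outside-interface-sameip.master with drbdfs INFINITY
    pcs constraint order promote drbd.master then start drbdfs

In words: DRBDFS follows the DRBD master, and each of the interface masters follows DRBDFS.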
-- 
Sam Gardner
Trustwave | SMART SECURITY ON DEMAND

On 3/23/18, 12:34 PM, "Users on behalf of Ken Gaillot" <users-bounces at clusterlabs.org on behalf of kgaillot at redhat.com> wrote:

>On Tue, 2018-03-20 at 16:34 +0000, Sam Gardner wrote:
>> Hi All -
>> 
>> I've implemented a simple two-node cluster with DRBD and a couple of
>> network-based Master/Slave resources.
>> 
>> Using the ethmonitor RA, I set up failover whenever the
>> Master/Primary node loses link on the specified ethernet physical
>> device by constraining the Master role only on nodes where the ethmon
>> variable is "1".
>> 
>> Something is going wrong with my colocation constraint, however - if
>> I set up the DRBDFS resource to monitor link on eth1, unplugging eth1
>> on the Primary node causes a failover as expected - all Master
>> resources are demoted to "slave" and promoted on the opposite node,
>> and the "normal" DRBDFS moves to the other node as expected.
>> 
>> However, if I put the same ethmonitor constraint on the network-based 
>> Master/Slave resource, only that specific resource fails over -
>> DRBDFS stays in the same location (though it stops) as do the other
>> Master/Slave resources.
>> 
>> This *smells* like a constraints issue to me - does anyone know what
>> I might be doing wrong?
>>
>> PCS before:
>> Cluster name: node1.hostname.com_node2.hostname.com
>> Stack: corosync
>> Current DC: node2.hostname.com_0 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
>> Last updated: Tue Mar 20 16:25:47 2018
>> Last change: Tue Mar 20 16:00:33 2018 by hacluster via crmd on node2.hostname.com_0
>> 
>> 2 nodes configured
>> 11 resources configured
>> 
>> Online: [ node1.hostname.com_0 node2.hostname.com_0 ]
>> 
>> Full list of resources:
>> 
>>  Master/Slave Set: drbd.master [drbd.slave]
>>      Masters: [ node1.hostname.com_0 ]
>>      Slaves: [ node2.hostname.com_0 ]
>>  drbdfs (ocf::heartbeat:Filesystem):    Started node1.hostname.com_0
>>  Master/Slave Set: inside-interface-sameip.master [inside-interface-sameip.slave]
>>      Masters: [ node1.hostname.com_0 ]
>>      Slaves: [ node2.hostname.com_0 ]
>>  Master/Slave Set: outside-interface-sameip.master [outside-interface-sameip.slave]
>>      Masters: [ node1.hostname.com_0 ]
>>      Slaves: [ node2.hostname.com_0 ]
>>  Clone Set: monitor-eth1-clone [monitor-eth1]
>>      Started: [ node1.hostname.com_0 node2.hostname.com_0 ]
>>  Clone Set: monitor-eth2-clone [monitor-eth2]
>>      Started: [ node1.hostname.com_0 node2.hostname.com_0 ]
>
>What agent are the two IP resources using? I'm not familiar with any IP
>resource agents that are master/slave clones.
>
>> Daemon Status:
>>   corosync: active/enabled
>>   pacemaker: active/enabled
>>   pcsd: inactive/disabled
>> 
>> PCS after:
>> Cluster name: node1.hostname.com_node2.hostname.com
>> Stack: corosync
>> Current DC: node2.hostname.com_0 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
>> Last updated: Tue Mar 20 16:29:40 2018
>> Last change: Tue Mar 20 16:00:33 2018 by hacluster via crmd on node2.hostname.com_0
>> 
>> 2 nodes configured
>> 11 resources configured
>> 
>> Online: [ node1.hostname.com_0 node2.hostname.com_0 ]
>> 
>> Full list of resources:
>> 
>>  Master/Slave Set: drbd.master [drbd.slave]
>>      Masters: [ node1.hostname.com_0 ]
>>      Slaves: [ node2.hostname.com_0 ]
>>  drbdfs (ocf::heartbeat:Filesystem):    Stopped
>>  Master/Slave Set: inside-interface-sameip.master [inside-interface-sameip.slave]
>>      Masters: [ node2.hostname.com_0 ]
>>      Stopped: [ node1.hostname.com_0 ]
>>  Master/Slave Set: outside-interface-sameip.master [outside-interface-sameip.slave]
>>      Masters: [ node1.hostname.com_0 ]
>>      Slaves: [ node2.hostname.com_0 ]
>>  Clone Set: monitor-eth1-clone [monitor-eth1]
>>      Started: [ node1.hostname.com_0 node2.hostname.com_0 ]
>>  Clone Set: monitor-eth2-clone [monitor-eth2]
>>      Started: [ node1.hostname.com_0 node2.hostname.com_0 ]
>> 
>> Daemon Status:
>>   corosync: active/enabled
>>   pacemaker: active/enabled
>>   pcsd: inactive/disabled
>> 
>> This is the "constraints" section of my CIB (full CIB is attached):
>>       <rsc_colocation id="pcs_rsc_colocation_set_drbdfs_set_drbd.master_inside-interface-sameip.master_outside-interface-sameip.master" score="INFINITY">
>>         <resource_set id="pcs_rsc_set_drbdfs" sequential="false">
>>           <resource_ref id="drbdfs"/>
>>         </resource_set>
>>         <resource_set id="pcs_rsc_set_drbd.master_inside-interface-sameip.master_outside-interface-sameip.master" role="Master" sequential="false">
>>           <resource_ref id="drbd.master"/>
>>           <resource_ref id="inside-interface-sameip.master"/>
>>           <resource_ref id="outside-interface-sameip.master"/>
>>         </resource_set>
>>       </rsc_colocation>
>
>Resource sets can be confusing in the best of cases.
>
>The above constraint says: Place drbdfs only on a node where the master
>instances of drbd.master and the two IPs are running (without any
>dependencies between those resources).
>
>This explains why the master instances can run on different nodes, and
>why drbdfs was stopped when they did.
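>
>For comparison, if the intent is that the masters and drbdfs always end up
>on the same node, each of them needs a dependency on a common resource.
>One possible shape (a sketch only, with made-up constraint ids) is a chain
>of simple colocations rather than a set:
>
>  <rsc_colocation id="col-drbdfs-with-drbd" score="INFINITY" rsc="drbdfs" with-rsc="drbd.master" with-rsc-role="Master"/>
>  <rsc_colocation id="col-inside-with-drbdfs" score="INFINITY" rsc="inside-interface-sameip.master" rsc-role="Master" with-rsc="drbdfs"/>
>  <rsc_colocation id="col-outside-with-drbdfs" score="INFINITY" rsc="outside-interface-sameip.master" rsc-role="Master" with-rsc="drbdfs"/>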
>
>>       <rsc_order id="pcs_rsc_order_set_drbd.master_inside-interface-sameip.master_outside-interface-sameip.master_set_drbdfs" kind="Serialize" symmetrical="false">
>>         <resource_set action="promote" id="pcs_rsc_set_drbd.master_inside-interface-sameip.master_outside-interface-sameip.master-1" role="Master">
>>           <resource_ref id="drbd.master"/>
>>           <resource_ref id="inside-interface-sameip.master"/>
>>           <resource_ref id="outside-interface-sameip.master"/>
>>         </resource_set>
>>         <resource_set id="pcs_rsc_set_drbdfs-1">
>>           <resource_ref id="drbdfs"/>
>>         </resource_set>
>>       </rsc_order>
>
>The above constraint says: if promoting any of drbd.master and the two
>interfaces and/or starting drbdfs, do each action one at a time (in any
>order). Other actions (including demoting and stopping) can happen in
>any order.
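>
>If a hard ordering was intended instead (promote DRBD, then mount the
>filesystem), it would look more like this sketch:
>
>  <rsc_order id="order-drbd-then-drbdfs" kind="Mandatory" first="drbd.master" first-action="promote" then="drbdfs" then-action="start"/>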
>
>>       <rsc_location id="location-inside-interface-sameip.master" rsc="inside-interface-sameip.master">
>>         <rule id="location-inside-interface-sameip.master-rule" score="-INFINITY">
>>           <expression attribute="ethmon_result-eth1" id="location-inside-interface-sameip.master-rule-expr" operation="ne" value="1"/>
>>         </rule>
>>       </rsc_location>
>>       <rsc_location id="location-outside-interface-sameip.master" rsc="outside-interface-sameip.master">
>>         <rule id="location-outside-interface-sameip.master-rule" score="-INFINITY">
>>           <expression attribute="ethmon_result-eth2" id="location-outside-interface-sameip.master-rule-expr" operation="ne" value="1"/>
>>         </rule>
>>       </rsc_location>
>
>The above constraints keep inside-interface on a node where eth1 is
>good, and outside-interface on a node where eth2 is good.
>
>I'm guessing you want to keep these two constraints, and start over
>from scratch on the others. What are your intended relationships
>between the various resources?
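>
>As an illustration only (not necessarily what you want), the same kind of
>rule could also be attached to the DRBD master role so that it reacts to
>eth1 as well - a sketch with a made-up id:
>
>  <rsc_location id="location-drbd.master-eth1" rsc="drbd.master">
>    <rule id="location-drbd.master-eth1-rule" role="Master" score="-INFINITY">
>      <expression attribute="ethmon_result-eth1" id="location-drbd.master-eth1-rule-expr" operation="ne" value="1"/>
>    </rule>
>  </rsc_location>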
>
>>     </constraints>
>> -- 
>> Sam Gardner  
>> Trustwave | SMART SECURITY ON DEMAND
>-- 
>Ken Gaillot <kgaillot at redhat.com>
>_______________________________________________
>Users mailing list: Users at clusterlabs.org
>https://lists.clusterlabs.org/mailman/listinfo/users
>
>Project Home: http://www.clusterlabs.org
>Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>Bugs: http://bugs.clusterlabs.org

