[ClusterLabs] Non-cloned resource moves before cloned resource startup on unstandby

Dan Ragle daniel at Biblestuph.com
Tue Sep 11 09:33:19 EDT 2018



On 9/11/2018 9:20 AM, Dan Ragle wrote:
> 
> 
> On 9/11/2018 1:59 AM, Andrei Borzenkov wrote:
>> 07.09.2018 23:07, Dan Ragle wrote:
>>> On an active-active two-node cluster with DRBD, dlm, filesystem mounts,
>>> a Web Server, and some crons, I can't figure out how to have the crons
>>> jump from node to node in the correct order. Specifically, I have two
>>> crontabs (managed via symlink creation/deletion) which normally run
>>> one on node1 and the other on node2. When a node goes down, I want both
>>> to run on the remaining node until the original node comes back up, at
>>> which time they should split between the nodes again. However, when
>>> returning to the original node, the crontab being moved must wait until
>>> the underlying FS mount is done on that node before jumping.
>>>
>>> DRBD, dlm, the filesystem mounts, and the Web Server are all working as
>>> expected: when I mark the second node as standby, Apache stops, the FS
>>> unmounts, dlm stops, and DRBD stops on that node; and when I mark the
>>> same node unstandby, the reverse happens as expected. All three of those
>>> are cloned resources.
>>>
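>>> For reference, here's a quick sketch of how I'm toggling the node (just the
>>> stock pcs 0.9.x standby commands):
>>>
>>> # take node2 out of service; its clone instances stop and both crontabs
>>> # end up on node1
>>> pcs cluster standby node2.mydomain.com
>>> # bring node2 back into service; this is the point where the returning
>>> # crontab tries to jump before the FS mount is up
>>> pcs cluster unstandby node2.mydomain.com
>>>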
>>> The crontab resources are not cloned and create symlinks, one resource
>>> preferring the first node and the other preferring the second. Each is
>>> colocated with and ordered after the filesystem mounts (which in turn
>>> are colocated with and ordered after dlm, which in turn is colocated
>>> with and ordered after DRBD promotion). I thought this would be
>>> sufficient, but when the original node is marked unstandby, the crontab
>>> that prefers that node attempts to jump over immediately, before the FS
>>> is mounted there. Of course the crontab link fails because the
>>> underlying filesystem hasn't been mounted yet.
>>>
>>> pcs version is 0.9.162.
>>>
>>> Here's the obfuscated, detailed list of commands for the config. I'm
>>> still setting it up, so it's not production-ready yet, but I want to
>>> get this much sorted before I add too much more.
>>>
>>> # pcs config export pcs-commands
>>> #!/usr/bin/sh
>>> # sequence generated on 2018-09-07 15:21:15 with: clufter 0.77.0
>>> # invoked as: ['/usr/sbin/pcs', 'config', 'export', 'pcs-commands']
>>> # targeting system: ('linux', 'centos', '7.5.1804', 'Core')
>>> # using interpreter: CPython 2.7.5
>>> pcs cluster auth node1.mydomain.com node2.mydomain.com <> /dev/tty
>>> pcs cluster setup --name MyCluster \
>>>    node1.mydomain.com node2.mydomain.com --transport udpu
>>> pcs cluster start --all --wait=60
>>> pcs cluster cib tmp-cib.xml
>>> cp tmp-cib.xml tmp-cib.xml.deltasrc
>>> pcs -f tmp-cib.xml property set stonith-enabled=false
>>> pcs -f tmp-cib.xml property set no-quorum-policy=freeze
>>> pcs -f tmp-cib.xml resource defaults resource-stickiness=100
>>> pcs -f tmp-cib.xml resource create DRBD ocf:linbit:drbd drbd_resource=r0 \
>>>    op demote interval=0s timeout=90 monitor interval=60s notify interval=0s \
>>>    timeout=90 promote interval=0s timeout=90 reload interval=0s timeout=30 \
>>>    start interval=0s timeout=240 stop interval=0s timeout=100
>>> pcs -f tmp-cib.xml resource create dlm ocf:pacemaker:controld \
>>>    allow_stonith_disabled=1 \
>>>    op monitor interval=60s start interval=0s timeout=90 stop interval=0s \
>>>    timeout=100
>>> pcs -f tmp-cib.xml resource create WWWMount ocf:heartbeat:Filesystem \
>>>    device=/dev/drbd1 directory=/var/www fstype=gfs2 \
>>>    options=_netdev,nodiratime,noatime \
>>>    op monitor interval=20 timeout=40 notify interval=0s timeout=60 start \
>>>    interval=0s timeout=120s stop interval=0s timeout=120s
>>> pcs -f tmp-cib.xml resource create WebServer ocf:heartbeat:apache \
>>>    configfile=/etc/httpd/conf/httpd.conf statusurl=http://localhost/server-status \
>>>    op monitor interval=1min start interval=0s timeout=40s stop interval=0s \
>>>    timeout=60s
>>> pcs -f tmp-cib.xml resource create SharedRootCrons ocf:heartbeat:symlink \
>>>    link=/etc/cron.d/root-shared target=/var/www/crons/root-shared \
>>>    op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
>>>    interval=0s timeout=15
>>> pcs -f tmp-cib.xml resource create SharedUserCrons ocf:heartbeat:symlink \
>>>    link=/etc/cron.d/User-shared target=/var/www/crons/User-shared \
>>>    op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
>>>    interval=0s timeout=15
>>> pcs -f tmp-cib.xml resource create PrimaryUserCrons ocf:heartbeat:symlink \
>>>    link=/etc/cron.d/User-server1 target=/var/www/crons/User-server1 \
>>>    op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
>>>    interval=0s timeout=15 meta resource-stickiness=0
>>> pcs -f tmp-cib.xml \
>>>    resource create SecondaryUserCrons ocf:heartbeat:symlink \
>>>    link=/etc/cron.d/User-server2 target=/var/www/crons/User-server2 \
>>>    op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
>>>    interval=0s timeout=15 meta resource-stickiness=0
>>> pcs -f tmp-cib.xml \
>>>    resource clone dlm clone-max=2 clone-node-max=1 interleave=true
>>> pcs -f tmp-cib.xml resource clone WWWMount interleave=true
>>> pcs -f tmp-cib.xml resource clone WebServer interleave=true
>>> pcs -f tmp-cib.xml resource clone SharedRootCrons interleave=true
>>> pcs -f tmp-cib.xml resource clone SharedUserCrons interleave=true
>>> pcs -f tmp-cib.xml \
>>>    resource master DRBDClone DRBD master-node-max=1 clone-max=2 master-max=2 \
>>>    interleave=true notify=true clone-node-max=1
>>> pcs -f tmp-cib.xml \
>>>    constraint colocation add dlm-clone with DRBDClone \
>>>    id=colocation-dlm-clone-DRBDClone-INFINITY
>>> pcs -f tmp-cib.xml constraint order promote DRBDClone \
>>>    then dlm-clone id=order-DRBDClone-dlm-clone-mandatory
>>> pcs -f tmp-cib.xml \
>>>    constraint colocation add WWWMount-clone with dlm-clone \
>>>    id=colocation-WWWMount-clone-dlm-clone-INFINITY
>>> pcs -f tmp-cib.xml constraint order dlm-clone \
>>>    then WWWMount-clone id=order-dlm-clone-WWWMount-clone-mandatory
>>> pcs -f tmp-cib.xml \
>>>    constraint colocation add WebServer-clone with WWWMount-clone \
>>>    id=colocation-WebServer-clone-WWWMount-clone-INFINITY
>>> pcs -f tmp-cib.xml constraint order WWWMount-clone \
>>>    then WebServer-clone id=order-WWWMount-clone-WebServer-clone-mandatory
>>> pcs -f tmp-cib.xml \
>>>    constraint colocation add SharedRootCrons-clone with WWWMount-clone \
>>>    id=colocation-SharedRootCrons-clone-WWWMount-clone-INFINITY
>>> pcs -f tmp-cib.xml \
>>>    constraint colocation add SharedUserCrons-clone with WWWMount-clone \
>>>    id=colocation-SharedUserCrons-clone-WWWMount-clone-INFINITY
>>> pcs -f tmp-cib.xml constraint order WWWMount-clone \
>>>    then SharedRootCrons-clone \
>>>    id=order-WWWMount-clone-SharedRootCrons-clone-mandatory
>>> pcs -f tmp-cib.xml constraint order WWWMount-clone \
>>>    then SharedUserCrons-clone \
>>>    id=order-WWWMount-clone-SharedUserCrons-clone-mandatory
>>> pcs -f tmp-cib.xml \
>>>    constraint location PrimaryUserCrons prefers node1.mydomain.com=500
>>> pcs -f tmp-cib.xml \
>>>    constraint colocation add PrimaryUserCrons with WWWMount-clone \
>>>    id=colocation-PrimaryUserCrons-WWWMount-clone-INFINITY
>>> pcs -f tmp-cib.xml constraint order WWWMount-clone \
>>>    then PrimaryUserCrons \
>>>    id=order-WWWMount-clone-PrimaryUserCrons-mandatory
>>> pcs -f tmp-cib.xml \
>>>    constraint location SecondaryUserCrons prefers node2.mydomain.com=500
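>>>
>>> To double-check what's actually in the live CIB afterwards, I pull the
>>> constraints back out (a sketch; just the stock pcs commands):
>>>
>>> # list every location/order/colocation constraint with its id and score
>>> pcs constraint show --full
>>> # and confirm where everything is currently placed
>>> pcs status --full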
>>
>> I can't answer your question, but just an observation - it appears that
>> only the resources with explicit location preferences misbehave. Is it
>> possible, as a workaround, to not use them?
> 
> I suppose it's not *critical* that PrimaryUserCrons be on node1 and SecondaryUserCrons on node2, so long as they remain split 
> during normal operation. I could try something like a negative colocation (?) to keep them separate, if nothing else to see 
> whether that allows them to bounce back and forth cleanly with regard to their other constraints. I'll give that a shot this morning.
> 

I removed the two location constraints and instead did:

pcs constraint colocation add PrimaryUserCrons with SecondaryUserCrons -500

Same result.
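
If it would help, I can grab a snapshot for anyone who wants to look at the
computed transition (a sketch; the snapshot file name is just a placeholder):

# show placement scores and any pending actions from the live CIB
crm_simulate -sL
# or save a copy of the CIB so the transition can be replayed offline
pcs cluster cib > cib-snapshot.xml
crm_simulate -x cib-snapshot.xml -s -S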
