[ClusterLabs] Non-cloned resource moves before cloned resource startup on unstandby

Tue Sep 11 09:20:43 EDT 2018

On 9/11/2018 1:59 AM, Andrei Borzenkov wrote:
> 07.09.2018 23:07, Dan Ragle wrote:
>> On an active-active two-node cluster with DRBD, dlm, filesystem mounts,
>> a Web Server, and some crons I can't figure out how to have the crons
>> jump from node to node in the correct order. Specifically, I have two
>> crontabs (managed via symlink creation/deletion) which normally will run
>> one on node1 and the other on node2. When a node goes down, I want both
>> to run on the remaining node until the original node comes back up, at
>> which time they should split the nodes again. However, when returning to
>> the original node the crontab that is being moved must wait until the
>> underlying FS mount is done on the original node before jumping.
>>
>> DRBD, dlm, the filesystem mounts and the Web Server are all working as
>> expected; when I mark the second node as standby Apache stops, the FS
>> unmounts, dlm stops, and DRBD stops on the node; and when I mark that
>> same node unstandby the reverse happens as expected. All three of those
>> are cloned resources.
>>
>> The crontab resources are not cloned and create symlinks, one resource
>> preferring the first node and the other preferring the second. Each is
>> colocated and order dependent on the filesystem mounts (which in turn
>> are colocated and dependent on dlm, which in turn is colocated and
>> dependent on DRBD promotion). I thought this would be sufficient, but
>> when the original node is marked unstandby the crontab that prefers to
>> be on that node attempts to jump over immediately before the FS is
>> mounted on that node. Of course the crontab link fails because the
>> underlying filesystem hasn't been mounted yet.
>>
>> pcs version is 0.9.162.
>>
>> Here's the obfuscated detailed list of commands for the config. I'm
>> still setting it up, so it's not production-ready yet, but I want to
>> get this much sorted before I add too much more.
>>
>> # pcs config export pcs-commands
>> #!/usr/bin/sh
>> # sequence generated on 2018-09-07 15:21:15 with: clufter 0.77.0
>> # invoked as: ['/usr/sbin/pcs', 'config', 'export', 'pcs-commands']
>> # targeting system: ('linux', 'centos', '7.5.1804', 'Core')
>> # using interpreter: CPython 2.7.5
>> pcs cluster auth node1.mydomain.com node2.mydomain.com <> /dev/tty
>> pcs cluster setup --name MyCluster \
>>    node1.mydomain.com node2.mydomain.com --transport udpu
>> pcs cluster start --all --wait=60
>> pcs cluster cib tmp-cib.xml
>> cp tmp-cib.xml tmp-cib.xml.deltasrc
>> pcs -f tmp-cib.xml property set stonith-enabled=false
>> pcs -f tmp-cib.xml property set no-quorum-policy=freeze
>> pcs -f tmp-cib.xml resource defaults resource-stickiness=100
>> pcs -f tmp-cib.xml resource create DRBD ocf:linbit:drbd drbd_resource=r0 \
>>    op demote interval=0s timeout=90 monitor interval=60s \
>>    notify interval=0s \
>>    timeout=90 promote interval=0s timeout=90 reload interval=0s timeout=30 \
>>    start interval=0s timeout=240 stop interval=0s timeout=100
>> pcs -f tmp-cib.xml resource create dlm ocf:pacemaker:controld \
>>    allow_stonith_disabled=1 \
>>    op monitor interval=60s start interval=0s timeout=90 stop interval=0s \
>>    timeout=100
>> pcs -f tmp-cib.xml resource create WWWMount ocf:heartbeat:Filesystem \
>>    device=/dev/drbd1 directory=/var/www fstype=gfs2 \
>>    options=_netdev,nodiratime,noatime \
>>    op monitor interval=20 timeout=40 notify interval=0s timeout=60 start \
>>    interval=0s timeout=120s stop interval=0s timeout=120s
>> pcs -f tmp-cib.xml resource create WebServer ocf:heartbeat:apache \
>>    configfile=/etc/httpd/conf/httpd.conf \
>>    statusurl=http://localhost/server-status \
>>    op monitor interval=1min start interval=0s timeout=40s stop interval=0s \
>>    timeout=60s
>> pcs -f tmp-cib.xml resource create SharedRootCrons ocf:heartbeat:symlink \
>>    link=/etc/cron.d/root-shared target=/var/www/crons/root-shared \
>>    op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
>>    interval=0s timeout=15
>> pcs -f tmp-cib.xml resource create SharedUserCrons ocf:heartbeat:symlink \
>>    link=/etc/cron.d/User-shared target=/var/www/crons/User-shared \
>>    op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
>>    interval=0s timeout=15
>> pcs -f tmp-cib.xml resource create PrimaryUserCrons ocf:heartbeat:symlink \
>>    link=/etc/cron.d/User-server1 target=/var/www/crons/User-server1 \
>>    op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
>>    interval=0s timeout=15 meta resource-stickiness=0
>> pcs -f tmp-cib.xml \
>>    resource create SecondaryUserCrons ocf:heartbeat:symlink \
>>    link=/etc/cron.d/User-server2 target=/var/www/crons/User-server2 \
>>    op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
>>    interval=0s timeout=15 meta resource-stickiness=0
>> pcs -f tmp-cib.xml \
>>    resource clone dlm clone-max=2 clone-node-max=1 interleave=true
>> pcs -f tmp-cib.xml resource clone WWWMount interleave=true
>> pcs -f tmp-cib.xml resource clone WebServer interleave=true
>> pcs -f tmp-cib.xml resource clone SharedRootCrons interleave=true
>> pcs -f tmp-cib.xml resource clone SharedUserCrons interleave=true
>> pcs -f tmp-cib.xml \
>>    resource master DRBDClone DRBD master-node-max=1 clone-max=2 \
>>    master-max=2 \
>>    interleave=true notify=true clone-node-max=1
>> pcs -f tmp-cib.xml \
>>    constraint colocation add dlm-clone with DRBDClone \
>>    id=colocation-dlm-clone-DRBDClone-INFINITY
>> pcs -f tmp-cib.xml constraint order promote DRBDClone \
>>    then dlm-clone id=order-DRBDClone-dlm-clone-mandatory
>> pcs -f tmp-cib.xml \
>>    constraint colocation add WWWMount-clone with dlm-clone \
>>    id=colocation-WWWMount-clone-dlm-clone-INFINITY
>> pcs -f tmp-cib.xml constraint order dlm-clone \
>>    then WWWMount-clone id=order-dlm-clone-WWWMount-clone-mandatory
>> pcs -f tmp-cib.xml \
>>    constraint colocation add WebServer-clone with WWWMount-clone \
>>    id=colocation-WebServer-clone-WWWMount-clone-INFINITY
>> pcs -f tmp-cib.xml constraint order WWWMount-clone \
>>    then WebServer-clone id=order-WWWMount-clone-WebServer-clone-mandatory
>> pcs -f tmp-cib.xml \
>>    constraint colocation add SharedRootCrons-clone with WWWMount-clone \
>>    id=colocation-SharedRootCrons-clone-WWWMount-clone-INFINITY
>> pcs -f tmp-cib.xml \
>>    constraint colocation add SharedUserCrons-clone with WWWMount-clone \
>>    id=colocation-SharedUserCrons-clone-WWWMount-clone-INFINITY
>> pcs -f tmp-cib.xml constraint order WWWMount-clone \
>>    then SharedRootCrons-clone \
>>    id=order-WWWMount-clone-SharedRootCrons-clone-mandatory
>> pcs -f tmp-cib.xml constraint order WWWMount-clone \
>>    then SharedUserCrons-clone \
>>    id=order-WWWMount-clone-SharedUserCrons-clone-mandatory
>> pcs -f tmp-cib.xml \
>>    constraint location PrimaryUserCrons prefers node1.mydomain.com=500
>> pcs -f tmp-cib.xml \
>>    constraint colocation add PrimaryUserCrons with WWWMount-clone \
>>    id=colocation-PrimaryUserCrons-WWWMount-clone-INFINITY
>> pcs -f tmp-cib.xml constraint order WWWMount-clone \
>>    then PrimaryUserCrons \
>>    id=order-WWWMount-clone-PrimaryUserCrons-mandatory
>> pcs -f tmp-cib.xml \
>>    constraint location SecondaryUserCrons prefers node2.mydomain.com=500
> 
> I can't answer your question, but just an observation: it appears that
> only resources with explicit location preferences misbehave. Is it
> possible, as a workaround, not to use them?

I suppose it's not *critical* that PrimaryUserCrons be on node1 and SecondaryUserCrons on node2, so long as they remain split 
during normal operation. I could try something like a negative colocation (?) to keep them separate, if only to see whether that 
allows them to bounce back and forth cleanly with regard to their other constraints. I'll give that a shot this morning.
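
A minimal sketch of that negative colocation test, written as a dry run. The score of -500 is an assumed value, not something from the config above, and the ids of the two "prefers" location constraints are not shown in the export, so they have to be looked up first and passed in by hand. Whether this actually stops the early move is what the test is meant to show.

```shell
#!/usr/bin/sh
# Sketch only. With DRY_RUN=1 (the default here) the pcs commands are
# printed instead of executed, so nothing touches the cluster.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

split_crons() {
    # Show every constraint with its id; the ids of the two
    # "prefers" location constraints are needed to remove them.
    run pcs constraint --full

    # Remove the location preferences. The ids are passed in by the
    # caller because they are not shown in the exported config.
    for id in "$@"; do
        run pcs constraint remove "$id"
    done

    # Finite negative score (-500 is an assumed value): keep the two
    # crontabs apart when both nodes are up, but still allow both on
    # the surviving node while the other node is in standby.
    run pcs constraint colocation add \
        PrimaryUserCrons with SecondaryUserCrons -500
}

split_crons "$@"
```

Run it once with no arguments to see the commands, then again with the two location constraint ids as arguments and DRY_RUN=0 to apply them. A score of -INFINITY would be the wrong choice here, since it would forbid both crontabs from ever sharing the surviving node.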

> 
>> pcs -f tmp-cib.xml \
>>    constraint colocation add SecondaryUserCrons with WWWMount-clone \
>>    id=colocation-SecondaryUserCrons-WWWMount-clone-INFINITY
>> pcs -f tmp-cib.xml constraint order WWWMount-clone \
>>    then SecondaryUserCrons \
>>    id=order-WWWMount-clone-SecondaryUserCrons-mandatory
>> pcs cluster cib-push tmp-cib.xml diff-against=tmp-cib.xml.deltasrc
>>
>> When I standby node2, the SecondaryUserCrons bounces over to node1 as
>> expected. When I unstandby node2, it bounces back to node2 immediately,
>> before WWWMount is performed, and thus it fails. What am I missing? Here
>> are the log messages from the unstandby operation:
>>
>> Sep  7 15:02:28 node2 crmd[58188]:   notice: State transition S_IDLE -> S_POLICY_ENGINE
>> Sep  7 15:02:28 node2 pengine[58187]:   notice:  * Start DRBD:1                 (                        node2.mydomain.com )
>> Sep  7 15:02:28 node2 pengine[58187]:   notice:  * Start dlm:1                  (                        node2.mydomain.com ) due to unrunnable DRBD:1 promote (blocked)
>> Sep  7 15:02:28 node2 pengine[58187]:   notice:  * Start WWWMount:1             (                        node2.mydomain.com ) due to unrunnable dlm:1 start (blocked)
>> Sep  7 15:02:28 node2 pengine[58187]:   notice:  * Start WebServer:1            (                        node2.mydomain.com ) due to unrunnable WWWMount:1 start (blocked)
>> Sep  7 15:02:28 node2 pengine[58187]:   notice:  * Start SharedRootCrons:1      (                        node2.mydomain.com ) due to unrunnable WWWMount:1 start (blocked)
>> Sep  7 15:02:28 node2 pengine[58187]:   notice:  * Start SharedUserCrons:1      (                        node2.mydomain.com ) due to unrunnable WWWMount:1 start (blocked)
>> Sep  7 15:02:28 node2 pengine[58187]:   notice:  * Move SecondaryUserCrons     ( node1.mydomain.com -> node2.mydomain.com )
>> Sep  7 15:02:28 node2 pengine[58187]:   notice: Calculated transition 129, saving inputs in /var/lib/pacemaker/pengine/pe-input-2795.bz2
> 
> This file would be useful to have.

I reran the test this morning; the file you noted is enclosed. I removed the WebServer and the SharedRootCrons/SharedUserCrons 
clones from the test setup in an attempt to simplify, but other than that it should be the same. I'm still getting the same issue.

Remember that this file is the transition generated when I connect to node2 and execute pcs node unstandby, i.e.:

Sep 11 08:42:52 node1 pengine[103342]:   notice:  * Start      DRBD:1               (                       node2.mydomain.com )
Sep 11 08:42:52 node1 pengine[103342]:   notice:  * Start      dlm:1                (                       node2.mydomain.com ) due to unrunnable DRBD:1 promote (blocked)
Sep 11 08:42:52 node1 pengine[103342]:   notice:  * Start      WWWMount:1           (                       node2.mydomain.com ) due to unrunnable dlm:1 start (blocked)
Sep 11 08:42:52 node1 pengine[103342]:   notice:  * Move       SecondaryUserCrons   ( node1.mydomain.com -> node2.mydomain.com )
Sep 11 08:42:52 node1 pengine[103342]:   notice: Calculated transition 72, saving inputs in /var/lib/pacemaker/pengine/pe-input-1412.bz2
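
For anyone who wants to look at the same transition, a saved pengine input like this one can be replayed offline with crm_simulate. This is only a sketch: it assumes the file is still at the path shown in the log line above (or that PE_INPUT points at a copy of the attachment), and it does nothing if crm_simulate is not installed.

```shell
#!/usr/bin/sh
# Replay a saved policy-engine input without touching the live cluster.
# The default path is the one from the log line above; override it by
# setting PE_INPUT to wherever the file was saved.
PE_INPUT="${PE_INPUT:-/var/lib/pacemaker/pengine/pe-input-1412.bz2}"

replay() {
    # --simulate     run the transition against the saved CIB only
    # --show-scores  print the allocation scores per resource and node
    # --xml-file     read the cluster state from the given file
    crm_simulate --simulate --show-scores --xml-file "$1"
}

if command -v crm_simulate >/dev/null 2>&1 && [ -r "$PE_INPUT" ]; then
    replay "$PE_INPUT"
else
    echo "skipping: crm_simulate or $PE_INPUT not available" >&2
fi
```

The scores for SecondaryUserCrons on each node, together with the list of planned actions, should show why the move is scheduled in the same transition as the blocked WWWMount:1 start.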

-------------- next part --------------
A non-text attachment was scrubbed...
Name: pe-input-1412.bz2
Type: application/octet-stream
Size: 3168 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20180911/3c2fcd83/attachment-0002.obj>