[ClusterLabs] Non-cloned resource moves before cloned resource startup on unstandby
Dan Ragle
daniel at Biblestuph.com
Fri Sep 7 16:07:47 EDT 2018
On an active-active two-node cluster with DRBD, dlm, filesystem mounts, a Web Server, and some crons, I can't figure out how to have
the crons jump from node to node in the correct order. Specifically, I have two crontabs (managed via symlink creation/deletion)
which normally run one on node1 and the other on node2. When a node goes down, I want both to run on the remaining node until
the original node comes back up, at which time they should split across the nodes again. However, when returning to the original
node, the crontab that is being moved must wait until the underlying FS mount is done on the original node before jumping.
DRBD, dlm, the filesystem mounts, and the Web Server are all working as expected: when I mark the second node as standby, Apache
stops, the FS unmounts, dlm stops, and DRBD stops on that node; and when I mark that same node unstandby, the reverse happens as
expected. All of those are cloned resources.
The crontab resources are not cloned and create symlinks, one resource preferring the first node and the other preferring the
second. Each is colocated with, and order-dependent on, the filesystem mounts (which in turn are colocated with and dependent on
dlm, which in turn is colocated with and dependent on DRBD promotion). I thought this would be sufficient, but when the original
node is marked unstandby, the crontab that prefers that node attempts to jump over immediately, before the FS is mounted on that
node. Of course, the crontab link fails because the underlying filesystem hasn't been mounted yet.
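As a minimal sketch of that failure mode (this is not the symlink agent's actual code, and the temporary paths are hypothetical
stand-ins for /var/www and /etc/cron.d), a link created before its target exists is dangling until the target appears:

```shell
#!/bin/sh
# Hypothetical stand-ins: $mnt plays the role of /var/www (not yet mounted),
# $crond plays the role of /etc/cron.d.
tmp=$(mktemp -d)
mnt="$tmp/www"
crond="$tmp/cron.d"
mkdir -p "$mnt" "$crond"

# Create the link before the "filesystem" provides the target file.
ln -s "$mnt/crons/User-server2" "$crond/User-server2"

# -L is true for any symlink; -e follows the link and fails while the target is missing.
if [ -L "$crond/User-server2" ] && [ ! -e "$crond/User-server2" ]; then
    before="dangling"
else
    before="ok"
fi

# Simulate the mount completing: the target now exists.
mkdir -p "$mnt/crons"
: > "$mnt/crons/User-server2"
if [ -e "$crond/User-server2" ]; then
    after="ok"
else
    after="dangling"
fi

echo "before mount: $before, after mount: $after"
rm -rf "$tmp"
```

The same link resolves once the target shows up, which is why the ordering against the mount matters for these resources.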
pcs version is 0.9.162.
Here's the obfuscated detailed list of commands for the config. I'm still setting it up, so it's not production-ready yet, but I
want to get this much sorted before I add too much more.
# pcs config export pcs-commands
#!/usr/bin/sh
# sequence generated on 2018-09-07 15:21:15 with: clufter 0.77.0
# invoked as: ['/usr/sbin/pcs', 'config', 'export', 'pcs-commands']
# targeting system: ('linux', 'centos', '7.5.1804', 'Core')
# using interpreter: CPython 2.7.5
pcs cluster auth node1.mydomain.com node2.mydomain.com <> /dev/tty
pcs cluster setup --name MyCluster \
node1.mydomain.com node2.mydomain.com --transport udpu
pcs cluster start --all --wait=60
pcs cluster cib tmp-cib.xml
cp tmp-cib.xml tmp-cib.xml.deltasrc
pcs -f tmp-cib.xml property set stonith-enabled=false
pcs -f tmp-cib.xml property set no-quorum-policy=freeze
pcs -f tmp-cib.xml resource defaults resource-stickiness=100
pcs -f tmp-cib.xml resource create DRBD ocf:linbit:drbd drbd_resource=r0 \
op demote interval=0s timeout=90 monitor interval=60s notify interval=0s \
timeout=90 promote interval=0s timeout=90 reload interval=0s timeout=30 \
start interval=0s timeout=240 stop interval=0s timeout=100
pcs -f tmp-cib.xml resource create dlm ocf:pacemaker:controld \
allow_stonith_disabled=1 \
op monitor interval=60s start interval=0s timeout=90 stop interval=0s \
timeout=100
pcs -f tmp-cib.xml resource create WWWMount ocf:heartbeat:Filesystem \
device=/dev/drbd1 directory=/var/www fstype=gfs2 \
options=_netdev,nodiratime,noatime \
op monitor interval=20 timeout=40 notify interval=0s timeout=60 start \
interval=0s timeout=120s stop interval=0s timeout=120s
pcs -f tmp-cib.xml resource create WebServer ocf:heartbeat:apache \
configfile=/etc/httpd/conf/httpd.conf statusurl=http://localhost/server-status \
op monitor interval=1min start interval=0s timeout=40s stop interval=0s \
timeout=60s
pcs -f tmp-cib.xml resource create SharedRootCrons ocf:heartbeat:symlink \
link=/etc/cron.d/root-shared target=/var/www/crons/root-shared \
op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
interval=0s timeout=15
pcs -f tmp-cib.xml resource create SharedUserCrons ocf:heartbeat:symlink \
link=/etc/cron.d/User-shared target=/var/www/crons/User-shared \
op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
interval=0s timeout=15
pcs -f tmp-cib.xml resource create PrimaryUserCrons ocf:heartbeat:symlink \
link=/etc/cron.d/User-server1 target=/var/www/crons/User-server1 \
op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
interval=0s timeout=15 meta resource-stickiness=0
pcs -f tmp-cib.xml \
resource create SecondaryUserCrons ocf:heartbeat:symlink \
link=/etc/cron.d/User-server2 target=/var/www/crons/User-server2 \
op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
interval=0s timeout=15 meta resource-stickiness=0
pcs -f tmp-cib.xml \
resource clone dlm clone-max=2 clone-node-max=1 interleave=true
pcs -f tmp-cib.xml resource clone WWWMount interleave=true
pcs -f tmp-cib.xml resource clone WebServer interleave=true
pcs -f tmp-cib.xml resource clone SharedRootCrons interleave=true
pcs -f tmp-cib.xml resource clone SharedUserCrons interleave=true
pcs -f tmp-cib.xml \
resource master DRBDClone DRBD master-node-max=1 clone-max=2 master-max=2 \
interleave=true notify=true clone-node-max=1
pcs -f tmp-cib.xml \
constraint colocation add dlm-clone with DRBDClone \
id=colocation-dlm-clone-DRBDClone-INFINITY
pcs -f tmp-cib.xml constraint order promote DRBDClone \
then dlm-clone id=order-DRBDClone-dlm-clone-mandatory
pcs -f tmp-cib.xml \
constraint colocation add WWWMount-clone with dlm-clone \
id=colocation-WWWMount-clone-dlm-clone-INFINITY
pcs -f tmp-cib.xml constraint order dlm-clone \
then WWWMount-clone id=order-dlm-clone-WWWMount-clone-mandatory
pcs -f tmp-cib.xml \
constraint colocation add WebServer-clone with WWWMount-clone \
id=colocation-WebServer-clone-WWWMount-clone-INFINITY
pcs -f tmp-cib.xml constraint order WWWMount-clone \
then WebServer-clone id=order-WWWMount-clone-WebServer-clone-mandatory
pcs -f tmp-cib.xml \
constraint colocation add SharedRootCrons-clone with WWWMount-clone \
id=colocation-SharedRootCrons-clone-WWWMount-clone-INFINITY
pcs -f tmp-cib.xml \
constraint colocation add SharedUserCrons-clone with WWWMount-clone \
id=colocation-SharedUserCrons-clone-WWWMount-clone-INFINITY
pcs -f tmp-cib.xml constraint order WWWMount-clone \
then SharedRootCrons-clone \
id=order-WWWMount-clone-SharedRootCrons-clone-mandatory
pcs -f tmp-cib.xml constraint order WWWMount-clone \
then SharedUserCrons-clone \
id=order-WWWMount-clone-SharedUserCrons-clone-mandatory
pcs -f tmp-cib.xml \
constraint location PrimaryUserCrons prefers node1.mydomain.com=500
pcs -f tmp-cib.xml \
constraint colocation add PrimaryUserCrons with WWWMount-clone \
id=colocation-PrimaryUserCrons-WWWMount-clone-INFINITY
pcs -f tmp-cib.xml constraint order WWWMount-clone \
then PrimaryUserCrons \
id=order-WWWMount-clone-PrimaryUserCrons-mandatory
pcs -f tmp-cib.xml \
constraint location SecondaryUserCrons prefers node2.mydomain.com=500
pcs -f tmp-cib.xml \
constraint colocation add SecondaryUserCrons with WWWMount-clone \
id=colocation-SecondaryUserCrons-WWWMount-clone-INFINITY
pcs -f tmp-cib.xml constraint order WWWMount-clone \
then SecondaryUserCrons \
id=order-WWWMount-clone-SecondaryUserCrons-mandatory
pcs cluster cib-push tmp-cib.xml diff-against=tmp-cib.xml.deltasrc
When I standby node2, SecondaryUserCrons bounces over to node1 as expected. When I unstandby node2, it bounces back to node2
immediately, before WWWMount has started there, and thus it fails. What am I missing? Here are the log messages from the unstandby operation:
Sep 7 15:02:28 node2 crmd[58188]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Sep 7 15:02:28 node2 pengine[58187]: notice: * Start DRBD:1 ( node2.mydomain.com )
Sep 7 15:02:28 node2 pengine[58187]: notice: * Start dlm:1 ( node2.mydomain.com )
due to unrunnable DRBD:1 promote (blocked)
Sep 7 15:02:28 node2 pengine[58187]: notice: * Start WWWMount:1 ( node2.mydomain.com )
due to unrunnable dlm:1 start (blocked)
Sep 7 15:02:28 node2 pengine[58187]: notice: * Start WebServer:1 ( node2.mydomain.com )
due to unrunnable WWWMount:1 start (blocked)
Sep 7 15:02:28 node2 pengine[58187]: notice: * Start SharedRootCrons:1 ( node2.mydomain.com )
due to unrunnable WWWMount:1 start (blocked)
Sep 7 15:02:28 node2 pengine[58187]: notice: * Start SharedUserCrons:1 ( node2.mydomain.com )
due to unrunnable WWWMount:1 start (blocked)
Sep 7 15:02:28 node2 pengine[58187]: notice: * Move SecondaryUserCrons ( node1.mydomain.com -> node2.mydomain.com )
Sep 7 15:02:28 node2 pengine[58187]: notice: Calculated transition 129, saving inputs in /var/lib/pacemaker/pengine/pe-input-2795.bz2
Sep 7 15:02:28 node2 crmd[58188]: notice: Initiating stop operation SecondaryUserCrons_stop_0 on node1.mydomain.com
Sep 7 15:02:28 node2 crmd[58188]: notice: Initiating notify operation DRBD_pre_notify_start_0 on node1.mydomain.com
Sep 7 15:02:28 node2 crmd[58188]: notice: Initiating start operation SecondaryUserCrons_start_0 locally on node2.mydomain.com
Sep 7 15:02:28 node2 symlink(SecondaryUserCrons)[52196]: WARNING: /var/www/crons/User-server2 does not exist!
Sep 7 15:02:28 node2 crmd[58188]: notice: Initiating start operation DRBD_start_0 locally on node2.mydomain.com
Sep 7 15:02:28 node2 symlink(SecondaryUserCrons)[52196]: INFO: '/etc/cron.d/User-server2' -> '/var/www/crons/User-server2'
Sep 7 15:02:28 node2 symlink(SecondaryUserCrons)[52196]: ERROR: /etc/cron.d/User-server2 does not point to /var/www/crons/User-server2!
Sep 7 15:02:28 node2 lrmd[58185]: notice: SecondaryUserCrons_start_0:52196:stderr [ ocf-exit-reason:/etc/cron.d/User-server2 does
not point to /var/www/crons/User-server2! ]
Sep 7 15:02:28 node2 crmd[58188]: notice: Result of start operation for SecondaryUserCrons on node2.mydomain.com: 5 (not installed)
Sep 7 15:02:28 node2 crmd[58188]: notice: node2.mydomain.com-SecondaryUserCrons_start_0:390 [
ocf-exit-reason:/etc/cron.d/User-server2 does not point to /var/www/crons/User-server2!\n ]
Sep 7 15:02:28 node2 crmd[58188]: warning: Action 109 (SecondaryUserCrons_start_0) on node2.mydomain.com failed (target: 0 vs. rc:
5): Error
Sep 7 15:02:28 node2 crmd[58188]: notice: Transition aborted by operation SecondaryUserCrons_start_0 'modify' on
node2.mydomain.com: Event failed
Sep 7 15:02:28 node2 crmd[58188]: warning: Action 109 (SecondaryUserCrons_start_0) on node2.mydomain.com failed (target: 0 vs. rc:
5): Error
Sep 7 15:02:28 node2 crmd[58188]: notice: Transition aborted by status-2-fail-count-SecondaryUserCrons.start_0 doing create
fail-count-SecondaryUserCrons#start_0=INFINITY: Transient attribute change
Sep 7 15:02:28 node2 kernel: drbd r0: Starting worker thread (from drbdsetup [52264])
Sep 7 15:02:28 node2 kernel: drbd r0/0 drbd1: disk( Diskless -> Attaching )
Sep 7 15:02:28 node2 kernel: drbd r0/0 drbd1: Maximum number of peer devices = 1
Sep 7 15:02:28 node2 kernel: drbd r0: Method to ensure write ordering: drain
Sep 7 15:02:28 node2 kernel: drbd r0/0 drbd1: drbd_bm_resize called with capacity == 1048543928
Sep 7 15:02:28 node2 kernel: drbd r0/0 drbd1: resync bitmap: bits=131067991 words=2047938 pages=4000
Sep 7 15:02:28 node2 kernel: drbd r0/0 drbd1: size = 500 GB (524271964 KB)
Sep 7 15:02:28 node2 kernel: drbd r0/0 drbd1: size = 500 GB (524271964 KB)
Sep 7 15:02:28 node2 kernel: drbd r0/0 drbd1: recounting of set bits took additional 13ms
Sep 7 15:02:28 node2 kernel: drbd r0/0 drbd1: disk( Attaching -> Outdated )
Sep 7 15:02:28 node2 kernel: drbd r0/0 drbd1: attached to current UUID: A2457506F4D44F1C
Sep 7 15:02:28 node2 kernel: drbd r0/1 drbd2: disk( Diskless -> Attaching )
Sep 7 15:02:28 node2 kernel: drbd r0/1 drbd2: Maximum number of peer devices = 1
Sep 7 15:02:28 node2 kernel: drbd r0/1 drbd2: drbd_bm_resize called with capacity == 2097016
Sep 7 15:02:28 node2 kernel: drbd r0/1 drbd2: resync bitmap: bits=262127 words=4096 pages=8
Sep 7 15:02:28 node2 kernel: drbd r0/1 drbd2: size = 1024 MB (1048508 KB)
Sep 7 15:02:28 node2 kernel: drbd r0/1 drbd2: size = 1024 MB (1048508 KB)
Sep 7 15:02:28 node2 kernel: drbd r0/1 drbd2: recounting of set bits took additional 0ms
Sep 7 15:02:28 node2 kernel: drbd r0/1 drbd2: disk( Attaching -> Outdated )
Sep 7 15:02:28 node2 kernel: drbd r0/1 drbd2: attached to current UUID: 0EC5D56AEE53C6B6
Sep 7 15:02:28 node2 kernel: drbd r0 node1.mydomain.com: Starting sender thread (from drbdsetup [52291])
Sep 7 15:02:28 node2 kernel: drbd r0 node1.mydomain.com: conn( StandAlone -> Unconnected )
Sep 7 15:02:28 node2 kernel: drbd r0 node1.mydomain.com: Starting receiver thread (from drbd_w_r0 [52265])
Sep 7 15:02:28 node2 kernel: drbd r0 node1.mydomain.com: conn( Unconnected -> Connecting )
Sep 7 15:02:28 node2 crmd[58188]: notice: Result of start operation for DRBD on node2.mydomain.com: 0 (ok)
Sep 7 15:02:28 node2 crmd[58188]: notice: Initiating notify operation DRBD_post_notify_start_0 on node1.mydomain.com
Sep 7 15:02:28 node2 crmd[58188]: notice: Initiating notify operation DRBD_post_notify_start_0 locally on node2.mydomain.com
Sep 7 15:02:28 node2 crmd[58188]: notice: Result of notify operation for DRBD on node2.mydomain.com: 0 (ok)
Sep 7 15:02:28 node2 crmd[58188]: notice: Transition 129 (Complete=29, Pending=0, Fired=0, Skipped=1, Incomplete=7,
Source=/var/lib/pacemaker/pengine/pe-input-2795.bz2): Stopped
Sep 7 15:02:28 node2 pengine[58187]: warning: Processing failed op start for SecondaryUserCrons on node2.mydomain.com: not
installed (5)
Sep 7 15:02:28 node2 pengine[58187]: notice: Preventing SecondaryUserCrons from re-starting on node2.mydomain.com: operation
start failed 'not installed' (5)
Sep 7 15:02:28 node2 pengine[58187]: warning: Processing failed op start for SecondaryUserCrons on node2.mydomain.com: not
installed (5)
Sep 7 15:02:28 node2 pengine[58187]: notice: Preventing SecondaryUserCrons from re-starting on node2.mydomain.com: operation
start failed 'not installed' (5)
Sep 7 15:02:28 node2 pengine[58187]: warning: Forcing SecondaryUserCrons away from node2.mydomain.com after 1000000 failures
(max=1000000)
Sep 7 15:02:28 node2 pengine[58187]: notice: * Start dlm:1 ( node2.mydomain.com )
due to unrunnable DRBD:1 promote (blocked)
Sep 7 15:02:28 node2 pengine[58187]: notice: * Start WWWMount:1 ( node2.mydomain.com )
due to unrunnable dlm:1 start (blocked)
Sep 7 15:02:28 node2 pengine[58187]: notice: * Start WebServer:1 ( node2.mydomain.com )
due to unrunnable WWWMount:1 start (blocked)
Sep 7 15:02:28 node2 pengine[58187]: notice: * Start SharedRootCrons:1 ( node2.mydomain.com )
due to unrunnable WWWMount:1 start (blocked)
Sep 7 15:02:28 node2 pengine[58187]: notice: * Start SharedUserCrons:1 ( node2.mydomain.com )
due to unrunnable WWWMount:1 start (blocked)
Sep 7 15:02:28 node2 pengine[58187]: notice: * Recover SecondaryUserCrons ( node2.mydomain.com -> node1.mydomain.com )
Sep 7 15:02:28 node2 pengine[58187]: notice: Calculated transition 130, saving inputs in /var/lib/pacemaker/pengine/pe-input-2796.bz2
Sep 7 15:02:28 node2 crmd[58188]: notice: Initiating monitor operation DRBD_monitor_60000 locally on node2.mydomain.com
Sep 7 15:02:28 node2 crmd[58188]: notice: Initiating stop operation SecondaryUserCrons_stop_0 locally on node2.mydomain.com
Sep 7 15:02:28 node2 symlink(SecondaryUserCrons)[52329]: WARNING: /var/www/crons/User-server2 does not exist!
Sep 7 15:02:28 node2 symlink(SecondaryUserCrons)[52329]: ERROR: /etc/cron.d/User-server2 does not point to /var/www/crons/User-server2!
Sep 7 15:02:28 node2 lrmd[58185]: notice: SecondaryUserCrons_stop_0:52329:stderr [ ocf-exit-reason:/etc/cron.d/User-server2 does
not point to /var/www/crons/User-server2! ]
Sep 7 15:02:28 node2 crmd[58188]: notice: Result of stop operation for SecondaryUserCrons on node2.mydomain.com: 5 (not installed)
Sep 7 15:02:28 node2 crmd[58188]: notice: node2.mydomain.com-SecondaryUserCrons_stop_0:394 [
ocf-exit-reason:/etc/cron.d/User-server2 does not point to /var/www/crons/User-server2!\n ]
Sep 7 15:02:28 node2 crmd[58188]: warning: Action 10 (SecondaryUserCrons_stop_0) on node2.mydomain.com failed (target: 0 vs. rc:
5): Error
Sep 7 15:02:28 node2 crmd[58188]: notice: Transition aborted by operation SecondaryUserCrons_stop_0 'modify' on
node2.mydomain.com: Event failed
Sep 7 15:02:28 node2 crmd[58188]: warning: Action 10 (SecondaryUserCrons_stop_0) on node2.mydomain.com failed (target: 0 vs. rc:
5): Error
Sep 7 15:02:28 node2 crmd[58188]: notice: Transition aborted by status-2-fail-count-SecondaryUserCrons.stop_0 doing create
fail-count-SecondaryUserCrons#stop_0=INFINITY: Transient attribute change
Sep 7 15:02:28 node2 crmd[58188]: notice: Transition 130 (Complete=18, Pending=0, Fired=0, Skipped=0, Incomplete=8,
Source=/var/lib/pacemaker/pengine/pe-input-2796.bz2): Complete
Sep 7 15:02:29 node2 pengine[58187]: error: No further recovery can be attempted for SecondaryUserCrons: stop action failed with
'not installed' (5)
Sep 7 15:02:29 node2 pengine[58187]: warning: Processing failed op stop for SecondaryUserCrons on node2.mydomain.com: not
installed (5)
Sep 7 15:02:29 node2 pengine[58187]: notice: Preventing SecondaryUserCrons from re-starting on node2.mydomain.com: operation stop
failed 'not installed' (5)
Sep 7 15:02:29 node2 pengine[58187]: error: No further recovery can be attempted for SecondaryUserCrons: stop action failed with
'not installed' (5)
Sep 7 15:02:29 node2 pengine[58187]: warning: Processing failed op stop for SecondaryUserCrons on node2.mydomain.com: not
installed (5)
Sep 7 15:02:29 node2 pengine[58187]: notice: Preventing SecondaryUserCrons from re-starting on node2.mydomain.com: operation stop
failed 'not installed' (5)
Sep 7 15:02:29 node2 pengine[58187]: warning: Forcing SecondaryUserCrons away from node2.mydomain.com after 1000000 failures
(max=1000000)
Sep 7 15:02:29 node2 pengine[58187]: notice: * Start dlm:1 ( node2.mydomain.com )
due to unrunnable DRBD:1 promote (blocked)
Sep 7 15:02:29 node2 pengine[58187]: notice: * Start WWWMount:1 ( node2.mydomain.com )
due to unrunnable dlm:1 start (blocked)
Sep 7 15:02:29 node2 pengine[58187]: notice: * Start WebServer:1 ( node2.mydomain.com )
due to unrunnable WWWMount:1 start (blocked)
Sep 7 15:02:29 node2 pengine[58187]: notice: * Start SharedRootCrons:1 ( node2.mydomain.com )
due to unrunnable WWWMount:1 start (blocked)
Sep 7 15:02:29 node2 pengine[58187]: notice: * Start SharedUserCrons:1 ( node2.mydomain.com )
due to unrunnable WWWMount:1 start (blocked)
Sep 7 15:02:29 node2 pengine[58187]: error: Calculated transition 131 (with errors), saving inputs in
/var/lib/pacemaker/pengine/pe-error-26.bz2
Sep 7 15:02:29 node2 crmd[58188]: warning: Transition 131 (Complete=16, Pending=0, Fired=0, Skipped=0, Incomplete=5,
Source=/var/lib/pacemaker/pengine/pe-error-26.bz2): Terminated
Sep 7 15:02:29 node2 crmd[58188]: warning: Transition failed: terminated
Sep 7 15:02:29 node2 crmd[58188]: notice: Graph 131 with 21 actions: batch-limit=0 jobs, network-delay=60000ms
Sep 7 15:02:29 node2 crmd[58188]: notice: [Action 47]: Completed pseudo op dlm-clone_running_0 on N/A (priority:
1000000, waiting: none)
Sep 7 15:02:29 node2 crmd[58188]: notice: [Action 46]: Completed pseudo op dlm-clone_start_0 on N/A (priority: 0,
waiting: none)
Sep 7 15:02:29 node2 crmd[58188]: notice: [Action 55]: Completed pseudo op WWWMount-clone_running_0 on N/A (priority:
1000000, waiting: none)
Sep 7 15:02:29 node2 crmd[58188]: notice: [Action 54]: Completed pseudo op WWWMount-clone_start_0 on N/A (priority: 0,
waiting: none)
Sep 7 15:02:29 node2 crmd[58188]: notice: [Action 69]: Pending rsc op WebServer_monitor_60000 on node2.mydomain.com
(priority: 0, waiting: none)
Sep 7 15:02:29 node2 crmd[58188]: notice: * [Input 68]: Unresolved dependency rsc op WebServer_start_0 on node2.mydomain.com
Sep 7 15:02:29 node2 crmd[58188]: notice: [Action 71]: Completed pseudo op WebServer-clone_running_0 on N/A (priority:
1000000, waiting: none)
Sep 7 15:02:29 node2 crmd[58188]: notice: [Action 70]: Completed pseudo op WebServer-clone_start_0 on N/A (priority:
0, waiting: none)
Sep 7 15:02:29 node2 crmd[58188]: notice: [Action 93]: Pending rsc op SharedRootCrons_monitor_60000 on node2.mydomain.com
(priority: 0, waiting: none)
Sep 7 15:02:29 node2 crmd[58188]: notice: * [Input 92]: Unresolved dependency rsc op SharedRootCrons_start_0 on node2.mydomain.com
Sep 7 15:02:29 node2 crmd[58188]: notice: [Action 95]: Completed pseudo op SharedRootCrons-clone_running_0 on N/A (priority:
1000000, waiting: none)
Sep 7 15:02:29 node2 crmd[58188]: notice: [Action 94]: Completed pseudo op SharedRootCrons-clone_start_0 on N/A (priority: 0,
waiting: none)
Sep 7 15:02:29 node2 crmd[58188]: notice: [Action 101]: Pending rsc op SharedUserCrons_monitor_60000 on node2.mydomain.com
(priority: 0, waiting: none)
Sep 7 15:02:29 node2 crmd[58188]: notice: * [Input 100]: Unresolved dependency rsc op SharedUserCrons_start_0 on node2.mydomain.com
Sep 7 15:02:29 node2 crmd[58188]: notice: [Action 103]: Completed pseudo op SharedUserCrons-clone_running_0 on N/A (priority:
1000000, waiting: none)
Sep 7 15:02:29 node2 crmd[58188]: notice: [Action 102]: Completed pseudo op SharedUserCrons-clone_start_0 on N/A (priority: 0,
waiting: none)
Sep 7 15:02:29 node2 crmd[58188]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Sep 7 15:02:29 node2 kernel: drbd r0 node1.mydomain.com: Handshake to peer 0 successful: Agreed network protocol version 113
Sep 7 15:02:29 node2 kernel: drbd r0 node1.mydomain.com: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME
WRITE_ZEROES.
Sep 7 15:02:29 node2 kernel: drbd r0 node1.mydomain.com: Starting ack_recv thread (from drbd_r_r0 [52295])
Sep 7 15:02:29 node2 kernel: drbd r0 node1.mydomain.com: Preparing remote state change 2019156377
Sep 7 15:02:29 node2 kernel: drbd r0 node1.mydomain.com: Committing remote state change 2019156377 (primary_nodes=1)
Sep 7 15:02:29 node2 kernel: drbd r0 node1.mydomain.com: conn( Connecting -> Connected ) peer( Unknown -> Primary )
Sep 7 15:02:29 node2 kernel: drbd r0/0 drbd1 node1.mydomain.com: drbd_sync_handshake:
Sep 7 15:02:29 node2 kernel: drbd r0/0 drbd1 node1.mydomain.com: self
A2457506F4D44F1C:0000000000000000:B13E5D392CF268C4:FE2F70857D64FB02 bits:0 flags:20
Sep 7 15:02:29 node2 kernel: drbd r0/0 drbd1 node1.mydomain.com: peer
D355B0F942665879:A2457506F4D44F1D:B13E5D392CF268C4:E56E164C51EEFAB0 bits:6 flags:120
Sep 7 15:02:29 node2 kernel: drbd r0/0 drbd1 node1.mydomain.com: uuid_compare()=-2 by rule 50
Sep 7 15:02:29 node2 kernel: drbd r0/0 drbd1 node1.mydomain.com: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
Sep 7 15:02:29 node2 kernel: drbd r0/1 drbd2 node1.mydomain.com: drbd_sync_handshake:
Sep 7 15:02:29 node2 kernel: drbd r0/1 drbd2 node1.mydomain.com: self
0EC5D56AEE53C6B6:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20
Sep 7 15:02:29 node2 kernel: drbd r0/1 drbd2 node1.mydomain.com: peer
0EC5D56AEE53C6B6:0000000000000000:B62926494645765C:0000000000000000 bits:0 flags:120
Sep 7 15:02:29 node2 kernel: drbd r0/1 drbd2 node1.mydomain.com: uuid_compare()=0 by rule 38
Sep 7 15:02:29 node2 kernel: drbd r0/1 drbd2: disk( Outdated -> UpToDate )
Sep 7 15:02:29 node2 kernel: drbd r0/1 drbd2 node1.mydomain.com: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
Sep 7 15:02:29 node2 kernel: drbd r0/0 drbd1 node1.mydomain.com: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 27(1),
total 27; compression: 100.0%
Sep 7 15:02:29 node2 kernel: drbd r0/0 drbd1 node1.mydomain.com: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 27(1), total
27; compression: 100.0%
Sep 7 15:02:29 node2 kernel: drbd r0/0 drbd1 node1.mydomain.com: helper command: /sbin/drbdadm before-resync-target
Sep 7 15:02:29 node2 kernel: drbd r0/0 drbd1 node1.mydomain.com: helper command: /sbin/drbdadm before-resync-target exit code 0 (0x0)
Sep 7 15:02:29 node2 kernel: drbd r0/0 drbd1: disk( Outdated -> Inconsistent )
Sep 7 15:02:29 node2 kernel: drbd r0/0 drbd1 node1.mydomain.com: repl( WFBitMapT -> SyncTarget )
Sep 7 15:02:29 node2 kernel: drbd r0/0 drbd1 node1.mydomain.com: Began resync as SyncTarget (will sync 24 KB [6 bits set]).
Sep 7 15:02:29 node2 kernel: drbd r0/0 drbd1 node1.mydomain.com: Resync done (total 1 sec; paused 0 sec; 24 K/sec)
Sep 7 15:02:29 node2 kernel: drbd r0/0 drbd1 node1.mydomain.com: updated UUIDs
D355B0F942665878:0000000000000000:A2457506F4D44F1C:E2BDB50A1BFBAE5E
Sep 7 15:02:29 node2 kernel: drbd r0/0 drbd1: disk( Inconsistent -> UpToDate )
Sep 7 15:02:29 node2 kernel: drbd r0/0 drbd1 node1.mydomain.com: repl( SyncTarget -> Established )
Sep 7 15:02:29 node2 kernel: drbd r0/0 drbd1 node1.mydomain.com: helper command: /sbin/drbdadm after-resync-target
Sep 7 15:02:29 node2 kernel: drbd r0/0 drbd1 node1.mydomain.com: helper command: /sbin/drbdadm after-resync-target exit code 0 (0x0)
Sep 7 15:03:29 node2 crmd[58188]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Sep 7 15:03:29 node2 pengine[58187]: error: No further recovery can be attempted for SecondaryUserCrons: stop action failed with
'not installed' (5)
Sep 7 15:03:29 node2 pengine[58187]: warning: Processing failed op stop for SecondaryUserCrons on node2.mydomain.com: not
installed (5)
Sep 7 15:03:29 node2 pengine[58187]: notice: Preventing SecondaryUserCrons from re-starting on node2.mydomain.com: operation stop
failed 'not installed' (5)
Sep 7 15:03:29 node2 pengine[58187]: error: No further recovery can be attempted for SecondaryUserCrons: stop action failed with
'not installed' (5)
Sep 7 15:03:29 node2 pengine[58187]: warning: Processing failed op stop for SecondaryUserCrons on node2.mydomain.com: not
installed (5)
Sep 7 15:03:29 node2 pengine[58187]: notice: Preventing SecondaryUserCrons from re-starting on node2.mydomain.com: operation stop
failed 'not installed' (5)
Sep 7 15:03:29 node2 pengine[58187]: warning: Forcing SecondaryUserCrons away from node2.mydomain.com after 1000000 failures
(max=1000000)
Sep 7 15:03:29 node2 pengine[58187]: notice: * Promote DRBD:1 ( Slave -> Master node2.mydomain.com )
Sep 7 15:03:29 node2 pengine[58187]: notice: * Start dlm:1 ( node2.mydomain.com )
Sep 7 15:03:29 node2 pengine[58187]: notice: * Start WWWMount:1 ( node2.mydomain.com )
Sep 7 15:03:29 node2 pengine[58187]: notice: * Start WebServer:1 ( node2.mydomain.com )
Sep 7 15:03:29 node2 pengine[58187]: notice: * Start SharedRootCrons:1 ( node2.mydomain.com )
Sep 7 15:03:29 node2 pengine[58187]: notice: * Start SharedUserCrons:1 ( node2.mydomain.com )
Sep 7 15:03:29 node2 pengine[58187]: error: Calculated transition 132 (with errors), saving inputs in
/var/lib/pacemaker/pengine/pe-error-27.bz2
Sep 7 15:03:29 node2 crmd[58188]: notice: Initiating cancel operation DRBD_monitor_60000 locally on node2.mydomain.com
Sep 7 15:03:29 node2 crmd[58188]: notice: Initiating notify operation DRBD_pre_notify_promote_0 on node1.mydomain.com
Sep 7 15:03:29 node2 crmd[58188]: notice: Initiating notify operation DRBD_pre_notify_promote_0 locally on node2.mydomain.com
Sep 7 15:03:29 node2 crmd[58188]: notice: Result of notify operation for DRBD on node2.mydomain.com: 0 (ok)
Sep 7 15:03:29 node2 crmd[58188]: notice: Initiating promote operation DRBD_promote_0 locally on node2.mydomain.com
Sep 7 15:03:29 node2 kernel: drbd r0: Preparing cluster-wide state change 360863446 (1->-1 3/1)
Sep 7 15:03:29 node2 kernel: drbd r0: State change 360863446: primary_nodes=3, weak_nodes=FFFFFFFFFFFFFFFC
Sep 7 15:03:29 node2 kernel: drbd r0: Committing cluster-wide state change 360863446 (0ms)
Sep 7 15:03:29 node2 kernel: drbd r0: role( Secondary -> Primary )
Sep 7 15:03:29 node2 crmd[58188]: notice: Result of promote operation for DRBD on node2.mydomain.com: 0 (ok)
Sep 7 15:03:29 node2 crmd[58188]: notice: Initiating notify operation DRBD_post_notify_promote_0 on node1.mydomain.com
Sep 7 15:03:29 node2 crmd[58188]: notice: Initiating notify operation DRBD_post_notify_promote_0 locally on node2.mydomain.com
Sep 7 15:03:29 node2 crmd[58188]: notice: Result of notify operation for DRBD on node2.mydomain.com: 0 (ok)
Sep 7 15:03:29 node2 crmd[58188]: notice: Initiating start operation dlm_start_0 locally on node2.mydomain.com
Sep 7 15:03:29 node2 dlm_controld[53127]: 693403 dlm_controld 4.0.7 started
Sep 7 15:03:30 node2 crmd[58188]: notice: Result of start operation for dlm on node2.mydomain.com: 0 (ok)
Sep 7 15:03:30 node2 crmd[58188]: notice: Initiating monitor operation dlm_monitor_60000 locally on node2.mydomain.com
Sep 7 15:03:30 node2 crmd[58188]: notice: Initiating start operation WWWMount_start_0 locally on node2.mydomain.com
Sep 7 15:03:30 node2 Filesystem(WWWMount)[53154]: INFO: Running start for /dev/drbd1 on /var/www
Sep 7 15:03:30 node2 kernel: dlm: Using TCP for communications
Sep 7 15:03:30 node2 kernel: GFS2: fsid=MyCluster:www: Trying to join cluster "lock_dlm", "MyCluster:www"
Sep 7 15:03:30 node2 kernel: dlm: connecting to 1
Sep 7 15:03:30 node2 kernel: dlm: got connection from 1
Sep 7 15:03:31 node2 kernel: GFS2: fsid=MyCluster:www: Joined cluster. Now mounting FS...
Sep 7 15:03:31 node2 kernel: GFS2: fsid=MyCluster:www.1: jid=1, already locked for use
Sep 7 15:03:31 node2 kernel: GFS2: fsid=MyCluster:www.1: jid=1: Looking at journal...
Sep 7 15:03:31 node2 kernel: GFS2: fsid=MyCluster:www.1: jid=1: Done
Sep 7 15:03:31 node2 crmd[58188]: notice: Result of start operation for WWWMount on node2.mydomain.com: 0 (ok)
Sep 7 15:03:31 node2 crmd[58188]: notice: Initiating monitor operation WWWMount_monitor_20000 locally on node2.mydomain.com
Sep 7 15:03:31 node2 crmd[58188]: notice: Initiating start operation WebServer_start_0 locally on node2.mydomain.com
Sep 7 15:03:31 node2 crmd[58188]: notice: Initiating start operation SharedRootCrons_start_0 locally on node2.mydomain.com
Sep 7 15:03:31 node2 crmd[58188]: notice: Initiating start operation SharedUserCrons_start_0 locally on node2.mydomain.com
Sep 7 15:03:31 node2 symlink(SharedRootCrons)[53328]: INFO: '/etc/cron.d/root-shared' -> '/var/www/crons/root-shared'
Sep 7 15:03:31 node2 symlink(SharedUserCrons)[53329]: INFO: '/etc/cron.d/User-shared' -> '/var/www/crons/User-shared'
Sep 7 15:03:31 node2 crmd[58188]: notice: Result of start operation for SharedRootCrons on node2.mydomain.com: 0 (ok)
Sep 7 15:03:31 node2 crmd[58188]: notice: Result of start operation for SharedUserCrons on node2.mydomain.com: 0 (ok)
Sep 7 15:03:31 node2 crmd[58188]: notice: Initiating monitor operation SharedRootCrons_monitor_60000 locally on node2.mydomain.com
Sep 7 15:03:31 node2 crmd[58188]: notice: Initiating monitor operation SharedUserCrons_monitor_60000 locally on node2.mydomain.com
Sep 7 15:03:31 node2 apache(WebServer)[53325]: INFO: apache not running
Sep 7 15:03:31 node2 apache(WebServer)[53325]: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
Sep 7 15:03:32 node2 crmd[58188]: notice: Result of start operation for WebServer on node2.mydomain.com: 0 (ok)
Sep 7 15:03:32 node2 crmd[58188]: notice: Initiating monitor operation WebServer_monitor_60000 locally on node2.mydomain.com
Sep 7 15:03:33 node2 crmd[58188]: notice: Transition 132 (Complete=44, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-error-27.bz2): Complete
Sep 7 15:03:33 node2 crmd[58188]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
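To pull just the scheduler's planned actions out of a log like the one above, a filter along these lines may help. This is a rough
sketch: the sample lines are copied from the log above, and the pattern assumes the "pengine[PID]: notice: * Action" layout shown
there, which may differ on other Pacemaker versions.

```shell
#!/bin/sh
# Sample lines copied from the log above; on a real node the input would be
# the system log (its location is an assumption that varies by distribution).
log=$(cat <<'EOF'
Sep 7 15:02:28 node2 crmd[58188]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Sep 7 15:02:28 node2 pengine[58187]: notice: * Start DRBD:1 ( node2.mydomain.com )
Sep 7 15:02:28 node2 pengine[58187]: notice: * Move SecondaryUserCrons ( node1.mydomain.com -> node2.mydomain.com )
Sep 7 15:02:28 node2 crmd[58188]: notice: Initiating stop operation SecondaryUserCrons_stop_0 on node1.mydomain.com
EOF
)

# Keep only pengine lines that announce a planned action ("* Start", "* Move", ...).
planned=$(printf '%s\n' "$log" | grep -E 'pengine\[[0-9]+\]: +notice: +\* ')

# Count the non-empty lines that survived the filter.
count=$(printf '%s\n' "$planned" | grep -c .)
printf '%s\n' "$planned"
```

Filtering this way makes it easier to see at a glance that the Move of SecondaryUserCrons is planned in the same transition as the
first DRBD start.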