[ClusterLabs] Resource monitors crash, restart, leave core files
Jaap Winius
jwinius at umrk.nl
Thu Mar 5 08:14:53 EST 2020
Hi folks,
My test system, which includes support for a filesystem resource
called 'mount', works fine otherwise, but every day or so I see
monitor errors like the following when I run 'pcs status':
Failed Resource Actions:
* mount_monitor_20000 on bd3c7 'unknown error' (1): call=23,
status=Error, exitreason='',
last-rc-change='Thu Mar 5 04:57:55 2020', queued=0ms, exec=0ms
The corosync.log shows some more information (see log fragments
below), but I'm unable to identify a cause. The resource monitor bombs
out, produces a core dump and then starts up again about 2 seconds
later. I've also seen this happen with the monitor for my nfsserver
resource. Apart from the interruption of a few seconds, the other
problem is that this will eventually fill up the filesystem that holds
the ./pacemaker/cores/ directory with core files (so far, each is less
than 1 MB).
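As a stopgap for the disk-space concern, old core files could be pruned periodically. The following is only a sketch: the function name `prune_cores`, the 7-day default retention, and the idea of passing the cores directory as an argument are all assumptions, not anything Pacemaker provides. It relies on the `-delete` option of GNU find.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker): delete regular files under a
# core-dump directory whose age, in whole days, exceeds a retention limit.
# Usage: prune_cores DIR [MAX_AGE_DAYS]
prune_cores() {
    core_dir=$1
    max_age_days=${2:-7}    # assumed default retention of 7 days

    # Refuse to run against a directory that does not exist.
    [ -d "$core_dir" ] || { echo "no such directory: $core_dir" >&2; return 1; }

    # -type f    : regular files only, subdirectories included in the search
    # -mtime +N  : age in whole days is greater than N
    # -print     : list each file as it is removed
    find "$core_dir" -type f -mtime +"$max_age_days" -print -delete
}
```

Calling `prune_cores /path/to/cores 7` from a daily cron job would keep roughly one week of cores around for debugging while preventing the filesystem from filling up. It does nothing about the crashes themselves.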
Could this be a bug, or is my software not configured correctly (see
cfg below)?
Thanks,
Jaap
PS -- I'm using CentOS 7.7.1908, Corosync 2.4.3, Pacemaker 1.1.20, PCS
0.9.167 and DRBD 9.10.0.
################# corosync.log #########
Mar 05 04:57:55 [15652] bd3c7.umrk.nl lrmd: error:
child_waitpid: Managed process 22553 (mount_monitor_20000) dumped
core
Mar 05 04:57:55 [15652] bd3c7.umrk.nl lrmd: warning:
operation_finished: mount_monitor_20000:22553 - terminated with signal
11
Mar 05 04:57:55 [15655] bd3c7.umrk.nl crmd: error:
process_lrm_event: Result of monitor operation for mount on bd3c7:
Error | call=23 key=mount_monitor_20000 confirmed=false status=4
cib-update=143
...
Mar 05 04:57:55 [15655] bd3c7.umrk.nl crmd: info:
abort_transition_graph: Transition aborted by operation
mount_monitor_20000 'create' on bd3c7: Old event |
magic=4:1;40:2:0:37dad885-d4be-4dcd-8d5f-fd9663e9f953 cib=0.22.62
source=process_graph_event:499 complete=true
...
Mar 05 04:57:55 [15655] bd3c7.umrk.nl crmd: info:
process_graph_event: Detected action (2.40)
mount_monitor_20000.23=unknown error: failed
...
Mar 05 04:57:56 [15652] bd3c7.umrk.nl lrmd: info:
cancel_recurring_action: Cancelling ocf operation mount_monitor_20000
...
Mar 05 04:57:57 [15655] bd3c7.umrk.nl crmd: notice:
te_rsc_command: Initiating monitor operation mount_monitor_20000
locally on bd3c7 | action 1
Mar 05 04:57:57 [15655] bd3c7.umrk.nl crmd: info:
do_lrm_rsc_op: Performing
key=1:71:0:37dad885-d4be-4dcd-8d5f-fd9663e9f953 op=mount_monitor_20000
...
Mar 05 04:57:57 [15650] bd3c7.umrk.nl cib: info:
cib_perform_op: +
/cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='mount']/lrm_rsc_op[@id='mount_monitor_20000']: @transition-key=1:71:0:37dad885-d4be-4dcd-8d5f-fd9663e9f953, @transition-magic=-1:193;1:71:0:37dad885-d4be-4dcd-8d5f-fd9663e9f953, @call-id=-1, @rc-code=193, @op-status=-1, @last-rc-change=1583380677,
@exec-time=0
...
Mar 05 04:57:57 [15655] bd3c7.umrk.nl crmd: info:
process_lrm_event: Result of monitor operation for mount on bd3c7: 0
(ok) | call=51 key=mount_monitor_20000 confirmed=false cib-update=159
...
Mar 05 04:57:57 [15650] bd3c7.umrk.nl cib: info:
cib_perform_op: +
/cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='mount']/lrm_rsc_op[@id='mount_monitor_20000']: @transition-magic=0:0;1:71:0:37dad885-d4be-4dcd-8d5f-fd9663e9f953, @call-id=51, @rc-code=0, @op-status=0,
@exec-time=70
Mar 05 04:57:57 [15650] bd3c7.umrk.nl cib: info:
cib_process_request: Completed cib_modify operation for section
status: OK (rc=0, origin=bd3c7/crmd/159, version=0.22.77)
Mar 05 04:57:57 [15655] bd3c7.umrk.nl crmd: info:
match_graph_event: Action mount_monitor_20000 (1) confirmed on bd3c7
(rc=0)
########################################
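To see how often this happens and which monitors are affected, the "dumped core" entries can be counted per operation. This is a rough sketch: the function name `count_core_dumps` is hypothetical, the log path is supplied by the caller, and it assumes each entry occupies a single line in the actual log file (the entries above are wrapped only because of the mail format).

```shell
#!/bin/sh
# Hypothetical helper: count "dumped core" events per operation name.
# Usage: count_core_dumps LOGFILE
count_core_dumps() {
    # Matches entries such as:
    #   ... child_waitpid: Managed process 22553 (mount_monitor_20000) dumped core
    # and keeps only the operation name found between the parentheses.
    grep 'dumped core' "$1" |
        sed -n 's/.*Managed process [0-9]* (\([^)]*\)) dumped core.*/\1/p' |
        sort | uniq -c
}
```

Each output line is a count followed by an operation name, for example one line for mount_monitor_20000 and another for the nfsserver monitor if it has crashed as well.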
################# Pacemaker cfg ########
~# pcs resource defaults resource-stickiness=100 ; \
pcs resource create drbd ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s ; \
pcs resource master drbd master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true ; \
pcs resource create mount Filesystem device="/dev/drbd0" directory="/data" fstype="ext4" ; \
pcs constraint colocation add mount with drbd-master INFINITY with-rsc-role=Master ; \
pcs constraint order promote drbd-master then mount ; \
pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.2.73 cidr_netmask=24 op monitor interval=30s ; \
pcs constraint colocation add vip with drbd-master INFINITY with-rsc-role=Master ; \
pcs constraint order mount then vip ; \
pcs resource create nfsd nfsserver nfs_shared_infodir=/data ; \
pcs resource create nfscfg exportfs clientspec="192.168.2.55" options=rw,no_subtree_check,no_root_squash directory=/data fsid=0 ; \
pcs constraint colocation add nfsd with vip ; \
pcs constraint colocation add nfscfg with nfsd ; \
pcs constraint order vip then nfsd ; \
pcs constraint order nfsd then nfscfg
########################################