[ClusterLabs] A bug? (SLES15 SP2 with "crm resource refresh")
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Fri Jan 8 05:46:16 EST 2021
Hi!
Trying to reproduce a problem that had occurred in the past after a "crm resource refresh" ("reprobe"), I noticed something on the DC that looks odd to me:
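(For reference, the reprobe was triggered from the crm shell without naming a resource, so that all resources on all nodes are re-probed:
h16:~ # crm resource refresh
If I remember correctly, crm_resource --refresh is the low-level equivalent.)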
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Forcing the status of all resources to be redetected
Jan 08 11:13:21 h16 pacemaker-controld[4478]: warning: new_event_notification (4478-26817-13): Broken pipe (32)
### We had that before, already...
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Jan 08 11:13:21 h16 pacemaker-schedulerd[4477]: notice: Watchdog will be used via SBD if fencing is required and stonith-watchdog-timeout is nonzero
Jan 08 11:13:21 h16 pacemaker-schedulerd[4477]: notice: * Start prm_stonith_sbd ( h16 )
Jan 08 11:13:21 h16 pacemaker-schedulerd[4477]: notice: * Start prm_DLM:0 ( h18 )
Jan 08 11:13:21 h16 pacemaker-schedulerd[4477]: notice: * Start prm_DLM:1 ( h19 )
Jan 08 11:13:21 h16 pacemaker-schedulerd[4477]: notice: * Start prm_DLM:2 ( h16 )
...
### So basically an announcement to START everything that's running (everything is already running); shouldn't that be "monitoring" (a probe) instead?
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Initiating monitor operation prm_stonith_sbd_monitor_0 on h19
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Initiating monitor operation prm_stonith_sbd_monitor_0 on h18
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Initiating monitor operation prm_stonith_sbd_monitor_0 locally on h16
...
### So _probes_ are started,
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Transition 139 aborted by operation prm_testVG_testLV_activate_monitor_0 'modify' on h16: Event failed
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Transition 139 action 7 (prm_testVG_testLV_activate_monitor_0 on h16): expected 'not running' but got 'ok'
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Transition 139 action 19 (prm_testVG_testLV_activate_monitor_0 on h18): expected 'not running' but got 'ok'
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Transition 139 action 31 (prm_testVG_testLV_activate_monitor_0 on h19): expected 'not running' but got 'ok'
...
### That's odd, because the clone WAS running on each node. (Similar results were reported for other clones.)
Jan 08 11:13:43 h16 pacemaker-controld[4478]: notice: Transition 140 (Complete=34, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-79.bz2): Complete
Jan 08 11:13:43 h16 pacemaker-controld[4478]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
### So in the end nothing was actually started, but those messages are quite confusing.
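(For anyone who wants to see what the scheduler actually computed, the saved pe-input files can be replayed offline with crm_simulate, e.g.:
h16:~ # crm_simulate -S -x /var/lib/pacemaker/pengine/pe-input-79.bz2
pe-input-79.bz2 is the file logged for transition 140 above; the aborted transition 139 should be in the preceding pe-input file.)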
The Pacemaker version was "(version 2.0.4+20200616.2deceaa3a-3.3.1-2.0.4+20200616.2deceaa3a)" on all three nodes (the latest available for SLES).
For reference, here are the primitives that showed the odd results:
primitive prm_testVG_testLV_activate LVM-activate \
        params vgname=testVG lvname=testLV vg_access_mode=lvmlockd activation_mode=shared \
        op start timeout=90s interval=0 \
        op stop timeout=90s interval=0 \
        op monitor interval=60s timeout=90s \
        meta priority=9000
clone cln_testVG_activate prm_testVG_testLV_activate \
        meta interleave=true priority=9800 target-role=Started
primitive prm_lvmlockd lvmlockd \
        op start timeout=90 interval=0 \
        op stop timeout=100 interval=0 \
        op monitor interval=60 timeout=90 \
        meta priority=9800
clone cln_lvmlockd prm_lvmlockd \
        meta interleave=true priority=9800
order ord_lvmlockd__lvm_activate Mandatory: cln_lvmlockd ( cln_testVG_activate )
colocation col_lvm_activate__lvmlockd inf: ( cln_testVG_activate ) cln_lvmlockd
### lvmlockd similarly depends on DLM (order, colocation), so I don't see a problem there.
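(The DLM constraints are not shown here; they follow the same scheme as above, roughly like this (the names ord_DLM__lvmlockd, col_lvmlockd__DLM and cln_DLM are just placeholders; the real ones may differ):
order ord_DLM__lvmlockd Mandatory: cln_DLM ( cln_lvmlockd )
colocation col_lvmlockd__DLM inf: ( cln_lvmlockd ) cln_DLM
)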
Finally:
h16:~ # vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  sys      1   3   0 wz--n- 222.50g      0
  testVG   1   1   0 wz--ns 299.81g 289.81g
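(To double-check that the test LV really was active on every node, matching the probe results above, one can run something like this on each of h16/h18/h19:
h16:~ # lvs -o lv_name,lv_attr testVG
An 'a' in the fifth position of lv_attr means the LV is active on that node.)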
Regards,
Ulrich