[Pacemaker] When does pacemaker expect a resource to go directly to Master after start?

Andrei Borzenkov arvidjaar at gmail.com
Thu Oct 2 06:02:23 EDT 2014


According to the documentation (Pacemaker 1.1.x explained), "when
[Master/Slave] the resource is started, it must come up in the
mode called Slave". But what I observe is that in some cases pacemaker
treats the Slave state as an error. For example (pacemaker 1.1.9):

Oct  2 13:23:34 cn1 pengine[9446]:   notice: unpack_rsc_op: Operation monitor found resource test_Dummy:0 active in master mode on cn1

So the resource is currently Master on node cn1. The second node boots and
starts pacemaker, which now decides to restart the resource on the first
node (I know why this happens, so it is not relevant to this question :) )

Oct  2 13:23:34 cn1 pengine[9446]:   notice: LogActions: Restart test_Dummy:0  (Master cn1)
Oct  2 13:23:34 cn1 pengine[9446]:   notice: LogActions: Start test_Dummy:1  (cn2)
Oct  2 13:23:34 cn1 crmd[9447]:   notice: te_rsc_command: Initiating action 31: monitor test_Dummy:1_monitor_0 on cn2
Oct  2 13:23:34 cn1 crmd[9447]:   notice: te_rsc_command: Initiating action 84: demote test_Dummy_demote_0 on cn1 (local)
Oct  2 13:23:34 cn1 crmd[9447]:   notice: process_lrm_event: LRM operation test_Dummy_demote_0 (call=1227, rc=0, cib-update=7826, confirmed=true) ok
Oct  2 13:23:34 cn1 crmd[9447]:   notice: te_rsc_command: Initiating action 85: stop test_Dummy_stop_0 on cn1 (local)
Oct  2 13:23:34 cn1 crmd[9447]:   notice: process_lrm_event: LRM operation test_Dummy_stop_0 (call=1234, rc=0, cib-update=7827, confirmed=true) ok

As expected, it calls demote first and stop next. At this point the
resource is stopped.

Oct  2 13:23:35 cn1 crmd[9447]:   notice: te_rsc_command: Initiating action 83: start test_Dummy_start_0 on cn1 (local)
Oct  2 13:23:35 cn1 crmd[9447]:   notice: te_rsc_command: Initiating action 87: start test_Dummy:1_start_0 on cn2
Oct  2 13:23:35 cn1 crmd[9447]:   notice: process_lrm_event: LRM operation test_Dummy_start_0 (call=1244, rc=0, cib-update=7830, confirmed=true) ok

The resource is started again. In full conformance with the requirement
above, it is now Slave.

Oct  2 13:23:35 cn1 crmd[9447]:   notice: te_rsc_command: Initiating action 88: monitor test_Dummy:1_monitor_11000 on cn2
Oct  2 13:23:35 cn1 crmd[9447]:   notice: te_rsc_command: Initiating action 3: monitor test_Dummy_monitor_10000 on cn1 (local)
Oct  2 13:23:35 cn1 crmd[9447]:   notice: process_lrm_event: LRM operation test_Dummy_monitor_10000 (call=1247, rc=0, cib-update=7831, confirmed=false) ok
Oct  2 13:23:35 cn1 crmd[9447]:  warning: status_from_rc: Action 3 (test_Dummy_monitor_10000) on cn1 failed (target: 8 vs. rc: 0): Error
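
For anyone decoding the warning above: the numbers in "target: 8 vs. rc: 0"
are OCF resource agent return codes. Here is a minimal Python sketch that
spells them out, assuming the standard OCF code table. The helper name
explain_status is made up for illustration and is not part of pacemaker.

```python
import re

# Standard OCF return codes that matter for master/slave monitors.
OCF_CODES = {
    0: "OCF_SUCCESS (running; for a master/slave resource: running as Slave)",
    7: "OCF_NOT_RUNNING (cleanly stopped)",
    8: "OCF_RUNNING_MASTER (running as Master)",
    9: "OCF_FAILED_MASTER (failed while Master)",
}

# Matches the "(target: X vs. rc: Y)" fragment of a status_from_rc warning.
PATTERN = re.compile(r"\(target: (\d+) vs\. rc: (\d+)\)")


def explain_status(line):
    """Hypothetical helper: return (expected, actual) descriptions, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    target, rc = int(m.group(1)), int(m.group(2))
    return (OCF_CODES.get(target, "code %d" % target),
            OCF_CODES.get(rc, "code %d" % rc))


# The warning from the log, with the mail line wrapping removed.
line = ("warning: status_from_rc: Action 3 (test_Dummy_monitor_10000) "
        "on cn1 failed (target: 8 vs. rc: 0): Error")
expected, actual = explain_status(line)
print("expected:", expected)
print("actual:  ", actual)
```

In other words, the monitor reported a healthy Slave, while the transition
expected a running Master.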

Oops! Why does pacemaker expect the resource to be Master on cn1? It had
been stopped, then started, and was not promoted yet. Only after recovery
from the above "error" does it get promoted:

Oct  2 13:23:41 cn1 pengine[9446]:   notice: LogActions: Promote test_Dummy:0  (Slave -> Master cn1)

The configuration is:

primitive pcm_Dummy ocf:pacemaker:Dummy
primitive test_Dummy ocf:test:Dummy \
        op monitor interval="10" role="Master" \
        op monitor interval="11" \
        op start interval="0" timeout="30" \
        op stop interval="0" timeout="120" \
        op promote interval="0" timeout="20" \
        op demote interval="0" timeout="20"
ms ms_Dummy test_Dummy \
        meta target-role="Master"
clone cln_Dummy pcm_Dummy
order ms_Dummy-after-cln_Dummy 2000: cln_Dummy ms_Dummy
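
To relate the failed action to this configuration: the operation key
test_Dummy_monitor_10000 ends in the interval in milliseconds, and
interval="10" is the monitor declared with role="Master". A small Python
sketch of that lookup follows, with the two monitor ops copied in by hand.
The names MONITORS and role_for_op_key are hypothetical, and treating the op
without an explicit role as the Slave monitor is an assumption, although it is
consistent with test_Dummy:1_monitor_11000 running on cn2 in the log.

```python
# The two recurring monitors of test_Dummy: interval in seconds -> role.
# Assumption: the op with no role attribute is the Slave monitor.
MONITORS = {10: "Master", 11: "Slave"}


def role_for_op_key(op_key):
    """Hypothetical helper: map e.g. 'test_Dummy_monitor_10000' to a role."""
    name, _, interval_ms = op_key.rpartition("_")
    if not name.endswith("_monitor"):
        raise ValueError("not a monitor operation: %r" % op_key)
    # Keys carry the interval in milliseconds; the config uses seconds.
    return MONITORS.get(int(interval_ms) // 1000)


print(role_for_op_key("test_Dummy_monitor_10000"))    # Master
print(role_for_op_key("test_Dummy:1_monitor_11000"))  # Slave
```

So action 3 appears to be the Master-role monitor, initiated on cn1 right
after start, which would explain why it was checked against target 8.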



