[Pacemaker] When pacemaker expects resource to go directly to Master after start?

Andrew Beekhof andrew at beekhof.net
Sun Oct 5 22:08:22 EDT 2014


On 2 Oct 2014, at 8:02 pm, Andrei Borzenkov <arvidjaar at gmail.com> wrote:

> According to the documentation (Pacemaker 1.1.x explained), "when
> [Master/Slave] the resource is started, it must come up in the
> mode called Slave". But what I observe here is that in some cases
> pacemaker treats the Slave state as an error. For example (pacemaker 1.1.9):
> 
> Oct  2 13:23:34 cn1 pengine[9446]:   notice: unpack_rsc_op: Operation
> monitor found resource test_Dummy:0 active in master mode on cn1
> 
> So the resource is currently Master on node cn1. The second node boots
> and starts pacemaker, which now decides to restart the resource on the
> first node (I know why it happens, so it is not relevant to this
> question :) )
> 
> Oct  2 13:23:34 cn1 pengine[9446]:   notice: LogActions: Restart
> test_Dummy:0  (Master cn1)
> Oct  2 13:23:34 cn1 pengine[9446]:   notice: LogActions: Start
> test_Dummy:1  (cn2)
> Oct  2 13:23:34 cn1 crmd[9447]:   notice: te_rsc_command: Initiating
> action 31: monitor test_Dummy:1_monitor_0 on cn2
> Oct  2 13:23:34 cn1 crmd[9447]:   notice: te_rsc_command: Initiating
> action 84: demote test_Dummy_demote_0 on cn1 (local)
> Oct  2 13:23:34 cn1 crmd[9447]:   notice: process_lrm_event: LRM
> operation test_Dummy_demote_0 (call=1227, rc=0, cib-update=7826,
> confirmed=true) ok
> Oct  2 13:23:34 cn1 crmd[9447]:   notice: te_rsc_command: Initiating
> action 85: stop test_Dummy_stop_0 on cn1 (local)
> Oct  2 13:23:34 cn1 crmd[9447]:   notice: process_lrm_event: LRM
> operation test_Dummy_stop_0 (call=1234, rc=0, cib-update=7827,
> confirmed=true) ok
> 
> As expected, it calls demote first and stop next. At this point the
> resource is stopped.
> 
> Oct  2 13:23:35 cn1 crmd[9447]:   notice: te_rsc_command: Initiating
> action 83: start test_Dummy_start_0 on cn1 (local)
> Oct  2 13:23:35 cn1 crmd[9447]:   notice: te_rsc_command: Initiating
> action 87: start test_Dummy:1_start_0 on cn2
> Oct  2 13:23:35 cn1 crmd[9447]:   notice: process_lrm_event: LRM
> operation test_Dummy_start_0 (call=1244, rc=0, cib-update=7830,
> confirmed=true) ok
> 
> The resource is started again. In full conformance with the requirement
> above, it is now Slave.
> 
> Oct  2 13:23:35 cn1 crmd[9447]:   notice: te_rsc_command: Initiating
> action 88: monitor test_Dummy:1_monitor_11000 on cn2
> Oct  2 13:23:35 cn1 crmd[9447]:   notice: te_rsc_command: Initiating
> action 3: monitor test_Dummy_monitor_10000 on cn1 (local)
> Oct  2 13:23:35 cn1 crmd[9447]:   notice: process_lrm_event: LRM
> operation test_Dummy_monitor_10000 (call=1247, rc=0, cib-update=7831,
> confirmed=false) ok
> Oct  2 13:23:35 cn1 crmd[9447]:  warning: status_from_rc: Action 3
> (test_Dummy_monitor_10000) on cn1 failed (target: 8 vs. rc: 0): Error
> 
> Oops! Why does pacemaker expect the resource to be Master on cn1? It had
> been stopped, then started, and it was not promoted yet.

True. More than likely a bug that has been fixed since 1.1.9.
If you send through a crm_report, I can verify whether the current code would have done the right thing.
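
For reference, the "target: 8 vs. rc: 0" in the status_from_rc warning compares OCF return codes: 8 is OCF_RUNNING_MASTER and 0 is OCF_SUCCESS, which for a master/slave resource means "running, but not master". Below is a minimal Python sketch that decodes such a line. The regular expression and the helper name decode_rc_mismatch are illustrative assumptions, not part of pacemaker, and the pattern is only known to fit the 1.1.9 message quoted above.

```python
import re

# Standard OCF return codes that matter for master/slave monitors.
OCF_RC = {
    0: "OCF_SUCCESS (running; Slave for a master/slave resource)",
    7: "OCF_NOT_RUNNING (cleanly stopped)",
    8: "OCF_RUNNING_MASTER (running in Master mode)",
    9: "OCF_FAILED_MASTER (in Master mode, but failed)",
}

# Assumed pattern, modelled on the status_from_rc warning in the quoted log.
PATTERN = re.compile(
    r"Action \d+ \((?P<op>\S+)\) on (?P<node>\S+) failed "
    r"\(target: (?P<target>\d+) vs\. rc: (?P<rc>\d+)\)"
)

def decode_rc_mismatch(line):
    """Return what the cluster expected and what the agent reported, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    target, rc = int(m["target"]), int(m["rc"])
    return {
        "op": m["op"],
        "node": m["node"],
        "expected": OCF_RC.get(target, f"rc {target}"),
        "actual": OCF_RC.get(rc, f"rc {rc}"),
    }

# The wrapped log line from the report, joined back into one line.
line = ("warning: status_from_rc: Action 3 (test_Dummy_monitor_10000) "
        "on cn1 failed (target: 8 vs. rc: 0): Error")
print(decode_rc_mismatch(line))
```

Read this way, the warning says the Master-role monitor ran against an instance that was (correctly) still Slave, which is consistent with the monitor having been scheduled before the promote.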

> Only after recovery
> from above "error" does it get promoted:
> 
> Oct  2 13:23:41 cn1 pengine[9446]:   notice: LogActions: Promote
> test_Dummy:0  (Slave -> Master cn1)
> 
> primitive pcm_Dummy ocf:pacemaker:Dummy
> primitive test_Dummy ocf:test:Dummy \
>        op monitor interval="10" role="Master" \
>        op monitor interval="11" \
>        op start interval="0" timeout="30" \
>        op stop interval="0" timeout="120" \
>        op promote interval="0" timeout="20" \
>        op demote interval="0" timeout="20"
> ms ms_Dummy test_Dummy \
>        meta target-role="Master"
> clone cln_Dummy pcm_Dummy
> order ms_Dummy-after-cln_Dummy 2000: cln_Dummy ms_Dummy
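
To tie the log back to this configuration: operation keys in the log have the form <resource>_<action>_<interval in ms>, so test_Dummy_monitor_10000 is the interval="10" monitor, i.e. the one declared with role="Master". A small Python sketch of that lookup follows; the MONITORS table is copied by hand from the two "op monitor" lines above, and role_for_op_key is a hypothetical helper, not a pacemaker API.

```python
# Recurring monitors from the quoted configuration:
# (interval in seconds, role). None means no role was given on that op.
MONITORS = [(10, "Master"), (11, None)]

def role_for_op_key(op_key, monitors=MONITORS):
    """Map an op key such as 'test_Dummy_monitor_10000' to its configured role."""
    prefix, _, interval_ms = op_key.rpartition("_")
    if not prefix.endswith("_monitor"):
        raise ValueError(f"not a monitor operation: {op_key}")
    for seconds, role in monitors:
        if seconds * 1000 == int(interval_ms):
            return role
    raise KeyError(op_key)

print(role_for_op_key("test_Dummy_monitor_10000"))    # Master-role monitor on cn1
print(role_for_op_key("test_Dummy:1_monitor_11000"))  # role-less monitor on cn2
```

So the operation that failed on cn1 is the one that is only meant to run once the instance has been promoted.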
