[ClusterLabs] Pacemaker reload Master/Slave resource
Ken Gaillot
kgaillot at redhat.com
Mon Jun 6 21:30:36 UTC 2016
On 05/20/2016 06:20 AM, Felix Zachlod (Lists) wrote:
> version 1.1.13-10.el7_2.2-44eb2dd
>
> Hello!
>
> I am currently developing a master/slave resource agent. So far it is working just fine, but the agent implements reload(), and reload does not work as expected when the resource is running as Master:
> The reload action is invoked and succeeds, returning 0. The resource is still Master, and monitor returns $OCF_RUNNING_MASTER.
>
> But Pacemaker considers the instance to be a slave afterwards. Only reload is actually invoked -- no monitor, no demote, etc.
>
> I first thought that reload should perhaps return $OCF_RUNNING_MASTER too, but that leads to the resource failing on reload. It seems 0 is the only valid return code.
>
> I can recover the cluster state by running "resource $resourcename promote", which will call
>
> notify
> promote
> notify
>
> Afterwards my resource is considered Master again. Alternatively, once the PEngine Recheck Timer (I_PE_CALC) pops (900000ms), the cluster manager will promote the resource itself.
> But this can lead to unexpected results: it could promote the resource on the wrong node, so that both sides are actually running as master, and the cluster will not even notice, since it does not call monitor either.
>
> Is this a bug?
>
> regards, Felix
I think it depends on your point of view :)
Reload is implemented as an alternative to stop-then-start. For m/s
clones, start leaves the resource in slave state.
So on the one hand, it makes sense that Pacemaker would expect a m/s
reload to end up in slave state, regardless of the initial state, since
it should be equivalent to stop-then-start.
On the other hand, you could argue that a reload for a master should
logically be an alternative to demote-stop-start-promote.
On the third hand ;) you could argue that reload is ambiguous for master
resources and thus shouldn't be supported at all.
Feel free to open a feature request at http://bugs.clusterlabs.org/ to
say how you think it should work.
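In the meantime, to make the current mechanics concrete, here is a rough
skeleton of a hypothetical m/s agent ("myapp" -- the name and everything
else in it are invented) that advertises reload; as you found, the reload
action has to return plain $OCF_SUCCESS even while the instance is
currently running as master:

#!/bin/sh
# Skeleton of a hypothetical master/slave agent that advertises reload.
# Only the reload-relevant parts are shown.
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs

meta_data() {
    cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="myapp">
  <version>1.0</version>
  <parameters>
    <!-- parameters omitted -->
  </parameters>
  <actions>
    <action name="start"     timeout="60"/>
    <action name="stop"      timeout="60"/>
    <action name="promote"   timeout="60"/>
    <action name="demote"    timeout="60"/>
    <action name="monitor"   timeout="30" interval="10" role="Slave"/>
    <action name="monitor"   timeout="30" interval="5"  role="Master"/>
    <action name="reload"    timeout="60"/>
    <action name="meta-data" timeout="5"/>
  </actions>
</resource-agent>
END
}

myapp_reload() {
    # Re-apply any changed (non-unique) instance parameters here.
    # Even if this instance is currently master, return plain success;
    # returning $OCF_RUNNING_MASTER from reload is treated as a failure.
    return $OCF_SUCCESS
}

ACTION=${__OCF_ACTION:-$1}
case $ACTION in
    meta-data) meta_data; exit $OCF_SUCCESS ;;
    reload)    myapp_reload; exit $? ;;
    # start/stop/promote/demote/monitor/notify omitted
esac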
As an aside, I think the current implementation of reload in pacemaker
is unsatisfactory for two reasons:
* Using the "unique" attribute to determine whether a parameter is
reloadable was a bad idea. For example, the location of a daemon binary
is generally set to unique=0, which is sensible in that multiple RA
instances can use the same binary, but a reload could not actually
apply a change to that parameter. It is only not a problem because no
one ever changes it in practice. (See the first sketch after this
list.)
* There is a fundamental misunderstanding between pacemaker and most RA
developers as to what reload means. Pacemaker uses the reload action to
make parameter changes in the resource's *pacemaker* configuration take
effect, but RA developers tend to use it to reload the service's own
configuration files (a more natural interpretation, but completely
different from how pacemaker uses it). (See the second sketch after
this list.)
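On the first point, an invented example of what I mean: an agent
typically declares the daemon binary location in its metadata like the
fragment below, and under the current scheme (for an agent that
advertises reload) unique="0" is all it takes for pacemaker to treat
the parameter as reloadable, even though the agent's reload cannot
actually apply a changed binary path:

    <parameter name="binary" unique="0" required="0">
      <longdesc lang="en">Location of the daemon binary</longdesc>
      <shortdesc lang="en">Daemon binary path</shortdesc>
      <content type="string" default="/usr/sbin/mydaemon"/>
    </parameter>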
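And on the second point, roughly the two interpretations side by side
(daemon name, pid file path and the apply_tuning helper are invented):

# What pacemaker means by reload: pick up changed *pacemaker* resource
# parameters (OCF_RESKEY_*) without a full stop/start.
myapp_reload_pacemaker_params() {
    apply_tuning "${OCF_RESKEY_some_tunable}"   # hypothetical helper
    return $OCF_SUCCESS
}

# What many RA authors implement instead: tell the service to re-read
# its *own* configuration file, e.g. via SIGHUP.
myapp_reload_service_config() {
    kill -HUP "$(cat /var/run/mydaemon.pid)"
    return $OCF_SUCCESS
}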
> trace May 20 12:58:31 cib_create_op(609):0: Sending call options: 00100000, 1048576
> trace May 20 12:58:31 cib_native_perform_op_delegate(384):0: Sending cib_modify message to CIB service (timeout=120s)
> trace May 20 12:58:31 crm_ipc_send(1175):0: Sending from client: cib_shm request id: 745 bytes: 1070 timeout:120000 msg...
> trace May 20 12:58:31 crm_ipc_send(1188):0: Message sent, not waiting for reply to 745 from cib_shm to 1070 bytes...
> trace May 20 12:58:31 cib_native_perform_op_delegate(395):0: Reply: No data to dump as XML
> trace May 20 12:58:31 cib_native_perform_op_delegate(398):0: Async call, returning 268
> trace May 20 12:58:31 do_update_resource(2274):0: Sent resource state update message: 268 for reload=0 on scst_dg_ssd
> trace May 20 12:58:31 cib_client_register_callback_full(606):0: Adding callback cib_rsc_callback for call 268
> trace May 20 12:58:31 process_lrm_event(2374):0: Op scst_dg_ssd_reload_0 (call=449, stop-id=scst_dg_ssd:449, remaining=3): Confirmed
> notice May 20 12:58:31 process_lrm_event(2392):0: Operation scst_dg_ssd_reload_0: ok (node=alpha, call=449, rc=0, cib-update=268, confirmed=true)
> debug May 20 12:58:31 update_history_cache(196):0: Updating history for 'scst_dg_ssd' with reload op
> trace May 20 12:58:31 crm_ipc_read(992):0: No message from lrmd received: Resource temporarily unavailable
> trace May 20 12:58:31 mainloop_gio_callback(654):0: Message acquisition from lrmd[0x22b0ec0] failed: No message of desired type (-42)
> trace May 20 12:58:31 crm_fsa_trigger(293):0: Invoked (queue len: 0)
> trace May 20 12:58:31 s_crmd_fsa(159):0: FSA invoked with Cause: C_FSA_INTERNAL State: S_NOT_DC
> trace May 20 12:58:31 s_crmd_fsa(246):0: Exiting the FSA
> trace May 20 12:58:31 crm_fsa_trigger(295):0: Exited (queue len: 0)
> trace May 20 12:58:31 crm_ipc_read(989):0: Received cib_shm event 2108, size=183, rc=183, text: <cib-reply t="cib" cib_op="cib_modify" cib_callid="268" cib_clientid="60010689-7350-4916-a7bd-bd85ff
> trace May 20 12:58:31 mainloop_gio_callback(659):0: New message from cib_shm[0x23b7ab0] = 143
> trace May 20 12:58:31 cib_native_dispatch_internal(100):0: dispatching 0x22b2370
> trace May 20 12:58:31 cib_native_dispatch_internal(116):0: Activating cib callbacks...
> trace May 20 12:58:31 cib_native_callback(649):0: Invoking callback cib_rsc_callback for call 268
> trace May 20 12:58:31 cib_rsc_callback(2113):0: Resource update 268 complete: rc=0
> trace May 20 12:58:31 cib_rsc_callback(2121):0: Triggering FSA: cib_rsc_callback
> trace May 20 12:58:31 cib_native_callback(666):0: OP callback activated for 268
> trace May 20 12:58:31 crm_ipc_read(992):0: No message from cib_shm received: Resource temporarily unavailable
> trace May 20 12:58:31 mainloop_gio_callback(654):0: Message acquisition from cib_shm[0x23b7ab0] failed: No message of desired type (-42)
> trace May 20 12:58:31 crm_fsa_trigger(293):0: Invoked (queue len: 0)
> trace May 20 12:58:31 s_crmd_fsa(159):0: FSA invoked with Cause: C_FSA_INTERNAL State: S_NOT_DC
> trace May 20 12:58:31 s_crmd_fsa(246):0: Exiting the FSA
> trace May 20 12:58:31 crm_fsa_trigger(295):0: Exited (queue len: 0)
> notice May 20 12:58:43 crm_signal_dispatch(272):0: Invoking handler for signal 5: Trace/breakpoint trap
> notice May 20 12:58:43 crm_write_blackbox(431):0: Blackbox dump requested, please see /var/lib/pacemaker/blackbox/crmd-2877.2 for contents
>
> --
> Kind regards
> Dipl. Inf. (FH) Felix Zachlod
>
> Onesty Tech GmbH
> Lieberoser Str. 7
> 03046 Cottbus
>
> Tel.: +49 (355) 289430
> Fax.: +49 (355) 28943100
> fz at onesty-tech.de
>
> Registered at Amtsgericht Cottbus (district court), HRB 7885; Managing Directors: Romy Schötz, Thomas Menzel