[Pacemaker] unmanaged resource stopped the group

Thu May 23 23:29:53 EDT 2013

On 23/05/2013, at 8:52 PM, Alexandr A. Alexandrov <shurrman at gmail.com> wrote:

> Hi, All!
> 
> On one of my clusters I have resource groups; the second group depends on the first resource in the first group. Today I needed to restart one service from the first group (it has no dependencies other than the group itself), so I made it unmanaged:
> 
> May 23 14:14:22 kennedy cib[20888]:   notice: cib:diff: --             <nvpair value="true" id="wcs_wcsd-meta_attributes-is-managed" />
> May 23 14:14:22 kennedy cib[20888]:   notice: cib:diff: ++             <nvpair id="wcs_wcsd-meta_attributes-is-managed" name="is-managed" value="false" />
> 
> I made sure that the resource was shown as "unmanaged" in crm_mon. After that I stopped the resource.
> However, the monitor operation then ran, the resource was marked as failed, and both groups were stopped! The second group was stopped because of the dependency, but why was the first group stopped because of the failure of an unmanaged resource in the first place?

Did you set is-managed=false for the group or for a resource in the group?
I'm assuming the latter. Basically, the cluster noticed that your resource was no longer running.
While it did not try to do anything to fix that resource, it did stop anything that needed it.
Then, when the resource came back, it was able to start the dependencies again.

A better approach would have been to disable the recurring monitor; then the cluster wouldn't have noticed that the resource was restarted.
Well, unless the dependencies noticed that something they needed wasn't there and failed themselves.
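
As a rough sketch of what disabling the monitor might look like in the CIB: the op id below is hypothetical (check your own configuration for the real one), and the 15s interval is only inferred from wcs_wcsd_monitor_15000 in the logs. The enabled attribute on an operation is the piece that matters here.

```xml
<!-- Hypothetical op id; interval inferred from wcs_wcsd_monitor_15000 in the log -->
<op id="wcs_wcsd-monitor-15s" name="monitor" interval="15s" enabled="false"/>
```

Once the service has been restarted, setting enabled back to "true" (or removing the attribute) should let the cluster resume monitoring.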

> 
> May 23 14:16:32 kennedy crmd[1787]:   notice: process_lrm_event: LRM operation wcs_wcsd_monitor_15000 (call=832, rc=7, cib-update=668, confirmed=false) not running
> May 23 14:16:32 kennedy crmd[1787]:  warning: update_failcount: Updating failcount for wcs_wcsd on kennedy after failed monitor: rc=7 (update=value++, time=1369304192)
> May 23 14:16:32 kennedy crmd[1787]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> May 23 14:16:32 kennedy attrd[20891]:   notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-wcs_wcsd (1)
> May 23 14:16:32 kennedy pengine[1783]:   notice: unpack_config: On loss of CCM Quorum: Ignore
> May 23 14:16:32 kennedy pengine[1783]:  warning: unpack_rsc_op: Processing failed op monitor for wcs_wcsd on kennedy: not running (7)
> May 23 14:16:32 kennedy attrd[20891]:   notice: attrd_perform_update: Sent update 230: fail-count-wcs_wcsd=1
> 
> Is this a bug, or expected behaviour, or did I miss something in the documentation?
> 
> Thanks in advance,
> Alexandr