[Pacemaker] making resource managed

Wed Nov 10 03:03:16 EST 2010

On Tue, Nov 9, 2010 at 2:14 PM, Vadim S. Khondar <v.khondar at o3.ua> wrote:
> On Tue, 2010-11-09 at 09:49 +0100, Andrew Beekhof wrote:
>> being unmanaged is a side-effect of a) the resource failing to stop
>> and b) no fencing being configured
>> once you've fixed the error, run crm resource cleanup as misch suggested
>>
>
> I understand that.
> However, for example, in a situation where the VPS fails to start (not to stop)

It's failing to stop too:

ca_stop_0 (node=ha-3, call=49, rc=1, status=complete): unknown error
   ^^^^^^^^
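
For anyone scanning status output for this, here is a minimal Python sketch
that pulls failed stop operations out of the "Failed actions" section. The
regex and the names FAILED_OP, OCF_RC and failed_stops are my own, for
illustration only; they are not part of any Pacemaker tool. The sample is the
status text quoted in this thread with the wrapped lines rejoined.

```python
import re

# OCF return codes that appear in this thread (plus success and "not running").
OCF_RC = {0: "ok", 1: "unknown error", 5: "not installed", 7: "not running"}

# Matches lines such as:
#   ca_stop_0 (node=ha-3, call=49, rc=1, status=complete): unknown error
FAILED_OP = re.compile(
    r"(?P<rsc>\S+)_(?P<op>[a-z]+)_(?P<interval>\d+) "
    r"\(node=(?P<node>[^,]+), call=(?P<call>\d+), "
    r"rc=(?P<rc>\d+), status=(?P<status>\w+)\)"
)

def failed_stops(status_text):
    """Return (resource, node, rc) for every failed stop operation found."""
    hits = []
    for m in FAILED_OP.finditer(status_text):
        if m.group("op") == "stop" and int(m.group("rc")) != 0:
            hits.append((m.group("rsc"), m.group("node"), int(m.group("rc"))))
    return hits

sample = """Failed actions:
    ca_start_0 (node=ha-3, call=48, rc=5, status=complete): not installed
    ca_stop_0 (node=ha-3, call=49, rc=1, status=complete): unknown error
"""

for rsc, node, rc in failed_stops(sample):
    print(f"{rsc} failed to stop on {node}: rc={rc} ({OCF_RC.get(rc, '?')})")
```

A failed stop with no fencing configured is the combination that leaves the
resource unmanaged, so that is the entry to look for first.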

> because of a missing configuration file and becomes unmanaged as a result,
> I run:
>
> crm(live)# status
> ============
> Last updated: Tue Nov  9 14:53:09 2010
> Stack: Heartbeat
> Current DC: ha-3 (a1ad8f56-7eb0-4aec-8d32-83e283903879) - partition with
> quorum
> Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
> 2 Nodes configured, unknown expected votes
> 2 Resources configured.
> ============
>
> Online: [ ha-3 ha-4 ]
>
>  test_ManageVE (ocf::heartbeat:ManageVE): Started ha-3
>  ca (ocf::heartbeat:ManageVE): Started ha-3 (unmanaged) FAILED
>
> Failed actions:
>    ca_start_0 (node=ha-3, call=48, rc=5, status=complete): not
> installed
>    ca_stop_0 (node=ha-3, call=49, rc=1, status=complete): unknown error
>
> After fixing the issue (and checking that the VPS really can be started via
> the shell):
>
> crm(live)# resource cleanup ca
> Cleaning up ca on ha-3
> Cleaning up ca on ha-4
>
>
> Got the following in /var/log/messages on current DC ha-3:
>
> Nov  9 14:58:19 ha-3 crmd: [8434]: notice: do_lrm_invoke: Not creating
> resource for a delete event: (null)
> Nov  9 14:58:19 ha-3 crmd: [8434]: info: send_direct_ack: ACK'ing
> resource op ca_delete_60000 from 0:0:crm-resource-17296:
> lrm_invoke-lrmd-1289307499-777
> Nov  9 14:58:20 ha-3 attrd: [8433]: info: attrd_ha_callback: Update
> relayed from ha-4
> Nov  9 14:58:25 ha-3 lrmd: [8431]: info: Resource Agent output: []
> Nov  9 14:58:25 ha-3 lrmd: [8431]: notice: read's ret: 0 when lrmd_op
> finished
>
> crm(live)# resource manage ca
> Log:
> Nov  9 15:00:48 ha-3 cib: [8430]: info: cib_process_request: Operation
> complete: op cib_replace for section resources (origin=ha-4/cibadmin/2,
> version=0.92.2): ok (rc=0)
>
> And after this still:
> Online: [ ha-3 ha-4 ]
>
>  test_ManageVE (ocf::heartbeat:ManageVE): Started ha-3
>  ca (ocf::heartbeat:ManageVE): Started ha-3 (unmanaged) FAILED
>
> Failed actions:
>    ca_start_0 (node=ha-3, call=48, rc=5, status=complete): not
> installed
>    ca_stop_0 (node=ha-3, call=49, rc=1, status=complete): unknown error
>
>
> If I then edit the CIB and apply it, all the LRM messages disappear and the
> resource starts as managed, as it should.
> It seems that cleanup does not clear all of the status information.
>
> What am I missing?

Possibly an ordering constraint.  Otherwise, no idea.
Depends on how your resource agent works.
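
On the resource agent side, the usual OCF expectation is that a stop on a
resource that is already down reports success, so that a failed start does not
turn into a failed stop as well. A rough sketch of that logic follows. It is
in Python purely for illustration (agents are normally shell scripts), the
is_running and do_stop callables are hypothetical placeholders, and I have not
checked what ManageVE actually does here.

```python
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1

def stop(is_running, do_stop):
    """Stop action sketch: report success if the resource is already down."""
    if not is_running():
        # Nothing to stop, e.g. the VPS never came up because its
        # configuration file was missing.
        return OCF_SUCCESS
    do_stop()
    # Only report an error if the resource is still up after the attempt.
    return OCF_SUCCESS if not is_running() else OCF_ERR_GENERIC

# Usage with a fake resource that was never started:
state = {"up": False}
print(stop(lambda: state["up"], lambda: state.update(up=False)))
```

An agent that instead returns an error from stop when the configuration is
missing would produce the pair of failed actions shown in the status output.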