[Pacemaker] crm resource cleanup ignored

Bernd Schubert bernd.schubert at fastmail.fm
Fri Jul 2 08:56:04 EDT 2010


Hello all,

after the update to 1.0.9 on our test cluster, new and weird stonith issues
have come up.

1) It fails to start stonith resources on *some* nodes
=======================================================

Jul 02 14:43:23 phys-oss3 pengine: [18077]: WARN: unpack_rsc_op: Processing failed op st-riloe-phys-oss1_start_0 on phys-oss3: unknown error (1)

Failed actions:
    st-riloe-phys-oss1_start_0 (node=phys-oss3, call=25, rc=1, status=complete): unknown error
    st-riloe-phys-oss2_start_0 (node=phys-oss0, call=25, rc=1, status=complete): unknown error


On other nodes it starts properly:

Node phys-oss0 (d8b9b1c6-fdf4-40f1-be3d-9158237ad4cb): online
        st-riloe-phys-oss1      (stonith:external/riloe) Started


2) When I try to clean it up, it does not work:
===============================================

root@rhel5-nfs@phys-oss3:~# date
Fri Jul  2 14:50:15 CEST 2010


root@rhel5-nfs@phys-oss3:~# crm resource cleanup st-riloe-phys-oss1 phys-oss3
Cleaning up st-riloe-phys-oss1 on phys-oss3

crm_mon:

Failed actions:
    st-riloe-phys-oss1_start_0 (node=phys-oss3, call=25, rc=1, status=complete): unknown error
    st-riloe-phys-oss2_start_0 (node=phys-oss0, call=25, rc=1, status=complete): unknown error
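
To check whether a cleanup actually removed an entry, the "Failed actions" lines can be compared before and after the command. The following is only a minimal sketch: it assumes the line layout shown in the crm_mon output above, and the names parse_failed_actions and still_failed are hypothetical helpers, not part of Pacemaker.

```python
import re

# Pattern for one "Failed actions" entry, based only on the crm_mon lines
# quoted above (assumption: other Pacemaker versions may print it differently).
FAILED_OP = re.compile(
    r"^\s*(?P<op>\S+) \(node=(?P<node>[^,]+), call=(?P<call>\d+), "
    r"rc=(?P<rc>\d+), status=(?P<status>[^)]+)\): (?P<reason>.*)$"
)


def parse_failed_actions(text):
    """Return one dict per failed-action line found in text."""
    entries = []
    for line in text.splitlines():
        match = FAILED_OP.match(line)
        if match:
            entries.append(match.groupdict())
    return entries


def still_failed(text, op, node):
    """True if the given operation is still listed as failed on node."""
    return any(
        entry["op"] == op and entry["node"] == node
        for entry in parse_failed_actions(text)
    )


# Sample input copied from the crm_mon output in this message.
SAMPLE = """Failed actions:
    st-riloe-phys-oss1_start_0 (node=phys-oss3, call=25, rc=1, status=complete): unknown error
    st-riloe-phys-oss2_start_0 (node=phys-oss0, call=25, rc=1, status=complete): unknown error
"""

if __name__ == "__main__":
    # The entry is still present after the cleanup, so this prints True.
    print(still_failed(SAMPLE, "st-riloe-phys-oss1_start_0", "phys-oss3"))
```

Fed with the crm_mon text captured after the cleanup, still_failed returning True for st-riloe-phys-oss1_start_0 on phys-oss3 is exactly the symptom described here.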


root@rhel5-nfs@phys-oss3:~# tail /var/log/ha-log
Jul 02 14:48:40 phys-oss3 crmd: [18056]: info: ais_status_callback: status: phys-oss2 is now lost (was member)
Jul 02 14:48:40 phys-oss3 crmd: [18056]: info: crm_update_peer: Node phys-oss2: id=4 state=lost (new) addr=(null) votes=-1 born=5 seen=6 proc=00000000000000000000000000000200
Jul 02 14:48:40 phys-oss3 crmd: [18056]: info: erase_node_from_join: Removed node phys-oss1 from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=1
Jul 02 14:48:40 phys-oss3 crmd: [18056]: info: erase_node_from_join: Removed node phys-oss2 from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=1
Jul 02 14:48:40 phys-oss3 crmd: [18056]: info: populate_cib_nodes_ha: Requesting the list of configured nodes
Jul 02 14:48:40 phys-oss3 cib: [18052]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/133, version=0.735.1): ok (rc=0)
Jul 02 14:50:23 phys-oss3 crmd: [18056]: notice: do_lrm_invoke: Not creating resource for a delete event: (null)
Jul 02 14:50:23 phys-oss3 crmd: [18056]: info: send_direct_ack: ACK'ing resource op st-riloe-phys-oss1_delete_60000 from 0:0:crm-resource-21728: lrm_invoke-lrmd-1278075023-300
Jul 02 14:51:14 phys-oss3 crmd: [18056]: notice: do_lrm_invoke: Not creating resource for a delete event: (null)
Jul 02 14:51:14 phys-oss3 crmd: [18056]: info: send_direct_ack: ACK'ing resource op st-riloe-phys-oss1_delete_60000 from 0:0:crm-resource-21797: lrm_invoke-lrmd-1278075074-302
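
The numeric field in the lrm_invoke ids looks like a Unix timestamp. That is an assumption, not something taken from Pacemaker documentation, but decoding it gives the same times as the log lines, which suggests both cleanup requests did reach crmd when the command was run. A small sketch (call_id_time is a hypothetical helper):

```python
from datetime import datetime, timedelta, timezone

# CEST is UTC+2; a fixed offset avoids depending on the local time zone setup.
CEST = timezone(timedelta(hours=2), "CEST")


def call_id_time(call_id):
    """Decode the second-to-last dash-separated field as epoch seconds.

    Assumption: that field of the lrm_invoke id is a Unix timestamp.
    """
    epoch = int(call_id.split("-")[-2])
    return datetime.fromtimestamp(epoch, tz=CEST)


if __name__ == "__main__":
    # Both ids are copied from the ha-log excerpt in this message.
    for call_id in ("lrm_invoke-lrmd-1278075023-300",
                    "lrm_invoke-lrmd-1278075074-302"):
        print(call_id, "->", call_id_time(call_id).strftime("%H:%M:%S"))
```

The two ids decode to 14:50:23 and 14:51:14 CEST, matching the timestamps of the two send_direct_ack lines.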



Any ideas?

Thanks,
Bernd



