[Pacemaker] cluster got stuck on stopping resources

Andreas Kurz andreas.kurz at linbit.com
Mon Jun 7 06:13:41 EDT 2010


Hi all,

I observed strange behaviour when trying to stop two resources with the latest
Pacemaker.

I updated two resources (ping) and changed some constraints. One of the
changed resources shows up in the logs with "strange" lrmd messages:

...
Jun 07 10:16:58 emahqwienfw1b crmd: [31354]: ERROR: do_lrm_rsc_op: Operation monitor on res_ping_ABC failed: -1
Jun 07 10:16:58 emahqwienfw1b lrmd: [31351]: notice: on_msg_perform_op: resource res_ping_ABC is frozen, no ops can run.
Jun 07 10:16:58 emahqwienfw1b lrmd: [31351]: debug: RA output [dummy status to fool heartbeat
] didn't match any pattern
Jun 07 10:16:58 emahqwienfw1b crmd: [31354]: WARN: do_log: FSA: Input I_FAIL from do_lrm_rsc_op() received in state S_TRANSITION_ENGINE
Jun 07 10:16:58 emahqwienfw1b crmd: [31354]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_FAIL cause=C_FSA_INTERNAL origin=do_lrm_rsc_op ]
....

Then I tried to stop two other resources (part of a group), and nothing
happened. One of these resources is a dependency of res_ping_ABC, the resource
that lrmd reports as "frozen".

Running ptest -L shows that pengine knows what to do (stop the two resources 
and all dependencies).
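
For anyone going through the full logs, here is a minimal sketch that lists the resources lrmd reports as frozen. The regular expression is based only on the one lrmd notice quoted above, so the message format is an assumption; the function name frozen_resources is hypothetical, and the input is assumed to be unwrapped log lines (one message per line).

```python
import re

# Pattern derived from the single lrmd notice in the excerpt (assumed format).
FROZEN_RE = re.compile(
    r"lrmd: \[\d+\]: notice: on_msg_perform_op:\s+resource (\S+) is frozen"
)


def frozen_resources(lines):
    """Return resource names reported as frozen, in order of first appearance."""
    seen = []
    for line in lines:
        match = FROZEN_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Two lines taken from the excerpt, joined back into single log lines.
sample = [
    "Jun 07 10:16:58 emahqwienfw1b crmd: [31354]: ERROR: do_lrm_rsc_op: "
    "Operation monitor on res_ping_ABC failed: -1",
    "Jun 07 10:16:58 emahqwienfw1b lrmd: [31351]: notice: on_msg_perform_op: "
    "resource res_ping_ABC is frozen, no ops can run.",
]

print(frozen_resources(sample))
```

To run it against a real log, pass an open file object instead of the sample list, since the function only iterates over lines.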


Any ideas? An hb_report is attached. I have left the cluster in this state, so
if there is anything else I should provide for debugging, please tell me.

Regards,
Andreas

-------------- next part --------------
A non-text attachment was scrubbed...
Name: resource_stop-stucks.tar.bz2
Type: application/x-bzip-compressed-tar
Size: 52834 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100607/eb651c43/attachment-0002.bin>

