[Pacemaker] monitor of stonith resource times out

Kazunori INOUE inouekazu at intellilink.co.jp
Thu Aug 16 06:41:47 UTC 2012


Hi,

I reported it in Bugzilla.
* http://bugs.clusterlabs.org/show_bug.cgi?id=5098

Best Regards,
Kazunori INOUE

(12.08.15 19:08), Andrew Beekhof wrote:
> Ouch.  Yes definitely a bug.
> I seem to recall David looking into something like this recently...
> did you file a bug that he has been working on?
>
> -- Andrew
>
> On Fri, Aug 10, 2012 at 8:08 PM, Kazunori INOUE
> <inouekazu at intellilink.co.jp> wrote:
>> Hi,
>> I am using Pacemaker-1.1.
>> While a stonith resource is executing a fencing operation, the monitor of that same stonith resource times out.
>>
>> [test case]
>> 1. add a sleep to reset() of the stonith plugin.
>>     I am using external/libvirt.
>>
>>     # diff -u libvirt.ORG libvirt
>>     --- libvirt.ORG 2012-07-17 13:10:01.000000000 +0900
>>     +++ libvirt     2012-08-10 16:52:31.214366776 +0900
>>     @@ -204,6 +204,7 @@
>>          reset)
>>          # libvirt has no reset so we do a power cycle
>>     +    sleep 3600
>>          libvirt_check_config
>>          libvirt_set_domain_id $2
>>
>>
>> 2. start a stonith resource (libvirt).
>>     - stonith-timeout is 600s
>>     - monitor interval is 10s
>>     - monitor timeout is 60s
>>     * stonith-timeout > monitor interval + monitor timeout
>>
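(For reference, the resource under test was presumably configured roughly as below, a sketch in crm shell syntax. The primitive definition itself was not posted; the hostlist/hypervisor_uri values are taken from the stonith-ng logs further down, and the op timings from the list above.)

```shell
# Hypothetical reconstruction of the test configuration (crm shell).
# hostlist and hypervisor_uri values appear in the stonith-ng logs below.
crm configure primitive f-dev2 stonith:external/libvirt \
    params hostlist="dev2" hypervisor_uri="qemu+ssh://n7/system" \
    op monitor interval="10s" timeout="60s"
crm configure property stonith-timeout="600s"
```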
>>     # crm_mon
>>     :
>>     Stack: corosync
>>     Current DC: dev1 (-1788499776) - partition with quorum
>>     Version: 1.1.7-916198b
>>     2 Nodes configured, unknown expected votes
>>     1 Resources configured.
>>     ============
>>     Online: [ dev1 dev2 ]
>>
>>     Full list of resources:
>>     f-dev2  (stonith:external/libvirt):     Started dev1
>>
>>
>> 3. request fencing of dev2 and wait.
>>     # crm node fence dev2
>>
>>     While waiting, the monitor of the libvirt resource is executed, but
>>     because the agent is still in the middle of fencing (held up by the
>>     added sleep), the monitor times out.
>>
>>     # crm_mon
>>     :
>>     ============
>>     Node dev2 (-1822054208): UNCLEAN (online)
>>     Online: [ dev1 ]
>>
>>     Full list of resources:
>>     f-dev2  (stonith:external/libvirt):     Started dev1 FAILED
>>
>>     Failed actions:
>>         f-dev2_monitor_10000 (node=dev1, call=10, rc=1, status=Timed Out): unknown error
>>
>> ----
>> Aug 10 15:39:59 [6559] dev1 stonith-ng:  info: call_remote_stonith:  Requesting that dev1 perform op reboot dev2
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: stonith_command:  Processing st_fence from dev1 (               0)
>> Aug 10 15:39:59 [6559] dev1 stonith-ng:  info: can_fence_host_with_device:  f-dev2 can fence dev2: dynamic-list
>> Aug 10 15:39:59 [6559] dev1 stonith-ng:  info: stonith_fence:  Found 1 matching devices for 'dev2'
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: schedule_stonith_command:  Scheduling reboot on f-dev2 for 15fb4a4b-4d91-4b8e-9e3d-8ca9394ce7e5 (timeout=601ms)
>> Aug 10 15:39:59 [6559] dev1 stonith-ng:  info: stonith_command:  Processed st_fence from dev1: Operation now in progress (-115)
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg:  Appending: plugin=external/libvirt
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg:  Appending: hostlist=dev2
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg:  Appending: hypervisor_uri=qemu+ssh://n7/system
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg:  Appending: action=reboot
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg:  Appending: nodename=dev2
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: make_args:  Performing reboot action for node 'dev2' as 'port=dev2'
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg:  Appending: port=dev2
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: make_args:  Calculated: plugin=external/libvirt
>> hostlist=dev2
>> hypervisor_uri=qemu+ssh://n7/system
>> action=reboot
>> nodename=dev2
>> port=dev2
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: run_stonith_agent:  forking
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: run_stonith_agent:  sending args
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: run_stonith_agent:  Op: reboot on fence_legacy, pid: 6725, timeout: 601s
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: stonith_device_execute:  Operation reboot for node dev2 on f-dev2 now running with pid=6725, timeout=601ms
>> Aug 10 15:40:06 [6560] dev1       lrmd: trace: stonith_create_op:  Sending call options: 00000000, 0
>> Aug 10 15:40:06 [6560] dev1       lrmd: trace: stonith_send_command:  Sending st_execute message to STONITH service, Timeout: 64s
>> Aug 10 15:40:06 [6560] dev1       lrmd: trace: stonith_send_command:  Reply: No data to dump as XML
>> Aug 10 15:40:06 [6560] dev1       lrmd: trace: stonith_send_command:  Async call 8, returning
>> Aug 10 15:40:06 [6560] dev1       lrmd: trace: stonith_api_add_callback:  Added callback to lrmd_stonith_callback for call 8
>> Aug 10 15:40:06 [6559] dev1 stonith-ng: debug: stonith_command:  Processing st_execute from lrmd (               0)
>> Aug 10 15:40:06 [6559] dev1 stonith-ng: debug: schedule_stonith_command:  Scheduling monitor on f-dev2 for 15cf9708-0b22-4e8e-a724-7e58f1014b99 (timeout=64ms)
>> Aug 10 15:40:06 [6559] dev1 stonith-ng:  info: stonith_command:  Processed st_execute from lrmd: Operation now in progress (-115)
>> Aug 10 15:40:59 [6559] dev1 stonith-ng: debug: remote_op_query_timeout:  Operation 15fb4a4b-4d91-4b8e-9e3d-8ca9394ce7e5 for dev2 already in progress
>> Aug 10 15:42:10 [6560] dev1       lrmd: error: stonith_async_timeout_handler:  Async call 8 timed out after 124000ms
>> Aug 10 15:42:10 [6560] dev1       lrmd: trace: stonith_perform_callback:  Invoking callback lrmd_stonith_callback for call 8
>> Aug 10 15:42:10 [6560] dev1       lrmd: trace: stonith_perform_callback:  OP callback activated.
>> Aug 10 15:42:10 [6560] dev1       lrmd: debug: process_lrmd_message:  Processed lrmd_rsc_info operation from eadb7520-fdff-4231-a488-9fde0b45073c: rc=0, reply=0, notify=0, exit=0
>> Aug 10 15:42:10 [6563] dev1       crmd: error: process_lrm_event:  LRM operation f-dev2_monitor_10000 (10) Timed Out (timeout=64000ms)
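(The failure mode in the logs above can be sketched as a toy simulation in plain Python. This is not Pacemaker code: it only illustrates the assumption that stonith-ng executes operations for one device serially, so while the reboot agent sleeps, the queued monitor exceeds its async timeout even though the monitor itself would be fast. Times are scaled down; 0.6s here stands in for the 3600s sleep, 0.2s for the lrmd's 64s timeout.)

```python
import queue
import threading
import time

# Toy model (not Pacemaker code): operations for a single stonith device
# run serially, so a long reboot blocks a later monitor.
REBOOT_DURATION = 0.6   # stands in for the 3600s sleep in reset()
MONITOR_TIMEOUT = 0.2   # stands in for the lrmd's 64s async timeout

device_queue = queue.Queue()
results = {}

def device_worker():
    """Serial worker: executes one queued device operation at a time."""
    while True:
        _op, duration, done = device_queue.get()
        time.sleep(duration)   # the agent is busy for this long
        done.set()

threading.Thread(target=device_worker, daemon=True).start()

def run_op(name, duration, timeout):
    """Queue an operation and wait for it, up to the caller's timeout."""
    done = threading.Event()
    device_queue.put((name, duration, done))
    finished = done.wait(timeout)
    results[name] = "ok" if finished else "Timed Out"

# The reboot is scheduled first and occupies the device...
threading.Thread(target=run_op, args=("reboot", REBOOT_DURATION, 1.0)).start()
time.sleep(0.05)
# ...so the monitor, queued behind it, hits its own timeout first,
# even though the monitor operation itself is nearly instantaneous.
run_op("monitor", 0.01, MONITOR_TIMEOUT)

print(results["monitor"])  # -> Timed Out
```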
>>
>> Best Regards,
>> Kazunori INOUE
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>



