<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">2015-06-16 19:30 GMT+02:00 Digimer <span dir="ltr"><<a href="mailto:lists@alteeve.ca" target="_blank">lists@alteeve.ca</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 16/06/15 04:18 AM, Oscar Salvador wrote:<br>
><br>
><br>
> 2015-06-16 5:59 GMT+02:00 Andrew Beekhof <<a href="mailto:andrew@beekhof.net">andrew@beekhof.net</a><br>
</span>> <mailto:<a href="mailto:andrew@beekhof.net">andrew@beekhof.net</a>>>:<br>
<span class="">><br>
><br>
> > On 16 Jun 2015, at 12:00 am, Oscar Salvador <<a href="mailto:osalvador.vilardaga@gmail.com">osalvador.vilardaga@gmail.com</a><br>
</span><span class="">> <mailto:<a href="mailto:osalvador.vilardaga@gmail.com">osalvador.vilardaga@gmail.com</a>>> wrote:<br>
> ><br>
> > Hi,<br>
> ><br>
> > I've configured a fencing with libvirt, but I'm having some<br>
> problem with stonith, due to the error "no route to host”<br>
>>>
>>> That message is a bit wonky. What it really means is that there
>>> were no devices that advertise the ability to fence that node.
>>>
>>> In this case, Pacemaker wants to fence "server", but hostlist is
>>> set to server.fqdn. Drop the .fqdn and it should work.
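>>>
>>> For example (a sketch only; I'm assuming the stonith:external/libvirt
>>> agent is in play, and the primitive name and hypervisor_uri below
>>> are guesses), the hostlist entries have to match the names the
>>> cluster knows the nodes by, optionally mapped to the domain name:
>>>
>>> primitive fence_virt stonith:external/libvirt \
>>>   params hostlist="server01:server01.fqdn,server02:server02.fqdn" \
>>>     hypervisor_uri="qemu+ssh://virtnode01/system" \
>>>   op monitor interval=60s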
>>
>> Getting rid of the fqdn was not an option, sorry, but I was able to
>> fix it another way with Digimer's help: I used fence_virsh, from
>> fence-agents.
>>
>> At first I configured it this way:
>>
</span>> /primitive fence_server01 stonith:fence_virsh \<br>
> /<br>
> / params ipaddr=virtnode01 port=server01.fqdn action=reboot<br>
> login=root passwd=passwd delay=15 \/<br>
> / op monitor interval=60s /<br>
> /primitive fence_server02 stonith:fence_virsh \/<br>
> / params ipaddr=virtnode02 port=server02.fqdn action=reboot<br>
> login=root passwd=passwd delay=15 \/<br>
> / op monitor interval=60s /<br>
> /<br>
> /<br>
<span class="">><br>
> But when I tried to fence a node, I received this errors:<br>
><br>
</span>> 1.<br>
<span class="">> Jun 16 09:37:59 [1298] server01 pengine: warning: pe_fence_node:<br>
> Node server02 will be fenced because p_fence_server01 is thought<br>
> to be active there<br>
</span>> 2.<br>
<span class="">> Jun 16 09:37:59 [1299] server01 crmd: notice: te_fence_node:<br>
> Executing reboot fencing operation (12) on server02 (timeout=60000)<br>
</span>> 3.<br>
<span class="">> Jun 16 09:37:59 [1295] server01 stonithd: notice:<br>
> handle_request: Client crmd.1299.d339ea94 wants to fence (reboot)<br>
> 'server02' with device '(any)'<br>
</span>> 4.<br>
<span class="">> Jun 16 09:37:59 [1295] server01 stonithd: notice:<br>
> initiate_remote_stonith_op: Initiating remote operation<br>
> reboot for server02: 19fdb8e0-2611-45a7-b44d-b58fa0e99cab (0)<br>
</span>> 5.<br>
<span class="">> Jun 16 09:37:59 [1297] server01 attrd: info:<br>
> attrd_cib_callback: Update 12 for probe_complete: OK (0)<br>
</span>> 6.<br>
<span class="">> Jun 16 09:37:59 [1297] server01 attrd: info:<br>
> attrd_cib_callback: Update 12 for<br>
> probe_complete[server01]=true: OK (0)<br>
</span>> 7.<br>
<span class="">> Jun 16 09:37:59 [1295] server01 stonithd: notice:<br>
> can_fence_host_with_device: p_fence_server02 can not fence<br>
> (reboot) server02: dynamic-list<br>
</span>> 8.<br>
<span class="">> Jun 16 09:37:59 [1295] server01 stonithd: info:<br>
> process_remote_stonith_query: All queries have arrived,<br>
> continuing (1, 1, 1, 19fdb8e0-2611-45a7-b44d-b58fa0e99cab)<br>
</span>> 9.<br>
<span class="">> Jun 16 09:37:59 [1295] server01 stonithd: notice:<br>
> stonith_choose_peer: Couldn't find anyone to fence server02<br>
> with <any><br>
</span>> 10.<br>
<span class="">> Jun 16 09:37:59 [1295] server01 stonithd: info:<br>
> call_remote_stonith: Total remote op timeout set to 60 for<br>
> fencing of node server02 for crmd.1299.19fdb8e0<br>
</span>> 11.<br>
<span class="">> Jun 16 09:37:59 [1295] server01 stonithd: info:<br>
> call_remote_stonith: None of the 1 peers have devices capable<br>
> of terminating server02 for crmd.1299 (0)<br>
</span>> 12.<br>
<span class="">> Jun 16 09:37:59 [1295] server01 stonithd: warning:<br>
> get_xpath_object: No match for //@st_delegate in /st-reply<br>
</span>> 13.<br>
<span class="">> Jun 16 09:37:59 [1295] server01 stonithd: error:<br>
> remote_op_done: Operation reboot of server02 by server01 for<br>
> crmd.1299@server01.19fdb8e0: No such device<br>
</span>> 14.<br>
<span class="">> Jun 16 09:37:59 [1299] server01 crmd: notice:<br>
> tengine_stonith_callback: Stonith operation<br>
> 3/12:1:0:a989fb7b-1af1-4bac-992b-eef416e25775: No such device (-19)<br>
</span>> 15.<br>
<span class="">> Jun 16 09:37:59 [1299] server01 crmd: notice:<br>
> tengine_stonith_callback: Stonith operation 3 for server02 failed<br>
> (No such device): aborting transition.<br>
</span>> 16.<br>
<span class="">> Jun 16 09:37:59 [1299] server01 crmd: notice:<br>
> abort_transition_graph: Transition aborted: Stonith failed<br>
> (source=tengine_stonith_callback:697, 0)<br>
</span>> 17.<br>
<span class="">> Jun 16 09:37:59 [1299] server01 crmd: notice:<br>
> tengine_stonith_notify: Peer server02 was not terminated (reboot)<br>
> by server01 for server01: No such device<br>
> (ref=19fdb8e0-2611-45a7-b44d-b58fa0e99cab) by client crmd.1299<br>
><br>
><br>
</span>> So, I had to put *pcmk_host_list *parameter, like:<br>
<span class="">><br>
> primitive fence_server01 stonith:fence_virsh \<br>
> params ipaddr=virtnode01 port=server01.fqdn action=reboot<br>
> login=root passwd=passwd delay=15 pcmk_host_list=server01 \<br>
> op monitor interval=60s<br>
> primitive fence_server02 stonith:fence_virsh \<br>
> params ipaddr=virtnode02 port=server02.fqdn action=reboot<br>
> login=root passwd=passwd delay=15 pcmk_host_list=server02 \<br>
> op monitor interval=60s<br>
><br>
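>> (For reference, a sketch of an alternative I've seen suggested but
>> haven't verified myself: instead of a static pcmk_host_list,
>> pcmk_host_map can map the cluster node name to the libvirt domain
>> name, replacing port=:
>>
>> primitive fence_server01 stonith:fence_virsh \
>>   params ipaddr=virtnode01 pcmk_host_map="server01:server01.fqdn" \
>>     action=reboot login=root passwd=passwd delay=15 \
>>   op monitor interval=60s
>>
>> The map syntax is node:port pairs, separated by semicolons.)
>>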
>> Could you explain why? I hope this doesn't sound rough; it's just
>> that I don't understand why it's needed.
>>
>> Thank you very much
>> Oscar Salvador
>
> Don't use 'delay="15"' on both nodes! It's meant to give one node a
> head start over the other, to help avoid a 'dual fence'. In a case
> where communication fails and both nodes try to fence each other at
> the same time, the node that has the delay will live, while the node
> without a delay will die.
>
> Say you have 'delay="15"' on 'server01'. Both start to fence:
> server01 looks up how to fence server02, sees no delay, and fences
> immediately. Meanwhile, server02 looks up how to fence server01,
> sees a delay, and pauses. If server01 were really dead, then after
> 15 seconds server02 would proceed with the fence action. However, if
> server01 is alive, server02 will die long before its pause expires.
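>
> For example (a sketch reusing your own primitives), keep the delay
> on only one of the two devices:
>
> primitive fence_server01 stonith:fence_virsh \
>   params ipaddr=virtnode01 port=server01.fqdn action=reboot \
>     login=root passwd=passwd delay=15 pcmk_host_list=server01 \
>   op monitor interval=60s
> primitive fence_server02 stonith:fence_virsh \
>   params ipaddr=virtnode02 port=server02.fqdn action=reboot \
>     login=root passwd=passwd pcmk_host_list=server02 \
>   op monitor interval=60s
>
> With this, if both nodes fence at once, server02 dies first (its
> fence device has no delay) and server01 survives.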
<span class="HOEnZb"><font color="#888888"><br></font></span></blockquote><div><br></div></div>Hey Digimer, I know, actually in my config I have only one "delay" specified for this purpose. Maybe was an copy/paste error.<br></div><div class="gmail_extra">Thanks anyway ;)<br><br></div><div class="gmail_extra">Oscar Salvador<br></div></div>