<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">2015-06-16 19:30 GMT+02:00 Digimer <span dir="ltr"><<a href="mailto:lists@alteeve.ca" target="_blank">lists@alteeve.ca</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 16/06/15 04:18 AM, Oscar Salvador wrote:<br>
><br>
><br>
> 2015-06-16 5:59 GMT+02:00 Andrew Beekhof <<a href="mailto:andrew@beekhof.net">andrew@beekhof.net</a><br>
</span>> <mailto:<a href="mailto:andrew@beekhof.net">andrew@beekhof.net</a>>>:<br>
<span class="">><br>
><br>
> > On 16 Jun 2015, at 12:00 am, Oscar Salvador <<a href="mailto:osalvador.vilardaga@gmail.com">osalvador.vilardaga@gmail.com</a><br>
</span><span class="">> <mailto:<a href="mailto:osalvador.vilardaga@gmail.com">osalvador.vilardaga@gmail.com</a>>> wrote:<br>
> ><br>
> > Hi,<br>
> ><br>
> > I've configured a fencing with libvirt, but I'm having some<br>
> problem with stonith, due to the error "no route to host”<br>
>>>
>>> That message is a bit wonky. What it really means is that there
>>> were no devices that advertise the ability to fence that node.
>>>
>>> In this case, Pacemaker wants to fence "server", but hostlist is
>>> set to server.fqdn. Drop the .fqdn and it should work.
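>>>
>>> For example (a sketch only; I'm assuming the stonith:external/libvirt
>>> agent is in play, and the primitive name and hypervisor_uri below
>>> are guesses), the hostlist entries have to match the names the
>>> cluster knows the nodes by, optionally mapped to the domain name:
>>>
>>> primitive fence_virt stonith:external/libvirt \
>>>   params hostlist="server01:server01.fqdn,server02:server02.fqdn" \
>>>     hypervisor_uri="qemu+ssh://virtnode01/system" \
>>>   op monitor interval=60s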
>>
>> Getting rid of the fqdn was not an option, sorry, but I was able to
>> fix it another way with Digimer's help: I used fence_virsh, from
>> fence-agents.
>>
>> At first I configured it this way:
>>
</span>> /primitive fence_server01 stonith:fence_virsh \<br>
> /<br>
> / params ipaddr=virtnode01 port=server01.fqdn action=reboot<br>
> login=root passwd=passwd delay=15 \/<br>
> / op monitor interval=60s /<br>
> /primitive fence_server02 stonith:fence_virsh \/<br>
> / params ipaddr=virtnode02 port=server02.fqdn action=reboot<br>
> login=root passwd=passwd delay=15 \/<br>
> / op monitor interval=60s /<br>
> /<br>
> /<br>
<span class="">><br>
> But when I tried to fence a node, I received this errors:<br>
><br>
</span>> 1.<br>
<span class="">> Jun 16 09:37:59 [1298] server01 pengine: warning: pe_fence_node:<br>
> Node server02 will be fenced because p_fence_server01 is thought<br>
> to be active there<br>
</span>> 2.<br>
<span class="">> Jun 16 09:37:59 [1299] server01 crmd: notice: te_fence_node:<br>
> Executing reboot fencing operation (12) on server02 (timeout=60000)<br>
</span>> 3.<br>
<span class="">> Jun 16 09:37:59 [1295] server01 stonithd: notice:<br>
> handle_request: Client crmd.1299.d339ea94 wants to fence (reboot)<br>
> 'server02' with device '(any)'<br>
</span>> 4.<br>
<span class="">> Jun 16 09:37:59 [1295] server01 stonithd: notice:<br>
> initiate_remote_stonith_op: Initiating remote operation<br>
> reboot for server02: 19fdb8e0-2611-45a7-b44d-b58fa0e99cab (0)<br>
</span>> 5.<br>
<span class="">> Jun 16 09:37:59 [1297] server01 attrd: info:<br>
> attrd_cib_callback: Update 12 for probe_complete: OK (0)<br>
</span>> 6.<br>
<span class="">> Jun 16 09:37:59 [1297] server01 attrd: info:<br>
> attrd_cib_callback: Update 12 for<br>
> probe_complete[server01]=true: OK (0)<br>
</span>> 7.<br>
<span class="">> Jun 16 09:37:59 [1295] server01 stonithd: notice:<br>
> can_fence_host_with_device: p_fence_server02 can not fence<br>
> (reboot) server02: dynamic-list<br>
</span>> 8.<br>
<span class="">> Jun 16 09:37:59 [1295] server01 stonithd: info:<br>
> process_remote_stonith_query: All queries have arrived,<br>
> continuing (1, 1, 1, 19fdb8e0-2611-45a7-b44d-b58fa0e99cab)<br>
</span>> 9.<br>
<span class="">> Jun 16 09:37:59 [1295] server01 stonithd: notice:<br>
> stonith_choose_peer: Couldn't find anyone to fence server02<br>
> with <any><br>
</span>> 10.<br>
<span class="">> Jun 16 09:37:59 [1295] server01 stonithd: info:<br>
> call_remote_stonith: Total remote op timeout set to 60 for<br>
> fencing of node server02 for crmd.1299.19fdb8e0<br>
</span>> 11.<br>
<span class="">> Jun 16 09:37:59 [1295] server01 stonithd: info:<br>
> call_remote_stonith: None of the 1 peers have devices capable<br>
> of terminating server02 for crmd.1299 (0)<br>
</span>> 12.<br>
<span class="">> Jun 16 09:37:59 [1295] server01 stonithd: warning:<br>
> get_xpath_object: No match for //@st_delegate in /st-reply<br>
</span>> 13.<br>
<span class="">> Jun 16 09:37:59 [1295] server01 stonithd: error:<br>
> remote_op_done: Operation reboot of server02 by server01 for<br>
> crmd.1299@server01.19fdb8e0: No such device<br>
</span>> 14.<br>
<span class="">> Jun 16 09:37:59 [1299] server01 crmd: notice:<br>
> tengine_stonith_callback: Stonith operation<br>
> 3/12:1:0:a989fb7b-1af1-4bac-992b-eef416e25775: No such device (-19)<br>
</span>> 15.<br>
<span class="">> Jun 16 09:37:59 [1299] server01 crmd: notice:<br>
> tengine_stonith_callback: Stonith operation 3 for server02 failed<br>
> (No such device): aborting transition.<br>
</span>> 16.<br>
<span class="">> Jun 16 09:37:59 [1299] server01 crmd: notice:<br>
> abort_transition_graph: Transition aborted: Stonith failed<br>
> (source=tengine_stonith_callback:697, 0)<br>
</span>> 17.<br>
<span class="">> Jun 16 09:37:59 [1299] server01 crmd: notice:<br>
> tengine_stonith_notify: Peer server02 was not terminated (reboot)<br>
> by server01 for server01: No such device<br>
> (ref=19fdb8e0-2611-45a7-b44d-b58fa0e99cab) by client crmd.1299<br>
><br>
><br>
</span>> So, I had to put *pcmk_host_list *parameter, like:<br>
<span class="">><br>
> primitive fence_server01 stonith:fence_virsh \<br>
> params ipaddr=virtnode01 port=server01.fqdn action=reboot<br>
> login=root passwd=passwd delay=15 pcmk_host_list=server01 \<br>
> op monitor interval=60s<br>
> primitive fence_server02 stonith:fence_virsh \<br>
> params ipaddr=virtnode02 port=server02.fqdn action=reboot<br>
> login=root passwd=passwd delay=15 pcmk_host_list=server02 \<br>
> op monitor interval=60s<br>
><br>
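>> (For reference, a sketch of an alternative I've seen suggested but
>> haven't verified myself: instead of a static pcmk_host_list,
>> pcmk_host_map can map the cluster node name to the libvirt domain
>> name, replacing port=:
>>
>> primitive fence_server01 stonith:fence_virsh \
>>   params ipaddr=virtnode01 pcmk_host_map="server01:server01.fqdn" \
>>     action=reboot login=root passwd=passwd delay=15 \
>>   op monitor interval=60s
>>
>> The map syntax is node:port pairs, separated by semicolons.)
>>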
>> Could you explain why? I hope this doesn't sound rough; it's just
>> that I don't understand why it's needed.
>>
>> Thank you very much
>> Oscar Salvador
>
> Don't use 'delay="15"' on both nodes! It's meant to give one node a
> head start over the other, to help avoid a 'dual fence'. In a case
> where communication fails and both nodes try to fence each other at
> the same time, the node that has the delay will live, while the node
> without a delay will die.
>
> Say you have 'delay="15"' on 'server01'. Both start to fence:
> server01 looks up how to fence server02, sees no delay, and fences
> immediately. Meanwhile, server02 looks up how to fence server01,
> sees a delay, and pauses. If server01 were really dead, then after
> 15 seconds server02 would proceed with the fence action. However, if
> server01 is alive, server02 will die long before its pause expires.
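>
> For example (a sketch reusing your own primitives), keep the delay
> on only one of the two devices:
>
> primitive fence_server01 stonith:fence_virsh \
>   params ipaddr=virtnode01 port=server01.fqdn action=reboot \
>     login=root passwd=passwd delay=15 pcmk_host_list=server01 \
>   op monitor interval=60s
> primitive fence_server02 stonith:fence_virsh \
>   params ipaddr=virtnode02 port=server02.fqdn action=reboot \
>     login=root passwd=passwd pcmk_host_list=server02 \
>   op monitor interval=60s
>
> With this, if both nodes fence at once, server02 dies first (its
> fence device has no delay) and server01 survives.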
<span class="HOEnZb"><font color="#888888"><br></font></span></blockquote><div><br></div></div>Hey Digimer, I know, actually in my config I have only one "delay" specified for this purpose. Maybe was an copy/paste error.<br></div><div class="gmail_extra">Thanks anyway ;)<br><br></div><div class="gmail_extra">Oscar Salvador<br></div></div>