[ClusterLabs] stonith - no route to host
Oscar Salvador
osalvador.vilardaga at gmail.com
Tue Jun 16 18:10:52 UTC 2015
2015-06-16 19:30 GMT+02:00 Digimer <lists at alteeve.ca>:
> On 16/06/15 04:18 AM, Oscar Salvador wrote:
> >
> >
> > 2015-06-16 5:59 GMT+02:00 Andrew Beekhof <andrew at beekhof.net>:
> >
> >
> > > On 16 Jun 2015, at 12:00 am, Oscar Salvador <osalvador.vilardaga at gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > I've configured fencing with libvirt, but I'm having a
> > > problem with stonith, due to the error "no route to host".
> >
> > That message is a bit wonky.
> > What it really means is that there were no devices that advertise
> > the ability to fence that node.
> >
> > In this case, pacemaker wants to fence "server" but hostlist is set
> > to server.fqdn.
> > Drop the .fqdn and it should work.
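> >
> > For reference, a minimal sketch of what that looks like (assuming
> > the stonith:external/libvirt agent implied by the hostlist parameter
> > above; the names and hypervisor URI are illustrative):
> >
> > primitive fence_libvirt stonith:external/libvirt \
> >         params hostlist="server" \
> >         hypervisor_uri="qemu+ssh://virtnode01/system" \
> >         op monitor interval=60s
> >
> > The point being that the hostlist entry has to match the node name
> > pacemaker uses, not the libvirt domain name.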
> >
> >
> > Getting rid of the fqdn was not an option, sorry, but I was able to
> > fix it another way with the help of Digimer.
> > I've used fence_virsh, from the fence-agents package.
> >
> > First of all, I configured it this way:
> >
> > primitive fence_server01 stonith:fence_virsh \
> >         params ipaddr=virtnode01 port=server01.fqdn action=reboot \
> >         login=root passwd=passwd delay=15 \
> >         op monitor interval=60s
> > primitive fence_server02 stonith:fence_virsh \
> >         params ipaddr=virtnode02 port=server02.fqdn action=reboot \
> >         login=root passwd=passwd delay=15 \
> >         op monitor interval=60s
> >
> > But when I tried to fence a node, I received these errors:
> >
> > Jun 16 09:37:59 [1298] server01 pengine: warning: pe_fence_node:
> >     Node server02 will be fenced because p_fence_server01 is thought
> >     to be active there
> > Jun 16 09:37:59 [1299] server01 crmd: notice: te_fence_node:
> >     Executing reboot fencing operation (12) on server02 (timeout=60000)
> > Jun 16 09:37:59 [1295] server01 stonithd: notice: handle_request:
> >     Client crmd.1299.d339ea94 wants to fence (reboot) 'server02'
> >     with device '(any)'
> > Jun 16 09:37:59 [1295] server01 stonithd: notice: initiate_remote_stonith_op:
> >     Initiating remote operation reboot for server02:
> >     19fdb8e0-2611-45a7-b44d-b58fa0e99cab (0)
> > Jun 16 09:37:59 [1297] server01 attrd: info: attrd_cib_callback:
> >     Update 12 for probe_complete: OK (0)
> > Jun 16 09:37:59 [1297] server01 attrd: info: attrd_cib_callback:
> >     Update 12 for probe_complete[server01]=true: OK (0)
> > Jun 16 09:37:59 [1295] server01 stonithd: notice: can_fence_host_with_device:
> >     p_fence_server02 can not fence (reboot) server02: dynamic-list
> > Jun 16 09:37:59 [1295] server01 stonithd: info: process_remote_stonith_query:
> >     All queries have arrived, continuing (1, 1, 1,
> >     19fdb8e0-2611-45a7-b44d-b58fa0e99cab)
> > Jun 16 09:37:59 [1295] server01 stonithd: notice: stonith_choose_peer:
> >     Couldn't find anyone to fence server02 with <any>
> > Jun 16 09:37:59 [1295] server01 stonithd: info: call_remote_stonith:
> >     Total remote op timeout set to 60 for fencing of node server02
> >     for crmd.1299.19fdb8e0
> > Jun 16 09:37:59 [1295] server01 stonithd: info: call_remote_stonith:
> >     None of the 1 peers have devices capable of terminating server02
> >     for crmd.1299 (0)
> > Jun 16 09:37:59 [1295] server01 stonithd: warning: get_xpath_object:
> >     No match for //@st_delegate in /st-reply
> > Jun 16 09:37:59 [1295] server01 stonithd: error: remote_op_done:
> >     Operation reboot of server02 by server01 for
> >     crmd.1299@server01.19fdb8e0: No such device
> > Jun 16 09:37:59 [1299] server01 crmd: notice: tengine_stonith_callback:
> >     Stonith operation 3/12:1:0:a989fb7b-1af1-4bac-992b-eef416e25775:
> >     No such device (-19)
> > Jun 16 09:37:59 [1299] server01 crmd: notice: tengine_stonith_callback:
> >     Stonith operation 3 for server02 failed (No such device):
> >     aborting transition.
> > Jun 16 09:37:59 [1299] server01 crmd: notice: abort_transition_graph:
> >     Transition aborted: Stonith failed (source=tengine_stonith_callback:697, 0)
> > Jun 16 09:37:59 [1299] server01 crmd: notice: tengine_stonith_notify:
> >     Peer server02 was not terminated (reboot) by server01 for server01:
> >     No such device (ref=19fdb8e0-2611-45a7-b44d-b58fa0e99cab)
> >     by client crmd.1299
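> >
> > (The can_fence_host_with_device line says stonithd consulted the
> > agent's dynamic host list. That list can be queried by hand, if I'm
> > using the agent right; a sketch with the same illustrative
> > credentials:
> >
> > fence_virsh -a virtnode02 -l root -p passwd -o list
> >
> > which should print the libvirt domain names, i.e. server02.fqdn,
> > rather than the cluster node name server02.)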
> >
> >
> > So, I had to add the pcmk_host_list parameter, like this:
> >
> > primitive fence_server01 stonith:fence_virsh \
> >         params ipaddr=virtnode01 port=server01.fqdn action=reboot \
> >         login=root passwd=passwd delay=15 pcmk_host_list=server01 \
> >         op monitor interval=60s
> > primitive fence_server02 stonith:fence_virsh \
> >         params ipaddr=virtnode02 port=server02.fqdn action=reboot \
> >         login=root passwd=passwd delay=15 pcmk_host_list=server02 \
> >         op monitor interval=60s
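> >
> > (A quick check after a change like this, if I read the tool right, is
> >
> > stonith_admin -l server02
> >
> > which asks stonithd to list the devices able to fence server02;
> > fence_server02 should now show up there.)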
> >
> > Could you explain why? I hope this doesn't sound rough; it's just
> > that I don't understand why.
> >
> > Thank you very much
> > Oscar Salvador
>
> Don't use 'delay="15"' on both nodes! It's meant to give one node a
> head-start over the other to help avoid a 'dual fence'. The node that
> has the delay will live while the node without a delay will die in a
> case where communication fails and both nodes try to fence the other
> at the same time.
>
> Say you have 'delay="15"' on 'server01'; both start to fence. server01
> looks up how to fence server02, sees no delay and immediately fences.
> Meanwhile, 'server02' looks up how to fence 'server01', sees a delay
> and pauses. If server01 were really dead, server02 would proceed with
> the fence action after 15 seconds. However, if server01 is alive,
> server02 will die long before its pause expires.
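>
> Concretely (a sketch reusing the primitives above): keep delay=15 only
> on the device that fences the node you want to win the duel, and drop
> it from the other one:
>
> primitive fence_server01 stonith:fence_virsh \
>         params ipaddr=virtnode01 port=server01.fqdn action=reboot \
>         login=root passwd=passwd delay=15 pcmk_host_list=server01 \
>         op monitor interval=60s
> primitive fence_server02 stonith:fence_virsh \
>         params ipaddr=virtnode02 port=server02.fqdn action=reboot \
>         login=root passwd=passwd pcmk_host_list=server02 \
>         op monitor interval=60s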
>
>
Hey Digimer, I know; in my actual config I have only one "delay"
specified, for exactly this purpose. It was probably a copy/paste error.
Thanks anyway ;)
Oscar Salvador