[ClusterLabs] stonith - no route to host

Tue Jun 16 08:18:55 UTC 2015

2015-06-16 5:59 GMT+02:00 Andrew Beekhof <andrew at beekhof.net>:

>
> > On 16 Jun 2015, at 12:00 am, Oscar Salvador <osalvador.vilardaga at gmail.com> wrote:
> >
> > Hi,
> >
> > I've configured fencing with libvirt, but I'm having a problem with
> > stonith, due to the error "no route to host"
>
> That message is a bit wonky.
> What it really means is that there were no devices that advertise the
> ability to fence that node.
>
> In this case, pacemaker wants to fence “server” but hostlist is set to
> server.fqdn.
> Drop the .fqdn and it should work.
>
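Andrew's explanation comes down to name matching: the node name Pacemaker wants to fence has to appear among the names a device says it can fence. The following is a minimal Python sketch of that idea, not Pacemaker's code; it assumes a device is a candidate only when the node name matches a whole entry in the device's target list, with no domain stripping. `device_can_fence` and `find_capable_devices` are hypothetical names.

```python
# Simplified model (not Pacemaker source): a fencing device is only a
# candidate for a node if the node's name appears in the device's target list.


def device_can_fence(node: str, targets: list[str]) -> bool:
    """Return True if `node` equals one of `targets` as a whole name.

    Assumption for illustration: no domain stripping is done, so
    "server" and "server.fqdn" are treated as different names.
    """
    return node in targets


def find_capable_devices(node: str, devices: dict[str, list[str]]) -> list[str]:
    """Return the names of all devices whose target list contains `node`."""
    return [name for name, targets in devices.items() if device_can_fence(node, targets)]


if __name__ == "__main__":
    devices = {"fence_server": ["server.fqdn"]}  # host list uses the FQDN
    print(find_capable_devices("server", devices))        # []
    devices_fixed = {"fence_server": ["server"]}           # FQDN dropped
    print(find_capable_devices("server", devices_fixed))  # ['fence_server']
```

An empty result corresponds to the situation Andrew describes: no device advertises the ability to fence the node.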

Getting rid of the .fqdn was not an option, sorry, but I was able to fix it
another way with the help of digimer.
I used fence_virsh, from fence-agents.

First, I configured it this way:

primitive fence_server01 stonith:fence_virsh \
        params ipaddr=virtnode01 port=server01.fqdn action=reboot login=root passwd=passwd delay=15 \
        op monitor interval=60s
primitive fence_server02 stonith:fence_virsh \
        params ipaddr=virtnode02 port=server02.fqdn action=reboot login=root passwd=passwd delay=15 \
        op monitor interval=60s

But when I tried to fence a node, I received these errors:

   Jun 16 09:37:59 [1298] server01    pengine:  warning: pe_fence_node:     Node server02 will be fenced because p_fence_server01 is thought to be active there
   Jun 16 09:37:59 [1299] server01       crmd:   notice: te_fence_node:     Executing reboot fencing operation (12) on server02 (timeout=60000)
   Jun 16 09:37:59 [1295] server01   stonithd:   notice: handle_request:    Client crmd.1299.d339ea94 wants to fence (reboot) 'server02' with device '(any)'
   Jun 16 09:37:59 [1295] server01   stonithd:   notice: initiate_remote_stonith_op:        Initiating remote operation reboot for server02: 19fdb8e0-2611-45a7-b44d-b58fa0e99cab (0)
   Jun 16 09:37:59 [1297] server01      attrd:     info: attrd_cib_callback:        Update 12 for probe_complete: OK (0)
   Jun 16 09:37:59 [1297] server01      attrd:     info: attrd_cib_callback:        Update 12 for probe_complete[server01]=true: OK (0)
   Jun 16 09:37:59 [1295] server01   stonithd:   notice: can_fence_host_with_device:        p_fence_server02 can not fence (reboot) server02: dynamic-list
   Jun 16 09:37:59 [1295] server01   stonithd:     info: process_remote_stonith_query:      All queries have arrived, continuing (1, 1, 1, 19fdb8e0-2611-45a7-b44d-b58fa0e99cab)
   Jun 16 09:37:59 [1295] server01   stonithd:   notice: stonith_choose_peer:       Couldn't find anyone to fence server02 with <any>
   Jun 16 09:37:59 [1295] server01   stonithd:     info: call_remote_stonith:       Total remote op timeout set to 60 for fencing of node server02 for crmd.1299.19fdb8e0
   Jun 16 09:37:59 [1295] server01   stonithd:     info: call_remote_stonith:       None of the 1 peers have devices capable of terminating server02 for crmd.1299 (0)
   Jun 16 09:37:59 [1295] server01   stonithd:  warning: get_xpath_object:  No match for //@st_delegate in /st-reply
   Jun 16 09:37:59 [1295] server01   stonithd:    error: remote_op_done:    Operation reboot of server02 by server01 for crmd.1299@server01.19fdb8e0: No such device
   Jun 16 09:37:59 [1299] server01       crmd:   notice: tengine_stonith_callback:  Stonith operation 3/12:1:0:a989fb7b-1af1-4bac-992b-eef416e25775: No such device (-19)
   Jun 16 09:37:59 [1299] server01       crmd:   notice: tengine_stonith_callback:  Stonith operation 3 for server02 failed (No such device): aborting transition.
   Jun 16 09:37:59 [1299] server01       crmd:   notice: abort_transition_graph:    Transition aborted: Stonith failed (source=tengine_stonith_callback:697, 0)
   Jun 16 09:37:59 [1299] server01       crmd:   notice: tengine_stonith_notify:    Peer server02 was not terminated (reboot) by server01 for server01: No such device (ref=19fdb8e0-2611-45a7-b44d-b58fa0e99cab) by client crmd.1299
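The decisive line in this log is the `can_fence_host_with_device` message, which names the device, the action, the node, and the check mode that rejected it (`dynamic-list`). As a small aid for scanning longer logs, the sketch below pulls those fields out with a regular expression; the pattern is written only against the message format shown in this log (an assumption), and `why_not_fenced` is a hypothetical helper.

```python
import re

# Matches the stonithd message seen in the log, e.g.
# "p_fence_server02 can not fence (reboot) server02: dynamic-list".
# The pattern assumes this exact wording; other Pacemaker versions may differ.
CANNOT_FENCE = re.compile(
    r"(?P<device>\S+) can not fence \((?P<action>\w+)\) (?P<node>\S+): (?P<mode>\S+)"
)


def why_not_fenced(log_lines: list[str]) -> list[dict[str, str]]:
    """Return device/action/node/mode for every "can not fence" message."""
    return [m.groupdict() for line in log_lines if (m := CANNOT_FENCE.search(line))]


if __name__ == "__main__":
    line = ("Jun 16 09:37:59 [1295] server01   stonithd:   notice: "
            "can_fence_host_with_device:        "
            "p_fence_server02 can not fence (reboot) server02: dynamic-list")
    print(why_not_fenced([line]))
    # [{'device': 'p_fence_server02', 'action': 'reboot', 'node': 'server02', 'mode': 'dynamic-list'}]
```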

So I had to add the pcmk_host_list parameter, like this:

primitive fence_server01 stonith:fence_virsh \
        params ipaddr=virtnode01 port=server01.fqdn action=reboot login=root passwd=passwd delay=15 pcmk_host_list=server01 \
        op monitor interval=60s
primitive fence_server02 stonith:fence_virsh \
        params ipaddr=virtnode02 port=server02.fqdn action=reboot login=root passwd=passwd delay=15 pcmk_host_list=server02 \
        op monitor interval=60s
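A plausible explanation lies in Pacemaker's `pcmk_host_check` option, which controls how the cluster decides which nodes a device can fence. Pacemaker's documentation describes its default as `static-list` when `pcmk_host_list` is configured and, otherwise, `dynamic-list` for agents that support the `list` action. In `dynamic-list` mode the cluster asks the agent for its targets; for fence_virsh these would be the libvirt domain names such as `server02.fqdn`, which do not match the cluster node name `server02`. That is consistent with the log line `p_fence_server02 can not fence (reboot) server02: dynamic-list`. The sketch below models this decision in Python; `host_check_mode`, `can_fence`, and the simplified matching are hypothetical, not Pacemaker's code.

```python
# Hypothetical, simplified model (not Pacemaker source) of how a fencing
# device decides whether it can fence a node.


def host_check_mode(params: dict[str, str]) -> str:
    """Choose the check mode following the documented pcmk_host_check defaults."""
    if "pcmk_host_check" in params:
        return params["pcmk_host_check"]  # an explicit setting wins
    if "pcmk_host_list" in params:
        return "static-list"
    # fence_virsh supports the "list" action, so the fallbacks used for
    # agents without it are left out of this sketch.
    return "dynamic-list"


def can_fence(node: str, params: dict[str, str], agent_list_output: list[str]) -> bool:
    """Return True if the device described by `params` may fence `node`."""
    mode = host_check_mode(params)
    if mode == "static-list":
        # Assumption: pcmk_host_list is a space- or comma-separated list.
        targets = params.get("pcmk_host_list", "").replace(",", " ").split()
        return node in targets
    if mode == "dynamic-list":
        # Targets are whatever the agent's "list" action reports; for
        # fence_virsh these are libvirt domain names (assumed to be FQDNs here).
        return node in agent_list_output
    if mode == "none":
        return True  # assume the device can fence every node
    raise ValueError(f"mode {mode!r} is not modelled in this sketch")


if __name__ == "__main__":
    virsh_list = ["server01.fqdn", "server02.fqdn"]  # assumed `list` output
    before = {"port": "server02.fqdn"}
    after = {"port": "server02.fqdn", "pcmk_host_list": "server02"}
    print(host_check_mode(before), can_fence("server02", before, virsh_list))
    # dynamic-list False
    print(host_check_mode(after), can_fence("server02", after, virsh_list))
    # static-list True
```

Under these assumptions, adding pcmk_host_list switches the device from matching against the agent's domain names to matching against the listed cluster node names.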

Could you explain why? I hope this doesn't sound rude; I just don't
understand why.

Thank you very much
Oscar Salvador