[ClusterLabs] stonith - no route to host

Oscar Salvador osalvador.vilardaga at gmail.com
Mon Jun 15 10:00:27 EDT 2015


Hi,

I've configured fencing with libvirt, but I'm having a problem with
stonith, due to the error "no route to host".

Config:

node 1053402612: server01
node 1053402613: server02 \
        attributes standby=off
primitive IP-rsc_nginx IPaddr2 \
        params ip=xx.xx.xx.xx nic=eth0 cidr_netmask=xx.xx.xy.xx \
        meta migration-threshold=2 \
        op monitor interval=20 timeout=60 on-fail=restart
primitive Nginx-rsc nginx \
        meta migration-threshold=2 \
        op monitor interval=20 timeout=60 on-fail=restart
primitive p_fence_server01 stonith:external/libvirt \
        params hostlist=server01.fqdn hypervisor_uri="qemu+tls://virtnode01:16514/system"
primitive p_fence_testlb02 stonith:external/libvirt \
        params hostlist=server02.fqdn hypervisor_uri="qemu+tls://virtnode02:16514/system"
location l_fence_server01 p_fence_server01 -inf: server01
location l_fence_testlb02 p_fence_testlb02 -inf: server02
colocation lb-loc inf: IP-rsc_nginx Nginx-rsc
order lb-ord inf: IP-rsc_nginx Nginx-rsc
property cib-bootstrap-options: \
        stonith-enabled=true \
        no-quorum-policy=ignore \
        default-resource-stickiness=100 \
        last-lrm-refresh=1434360625 \
        dc-version=1.1.12-561c4cf \
        cluster-infrastructure=corosync


As you can see, in hostlist I'm using the host's FQDN, since that is the
name shown by "virsh list".
Also, from one node you can ping the other (and vice versa) using only
"server0x"; you don't need the full domain.

I was testing stonith by killing corosync on server02, and I got this
error in the logs:


Jun 15 14:44:45 [1301] server01   stonithd:    debug: stonith_action_async_done:  Child process 18649 performing action 'reboot' exited with rc 1
Jun 15 14:44:45 [1301] server01   stonithd:     info: update_remaining_timeout:  Attempted to execute agent fence_legacy (reboot) the maximum number of times (2) allowed
Jun 15 14:44:45 [1301] server01   stonithd:    debug: st_child_done:  Operation 'reboot' on 'p_fence_server02' completed with rc=1 (0 remaining)
Jun 15 14:44:45 [1301] server01   stonithd:    error: log_operation:  Operation 'reboot' [18649] (call 13 from crmd.1305) for host 'server02' with device 'p_fence_server02' returned: -201 (Generic Pacemaker error)
Jun 15 14:44:45 [1301] server01   stonithd:  warning: log_operation:  p_fence_server02:18649 [ Performing: stonith -t external/libvirt -T reset server02 ]
Jun 15 14:44:45 [1301] server01   stonithd:  warning: log_operation:  p_fence_server02:18649 [ failed: server02 5 ]

Jun 15 14:44:49 [1301] server01   stonithd:    debug: stonith_command:  Processing st_notify reply 0 from server01 (               0)
Jun 15 14:44:49 [1301] server01   stonithd:    debug: process_remote_stonith_exec:  Marking call to reboot for server02 on behalf of crmd.1305@4281c4bb-9922-4a4d-97f3-706f7d34ec1c.test-lb0: No route to host (-113)
Jun 15 14:44:49 [1301] server01   stonithd:  warning: get_xpath_object:  No match for //@st_delegate in /st-reply
Jun 15 14:44:49 [1301] server01   stonithd:    error: remote_op_done:  Operation reboot of server02 by server01 for crmd.1305@server01.4281c4bb: No route to host
Jun 15 14:44:49 [1301] server01   stonithd:    debug: stonith_command:  Processed st_notify reply from server01: OK (0)
Jun 15 14:44:49 [1305] server01       crmd:   notice: tengine_stonith_callback:  Stonith operation 13/14:26:0:9234dba0-9b0d-4047-b4df-d05f9430f101: No route to host (-113)
Jun 15 14:44:49 [1305] server01       crmd:   notice: tengine_stonith_callback:  Stonith operation 13 for server02 failed (No route to host): aborting transition.
Jun 15 14:44:49 [1305] server01       crmd:     info: abort_transition_graph:  Transition aborted: Stonith failed (source=tengine_stonith_callback:697, 0)
Jun 15 14:44:49 [1305] server01       crmd:   notice: tengine_stonith_notify:  Peer server02 was not terminated (reboot) by server01 for server01: No route to host (ref=4281c4bb-9922-4a4d-97f3-706f7d34ec1c) by client crmd.1305
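
To rule out a plain connectivity problem, these are the checks I can run
from server01 against the second hypervisor and the device itself (I may
have the stonith(8) options slightly wrong, this is from memory):

virsh -c qemu+tls://virtnode02:16514/system list --all
stonith -t external/libvirt hostlist=server02.fqdn \
        hypervisor_uri="qemu+tls://virtnode02:16514/system" -S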


I tried it manually like this:

stonith_admin -V -F server02

I got the same error, but if I try it with the FQDN, like:

stonith_admin -V -F server02+fqdn

then it works. I don't know why Pacemaker can't resolve the host without
the FQDN:

root@server01 ~# host server02
server02+fqdn has address xx.xx.xx.xx
root@server01 ~# host server01
server01+fqdn has address xx.xx.xx.xy

root@server02 ~# host server02
server02+fqdn has address xx.xx.xx.xx
root@server02 ~# host server01
server01+fqdn has address xx.xx.xx.xy
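
In case it helps, these are the commands I know of to check which name the
cluster itself uses for a node and which devices claim to be able to fence
it (output not pasted here; hopefully I have the options right):

crm_node -n                  # node name as the cluster sees it, run on each node
stonith_admin -L             # list all registered fence devices
stonith_admin -l server02    # list devices that can fence "server02"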



Does anybody have an idea about that?

Thank you very much
Oscar Salvador