[ClusterLabs] fence_vbox Unable to connect/login to fencing device
Ken Gaillot
kgaillot at redhat.com
Thu Jul 6 11:31:43 EDT 2017
On 07/06/2017 10:29 AM, Ken Gaillot wrote:
> On 07/06/2017 10:13 AM, ArekW wrote:
>> Hi,
>>
>> It seems that fence_vbox is running, but there are errors in
>> the logs every few minutes, like:
>>
>> Jul 6 12:51:12 nfsnode1 fence_vbox: Unable to connect/login to fencing device
>> Jul 6 12:51:13 nfsnode1 stonith-ng[7899]: warning: fence_vbox[30220]
>> stderr: [ Unable to connect/login to fencing device ]
>> Jul 6 12:51:13 nfsnode1 stonith-ng[7899]: warning: fence_vbox[30220]
>> stderr: [ ]
>> Jul 6 12:51:13 nfsnode1 stonith-ng[7899]: warning: fence_vbox[30220]
>> stderr: [ ]
>>
>> Eventually, after some time, pcs status shows Failed Actions:
>>
>> # pcs status --full
>> Cluster name: nfscluster
>> Stack: corosync
>> Current DC: nfsnode1 (1) (version 1.1.15-11.el7_3.5-e174ec8) -
>> partition with quorum
>> Last updated: Thu Jul 6 13:02:52 2017 Last change: Thu Jul
>> 6 13:00:33 2017 by root via crm_resource on nfsnode1
>>
>> 2 nodes and 11 resources configured
>>
>> Online: [ nfsnode1 (1) nfsnode2 (2) ]
>>
>> Full list of resources:
>>
>> Master/Slave Set: StorageClone [Storage]
>> Storage (ocf::linbit:drbd): Master nfsnode1
>> Storage (ocf::linbit:drbd): Master nfsnode2
>> Masters: [ nfsnode1 nfsnode2 ]
>> Clone Set: dlm-clone [dlm]
>> dlm (ocf::pacemaker:controld): Started nfsnode1
>> dlm (ocf::pacemaker:controld): Started nfsnode2
>> Started: [ nfsnode1 nfsnode2 ]
>> vbox-fencing (stonith:fence_vbox): Started nfsnode1
>> Clone Set: ClusterIP-clone [ClusterIP] (unique)
>> ClusterIP:0 (ocf::heartbeat:IPaddr2): Started nfsnode1
>> ClusterIP:1 (ocf::heartbeat:IPaddr2): Started nfsnode2
>> Clone Set: StorageFS-clone [StorageFS]
>> StorageFS (ocf::heartbeat:Filesystem): Started nfsnode1
>> StorageFS (ocf::heartbeat:Filesystem): Started nfsnode2
>> Started: [ nfsnode1 nfsnode2 ]
>> Clone Set: WebSite-clone [WebSite]
>> WebSite (ocf::heartbeat:apache): Started nfsnode1
>> WebSite (ocf::heartbeat:apache): Started nfsnode2
>> Started: [ nfsnode1 nfsnode2 ]
>>
>> Failed Actions:
>> * vbox-fencing_start_0 on nfsnode1 'unknown error' (1): call=157,
>> status=Error, exitreason='none',
>> last-rc-change='Thu Jul 6 13:58:04 2017', queued=0ms, exec=11947ms
>> * vbox-fencing_start_0 on nfsnode2 'unknown error' (1): call=57,
>> status=Error, exitreason='none',
>> last-rc-change='Thu Jul 6 13:58:16 2017', queued=0ms, exec=11953ms
>>
>> The fence was created with command:
>> pcs -f stonith_cfg stonith create vbox-fencing fence_vbox ip=10.0.2.2
>> ipaddr=10.0.2.2 login=AW23321 username=AW23321
>> identity_file=/root/.ssh/id_rsa host_os=windows
>> pcmk_host_check=static-list pcmk_host_list="centos1 centos2"
>> vboxmanage_path="/cygdrive/c/Program\
>> Files/Oracle/VirtualBox/VBoxManage" op monitor interval=5
>>
>> where centos1 and centos2 are the VBox machine names (not hostnames). I
>> duplicated the login/username parameters because the fence_vbox stonith
>> description indicates both as required.
>>
>> Then I updated the configuration and set:
>>
>> pcs stonith update vbox-fencing pcmk_host_list="nfsnode1 nfsnode2"
>> pcs stonith update vbox-fencing
>> pcmk_host_map="nfsnode1:centos1;nfsnode2:centos2"
>>
>> where nfsnode1 and nfsnode2 are the hostnames
>>
>> I'm not sure which config is correct, but both show Failed Actions after
>> some time.
>
> You only need one of pcmk_host_list or pcmk_host_map. Use pcmk_host_list
> if fence_vbox recognizes the node names used by the cluster, or
> pcmk_host_map if fence_vbox knows the nodes by other names. In this
> case, it looks like you want to tell fence_vbox to use "centos2" when
> the cluster wants to fence nfsnode2, so your pcmk_host_map is the right
> choice.
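
A minimal sketch of settling on pcmk_host_map alone, using the names from this thread. The build_host_map helper is hypothetical (not part of pcs), the commands are only printed rather than run, and clearing pcmk_host_list by setting it to an empty value is an assumption about pcs behavior worth checking against its man page:

```shell
# build_host_map NODE:VM ... joins "cluster-node:vbox-name" pairs with ';'
# (hypothetical helper, for illustration only)
build_host_map() {
    local IFS=';'
    printf '%s\n' "$*"
}

# Cluster node names on the left, VirtualBox machine names on the right.
HOST_MAP="$(build_host_map nfsnode1:centos1 nfsnode2:centos2)"

# Dry run: print the commands instead of executing them.
echo "pcs stonith update vbox-fencing pcmk_host_map=\"$HOST_MAP\""
# Assumption: an empty value removes the option from the resource.
echo "pcs stonith update vbox-fencing pcmk_host_list="
```

With the map in place, the cluster asks fence_vbox for "centos2" whenever it wants to fence nfsnode2.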
>
>> I've successfully tested the fence connection to the VBox host with:
>> fence_vbox --ip 10.0.2.2 --username=AW23321
>> --identity-file=/root/.ssh/id_rsa --plug=centos2 --host-os=windows
>> --action=status --vboxmanage-path="/cygdrive/c/Program\
>> Files/Oracle/VirtualBox/VBoxManage"
>>
>> Why does the above configuration work as a standalone command but not
>> in pcs?
>
> Two main possibilities: you haven't expressed those identical options in
> the cluster configuration correctly; or, you have some permissions on
> the command line that the cluster doesn't have (maybe SELinux, or file
> permissions, or ...).

Forgot one other possibility: the status shows that the *start* action
is what failed, not a fence action. Check the fence_vbox source code to
see what start does, and try to do that manually step by step.
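
One way to walk through that by hand, as a sketch. It assumes that starting a stonith resource amounts to registering the device and running the agent's monitor action (verify this against the Pacemaker and fence_vbox sources), and that the options from the working status test apply unchanged. run_agent and the FENCE_AGENT override are hypothetical helpers, not part of fence-agents:

```shell
# Values copied from the thread; adjust for your environment.
FENCE_AGENT="${FENCE_AGENT:-fence_vbox}"   # set FENCE_AGENT=echo for a dry run
VBOX_IP="10.0.2.2"
VBOX_USER="AW23321"
VBOX_KEY="/root/.ssh/id_rsa"
VBOXMANAGE='/cygdrive/c/Program\ Files/Oracle/VirtualBox/VBoxManage'

# run_agent ACTION [PLUG]: call the agent with the same options as the
# standalone test that worked, changing only the action (and optional plug).
run_agent() {
    local action="$1"
    local plug="${2:-}"
    set -- --ip "$VBOX_IP" --username="$VBOX_USER" \
        --identity-file="$VBOX_KEY" --host-os=windows \
        --vboxmanage-path="$VBOXMANAGE" --action="$action"
    if [ -n "$plug" ]; then
        set -- "$@" --plug="$plug"
    fi
    "$FENCE_AGENT" "$@"
}

# Suggested order, run manually and one at a time:
#   run_agent monitor          # assumed to be what the start action exercises
#   run_agent list             # should show centos1 and centos2
#   run_agent status centos2   # the check that already worked standalone
```

Run it as root on each node, since that is how the cluster invokes the agent. If monitor fails here while status works, the difference between the two actions is the place to look.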