[Pacemaker] xen-cluster with ocfs2

Waldemar Brodkorb mail at waldemar-brodkorb.de
Wed Sep 12 13:23:15 EDT 2012


Hi,
On Sep 12, 2012, at 6:28 PM, Lars Marowsky-Bree wrote:

> On 2012-09-12T18:01:25, Waldemar Brodkorb <mail at waldemar-brodkorb.de> wrote:
> 
>> Is there no way to handle a power outage of xen01 (virtual box poweroff button), when stonith is disabled?
>> Actually xvm-01 resource can not be started on xen02, because /cluster is not accessible on xen02. 
>> (ls -la /cluster is hanging endlessly, it works when I power on xen01 again)
> 
> You can use a manual fencing ACK.

What does this mean?
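
If that means acknowledging the fence by hand, i.e. telling the cluster that the dead node really is down, I guess it would be something along the lines of

stonith_admin --confirm xen02

but that is just a guess on my side, I have not tried it yet.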

In the meantime I found the -f 0 option for dlm_controld.pcmk. After enabling this option in the ocf resource agent "controld"
and restarting both nodes, I can finally recover from a power outage of one node. No more OCFS2 hanging.
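
For reference, I think the same could be done without editing the script, by passing the option through the resource parameters instead; roughly like this (untested sketch, the resource name DLM is just an example, and args/daemon are the parameter names as I understand the ocf:pacemaker:controld metadata):

crm configure primitive DLM ocf:pacemaker:controld \
        params daemon="dlm_controld.pcmk" args="-f 0" \
        op monitor interval="120s"
crm configure clone DLM-Clone DLM meta interleave="true"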

I can now set the node that runs the virtual machine resource to standby, and the virtual machine is automatically
started on the other node. Bringing the node back online works, too.
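
For the record, that is just the plain crm commands (xen02 here only as an example, i.e. whichever node currently runs the virtual machine):

crm node standby xen02
crm node online xen02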

When powering the machine off, the failover works as well. But when the dead machine comes back, I get the following
error message when trying to mount the cluster filesystem:
root@xen01:~# mount /dev/drbd/by-res/cluster-ocfs /cluster
mount.ocfs2: Transport endpoint is not connected while mounting /dev/drbd0 on /cluster. Check 'dmesg' for more information on this error.
root@xen01:~# dmesg
[  394.187654] dlm: no local IP address has been set
[  394.188460] dlm: cannot start dlm lowcomms -107
[  394.191062] (mount.ocfs2,4647,0):ocfs2_dlm_init:3001 ERROR: status = -107
[  394.194428] (mount.ocfs2,4647,0):ocfs2_mount_volume:1879 ERROR: status = -107
[  394.201157] ocfs2: Unmounting device (147,0) on (node 0)
[  394.201167] (mount.ocfs2,4647,0):ocfs2_fill_super:1234 ERROR: status = -107
root@xen01:~# /etc/init.d/corosync stop
Stopping corosync daemon: corosync.
root@xen01:~# /etc/init.d/corosync start
Starting corosync daemon: corosync.
root@xen01:~# mount |grep cluster
root@xen01:~# 
root@xen01:~# crm resource list|grep Mount
 Clone Set: Cluster-FS-Mount-Clone [Cluster-FS-Mount]
root@xen01:~# crm resource cleanup Cluster-FS-Mount-Clone
Cleaning up Cluster-FS-Mount:0 on xen01
Cleaning up Cluster-FS-Mount:0 on xen02
Cleaning up Cluster-FS-Mount:1 on xen01
Cleaning up Cluster-FS-Mount:1 on xen02
Waiting for 5 replies from the CRMd..... OK
root@xen01:~# mount |grep cluster
/dev/drbd0 on /cluster type ocfs2 (rw,relat

Strange, isn't it? But maybe you are right; playing around with OCFS2 without fencing is not worth the pain.
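
My guess is that the first mount attempt happened before corosync and dlm_controld had properly rejoined the membership, hence the "dlm: no local IP address has been set" error. Next time I will check the stack before mounting, with something like (commands from the corosync and dlm packages, output left out here):

corosync-cfgtool -s   # ring status of the local node
dlm_tool ls           # lockspaces known to dlm_controld
crm_mon -1            # one-shot view of the resource state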
 
BTW: the crm_gui is running fine on Mac OS X (hackishly compiled, but working).

best regards
 Waldemar

> But I'd not even bother with OCFS2 (or GFS2, for that matter) if you
> don't have fencing. It's not worth the pain.
> 
> You could use SBD, but since you're running OCFS2 on top of DRBD, you
> can't. For your lab setup though you could use a third VM with an iSCSI
> target as the storage back-end.
> 
> 
> Regards,
>    Lars
> 
> -- 
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
> 




