<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 09/05/2016 03:02 PM, Gabriele Bulfon
wrote:<br>
</div>
<blockquote
cite="mid:13621319.53.1473080520636.JavaMail.sonicle@www"
type="cite">
<div style="font-family: Arial; font-size: 13;">I read docs, looks
like sbd fencing is more about iscsi/fc exposed storage
resources.<br>
Here I have real shared disks (seen from solaris with the format
utility as normal sas disks, but on both nodes).<br>
They are all jbod disks, that ZFS organizes in raidz/mirror
pools, so I have 5 disks on one pool in one node, and the other
5 disks on another pool in one node.<br>
How can sbd work in this situation? Has it already been
used/tested on a Solaris env with ZFS ?<br>
</div>
</blockquote>
<br>
You wouldn't need disks at all for sbd: you can use it just to have<br>
pacemaker supervised by a hardware watchdog.<br>
But if you do want to add disks, it shouldn't really matter how they are<br>
accessed, as long as both nodes can concurrently read/write the block devices.<br>
The caching configuration in the controllers might need attention as well.<br>
As an example, I'm currently testing with a simple KVM setup using the<br>
following virsh config for the shared block device:<br>
<br>
<disk type='file' device='disk'><br>
<driver name='qemu' type='raw' cache='none'/><br>
<source file='SHARED_IMAGE_FILE'/><br>
<target dev='vdb' bus='virtio'/><br>
<shareable/><br>
<address type='pci' domain='0x0000' bus='0x00' slot='0x15'
function='0x0'/><br>
</disk><br>
<br>
I don't know about test coverage for sbd on Solaris. It should be<br>
independent of which file system you are using, though, since sbd uses a<br>
raw partition without any filesystem on it anyway.<br>
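<br>
For illustration only, a minimal disk-based setup could look roughly like<br>
this - the device path is a placeholder, and the exact sysconfig location<br>
and agent names should be checked against what ships on your platform:<br>
<br>
# dedicate a small raw slice (a few MB, no filesystem) and initialize it once:<br>
sbd -d /dev/rdsk/cXtYdZs0 create<br>
# point the sbd daemon at it on both nodes (/etc/sysconfig/sbd or equivalent):<br>
SBD_DEVICE="/dev/rdsk/cXtYdZs0"<br>
SBD_WATCHDOG_DEV="/dev/watchdog"<br>
<br>
With that in place pacemaker still has to be told it may rely on sbd for<br>
fencing - via a suitable stonith resource or the stonith-watchdog-timeout<br>
property, depending on the sbd version.<br>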
<br>
<blockquote
cite="mid:13621319.53.1473080520636.JavaMail.sonicle@www"
type="cite">
<div style="font-family: Arial; font-size: 13;"><br>
BTW, is there any other possibility besides sbd?<br>
<br>
</div>
</blockquote>
<br>
Probably - see Ken's suggestions.<br>
Excuse me for thinking a little one-dimensionally at the moment, as I'm<br>
working on an sbd issue ;-)<br>
But when you don't have a proper fencing device, a watchdog is the last<br>
resort for getting something that works reliably. And Pacemaker's way of<br>
using a watchdog is sbd...<br>
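<br>
And just to sketch the watchdog-only (diskless) variant - assuming a recent<br>
enough sbd build and a watchdog device exposed by the platform - the<br>
essentials would be roughly:<br>
<br>
# /etc/sysconfig/sbd (or equivalent) - no SBD_DEVICE, watchdog only:<br>
SBD_WATCHDOG_DEV="/dev/watchdog"<br>
SBD_WATCHDOG_TIMEOUT="5"<br>
# and tell pacemaker it may assume watchdog-based self-fencing:<br>
crm configure property stonith-watchdog-timeout=10s<br>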
<br>
<blockquote
cite="mid:13621319.53.1473080520636.JavaMail.sonicle@www"
type="cite">
<div style="font-family: Arial; font-size: 13;">Last but not
least, is there any way to let ssh-fencing be considered good?<br>
At the moment, with ssh-fencing, if I shut down the second node,
I get all second resources in UNCLEAN state, not taken by the
first one.<br>
If I reboot the second , I only get the node on again, but
resources remain stopped.<br>
</div>
</blockquote>
<br>
Strange... What do the logs say - was the fencing action reported as<br>
successful or not?<br>
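<br>
If it is unclear where to look, something along these lines usually helps<br>
(the log location will differ depending on how corosync/pacemaker were<br>
built on your platform):<br>
<br>
# fencing history as pacemaker recorded it:<br>
stonith_admin --history xstorage2<br>
# and the stonith-related messages on the surviving node:<br>
grep -iE "stonith|fence" /path/to/your/pacemaker-or-corosync.log<br>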
<br>
<blockquote
cite="mid:13621319.53.1473080520636.JavaMail.sonicle@www"
type="cite">
<div style="font-family: Arial; font-size: 13;"><br>
I remember that my tests with heartbeat reacted differently (a halt would
move everything to node1 and bring everything back on restart)<br>
<br>
Gabriele<br>
<br>
<div id="wt-mailcard">
<div style="font-family: Arial;">----------------------------------------------------------------------------------------<br>
</div>
<div style="font-family: Arial;"><b>Sonicle S.r.l. </b>: <a
moz-do-not-send="true" href="http://www.sonicle.com/"
target="_new"><a class="moz-txt-link-freetext" href="http://www.sonicle.com">http://www.sonicle.com</a></a></div>
<div style="font-family: Arial;"><b>Music: </b><a
moz-do-not-send="true"
href="http://www.gabrielebulfon.com/" target="_new"><a class="moz-txt-link-freetext" href="http://www.gabrielebulfon.com">http://www.gabrielebulfon.com</a></a></div>
<div style="font-family: Arial;"><b>Quantum Mechanics : </b><a
moz-do-not-send="true"
href="http://www.cdbaby.com/cd/gabrielebulfon"
target="_new"><a class="moz-txt-link-freetext" href="http://www.cdbaby.com/cd/gabrielebulfon">http://www.cdbaby.com/cd/gabrielebulfon</a></a></div>
</div>
<tt><br>
<br>
<br>
----------------------------------------------------------------------------------<br>
<br>
Da: Klaus Wenninger <a class="moz-txt-link-rfc2396E" href="mailto:kwenning@redhat.com"><kwenning@redhat.com></a><br>
A: <a class="moz-txt-link-abbreviated" href="mailto:users@clusterlabs.org">users@clusterlabs.org</a> <br>
Data: 5 settembre 2016 12.21.25 CEST<br>
Oggetto: Re: [ClusterLabs] ip clustering strange behaviour<br>
<br>
</tt>
<blockquote style="BORDER-LEFT: #000080 2px solid; MARGIN-LEFT:
5px; PADDING-LEFT: 5px"><tt>On 09/05/2016 11:20 AM, Gabriele
Bulfon wrote:<br>
> The dual machine is equipped with a syncro controller
LSI 3008 MPT SAS3.<br>
> Both nodes can see the same jbod disks (10 at the
moment, up to 24).<br>
> Systems are XStreamOS / illumos, with ZFS.<br>
> Each system has one ZFS pool of 5 disks, with different
pool names<br>
> (data1, data2).<br>
> When in active / active, the two machines run different
zones and<br>
> services on their pools, on their networks.<br>
> I have custom resource agents (tested on
pacemaker/heartbeat, now<br>
> porting to pacemaker/corosync) for ZFS pools and zones
migration.<br>
> When I was testing pacemaker/heartbeat, when
ssh-fencing discovered<br>
> the other node to be down (cleanly or abrupt halt), it
was<br>
> automatically using IPaddr and our ZFS agents to take
control of<br>
> everything, mounting the other pool and running any
configured zone in it.<br>
> I would like to do the same with pacemaker/corosync.<br>
> The two nodes of the dual machine have an internal LAN connecting them,<br>
> a 100Mb ethernet: maybe this is reliable enough to trust ssh-fencing?<br>
> Or is there anything I can do to ensure at the
controller level that<br>
> the pool is not in use on the other node?<br>
<br>
It is not just the reliability of the networking connection that makes<br>
ssh-fencing suboptimal. Something in the IP-stack config (dynamic due to<br>
moving resources) might have gone wrong, or resources might be hanging so<br>
that the node can't be brought down gracefully. Hence my suggestion to add<br>
a watchdog (where available) via sbd.<br>
<br>
><br>
> Gabriele<br>
><br>
><br>
><br>
><br>
>
----------------------------------------------------------------------------------<br>
><br>
> Da: Ken Gaillot <a class="moz-txt-link-rfc2396E" href="mailto:kgaillot@redhat.com"><kgaillot@redhat.com></a><br>
> A: <a class="moz-txt-link-abbreviated" href="mailto:gbulfon@sonicle.com">gbulfon@sonicle.com</a> Cluster Labs - All topics
related to<br>
> open-source clustering welcomed
<a class="moz-txt-link-rfc2396E" href="mailto:users@clusterlabs.org"><users@clusterlabs.org></a><br>
> Data: 1 settembre 2016 15.49.04 CEST<br>
> Oggetto: Re: [ClusterLabs] ip clustering strange
behaviour<br>
><br>
> On 08/31/2016 11:50 PM, Gabriele Bulfon wrote:<br>
> > Thanks, got it.<br>
> > So, is it better to use "two_node: 1" or, as
suggested else<br>
> where, or<br>
> > "no-quorum-policy=stop"?<br>
><br>
> I'd prefer "two_node: 1" and letting pacemaker's
options default. But<br>
> see the votequorum(5) man page for what two_node
implies -- most<br>
> importantly, both nodes have to be available when the
cluster starts<br>
> before it will start any resources. Node failure is
handled fine once<br>
> the cluster has started, but at start time, both nodes
must be up.<br>
><br>
> > About fencing, the machine I'm going to implement
the 2-nodes<br>
> cluster is<br>
> > a dual machine with shared disks backend.<br>
> > Each node has two 10Gb ethernets dedicated to the
public ip and the<br>
> > admin console.<br>
> > Then there is a third 100Mb ethernet connecting the two machines<br>
> internally.<br>
> > I was going to use this last one as fencing via
ssh, but looks<br>
> like this<br>
> > way I'm not gonna have ip/pool/zone movements if
one of the nodes<br>
> > freezes or halts without shutting down pacemaker
clean.<br>
> > What should I use instead?<br>
><br>
> I'm guessing as a dual machine, they share a power
supply, so that<br>
> rules<br>
> out a power switch. If the box has IPMI that can
individually power<br>
> cycle each host, you can use fence_ipmilan. If the
disks are<br>
> shared via<br>
> iSCSI, you could use fence_scsi. If the box has a
hardware watchdog<br>
> device that can individually target the hosts, you
could use sbd. If<br>
> none of those is an option, probably the best you could
do is run the<br>
> cluster nodes as VMs on each host, and use fence_xvm.<br>
><br>
> > Thanks for your help,<br>
> > Gabriele<br>
> ><br>
> ><br>
> ><br>
> ><br>
> ><br>
> ><br>
>
----------------------------------------------------------------------------------<br>
> ><br>
> > Da: Ken Gaillot <a class="moz-txt-link-rfc2396E" href="mailto:kgaillot@redhat.com"><kgaillot@redhat.com></a><br>
> > A: <a class="moz-txt-link-abbreviated" href="mailto:users@clusterlabs.org">users@clusterlabs.org</a><br>
> > Data: 31 agosto 2016 17.25.05 CEST<br>
> > Oggetto: Re: [ClusterLabs] ip clustering strange
behaviour<br>
> ><br>
> > On 08/30/2016 01:52 AM, Gabriele Bulfon wrote:<br>
> > > Sorry for reiterating, but my main question
was:<br>
> > ><br>
> > > why does node 1 remove its own IP if I shut down node 2 abruptly?<br>
> > > I understand that it does not take the node 2
IP (because the<br>
> > > ssh-fencing has no clue about what happened
on the 2nd node),<br>
> but I<br>
> > > wouldn't expect it to shut down its own
IP...this would kill any<br>
> > service<br>
> > > on both nodes...where am I going wrong?<br>
> ><br>
> > Assuming you're using corosync 2, be sure you have
"two_node: 1" in<br>
> > corosync.conf. That will tell corosync to pretend
there is always<br>
> > quorum, so pacemaker doesn't need any special
quorum settings.<br>
> See the<br>
> > votequorum(5) man page for details. Of course, you
need fencing<br>
> in this<br>
> > setup, to handle when communication between the
nodes is broken<br>
> but both<br>
> > are still up.<br>
> ><br>
> > ><br>
> ><br>
> > ><br>
> > ><br>
> ><br>
>
------------------------------------------------------------------------<br>
> > ><br>
> > ><br>
> > > *Da:* Gabriele Bulfon
<a class="moz-txt-link-rfc2396E" href="mailto:gbulfon@sonicle.com"><gbulfon@sonicle.com></a><br>
> > > *A:* <a class="moz-txt-link-abbreviated" href="mailto:kwenning@redhat.com">kwenning@redhat.com</a> Cluster Labs - All
topics related to<br>
> > > open-source clustering welcomed
<a class="moz-txt-link-rfc2396E" href="mailto:users@clusterlabs.org"><users@clusterlabs.org></a><br>
> > > *Data:* 29 agosto 2016 17.37.36 CEST<br>
> > > *Oggetto:* Re: [ClusterLabs] ip clustering
strange behaviour<br>
> > ><br>
> > ><br>
> > > Ok, got it, I hadn't gracefully shut
pacemaker on node2.<br>
> > > Now I restarted, everything was up, stopped
pacemaker service on<br>
> > > host2 and I got host1 with both IPs
configured. ;)<br>
> > ><br>
> > > But, though I understand that if I halt host2 without a graceful<br>
> > > shutdown of pacemaker, it will not move IP2 to host1, I don't expect<br>
> > > host1 to lose its own IP! Why?<br>
> > ><br>
> > > Gabriele<br>
> > ><br>
> > ><br>
> ><br>
> > ><br>
> > ><br>
> > ><br>
> > ><br>
> ><br>
>
----------------------------------------------------------------------------------<br>
> > ><br>
> > > Da: Klaus Wenninger
<a class="moz-txt-link-rfc2396E" href="mailto:kwenning@redhat.com"><kwenning@redhat.com></a><br>
> > > A: <a class="moz-txt-link-abbreviated" href="mailto:users@clusterlabs.org">users@clusterlabs.org</a><br>
> > > Data: 29 agosto 2016 17.26.49 CEST<br>
> > > Oggetto: Re: [ClusterLabs] ip clustering
strange behaviour<br>
> > ><br>
> > > On 08/29/2016 05:18 PM, Gabriele Bulfon
wrote:<br>
> > > > Hi,<br>
> > > ><br>
> > > > now that I have IPaddr work, I have a
strange behaviour on<br>
> my test<br>
> > > > setup of 2 nodes, here is my
configuration:<br>
> > > ><br>
> > > > ===STONITH/FENCING===<br>
> > > ><br>
> > > > primitive xstorage1-stonith
stonith:external/ssh-sonicle op<br>
> > > monitor<br>
> > > > interval="25" timeout="25"
start-delay="25" params<br>
> > > hostlist="xstorage1"<br>
> > > ><br>
> > > > primitive xstorage2-stonith
stonith:external/ssh-sonicle op<br>
> > > monitor<br>
> > > > interval="25" timeout="25"
start-delay="25" params<br>
> > > hostlist="xstorage2"<br>
> > > ><br>
> > > > location xstorage1-stonith-pref
xstorage1-stonith -inf:<br>
> xstorage1<br>
> > > > location xstorage2-stonith-pref
xstorage2-stonith -inf:<br>
> xstorage2<br>
> > > ><br>
> > > > property stonith-action=poweroff<br>
> > > ><br>
> > > ><br>
> > > ><br>
> > > > ===IP RESOURCES===<br>
> > > ><br>
> > > ><br>
> > > > primitive xstorage1_wan1_IP
ocf:heartbeat:IPaddr params<br>
> > > ip="1.2.3.4"<br>
> > > > cidr_netmask="255.255.255.0"
nic="e1000g1"<br>
> > > > primitive xstorage2_wan2_IP
ocf:heartbeat:IPaddr params<br>
> > > ip="1.2.3.5"<br>
> > > > cidr_netmask="255.255.255.0"
nic="e1000g1"<br>
> > > ><br>
> > > > location xstorage1_wan1_IP_pref
xstorage1_wan1_IP 100: xstorage1<br>
> > > > location xstorage2_wan2_IP_pref
xstorage2_wan2_IP 100: xstorage2<br>
> > > ><br>
> > > > ===================<br>
> > > ><br>
> > > > So I plumbed e1000g1 with unconfigured
IP on both machines and<br>
> > > started<br>
> > > > corosync/pacemaker, and after some time
I got all nodes<br>
> online and<br>
> > > > started, with IP configured as virtual
interfaces (e1000g1:1 and<br>
> > > > e1000g1:2) one in host1 and one in
host2.<br>
> > > ><br>
> > > > Then I halted host2, and I expected to
have host1 started with<br>
> > > both<br>
> > > > IPs configured on host1.<br>
> > > > Instead, host1 came up with its IP stopped and removed (only<br>
> > > > e1000g1 left unconfigured), while host2, though stopped, was still<br>
> > > > reported as having its IP started (!?).<br>
> > > > Not exactly what I expected...<br>
> > > > What's wrong?<br>
> > ><br>
> > > How did you stop host2? A graceful shutdown of pacemaker? If not ...<br>
> > > Anyway, ssh-fencing only works if the machine is still running ...<br>
> > > So it will stay unclean, and thus pacemaker thinks that the IP might<br>
> > > still be running on it. So this is actually the expected behavior.<br>
> > > You might add a watchdog via sbd if you don't have other fencing<br>
> > > hardware at hand ...<br>
> > > ><br>
> > > > Here is the crm status after I stopped
host 2:<br>
> > > ><br>
> > > > 2 nodes and 4 resources configured<br>
> > > ><br>
> > > > Node xstorage2: UNCLEAN (offline)<br>
> > > > Online: [ xstorage1 ]<br>
> > > ><br>
> > > > Full list of resources:<br>
> > > ><br>
> > > > xstorage1-stonith
(stonith:external/ssh-sonicle): Started<br>
> > > xstorage2<br>
> > > > (UNCLEAN)<br>
> > > > xstorage2-stonith
(stonith:external/ssh-sonicle): Stopped<br>
> > > > xstorage1_wan1_IP
(ocf::heartbeat:IPaddr): Stopped<br>
> > > > xstorage2_wan2_IP
(ocf::heartbeat:IPaddr): Started xstorage2<br>
> > > (UNCLEAN)<br>
> > > ><br>
> > > ><br>
> > > > Gabriele<br>
> > > ><br>
> > > ><br>
> > ><br>
> ><br>
><br>
><br>
><br>
><br>
><br>
<br>
<br>
_______________________________________________<br>
Users mailing list: <a class="moz-txt-link-abbreviated" href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>
<a class="moz-txt-link-freetext" href="http://clusterlabs.org/mailman/listinfo/users">http://clusterlabs.org/mailman/listinfo/users</a><br>
<br>
Project Home: <a class="moz-txt-link-freetext" href="http://www.clusterlabs.org">http://www.clusterlabs.org</a><br>
Getting started:
<a class="moz-txt-link-freetext" href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
Bugs: <a class="moz-txt-link-freetext" href="http://bugs.clusterlabs.org">http://bugs.clusterlabs.org</a><br>
<br>
<br>
</tt></blockquote>
</div>
</blockquote>
<br>
</body>
</html>