[Pacemaker] Does pingd work on openais?

Atanas Dyulgerov atanas.dyulgerov at postpath.com
Wed Mar 19 05:20:46 EDT 2008



-----Original Message-----
From: Lars Marowsky-Bree [mailto:lmb at suse.de] 
Sent: Tuesday, March 18, 2008 4:59 PM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Does pingd work on openais?

On 2008-03-10T11:13:51, Atanas Dyulgerov <atanas.dyulgerov at postpath.com> wrote:

>> STONITH brutally shuts down a node. To do that you need redundant
>> communication lines, smart power devices and definitely a local
>> cluster. For a geographically separated cluster with remote nodes,
>> STONITH is not applicable. The method is called node fencing and, as I
>> said, it has too many obstacles.
>
>All fencing methods require a means of communicating with the device; in
>the case of WAN clusters, the link between the (replicating) storage arrays
>(for example) will also be cut - resource fencing is unavailable for the
>same reasons as STONITH is.

Consider the case where a node loses connectivity to the cluster but still
remains connected to the shared storage. The other nodes, which retain
quorum, can then lock the shared storage resource to stop the errant node
from accessing it. This fencing method does not require communication with
the failed node. That's what RHCS does, I believe.
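
Roughly, the decision on the quorate side would look like this (a purely
illustrative Python sketch of the idea, not actual RHCS or Pacemaker code;
revoke_storage_access() is a made-up hook standing in for whatever mechanism
the storage layer actually provides):

# Illustrative only: how a quorate partition might fence the storage access
# of nodes that dropped out of the membership, without contacting them.
# revoke_storage_access() is a hypothetical hook (it could remove a GNBD
# client or preempt a SCSI reservation); it is NOT a real API.

def on_membership_change(all_nodes, current_members, have_quorum,
                         revoke_storage_access):
    """Called whenever the cluster membership changes."""
    if not have_quorum:
        # Without quorum we must not touch shared storage at all.
        return

    lost_nodes = set(all_nodes) - set(current_members)
    for node in lost_nodes:
        # Lock the errant node out at the storage side; no communication
        # with the failed node itself is required.
        revoke_storage_access(node)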

Shutting down a node seems inconvenient to me: you have to power it back on
manually. What happens if the network link drops twice a day?

>WAN clusters require the concept of self-fencing after loss of site
>quorum.

Has any self-fencing method been implemented for production use so far? I
would like to test it...

>All discussions regarding node versus resource level fencing are
>correct and hold true, but as Andrew explained, we could already support
>resource fencing if the RAs implemented it - if you set
>no-quorum-policy=ignore, you've turned the cluster into a
>resource-driven model, like SteelEye's LifeKeeper.
>
>Yet, the wide-area versus metro-area versus local data center cluster
>discussion is quite separate from this.
>
>> For me a better choice is the 'resource locking' option, aka resource
>> fencing. Instead of killing the errant node, the cluster CRM just
>> fences/locks its I/O access to the shared storage resource until the
>> cluster messaging system reports back a successful service stop on that
>> node. It perfectly suits a DR cluster and needs no additional
>> communication lines. A more elegant solution!
>
>Again, that's not quite true; see above. How does the resource itself
>ensure fencing if the links are all cut? 

The sub-cluster that retains quorum orders the resource to lock out I/O
from the other sub-cluster node(s) that are without quorum. For example,
the GNBD exporter fences I/O access for nodes that have failed or been
disconnected from the cluster.
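
The enforcement then lives entirely on the exporter side, so no link to the
errant node is needed. Something along these lines (a hypothetical Python
sketch of the general idea only; none of these names correspond to real GNBD
interfaces, and the real exporter works differently in detail):

# Hypothetical sketch of server-side I/O fencing, in the spirit of a
# GNBD-style exporter: the export server keeps an allow-list of client
# nodes, and the quorate partition simply removes the errant node from it.

class BlockExporter:
    def __init__(self, allowed_clients):
        self.allowed_clients = set(allowed_clients)

    def fence(self, node):
        """Called by the quorate sub-cluster; the fenced node is not involved."""
        self.allowed_clients.discard(node)

    def handle_io(self, node, request):
        if node not in self.allowed_clients:
            # Reject I/O from a fenced node instead of touching the node itself.
            raise PermissionError(f"node {node} is fenced from this export")
        return self._process(request)

    def _process(self, request):
        # Placeholder for the actual block I/O handling.
        return b"ok"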

>> Heartbeat/Pacemaker does not support resource fencing. To fence
>> resources, the resources have to support locking features by
>> themselves. You cannot lock something that cannot be locked. Software
>> iSCSI targets do not have locking mechanisms, whereas GNBD does.
>> However, GNBD locking features work with Red Hat Cluster Suite only.
>
>iSCSI targets in theory support SCSI reservations. Yet that only works
>if the target is centralized (and thus a SPoF) - if you replicate it,
>see above.

Which iSCSI target software implementations support reservations? Does the
iSCSI OCF RA support them?
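
For reference, reservation-based fencing with SCSI-3 persistent reservations
looks roughly like the sketch below, assuming sg_persist from sg3_utils and a
target that actually implements persistent reservations. The keys and device
path are invented for illustration, and the exact options should be checked
against your sg_persist version:

# Rough sketch: register keys, reserve, and fence a node by preempting its
# registration. Its writes then fail at the target, with no communication
# to the fenced node itself.
import subprocess

DEV = "/dev/sdb"          # hypothetical iSCSI-attached LUN
MY_KEY = "0x1"            # this node's registration key
VICTIM_KEY = "0x2"        # key of the node to be fenced

def sg(*args):
    subprocess.run(["sg_persist", *args, DEV], check=True)

# Each node registers its own key with the device.
sg("--out", "--register", f"--param-sark={MY_KEY}")

# One node takes a "Write Exclusive - Registrants Only" (type 5) reservation,
# so only registered nodes may write.
sg("--out", "--reserve", f"--param-rk={MY_KEY}", "--prout-type=5")

# Fencing = preempting the victim's registration.
sg("--out", "--preempt", f"--param-rk={MY_KEY}",
   f"--param-sark={VICTIM_KEY}", "--prout-type=5")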


>True, resource level fencing is desirable, and the RAs should do it
>automatically. Possibly, certain extensions to Pacemaker might also be
>useful here, though I think that's not the biggest obstacle.

I'm looking forward to those extensions in Pacemaker.

>> >Have you _benchmarked_ that for your workload?
>> Yes, I have benchmarked the application performance. With GNBD it had
>> the best score, then came iSCSI, and NFS was last.
>
>That I find quite surprising. I'd like to duplicate that benchmark
>eventually; do you have the benchmark description and results available
>somewhere?

The application I'm running in the cluster is a mail server. I benchmarked
the server's performance running on GNBD (ext3), iSCSI (ext3) and NFS.
Performance was measured with LoadSim:
http://www.msexchange.org/tutorials/Simulating-Stress-Exchange-2003-LoadSim.html
If the LoadSim results would be meaningful to you, I can share them.


Regards,
Atanas






