[Pacemaker] how to mount drive on SAN with pacemakerresourceagent?

Michael Hittesdorf michael.hittesdorf at chicagotrading.com
Thu Jan 6 10:45:15 EST 2011


This is great information, thanks. I was wondering what criteria are
used to determine that a 'sick' node should be killed? If it can't be
contacted over the network for some length of time? If the resources
can't be restarted on the box? What I'm most worried about is the
scenario where my backup loses contact with the primary due to a network
failure and the backup takes over even though the master is still
running. That would cause both nodes to mount my SAN-attached storage
and potentially corrupt it. I've actually forced this to happen by
disconnecting the master's network adapter on my test cluster, and I
wound up with a split-brain situation where both nodes were actively
running. Would a STONITH device kill the master if the master could not
be contacted over the network? Or would the STONITH device indicate that
the master was OK, prevent the unwanted failover from occurring, and
thus prevent the split-brain scenario I just described?

 

Thanks for all your help. It is much appreciated!

 

Mick

 

________________________________

From: Mike Diehn [mailto:mike.diehn at ansys.com] 
Sent: Thursday, January 06, 2011 9:16 AM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] how to mount drive on SAN with
pacemakerresourceagent?

 

 

You want a STONITH tool that will let your nodes positively kill one
another without needing to rely on the "sick" node for anything.  So,
the ideal solution is, yes, a networked power device.  Something that
will let you power-off the sick node remotely.

 

Lacking that, you could use IPMI if your servers have BMCs (baseboard
management controllers). Almost all server-class machines have one
today: Sun ILOM, Dell DRAC, HP iLO, and so on.
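
As a sketch of what that looks like in practice (the address and
credentials here are placeholders; substitute your own BMC's), you can
sanity-check out-of-band control with ipmitool before wiring it into
the cluster:

```
# Query chassis power state over the BMC's LAN interface.
# Address and credentials are placeholders -- use your own.
ipmitool -I lan -H 10.1.1.59 -U stonith -P 'ShootMeInTheHead' chassis power status

# Fencing amounts to roughly this: cutting power out-of-band,
# with no cooperation needed from the sick node's OS.
ipmitool -I lan -H 10.1.1.59 -U stonith -P 'ShootMeInTheHead' chassis power off
```

If "power status" works from every node against every other node's BMC,
the cluster will be able to fence reliably.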

 

The modules and scripts in /usr/lib64/stonith/plugins will give you an
idea of what's available already.
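
To make that concrete (the path is the SLES location mentioned here and
varies by distribution), two quick ways to see what's on hand:

```
# List the STONITH device types cluster-glue knows about:
stonith -L

# The external/* plugins are ordinary scripts you can read:
ls /usr/lib64/stonith/plugins/external/
```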

 

Do try to resist the temptation to use ssh to issue a shutdown command.
A node sick enough to need shooting may not answer ssh at all, so that
isn't real fencing. If you implement it, you check off 'implement
STONITH' on your list and move happily on thinking your shared
file system is now safe. When it isn't.

 

Does that help? Oh, one more thing: it took me an embarrassingly long
time to discover that there is a "stonith" command and a bunch of
related "stuff."  On my SLES 11 SP1 systems, with the HA Extension
add-on, the stonith stuff came in as part of the RPM package
cluster-glue-1.0.5-0.5.1.

 

Best,

Mike

 

On Thu, Jan 6, 2011 at 9:53 AM, Michael Hittesdorf
<michael.hittesdorf at chicagotrading.com> wrote:

Thanks for your reply. I now have the Filesystem resource working on my
test cluster. I've done some reading on STONITH as you suggested and am
now wondering how to determine which STONITH devices are actually
available on my servers and which one I should choose. The
recommendation I've read suggests using an external UPS that can be
monitored over the network. Is this the best approach? Are there other
STONITH devices that are commonly used? Why choose one over the other?

 

Thanks in advance.  Mick 

 

________________________________

From: Mike Diehn [mailto:mike.diehn at ansys.com] 
Sent: Tuesday, January 04, 2011 2:54 PM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] how to mount drive on SAN with pacemaker
resourceagent?

 

 

To make sure the failed server is actually dead, you want to use
STONITH.  So read about that.  Here are examples from our testing
cluster.  These are broken, so don't use them as they are.  That's why
they are set to "Stopped" right now.  I probably have some timing stuff
very wrong:

 

 

	primitive ShootLebekmfs1 stonith:external/ipmi \
	        meta target-role="Stopped" \
	        params hostname="lebekmfs1" ipaddr="10.1.1.59" userid="stonith" passwd="ShootMeInTheHead" interface="lan"

	primitive ShootLebekmfs2 stonith:external/ipmi \
	        meta target-role="Stopped" \
	        params hostname="lebekmfs2" ipaddr="10.1.1.61" userid="stonith" passwd="ShootMeInTheHead" interface="lan"

 

You can use the ocf:heartbeat:Filesystem resource to mount any file
system you can mount manually.  Here's one from a config in our test
cluster.  This works:

 

	primitive lvTest ocf:heartbeat:Filesystem \
	        params device="/dev/EkmCluVG/lvTest" directory="/srv/test1" fstype="ocfs2" \
	        op monitor interval="10s" timeout="10s"

 

Make sure you remove the file system from your /etc/fstab if you're
going to do it this way.  During testing, for my convenience, I leave it
in, but add the noauto option to prevent it being mounted on boot.
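
For reference, such an entry looks like this (reusing the device and
mount point from the lvTest primitive above; adjust to your own volume):

```
# noauto keeps init from mounting it at boot; the cluster mounts it instead
/dev/EkmCluVG/lvTest  /srv/test1  ocfs2  noauto  0 0
```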

 

Best,

Mike

 

On Tue, Jan 4, 2011 at 2:05 PM, Michael Hittesdorf
<michael.hittesdorf at chicagotrading.com> wrote:

Can I use the Filesystem resource agent to mount a SAN drive in the
event of a failover? How do I ensure that the failed server no longer
has the drive mounted so as to prevent storage corruption? Having read
several of the tutorials, I'm aware of DRBD and the clustered file
systems GFS2 and OCFS2.  However, I don't need simultaneous access to
the disk from both of my cluster nodes. I just want to make the shared
SAN storage available to the primary, active server only as my cluster
is active-passive.  Is there a recommended way to accomplish this?

 

Thanks for your help!

This message is intended only for the personal and confidential use of
the recipients named above. If the reader of this email is not the
intended recipient, you have received this email in error and any
review, dissemination, distribution or copying is strictly prohibited.
If you have received this email in error, please notify the sender
immediately by return email and permanently delete the copy you
received. This message is provided for informational purposes and should
not be construed as a solicitation or offer to buy or sell any
securities or related financial instruments. Neither CTC Holdings nor
any affiliates (CTC) are responsible for any recommendation,
solicitation, offer or agreement or any information about any
transaction, customer account or account activity that may be attached
to or contained in this communication. CTC accepts no liability for any
content contained in the email, or any errors or omissions arising as a
result of e-mail transmission. Any opinions contained in this email
constitute the sender's best judgment at this time and are subject to
change without notice. CTC London Limited is authorized and regulated by
the Financial Services Authority.


_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Mike Diehn
Senior Systems Administrator
ANSYS, Inc - Lebanon, NH Office
mike.diehn at ansys.com, (603) 727-5492








