[ClusterLabs] [Linux-HA] fence_ec2 agent

Dejan Muhamedagic dejanmm at fastmail.fm
Fri Mar 20 04:52:07 EDT 2015


Hi Markus,

On Wed, Mar 18, 2015 at 04:03:18PM +0000, Markus Guertler wrote:
> Hi Kazuhiko, Dejan,
> 
> the new resource agent is very good. Since there were a couple of days between my original question and the answer from
> Kazuhiko, I also have written a stonith agent proof of concept (attached to this email) in order to continue in my
> project. However, I think that your fence_ec2 agent is better from a development perspective and it doesn't make sense
> to have two different agents for the same use case.
> 
> Nevertheless, I've implemented an idea, that is very useful in EC2 environments with clusters that have more than two
> nodes: All EC2 instances that belong to a cluster get a unique cluster name via an EC2 instance tag. The agent uses this
> tag to determine all cluster nodes that belong to its own cluster:
> 
> --- SNIP ---
>     gethosts)
>         # List of hostnames of this cluster
>         init_agent
>         ec2-describe-instances --filter "tag-key=Clustername" --filter "tag-value=$clustername" | grep "^TAG" | grep "Hostname" | awk '{ print $5 }' | sort -u
> --- SNIP ---
> 
> The advantage of this method is that you need just one configuration snippet for all nodes. This allows you to dynamically
> add or remove EC2 instances / cluster nodes to/from a cluster without having to touch the cluster configuration.
> Dynamically adding or removing nodes (compute instances) is a very common scenario in a cloud.
> 
> Would it be possible to implement this idea as an additional configuration method in the fence_ec2 agent?

It sounds like a good idea to me.
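
For illustration, here is a minimal sketch of how such a tag-based host list
might look with the AWS CLI rather than the older ec2-describe-instances tool.
The function name cluster_hosts is made up for this example, the tag names
Clustername and Hostname are taken from the snippet above, and the --query
expression is an assumption about how the tag values could be extracted. It
has not been tested against the agent in the repository.

```shell
# Hypothetical helper (not part of the agent): print the hostnames of all
# instances whose "Clustername" tag matches the given cluster name.
# Usage: cluster_hosts <clustername> [profile]
cluster_hosts() {
    local clustername=$1
    local profile=${2:-default}

    # Filter instances by tag on the server side, then pull out only the
    # value of each instance's "Hostname" tag. Text output may put several
    # values on one line, so split on whitespace before de-duplicating.
    aws ec2 describe-instances \
        --profile "$profile" --output text \
        --filters "Name=tag:Clustername,Values=$clustername" \
        --query 'Reservations[].Instances[].Tags[?Key==`Hostname`].Value[]' |
        tr -s '[:space:]' '\n' | sort -u
}
```

A gethosts action could then simply call something like
cluster_hosts "$clustername" "$ec2_profile", so that one configuration
snippet would cover every node carrying the tag.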

Cheers,

Dejan

P.S. CC-ing Kazuhiko-san too, as this discussion took place at
linux-ha ML.

> Cheers,
> Markus
>  
> >>> 東一彦 <higashi.kazuhiko at lab.ntt.co.jp> 3/12/2015 10:44 AM >>> 
> Hi Dejan
> 
> Thank you for adding it and fixing some issues!
> 
> 
>  > I was not able to test it, hope it works :)
> I confirmed that it works fine in my AWS environment :)
> 
> 
> Regards,
> Kazuhiko Higashi
> 
> On 2015/03/11 21:27, Dejan Muhamedagic wrote:
> > Hi Kazuhiko-san,
> >
> > On Wed, Mar 11, 2015 at 02:36:43PM +0900, 東一彦 wrote:
> >> Hi, Dejan
> >>
> >> Thank you for the comment.
> >>
> >> I'd like to contribute it as glue stonith agents.
> >>
> >> So I have renamed it to just "ec2".
> >>
> >> Would you please add it to glue repository (http://hg.linux-ha.org/glue/) ?
> >
> > I just added your stonith agent. There was this change in the
> > initial changeset:
> >
> > - replaced '-' which is not allowed in identifiers with '_' in
> >    function getinfo_xml().
> >
> > There were other smaller changes. You can find them in the
> > repository.
> >
> > I was not able to test it, hope it works :)
> >
> > Many thanks for the contribution.
> >
> > Cheers,
> >
> > Dejan
> >
> >> Regards,
> >> Kazuhiko Higashi
> >>
> >> On 2015/03/06 2:38, Dejan Muhamedagic wrote:
> >>> Hi,
> >>>
> >>> On Tue, Mar 03, 2015 at 05:13:49PM +0900, 東一彦 wrote:
> >>>> Dear Markus,
> >>>>
> >>>> I was also thinking the same thing.
> >>>> So I have already created a new one.
> >>>
> >>> Perhaps you'd like to then contribute it upstream? Either to
> >>> glue stonith agents or RHT fencing agents. It appears that the
> >>> agent is using the stonith interface, but the name reflects the
> >>> fencing agents naming scheme.
> >>>
> >>> Cheers,
> >>>
> >>> Dejan
> >>>
> >>>> [ChangeSet]
> >>>> - The API used was changed from "Amazon EC2 CLI" to "AWS CLI".
> >>>>    -- "AWS CLI" is based on Python, so CPU load might be reduced.
> >>>>
> >>>> - The "--private-key" and "--cert" options are deprecated in AWS CLI.
> >>>>    So I added a new option, "--profile", which selects a specific profile from the credential file.
> >>>>    The default is "".
> >>>>
> >>>>
> >>>> [How to use]
> >>>> - Please install the "AWS CLI".
> >>>>    http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html
> >>>>
> >>>> - Please copy fence_ec2 to /usr/lib64/stonith/plugins/external/
> >>>>    and set its permissions to 755.
> >>>>
> >>>> - Please configure crm as in this example.
> >>>>    - The instance whose "Name" tag is set to "node01" is the one that gets fenced.
> >>>>    ------
> >>>>    primitive prmStonith1-2 stonith:external/fence_ec2 \		
> >>>> 	params \	
> >>>> 		pcmk_off_timeout="300s" \
> >>>> 		port="node01" \
> >>>> 		tag="Name" \
> >>>> 	op start interval="0s" timeout="60s" \	
> >>>> 	op monitor interval="3600s" timeout="60s" \	
> >>>> 	op stop interval="0s" timeout="60s"	
> >>>>    ------
> >>>>
> >>>>
> >>>> Regards,
> >>>> Kazuhiko Higashi
> >>>>
> >>>> On 2015/02/25 7:22, Markus Guertler wrote:
> >>>>> Dear list,
> >>>>> I was just trying to configure the fence_ec2 stonith agent from 2012, written by Andrew Beekhof. It looks like
> >>>>> this one is not working anymore with newer stonith / cluster versions. Is there any other EC2 agent that is still
> >>>>> maintained?
> >>>>>
> >>>>> If not, I'll write one myself. However, I'd like to check all options first.
> >>>>>
> >>>>> Cheers,
> >>>>> Markus
> >>>>>
> >>>>> _______________________________________________
> >>>>> Linux-HA mailing list
> >>>>> Linux-HA at lists.linux-ha.org
> >>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >>>>> See also: http://linux-ha.org/ReportingProblems
> >>>>>
> >>>>
> >>>>
> >
> >> #!/bin/bash
> >>
> >> description="
> >> fence_ec2 is an I/O Fencing agent which can be used with Amazon EC2 instances.
> >>
> >> API functions used by this agent:
> >> - aws ec2 describe-tags
> >> - aws ec2 describe-instances
> >> - aws ec2 stop-instances
> >> - aws ec2 start-instances
> >> - aws ec2 reboot-instances
> >>
> >> If the uname used by the cluster node is any of:
> >>   - Public DNS name (or part there of),
> >>   - Private DNS name (or part there of),
> >>   - Instance ID (eg. i-4f15a839)
> >>   - Contents of tag associated with the instance
> >> then the agent should be able to automatically discover the instances it can control.
> >>
> >> If the tag containing the uname is not [Name], then it will need to be specified using the [tag] option.
> >> "
> >>
> >> #
> >> # Copyright (c) 2011-2013 Andrew Beekhof
> >> # Copyright (c) 2014 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> >> #                    All Rights Reserved.
> >> #
> >> # This program is free software; you can redistribute it and/or modify
> >> # it under the terms of version 2 of the GNU General Public License as
> >> # published by the Free Software Foundation.
> >> #
> >> # This program is distributed in the hope that it would be useful, but
> >> # WITHOUT ANY WARRANTY; without even the implied warranty of
> >> # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> >> #
> >> # Further, this software is distributed without any warranty that it is
> >> # free of the rightful claim of any third person regarding infringement
> >> # or the like.  Any license provided herein, whether implied or
> >> # otherwise, applies only to this software file.  Patent licenses, if
> >> # any, provided herein do not apply to combinations of this program with
> >> # other software, or any other product whatsoever.
> >> #
> >> # You should have received a copy of the GNU General Public License
> >> # along with this program; if not, write the Free Software Foundation,
> >> # Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
> >> #
> >> #######################################################################
> >>
> >> quiet=0
> >> port_default=""
> >>
> >> instance_not_found=0
> >> unknown_are_stopped=0
> >>
> >> action_default="reset"         # Default fence action
> >> ec2_tag_default="Name"	       # EC2 Tag containing the instance's uname
> >>
> >> sleep_time="1"
> >>
> >> ec2_tag=${tag}
> >>
> >> : ${ec2_tag=${ec2_tag_default}}
> >> : ${port=${port_default}}
> >>
> >> function usage()
> >> {
> >> cat <<EOF
> >> `basename $0` - A fencing agent for Amazon EC2 instances
> >>
> >> $description
> >>
> >> Usage: `basename $0` -o|--action [-n|--port] [options]
> >> Options:
> >>   -h, --help 		This text
> >>   -V, --version		Version information
> >>   -q, --quiet 		Reduced output mode
> >>
> >> Commands:
> >>   -o, --action		Action to perform: on|off|reboot|status|monitor
> >>   -n, --port 		The name of a machine/instance to control/check
> >>
> >> Additional Options:
> >>   -p, --profile		Use a specific profile from your credential file.
> >>   -t, --tag 		Name of the tag containing the instance's uname
> >>
> >> Dangerous options:
> >>   -U, --unknown-are-stopped 	Assume any unknown instance is safely stopped
> >>
> >> EOF
> >>
> >> 	exit 0;
> >> }
> >>
> >> function getinfo-xml()
> >> {
> >> 	cat <<EOF
> >> <parameters>
> >> 	<parameter name="port" unique="1" required="1">
> >> 		<content type="string" />
> >> 		<shortdesc lang="en">The name/id/tag of an instance to control/check</shortdesc>
> >> 	</parameter>
> >> 	<parameter name="profile" unique="0" required="0">
> >> 		<content type="string" default="default" />
> >> 		<shortdesc lang="en">Use a specific profile from your credential file.</shortdesc>
> >> 	</parameter>
> >> 	<parameter name="tag" unique="0" required="1">
> >> 		<content type="string" default="Name" />
> >> 		<shortdesc lang="en">Name of the tag containing the instance's uname</shortdesc>
> >> 	</parameter>
> >> 	<parameter name="unknown_are_stopped" unique="0" required="0">
> >> 		<content type="string" default="false" />
> >> 		<shortdesc lang="en">DANGER: Assume any unknown instance is safely stopped</shortdesc>
> >> 	</parameter>
> >> </parameters>
> >> EOF
> >> 	exit 0;
> >> }
> >>
> >> function metadata()
> >> {
> >> 	cat <<EOF
> >> <?xml version="1.0" ?>
> >> <resource-agent name="fence_ec2" shortdesc="Fencing agent for Amazon EC2 instances" >
> >> 	<longdesc>
> >> $description
> >> 	</longdesc>
> >> 	<parameters>
> >> 	<parameter name="action" unique="0" required="1">
> >> 		<getopt mixed="-o, --action=[action]" />
> >> 		<content type="string" default="reboot" />
> >> 		<shortdesc lang="en">Fencing Action</shortdesc>
> >> 	</parameter>
> >> 	<parameter name="port" unique="1" required="1">
> >> 		<getopt mixed="-n, --port=[port]" />
> >> 		<content type="string" />
> >> 		<shortdesc lang="en">The name/id/tag of an instance to control/check</shortdesc>
> >> 	</parameter>
> >> 	<parameter name="profile" unique="0" required="0">
> >> 		<getopt mixed="-p, --profile=[profile]" />
> >> 		<content type="string" default="default" />
> >> 		<shortdesc lang="en">Use a specific profile from your credential file.</shortdesc>
> >> 	</parameter>
> >> 	<parameter name="tag" unique="0" required="1">
> >> 		<getopt mixed="-t, --tag=[tag]" />
> >> 		<content type="string" default="Name" />
> >> 		<shortdesc lang="en">Name of the tag containing the instance's uname</shortdesc>
> >> 	</parameter>
> >> 	<parameter name="unknown-are-stopped" unique="0" required="0">
> >> 		<getopt mixed="-U, --unknown-are-stopped" />
> >> 		<content type="string" default="false" />
> >> 		<shortdesc lang="en">DANGER: Assume any unknown instance is safely stopped</shortdesc>
> >> 	</parameter>
> >> 	</parameters>
> >> 	<actions>
> >> 	<action name="on" />
> >> 	<action name="off" />
> >> 	<action name="reboot" />
> >> 	<action name="status" />
> >> 	<action name="list" />
> >> 	<action name="monitor" />
> >> 	<action name="metadata" />
> >> 	</actions>
> >> </resource-agent>
> >> EOF
> >> 	exit 0;
> >> }
> >>
> >> function instance_for_port()
> >> {
> >> 	local port=$1
> >> 	local instance=""
> >>
> >> 	# Look for port name -n in the INSTANCE data
> >> 	instance=`aws ec2 describe-instances $options | grep "^INSTANCES[[:space:]].*[[:space:]]$port[[:space:]]" | awk '{print $8}'`
> >> 	if [ -z "$instance" ]; then
> >> 		# Look for port name -n in the Name TAG
> >> 		instance=`aws ec2 describe-tags $options | grep "^TAGS[[:space:]]$ec2_tag[[:space:]].*[[:space:]]instance[[:space:]]$port$" | awk '{print $3}'`
> >> 	fi
> >>
> >> 	if [ -z "$instance" ]; then
> >> 		instance_not_found=1
> >> 		instance=$port
> >> 	fi
> >>
> >> 	echo $instance
> >> }
> >>
> >> function instance_on()
> >> {
> >> 	aws ec2 start-instances $options --instance-ids $instance
> >> }
> >>
> >> function instance_off()
> >> {
> >> 	if [ "$unknown_are_stopped" = 1 ] && [ "$instance_not_found" = 1 ]; then
> >> 		: nothing to do
> >> 		ha_log.sh info "Assuming unknown instance $instance is already off"
> >> 	else
> >> 		aws ec2 stop-instances $options --instance-ids $instance --force
> >> 	fi
> >> }
> >>
> >> function instance_status()
> >> {
> >> 	local instance=$1
> >> 	local status="unknown"
> >> 	local rc=1
> >>
> >> 	# List of instances and their current status
> >> 	if [ "$unknown_are_stopped" = 1 ] && [ "$instance_not_found" = 1 ]; then
> >> 		ha_log.sh info "$instance stopped (unknown)"
> >> 	else
> >> 		status=`aws ec2 describe-instances $options --instance-ids $instance | awk '{
> >> 			if (/^STATE\t/) { printf "%s", $3 }
> >> 			}'`
> >> 		rc=$?
> >> 	fi
> >> 	ha_log.sh info "status check for $instance is $status"
> >> 	echo $status
> >> 	return $rc
> >> }
> >>
> >>
> >> TEMP=`getopt -o qVho:e:p:n:t:U --long version,help,action:,port:,option:,profile:,tag:,quiet,unknown-are-stopped \
> >>       -n 'fence_ec2' -- "$@"`
> >>
> >> if [ $? != 0 ];then
> >>      usage
> >>      exit 1
> >> fi
> >>
> >> # Note the quotes around `$TEMP': they are essential!
> >> eval set -- "$TEMP"
> >>
> >> if [ -z $1 ]; then
> >> 	# If there are no command line args, look for options from stdin
> >> 	while read line; do
> >> 		case $line in
> >> 			option=*|action=*) action=`echo $line | sed s/.*=//`;;
> >> 			port=*)        port=`echo $line | sed s/.*=//`;;
> >> 			profile=*)     ec2_profile=`echo $line | sed s/.*=//`;;
> >> 			tag=*)         ec2_tag=`echo $line | sed s/.*=//`;;
> >> 			quiet*)        quiet=1;;
> >> 			unknown-are-stopped*) unknown_are_stopped=1;;
> >> 			--);;
> >> 			*) ha_log.sh err "Invalid command: $line";;
> >> 		esac
> >> 	done
> >> fi
> >>
> >> while true ; do
> >> 	case "$1" in
> >> 		-o|--action|--option) action=$2;   shift; shift;;
> >> 		-n|--port)            port=$2;     shift; shift;;
> >> 		-p|--profile)         ec2_profile=$2; shift; shift;;
> >> 		-t|--tag)	      ec2_tag=$2; shift; shift;;
> >> 		-U|--unknown-are-stopped) unknown_are_stopped=1; shift;;
> >> 		-q|--quiet) quiet=1; shift;;
> >> 		-V|--version) echo "1.0.0"; exit 0;;
> >> 		--help|-h)
> >> 			usage;
> >> 			exit 0;;
> >> 		--) shift ; break ;;
> >> 		*) ha_log.sh err "Unknown option: $1. See --help for details."; exit 1;;
> >> 	esac
> >> done
> >>
> >> [ -n "$1" ] && action=$1
> >>
> >> if [ -z "$ec2_profile" ]; then
> >> 	options="--output text --profile default"
> >> else
> >> 	options="--output text --profile $ec2_profile "
> >> fi
> >>
> >> action=`echo $action | tr 'A-Z' 'a-z'`
> >>
> >> case $action in
> >> 	metadata)
> >> 		metadata
> >> 	;;
> >> 	getinfo-xml)
> >> 		getinfo-xml
> >> 	;;
> >> 	getconfignames)
> >> 		for i in profile port tag
> >> 		do
> >> 			echo $i
> >> 		done
> >> 		exit 0
> >> 	;;
> >> 	getinfo-devid)
> >> 		echo "EC2 STONITH device"
> >> 		exit 0
> >> 	;;
> >> 	getinfo-devname)
> >> 		echo "EC2 STONITH external device"
> >> 		exit 0
> >> 	;;
> >> 	getinfo-devdescr)
> >> 		echo "fence_ec2 is an I/O Fencing agent which can be used with Amazon EC2 instances."
> >> 		exit 0
> >> 	;;
> >> 	getinfo-devurl)
> >> 		echo ""
> >> 		exit 0
> >> 	;;
> >> esac
> >>
> >> # get my instance id
> >> myinstance=`curl http://169.254.169.254/latest/meta-data/instance-id`
> >>
> >> # check my status.
> >> # When an EC2 instance is stopped by "aws ec2 stop-instances", the OS shutdown sequence is executed.
> >> # While the OS is shutting down, Pacemaker can still execute STONITH operations.
> >> # So if my status is not "running", I assume that I have already been fenced. To prevent nodes from
> >> # fencing each other in a split-brain, I don't fence the other node.
> >> if [ -z "$myinstance" ]; then
> >> 	ha_log.sh err "Failed to get my instance ID, so cannot check my status."
> >> 	exit 1
> >> fi
> >> mystatus=`instance_status $myinstance`
> >> if [ "$mystatus" != "running" ]; then #do not fence
> >> 	ha_log.sh warn "I was already fenced (My instance status=$mystatus). I don't fence other node."
> >> 	exit 1
> >> fi
> >>
> >> # get target's instance id
> >> instance=""
> >> if [ ! -z "$port" ]; then
> >> 	instance=`instance_for_port $port $options`
> >> fi
> >>
> >> case $action in
> >> 	reboot|reset)
> >> 		status=`instance_status $instance`
> >> 		if [ "$status" != "stopped" ]; then
> >> 			instance_off
> >> 		fi
> >> 		while true;
> >> 		do
> >> 			status=`instance_status $instance`
> >> 			if [ "$status" = "stopped" ]; then
> >> 				break
> >> 			fi
> >> 			sleep $sleep_time
> >> 		done
> >> 		instance_on
> >> 		while true;
> >> 		do
> >> 			status=`instance_status $instance`
> >> 			if [ "$status" = "running" ]; then
> >> 				break
> >> 			fi
> >> 			sleep $sleep_time
> >> 		done
> >> 	;;
> >> 	poweron|on)
> >> 		instance_on
> >> 		while true;
> >> 		do
> >> 			status=`instance_status $instance`
> >> 			if [ "$status" = "running" ]; then
> >> 				break
> >> 			fi
> >> 			sleep $sleep_time
> >> 		done
> >> 	;;
> >> 	poweroff|off)
> >> 		instance_off
> >> 		while true;
> >> 		do
> >> 			status=`instance_status $instance`
> >> 			if [ "$status" = "stopped" ]; then
> >> 				break
> >> 			fi
> >> 			sleep $sleep_time
> >> 		done
> >> 	;;
> >> 	monitor)
> >> 		# Is the device ok?
> >> 		aws ec2 describe-instances $options | grep INSTANCES &> /dev/null
> >> 	;;
> >> 	gethosts|hostlist|list)
> >> 		# List of names we know about
> >> 		a=`aws ec2 describe-instances $options | awk -v tag_pat="^TAGS\t$ec2_tag\t" -F '\t' '{
> >> 			if (/^INSTANCES/) { printf "%s\n", $8 }
> >> 			else if ( $1"\t"$2"\t" ~ tag_pat ) { printf "%s\n", $3 }
> >> 			}' | sort -u`
> >> 		echo $a
> >> 	;;
> >> 	stat|status)
> >> 		instance_status $instance > /dev/null
> >> 	;;
> >> 	*) ha_log.sh err "Unknown action: $action"; exit 1;;
> >> esac
> >>
> >> status=$?
> >>
> >> if [ $quiet -eq 1 ]; then
> >> 	: nothing
> >> elif [ $status -eq 0 ]; then
> >> 	ha_log.sh info "Operation $action passed"
> >> else
> >> 	ha_log.sh err "Operation $action failed: $status"
> >> fi
> >> exit $status
> >
> >
> >
> >
> 
> 
> -- 
> ----------------------------------------------------
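>   Kazuhiko Higashi (東 一彦)
>    NTT OSS Center, Platform Technology Unit, High Reliability Group
>    (Service Innovation Laboratory Group, Software Innovation Center, OSS Promotion Project)
>   Mail:higashi.kazuhiko at lab.ntt.co.jp
>   Tel :03-5860-5135 (direct: 5111)
>   NTT Shinagawa TWINS Building 11F, 1-9-1 Konan, Minato-ku, Tokyo 108-8019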
> ----------------------------------------------------
> 

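The reboot, on and off actions in the quoted script wait for the target state
in "while true" loops without an upper bound, so a state that never arrives
would only be interrupted by the cluster's own operation timeout. A bounded
variant could be sketched as follows. wait_for_state is a hypothetical helper
that is not part of the agent, and the retry count and delay are assumptions
to be tuned against values such as pcmk_off_timeout.

```shell
# Hypothetical helper: run a status command repeatedly until it prints the
# wanted state, giving up after max_tries attempts.
# Usage: wait_for_state <wanted> <max_tries> <delay_seconds> <command> [args...]
wait_for_state() {
    local wanted=$1
    local max_tries=$2
    local delay=$3
    shift 3

    local tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        # "$@" is the status command; its stdout is compared to the target.
        if [ "$("$@")" = "$wanted" ]; then
            return 0
        fi
        tries=$((tries + 1))
        sleep "$delay"
    done

    # The wanted state was never reported within the allowed attempts.
    return 1
}
```

In the agent this might be used as
wait_for_state stopped 300 1 instance_status "$instance", with a non-zero
return treated as a failed fencing operation.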

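One detail of the quoted script that may be worth a second look:
instance_for_port is called through command substitution, which runs in a
subshell, so a flag such as instance_not_found that is set inside the function
does not reach the calling shell. A minimal sketch of reporting "not found"
through the exit status instead is shown below. lookup_instance and its
"name=id" list are purely illustrative stand-ins for the real AWS lookup.

```shell
# Hypothetical stand-in for instance_for_port: resolve a node name from a
# space-separated list of name=id pairs. Prints the id and returns 0 when
# found; otherwise prints the name unchanged and returns 1.
lookup_instance() {
    local port=$1
    local known=$2
    local entry

    for entry in $known; do
        case $entry in
            "$port"=*)
                echo "${entry#*=}"
                return 0
                ;;
        esac
    done

    # Fall back to the given name, as the quoted script does.
    echo "$port"
    return 1
}

# The exit status of a plain assignment is that of the command substitution,
# so the caller can set the flag in its own shell:
instance_not_found=0
instance=$(lookup_instance node01 "node01=i-4f15a839") || instance_not_found=1
```

Note that writing "local instance=$(...)" would hide that exit status behind
the status of "local" itself, so the assignment is kept separate here.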
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org




