[ClusterLabs] [Linux-HA] fence_ec2 agent

東一彦 higashi.kazuhiko at lab.ntt.co.jp
Wed Mar 25 01:47:01 UTC 2015


Hi Markus,

I implemented it as a trial.

[diff from http://hg.linux-ha.org/glue/rev/9da0680bc9c0 ]
50d49
< port_default=""
60c59
< ec2_tag=${tag}
---
 > [ -n "$tag" ] && ec2_tag="$tag"
63d61
< : ${port=${port_default}}
97c95
<       <parameter name="port" unique="1" required="1">
---
 >       <parameter name="port" unique="1" required="0">
105c103
<       <parameter name="tag" unique="0" required="1">
---
 >       <parameter name="tag" unique="0" required="0">
132c130
<       <parameter name="port" unique="1" required="1">
---
 >       <parameter name="port" unique="1" required="0">
142c140
<       <parameter name="tag" unique="0" required="1">
---
 >       <parameter name="tag" unique="0" required="0">
221a220,224
 > function monitor()
 > {
 >               # Is the device ok?
 >               aws ec2 describe-instances $options | grep INSTANCES &> /dev/null
 > }
267a271
 > [ -n "$2" ] && node_to_fence=$2
326a331,334
 > if [ -z "$port" ]; then
 >       port="$node_to_fence"
 > fi
 >
379,380c387
<               # Is the device ok?
<               aws ec2 describe-instances $options | grep INSTANCES &> /dev/null
---
 >               monitor
391c398
<               instance_status $instance > /dev/null
---
 >               monitor
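
For reference, the "port" fallback added at 326a331 in the diff above can be sketched in isolation as follows. This is only an illustration: the function name resolve_port is hypothetical and does not exist in the agent, which performs the same check inline.

```shell
#!/bin/bash
# Hypothetical helper, for illustration only; the agent does this inline.
#   $1 = configured "port" parameter (may be empty)
#   $2 = node name passed to the agent as its 2nd positional argument
resolve_port() {
    local port=$1
    local node_to_fence=$2
    if [ -z "$port" ]; then
        # No "port" configured: fall back to the node to fence
        port="$node_to_fence"
    fi
    echo "$port"
}

resolve_port "" node01        # no port configured, falls back to the node name
resolve_port node02 node01    # an explicitly configured port wins
```

An explicitly configured "port" therefore keeps its old meaning, and existing configurations should not need to change.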



It works fine in my environment with the two configuration patterns below.

[pattern No.1]
Without the "port" and "tag" parameters.
Each instance has a "Name=<uname>" tag.

----
primitive prmStonith1-2 stonith:external/ec2 \
          params \
                  pcmk_off_timeout="120s" \
          op start interval="0s" timeout="60s" \
          op monitor interval="3600s" timeout="60s" \
          op stop interval="0s" timeout="60s"
----
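
With this configuration the node name is looked up in the "Name" tag. A minimal sketch of that lookup follows. It assumes the text output of "aws ec2 describe-tags" is tab-separated in the order TAGS, key, resource ID, resource type, value, which is what the agent's grep pattern expects. The function name instance_for_tag and the sample instance IDs are invented for this example.

```shell
#!/bin/bash
# Hypothetical helper: reads describe-tags text output on stdin and prints
# the resource ID of the instance whose tag $1 has the value $2.
instance_for_tag() {
    awk -F '\t' -v key="$1" -v val="$2" \
        '$1 == "TAGS" && $2 == key && $4 == "instance" && $5 == val { print $3 }'
}

# Invented sample data standing in for: aws ec2 describe-tags --output text
sample=$(printf 'TAGS\tName\ti-0000000a\tinstance\tnode01\nTAGS\tName\ti-0000000b\tinstance\tnode02\n')

# Look up the instance ID for node02 via the "Name" tag
printf '%s\n' "$sample" | instance_for_tag Name node02
```

Unlike the grep pattern in the agent, this sketch compares whole fields, so a node name that is a prefix of another one cannot match by accident.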


[pattern No.2]
With only the "tag" parameter (without the "port" parameter).
The 1st instance (node01) has a "Cluster1=node01" tag.
The 2nd instance (node02) has a "Cluster1=node02" tag.

----
primitive prmStonith1-2 stonith:external/ec2 \
          params \
                  pcmk_off_timeout="120s" \
                  tag="Cluster1" \
          op start interval="0s" timeout="60s" \
          op monitor interval="3600s" timeout="60s" \
          op stop interval="0s" timeout="60s"
----
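
With tag="Cluster1", the list of nodes that can be fenced comes from the values of that tag. The sketch below shows the idea in a simplified form. It assumes the TAGS lines in the text output of "aws ec2 describe-instances" are tab-separated as TAGS, key, value. The function name hosts_for_tag and the sample data are invented. The agent's own "gethosts" action additionally prints a column from the INSTANCES lines, which is left out here.

```shell
#!/bin/bash
# Hypothetical helper: reads describe-instances text output on stdin and
# prints the distinct values of tag $1, one node name per line.
hosts_for_tag() {
    awk -F '\t' -v key="$1" '$1 == "TAGS" && $2 == key { print $3 }' | sort -u
}

# Invented sample data: only the TAGS lines of the output are shown.
sample=$(printf 'TAGS\tCluster1\tnode02\nTAGS\tName\tweb-a\nTAGS\tCluster1\tnode01\n')

# List the node names recorded in the "Cluster1" tag
printf '%s\n' "$sample" | hosts_for_tag Cluster1
```

Instances that carry only other tags (here "Name=web-a") are not listed, so one stonith resource covers exactly the instances tagged for that cluster.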


Regards,
Kazuhiko Higashi


On 2015/03/24 20:48, 東一彦 wrote:
> Hi Markus,
>
> Thank you for the comment.
>
>  > Would it be possible, to implement this idea as an additional configuration method to the fence_ec2 agent?
> I think that your idea is good.
>
> So, I will try to implement it.
> I'm going to change fence_ec2 (ec2) in the following points.
>
>   - the "tag" and "port" options will no longer be required.
>
>   - if the "port" option is not set, the 2nd argument of ec2 will be used as the "port".
>     - the 2nd argument of ec2 is the "node to fence".
>
>   - the "stat" and "status" actions will behave the same as the "monitor" action.
>     (so that the "port" parameter is not needed for the "stat" action.)
>
>
> With the above modifications, if the uname is set in the Name tag,
> the "tag" and "port" parameters no longer need to be set.
>
> ----
> primitive prmStonith1-2 stonith:external/ec2 \
>          params \
>                  pcmk_off_timeout="120s" \
>          op start interval="0s" timeout="60s" \
>          op monitor interval="3600s" timeout="60s" \
>          op stop interval="0s" timeout="60s"
> ----
>
>
> You can use the "tag" parameter like your "Clustername" tag.
> If the cluster nodes (instances) have a "Cluster1" tag, and the uname is set in that tag,
> it works just as you would expect.
>
> ----
> primitive prmStonith1-2 stonith:external/ec2 \
>          params \
>                  pcmk_off_timeout="120s" \
>                  tag="Cluster1" \
>          op start interval="0s" timeout="60s" \
>          op monitor interval="3600s" timeout="60s" \
>          op stop interval="0s" timeout="60s"
> ----
>
> The 1st instance has the "Cluster1=node01" tag.
> The 2nd instance has the "Cluster1=node02" tag.
> The 3rd instance has the "Cluster1=node03" tag.
> ...
> The prmStonith1-2 resource can fence node01, node02 and node03.
>
>
> If you like the above, I will implement it.
>
>
> Regards,
> Kazuhiko Higashi
>
>
> On 2015/03/19 1:03, Markus Guertler wrote:
>> Hi Kazuhiko, Dejan,
>>
>> the new resource agent is very good. Since there were a couple of days between my original question and the answer from
>> Kazuhiko, I also have written a stonith agent proof of concept (attached to this email) in order to continue in my
>> project. However, I think that your fence_ec2 agent is better from a development perspective and it doesn't make sense
>> to have two different agents for the same use case.
>>
>> Nevertheless, I've implemented an idea, that is very useful in EC2 environments with clusters that have more than two
>> nodes: All EC2 instances that belong to a cluster get a unique cluster name via an EC2 instance tag. The agent uses this
>> tag to determine all cluster nodes that belong to its own cluster.
>>
>> --- SNIP ---
>>      gethosts)
>>          # List of hostnames of this cluster
>>          init_agent
>>          ec2-describe-instances --filter "tag-key=Clustername" --filter "tag-value=$clustername" | grep "^TAG" | grep "Hostname" | awk '{ print $5 }' | sort -u
>> --- SNIP ---
>>
>> The advantage of this method is that you need just one configuration snippet for all nodes. This allows EC2 instances /
>> cluster nodes to be added to or removed from a cluster dynamically without having to touch the cluster configuration.
>> Dynamically adding or removing nodes (compute instances) is a very common scenario in a cloud.
>>
>> Would it be possible, to implement this idea as an additional configuration method to the fence_ec2 agent?
>>
>> Cheers,
>> Markus
>>
>>>>> 東一彦 <higashi.kazuhiko at lab.ntt.co.jp> 3/12/2015 10:44 AM >>>
>> Hi Dejan
>>
>> Thank you for adding it and fixing some issues!
>>
>>
>>   > I was not able to test it, hope it works :)
>> I confirmed that it works fine in my AWS environment :)
>>
>>
>> Regards,
>> Kazuhiko Higashi
>>
>> On 2015/03/11 21:27, Dejan Muhamedagic wrote:
>>> Hi Kazuhiko-san,
>>>
>>> On Wed, Mar 11, 2015 at 02:36:43PM +0900, 東一彦 wrote:
>>>> Hi, Dejan
>>>>
>>>> Thank you for the comment.
>>>>
>>>> I'd like to contribute it as a glue stonith agent.
>>>>
>>>> So, I have renamed it to just "ec2".
>>>>
>>>> Would you please add it to the glue repository (http://hg.linux-ha.org/glue/)?
>>>
>>> I just added your stonith agent. There was this change in the
>>> initial changeset:
>>>
>>> - replaced '-' which is not allowed in identifiers with '_' in
>>>     function getinfo_xml().
>>>
>>> There were other smaller changes. You can find them in the
>>> repository.
>>>
>>> I was not able to test it, hope it works :)
>>>
>>> Many thanks for the contribution.
>>>
>>> Cheers,
>>>
>>> Dejan
>>>
>>>> Regards,
>>>> Kazuhiko Higashi
>>>>
>>>> On 2015/03/06 2:38, Dejan Muhamedagic wrote:
>>>>> Hi,
>>>>>
>>>>> On Tue, Mar 03, 2015 at 05:13:49PM +0900, 東一彦 wrote:
>>>>>> Dear Markus,
>>>>>>
>>>>>> I was also thinking the same thing.
>>>>>> So I have already created a new one.
>>>>>
>>>>> Perhaps you'd like to then contribute it upstream? Either to
>>>>> glue stonith agents or RHT fencing agents. It appears that the
>>>>> agent is using the stonith interface, but the name reflects the
>>>>> fencing agents naming scheme.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Dejan
>>>>>
>>>>>> [ChangeSet]
>>>>>> - The API used was changed from "Amazon EC2 CLI" to "AWS CLI".
>>>>>>     -- "AWS CLI" is Python-based. So, CPU load might be reduced.
>>>>>>
>>>>>> - The "--private-key" and "--cert" options are deprecated in AWS CLI.
>>>>>>     So, I added a new option "--profile", which uses a specific profile from the credential file.
>>>>>>     The default is "".
>>>>>>
>>>>>>
>>>>>> [How to use]
>>>>>> - Please install the "AWS CLI".
>>>>>>     http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html
>>>>>>
>>>>>> - Please copy fence_ec2 into /usr/lib64/stonith/plugins/external/.
>>>>>>     And please set the permissions to 755.
>>>>>>
>>>>>> - Please configure crm as in this example.
>>>>>>     - The instance whose "Name" tag is set to "node01" is fenced.
>>>>>>     ------
>>>>>>     primitive prmStonith1-2 stonith:external/fence_ec2 \
>>>>>>     params \
>>>>>>         pcmk_off_timeout="300s" \
>>>>>>         port="node01" \
>>>>>>         tag="Name" \
>>>>>>     op start interval="0s" timeout="60s" \
>>>>>>     op monitor interval="3600s" timeout="60s" \
>>>>>>     op stop interval="0s" timeout="60s"
>>>>>>     ------
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Kazuhiko Higashi
>>>>>>
>>>>>> On 2015/02/25 7:22, Markus Guertler wrote:
>>>>>>> Dear list,
>>>>>>> I was just trying to configure the fence_ec2 stonith agent from 2012, written by Andrew Beekhof. It looks like
>>>>>>> this one is not working anymore with newer stonith / cluster versions. Is there any other EC2 agent that is still
>>>>>>> maintained?
>>>>>>>
>>>>>>> If not, I'll write one myself. However, I'd like to check all options first.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Markus
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Linux-HA mailing list
>>>>>>> Linux-HA at lists.linux-ha.org
>>>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>>>> See also: http://linux-ha.org/ReportingProblems
>>>>>>>
>>>>>>
>>>>>>
>>>
>>>> #!/bin/bash
>>>>
>>>> description="
>>>> fence_ec2 is an I/O Fencing agent which can be used with Amazon EC2 instances.
>>>>
>>>> API functions used by this agent:
>>>> - aws ec2 describe-tags
>>>> - aws ec2 describe-instances
>>>> - aws ec2 stop-instances
>>>> - aws ec2 start-instances
>>>> - aws ec2 reboot-instances
>>>>
>>>> If the uname used by the cluster node is any of:
>>>>    - Public DNS name (or part there of),
>>>>    - Private DNS name (or part there of),
>>>>    - Instance ID (eg. i-4f15a839)
>>>>    - Contents of tag associated with the instance
>>>> then the agent should be able to automatically discover the instances it can control.
>>>>
>>>> If the tag containing the uname is not [Name], then it will need to be specified using the [tag] option.
>>>> "
>>>>
>>>> #
>>>> # Copyright (c) 2011-2013 Andrew Beekhof
>>>> # Copyright (c) 2014 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
>>>> #                    All Rights Reserved.
>>>> #
>>>> # This program is free software; you can redistribute it and/or modify
>>>> # it under the terms of version 2 of the GNU General Public License as
>>>> # published by the Free Software Foundation.
>>>> #
>>>> # This program is distributed in the hope that it would be useful, but
>>>> # WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>>> #
>>>> # Further, this software is distributed without any warranty that it is
>>>> # free of the rightful claim of any third person regarding infringement
>>>> # or the like.  Any license provided herein, whether implied or
>>>> # otherwise, applies only to this software file.  Patent licenses, if
>>>> # any, provided herein do not apply to combinations of this program with
>>>> # other software, or any other product whatsoever.
>>>> #
>>>> # You should have received a copy of the GNU General Public License
>>>> # along with this program; if not, write the Free Software Foundation,
>>>> # Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
>>>> #
>>>> #######################################################################
>>>>
>>>> quiet=0
>>>> port_default=""
>>>>
>>>> instance_not_found=0
>>>> unknown_are_stopped=0
>>>>
>>>> action_default="reset"         # Default fence action
>>>> ec2_tag_default="Name"           # EC2 Tag containing the instance's uname
>>>>
>>>> sleep_time="1"
>>>>
>>>> ec2_tag=${tag}
>>>>
>>>> : ${ec2_tag=${ec2_tag_default}}
>>>> : ${port=${port_default}}
>>>>
>>>> function usage()
>>>> {
>>>> cat <<EOF
>>>> `basename $0` - A fencing agent for Amazon EC2 instances
>>>>
>>>> $description
>>>>
>>>> Usage: `basename $0` -o|--action [-n|--port] [options]
>>>> Options:
>>>>    -h, --help         This text
>>>>    -V, --version        Version information
>>>>    -q, --quiet         Reduced output mode
>>>>
>>>> Commands:
>>>>    -o, --action        Action to perform: on|off|reboot|status|monitor
>>>>    -n, --port         The name of a machine/instance to control/check
>>>>
>>>> Additional Options:
>>>>    -p, --profile        Use a specific profile from your credential file.
>>>>    -t, --tag         Name of the tag containing the instance's uname
>>>>
>>>> Dangerous options:
>>>>    -U, --unknown-are-stopped     Assume any unknown instance is safely stopped
>>>>
>>>> EOF
>>>>
>>>>     exit 0;
>>>> }
>>>>
>>>> function getinfo-xml()
>>>> {
>>>>     cat <<EOF
>>>> <parameters>
>>>>     <parameter name="port" unique="1" required="1">
>>>>         <content type="string" />
>>>>         <shortdesc lang="en">The name/id/tag of a instance to control/check</shortdesc>
>>>>     </parameter>
>>>>     <parameter name="profile" unique="0" required="0">
>>>>         <content type="string" default="default" />
>>>>         <shortdesc lang="en">Use a specific profile from your credential file.</shortdesc>
>>>>     </parameter>
>>>>     <parameter name="tag" unique="0" required="1">
>>>>         <content type="string" default="Name" />
>>>>         <shortdesc lang="en">Name of the tag containing the instances uname</shortdesc>
>>>>     </parameter>
>>>>     <parameter name="unknown_are_stopped" unique="0" required="0">
>>>>         <content type="string" default="false" />
>>>>         <shortdesc lang="en">DANGER: Assume any unknown instance is safely stopped</shortdesc>
>>>>     </parameter>
>>>> </parameters>
>>>> EOF
>>>>     exit 0;
>>>> }
>>>>
>>>> function metadata()
>>>> {
>>>>     cat <<EOF
>>>> <?xml version="1.0" ?>
>>>> <resource-agent name="fence_ec2" shortdesc="Fencing agent for Amazon EC2 instances" >
>>>>     <longdesc>
>>>> $description
>>>>     </longdesc>
>>>>     <parameters>
>>>>     <parameter name="action" unique="0" required="1">
>>>>         <getopt mixed="-o, --action=[action]" />
>>>>         <content type="string" default="reboot" />
>>>>         <shortdesc lang="en">Fencing Action</shortdesc>
>>>>     </parameter>
>>>>     <parameter name="port" unique="1" required="1">
>>>>         <getopt mixed="-n, --port=[port]" />
>>>>         <content type="string" />
>>>>         <shortdesc lang="en">The name/id/tag of a instance to control/check</shortdesc>
>>>>     </parameter>
>>>>     <parameter name="profile" unique="0" required="0">
>>>>         <getopt mixed="-p, --profile=[profile]" />
>>>>         <content type="string" default="default" />
>>>>         <shortdesc lang="en">Use a specific profile from your credential file.</shortdesc>
>>>>     </parameter>
>>>>     <parameter name="tag" unique="0" required="1">
>>>>         <getopt mixed="-t, --tag=[tag]" />
>>>>         <content type="string" default="Name" />
>>>>         <shortdesc lang="en">Name of the tag containing the instances uname</shortdesc>
>>>>     </parameter>
>>>>     <parameter name="unknown-are-stopped" unique="0" required="0">
>>>>         <getopt mixed="-U, --unknown-are-stopped" />
>>>>         <content type="string" default="false" />
>>>>         <shortdesc lang="en">DANGER: Assume any unknown instance is safely stopped</shortdesc>
>>>>     </parameter>
>>>>     </parameters>
>>>>     <actions>
>>>>     <action name="on" />
>>>>     <action name="off" />
>>>>     <action name="reboot" />
>>>>     <action name="status" />
>>>>     <action name="list" />
>>>>     <action name="monitor" />
>>>>     <action name="metadata" />
>>>>     </actions>
>>>> </resource-agent>
>>>> EOF
>>>>     exit 0;
>>>> }
>>>>
>>>> function instance_for_port()
>>>> {
>>>>     local port=$1
>>>>     local instance=""
>>>>
>>>>     # Look for port name -n in the INSTANCE data
>>>>     instance=`aws ec2 describe-instances $options | grep "^INSTANCES[[:space:]].*[[:space:]]$port[[:space:]]" | awk '{print $8}'`
>>>>     if [ -z $instance ]; then
>>>>         # Look for port name -n in the Name TAG
>>>>         instance=`aws ec2 describe-tags $options | grep "^TAGS[[:space:]]$ec2_tag[[:space:]].*[[:space:]]instance[[:space:]]$port$" | awk '{print $3}'`
>>>>     fi
>>>>
>>>>     if [ -z $instance ]; then
>>>>         instance_not_found=1
>>>>         instance=$port
>>>>     fi
>>>>
>>>>     echo $instance
>>>> }
>>>>
>>>> function instance_on()
>>>> {
>>>>     aws ec2 start-instances $options --instance-ids $instance
>>>> }
>>>>
>>>> function instance_off()
>>>> {
>>>>     if [ $unknown_are_stopped = 1 -a $instance_not_found ]; then
>>>>         : nothing to do
>>>>         ha_log.sh info "Assuming unknown instance $instance is already off"
>>>>     else
>>>>         aws ec2 stop-instances $options --instance-ids $instance --force
>>>>     fi
>>>> }
>>>>
>>>> function instance_status()
>>>> {
>>>>     local instance=$1
>>>>     local status="unknown"
>>>>     local rc=1
>>>>
>>>>     # List of instances and their current status
>>>>     if [ $unknown_are_stopped = 1 -a $instance_not_found ]; then
>>>>         ha_log.sh info "$instance stopped (unknown)"
>>>>     else
>>>>         status=`aws ec2 describe-instances $options --instance-ids $instance | awk '{
>>>>             if (/^STATE\t/) { printf "%s", $3 }
>>>>             }'`
>>>>         rc=$?
>>>>     fi
>>>>     ha_log.sh info "status check for $instance is $status"
>>>>     echo $status
>>>>     return $rc
>>>> }
>>>>
>>>>
>>>> TEMP=`getopt -o qVho:e:p:n:t:U --long version,help,action:,port:,option:,profile:,tag:,quiet,unknown-are-stopped \
>>>>        -n 'fence_ec2' -- "$@"`
>>>>
>>>> if [ $? != 0 ];then
>>>>       usage
>>>>       exit 1
>>>> fi
>>>>
>>>> # Note the quotes around `$TEMP': they are essential!
>>>> eval set -- "$TEMP"
>>>>
>>>> if [ -z $1 ]; then
>>>>     # If there are no command line args, look for options from stdin
>>>>     while read line; do
>>>>         case $line in
>>>>             option=*|action=*) action=`echo $line | sed s/.*=//`;;
>>>>             port=*)        port=`echo $line | sed s/.*=//`;;
>>>>             profile=*)     ec2_profile=`echo $line | sed s/.*=//`;;
>>>>             tag=*)         ec2_tag=`echo $line | sed s/.*=//`;;
>>>>             quiet*)        quiet=1;;
>>>>             unknown-are-stopped*) unknown_are_stopped=1;;
>>>>             --);;
>>>>             *) ha_log.sh err "Invalid command: $line";;
>>>>         esac
>>>>     done
>>>> fi
>>>>
>>>> while true ; do
>>>>     case "$1" in
>>>>         -o|--action|--option) action=$2;   shift; shift;;
>>>>         -n|--port)            port=$2;     shift; shift;;
>>>>         -p|--profile)         ec2_profile=$2; shift; shift;;
>>>>         -t|--tag)          ec2_tag=$2; shift; shift;;
>>>>         -U|--unknown-are-stopped) unknown_are_stopped=1; shift;;
>>>>         -q|--quiet) quiet=1; shift;;
>>>>         -V|--version) echo "1.0.0"; exit 0;;
>>>>         --help|-h)
>>>>             usage;
>>>>             exit 0;;
>>>>         --) shift ; break ;;
>>>>         *) ha_log.sh err "Unknown option: $1. See --help for details."; exit 1;;
>>>>     esac
>>>> done
>>>>
>>>> [ -n "$1" ] && action=$1
>>>>
>>>> if [ -z "$ec2_profile"]; then
>>>>     options="--output text --profile default"
>>>> else
>>>>     options="--output text --profile $ec2_profile "
>>>> fi
>>>>
>>>> action=`echo $action | tr 'A-Z' 'a-z'`
>>>>
>>>> case $action in
>>>>     metadata)
>>>>         metadata
>>>>     ;;
>>>>     getinfo-xml)
>>>>         getinfo-xml
>>>>     ;;
>>>>     getconfignames)
>>>>         for i in profile port tag
>>>>         do
>>>>             echo $i
>>>>         done
>>>>         exit 0
>>>>     ;;
>>>>     getinfo-devid)
>>>>         echo "EC2 STONITH device"
>>>>         exit 0
>>>>     ;;
>>>>     getinfo-devname)
>>>>         echo "EC2 STONITH external device"
>>>>         exit 0
>>>>     ;;
>>>>     getinfo-devdescr)
>>>>         echo "fence_ec2 is an I/O Fencing agent which can be used with Amazon EC2 instances."
>>>>         exit 0
>>>>     ;;
>>>>     getinfo-devurl)
>>>>         echo ""
>>>>         exit 0
>>>>     ;;
>>>> esac
>>>>
>>>> # get my instance id
>>>> myinstance=`curl http://169.254.169.254/latest/meta-data/instance-id`
>>>>
>>>> # check my status.
>>>> # When the EC2 instance be stopped by the "aws ec2 stop-instances" , the stop processing of the OS is executed.
>>>> # While the OS stop processing, Pacemaker can execute the STONITH processing.
>>>> # So, If my status is not "running", it determined that I was already fenced. And to prevent fencing each other
>>>> # in split-brain, I don't fence other node.
>>>> if [ -z "$myinstance" ]; then
>>>>     ha_log.sh err "Failed to get My Instance ID. so can not check my status."
>>>>     exit 1
>>>> fi
>>>> mystatus=`instance_status $myinstance`
>>>> if [ "$mystatus" != "running" ]; then #do not fence
>>>>     ha_log.sh warn "I was already fenced (My instance status=$mystatus). I don't fence other node."
>>>>     exit 1
>>>> fi
>>>>
>>>> # get target's instance id
>>>> instance=""
>>>> if [ ! -z "$port" ]; then
>>>>     instance=`instance_for_port $port $options`
>>>> fi
>>>>
>>>> case $action in
>>>>     reboot|reset)
>>>>         status=`instance_status $instance`
>>>>         if [ "$status" != "stopped" ]; then
>>>>             instance_off
>>>>         fi
>>>>         while true;
>>>>         do
>>>>             status=`instance_status $instance`
>>>>             if [ "$status" = "stopped" ]; then
>>>>                 break
>>>>             fi
>>>>             sleep $sleep_time
>>>>         done
>>>>         instance_on
>>>>         while true;
>>>>         do
>>>>             status=`instance_status $instance`
>>>>             if [ "$status" = "running" ]; then
>>>>                 break
>>>>             fi
>>>>             sleep $sleep_time
>>>>         done
>>>>     ;;
>>>>     poweron|on)
>>>>         instance_on
>>>>         while true;
>>>>         do
>>>>             status=`instance_status $instance`
>>>>             if [ "$status" = "running" ]; then
>>>>                 break
>>>>             fi
>>>>         done
>>>>     ;;
>>>>     poweroff|off)
>>>>         instance_off
>>>>         while true;
>>>>         do
>>>>             status=`instance_status $instance`
>>>>             if [ "$status" = "stopped" ]; then
>>>>                 break
>>>>             fi
>>>>             sleep $sleep_time
>>>>         done
>>>>     ;;
>>>>     monitor)
>>>>         # Is the device ok?
>>>>         aws ec2 describe-instances $options | grep INSTANCES &> /dev/null
>>>>     ;;
>>>>     gethosts|hostlist|list)
>>>>         # List of names we know about
>>>>         a=`aws ec2 describe-instances $options | awk -v tag_pat="^TAGS\t$ec2_tag\t" -F '\t' '{
>>>>             if (/^INSTANCES/) { printf "%s\n", $8 }
>>>>             else if ( $1"\t"$2"\t" ~ tag_pat ) { printf "%s\n", $3 }
>>>>             }' | sort -u`
>>>>         echo $a
>>>>     ;;
>>>>     stat|status)
>>>>         instance_status $instance > /dev/null
>>>>     ;;
>>>>     *) ha_log.sh err "Unknown action: $action"; exit 1;;
>>>> esac
>>>>
>>>> status=$?
>>>>
>>>> if [ $quiet -eq 1 ]; then
>>>>     : nothing
>>>> elif [ $status -eq 0 ]; then
>>>>     ha_log.sh info "Operation $action passed"
>>>> else
>>>>     ha_log.sh err "Operation $action failed: $status"
>>>> fi
>>>> exit $status
>>>
>>>
>>>
>>
>>
>>
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>
>


-------------- next part --------------
#!/bin/bash

description="
fence_ec2 is an I/O Fencing agent which can be used with Amazon EC2 instances.

API functions used by this agent:
- aws ec2 describe-tags
- aws ec2 describe-instances
- aws ec2 stop-instances
- aws ec2 start-instances
- aws ec2 reboot-instances

If the uname used by the cluster node is any of:
 - Public DNS name (or part there of),
 - Private DNS name (or part there of),
 - Instance ID (eg. i-4f15a839)
 - Contents of tag associated with the instance
then the agent should be able to automatically discover the instances it can control.

If the tag containing the uname is not [Name], then it will need to be specified using the [tag] option.
"

#
# Copyright (c) 2011-2013 Andrew Beekhof
# Copyright (c) 2014 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
#                    All Rights Reserved.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# Further, this software is distributed without any warranty that it is
# free of the rightful claim of any third person regarding infringement
# or the like.  Any license provided herein, whether implied or
# otherwise, applies only to this software file.  Patent licenses, if
# any, provided herein do not apply to combinations of this program with
# other software, or any other product whatsoever.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
#
#######################################################################

quiet=0

instance_not_found=0
unknown_are_stopped=0

action_default="reset"         # Default fence action
ec2_tag_default="Name"	       # EC2 Tag containing the instance's uname

sleep_time="1"

[ -n "$tag" ] && ec2_tag="$tag"

: ${ec2_tag=${ec2_tag_default}}

function usage()
{
cat <<EOF
`basename $0` - A fencing agent for Amazon EC2 instances
 
$description
 
Usage: `basename $0` -o|--action [-n|--port] [options]
Options:
 -h, --help 		This text
 -V, --version		Version information
 -q, --quiet 		Reduced output mode
 
Commands:
 -o, --action		Action to perform: on|off|reboot|status|monitor
 -n, --port 		The name of a machine/instance to control/check

Additional Options:
 -p, --profile		Use a specific profile from your credential file.
 -t, --tag 		Name of the tag containing the instance's uname

Dangerous options:
 -U, --unknown-are-stopped 	Assume any unknown instance is safely stopped

EOF
    exit 0;
}

function getinfo_xml()
{
	cat <<EOF
<parameters>
	<parameter name="port" unique="1" required="0">
		<content type="string" />
		<shortdesc lang="en">The name/id/tag of a instance to control/check</shortdesc>
	</parameter>
	<parameter name="profile" unique="0" required="0">
		<content type="string" default="default" />
		<shortdesc lang="en">Use a specific profile from your credential file.</shortdesc>
	</parameter>
	<parameter name="tag" unique="0" required="0">
		<content type="string" default="Name" />
		<shortdesc lang="en">Name of the tag containing the instances uname</shortdesc>
	</parameter>
	<parameter name="unknown_are_stopped" unique="0" required="0">
		<content type="string" default="false" />
		<shortdesc lang="en">DANGER: Assume any unknown instance is safely stopped</shortdesc>
	</parameter>
</parameters>
EOF
	exit 0;
}

function metadata()
{
	cat <<EOF
<?xml version="1.0" ?>
<resource-agent name="fence_ec2" shortdesc="Fencing agent for Amazon EC2 instances" >
	<longdesc>
$description
	</longdesc>
	<parameters>
	<parameter name="action" unique="0" required="1">
		<getopt mixed="-o, --action=[action]" />
		<content type="string" default="reboot" />
		<shortdesc lang="en">Fencing Action</shortdesc>
	</parameter>
	<parameter name="port" unique="1" required="0">
		<getopt mixed="-n, --port=[port]" />
		<content type="string" />
		<shortdesc lang="en">The name/id/tag of a instance to control/check</shortdesc>
	</parameter>
	<parameter name="profile" unique="0" required="0">
		<getopt mixed="-p, --profile=[profile]" />
		<content type="string" default="default" />
		<shortdesc lang="en">Use a specific profile from your credential file.</shortdesc>
	</parameter>
	<parameter name="tag" unique="0" required="0">
		<getopt mixed="-t, --tag=[tag]" />
		<content type="string" default="Name" />
		<shortdesc lang="en">Name of the tag containing the instances uname</shortdesc>
	</parameter>
	<parameter name="unknown-are-stopped" unique="0" required="0">
		<getopt mixed="-U, --unknown-are-stopped" />
		<content type="string" default="false" />
		<shortdesc lang="en">DANGER: Assume any unknown instance is safely stopped</shortdesc>
	</parameter>
	</parameters>
	<actions>
	<action name="on" />
	<action name="off" />
	<action name="reboot" />
	<action name="status" />
	<action name="list" />
	<action name="monitor" />
	<action name="metadata" />
	</actions>
</resource-agent>
EOF
	exit 0;
}

function instance_for_port()
{
	local port=$1
	local instance=""

	# Look for port name -n in the INSTANCE data
	instance=`aws ec2 describe-instances $options | grep "^INSTANCES[[:space:]].*[[:space:]]$port[[:space:]]" | awk '{print $8}'`
	if [ -z "$instance" ]; then
		# Look for port name -n in the Name TAG
		instance=`aws ec2 describe-tags $options | grep "^TAGS[[:space:]]$ec2_tag[[:space:]].*[[:space:]]instance[[:space:]]$port$" | awk '{print $3}'`
	fi

	if [ -z "$instance" ]; then
		instance_not_found=1
		instance=$port
	fi

	echo $instance
}

function instance_on()
{
	aws ec2 start-instances $options --instance-ids $instance
}

function instance_off()
{
	if [ "$unknown_are_stopped" = 1 -a $instance_not_found ]; then
		: nothing to do
		ha_log.sh info "Assuming unknown instance $instance is already off"
	else
		aws ec2 stop-instances $options --instance-ids $instance --force
	fi
}

function instance_status()
{
	local instance=$1
	local status="unknown"
	local rc=1

	# List of instances and their current status
	if [ "$unknown_are_stopped" = 1 -a $instance_not_found ]; then
		ha_log.sh info "$instance stopped (unknown)"
	else
		status=`aws ec2 describe-instances $options --instance-ids $instance | awk '{ 
			if (/^STATE\t/) { printf "%s", $3 }
			}'`
		rc=$?
	fi
	ha_log.sh info "status check for $instance is $status"
	echo $status
	return $rc
}

function monitor()
{
		# Is the device ok?
		aws ec2 describe-instances $options | grep INSTANCES &> /dev/null
}

TEMP=`getopt -o qVho:e:p:n:t:U --long version,help,action:,port:,option:,profile:,tag:,quiet,unknown-are-stopped \
     -n 'fence_ec2' -- "$@"`

if [ $? != 0 ];then 
    usage
    exit 1
fi

# Note the quotes around `$TEMP': they are essential!
eval set -- "$TEMP"

if [ -z "$1" ]; then
	# If there are no command line args, look for options from stdin
	while read line; do
		case $line in 
			option=*|action=*) action=`echo $line | sed s/.*=//`;;
			port=*)        port=`echo $line | sed s/.*=//`;;
			profile=*)     ec2_profile=`echo $line | sed s/.*=//`;;
			tag=*)         ec2_tag=`echo $line | sed s/.*=//`;;
			quiet*)        quiet=1;;
			unknown-are-stopped*) unknown_are_stopped=1;;
			--);;
			*) ha_log.sh err "Invalid command: $line";;
		esac
	done
fi

while true ; do
	case "$1" in
		-o|--action|--option) action=$2;   shift; shift;;
		-n|--port)            port=$2;     shift; shift;;
		-p|--profile)         ec2_profile=$2; shift; shift;;
		-t|--tag)	      ec2_tag=$2; shift; shift;;
		-U|--unknown-are-stopped) unknown_are_stopped=1; shift;;
		-q|--quiet) quiet=1; shift;;
		-V|--version) echo "1.0.0"; exit 0;;
		--help|-h) 
			usage;
			exit 0;;
		--) shift ; break ;;
		*) ha_log.sh err "Unknown option: $1. See --help for details."; exit 1;;
	esac
done

[ -n "$1" ] && action=$1
[ -n "$2" ] && node_to_fence=$2

if [ -z "$ec2_profile" ]; then
	options="--output text --profile default"
else
	options="--output text --profile $ec2_profile "
fi

action=`echo $action | tr 'A-Z' 'a-z'`

case $action in 
	metadata)
		metadata
	;;
	getinfo-xml)
		getinfo_xml
	;;
	getconfignames)
		for i in profile port tag unknown_are_stopped
		do
			echo $i
		done
		exit 0
	;;
	getinfo-devid)
		echo "EC2 STONITH device"
		exit 0
	;;
	getinfo-devname)
		echo "EC2 STONITH external device"
		exit 0
	;;
	getinfo-devdescr)
		echo "ec2 is an I/O Fencing agent which can be used with Amazon EC2 instances."
		exit 0
	;;
	getinfo-devurl)
		echo ""
		exit 0
	;;
esac

# get my instance id
myinstance=`curl http://169.254.169.254/latest/meta-data/instance-id`

# Check my own status.
# When an EC2 instance is stopped by "aws ec2 stop-instances", the OS runs its shutdown processing.
# During that shutdown, Pacemaker can still execute STONITH operations.
# So, if my status is not "running", I assume that I have already been fenced. To prevent the nodes
# from fencing each other in a split-brain, I do not fence the other node.
if [ -z "$myinstance" ]; then
	ha_log.sh err "Failed to get My Instance ID. so can not check my status."
	exit 1
fi
mystatus=`instance_status $myinstance`
if [ "$mystatus" != "running" ]; then #do not fence
	ha_log.sh warn "I was already fenced (My instance status=$mystatus). I don't fence other node."
	exit 1
fi

if [ -z "$port" ]; then
	port="$node_to_fence"
fi

# get target's instance id
instance=""
if [ ! -z "$port" ]; then
	instance=`instance_for_port $port $options`
fi

case $action in 
	reboot|reset)
		status=`instance_status $instance`
		if [ "$status" != "stopped" ]; then
			instance_off
		fi
		while true;
		do
			status=`instance_status $instance`
			if [ "$status" = "stopped" ]; then
				break
			fi
			sleep $sleep_time
		done
		instance_on
		while true;
		do
			status=`instance_status $instance`
			if [ "$status" = "running" ]; then
				break
			fi
			sleep $sleep_time
		done
	;;
	poweron|on)
		instance_on
		while true;
		do
			status=`instance_status $instance`
			if [ "$status" = "running" ]; then
				break
			fi
			sleep $sleep_time
		done
	;;
	poweroff|off)
		instance_off
		while true;
		do
			status=`instance_status $instance`
			if [ "$status" = "stopped" ]; then
				break
			fi
			sleep $sleep_time
		done
	;;
	monitor)
		monitor
	;;
	gethosts|hostlist|list)
		# List of names we know about
		a=`aws ec2 describe-instances $options | awk -v tag_pat="^TAGS\t$ec2_tag\t" -F '\t' '{ 
			if (/^INSTANCES/) { printf "%s\n", $8 }
			else if ( $1"\t"$2"\t" ~ tag_pat ) { printf "%s\n", $3 }
			}' | sort -u`
		echo $a
	;;
	stat|status)
		monitor
	;;
	*) ha_log.sh err "Unknown action: $action"; exit 1;;
esac

status=$?

if [ $quiet -eq 1 ]; then
	: nothing
elif [ $status -eq 0 ]; then
	ha_log.sh info "Operation $action passed"
else
	ha_log.sh err "Operation $action failed: $status"
fi
exit $status

