[ClusterLabs] Fence agent definition under Centos7.6

Michael Powell Michael.Powell at harmonicinc.com
Tue Jun 18 09:08:46 EDT 2019


Thanks for the help.  I'm not sure how I missed the earlier reply, but it was most helpful.  I elected to follow the 2nd option suggested:  rewrite the custom fence agent to conform to the FenceAgentAPI.md document.  I do have a couple of comments/questions regarding the FenceAgentAPI.md document, however.

* The document does not specify where the agent should be installed.  (It goes in /usr/sbin.)
* The document does not mention that, if the agent requires root permissions (a not-unlikely scenario, I think), it needs the setuid bit set (e.g. chmod u+s /usr/sbin/fence-agent), or some similar mechanism, since crmd and stonith-ng apparently do not execute the fence agent with root privileges.
* The document does not specify support for "action=metadata".  I got this from https://github.com/ClusterLabs/resource-agents/blob/master/doc/dev-guides/ra-dev-guide.asc and by looking at the output of the fence_ipmilan agent.
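
To make the third point concrete, here is a minimal sketch in Python of an agent skeleton that reads "key=value" lines from stdin and answers "action=metadata".  Everything named here is an assumption for illustration: the agent name fence_example, the parameter list, the default action of "reboot" when none is given, and the XML layout, which is only modeled on the kind of output existing agents print and is not taken from FenceAgentAPI.md.  The actual fencing logic is deliberately left out.

```python
import sys

# Hypothetical agent name, used for illustration only.
AGENT_NAME = "fence_example"

# Illustrative metadata; the element layout is an assumption modeled on
# the output of existing agents, not a copy of any specification.
METADATA = """<?xml version="1.0" ?>
<resource-agent name="{name}" shortdesc="Example fence agent (sketch)">
  <longdesc>Illustrative metadata only.</longdesc>
  <parameters>
    <parameter name="action" unique="0" required="1">
      <content type="string" default="reboot"/>
      <shortdesc lang="en">Fencing action</shortdesc>
    </parameter>
    <parameter name="nodename" unique="0" required="0">
      <content type="string"/>
      <shortdesc lang="en">Name of the node to fence</shortdesc>
    </parameter>
  </parameters>
  <actions>
    <action name="reboot"/>
    <action name="metadata"/>
  </actions>
</resource-agent>
"""


def parse_options(lines):
    """Parse 'key=value' lines (as read from stdin) into a dict."""
    options = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # Ignore anything that is not a key=value pair.
        options[key.strip()] = value.strip()
    return options


def main(stream=sys.stdin):
    """Handle one request; returns the exit status for the agent."""
    options = parse_options(stream)
    # Assumption: fall back to "reboot" when no action is supplied.
    action = options.get("action", "reboot")
    if action == "metadata":
        sys.stdout.write(METADATA.format(name=AGENT_NAME))
        return 0
    # Real fencing logic would go here; this sketch only reports the request.
    sys.stderr.write("requested action=%s for nodename=%s\n"
                     % (action, options.get("nodename", "")))
    return 1  # Non-zero: the sketch never claims that fencing succeeded.
```

To run it as an agent, the script would end with sys.exit(main()).  Feeding it the single line "action=metadata" on stdin prints the XML and exits 0; any other action exits non-zero because nothing is actually fenced.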

Another issue I found is that I needed to use "pcs stonith create" rather than "crm configure primitive" to create the stonith resource.  Our old code was based upon crmsh, and my online reading suggested that the choice of crmsh vs pcs was up to the user.  This is apparently not true.  For the most part, it looks as though transitioning our scripts from crmsh to pcs will be pretty straightforward, though.

Finally, I have two questions:  
* The FenceAgentAPI.md doc describes the cluster.conf file (though, again, it does not indicate where it should be installed).  AFAICT, this isn't really necessary for the fence agent to reboot another node.  Am I missing something?
* In some cases, the fence agent receives "action=reboot" followed by "nodename=<name of the node running the agent>".  I.e. it's asked to reboot itself.  This doesn't make sense to me.  Again, what am I missing?
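
For the second question, a small diagnostic can at least make the self-targeting case visible in the agent's logs.  A sketch in Python, assuming the cluster node name matches the host name returned by socket.gethostname() (which may not hold on every cluster); targets_local_node is a hypothetical helper, and options is the dict of key=value pairs the agent read from stdin:

```python
import socket


def targets_local_node(options, local_name=None):
    """Return True if the requested 'nodename' is the node running the agent."""
    if local_name is None:
        local_name = socket.gethostname()
    target = options.get("nodename", "")
    # Compare short host names case-insensitively, so that "node1" and
    # "Node1.example.com" are treated as the same machine.
    return bool(target) and (
        target.split(".")[0].lower() == local_name.split(".")[0].lower()
    )
```

This only detects the situation so that it can be logged; it does not say whether the request should be honored.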

Regards,
  Michael



-----Original Message-----
From: Users <users-bounces at clusterlabs.org> On Behalf Of users-request at clusterlabs.org
Sent: Friday, June 14, 2019 12:50 AM
To: users at clusterlabs.org
Subject: [EXTERNAL] Users Digest, Vol 53, Issue 14

Today's Topics:

   1. FW: Fence agent definition under Centos7.6 (Michael Powell)
   2. Re: FW: Fence agent definition under Centos7.6 (Ken Gaillot)
   3. resource-agents v4.3.0 rc1 (Oyvind Albrigtsen)


----------------------------------------------------------------------

Message: 1
Date: Thu, 13 Jun 2019 19:58:54 +0000
From: Michael Powell <Michael.Powell at harmonicinc.com>
To: "users at clusterlabs.org" <users at clusterlabs.org>
Subject: [ClusterLabs] FW: Fence agent definition under Centos7.6
Message-ID:
	<MWHPR11MB14862E6BBD50EC69EF23304388EF0 at MWHPR11MB1486.namprd11.prod.outlook.com>
	
Content-Type: text/plain; charset="us-ascii"

I'm basically re-posting this request again, since I've gotten no response over the last two weeks.  If someone can take pity on a newbie, I'd sure appreciate it.

In the interim, I've done some experiments, trying to use fence_ipmilan in lieu of the mgpstonith fence agent described in the previous e-mail.  Without going into a lot of details, the results have been unsatisfactory, so I've renewed my efforts to get the in-house mgpstonith fence agent to work.

I'm still not sure about the specific question of where the mgpstonith executable needs to reside.  By moving it from /usr/lib64/stonith/plugins/external to /usr/lib64/stonith/plugins and /usr/sbin, I was able to eliminate the "Unknown fence agent" error.  That said, the following command produces the error messages shown after it:

        crm configure primitive mgraid-stonith stonith:mgpstonith \
            params hostlist="mgraid-canister" \
            meta requires="quorum" \
            op monitor interval="0" timeout="20s"

This produces the following messages to stderr:

ERROR: stonith:mgpstonith: got no meta-data, does this RA exist?
ERROR: stonith:mgpstonith: got no meta-data, does this RA exist?
ERROR: stonith:mgpstonith: no such resource agent


What would be most helpful at this point is a full description of the Fence Agent API.

Regards,
  Michael Powell

From: Michael Powell
Sent: Friday, May 31, 2019 3:33 PM
To: users at clusterlabs.org
Subject: Fence agent definition under Centos7.6

Although I am personally a novice wrt cluster operation, several years ago my company developed a product that used Pacemaker.  I've been charged with porting that product to a platform running Centos 7.6.  The old product ran Pacemaker 1.1.13 and heartbeat.  For the most part, the transition to Pacemaker 1.1.19 and Corosync has gone pretty well, but there's one aspect that I'm struggling with: fence-agents.

The old product used a fence agent developed in house to implement STONITH.  While it was no trouble to compile and install the code, named mgpstonith, I see lots of messages like the following in the system log -

stonith-ng[31120]:    error: Unknown fence agent: external/mgpstonith
stonith-ng[31120]:    error: Agent external/mgpstonith not found or does not support meta-data: Invalid argument (22)
stonith-ng[31120]:    error: Could not retrieve metadata for fencing agent external/mgpstonith

I've put debug messages in mgpstonith, and as they do not appear in the system log, I've inferred that it is in fact never executed.

Initially, I installed mgpstonith on /lib64/stonith/plugins/external, which is where it was located on the old product.  I've copied it to other locations, e.g. /usr/sbin, with no better luck.  I've searched the web and while I've found lots of information about using the available fence agents, I've not turned up any information on how to create one "from scratch".

Specifically, I need to know where to put mgpstonith on the target system(s).  Generally, I'd appreciate a pointer to any documentation/specification relevant to writing code for a fence agent.

Thanks,
  Michael

------------------------------

Message: 2
Date: Thu, 13 Jun 2019 15:39:15 -0500
From: Ken Gaillot <kgaillot at redhat.com>
To: Cluster Labs - All topics related to open-source clustering
	welcomed <users at clusterlabs.org>
Subject: Re: [ClusterLabs] FW: Fence agent definition under Centos7.6
Message-ID:
	<6a21a0729d034e4584c8dea76667691ffb733260.camel at redhat.com>
Content-Type: text/plain; charset="UTF-8"

Maybe you weren't subscribed to the list when you posted? There was a
reply:

https://lists.clusterlabs.org/pipermail/users/2019-May/025847.html

--
Ken Gaillot <kgaillot at redhat.com>



------------------------------

Message: 3
Date: Fri, 14 Jun 2019 09:50:19 +0200
From: Oyvind Albrigtsen <oalbrigt at redhat.com>
To: developers at clusterlabs.org, users at clusterlabs.org
Subject: [ClusterLabs] resource-agents v4.3.0 rc1
Message-ID: <20190614075019.44mhx33w2g4ctlcl at redhat.com>
Content-Type: text/plain; charset=us-ascii; format=flowed

ClusterLabs is happy to announce resource-agents v4.3.0 rc1.
Source code is available at:
https://github.com/ClusterLabs/resource-agents/releases/tag/v4.3.0rc1

The most significant enhancements in this release are:
- new resource agents:
  - dovecot
  - vdo-vol

- bugfixes and enhancements:
  - Build: improve to be able to build and install on RHEL 6
  - CTDB: add ctdb_max_open_files parameter
  - CTDB: fix version string with vendor trailer comparison
  - Filesystem: Fix missing mount point due to corrupted mount list
  - Filesystem: fix umount not executed in the event of a disk failure
  - IPaddr2: add network namespace support
  - IPsrcaddr: make proto optional to fix regression when used without NetworkManager
  - LVM-activate: align dmsetup report command to standard
  - LVM-activate: don't count "No devices" as device in dm_count
  - LVM-activate: don't fail initial probe
  - LVM-activate: make vgname not unique
  - LVM-activate: only check locking_type when LVM < v2.03
  - LVM-activate: return OCF_NOT_RUNNING on initial probe
  - LVM: return $OCF_ERR_GENERIC when start fails
  - Maint: introduce optional spellchecking for {short,long}desc (make spellcheck)
  - Route: make family parameter optional
  - SAPDatabase: metadata: add HANA usage example and improved the Monitor Services defaults documentation
  - Squid: fix PID file issue w/newer Squid versions
  - aws-vpc-move-ip: add support for multiple network interfaces
  - aws-vpc-move-ip: add support for multiple routing tables
  - aws-vpc-move-ip: get NETWORK_INTERFACE_ID from metadata instead of using awscli
  - aws-vpc-move-ip: improve MAC address detection
  - aws-vpc-move-ip: use --query to avoid possible race condition w/old grep implementation
  - azure-events: fix implicit bytes conversion that breaks Python 3
  - clvm: support exclusive mode
  - configure: add Python library detection
  - dhcpd: keep SELinux context when copying to chroot
  - docker: fail gracefully when command not found
  - docker: use --type=container to avoid matches from other types
  - ethmonitor: check if interface exists by link
  - galera: Allow empty password for "check_passwd" parameter
  - galera: Log message when changing content of grastate.dat file
  - galera: ignore safe_to_bootstrap in grastate.dat in some cases
  - gcp-vpc-move-route/gcp-vpc-move-vip: fix Python 3 encoding issue
  - lxc: add support for lxc-stop
  - named: add host_options parameter
  - ocf-distro: add regex for RedHat version
  - ocf.py: add support for role argument to actions
  - ocf: do not log at debug log level when HA_debug is unset (e.g. w/Pacemaker remote)
  - openstack*: add support for re-attaching volumes, v3 API
  - pgsql: enhance checks in pgsql_real_start to prevent incorrect status
  - pgsql: set initial score for primary and hot standby in probe
  - podman: avoid double call to podman inspect
  - ra-dev-guide: correct notify action documentation
  - rabbitmq-cluster: always use quiet flag for eval calls
  - rabbitmq-cluster: debug log detailed output when mnesia query fails
  - rabbitmq-cluster: ensure node attributes are removed
  - rabbitmq-cluster: fix regression in rmq_stop
  - redis: Filter warning from stderr when calling 'redis-cli -a'
  - tomcat: use systemd on RHEL when catalina.sh is unavailable
  - vsftpd: fix missing $ on invalid exit code detected by CI

The full list of changes for resource-agents is available at:
https://github.com/ClusterLabs/resource-agents/blob/v4.3.0rc1/ChangeLog

Everyone is encouraged to download and test the new release candidate.
We do many regression tests and simulations, but we can't cover all
possible use cases, so your feedback is important and appreciated.

Many thanks to all the contributors to this release.


Best,
The resource-agents maintainers


------------------------------

Subject: Digest Footer

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

------------------------------

End of Users Digest, Vol 53, Issue 14
*************************************