[ClusterLabs] Information regarding pacemaker integration with user application

Ramya Ramadurai Ramya.Ramadurai at radisys.com
Thu Mar 19 12:07:41 UTC 2015


Hello Ken

I am trying to write a sample C-based OCF agent to see how to create a resource from my own code.
I took the code base from IPv6addr.c and added sample prints inside start, stop, and monitor.
I compiled it with gcc and placed the executable under the path /usr/lib/ocf/resource.d/heartbeat/sample (the name of the executable).
I then used the pcs resource create command to create a resource called sample_test.
pcs status shows the resource has been added, but under Failed actions it shows:
-----------------------------------------------------------------------------------------------------------------------------------------------------
pcs status
Last updated: Thu Mar 19 11:30:52 2015
Last change: Thu Mar 19 11:07:38 2015 via cibadmin on VM-15
Stack: cman
Current DC: VM4 - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, unknown expected votes
2 Resources configured.


Online: [ VM-15 VM4 ]

Full list of resources:

 my_first_svc   (ocf::pacemaker:Dummy): Started VM4
 sample_test    (ocf::heartbeat:sample):        Started VM4

Failed actions:
    my_first_svc_monitor_120000 (node=VM-15, call=11, rc=7, status=complete): not running
    ipv6_svc_monitor_0 (node=VM-15, call=22, rc=2, status=complete): invalid parameter
    sample_test_monitor_0 (node=VM-15, call=49, rc=5, status=complete): not installed

    ipv6_svc_monitor_0 (node=VM4, call=16, rc=2, status=complete): invalid parameter
    sample_test1_monitor_0 (node=VM4, call=31, rc=5, status=complete): not installed

I am not sure whether I am following the right steps to integrate a sample agent with pacemaker.
Can you please help on how to proceed?

Thanks
ramya


-----Original Message-----
From: Ken Gaillot [mailto:kgaillot at redhat.com] 
Sent: 17 March 2015 21:51
To: Ramya Ramadurai; Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Information regarding pacemaker integration with user application

On 03/17/2015 11:20 AM, Ramya Ramadurai wrote:
> Hi Ken
> 
> Thanks a lot for clarifying the doubts.
> Can you share any sample C-based resource agent code that we could use as a reference for an application?
> I have seen the LSB/OCF based scripts, but in our application there
> can be various scenarios where we need to monitor different faults (other
> than a process kill, it could be hardware-related faults, or the process
> itself being unable to do its work due to CPU overload, memory thresholds,
> etc.). In that case we can write a specific monitoring function in the
> resource agent to communicate the fault error codes to the cluster
> manager so it can take appropriate action.
> 
> Thanks
> ramya

The resource agent interface is really simple:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-ocf
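
For illustration only, here is a minimal sketch of what such an agent could look like in C. This is a hypothetical example written for this reply (the agent name, state file and timeouts are made up), not an excerpt from IPv6addr.c, and a real agent would manage your actual application and also implement validate-all. Keep in mind that pacemaker probes every node, so the compiled binary has to be installed at the same path on every node in the cluster.

/* sample.c - minimal sketch of an OCF resource agent written in C.
 * Hypothetical example for illustration; compile with e.g.
 *   gcc -o sample sample.c
 * and install as /usr/lib/ocf/resource.d/heartbeat/sample on every node.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Standard OCF exit codes (only the ones used below) */
#define OCF_SUCCESS           0
#define OCF_ERR_GENERIC       1
#define OCF_ERR_UNIMPLEMENTED 3
#define OCF_NOT_RUNNING       7

/* Hypothetical marker file standing in for the real application state */
#define STATE_FILE "/var/run/sample_test.state"

static int sample_start(void)
{
    FILE *f = fopen(STATE_FILE, "w");   /* pretend to start the service */
    if (f == NULL)
        return OCF_ERR_GENERIC;
    fclose(f);
    return OCF_SUCCESS;
}

static int sample_stop(void)
{
    unlink(STATE_FILE);                 /* stop must be idempotent */
    return OCF_SUCCESS;
}

static int sample_monitor(void)
{
    /* 0 = running, 7 = cleanly stopped; anything else counts as failed */
    return access(STATE_FILE, F_OK) == 0 ? OCF_SUCCESS : OCF_NOT_RUNNING;
}

static int sample_meta_data(void)
{
    /* pacemaker calls meta-data to learn the agent's parameters/actions */
    printf("<?xml version=\"1.0\"?>\n"
           "<!DOCTYPE resource-agent SYSTEM \"ra-api-1.dtd\">\n"
           "<resource-agent name=\"sample\" version=\"0.1\">\n"
           "  <version>1.0</version>\n"
           "  <longdesc lang=\"en\">Minimal sample agent</longdesc>\n"
           "  <shortdesc lang=\"en\">sample</shortdesc>\n"
           "  <parameters/>\n"
           "  <actions>\n"
           "    <action name=\"start\" timeout=\"20s\"/>\n"
           "    <action name=\"stop\" timeout=\"20s\"/>\n"
           "    <action name=\"monitor\" timeout=\"20s\" interval=\"10s\"/>\n"
           "    <action name=\"meta-data\" timeout=\"5s\"/>\n"
           "  </actions>\n"
           "</resource-agent>\n");
    return OCF_SUCCESS;
}

int main(int argc, char **argv)
{
    if (argc < 2)
        return OCF_ERR_UNIMPLEMENTED;
    if (strcmp(argv[1], "start") == 0)
        return sample_start();
    if (strcmp(argv[1], "stop") == 0)
        return sample_stop();
    if (strcmp(argv[1], "monitor") == 0)
        return sample_monitor();
    if (strcmp(argv[1], "meta-data") == 0)
        return sample_meta_data();
    return OCF_ERR_UNIMPLEMENTED;       /* unknown action */
}
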

The IPv6addr resource agent is the only one I know of that is written in C:
https://github.com/ClusterLabs/resource-agents/tree/master/heartbeat

For monitoring you have a few options.

You could implement the monitoring checks inside your resource agent, as you suggest. If these checks will always be the same, this might be the simplest option.
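
As a rough sketch of that approach, the monitor action from the example above could be extended with application-specific health checks. Everything here is hypothetical: the process name my_app, the load-average threshold and the helper functions are invented for illustration, and a real agent would use whatever checks make sense for your application and hardware.

/* Standalone sketch of a monitor action with application-specific health
 * checks; the same idea could replace sample_monitor() in the earlier
 * example.  The checks and thresholds below are invented for illustration.
 */
#include <stdio.h>
#include <stdlib.h>

#define OCF_SUCCESS     0
#define OCF_ERR_GENERIC 1
#define OCF_NOT_RUNNING 7

/* Hypothetical helper: 1-minute load average as a crude CPU-overload check */
static double load_average(void)
{
    double load = 0.0;
    FILE *f = fopen("/proc/loadavg", "r");
    if (f != NULL) {
        if (fscanf(f, "%lf", &load) != 1)
            load = 0.0;
        fclose(f);
    }
    return load;
}

/* Hypothetical helper: is the application process alive?  A real agent
 * might check a pid file, a socket, or query the shelf manager instead. */
static int app_is_running(void)
{
    return system("pidof my_app > /dev/null 2>&1") == 0;
}

static int sample_monitor(void)
{
    if (!app_is_running())
        return OCF_NOT_RUNNING;     /* cleanly detectable "down" state */

    if (load_average() > 8.0)       /* invented overload threshold */
        return OCF_ERR_GENERIC;     /* running but unhealthy: report failure */

    return OCF_SUCCESS;
}

int main(void)
{
    /* trivial harness so the sketch compiles and runs on its own */
    return sample_monitor();        /* the exit code is what pacemaker sees */
}

The key part is the mapping to exit codes: 0 (OCF_SUCCESS) means healthy, 7 (OCF_NOT_RUNNING) means cleanly stopped, and anything else is treated as a failure that pacemaker will try to recover from.
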

Another way to do monitoring would be to implement a separate resource agent for "system health", and make your application resource depend on it. This would keep your monitoring within the cluster stack, and it would logically separate application monitoring from system monitoring.

The most flexible option would be to use a separate software package for system health monitoring (such as icinga, nagios, monit, etc.). These have extensive built-in features so you don't have to implement them yourself. You can configure those systems to take action (such as triggering a resource migration) when some condition is met. Madison Kelly of Alteeve's Niche! has mentioned she has a custom monitoring system targeted specifically at pacemaker clusters.

It is certainly feasible to do this on your own, but if you are interested in commercial support to get help with pacemaker cluster design, hardware requirements, etc., several companies offer that, including at least Alteeve's Niche!, LINBIT, SUSE and Red Hat.
(Disclaimer: I'm a developer at Red Hat.)

> -----Original Message-----
> From: Ken Gaillot [mailto:kgaillot at redhat.com]
> Sent: 17 March 2015 19:08
> To: users at clusterlabs.org
> Subject: Re: [ClusterLabs] Information regarding pacemaker integration 
> with user application
> 
> On 03/16/2015 12:56 AM, Ramya Ramadurai wrote:
>> We are evaluating the pacemaker HA stack provided by Linux for integration with
>> a user application being developed for a telecom system. At a high level, the
>> requirements of this application are: 1) The application
>> is to run on an ATCA 46XX chassis, a 6-slot chassis consisting of 2 SCM (shelf management) cards and 4 CPM-9 blades; the application will sit on the CPM-9 blades.
>> 2) To support HA for this application, we plan to make one of the CPM-9 cards a standby card that takes over once the active one goes faulty. The active unit can go faulty either when the application goes down or due to ATCA hardware faults, other ATCA-related faults, etc.
>> 3) After switchover to the standby unit, the application should function as
>> if there were no downtime (probably on the order of milliseconds), and it should maintain all the real-time data.
>>
>> We have gone through the Clusters from Scratch document and the basic
>> information on how pacemaker works, but we did not find any example
>> where pacemaker exposes an API for a user application to be integrated. From the Doxygen-related documentation as well, we are unable to work out how the functions are called, what the purpose of each function is, etc.
>> Can you please help us with some details and more depth on
>> pacemaker internals? It would also be of great help if you could suggest
>> how to integrate the above application with pacemaker, and the components
>> to be built around pacemaker to interact with ATCA.
> 
> Hi Ramya,
> 
> I don't think you need the pacemaker API; configuration is likely all you need. At most, you may need to write custom resource agent(s) and/or fence agent(s) if suitable ones are not already available for your needs.
> 
> Your active and standby CPM-9s would be cluster nodes in the pacemaker configuration. You might want to consider adding a third card to the cluster to support quorum (the quorum node doesn't have to be allowed to run the application, and it doesn't have to be exclusively dedicated to the cluster -- it's just a tiebreaker vote if communication fails between the other nodes).
> 
> Resource agents and fence agents are simple programs (usually shell
> scripts) that take commands from pacemaker via the command line and return status via the exit code.
> 
> So whatever your user application is, it would have a resource agent.
> The agent accepts commands such as start and stop, and returns particular exit codes for success, failure, etc.
> 
> The fence agent is, as a last resort, to kill a CPM-9 if it is no longer responding and the cluster can't guarantee otherwise that it's not accessing shared resources. The fence agent would accept certain commands from pacemaker, and then talk to the ATCA controller to power down or otherwise reset the targeted CPM-9.
> 
> Maintaining identical user data between the active and standby nodes is straightforward and a common cluster design. A common setup is using DRBD to mirror the data (and a clustered file system on top of that if both nodes need to access the data at the same time). "Clusters From Scratch" has a good example of this. Other setups are possible, e.g.
> using network storage such as a SAN. All of this is done via pacemaker configuration.
> 
> Your downtime requirements will be much trickier. Pacemaker detects outages by periodically polling the resource agent for status. So you have the polling interval (at least 1 second), then a timeout for receiving the status (dependent on your application and resource agent), then the startup time for the application on the other node (dependent on your application).
> 
> Depending on your application, you may be able to run it as a "clone", so that it is always running on both nodes, and then use a floating IP address to talk to it. Moving an IP address is quick, so that would eliminate the need to wait for application startup. However you still have the polling interval and status timeout.
> 
> -- Ken Gaillot <kgaillot at redhat.com>
> 
> _______________________________________________
> Users mailing list: Users at clusterlabs.org 
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org Getting started: 
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 




