[ClusterLabs] Information regrading pacemaker integration with user application

Mon Mar 23 01:56:27 EDT 2015

Hello Ken

Thanks for your reply. Please find my answers below (marked with "Ramya@").

-----Original Message-----
From: Ken Gaillot [mailto:kgaillot at redhat.com] 
Sent: 19 March 2015 22:24
To: Ramya Ramadurai; Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Information regrading pacemaker integration with user application

On 03/19/2015 08:07 AM, Ramya Ramadurai wrote:
> Hello Ken
> 
> I am trying to write a sample C-based OCF agent to see how to create a
> resource using my own program. I took the code base from IPv6addr.c
> and added sample prints inside start, stop and monitor. I compiled it
> using gcc and added the executable under the path
> /usr/lib/ocf/resource.d/heartbeat/sample (sample is the name of the executable).

This isn't causing any problem, but as a best practice, create a directory /usr/lib/ocf/resource.d/radisys and put your executable there.
You can then refer to it in the configuration as ocf:radisys:sample.
This allows you to keep your code separate from the upstream to avoid any issues with package updates etc.
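
To illustrate the naming convention: a resource specified as ocf:<provider>:<agent> is looked up as <OCF root>/resource.d/<provider>/<agent>. The following is a minimal C sketch, assuming the OCF root is /usr/lib/ocf as in the paths in this thread. The function name ocf_agent_path is hypothetical and is not part of any pacemaker API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only (not a pacemaker API).
 * Builds the on-disk path for a resource spec "ocf:<provider>:<agent>",
 * assuming the OCF root directory is /usr/lib/ocf as in this thread.
 * Returns 0 on success, -1 if the spec is malformed or out is too small. */
int ocf_agent_path(const char *spec, char *out, size_t outlen)
{
    const char *provider, *colon;
    size_t plen;
    int n;

    if (spec == NULL || strncmp(spec, "ocf:", 4) != 0)
        return -1;
    provider = spec + 4;
    colon = strchr(provider, ':');
    if (colon == NULL || colon == provider || colon[1] == '\0')
        return -1;                      /* empty provider or agent name */
    plen = (size_t)(colon - provider);

    n = snprintf(out, outlen, "/usr/lib/ocf/resource.d/%.*s/%s",
                 (int)plen, provider, colon + 1);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With this sketch, "ocf:radisys:sample" yields /usr/lib/ocf/resource.d/radisys/sample, which is where the executable would need to exist on every node.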

----------------------------------------------------------------------

Ramya@ I added the sample executable under /usr/lib/ocf/resource.d/SAMPLE/sample (sample is the executable name) on both of the nodes forming the cluster.
This time, when I ran pcs resource create, the sample resource sample_test5 was created without any errors.
Please see the output:

Command: pcs resource create sample_test5 ocf:SAMPLE:sample ipv="1.2.2.2"

pcs status
Last updated: Mon Mar 23 05:06:47 2015
Last change: Mon Mar 23 05:06:45 2015 via cibadmin on VM4
Stack: cman
Current DC: VM4 - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, unknown expected votes
2 Resources configured.

Online: [ VM-15 VM4 ]

Full list of resources:

 my_first_svc   (ocf::pacemaker:Dummy): Started VM4
 sample_test5   (ocf::SAMPLE:sample):   Started VM-15
 sample_test    (ocf::heartbeat:sample):         ORPHANED Started VM4 (unmanaged) FAILED
 sample_test2   (ocf::SAMPLE:sample):    ORPHANED Started VM4 (unmanaged) FAILED

Failed actions:
    my_first_svc_monitor_120000 (node=VM-15, call=11, rc=7, status=complete): not running
    ipv6_svc_monitor_0 (node=VM-15, call=22, rc=2, status=complete): invalid parameter
    sample_test_monitor_0 (node=VM-15, call=49, rc=5, status=complete): not installed
    sample_test1_monitor_0 (node=VM-15, call=54, rc=5, status=complete): not installed
    sample_test2_monitor_0 (node=VM-15, call=64, rc=5, status=complete): not installed
    sample_test4_monitor_0 (node=VM-15, call=116, rc=5, status=complete): not installed
    ipv6_svc_monitor_0 (node=VM4, call=16, rc=2, status=complete): invalid parameter
    sample_test_stop_0 (node=VM4, call=34, rc=2, status=complete): invalid parameter
    sample_test1_monitor_0 (node=VM4, call=31, rc=5, status=complete): not installed
    sample_test2_stop_0 (node=VM4, call=42, rc=2, status=complete): invalid parameter
    sample_test4_monitor_0 (node=VM4, call=55, rc=4, status=complete): insufficient privileges

[root at VM4 ~]#

Please see the attached sample.c, which is what I am running. I now have two issues.
1) I need the prints inside my program to appear in /var/log/messages or syslog.
When I run "pcs resource create", none of the prints from any of the functions appear under /var/log.
So I am not sure whether the code I have written is being called or executed at all.

Can you see what the issue might be, and how we can test this resource agent integrated with pacemaker?
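
One possible way to make the agent's messages visible is sketched here. Assuming the prints use printf(), they go to standard output, and since the agent is run by the cluster rather than from a terminal, that output may not end up in /var/log/messages. The standard syslog(3) interface writes to the system log directly. The helper names agent_format_log and agent_log and the "sample" tag are hypothetical, and where the messages finally land depends on the local syslog configuration.

```c
#include <stdio.h>
#include <syslog.h>

/* Hypothetical helper: formats "<action>: <message>" into buf.
 * Returns the number of characters written, or -1 if it did not fit. */
int agent_format_log(char *buf, size_t buflen,
                     const char *action, const char *msg)
{
    int n = snprintf(buf, buflen, "%s: %s", action, msg);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}

/* Hypothetical helper: sends one line to the system log via syslog(3).
 * The tag "sample" and the LOG_DAEMON facility are arbitrary choices. */
void agent_log(int priority, const char *action, const char *msg)
{
    char line[256];

    if (agent_format_log(line, sizeof line, action, msg) < 0)
        return;                         /* message too long, drop it */
    openlog("sample", LOG_PID, LOG_DAEMON);
    syslog(priority, "%s", line);
    closelog();
}
```

For example, calling agent_log(LOG_INFO, "start", "called") at the top of the start function would record each invocation, which makes it possible to tell whether the cluster is calling the agent at all.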

2) With this agent I am trying to do the following:
- I am writing an "example" application, which sample will launch when its start function is called.
- I will monitor the "example" application by sending it some messages; once it responds to sample,
I will return the OCF status.
Again, I cannot see the start function being called, so when I try to run the example program from it using the system command, it does not run.
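
The flow described in point 2 can be sketched as follows. The cluster invokes the agent with the action name as its first command-line argument, and the agent's exit status is the result. The handler names and their stub bodies are placeholders, not code from the attached sample.c; a real agent would launch and probe the "example" application inside them.

```c
#include <string.h>

/* OCF exit codes used below (values from the OCF return code table). */
enum { OCF_SUCCESS = 0, OCF_ERR_UNIMPLEMENTED = 3, OCF_NOT_RUNNING = 7 };

/* Placeholder state: a real agent would check the actual application,
 * because each action runs in a fresh process. */
static int example_running = 0;

static int do_start(void)   { example_running = 1; return OCF_SUCCESS; }
static int do_stop(void)    { example_running = 0; return OCF_SUCCESS; }
static int do_monitor(void) { return example_running ? OCF_SUCCESS : OCF_NOT_RUNNING; }

/* Maps an action name (argv[1] in a real agent) to a handler and
 * returns the value the agent should exit with. */
int run_action(const char *action)
{
    if (action == NULL)                 return OCF_ERR_UNIMPLEMENTED;
    if (strcmp(action, "start") == 0)   return do_start();
    if (strcmp(action, "stop") == 0)    return do_stop();
    if (strcmp(action, "monitor") == 0) return do_monitor();
    return OCF_ERR_UNIMPLEMENTED;
}
```

A real agent's main() would end with something like `return run_action(argc > 1 ? argv[1] : NULL);`. The executable can then be exercised by hand before involving the cluster, by running it with monitor, start or stop as its argument and checking the exit status.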

----------------------------------------------------------------------

> I then used the pcs resource create command to create a sample resource called sample_test.
> On checking pcs status, it shows the resource was added, but under Failed
> actions it shows:
> ----------------------------------------------------------------------
> pcs status
> Last updated: Thu Mar 19 11:30:52 2015
> Last change: Thu Mar 19 11:07:38 2015 via cibadmin on VM-15
> Stack: cman
> Current DC: VM4 - partition with quorum
> Version: 1.1.8-7.el6-394e906
> 2 Nodes configured, unknown expected votes
> 2 Resources configured.
> 
> 
> Online: [ VM-15 VM4 ]
> 
> Full list of resources:
> 
>  my_first_svc   (ocf::pacemaker:Dummy): Started VM4
>  sample_test    (ocf::heartbeat:sample):        Started VM4
> 
> Failed actions:
>     my_first_svc_monitor_120000 (node=VM-15, call=11, rc=7, status=complete): not running
>     ipv6_svc_monitor_0 (node=VM-15, call=22, rc=2, status=complete): invalid parameter
>     sample_test_monitor_0 (node=VM-15, call=49, rc=5, status=complete): not installed
> 
>     ipv6_svc_monitor_0 (node=VM4, call=16, rc=2, status=complete): invalid parameter
>     sample_test1_monitor_0 (node=VM4, call=31, rc=5, status=complete): not installed
> 
> I am not sure if I am following the right steps to integrate a sample with
> pacemaker. Can you please help on how to proceed?

It's difficult to say without seeing the configuration and agents.
Pacemaker is reporting back the status it's getting from the resource agent. For example, "not installed" means the resource agent is exiting with status code 5.

For a list of the status codes and what they mean see http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-ocf-return-codes
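
For quick reference, the rc values that appear in the status output in this thread can be decoded with a small lookup. This is a sketch only: the function name ocf_rc_name is hypothetical, and the symbolic names follow the return code table in the document linked above.

```c
#include <stddef.h>

/* Hypothetical helper: maps an OCF exit code to its symbolic name.
 * Returns "unknown" for values outside the standard 0-9 range. */
const char *ocf_rc_name(int rc)
{
    static const char *const names[] = {
        "OCF_SUCCESS",           /* 0: action succeeded */
        "OCF_ERR_GENERIC",       /* 1: generic error */
        "OCF_ERR_ARGS",          /* 2: reported as "invalid parameter" */
        "OCF_ERR_UNIMPLEMENTED", /* 3: action not supported */
        "OCF_ERR_PERM",          /* 4: reported as "insufficient privileges" */
        "OCF_ERR_INSTALLED",     /* 5: reported as "not installed" */
        "OCF_ERR_CONFIGURED",    /* 6: resource misconfigured */
        "OCF_NOT_RUNNING",       /* 7: reported as "not running" */
        "OCF_RUNNING_MASTER",    /* 8: running in master mode */
        "OCF_FAILED_MASTER",     /* 9: failed in master mode */
    };

    if (rc < 0 || (size_t)rc >= sizeof names / sizeof names[0])
        return "unknown";
    return names[rc];
}
```

A C agent should exit with exactly these values; for instance, a monitor action on a cleanly stopped resource is expected to exit with 7 rather than a generic nonzero status.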

> -----Original Message-----
> From: Ken Gaillot [mailto:kgaillot at redhat.com]
> Sent: 17 March 2015 21:51
> To: Ramya Ramadurai; Cluster Labs - All topics related to open-source 
> clustering welcomed
> Subject: Re: [ClusterLabs] Information regrading pacemaker integration 
> with user application
> 
> On 03/17/2015 11:20 AM, Ramya Ramadurai wrote:
>> Hi Ken
>>
>> Thanks a lot for clarifying the doubts.
>> Can you share any sample C-based resource agent code that we can use as a reference for an application?
>> I have seen the LSB/OCF-based scripts, but in our application there
>> can be various scenarios where we need to monitor various
>> faults (other than a process kill, these can be hardware-related
>> faults, or the process itself being unable to make progress due to CPU
>> overload, a memory threshold, etc.).
>> In this case we can write some specific monitoring function in the
>> resource agent to communicate the fault error codes to cluster
>> management so that it can take appropriate actions.
>>
>> Thanks
>> ramya
> 
> The resource agent interface is really simple:
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-ocf
> 
> The IPv6addr resource is the only one I know of that is written in C:
> https://github.com/ClusterLabs/resource-agents/tree/master/heartbeat
> 
> For monitoring you have a few options.
> 
> You could implement the monitoring checks inside your resource agent, as you suggest. If these checks will always be the same, this might be the simplest option.
> 
> Another way to do monitoring would be to implement a separate resource agent for "system health", and make your application resource depend on it. This would keep your monitoring within the cluster stack, and it would logically separate application monitoring from system monitoring.
> 
> The most flexible option would be to use a separate software package for system health monitoring (such as icinga, nagios, monit, etc.). These have extensive built-in features so you don't have to implement them yourself. You can configure those systems to take action (such as trigger a resource migration) when some condition is met. Madison Kelly of Alteeve's Niche! has mentioned she has a custom monitoring system targeted specially to pacemaker clusters.
> 
> It is certainly feasible to do this on your own, but if you are interested in commercial support to get help with pacemaker cluster design, hardware requirements, etc., several companies offer that, including at least Alteeve's Niche!, LINBIT, SUSE and Red Hat.
> (Disclaimer, I'm a developer at Red Hat.)
> 
>> -----Original Message-----
>> From: Ken Gaillot [mailto:kgaillot at redhat.com]
>> Sent: 17 March 2015 19:08
>> To: users at clusterlabs.org
>> Subject: Re: [ClusterLabs] Information regrading pacemaker 
>> integration with user application
>>
>> On 03/16/2015 12:56 AM, Ramya Ramadurai wrote:
>>> We are evaluating pacemaker HA provided by Linux to integrate with
>>> the user application being developed for a telecom system. At a high
>>> level, the requirements of this application are:
>>> 1) The application is to be run on an ATCA chassis 46XX, which is a 6-slot chassis consisting of 2 SCM (shelf management cards) and 4 CPM-9 blades. The application will sit on the CPM-9 blades.
>>> 2) In order to support HA for this application, we are planning to make one of the CPM-9 cards a standby card which will take over once the active one goes faulty. The active unit can go faulty either when the application goes down or due to ATCA hardware, any ATCA-related faults, etc.
>>> 3) After switchover to the standby unit, the application should function
>>> as if there were no downtime (probably on the order of milliseconds), and it should maintain all the real-time data.
>>>
>>> We have gone through the Clusters from Scratch document and basic
>>> information on how pacemaker works. But we did not find any example
>>> where pacemaker exposes some API for a user application to integrate with. From the doxygen-related documentation also, we are unable to work out how the functions are called, what the purpose of each function is, etc.
>>> Can you please help us get some in-depth details on
>>> pacemaker internals? Also, it would be of great help if you could
>>> suggest how to integrate the above application with pacemaker, and the
>>> components to be built around pacemaker to interact with ATCA.
>>
>> Hi Ramya,
>>
>> I don't think you need the pacemaker API; configuration is likely all you need. At most, you may need to write custom resource agent(s) and/or fence agent(s) if ones are not already available for your needs.
>>
>> Your active and standby CPM-9s would be cluster nodes in the pacemaker configuration. You might want to consider adding a third card to the cluster to support quorum (the quorum node doesn't have to be allowed to run the application, and it doesn't have to be exclusively dedicated to the cluster -- it's just a tiebreaker vote if communication fails between the other nodes).
>>
>> Resource agents and fence agents are simple programs (usually shell
>> scripts) that take commands from pacemaker via the command line and return status via the exit code.
>>
>> So whatever your user application is, it would have a resource agent.
>> The agent accepts commands such as start and stop, and returns particular exit codes for success, failure, etc.
>>
>> The fence agent is, as a last resort, to kill a CPM-9 if it is no longer responding and the cluster can't guarantee otherwise that it's not accessing shared resources. The fence agent would accept certain commands from pacemaker, and then talk to the ATCA controller to power down or otherwise reset the targeted CPM-9.
>>
>> Maintaining identical user data between the active and standby nodes is straightforward and a common cluster design. A common setup is using DRBD to mirror the data (and a clustered file system on top of that if both nodes need to access the data at the same time). "Clusters From Scratch" has a good example of this. Other setups are possible, e.g.
>> using network storage such as a SAN. All of this is done via pacemaker configuration.
>>
>> Your downtime requirements will be much trickier. Pacemaker detects outages by periodically polling the resource agent for status. So you have the polling interval (at least 1 second), then a timeout for receiving the status (dependent on your application and resource agent), then the startup time for the application on the other node (dependent on your application).
>>
>> Depending on your application, you may be able to run it as a "clone", so that it is always running on both nodes, and then use a floating IP address to talk to it. Moving an IP address is quick, so that would eliminate the need to wait for application startup. However you still have the polling interval and status timeout.
>>
>> -- Ken Gaillot <kgaillot at redhat.com>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sample
Type: application/octet-stream
Size: 15442 bytes
Desc: sample
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20150323/bf03d8d6/attachment-0003.obj>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sample.c
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20150323/bf03d8d6/attachment-0003.c>