[Pacemaker] questions about the booth

Mon May 27 03:26:29 EDT 2013

Hi, Jiaju

To solve this problem, I have written a daemon that monitors the resources
that depend on a ticket.

I have submitted the following pull request:
https://github.com/jjzhang/booth/pull/52

It works as follows.
 - The tickets to monitor are read from the booth configuration
file.
 - Monitoring starts once the ticket has been granted and the resource
has started.
 - When a resource can no longer run at a site, booth_resource_monitord
uses booth to move the ticket to another site.
 - booth_resource_monitord is installed only when configure is run with
the "--enable-resource-monitor" option.
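The decision rule discussed further down in this thread (supervise only while the ticket is granted and the resource has started, and treat the resource as unable to run at the site when its score is negative on every node) can be sketched as below. This is a minimal Python illustration under stated assumptions, not the code in the pull request: `SiteState` and `should_move_ticket` are hypothetical names, and the per-node scores are assumed to have already been read from the CIB.

```python
from dataclasses import dataclass, field
from typing import Dict

# Pacemaker represents -INFINITY as -1000000.
MINUS_INFINITY = -1000000


@dataclass
class SiteState:
    """Hypothetical snapshot of one site, assumed to be built from the CIB."""
    ticket_granted: bool      # is the ticket currently granted to this site?
    resource_started: bool    # has the resource started since the grant?
    node_scores: Dict[str, int] = field(default_factory=dict)  # node -> score


def should_move_ticket(state: SiteState) -> bool:
    """Return True when the ticket should be moved to another site."""
    # Without the ticket every score is negative anyway, so the scores say
    # nothing about failures; only supervise after grant and start.
    if not (state.ticket_granted and state.resource_started):
        return False
    # No score information: do not act on missing data.
    if not state.node_scores:
        return False
    # The resource can no longer run here only if every node is negative.
    return all(score < 0 for score in state.node_scores.values())


if __name__ == "__main__":
    failed_everywhere = SiteState(True, True, {"node1": MINUS_INFINITY, "node2": MINUS_INFINITY})
    one_node_left = SiteState(True, True, {"node1": MINUS_INFINITY, "node2": 0})
    print(should_move_ticket(failed_everywhere))  # True
    print(should_move_ticket(one_node_left))      # False
```

Requiring both the grant and a successful start before acting avoids mistaking the negative scores caused by a missing ticket for resource failures.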

How to use:
Typically, booth_resource_monitord is added to an existing configuration
that uses booth, as follows.
===================================================================
group grpBooth prmIpBooth prmApBooth prmApBooth_rsc_mond
primitive prmIpBooth ocf:heartbeat:IPaddr2 \
        params ip="***.***.***.***" nic="eth*" cidr_netmask="24" \
        op start interval="0s" timeout="60s" on-fail="restart" \
        op monitor interval="10s" timeout="60s" on-fail="restart" \
        op stop interval="0s" timeout="60s" on-fail="fence"
primitive prmApBooth ocf:pacemaker:booth-site \
        op start interval="0s" timeout="90s" on-fail="restart" \
        op monitor interval="10s" timeout="60s" on-fail="restart" \
        op stop interval="0s" timeout="100s" on-fail="fence"
primitive prmApBooth_rsc_mond ocf:heartbeat:anything \
        params binfile="booth_resource_monitord" \
        op start interval="0s" timeout="90s" on-fail="restart" \
        op monitor interval="10s" timeout="60s" on-fail="restart" \
        op stop interval="0s" timeout="100s" on-fail="fence"
--------------------------------------------------------------------

Limitation:
The target resources cannot be determined when the "rsc_ticket"
constraint is defined with a "resource_set".
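To illustrate this limitation: the two ways of writing an "rsc_ticket" constraint look different in the CIB XML. The simple form names the resource in an "rsc" attribute, while the set form lists the resources in nested "resource_set"/"resource_ref" elements and has no "rsc" attribute. The minimal Python sketch below assumes a reader that only looks at the "rsc" attribute; the function name and the ticket and resource IDs are hypothetical, chosen for illustration only.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical constraint snippets in CIB XML form.
SIMPLE_FORM = """
<constraints>
  <rsc_ticket id="prmApp-req-ticketA" rsc="prmApp" ticket="ticketA" loss-policy="stop"/>
</constraints>
"""

SET_FORM = """
<constraints>
  <rsc_ticket id="set-req-ticketA" ticket="ticketA" loss-policy="stop">
    <resource_set id="set-req-ticketA-0">
      <resource_ref id="prmApp"/>
      <resource_ref id="prmDB"/>
    </resource_set>
  </rsc_ticket>
</constraints>
"""


def ticket_resources_rsc_only(constraints_xml: str, ticket: str) -> List[str]:
    """Collect resources bound to `ticket`, reading only the `rsc` attribute."""
    root = ET.fromstring(constraints_xml.strip())
    found = []
    for constraint in root.iter("rsc_ticket"):
        if constraint.get("ticket") != ticket:
            continue
        rsc = constraint.get("rsc")
        # The set form has no `rsc` attribute, so it contributes nothing here.
        if rsc is not None:
            found.append(rsc)
    return found


print(ticket_resources_rsc_only(SIMPLE_FORM, "ticketA"))  # ['prmApp']
print(ticket_resources_rsc_only(SET_FORM, "ticketA"))     # []
```

Supporting the set form would mean also walking the nested "resource_ref" elements of each matching constraint (for example with `constraint.iter("resource_ref")`).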

I would very much like this feature to be merged into the booth source tree.

Best Regards,
Yusuke

(2012/03/08 11:37), Yuusuke Iida wrote:
> Hi, Jiaju
> 
> Thank you for your reply.
> 
> (2012/03/05 14:00), Jiaju Zhang wrote:
>> Hi Yuusuke,
>>
>> On Mon, 2012-03-05 at 11:49 +0900, Yuusuke Iida wrote:
>>> Hi, Jiaju
>>>
>>> I have been thinking about how to handle the case where a resource
>>> does not move between sites.
>>> My idea is to run a daemon outside of booth.
>>>
>>> This daemon monitors whether a resource can run at the site.
>>> When it confirms that the resource can no longer be managed there, it
>>> executes a revoke command for booth.
>>> booth receives the revoke, and I expect the ticket to then move to
>>> another site.
>>
>> If I understand it correctly, the daemon you mentioned automates some of
>> the admin's actions: if the resources cannot be managed at one site, it
>> revokes the ticket and moves it to another site. I have no objection if
>> the admin has this requirement ;)
> Thank you for agreeing.
> Your summary of the processing is exactly right.
> Not every admin will need this function.
> However, I think there are admins who want to automate as much of the
> processing as possible.
> 
>> The only thing I'm not sure about is whether the admin really wants to
>> do this. My assumption is that if the local site is alive, the admin
>> will be inclined to keep the ticket at this site; if the site is totally
>> down, we have no choice, and the ticket has to move to another site to
>> keep the service available.
>> However, that is just one usage scenario in my mind; booth should also
>> support the usage scenario that you mentioned ;)
>>
>>>
>>> I think this movement preserves the continuity of the resource.
>>>
>>> I intend to analyze the CIB and check the state of the resource using
>>> its score.
>>
>> I don't quite understand this. Do you mean that if the resource cannot
>> be managed by this site, we had better move it to another site, so your
>> daemon will depend on this value to decide whether to move the ticket to
>> another site?
> When a resource fails, I believe its score on that node becomes negative
> (less than 0).
> When the resource cannot start on any node in the site, I believe its
> score becomes negative on every node.
> I want to use these scores to determine that the resource can no longer
> run.
> 
> However, when the ticket is not granted, the resource's score is also
> negative.
> Therefore, I want to monitor the resource only while the ticket is
> granted.
> 
>>
>> Well, I think you have raised another usage scenario that I had not
>> thought of before ;) And I agree with setting up such a daemon to do this
>> work if the admin needs it.
> I would like you to review it again once it is complete.
> 
> Thanks,
> Yuusuke
>>
>> Thanks,
>> Jiaju
>>
>>
> 

-- 
----------------------------------------
METRO SYSTEMS CO., LTD

Yuusuke Iida
Mail: iidayuus at intellilink.co.jp
----------------------------------------