[Pacemaker] questions about the booth

Wed May 29 02:38:41 EDT 2013

Hi, Jiaju

Thank you for merging it!

Thanks,
Yusuke

(2013/05/29 14:52), Jiaju Zhang wrote:
> Hi Yuusuke,
>
> Merged, thanks!
>
> Regards,
> Jiaju
>
> On Mon, 2013-05-27 at 16:26 +0900, Yuusuke Iida wrote:
>> Hi, Jiaju
>>
>> To solve this problem, I wrote a daemon that monitors the resources
>> that depend on a ticket.
>>
>> I have sent the following pull request:
>> https://github.com/jjzhang/booth/pull/52
>>
>> The features are as follows:
>>   - The tickets to monitor are read from the booth configuration
>> file.
>>   - Monitoring starts once a ticket is granted and the resource has
>> started.
>>   - When a resource can no longer run at a site,
>> booth_resource_monitord uses booth to move the ticket to another site.
>>   - booth_resource_monitord is installed when configure is run with
>> the "--enable-resource-monitor" option.
>>
>> How to use:
>> Typically, booth_resource_monitord is added to a configuration that
>> already uses booth, as follows.
>> ===================================================================
>> group grpBooth prmIpBooth prmApBooth prmApBooth_rsc_mond
>> primitive prmIpBooth ocf:heartbeat:IPaddr2 \
>>          params ip="***.***.***.***" nic="eth*" cidr_netmask="24" \
>>          op start interval="0s" timeout="60s" on-fail="restart" \
>>          op monitor interval="10s" timeout="60s" on-fail="restart" \
>>          op stop interval="0s" timeout="60s" on-fail="fence"
>> primitive prmApBooth ocf:pacemaker:booth-site \
>>          op start interval="0s" timeout="90s" on-fail="restart" \
>>          op monitor interval="10s" timeout="60s" on-fail="restart" \
>>          op stop interval="0s" timeout="100s" on-fail="fence"
>> primitive prmApBooth_rsc_mond ocf:heartbeat:anything \
>>          params binfile="booth_resource_monitord" \
>>          op start interval="0s" timeout="90s" on-fail="restart" \
>>          op monitor interval="10s" timeout="60s" on-fail="restart" \
>>          op stop interval="0s" timeout="100s" on-fail="fence"
>> --------------------------------------------------------------------
>>
>> Limitation:
>> The target resource cannot be detected when the "rsc_ticket" constraint
>> is defined with a "resource_set".
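
To illustrate the limitation above, here is a minimal Python sketch. It is not taken from the pull request. It assumes the constraints section of the CIB is available as XML and only reads the plain rsc="..." form of rsc_ticket. The sample XML, the ids in it, and the function name resources_by_ticket are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical CIB constraints fragment, for illustration only.
SAMPLE_CONSTRAINTS = """
<constraints>
  <rsc_ticket id="t1-simple" rsc="grpA" ticket="ticketA" loss-policy="stop"/>
  <rsc_ticket id="t1-set" ticket="ticketB" loss-policy="stop">
    <resource_set id="t1-set-0">
      <resource_ref id="rscB1"/>
      <resource_ref id="rscB2"/>
    </resource_set>
  </rsc_ticket>
</constraints>
"""


def resources_by_ticket(constraints_xml):
    """Map ticket name -> resource ids, reading only the rsc="..." form.

    Returns (found, skipped). `skipped` holds the ids of rsc_ticket
    constraints that use resource_set and are therefore not read.
    """
    root = ET.fromstring(constraints_xml)
    found, skipped = {}, []
    for constraint in root.iter("rsc_ticket"):
        rsc = constraint.get("rsc")
        if rsc is None:
            # No rsc attribute: the resources are listed in a resource_set.
            skipped.append(constraint.get("id"))
            continue
        found.setdefault(constraint.get("ticket"), []).append(rsc)
    return found, skipped


found, skipped = resources_by_ticket(SAMPLE_CONSTRAINTS)
print(found)    # {'ticketA': ['grpA']}
print(skipped)  # ['t1-set']
```

In this sketch the resources behind ticketB are never seen, which is the kind of gap the limitation describes.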
>>
>> I would very much like this feature to be merged into the booth source
>> tree.
>>
>> Best Regards,
>> Yusuke
>>
>>
>> (2012/03/08 11:37), Yuusuke Iida wrote:
>>> Hi, Jiaju
>>>
>>> Thank you for your reply.
>>>
>>> (2012/03/05 14:00), Jiaju Zhang wrote:
>>>> Hi Yuusuke,
>>>>
>>>> On Mon, 2012-03-05 at 11:49 +0900, Yuusuke Iida wrote:
>>>>> Hi, Jiaju
>>>>>
>>>>> I have been thinking about how to handle the case where a resource
>>>>> can no longer fail over within a site.
>>>>> My plan is to write a daemon that runs outside booth.
>>>>>
>>>>> This daemon watches whether a resource can run at the site.
>>>>> When it confirms that the resource can no longer be run there, it
>>>>> issues a revoke command to booth.
>>>>> booth catches the revoke, and I expect it to move the ticket to
>>>>> another site.
>>>>
>>>> If I understand correctly, the daemon you mentioned automates some of
>>>> the admin's actions: if the resources cannot be managed by one site,
>>>> it revokes the ticket and moves it to another site. I have no
>>>> objection if the admin has this requirement;)
>>> Thank you for agreeing.
>>> Your summary of the processing is exactly right.
>>> Not every admin will need this feature.
>>> However, I think there are admins who want to automate as much of the
>>> processing as possible.
>>>
>>>> The only thing I'm not sure about is whether the admin really wants to
>>>> do this. My assumption is that if the local site is alive, the admin
>>>> will be inclined to keep the ticket at this site; if the site is
>>>> totally down, we have no choice, and the ticket has to move to another
>>>> site to keep the service available.
>>>> However, that is just one usage scenario I had in mind; booth should
>>>> support the usage scenario that you mentioned;)
>>>>
>>>>>
>>>>> I think this move preserves the continuity of the resource.
>>>>>
>>>>> I intend to analyze the CIB and check the state of the resource
>>>>> using its score.
>>>>
>>>> I don't quite understand this. Do you mean that if the resource is
>>>> usually un-managed by this site, we had better move it to another
>>>> site, so your daemon will depend on this value to decide whether to
>>>> move the ticket to another site?
>>> When a resource fails, I think the score of the resource becomes
>>> less than 0.
>>> When the resource cannot start on any node in the site, I think the
>>> score becomes less than 0 on all nodes.
>>> I want to use this score to judge whether the resource is unable to
>>> run.
>>>
>>> When a ticket is not granted, the score of the resource also becomes
>>> less than 0.
>>> Therefore, I want to monitor the resource while the ticket is granted.
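
The score-based judgement described above can be sketched in Python as follows. This is only an illustration of the rule as stated in the mail, not the daemon's actual code. The function names and the shape of the input (a mapping of node name to score text) are hypothetical, and the sketch assumes Pacemaker's INFINITY equals 1000000.

```python
# Assumption: Pacemaker treats INFINITY as 1000000.
INFINITY = 1000000


def parse_score(text):
    """Convert a score string such as "100" or "-INFINITY" to an int."""
    text = text.strip()
    if text in ("INFINITY", "+INFINITY"):
        return INFINITY
    if text == "-INFINITY":
        return -INFINITY
    return int(text)


def resource_cannot_run(ticket_granted, node_scores):
    """Return True when the resource looks unable to run at this site.

    The check only applies while the ticket is granted, because a
    resource without its ticket has a negative score anyway.
    """
    if not ticket_granted:
        return False
    if not node_scores:
        # No score information: do not draw a conclusion.
        return False
    return all(parse_score(s) < 0 for s in node_scores.values())


# Hypothetical node names and scores.
print(resource_cannot_run(True, {"node1": "-INFINITY", "node2": "-INFINITY"}))  # True
print(resource_cannot_run(True, {"node1": "-INFINITY", "node2": "100"}))        # False
print(resource_cannot_run(False, {"node1": "-INFINITY", "node2": "-INFINITY"})) # False
```

Only the first case would lead to a revoke, since the ticket is granted and no node in the site has a non-negative score.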
>>>
>>>>
>>>> Well, I think you raised another usage scenario which I had not thought
>>>> of before;) And I agree with you about setting up such a daemon to do
>>>> this work if the admin needs it.
>>> I would like you to review it again once it is complete.
>>>
>>> Thanks,
>>> Yuusuke
>>>>
>>>> Thanks,
>>>> Jiaju
>>>>
>>>>
>>>
>>
>
>
>

-- 
----------------------------------------
METRO SYSTEMS CO., LTD

Yuusuke Iida
Mail: iidayuus at intellilink.co.jp
----------------------------------------