[ClusterLabs] [Question] About a change of crm_failcount.

Fri Feb 3 23:35:41 EST 2017

Hi Ken,
Hi Jehan,

>> 1. Return a "hard" error such as OCF_ERR_ARGS or OCF_ERR_PERM. When
>> Pacemaker gets one of these errors from an agent, it will ban the
>> resource from that node (until the failure is cleared).

Okay!

I will test this fix next week.
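
As a rough sketch of what alternative 1 might look like in the RA: the function name reject_newer_data is made up for illustration, the fallback definitions exist only so the snippet runs outside a real agent (normally ocf-shellfuncs provides them), and whether OCF_ERR_ARGS or OCF_ERR_PERM is the better hard error here is still an open question.

```shell
#!/bin/sh
# Sketch only. In a real agent these come from ocf-shellfuncs; the
# fallbacks below just make the snippet runnable on its own.
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_ERR_ARGS:=2}"
command -v ocf_exit_reason >/dev/null 2>&1 ||
    ocf_exit_reason() { echo "exit reason: $*" >&2; }

# Hypothetical helper that replaces the "crm_failcount -v INFINITY" call:
# record the reason and return a hard error, so that Pacemaker itself
# bans the resource from this node until the failure is cleared.
reject_newer_data() {
    master_baseline="$1"
    ocf_exit_reason "My data is newer than new master's one. New master's location : $master_baseline"
    return "$OCF_ERR_ARGS"
}
```

The caller would simply propagate the return value of this function instead of returning $OCF_ERR_GENERIC after touching the fail count.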

Best Regards,
Hideo Yamauchi.

----- Original Message -----
> From: Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
> To: Ken Gaillot <kgaillot at redhat.com>
> Cc: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> Date: 2017/2/4, Sat 01:02
> Subject: Re: [ClusterLabs] [Question] About a change of crm_failcount.
> 
> On Fri, 3 Feb 2017 09:45:18 -0600
> Ken Gaillot <kgaillot at redhat.com> wrote:
> 
>>  On 02/02/2017 12:33 PM, Ken Gaillot wrote:
>>  > On 02/02/2017 12:23 PM, renayama19661014 at ybb.ne.jp wrote:  
>>  >> Hi All,
>>  >>
>>  >> Since the following change, users can no longer set crm_failcount to
>>  >> any value other than zero.
>>  >>
>>  >>  - [Fix: tools: implement crm_failcount command-line options correctly]
>>  >>    - https://github.com/ClusterLabs/pacemaker/commit/95db10602e8f646eefed335414e40a994498cafd#diff-6e58482648938fd488a920b9902daac4
>>  >>
>>  >> However, the pgsql RA sets the fail count to INFINITY in its script.
>>  >>
>>  >> ```
>>  >> (snip)
>>  >>     CRM_FAILCOUNT="${HA_SBIN_DIR}/crm_failcount"
>>  >> (snip)
>>  >>     ocf_exit_reason "My data is newer than new master's one. New master's location : $master_baseline"
>>  >>     exec_with_retry 0 $CRM_FAILCOUNT -r $OCF_RESOURCE_INSTANCE -U $NODENAME -v INFINITY
>>  >>     return $OCF_ERR_GENERIC
>>  >> (snip)
>>  >> ```
>>  >>
>>  >> As far as I can tell, only the pgsql RA is affected.
>>  >>
>>  >> Could you change crm_failcount so that it can set a non-zero value
>>  >> again? If that is not possible, we will modify the pgsql RA to use
>>  >> crm_attribute instead.
>>  >>
>>  >> Best Regards,
>>  >> Hideo Yamauchi.  
>>  > 
>>  > Hmm, I didn't realize that was used. I changed it because it's not a
>>  > good idea to set fail-count without also changing last-failure and
>>  > having a failed op in the LRM history. I'll have to think about what
>>  > the best alternative is.
>> 
>>  Having a resource agent modify its own fail count is not a good idea,
>>  and could lead to unpredictable behavior. I didn't realize the pgsql
>>  agent did that.
>> 
>>  I don't want to re-enable the functionality, because I don't want to
>>  encourage more agents doing this.
>> 
>>  There are two alternatives the pgsql agent can choose from:
>> 
>>  1. Return a "hard" error such as OCF_ERR_ARGS or OCF_ERR_PERM. When
>>  Pacemaker gets one of these errors from an agent, it will ban the
>>  resource from that node (until the failure is cleared).
>> 
>>  2. Use crm_resource --ban instead. This would ban the resource from that
>>  node until the user removes the ban with crm_resource --clear (or by
>>  deleting the ban constraint from the configuration).
>> 
>>  I'd recommend #1 since it does not require any pacemaker-specific tools.
>> 
>>  We can make sure resource-agents has a fix for this before we release a
>>  new version of Pacemaker. We'll have to publicize as much as possible to
>>  pgsql users that they should upgrade resource-agents before or at the
>>  same time as pacemaker. I see the alternative PAF agent has the same
>>  usage, so it will need to be updated, too.
> 
> Yes, I was following this conversation.
> 
> I'll do the fix on our side.
> 
> Thank you!
> 