<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><br><pre>>Also, I forgot about the undocumented/unsupported start-delay operation
>attribute, that you can put on the status operation to delay the first
>monitor. That may give you the behavior you want.</pre><div>I tried adding "start-delay=60s" to the monitor operation. The first monitor was indeed delayed by 60s, but during those 60s it blocked the other resources too! The result is the same as sleeping in the monitor.<br>So I think the best method for me is to decide, based on a timestamp, whether the monitor function needs to return success.<br>Thank you very much!<br></div><br><br><br><br><pre><br>At 2017-11-06 21:53:53, "Ken Gaillot" <kgaillot@redhat.com> wrote:
>On Sat, 2017-11-04 at 22:46 +0800, lkxjtu wrote:
>>
>>
>> >Another possibility would be to have the start return immediately,
>> >and make the monitor artificially return success for the first 10
>> >minutes after starting. It's hacky, and it depends on your situation
>> >whether the behavior is acceptable.
>> I tried putting a sleep into the monitor function (I added a "sleep
>> 60" at the monitor entry for debugging); the start function returns
>> immediately. I found an interesting thing: the first monitor after
>> start blocks other resources too, but from the second time on, it
>> doesn't block other resources! Is this normal?
>
>Yes, the first result is for an unknown status, but after that, the
>cluster assumes the resource is OK unless/until the monitor says
>otherwise.
>
>However, I wasn't suggesting putting a sleep inside the monitor -- I
>was just thinking of having the monitor check the time, and if it's
>within 10 minutes of start, return success.
>
>> >My first thought on how to implement this would be to have the start
>> >action set a private node attribute (attrd_updater -p) with a
>> >timestamp. When the monitor runs, it could do its usual check, and if
>> >it succeeds, remove that node attribute, but if it fails, check the
>> >node attribute to see whether it's within the desired delay.
>> This means that if it is within the desired delay, the monitor should
>> return success even if the health check failed?
>> I think this can solve my problem, except for what "crm status" shows.
>
>Yes, that's what I had in mind. The status would show "running", which
>may or may not be what you want in this case.
>
>Also, I forgot about the undocumented/unsupported start-delay operation
>attribute, that you can put on the status operation to delay the first
>monitor. That may give you the behavior you want.
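In the configuration shown later in this thread, that would amount to something like the following (a sketch only: start-delay is undocumented/unsupported, and the exact syntax may vary by crmsh version):

```
primitive fm_mgt fm_mgt \
        op monitor interval=20s timeout=120s start-delay=60s \
        op stop interval=0 timeout=120s on-fail=restart \
        op start interval=0 timeout=120s on-fail=restart
```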
>
>> At 2017-11-01 21:20:50, "Ken Gaillot" <kgaillot@redhat.com> wrote:
>> >On Sat, 2017-10-28 at 01:11 +0800, lkxjtu wrote:
>> >>
>> >> Thank you for your response! This means that there shouldn't be a
>> >> long "sleep" in the OCF script.
>> >> If my service takes 10 minutes from starting until the health check
>> >> passes, then what should I do?
>> >
>> >That is a tough situation with no great answer.
>> >
>> >You can leave it as it is, and live with the delay. Note that it only
>> >happens if a resource fails after the slow resource has already begun
>> >starting ... if they fail at the same time (as with a node failure),
>> >the cluster will schedule recovery for both at the same time.
>> >
>> >Another possibility would be to have the start return immediately,
>> >and make the monitor artificially return success for the first 10
>> >minutes after starting. It's hacky, and it depends on your situation
>> >whether the behavior is acceptable. My first thought on how to
>> >implement this would be to have the start action set a private node
>> >attribute (attrd_updater -p) with a timestamp. When the monitor runs,
>> >it could do its usual check, and if it succeeds, remove that node
>> >attribute, but if it fails, check the node attribute to see whether
>> >it's within the desired delay.
>> >
>> >> Thank you very much!
>> >>
>> >> > Hi,
>> >> > If I remember correctly, any pending actions from a previous
>> >> > transition must be completed before a new transition can be
>> >> > calculated. Otherwise, there's the possibility that the pending
>> >> > action could change the state in a way that makes the second
>> >> > transition's decisions harmful.
>> >> > Theoretically (and ideally), pacemaker could figure out whether
>> >> > some of the actions in the second transition would be needed
>> >> > regardless of whether the pending actions succeeded or failed,
>> >> > but in practice, that would be difficult to implement (and
>> >> > possibly take more time to calculate than is desirable in a
>> >> > recovery situation).
>> >>
>> >> > On Fri, 2017-10-27 at 23:54 +0800, lkxjtu wrote:
>> >>
>> >> > I have two clone resources in my corosync/pacemaker cluster. They
>> >> > are fm_mgt and logserver. Both of their RAs are OCF. fm_mgt takes
>> >> > 1 minute to start the service (the OCF start function runs for 1
>> >> > minute). Configured as below:
>> >> > # crm configure show
>> >> > node 168002177: 192.168.2.177
>> >> > node 168002178: 192.168.2.178
>> >> > node 168002179: 192.168.2.179
>> >> > primitive fm_mgt fm_mgt \
>> >> > op monitor interval=20s timeout=120s \
>> >> > op stop interval=0 timeout=120s on-fail=restart \
>> >> > op start interval=0 timeout=120s on-fail=restart \
>> >> > meta target-role=Started
>> >> > primitive logserver logserver \
>> >> > op monitor interval=20s timeout=120s \
>> >> > op stop interval=0 timeout=120s on-fail=restart \
>> >> > op start interval=0 timeout=120s on-fail=restart \
>> >> > meta target-role=Started
>> >> > clone fm_mgt_replica fm_mgt
>> >> > clone logserver_replica logserver
>> >> > property cib-bootstrap-options: \
>> >> > have-watchdog=false \
>> >> > dc-version=1.1.13-10.el7-44eb2dd \
>> >> > cluster-infrastructure=corosync \
>> >> > stonith-enabled=false \
>> >> > start-failure-is-fatal=false
>> >> > When I kill the fm_mgt service on one node, pacemaker immediately
>> >> > recovers it after the monitor fails. This looks perfectly normal.
>> >> > But during this 1 minute while fm_mgt is starting, if I kill the
>> >> > logserver service on any node, the monitor catches the failure
>> >> > normally too, but pacemaker will not restart it immediately; it
>> >> > waits for fm_mgt to finish starting. After fm_mgt finishes
>> >> > starting, pacemaker begins restarting logserver. It seems there
>> >> > is some dependency between pacemaker resources.
>> >> > # crm status
>> >> > Last updated: Thu Oct 26 06:40:24 2017    Last change: Thu Oct 26
>> >> > 06:36:33 2017 by root via crm_resource on 192.168.2.177
>> >> > Stack: corosync
>> >> > Current DC: 192.168.2.179 (version 1.1.13-10.el7-44eb2dd) -
>> >> > partition with quorum
>> >> > 3 nodes and 6 resources configured
>> >> > Online: [ 192.168.2.177 192.168.2.178 192.168.2.179 ]
>> >> > Full list of resources:
>> >> > Clone Set: logserver_replica [logserver]
>> >> >     logserver (ocf::heartbeat:logserver): FAILED 192.168.2.177
>> >> >     Started: [ 192.168.2.178 192.168.2.179 ]
>> >> > Clone Set: fm_mgt_replica [fm_mgt]
>> >> >     Started: [ 192.168.2.178 192.168.2.179 ]
>> >> >     Stopped: [ 192.168.2.177 ]
>> >> > I am very confused. Is there something wrong with the
>> >> > configuration? Thank you very much!
>> >> > James
>> >> > best regards
>> >>
>> >--
>> >Ken Gaillot <kgaillot@redhat.com>
>>
>>
>>
>--
>Ken Gaillot <kgaillot@redhat.com>
</pre></div>