<table cellspacing="0" cellpadding="0" border="0" ><tr><td valign="top" style="font: inherit;"><BR>

<BLOCKQUOTE style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: rgb(16,16,255) 2px solid">

<DIV class=plainMail><BR><BR>> Hi<BR>> I rebooted my node, and this message appears:<BR>> node2 pingd: [3932]: info: stand_alone_ping: Node 192.168.10.100 is<BR>> unreachable (read)<BR>> and the node could not start.<BR>><BR>> 192.168.10.100 is on the IB network, which I bring up only after the node starts, so do<BR>> you have any idea how to let the node start first? Thanks very much. :-)<BR>><BR>><BR><BR>Don't use IP resources as ping nodes.<BR>You should use the IP of something outside of your cluster, like an external<BR>router.<BR><BR>stand_alone_ping is started automatically; I have never started it by hand. How do I configure it to ping an external router?</DIV>
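<DIV class=plainMail><BR>A minimal sketch of what that could look like in crm shell syntax, assuming the ping daemon is managed as a cloned ocf:pacemaker:ping resource. The resource names (p_ping, cl_ping) and the router address 192.168.1.1 are hypothetical and not taken from this thread, and the parameter values are only illustrative:<BR>
<PRE>
# Assumption: 192.168.1.1 stands in for a router outside the cluster.
primitive p_ping ocf:pacemaker:ping \
    params host_list="192.168.1.1" multiplier="100" dampen="5s" \
    op monitor interval="10s" timeout="60s"
# Run one copy of the ping resource on every node.
clone cl_ping p_ping
</PRE>
The addresses that get pinged come from the host_list parameter of the resource, so pointing it at an external router means the result no longer depends on the IB network being up.</DIV>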

<DIV class=plainMail>Thanks <BR>><BR>><BR>> _______________________________________________<BR>> Pacemaker mailing list: <A href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</A><BR>> <A href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target=_blank>http://oss.clusterlabs.org/mailman/listinfo/pacemaker</A><BR>><BR>> Project Home: <A href="http://www.clusterlabs.org/" target=_blank>http://www.clusterlabs.org</A><BR>> Getting started: <A href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target=_blank>http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</A><BR>> Bugs:<BR>> <A href="http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker" target=_blank>http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker</A><BR>><BR>><BR><BR>------------------------------<BR><BR>Message: 3<BR>Date: Thu, 11 Nov 2010 11:38:24 +0100<BR>From: Simon Jansen &lt;<A href="mailto:simon.jansen1@googlemail.com">simon.jansen1@googlemail.com</A>&gt;<BR>To: <A href="mailto:pacemaker@oss.clusterlabs.org">pacemaker@oss.clusterlabs.org</A><BR>Subject: Re: [Pacemaker] Multistate Resources is not promoted<BR>    automatically<BR>Message-ID:<BR>    &lt;AANLkTikwgMy4nutZ4807vv2x=nN_sMj+E8Y1PRu6X1eT@mail.gmail.com&gt;<BR>Content-Type: text/plain; charset="iso-8859-1"<BR><BR>Hi Andrew,<BR><BR>thank you for your answer.<BR><BR>> Does the ocf:heartbeat:Rsyslog script call crm_master?<BR>> It needs to, to tell Pacemaker which instance to promote.<BR>><BR>Yes, it does. But I forgot to call crm_master with the -D option in the stop<BR>action. I think that this was the error. After correcting this issue the RA<BR>starts as expected.<BR><BR>> Two questions though...<BR>> 1) Why use master/slave for rsyslog?<BR>><BR>In the master role the rsyslog daemon should function as a central log server<BR>and write the entries received on UDP port 514 into a MySQL database.<BR>On the passive node the rsyslog service should be started with the standard<BR>config.<BR>Do you think there is a better solution to solve this

 requirement?<BR><BR><BR>> 2) Is this an upstream RA? If not, you shouldn't be using the<BR>> ocf:heartbeat namespace.<BR>><BR>OK, thank you for the advice. Should I use the pacemaker class instead or<BR>should I define a custom namespace?<BR><BR>--<BR><BR>Regards,<BR><BR>Simon Jansen<BR><BR><BR>---------------------------<BR>Simon Jansen<BR>64291 Darmstadt<BR><BR>------------------------------<BR><BR>Message: 4<BR>Date: Thu, 11 Nov 2010 11:44:47 +0100<BR>From: Andrew Beekhof &lt;<A href="mailto:andrew@beekhof.net">andrew@beekhof.net</A>&gt;<BR>To: The

 Pacemaker cluster resource manager<BR>    &lt;<A href="mailto:pacemaker@oss.clusterlabs.org">pacemaker@oss.clusterlabs.org</A>&gt;<BR>Subject: Re: [Pacemaker] Infinite fail-count and migration-threshold<BR>    after node fail-back<BR>Message-ID:<BR>    &lt;AANLkTimmLWZMhKxSZCHu95x0d2WGnJcujN-B7eowbFXE@mail.gmail.com&gt;<BR>Content-Type: text/plain; charset=ISO-8859-1<BR><BR>On Mon, Oct 11, 2010 at 9:40 AM, Dan Frincu &lt;<A href="mailto:dfrincu@streamwide.ro">dfrincu@streamwide.ro</A>&gt; wrote:<BR>> Hi all,<BR>><BR>> I've managed to make this

 setup work, basically the issue with a<BR>> symmetric-cluster="false" and specifying the resources' location manually<BR>> means that the resources will always obey the location constraint, and (as<BR>> far as I could see) disregard the rsc_defaults resource-stickiness values.<BR><BR>This definitely should not be the case.<BR>Possibly your stickiness setting is being eclipsed by the combination<BR>of the location constraint scores.<BR>Try INFINITY instead.<BR><BR>> This behavior is not the expected one, in theory, setting<BR>> symmetric-cluster="false" should affect whether resources are allowed to run<BR>> anywhere by default and the resource-stickiness should lock in place the<BR>> resources so they don't bounce from node to node. Again, this didn't happen,<BR>> but by setting symmetric-cluster="true", using the same ordering and<BR>> collocation constraints and the resource-stickiness, the behavior is the<BR>> expected

 one.<BR>><BR>> I don't remember the docs on clusterlabs.org mentioning anywhere<BR>> that resource-stickiness only works with<BR>> symmetric-cluster="true", so for anyone who also stumbles upon this issue,<BR>> I hope this helps.<BR>><BR>> Regards,<BR>><BR>> Dan<BR>><BR>> Dan Frincu wrote:<BR>>><BR>>> Hi,<BR>>><BR>>> Since it was brought to my attention that I should upgrade from<BR>>> openais-0.80 to a more recent version of corosync, I've done just that;<BR>>> however, I'm experiencing strange behavior on the cluster.<BR>>><BR>>> The same setup was used with the packages below:<BR>>><BR>>> # rpm -qa | grep -Ei "(openais|cluster|heartbeat|pacemaker|resource)"<BR>>> openais-0.80.5-15.2<BR>>> cluster-glue-1.0-12.2<BR>>> pacemaker-1.0.5-4.2<BR>>> cluster-glue-libs-1.0-12.2<BR>>> resource-agents-1.0-31.5<BR>>>

 pacemaker-libs-1.0.5-4.2<BR>>> pacemaker-mgmt-1.99.2-7.2<BR>>> libopenais2-0.80.5-15.2<BR>>> heartbeat-3.0.0-33.3<BR>>> pacemaker-mgmt-client-1.99.2-7.2<BR>>><BR>>> Now I've migrated to the most recent stable packages I could find (on the<BR>>> clusterlabs.org website) for RHEL5:<BR>>><BR>>> # rpm -qa | grep -Ei "(openais|cluster|heartbeat|pacemaker|resource)"<BR>>> cluster-glue-1.0.6-1.6.el5<BR>>> pacemaker-libs-1.0.9.1-1.el5<BR>>> pacemaker-1.0.9.1-1.el5<BR>>> heartbeat-libs-3.0.3-2.el5<BR>>> heartbeat-3.0.3-2.el5<BR>>> openaislib-1.1.3-1.6.el5<BR>>> resource-agents-1.0.3-2.el5<BR>>> cluster-glue-libs-1.0.6-1.6.el5<BR>>> openais-1.1.3-1.6.el5<BR>>><BR>>> Expected behavior:<BR>>> - all the resources in the group should go (based on location preference)<BR>>> to bench1<BR>>> - if bench1 goes down, resources migrate to

 bench2<BR>>> - if bench1 comes back up, resources stay on bench2, unless manually told<BR>>> otherwise.<BR>>><BR>>> With the previous packages this worked; with the new packages, not<BR>>> so much. Now if bench1 goes down (crm node standby `uname -n`), failover<BR>>> occurs, but when bench1 comes back up, resources migrate back, even if<BR>>> default-resource-stickiness is set, and more than that, 2 drbd block devices<BR>>> reach an infinite fail-count, most notably because they try to promote the<BR>>> resources to a Master state on bench1, but fail to do so due to the resource<BR>>> being held open (by some process that I could not identify).<BR>>><BR>>> Strangely enough, the resources (drbd) fail to be promoted to a Master<BR>>> status on bench1, so they fail back to bench2, where they are mounted<BR>>> (functional), but crm_mon shows:<BR>>><BR>>>

 Migration summary:<BR>>> * Node bench2.streamwide.ro:<BR>>>  drbd_mysql:1: migration-threshold=1000000 fail-count=1000000<BR>>>  drbd_home:1: migration-threshold=1000000 fail-count=1000000<BR>>> * Node bench1.streamwide.ro:<BR>>><BR>>> .... infinite metrics on bench2, while the drbd resources are available<BR>>><BR>>> version: 8.3.2 (api:88/proto:86-90)<BR>>> GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by<BR>>> mockbuild@v20z-x86-64.home.local, 2009-08-29 14:07:55<BR>>> 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----<BR>>>   ns:1632 nr:1864 dw:3512 dr:3933 al:11 bm:19 lo:0 pe:0 ua:0 ap:0 ep:1<BR>>> wo:b oos:0<BR>>> 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----<BR>>>   ns:4

 nr:24 dw:28 dr:25 al:1 bm:1 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0<BR>>> 2: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----<BR>>>   ns:4 nr:24 dw:28 dr:85 al:1 bm:1 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0<BR>>><BR>>> and mounted<BR>>><BR>>> /dev/drbd1 on /home type ext3 (rw,noatime,nodiratime)<BR>>> /dev/drbd0 on /mysql type ext3 (rw,noatime,nodiratime)<BR>>> /dev/drbd2 on /storage type ext3 (rw,noatime,nodiratime)<BR>>><BR>>> Attached is the hb_report.<BR>>><BR>>> Thank you in advance.<BR>>><BR>>> Best regards<BR>>><BR>><BR>> --<BR>> Dan FRINCU<BR>> Systems Engineer<BR>> CCNA, RHCE<BR>> Streamwide Romania<BR>><BR><BR><BR>------------------------------<BR><BR>Message: 5<BR>Date: Thu, 11 Nov 2010 11:46:42 +0100<BR>From: Andrew Beekhof &lt;<A href="mailto:andrew@beekhof.net">andrew@beekhof.net</A>&gt;<BR>To: The Pacemaker cluster resource manager<BR>    &lt;<A href="mailto:pacemaker@oss.clusterlabs.org">pacemaker@oss.clusterlabs.org</A>&gt;<BR>Subject: Re: [Pacemaker] Multistate Resources is not promoted<BR>    automatically<BR>Message-ID:<BR>    &lt;AANLkTinAqC-vNWHYCDRrRhgyh5JUtcJux5E9YBvrXZc6@mail.gmail.com&gt;<BR>Content-Type: text/plain; charset=ISO-8859-1<BR><BR>On Thu, Nov 11, 2010 at 11:38 AM, Simon Jansen<BR>&lt;<A href="mailto:simon.jansen1@googlemail.com">simon.jansen1@googlemail.com</A>&gt; wrote:<BR>> Hi Andrew,<BR>><BR>> thank you for your answer.<BR>><BR>>> Does the ocf:heartbeat:Rsyslog script call crm_master?<BR>>> It needs to, to tell Pacemaker which instance to promote.<BR>><BR>> Yes, it does. But I forgot to call crm_master with the -D option in the stop<BR>> action. I think that this was the error. After correcting this issue the RA<BR>> starts as expected.<BR>><BR>>> Two questions though...<BR>>> 1) Why use master/slave for rsyslog?<BR>><BR>> In the master role the rsyslog daemon should function as a central log server<BR>> and write the entries received on UDP port 514 into a MySQL database.<BR>> On the passive node the rsyslog service should be started with the standard<BR>> config.<BR><BR>Interesting.<BR><BR>> Do you think there is a better solution to solve this

 requirement?<BR><BR>No, I'd just never heard of rsyslog being used in this way.<BR><BR>>><BR>>> 2) Is this an upstream RA? If not, you shouldn't be using the<BR>>> ocf:heartbeat namespace.<BR>><BR>> OK, thank you for the advice. Should I use the pacemaker class instead or<BR>> should I define a custom namespace?<BR><BR>Custom.<BR><BR>><BR>> --<BR>><BR>> Regards,<BR>><BR>> Simon Jansen<BR>><BR>><BR>> ---------------------------<BR>> Simon Jansen<BR>> 64291 Darmstadt<BR>><BR><BR><BR>------------------------------<BR><BR>Message: 6<BR>Date: Thu, 11 Nov 2010 11:47:35 +0100<BR>From: Andrew Beekhof &lt;<A href="mailto:andrew@beekhof.net">andrew@beekhof.net</A>&gt;<BR>To: The Pacemaker cluster resource manager<BR>    &lt;<A href="mailto:pacemaker@oss.clusterlabs.org">pacemaker@oss.clusterlabs.org</A>&gt;<BR>Subject: Re: [Pacemaker] start error because "not installed" - stop<BR>    fails with "not installed" - stonith<BR>Message-ID:<BR>    &lt;AANLkTikXwe6wS2F-LtLF3dvKjEt1gvPZ=5BSVNj1eZ2q@mail.gmail.com&gt;<BR>Content-Type: text/plain; charset=ISO-8859-1<BR><BR>On Sat, Oct 9, 2010 at 12:36 AM, Andreas Kurz &lt;<A href="mailto:andreas.kurz@linbit.com">andreas.kurz@linbit.com</A>&gt; wrote:<BR>> Hello,<BR>><BR>> if a resource encounters a start error with rc=5 "not installed", the<BR>> stop action is not skipped before a restart is tried.<BR><BR>I'd not expect a stop action at all.  What version?<BR><BR>><BR>>

 Typically in such a situation the stop will also fail with the same<BR>> error and the node will be fenced ... even worse, there is a good chance<BR>> this happens on all remaining nodes, e.g. if there is a typo in a parameter.<BR>><BR>> I would expect the cluster to skip the stop action after a "not<BR>> installed" start failure followed by a start retry on a different node.<BR>><BR>> So ... is this a feature or a bug? ;-)<BR>><BR>> Regards,<BR>> Andreas<BR>><BR><BR><BR>------------------------------<BR><BR>Message: 7<BR>Date: Thu, 11 Nov 2010 11:48:59 +0100<BR>From: Andrew Beekhof &lt;<A href="mailto:andrew@beekhof.net">andrew@beekhof.net</A>&gt;<BR>To: The Pacemaker cluster resource manager<BR>    &lt;<A href="mailto:pacemaker@oss.clusterlabs.org">pacemaker@oss.clusterlabs.org</A>&gt;<BR>Subject: Re: [Pacemaker] [Problem]Number of

 times control of the<BR>    fail-count    is late.<BR>Message-ID:<BR>    &lt;AANLkTinMfWBqmW_jcA8a+ic7zmfb6HMiEfBD1_SuEe=G@mail.gmail.com&gt;<BR>Content-Type: text/plain; charset=ISO-8859-1<BR><BR>On Wed, Nov 10, 2010 at 5:20 AM,  &lt;<A href="mailto:renayama19661014@ybb.ne.jp">renayama19661014@ybb.ne.jp</A>&gt; wrote:<BR>> Hi,<BR>><BR>> We built a two-node cluster.<BR>> migration-threshold is set to 2.<BR>><BR>> We observed the following behavior with this procedure.<BR>><BR>> Step1) Start two nodes and load config5.crm. (The clnDiskd resource is our own original agent.)<BR>><BR>> ============<BR>> Last updated: Tue Nov  9 21:10:49 2010<BR>> Stack: Heartbeat<BR>> Current

 DC: srv02 (8c93dc22-a27e-409b-8112-4073de622daf) - partition with quorum<BR>> Version: 1.0.9-0a40fd0cb9f2fcedef9d1967115c912314c57438<BR>> 2 Nodes configured, unknown expected votes<BR>> 5 Resources configured.<BR>> ============<BR>><BR>> Online: [ srv01 srv02 ]<BR>><BR>>  vip    (ocf::heartbeat:IPaddr2):       Started srv01<BR>>  Clone Set: clnDiskd<BR>>     Started: [ srv01 srv02 ]<BR>>  Clone Set: clnDummy2<BR>>     Started: [ srv01 srv02 ]<BR>>  Clone Set: clnPingd1<BR>>     Started: [ srv01 srv02 ]<BR>><BR>> Node Attributes:<BR>> * Node srv01:<BR>>    + default_ping_set1                 : 100<BR>>    + diskcheck_status_internal         : normal<BR>> * Node srv02:<BR>>    + default_ping_set1                 : 100<BR>>    + diskcheck_status_internal         : normal<BR>><BR>> Migration summary:<BR>> * Node srv02:<BR>> * Node srv01:<BR>><BR>><BR>> Step2) We edit a

 clnDummy2 resource so that its start times out (a sleep is added).<BR>><BR>>  dummy_start() {<BR>>    sleep 180   # <---- added sleep<BR>>    dummy_monitor<BR>>    if [ $? = $OCF_SUCCESS ]; then<BR>><BR>><BR>> Step3) We cause a monitor error in the clnDummy2 resource.<BR>><BR>>  # rm -rf /var/run/Dummy-Dummy2.state<BR>><BR>> Step4) The restart of clnDummy2 times out.<BR>><BR>> However, the log shows that a start is attempted again for clnDummy2 even after it has timed out once.<BR>> The reason is that pengine does not yet know that fail-count has become INFINITY.<BR>><BR>> In pe-input-2001.bz2, fail-count has not yet become INFINITY.<BR>> In pe-input-2002.bz2, fail-count becomes INFINITY.<BR>><BR>> (snip)<BR>> Nov  9 21:12:35 srv02 crmd: [5896]: WARN: status_from_rc: Action 25 (Dummy2:0_start_0) on srv01 failed<BR>> (target: 0 vs. rc: -2): Error<BR>> Nov  9 21:12:35 srv02

 crmd: [5896]: WARN: update_failcount: Updating failcount for Dummy2:0 on srv01<BR>> after failed start: rc=-2 (update=INFINITY, time=1289304755)<BR>> Nov  9 21:12:35 srv02 crmd: [5896]: info: abort_transition_graph: match_graph_event:272 - Triggered<BR>> transition abort (complete=0, tag=lrm_rsc_op, id=Dummy2:0_start_0,<BR>> magic=2:-2;25:5:0:275da7f9-7f43-43a2-8308-41d0ab78346e, cib=0.9.39) : Event failed<BR>> Nov  9 21:12:35 srv02 crmd: [5896]: info: match_graph_event: Action Dummy2:0_start_0 (25) confirmed on<BR>> srv01 (rc=4)<BR>> Nov  9 21:12:35 srv02 crmd: [5896]: info: te_pseudo_action: Pseudo action 29 fired and confirmed<BR>> Nov  9 21:12:35 srv02 crmd: [5896]: info: run_graph:<BR>> ====================================================<BR>> Nov  9 21:12:35 srv02 crmd: [5896]: notice: run_graph: Transition 5 (Complete=7, Pending=0, Fired=0,<BR>> Skipped=1, Incomplete=0, Source=/var/lib/pengine/pe-input-2000.bz2):

 Stopped<BR>> Nov  9 21:12:35 srv02 crmd: [5896]: info: te_graph_trigger: Transition 5 is now complete<BR>> Nov  9 21:12:35 srv02 crmd: [5896]: info: do_state_transition: State transition S_TRANSITION_ENGINE -><BR>> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]<BR>> Nov  9 21:12:35 srv02 crmd: [5896]: info: do_state_transition: All 2 cluster nodes are eligible to run<BR>> resources.<BR>> Nov  9 21:12:35 srv02 crmd: [5896]: info: do_pe_invoke: Query 72: Requesting the current CIB:<BR>> S_POLICY_ENGINE<BR>> Nov  9 21:12:35 srv02 crmd: [5896]: info: do_pe_invoke_callback: Invoking the PE: query=72,<BR>> ref=pe_calc-dc-1289304755-58, seq=2, quorate=1<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: unpack_config: On loss of CCM Quorum: Ignore<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' =<BR>> 0, 'green' = 0<BR>> Nov  9

 21:12:35 srv02 pengine: [7208]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: info: determine_online_status: Node srv02 is online<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: info: determine_online_status: Node srv01 is online<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: WARN: unpack_rsc_op: Processing failed op<BR>> Dummy2:0_monitor_15000 on srv01: not running (7)<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: WARN: unpack_rsc_op: Processing failed op Dummy2:0_start_0 on<BR>> srv01: unknown exec error (-2)<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: native_print: Dummy      (ocf::pacemaker:Dummy): Started<BR>> srv01<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: native_print: vip        (ocf::heartbeat:IPaddr2):       Started<BR>> srv01<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: clone_print:  Clone Set: clnDiskd<BR>> Nov  9 21:12:35 srv02

 pengine: [7208]: notice: short_print:      Started: [ srv01 srv02 ]<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: clone_print:  Clone Set: clnDummy2<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: native_print:      Dummy2:0      (ocf::pacemaker:Dummy2):<BR>> Started srv01 FAILED<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: short_print:      Started: [ srv02 ]<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: clone_print:  Clone Set: clnPingd1<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: short_print:      Started: [ srv01 srv02 ]<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: info: get_failcount: clnDummy2 has failed 1 times on srv01<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: common_apply_stickiness: clnDummy2 can fail 1 more<BR>> times on srv01 before being forced off<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: info: get_failcount: clnDummy2 has failed 1 times on srv01<BR>> Nov  9

 21:12:35 srv02 pengine: [7208]: notice: common_apply_stickiness: clnDummy2 can fail 1 more<BR>> times on srv01 before being forced off<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: ERROR: unpack_operation: Specifying on_fail=fence and<BR>> stonith-enabled=false makes no sense<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: RecurringOp:  Start recurring monitor (15s) for<BR>> Dummy2:0 on srv01<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource Dummy (Started srv01)<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource vip   (Started srv01)<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource prmDiskd:0    (Started srv01)<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource prmDiskd:1    (Started srv02)<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Recover resource Dummy2:0    (Started srv01)<BR>>

 Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource Dummy2:1      (Started srv02)<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource prmPingd1:0   (Started srv01)<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: notice: LogActions: Leave resource prmPingd1:1   (Started srv02)<BR>> Nov  9 21:12:35 srv02 crmd: [5896]: info: do_state_transition: State transition S_POLICY_ENGINE -><BR>> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]<BR>> Nov  9 21:12:35 srv02 crmd: [5896]: info: unpack_graph: Unpacked transition 6: 8 actions in 8 synapses<BR>> Nov  9 21:12:35 srv02 crmd: [5896]: info: do_te_invoke: Processing graph 6<BR>> (ref=pe_calc-dc-1289304755-58) derived from /var/lib/pengine/pe-input-2001.bz2<BR>> Nov  9 21:12:35 srv02 crmd: [5896]: info: te_pseudo_action: Pseudo action 30 fired and confirmed<BR>> Nov  9 21:12:35 srv02 crmd: [5896]: info:

 te_rsc_command: Initiating action 5: stop Dummy2:0_stop_0 on<BR>> srv01<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: info: process_pe_message: Transition 6: PEngine Input stored<BR>> in: /var/lib/pengine/pe-input-2001.bz2<BR>> Nov  9 21:12:35 srv02 pengine: [7208]: info: process_pe_message: Configuration ERRORs found during PE<BR>> processing.  Please run "crm_verify -L" to identify issues.<BR>> Nov  9 21:12:37 srv02 attrd: [5895]: info: attrd_ha_callback: flush message from srv01<BR>> Nov  9 21:12:37 srv02 crmd: [5896]: info: match_graph_event: Action Dummy2:0_stop_0 (5) confirmed on<BR>> srv01 (rc=0)<BR>> Nov  9 21:12:37 srv02 crmd: [5896]: info: te_pseudo_action: Pseudo action 31 fired and confirmed<BR>> Nov  9 21:12:37 srv02 crmd: [5896]: info: te_pseudo_action: Pseudo action 8 fired and confirmed<BR>> Nov  9 21:12:37 srv02 crmd: [5896]: info: te_pseudo_action: Pseudo action 28 fired and confirmed<BR>> Nov  9

 21:12:37 srv02 crmd: [5896]: info: te_rsc_command: Initiating action 24: start Dummy2:0_start_0<BR>> on srv01<BR>><BR>>  -----> This start must not be carried out.<BR>><BR>> Nov  9 21:12:37 srv02 crmd: [5896]: info: abort_transition_graph: te_update_diff:146 - Triggered<BR>> transition abort (complete=0, tag=transient_attributes, id=519bb7a2-3c31-414a-b6b2-eaef36a09fb7,<BR>> magic=NA, cib=0.9.41) : Transient attribute: update<BR>> Nov  9 21:12:37 srv02 crmd: [5896]: info: update_abort_priority: Abort priority upgraded from 0 to<BR>> 1000000<BR>> Nov  9 21:12:37 srv02 crmd: [5896]: info: update_abort_priority: Abort action done superceeded by<BR>> restart<BR>> Nov  9 21:12:37 srv02 crmd: [5896]: info: abort_transition_graph: te_update_diff:146 - Triggered<BR>> transition abort (complete=0, tag=transient_attributes, id=519bb7a2-3c31-414a-b6b2-eaef36a09fb7,<BR>> magic=NA, cib=0.9.42) : Transient attribute:

 update<BR>> (snip)<BR>><BR>> The problem seems to be that the update of fail-count arrived late.<BR>> However, whether it occurs seems to depend on timing.<BR>><BR>> Because the number of failures is counted incorrectly, the failover time of the resource is<BR>> affected.<BR>><BR>> Has this problem already been discussed?<BR><BR>Not that I know of.<BR><BR>> Isn't the delay in updating fail-count by way of attrd the problem?<BR><BR>Indeed.<BR><BR>><BR>>  * I attached the log and some pe files in Bugzilla.<BR>>  * <A href="http://developerbugs.linux-foundation.org/show_bug.cgi?id=2520" target=_blank>http://developerbugs.linux-foundation.org/show_bug.cgi?id=2520</A><BR><BR>Ok, I'll follow up there.<BR><BR>><BR>> Best Regards,<BR>> Hideo Yamauchi.<BR>><BR><BR><BR>------------------------------<BR><BR>_______________________________________________<BR>Pacemaker mailing list<BR><A href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</A><BR><A href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target=_blank>http://oss.clusterlabs.org/mailman/listinfo/pacemaker</A><BR><BR><BR>End of Pacemaker Digest, Vol 36, Issue 34<BR>*****************************************<BR></DIV></BLOCKQUOTE></td></tr></table><br>